summary.pam {pamfe}    R Documentation

Summary for a PAM fit

Description

Takes a fitted pam object produced by pam() and produces various useful summaries from it. The code is based on the summary.gam function from package mgcv.

Usage

## S3 method for class 'pam'
summary(object, freq = FALSE, ...)

Arguments

object

A fitted pam object as produced by pam().

freq

By default p-values for parametric terms are calculated using the Bayesian estimated covariance matrix of the parameter estimators. If this is set to TRUE then the frequentist covariance matrix of the parameters is used instead.

...

Other arguments.

Details

Model degrees of freedom are taken as the trace of the influence (or hat) matrix A for the model fit. Residual degrees of freedom are taken as the number of data minus the model degrees of freedom. Let P_i be the matrix giving the parameters of the ith smooth when applied to the data (or pseudodata in the generalized case) and let X be the design matrix of the model. Then tr(XP_i) is the edf for the ith term. Clearly this definition causes the edfs to add up properly! An alternative version of the edf is more appropriate for p-value computation, and is based on the trace of 2A - AA.
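The degrees-of-freedom definitions above can be sketched in base R. This is an illustration only, not the code used by summary.pam: the polynomial bases, the ridge-type penalty S and the fixed value of lambda are assumptions standing in for the spline bases and penalties that pam() would build.

```r
set.seed(1)
n  <- 200
x1 <- runif(n)
x2 <- runif(n)
y  <- sin(2 * pi * x1) + x2^2 + rnorm(n, sd = 0.3)

# Assumed toy bases: an intercept plus 5 orthogonal polynomial columns per covariate
X    <- cbind(1, poly(x1, 5), poly(x2, 5))
term <- c(0, rep(1, 5), rep(2, 5))   # 0 = intercept, 1 = first smooth, 2 = second smooth

# Assumed ridge-type penalty on the smooth coefficients, with a fixed lambda
lambda <- 0.5
S <- diag(c(0, rep(1, 10)))

# P maps the data to the coefficients; A = X P is the influence (hat) matrix
P <- solve(crossprod(X) + lambda * S, t(X))
A <- X %*% P

edf_total <- sum(diag(A))        # model degrees of freedom, tr(A)
resid_df  <- n - edf_total       # residual degrees of freedom

# tr(X P_i) equals the sum of the diagonal entries of P X belonging to term i
edf_term <- tapply(diag(P %*% X), term, sum)

# Alternative edf based on the trace of 2A - AA
edf_alt <- sum(diag(2 * A - A %*% A))
```

Because the trace is invariant under cyclic permutation, the per-term values in edf_term sum to edf_total, which is the "adding up" property mentioned above.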

P-values for smooth terms are usually based on a test statistic motivated by an extension of Nychka's (1988) analysis of the frequentist properties of Bayesian confidence intervals for smooths (Marra and Wood, 2012). These have better frequentist performance (in terms of power and distribution under the null) than the alternative strictly frequentist approximation. When the Bayesian intervals have good across-the-function properties, the p-values have close to the correct null distribution and reasonable power (but there are no optimality results for the power). Full details are in Wood (2013), although what is computed is actually a slight variant in which the components of the test statistic are weighted by the iterative fitting weights.

All p-values are computed without considering uncertainty in the smoothing parameter estimates.

In simulations the p-values have best behaviour under ML smoothness selection, with REML coming second. In general the p-values behave well, but neglecting smoothing parameter uncertainty means that they may be somewhat too low when smoothing parameters are highly uncertain. High uncertainty happens in particular when smoothing parameters are poorly identified, which can occur with nested smooths or highly correlated covariates (high concurvity).

By default the p-values for parametric model terms are also based on Wald tests using the Bayesian covariance matrix for the coefficients.
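The difference between the Bayesian and the frequentist covariance matrix (see the freq argument) can be sketched in base R for a penalized least squares fit. This is a minimal sketch under stated assumptions, not the internal code of summary.pam: the polynomial basis, the penalty S, the fixed lambda and the helper wald_p are all hypothetical, and the covariance formulas are the standard ones for penalized regression.

```r
set.seed(2)
n <- 150
x <- runif(n)
z <- rnorm(n)                          # covariate entering parametrically
y <- 0.5 * z + sin(2 * pi * x) + rnorm(n, sd = 0.4)

# Assumed toy setup: polynomial basis for the smooth, ridge-type penalty, fixed lambda
X <- cbind(1, z, poly(x, 6))
S <- diag(c(0, 0, rep(1, 6)))          # only the smooth coefficients are penalized
lambda <- 2

XtX  <- crossprod(X)
V    <- solve(XtX + lambda * S)
beta <- drop(V %*% crossprod(X, y))

# Scale estimate: residual sum of squares over residual degrees of freedom
edf    <- sum(diag(V %*% XtX))
sigma2 <- sum((y - X %*% beta)^2) / (n - edf)

Vb <- V * sigma2                       # Bayesian covariance matrix of the parameters
Ve <- V %*% XtX %*% V * sigma2         # frequentist covariance matrix of the estimators

# Hypothetical helper: two-sided Wald p-value against the standard normal
wald_p <- function(b, se) 2 * pnorm(-abs(b / se))

j <- 2                                 # position of the coefficient of z
p_bayes <- wald_p(beta[j], sqrt(Vb[j, j]))
p_freq  <- wald_p(beta[j], sqrt(Ve[j, j]))
```

In this setup the frequentist standard errors are never larger than the Bayesian ones, so the frequentist p-values are never larger either.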

Value

summary.pam produces a list of summary information for a fitted pam object.

p.coeff

Is an array of estimates of the strictly parametric model coefficients.

p.t

Is an array of the p.coeff's divided by their standard errors.

p.pv

Is an array of p-values for the null hypothesis that the corresponding parameter is zero. Calculated with reference to the standard normal distribution.

m

The number of smooth terms in the model.

chi.sq

An array of test statistics for assessing the significance of model smooth terms. See details.

s.pv

An array of approximate p-values for the null hypotheses that each smooth term is zero. Be warned, these are only approximate.

se

Array of standard error estimates for all parameter estimates.

r.sq

The adjusted r-squared for the model. Defined as the proportion of variance explained, where original variance and residual variance are both estimated using unbiased estimators. This quantity can be negative if your model is worse than a one parameter constant model, and can be higher for the smaller of two nested models!

edf

Array of estimated degrees of freedom for the model terms.

residual.df

Estimated residual degrees of freedom.

n

Number of data used for the fitting process (after applying the first-difference transformation).

np

Number of model coefficients (regression coefficients, not smoothing parameters or other parameters of likelihood).

method

The smoothing selection criterion used.

sp.criterion

The minimized value of the smoothness selection criterion. What is reported is the negative log marginal likelihood or negative log restricted likelihood depending on the estimation method.

scale

Estimated (or given) scale parameter.

family

The family (always gaussian) and link function (always identity link) used.

formula

The original PAM formula.

pTerms.df

The degrees of freedom associated with each parametric term.

pTerms.chi.sq

A Wald statistic for testing the null hypothesis that each parametric term is zero.

pTerms.pv

P-values associated with the tests that each term is zero. The reference distribution is an appropriate chi-squared when the scale parameter is known, and is based on an F when it is not.

cov.unscaled

The estimated covariance matrix of the parameters (or estimators if freq=TRUE), divided by the scale parameter.

cov.scaled

The estimated covariance matrix of the parameters (estimators if freq=TRUE).

p.table

Significance table for parameters.

s.table

Significance table for smooths.

p.Terms

Significance table for parametric model terms.
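The definition of r.sq given above (both variances estimated with unbiased estimators) can be illustrated with an ordinary linear model in base R. This is only a sketch of the formula: the data are simulated noise, lm() stands in for a pam fit, and the exact computation in summary.pam, which works on first-differenced data, may differ in detail.

```r
set.seed(3)
n <- 40
x <- matrix(rnorm(n * 5), n, 5)    # five pure-noise predictors (assumed toy data)
y <- rnorm(n)

fit <- lm(y ~ x)

rss      <- sum(resid(fit)^2)      # residual sum of squares
tss      <- sum((y - mean(y))^2)   # total sum of squares around the mean
resid_df <- df.residual(fit)       # residual degrees of freedom

# Unbiased residual variance over unbiased original variance
r_sq_adj <- 1 - (rss / resid_df) / (tss / (n - 1))
```

For a linear model this coincides with summary(fit)$adj.r.squared. The ratio can exceed one when the predictors explain little, which is how a negative adjusted r-squared arises.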

Author(s)

Peter Puetz ppuetz@uni-goettingen.de

References

Marra, G. and S.N. Wood (2012). Coverage Properties of Confidence Intervals for Generalized Additive Model Components. Scandinavian Journal of Statistics 39(1):53-74.

Nychka, D. (1988). Bayesian Confidence Intervals for Smoothing Splines. Journal of the American Statistical Association 83:1134-1143.

Wood, S.N. (2013). On p-values for smooth components of an extended generalized additive model. Biometrika 100:221-228.

See Also

pam

Examples

# load pamfe (for pam and sfe) and plm (for pdata.frame)
library(pamfe)
library(plm)

# data generation: additive model with time-constant individual fixed effects
id <- rep(1:50, each = 10)
years <- rep(1:10, 50)
x1 <- runif(500)
x2 <- runif(500)
f1 <- sin(2 * pi * (x1 - 0.5)) ^ 2
f2 <- x2 * (1 - x2)
# scale both functions to unit standard deviation
f1_s <- f1 / sd(f1)
f2_s <- f2 / sd(f2)
# one fixed effect per individual, repeated over the 10 years
fe <- rep(sample(1:100, 50), each = 10)
y <- fe + f1_s + f2_s + rnorm(500, sd = 0.5)
data <- as.data.frame(cbind(id, years, y, x1, x2))

# transform the data set to a panel data set of type "pdata.frame" from package "plm"
pdata <- pdata.frame(data, index = c("id", "years"),
                     row.names = TRUE)

# run first-difference penalized spline panel data model with a generous number of knots
mod <- pam(y ~ sfe(x1, k = 40) + sfe(x2, k = 40), data = pdata)
summary(mod)

[Package pamfe version 0.2 Index]