pam {pamfe}    R Documentation

Fit an additive model for panel data with fixed effects

Description

Fits additive panel data models with fixed effects based on the gam function from package mgcv and the plm function from package plm. Nonparametric model components are represented by penalized B-splines with smoothing parameters selected by ML or REML. For more details see gam from package mgcv.

Usage

pam(formula, data = list(), weights = NULL, method = "REML", knots = NULL, 
optimizer = c("outer", "newton"), control = list(), sp = NULL, gls = TRUE,
 corMatrix = list(), ...)

Arguments

formula

A pam formula, which is similar to the formula for an lm except that nonparametric terms can be added to the right-hand side via sfe. Note that an intercept is never included.

data

A data frame of class pdata.frame.

weights

An optional vector of weights to be used in the fitting process. Should be NULL or a numeric vector. If non-NULL, the overall magnitude of the log likelihood is not changed, i.e. the weights are normalized (weights <- weights/mean(weights)).
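The normalization mentioned above can be illustrated with a minimal base R sketch. The weight values and the names w and w_norm are hypothetical and only serve to show that the rescaled weights average to one, so that they sum to the number of observations.

```r
# Hypothetical raw weights for four observations
w <- c(1, 2, 3, 6)

# Normalization as described above: divide by the mean weight
w_norm <- w / mean(w)

mean(w_norm)  # the normalized weights average to one
sum(w_norm)   # and therefore sum to the number of observations
```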

method

The smoothing parameter estimation method. "REML" for REML estimation, including estimation of an unknown scale parameter; "P-REML" for REML estimation, but using a Pearson estimate of the scale. "ML" and "P-ML" are similar, but use maximum likelihood in place of REML. "REML" is the default.

knots

This is an optional list containing user specified knot values to be used for basis construction. The user simply supplies the knots to be used, which must match up with the k value supplied (note that the number of knots is not always just k).

optimizer

An array specifying the numerical optimization method to use to optimize the smoothing parameter estimation criterion (given by method). "perf" for performance iteration. "outer" for the more stable direct approach. "outer" can use several alternative optimizers, specified in the second element of optimizer: "newton" (default), "bfgs", "optim", "nlm" and "nlm.fd" (the latter is based entirely on finite differenced derivatives and is very slow).

control

A list of fit control parameters to replace defaults returned by gam.control. Values not set assume default values.

sp

A vector of smoothing parameters can be provided here. Smoothing parameters must be supplied in the order that the smooth terms appear in the model formula. Negative elements indicate that the parameter should be estimated, and hence a mixture of fixed and estimated parameters is possible. If smooths share smoothing parameters then length(sp) must correspond to the number of underlying smoothing parameters.

gls

If this argument is TRUE (the default value), then serial error correlation inherent to the first-difference transformation to remove fixed effects is accounted for via a generalized least squares approach.
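To illustrate the idea behind this argument, here is a minimal base R sketch for a single individual. It is not the pamfe implementation; the names n_t, D, alpha, eps, Sigma and W are hypothetical, and the errors before differencing are assumed to be independent with equal variance. The sketch shows that first differences remove a time-constant effect, that the differenced errors have a tridiagonal covariance structure, and how a generalized least squares transformation can whiten them.

```r
set.seed(1)
n_t <- 6                  # number of time periods for one individual
D <- diff(diag(n_t))      # (n_t - 1) x n_t first-difference matrix

alpha <- 42               # individual fixed effect, constant over time
eps <- rnorm(n_t)         # independent errors before differencing
y <- alpha + eps

# Differencing removes the fixed effect: only differenced errors remain
dy <- as.vector(D %*% y)

# Covariance of the differenced errors, up to the error variance:
# 2 on the diagonal, -1 on the first off-diagonals, 0 elsewhere
Sigma <- D %*% t(D)

# GLS whitening: premultiply by the inverse of the lower Cholesky factor
L <- t(chol(Sigma))       # Sigma = L %*% t(L)
W <- solve(L)
round(W %*% Sigma %*% t(W), 10)  # identity matrix after whitening
```

With gls = FALSE this dependence would simply be ignored in the fit, according to the description above.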

corMatrix

This is an optional list containing matrices describing the within-individual variance and correlation structure for the errors of each individual. Such matrices can easily be generated with the help of corMatrix.corStruct. The matrices are then used via a generalized least squares approach to account for the respective error structure. For detecting specific error structures from residual checking, see pam.acf.
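As a rough sketch of how such a list of matrices might be generated, the following uses corAR1, Initialize and corMatrix from package nlme on a small hypothetical index data frame. The names idx, cs and mats and the AR(1) parameter 0.5 are assumptions for illustration. Whether pam expects matrices for the original or for the first-differenced observations is not stated here, so the dimensions below are illustrative only.

```r
library(nlme)

# Hypothetical panel index: 3 individuals observed over 4 periods
idx <- data.frame(id = factor(rep(1:3, each = 4)),
                  years = rep(1:4, times = 3))

# AR(1) correlation within each individual, with an assumed parameter of 0.5
cs <- corAR1(0.5, form = ~ years | id)
cs <- Initialize(cs, data = idx)

# With a grouping factor, corMatrix() returns one matrix per individual
mats <- corMatrix(cs)
length(mats)
mats[[1]]
```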

...

Further arguments to be passed on, e.g. to gam.fit from package mgcv, which is used for the fitting process.

Details

An additive panel data model with fixed effects can jointly include individual-specific time-constant effects, nonparametric effects and strictly parametric effects. The fixed effects are removed by taking first differences over time. The resulting dependence structure can be accounted for via a generalized least squares approach. Nonparametric effects are represented by penalized B-splines. The tradeoff between penalizing wiggliness and penalizing badness of fit is governed by the associated smoothing parameters, which are estimated by (restricted) maximum likelihood. For further information, see Puetz and Kneib (2016).
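The first-difference argument can be sketched as follows. The notation is assumed here for illustration and is not taken from the package: y_it is the response of individual i at time t, alpha_i the fixed effect, z_it the covariates with parametric effects, f_j the smooth functions, and the errors are assumed independent with common variance sigma^2.

```latex
\begin{align*}
y_{it} &= \alpha_i + \mathbf{z}_{it}^{\top}\boldsymbol{\beta}
          + \sum_{j} f_j(x_{itj}) + \varepsilon_{it}, \\
y_{it} - y_{i,t-1} &= (\mathbf{z}_{it} - \mathbf{z}_{i,t-1})^{\top}\boldsymbol{\beta}
          + \sum_{j} \bigl[ f_j(x_{itj}) - f_j(x_{i,t-1,j}) \bigr]
          + (\varepsilon_{it} - \varepsilon_{i,t-1}), \\
\operatorname{Var}(\varepsilon_{it} - \varepsilon_{i,t-1}) &= 2\sigma^2, \qquad
\operatorname{Cov}(\varepsilon_{it} - \varepsilon_{i,t-1},\,
                   \varepsilon_{i,t-1} - \varepsilon_{i,t-2}) = -\sigma^2 .
\end{align*}
```

The fixed effect alpha_i, and with it any overall intercept, cancels in the differenced equation. Differenced errors more than one period apart are uncorrelated under these assumptions, so the dependence is confined to neighbouring time points.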

Note that gam from package mgcv is more comprehensive (e.g. it allows for generalized additive models) and offers more options to specify. The major difference is that the mgcv package is designed for cross-sectional data and panel data models with random effects.

Details of the default underlying fitting methods are given in Wood (2011 and 2004). A concise introduction to generalized additive models and their implementation in R is given by Wood (2006).

Value

An object of class pam, similar to a gam object from package mgcv. A pam object has the following elements:

aic

AIC of the fitted model: bear in mind that the degrees of freedom used to calculate this are the effective degrees of freedom of the model, and the likelihood is evaluated at the maximum of the penalized likelihood in most cases, not at the MLE.

assign

Array whose elements indicate which model term (listed in pterms) each parameter relates to: applies only to non-smooth terms.

boundary

Did parameters end up at boundary of parameter space?

coefficients

The coefficients of the fitted model. Parametric coefficients are first, followed by coefficients for each spline term in turn.

control

The gam control list used in the fit.

converged

Indicates whether or not the iterative fitting method converged.

db.drho

Matrix of first derivatives of model coefficients w.r.t. log smoothing parameters.

df.null

Null degrees of freedom.

df.residual

Effective residual degrees of freedom of the model.

edf

Estimated degrees of freedom for each model parameter. Penalization means that many of these are less than 1.

edf1

Similar, but using alternative estimate of EDF. Useful for testing.

edf2

This edf accounts for smoothing parameter uncertainty. edf1 is a heuristic upper bound for edf2.

family

Family object specifying distribution (always gaussian) and link (always identity link) used.

fitted.values

The fitted values for the model. Note that the model is fitted on data transformed by first differences.

formula

The model formula.

gcv.ubre

The minimized smoothing parameter selection score: negative log marginal likelihood or negative log restricted likelihood.

gls

TRUE if serial error correlation inherent to the first-difference transformation to remove fixed effects was accounted for via a generalized least squares approach.

hat

Array of elements from the leading diagonal of the ‘hat’ (or ‘influence’) matrix. Same length as response data vector.

index_data

The individual dimension and the time dimension of the original panel data set.

index_diffdata

The individual dimension (the ids) of the first-differenced data set.

iter

How many iterations were required to find the smoothing parameters?

method

One of "REML", "P-REML", "ML", "P-ML", depending on the fitting criterion used.

model

Model frame containing all variables needed in original model fit.

n

Number of observations used for the fitting process, i.e. after the first-difference transformation.

nsdf

Number of parametric, non-smooth, model terms.

optimizer

optimizer argument to pam.

outer.info

If ‘outer’ iteration has been used to fit the model (see pam argument optimizer) then this is present and contains whatever was returned by the optimization routine used (currently nlm or optim).

prior.weights

Prior weights on observations.

pterms

terms object for strictly parametric part of model.

R

Factor R from QR decomposition of weighted model matrix, unpivoted to be in same column order as model matrix (so need not be upper triangular).

rank

Apparent rank of fitted model.

reml.scale

The (RE)ML scale parameter estimate.

residuals

The residuals for the fitted model. Note that the model is fitted on data transformed by first differences.

rV

If present, rV%*%t(rV)*sig2 gives the estimated Bayesian covariance matrix.

scale

When present, the scale (as sig2).

scale.estimated

TRUE if the scale parameter was estimated, FALSE otherwise.

sig2

Estimated or supplied variance/scale parameter.

smooth

List of smooth objects, containing the basis information for each term in the model formula in the order in which they appear.

sp

Estimated smoothing parameters for the model. These are the underlying smoothing parameters, subject to optimization.

terms

terms object of the model frame.

Vc

Under ML or REML smoothing parameter estimation it is possible to correct the covariance matrix Vp for smoothing parameter uncertainty. This is the corrected version.

Ve

Frequentist estimated covariance matrix for the parameter estimators. Particularly useful for testing whether terms are zero. Less useful for confidence intervals, as smooths are usually biased.

Vp

Estimated covariance matrix for the parameters. This is a Bayesian posterior covariance matrix that results from adopting a particular Bayesian model of the smoothing process.

weights

Final weights used in IRLS iteration.

y

Response data used in the fitting process, i.e. after the first-difference transformation.

Author(s)

Peter Puetz ppuetz@uni-goettingen.de

References

Puetz, P., Kneib, T. (2016). A Penalized Spline Estimator For Fixed Effects Panel Data Models. https://www.uni-goettingen.de/de/Puetz_03_2016/534166.html

Wood, S.N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society (B) 73(1):3-36.

Wood, S.N. (2004). Stable and efficient multiple smoothing parameter estimation for generalized additive models. J. Amer. Statist. Ass. 99:673-686.

Wood, S.N. (2006). Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC.

See Also

sfe, summary.pam

Examples

# data generation: additive model with time-constant individual fixed effects
library(pamfe)
library(plm)  # provides pdata.frame()
id <- rep(1:50,each = 10)
years <- rep(1:10,50)
x1 <- runif(500)
x2 <- runif(500)
f1 <- sin(2 * pi * (x1 - 0.5)) ^ 2
f2 <- x2 * (1 - x2)
f1_s <- f1 / sd(f1)
f2_s <- f2 / sd(f2)
fe <- rep(sample(1:100,50),each = 10)
y <- fe + f1_s + f2_s + rnorm(500,sd = 0.5)
data <- as.data.frame(cbind(id,years,y,x1,x2))

# convert the data set to a panel data set of class "pdata.frame" (package "plm")
pdata <- pdata.frame(data, index = c("id", "years"),
                     row.names = TRUE)

# run first-difference penalized spline panel data model with a generous number of knots
mod <- pam(y ~ sfe(x1, k = 40) + sfe(x2, k = 40), data = pdata)
summary(mod)

[Package pamfe version 0.2 Index]