stepAIC                 package:MASS                 R Documentation

_C_h_o_o_s_e _a _m_o_d_e_l _b_y _A_I_C _i_n _a _S_t_e_p_w_i_s_e _A_l_g_o_r_i_t_h_m

_D_e_s_c_r_i_p_t_i_o_n:

     Performs stepwise model selection by exact AIC.

_U_s_a_g_e:

     stepAIC(object, scope, scale, direction=c("both", "backward", "forward"), 
             trace=1, keep=NULL, steps=1000, use.start=FALSE, k=2, ...)
     extractAIC(fit, scale, k=2, ...)

_A_r_g_u_m_e_n_t_s:

object, fit: an object representing a model of an appropriate class.
          For `stepAIC' this is used as the initial model in the
          stepwise search. 

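   scope: defines the range of models examined in the stepwise search.
          This is either a single formula, taken as the upper model,
          or a list with components `upper' and `lower', both
          formulae.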

   scale: used in the definition of the AIC statistic for selecting the
          models, currently only for `lm', `aov' and `glm' models. 

direction: the mode of stepwise search; one of `"both"',
          `"backward"', or `"forward"', with a default of `"both"'.  If
          the `scope' argument is missing, the default for `direction'
          is `"backward"'. 

   trace: if positive, information is printed during the running of
          `stepAIC()'. Larger values may give more information on the
          fitting process. 

    keep: a filter function whose input is a fitted model object and
          the associated `AIC' statistic, and whose output is
          arbitrary.  Typically `keep' will select a subset of the
          components of the object and return them. The default is not
          to keep anything. 

   steps: the maximum number of steps to be considered.  The default is
          1000 (essentially as many as required).  It is typically used
          to stop the process early. 

use.start: if true, the updated fits are done starting at the linear
          predictor for the currently selected model. This may speed up
          the iterative calculations for `glm' (and other fits), but it
          can also slow them down. 

       k: the multiple of the number of degrees of freedom used for the
          penalty. Only `k=2' gives the genuine AIC: `k = log(n)' is
          sometimes referred to as BIC or SBC. 

     ...: any additional arguments to `extractAIC'. (None are currently
          used.) 
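
     As a hedged sketch of how `scope', `direction', `steps' and `k'
     combine, the call below runs a forward search from the
     intercept-only model with the BIC-style penalty `k = log(n)'. The
     names `quine.null', `quine.bic' and `n' are illustrative and not
     part of the original examples; the `quine' data come from the
     package.

```r
library(MASS)  # provides stepAIC and the quine data set

# Start from the intercept-only model so that a forward search has room to grow.
quine.null <- aov(log(Days + 2.5) ~ 1, data = quine)
n <- nrow(quine)

quine.bic <- stepAIC(quine.null,
    scope = list(upper = ~ Eth * Sex * Age * Lrn, lower = ~ 1),
    direction = "forward",
    k = log(n),   # BIC/SBC-style penalty instead of the default k = 2
    steps = 5,    # stop after at most five steps
    trace = FALSE)

quine.bic$anova   # initial model plus one row per step taken
```

     With `steps = 5' the `"anova"' component can hold at most six
     rows: the starting model and up to five additions.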

_D_e_t_a_i_l_s:

     `stepAIC' differs from `step' and especially `step.glm' in using
     the exact AIC rather than potentially misleading one-step
     approximations. It is also much more widely applicable: all that
     is required is a method for `extractAIC', which should return a
     vector `c(modeldf, AIC)'. The default method handles linear
     models (`lm', `aov' and `glm' of family `gaussian' with identity
     link) using `addterm.lm' and `dropterm.lm': for these the results
     are similar to `step.glm' except that the AIC quoted is Akaike's
     not Hastie's. (The additive constant is chosen so that in that
     case AIC is identical to Mallows' Cp if the scale is known.)
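
     The return value of `extractAIC' for a linear model can be
     checked by hand. The sketch below assumes the `lm' method found
     in current versions of R, which uses n * log(RSS/n) + k * edf
     when the scale is unknown and RSS/scale - n + k * edf when it is
     supplied. The `mtcars' data and all object names are illustrative
     choices, not part of the original text.

```r
# Illustrative fit on the built-in mtcars data (an assumption for this sketch).
fit <- lm(mpg ~ wt + hp, data = mtcars)

n   <- nrow(mtcars)
rss <- sum(residuals(fit)^2)
edf <- n - df.residual(fit)   # equivalent degrees of freedom of the fit

# Unknown scale: extractAIC returns c(edf, n * log(RSS / n) + k * edf).
aic.extract <- extractAIC(fit)
aic.manual  <- n * log(rss / n) + 2 * edf

# Supplied scale s: the second element becomes RSS / s - n + k * edf,
# which is the form of Mallows' Cp.
s <- summary(fit)$sigma^2     # treated as "known" purely for illustration
cp.extract <- extractAIC(fit, scale = s)
cp.manual  <- rss / s - n + 2 * edf
```

     Because `s' here is estimated from the same model, the Cp value
     reduces to `edf'; with a scale taken from a larger model it would
     not.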

     There is a potential problem in using `glm' fits with a variable
     `scale', as in that case the deviance is not simply related to the
     maximized log-likelihood. The function `extractAIC.glm' makes the
     appropriate adjustment for a `gaussian' family, but may need to be
     amended for other cases. (The `binomial' and `poisson' families
     have fixed `scale' by default and do not correspond to a
     particular maximum-likelihood problem for variable `scale'.)

     Where a conventional deviance exists (e.g. for `lm', `aov' and
     `glm' fits) this is quoted in the analysis of variance table: it
     is the unscaled deviance.

_V_a_l_u_e:

     The stepwise-selected model is returned, with up to two additional
     components.  There is an `"anova"' component corresponding to the
     steps taken in the search, as well as a `"keep"' component if the
     `keep=' argument was supplied in the call. The `"Resid. Dev"'
     column of the analysis of deviance table refers to a constant
     minus twice the maximized log likelihood: it will be a deviance
     only in cases where a saturated model is well-defined (thus
     excluding `lm', `aov' and `survreg' fits, for example).
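
     A minimal sketch of inspecting both extra components follows.
     The filter `keep.fun' and the object names are hypothetical; the
     filter simply records the AIC and residual degrees of freedom of
     each model visited.

```r
library(MASS)  # provides stepAIC and the quine data set

# Illustrative filter: called with each fitted model and its AIC.
keep.fun <- function(fit, aic) list(AIC = aic, df.resid = df.residual(fit))

quine.two <- aov(log(Days + 2.5) ~ .^2, data = quine)
quine.sel <- stepAIC(quine.two, keep = keep.fun, trace = FALSE)

quine.sel$anova     # one row per model on the search path
quine.sel$keep      # one column per model, one row per element kept
formula(quine.sel)  # the model finally selected
```

     No `scope' is given, so the search runs backward from
     `quine.two'.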

_S_e_e _A_l_s_o:

     `addterm', `dropterm', `step'

_E_x_a_m_p_l_e_s:

     data(quine)
     quine.hi <- aov(log(Days + 2.5) ~ .^4, quine)
     quine.nxt <- update(quine.hi, . ~ . - Eth:Sex:Age:Lrn)
     quine.stp <- stepAIC(quine.nxt, 
         scope = list(upper = ~Eth*Sex*Age*Lrn, lower = ~1), 
         trace = FALSE)
     quine.stp$anova

     data(cpus)
     cpus1 <- cpus
     # bin each numeric predictor (columns 2 to 7) at its quartiles
     for(v in names(cpus)[2:7]) 
       cpus1[[v]] <- cut(cpus[[v]], unique(quantile(cpus[[v]])), 
                         include.lowest = TRUE)
     cpus0 <- cpus1[, 2:8]  # excludes names, authors' predictions
     cpus.samp <- sample(1:209, 100)  # random subset of 100 machines
     cpus.lm <- lm(log10(perf) ~ ., data=cpus0[cpus.samp, ])
     cpus.lm2 <- stepAIC(cpus.lm, trace=FALSE)
     cpus.lm2$anova

     example(birthwt)  # creates the data frame `bwt' used below
     birthwt.glm <- glm(low ~ ., family=binomial, data=bwt)
     birthwt.step <- stepAIC(birthwt.glm, trace=FALSE)
     birthwt.step$anova
     birthwt.step2 <- stepAIC(birthwt.glm, ~ .^2 + I(scale(age)^2)
         + I(scale(lwt)^2), trace=FALSE)
     birthwt.step2$anova

     quine.nb <- glm.nb(Days ~ .^4, data=quine)
     quine.nb2 <- stepAIC(quine.nb)
     quine.nb2$anova

