lda                   package:MASS                   R Documentation

_L_i_n_e_a_r _D_i_s_c_r_i_m_i_n_a_n_t _A_n_a_l_y_s_i_s

_D_e_s_c_r_i_p_t_i_o_n:

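     Linear discriminant analysis: fits linear discriminant functions
     that separate the groups defined by a grouping factor, for use in
     classification and dimension reduction.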

_U_s_a_g_e:

     lda(formula, data, prior = proportions, tol = 1.0e-4, 
                        subset, na.action = na.fail,
                        method, CV = FALSE, nu)
     lda(x,   grouping, prior = proportions, tol = 1.0e-4, 
                        subset, na.action = na.fail,
                        method, CV = FALSE, nu)
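     The two calling forms can be compared directly. The sketch below
     assumes the built-in `iris' data frame, which this page does not
     otherwise use, and introduces hypothetical object names. Under that
     assumption, the formula and matrix interfaces should give the same
     discriminant functions and group means.

```r
library(MASS)

# Formula interface: the response is the grouping factor and the
# right-hand side lists the (non-factor) discriminators
fit_formula <- lda(Species ~ ., data = iris)

# Matrix interface: explanatory variables plus a separate grouping factor
fit_matrix <- lda(iris[, 1:4], grouping = iris$Species)

# Both fits should agree on the discriminant functions and group means
all.equal(fit_formula$scaling, fit_matrix$scaling)
all.equal(fit_formula$means, fit_matrix$means)
```

     Only the formula form can later be modified with `update()'.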

_A_r_g_u_m_e_n_t_s:

 formula: A formula of the form `groups ~ x1 + x2 + ...'. That is, the
          response is the grouping factor and the right hand side
          specifies the (non-factor) discriminators.

    data: An optional data frame from which variables specified in
          `formula' are preferentially to be taken.

       x: (Required if no formula is given as the principal argument.)
          A matrix, data frame or Matrix containing the explanatory
          variables.

grouping: (Required if no formula is given as the principal argument.)
          A factor specifying the class for each observation.

   prior: the prior probabilities of class membership.  If unspecified,
          the class proportions for the training set are used.  If
          present, the probabilities should be specified in the order
          of the factor levels.  

     tol: A tolerance to decide if a matrix is singular; it will reject
          variables and linear combinations of unit-variance variables
          whose variance is less than `tol^2'. 

  subset: An index vector specifying the cases to be used in the
          training sample.  (NOTE: If given, this argument must be
          named.) 

na.action: A function to specify the action to be taken if `NA's are
          found. The default action is for the procedure to fail.  An
          alternative is `na.omit', which leads to rejection of cases
          with missing values on any required variable.  (NOTE: If
          given, this argument must be named.) 

  method: `"moment"' for standard estimators of the mean and variance,
          `"mle"' for MLEs, `"mve"' to use `cov.mve', or `"t"' for
          robust estimates based on a t distribution.

      CV: If `TRUE', returns results (classes and posterior
          probabilities) for leave-one-out cross-validation. Note that
          if the prior is estimated, the proportions in the whole
          dataset are used.

      nu: degrees of freedom for `method = "t"'. 
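     As a rough illustration of `CV = TRUE', the sketch below assumes the
     built-in `iris' data frame and hypothetical object names. It
     tabulates the leave-one-out classes against the observed ones and
     computes a misclassification rate. No particular error rate is
     claimed here.

```r
library(MASS)

# With CV = TRUE, lda() returns leave-one-out results instead of a model
cv <- lda(Species ~ ., data = iris, CV = TRUE)

# cv$class: predicted class for each case, fitted without that case
# cv$posterior: posterior probabilities, one row per case
confusion <- table(observed = iris$Species, predicted = cv$class)
print(confusion)

# Leave-one-out misclassification rate
loo_error <- mean(cv$class != iris$Species)
print(loo_error)
```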

_D_e_t_a_i_l_s:

     The function tries hard to detect if the within-class covariance
     matrix is singular. If any variable has within-group variance less
     than `tol^2', it will stop and report the variable as constant.
     This could result from poor scaling of the problem, but is more
     likely to result from constant variables.
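     The singularity check can be triggered on purpose. The sketch below
     assumes the built-in `iris' data frame and appends a hypothetical
     all-ones column named `Const'. Because that column's within-group
     variance is zero, the fit is expected to stop with an error. Only
     the presence of the word "constant" in the message is relied on;
     the exact wording is not.

```r
library(MASS)

# Append a variable that is constant, so its within-group variance is zero
x <- cbind(as.matrix(iris[, 1:4]), Const = 1)

# lda() should stop and report the offending variable as constant
msg <- tryCatch({
  lda(x, grouping = iris$Species)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```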

     Specifying the `prior' will affect the classification unless it is
     overridden in `predict.lda'. Unlike in most statistical packages,
     it will also affect the rotation of the linear discriminants
     within their space, as a weighted between-groups covariance matrix
     is used. Thus the first few linear discriminants emphasize the
     differences between groups with the weights given by the prior,
     which may differ from their prevalence in the dataset.
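     The two effects of `prior' can be checked side by side. The sketch
     below assumes the built-in `iris' data frame, an arbitrary unequal
     prior of c(0.6, 0.2, 0.2) given in the order of
     `levels(iris$Species)', and hypothetical object names. It is meant
     to show two things. First, the singular values (and with them the
     discriminant directions) change with the prior. Second, the stored
     prior can be replaced at prediction time through the `prior'
     argument of `predict.lda'.

```r
library(MASS)

# Default prior: class proportions in the training set
fit_default <- lda(Species ~ ., data = iris)
# Illustrative unequal prior, in the order of the factor levels
fit_skewed <- lda(Species ~ ., data = iris, prior = c(0.6, 0.2, 0.2))

fit_default$prior
fit_skewed$prior

# The prior weights the between-groups covariance, so the singular values
# and the rotation of the discriminants differ between the two fits
fit_default$svd
fit_skewed$svd

# predict.lda() can override the stored prior when classifying
post <- predict(fit_default, newdata = iris,
                prior = c(0.6, 0.2, 0.2))$posterior
head(post)
```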

_V_a_l_u_e:

     An object of class `"lda"' containing the following components
     (when `CV = TRUE', a list containing `class' and `posterior' is
     returned instead):

   prior: the prior probabilities used. 

   means: the group means. 

 scaling: a matrix which transforms observations to discriminant
          functions, normalized so that the within-groups covariance
          matrix is spherical.

     svd: the singular values, which give the ratio of the between- and
          within-group standard deviations on the linear discriminant
          variables.  Their squares are the canonical F-statistics. 

       N: The number of observations used. 

    call: The (matched) function call. 

   class: The MAP classification (a factor); returned only when
          `CV = TRUE'.

posterior: posterior probabilities for the classes; returned only
          when `CV = TRUE'.
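     The `scaling' and `svd' components can be checked numerically. The
     sketch below assumes the built-in `iris' data frame, the default
     `method = "moment"', and hypothetical object names. It projects the
     data with `scaling' and forms the pooled within-group covariance of
     the scores, using divisor n - g for n observations and g groups.
     That covariance should be the identity matrix up to rounding. The
     squared singular values are then turned into the share of
     between-group separation carried by each discriminant, which
     `print.lda' reports as the "proportion of trace".

```r
library(MASS)

fit <- lda(Species ~ ., data = iris)
X <- as.matrix(iris[, 1:4])

# Project the observations onto the linear discriminants
ld <- X %*% fit$scaling

# Subtract each group's mean from its discriminant scores
ld_within <- ld - apply(ld, 2, function(col) ave(col, iris$Species))

# Pooled within-group covariance with divisor n - g
n <- nrow(X)
g <- nlevels(iris$Species)
W <- crossprod(ld_within) / (n - g)
print(round(W, 10))   # expected to be the identity matrix

# Squared singular values and each discriminant's share of their sum
f_stats <- fit$svd^2
prop_trace <- f_stats / sum(f_stats)
print(prop_trace)
```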

_N_o_t_e:

     This function may be called giving either a formula and optional
     data frame, or a matrix and grouping factor as the first two
     arguments.  All other arguments are optional, but `subset=' and
     `na.action=', if required, must be fully named.

     If a formula is given as the principal argument the object may be
     modified using `update()' in the usual way.
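     The sketch below illustrates named `subset=' and `na.action='
     arguments together with `update()'. It assumes the built-in `iris'
     data frame with one missing value inserted. The copy `iris_na' and
     the other object names are hypothetical. The odd-numbered rows are
     selected and `na.omit' drops the single incomplete row among them,
     so 74 of the 75 selected cases should be used. The refit should
     reuse the same subset and missing-value handling while dropping
     `Petal.Width'.

```r
library(MASS)

# Copy iris and introduce one missing value
iris_na <- iris
iris_na$Sepal.Length[1] <- NA

# subset= and na.action= must be named; na.omit drops the incomplete row
fit <- lda(Species ~ ., data = iris_na, subset = seq(1, 150, by = 2),
           na.action = na.omit)
fit$N   # number of observations actually used

# A formula fit can be modified with update(), here dropping Petal.Width
fit2 <- update(fit, . ~ . - Petal.Width)
rownames(fit2$scaling)
```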

_S_e_e _A_l_s_o:

     `predict.lda', `qda', `predict.qda'

_E_x_a_m_p_l_e_s:

     library(MASS)
     data(iris3)
     Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]), 
                        Sp = rep(c("s","c","v"), rep(50,3)))
     train <- sample(1:150, 75)
     table(Iris$Sp[train])
     ## your answer may differ
     ##  c  s  v 
     ## 22 23 30
     z <- lda(Sp ~ ., Iris, prior = c(1,1,1)/3, subset = train)
     predict(z, Iris[-train, ])$class
     ##  [1] s s s s s s s s s s s s s s s s s s s s s s s s s s s c c c
     ## [31] c c c c c c c v c c c c v c c c c c c c c c c c c v v v v v
     ## [61] v v v v v v v v v v v v v v v
     z1 <- update(z, . ~ . - Petal.W.)
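     The held-out predictions above can be summarized as a confusion
     matrix. The sketch below repeats the example's setup, with two
     additions that are not in the original example. It converts `Sp' to
     a factor explicitly, and it calls `set.seed(1)', an arbitrary choice
     made only so the split is reproducible. The names `pred',
     `confusion' and `test_error' are hypothetical. The resulting counts
     depend on the split, so none are shown.

```r
library(MASS)

# Rebuild the Iris data frame as in the example above
Iris <- data.frame(rbind(iris3[, , 1], iris3[, , 2], iris3[, , 3]),
                   Sp = factor(rep(c("s", "c", "v"), rep(50, 3))))

set.seed(1)   # arbitrary seed, only to make the split reproducible
train <- sample(1:150, 75)

z <- lda(Sp ~ ., Iris, prior = c(1, 1, 1)/3, subset = train)
pred <- predict(z, Iris[-train, ])$class

# Cross-tabulate true versus predicted species on the held-out half
confusion <- table(true = Iris$Sp[-train], predicted = pred)
print(confusion)

# Held-out misclassification rate
test_error <- mean(pred != Iris$Sp[-train])
print(test_error)
```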

