What is BayesX - General scope

BayesX is a software tool for estimating structured additive regression models. Structured additive regression embraces several well-known regression models such as generalized additive models (GAM), generalized additive mixed models (GAMM), generalized geoadditive mixed models (GGAMM), dynamic models, varying coefficient models, and geographically weighted regression within a unifying framework. Besides exponential family regression, BayesX also supports non-standard regression situations such as regression for categorical responses, hazard regression for continuous survival times, continuous time multi-state models, quantile regression, distributional regression models and multilevel models.

Inferential procedures

Estimation of regression models can be achieved based on four different inferential procedures (implemented in different regression objects):

  • MCMC simulation techniques (bayesreg objects).
    A fully Bayesian interpretation of structured additive regression models is obtained by specifying prior distributions for all unknown parameters. Estimation is then carried out using Markov chain Monte Carlo simulation techniques. Bayesreg objects provide numerically efficient implementations of MCMC schemes for structured additive regression models with exponential family responses, categorical responses, hazard regression and multi-state models.
  • MCMC simulation techniques (mcmcreg objects).
    Mcmcreg objects provide similar functionality for fully Bayesian inference as bayesreg objects but implement structured additive regression models for responses beyond simple exponential families (distributional regression), quantile regression and multilevel models.
  • Mixed model based estimation (remlreg objects).
    Taking advantage of the close connection between penalized likelihood estimation and mixed models, the smoothing parameters of the penalties in structured additive regression can be interpreted as variance components of the random effects. Remlreg objects therefore employ mixed model methodology for the estimation of structured additive regression models. From a Bayesian perspective, this yields empirical Bayes / posterior mode estimates for the structured additive regression models. However, the estimates can also be interpreted simply as penalized likelihood estimates from a frequentist perspective.
  • Penalized least squares including model selection (stepwisereg objects).
    As a fourth alternative, BayesX provides a penalized likelihood approach for estimating structured additive regression models including model selection. The algorithms are able to

    • decide whether particular effect types enter the model,
    • decide whether a continuous covariate enters the model linearly or nonlinearly,
    • select complex interaction effects (two dimensional surfaces, varying coefficient terms),
    • select the degree of smoothness of nonlinear covariate, spatial or cluster specific heterogeneity effects.

    Different models are compared via various goodness-of-fit criteria, e.g. AIC, BIC, GCV and 5- or 10-fold cross-validation.
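
The criteria named above trade goodness of fit against model complexity. The following is a minimal Python sketch using the textbook definitions of AIC and BIC. It is not BayesX code and does not claim to reproduce the exact definitions used by stepwisereg objects. The function names, the candidate models, their log-likelihoods and their degrees of freedom are hypothetical values chosen for illustration.

```python
import math


def aic(loglik, df):
    """Akaike information criterion: -2 * loglik + 2 * df."""
    return -2.0 * loglik + 2.0 * df


def bic(loglik, df, n):
    """Bayesian information criterion: -2 * loglik + log(n) * df."""
    return -2.0 * loglik + math.log(n) * df


# Hypothetical candidates: (label, log-likelihood, degrees of freedom).
# These numbers are made up for illustration; they are not BayesX output.
candidates = [
    ("linear effect", -520.0, 3.0),
    ("nonlinear effect", -508.0, 7.5),
    ("nonlinear + spatial effect", -506.0, 21.0),
]
n_obs = 400  # hypothetical sample size


def best_by(criterion):
    """Return the label of the candidate with the smallest criterion value."""
    return min(candidates, key=criterion)[0]


best_aic = best_by(lambda m: aic(m[1], m[2]))
best_bic = best_by(lambda m: bic(m[1], m[2], n_obs))

print("selected by AIC:", best_aic)
print("selected by BIC:", best_bic)
```

With these made-up numbers the two criteria disagree: BIC charges log(n) per degree of freedom instead of 2, so it favours the simpler model once n exceeds about 7.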

Model classes and model terms

BayesX provides functionality for the following types of responses:

  • Univariate exponential family
    Supported response distributions are the Gaussian, Poisson, binomial and gamma distributions as well as some simple versions of the negative binomial, zero-inflated Poisson, and zero-inflated negative binomial.
  • Distributional regression
    A large number of univariate and multivariate continuous, discrete or mixed discrete-continuous responses can be treated within the framework of distributional regression. In this setting, potentially all parameters of these distributions can be related to structured additive predictors.
  • Quantile regression
    Bayesian quantile regression makes it possible to study specific quantiles of the response distribution without relying on a specific distributional assumption.
  • Categorical responses with unordered categories
    For categorical responses with unordered categories, BayesX supports multinomial logit and multinomial probit models. Effects of both category-specific and globally defined covariates can be estimated. Category-specific offsets or non-availability indicators can be defined to account for varying availability and varying choice sets.
  • Categorical responses with ordered categories
    For ordered categorical responses, ordinal as well as sequential models can be specified. Effects can be requested to be category-specific or to be constant over the categories. Supported response functions include the logit and the probit transformation.
  • Continuous time survival models
    BayesX supports Cox-type hazard regression models with structured additive predictor for continuous time survival analysis. In contrast to the Cox model, the baseline hazard rate is estimated jointly with the remaining effects based on penalized splines. Furthermore, both time-varying effects and time-varying covariates can be included in the predictor. Arbitrary combinations of right-, left- and interval-censored as well as left-truncated observations can be analysed.
  • Continuous time multi-state models
    Multi-state models form a general class for the analysis of the evolution of discrete phenomena in continuous time. Transition intensities between the discrete states are specified in analogy to the hazard rate in continuous time survival models.
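
To make the quantile regression entry above more concrete, the following minimal Python sketch shows the standard check (pinball) loss that underlies quantile regression in general. It is not BayesX code and does not show how BayesX implements the Bayesian version. The function names and the data are hypothetical. The sketch fits only a constant, with no covariates, and searches over the observed values, which is sufficient because a minimizer of the check loss can always be found among the data points.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    indicator = 1.0 if residual < 0 else 0.0
    return residual * (tau - indicator)


def total_loss(q, ys, tau):
    """Sum of check losses when the constant q is used as the tau-quantile."""
    return sum(check_loss(y - q, tau) for y in ys)


def best_constant(ys, tau):
    """Observed value that minimizes the total check loss for level tau."""
    return min(ys, key=lambda q: total_loss(q, ys, tau))


# Hypothetical data with one large outlier.
ys = [1.0, 2.0, 3.0, 4.0, 100.0]

median = best_constant(ys, 0.5)  # tau = 0.5 targets the median
upper = best_constant(ys, 0.9)   # tau = 0.9 targets the 90% quantile

print("tau = 0.5:", median)
print("tau = 0.9:", upper)
```

No distribution is assumed for the data here: only the asymmetric weighting of positive and negative residuals determines which quantile is targeted.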

Structured additive regression models can be built from arbitrary combinations of the following model terms:

  • Nonlinear effects:
    Nonlinear effects can be estimated based on either penalized spline or random walk models.
  • Seasonal effects:
    Specific autoregressive priors allow for the estimation of flexible, time-varying seasonal effects.
  • Spatial effects:
    Spatial effects can be specified based on Markov random fields, stationary Gaussian random fields (kriging) or bivariate penalized splines. Both georeferenced regional data as well as point-referenced data based on coordinates are supported.
  • Interaction surfaces:
    Bivariate extensions of penalized splines allow flexible interactions between continuous covariates to be estimated. Stationary Gaussian random fields can also be considered a radial basis function approach and, hence, form a second possibility for the specification of interaction surfaces.
  • Varying coefficients:
    Varying coefficient models with both continuous and spatial effect modifiers can be estimated. The latter case is also known as geographically weighted regression.
  • Cluster-specific random effects:
    BayesX supports i.i.d. Gaussian random intercepts and random slopes.
  • Regularized high-dimensional effects:
    High-dimensional vectors of regression coefficients can be assigned Bayesian regularization priors. Available alternatives are ridge regression, lasso regularization and normal mixture of inverse gamma (spike and slab) priors.
  • Multilevel models:
    In multilevel models, parameters of specific effects can themselves be assigned a structured additive predictor (e.g. in multilevel random effects specifications).
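
As an illustration of the random walk models listed under nonlinear effects, the following minimal Python sketch smooths a sequence with a first-order random walk (first-difference) penalty. It is a penalized least squares analogue for a Gaussian response with a fixed smoothing parameter, not BayesX code and not the algorithm BayesX uses. The function names, the data and the values of lam are hypothetical. The sketch minimizes sum (y_i - f_i)^2 + lam * sum (f_i - f_{i-1})^2 by solving the tridiagonal system (I + lam * K) f = y with the Thomas algorithm, where K is the first-difference penalty matrix.

```python
def rw1_smooth(y, lam):
    """Smooth y with a first-order random walk penalty of strength lam >= 0."""
    n = len(y)
    if n < 2:
        return list(y)
    # Main diagonal of A = I + lam * K; K has diagonal (1, 2, ..., 2, 1).
    diag = [1.0 + lam * (1.0 if i in (0, n - 1) else 2.0) for i in range(n)]
    off = -lam  # constant sub- and super-diagonal of A
    # Forward elimination (Thomas algorithm).
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = off / diag[0]
    d[0] = y[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (y[i] - off * d[i - 1]) / denom
    # Back substitution.
    f = [0.0] * n
    f[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        f[i] = d[i] - c[i] * f[i + 1]
    return f


def roughness(f):
    """Sum of squared first differences, i.e. the penalty without lam."""
    return sum((b - a) ** 2 for a, b in zip(f, f[1:]))


# Hypothetical noisy observations on an equally spaced grid.
y = [0.0, 2.0, 1.0, 3.0, 2.0, 4.0]

smooth = rw1_smooth(y, 5.0)  # moderate smoothing
flat = rw1_smooth(y, 1e6)    # very strong smoothing

print("moderate smoothing:", [round(v, 3) for v in smooth])
print("strong smoothing:", [round(v, 3) for v in flat])
```

With lam = 0 the data are reproduced exactly, and as lam grows the fit shrinks towards the overall mean of y. In the mixed model view, lam corresponds to the ratio of the error variance to the random walk variance.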

Note that parts of the functionality may be available for only one of the regression objects. For example, bayesreg objects do not support interval-censored survival times, while multinomial probit models cannot be estimated with remlreg objects. Details can be found in the reference manual.

R Packages

In addition to BayesX, there is an R add-on package of the same name that provides additional functionality for manipulating geographical data and additional graphics facilities. In particular, the package includes resources for constructing boundary and graph files from other geographical information systems. The R package BayesXsrc allows BayesX to be installed via R's package management, while R2BayesX and BayesR provide convenient access from within R in the usual R formula style.