P3-1: Multilevel Generalized Additive Models for Location, Scale, and Shape

PhD Student: Manuel Carlan
Supervisor: Prof. Dr. Thomas Kneib
Group: Statistics and Econometrics

Hierarchical data structures represent one of the most important cases of scaling problems, in which information has to be integrated across different hierarchical levels. Generalized linear mixed models (GLMMs) are a convenient tool for the statistical analysis of such data structures: covariate effects can be specified on different levels of the hierarchy, and random effects are incorporated to account for unobserved heterogeneity as well as within-group correlation. To achieve additional flexibility with respect to the included covariate effects, generalized additive mixed models (GAMMs) replace the linear predictor of GLMMs with an additive predictor. This allows for nonlinear effects as well as other types of covariate effects, such as spatial effects, interaction surfaces, or varying coefficients. In a further extension, the nonlinear functional effects can themselves be treated as random effects, resulting in cluster-specific nonlinear curves. Inference can be based either on Bayesian principles, using Markov chain Monte Carlo simulation techniques, or on (restricted) maximum likelihood estimation.
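To make the additive predictor of a GAMM concrete, the following minimal sketch simulates clustered Gaussian data from the hypothetical model y_ij = beta0 + f(x_ij) + b_i + eps_ij and fits it by penalized least squares. The nonlinear effect f uses a truncated linear spline basis whose knot coefficients carry a ridge penalty, which corresponds to the usual mixed-model representation of penalized splines; the random intercepts b_i are penalized in the same way. All function names, the basis, the knot placement, and the smoothing parameters lam_f and lam_b are assumptions for illustration; in particular, the smoothing parameters are fixed by hand rather than estimated by REML or MCMC.

```python
import numpy as np


def truncated_linear_basis(x, knots):
    """Hypothetical helper: columns x and (x - k)_+ for each knot k."""
    return np.column_stack([x] + [np.maximum(x - k, 0.0) for k in knots])


def fit_gamm_gaussian(y, x, cluster, n_clusters, knots, lam_f=1.0, lam_b=1.0):
    """Penalized least squares for y = beta0 + f(x) + b_cluster + eps.

    Sketch only: lam_f and lam_b play the role of variance ratios
    sigma_eps^2 / tau^2 in the mixed-model view and are fixed, not estimated.
    """
    n = len(y)
    B = truncated_linear_basis(x, knots)        # spline part of f
    Z = np.zeros((n, n_clusters))
    Z[np.arange(n), cluster] = 1.0               # random-intercept design
    C = np.column_stack([np.ones(n), B, Z])      # full design matrix
    # Penalize knot coefficients and random intercepts;
    # the global intercept and the linear slope stay unpenalized.
    p = np.zeros(C.shape[1])
    p[2:2 + len(knots)] = lam_f
    p[2 + len(knots):] = lam_b
    return np.linalg.solve(C.T @ C + np.diag(p), C.T @ y)


def f_true(t):
    """Assumed nonlinear effect used only for the simulation."""
    return np.sin(2 * np.pi * t)


# Simulated example: 30 clusters with 20 observations each.
rng = np.random.default_rng(1)
n_clusters, per_cluster = 30, 20
cluster = np.repeat(np.arange(n_clusters), per_cluster)
x = rng.uniform(0, 1, cluster.size)
b_true = rng.normal(0, 0.5, n_clusters)
y = 1.0 + f_true(x) + b_true[cluster] + rng.normal(0, 0.3, cluster.size)

knots = np.linspace(0.05, 0.95, 15)
coef = fit_gamm_gaussian(y, x, cluster, n_clusters, knots)

# Evaluate the estimated smooth on a grid (identified up to a constant).
grid = np.linspace(0, 1, 101)
f_hat = truncated_linear_basis(grid, knots) @ coef[1:2 + len(knots)]
b_hat = coef[2 + len(knots):]
r_f = np.corrcoef(f_hat, f_true(grid))[0, 1]
r_b = np.corrcoef(b_hat, b_true)[0, 1]
```

Because f and the random intercepts are only identified up to additive constants, the comparison uses correlations rather than raw values. Cluster-specific curves, as mentioned above, would add one penalized spline per cluster in the same way.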

GLMMs and GAMMs are rather flexible in terms of the predictor structure and the types of covariate effects included. However, they always retain the basic assumption of generalized linear models for the response distribution. More specifically, the response distribution is assumed to belong to the class of univariate exponential families (with the normal, binomial, Poisson, and gamma distributions as the most important special cases). Furthermore, the predictor is related to the expectation of the response via a monotonic link function. As a consequence, covariate effects act only on the mean, and it is not possible to capture effects on higher-order moments, such as the variance or skewness. Generalized additive models for location, scale, and shape (GAMLSS) provide a general framework to overcome this restriction: essentially any parametric distribution can be assumed for the response, and separate regression predictors can be related to all parameters of this distribution. The best-known special case is heteroscedastic normal regression, with effects on both the mean and the standard deviation (or variance). However, GAMLSS are much more general: they include regression models for generalized gamma or beta distributions as well as zero-inflated count data regression, and they can be extended to new distributions if required. Standard GAMLSS allow for additive predictor structures but cannot handle multilevel random effects structures.
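The heteroscedastic normal special case can be sketched as follows, assuming y_i ~ N(mu_i, sigma_i^2) with mu_i = X_mu beta and log(sigma_i) = X_sigma gamma, both linear in a single covariate x. The fitting scheme, which alternates weighted least squares for the location with Fisher-scoring updates for the log standard deviation, is a minimal choice made here for illustration and is not a reproduction of any particular GAMLSS software; the function name fit_hetero_normal is hypothetical.

```python
import numpy as np


def fit_hetero_normal(y, X_mu, X_sigma, n_iter=100):
    """Hypothetical sketch: y_i ~ N(mu_i, sigma_i^2),
    mu = X_mu @ beta, log(sigma) = X_sigma @ gamma."""
    beta = np.linalg.lstsq(X_mu, y, rcond=None)[0]  # OLS start for the mean
    gamma = np.zeros(X_sigma.shape[1])              # start with sigma = 1
    for _ in range(n_iter):
        sigma2 = np.exp(2.0 * X_sigma @ gamma)
        # Location step: weighted least squares with weights 1 / sigma^2.
        w = 1.0 / sigma2
        beta = np.linalg.solve(X_mu.T @ (w[:, None] * X_mu), X_mu.T @ (w * y))
        # Scale step: one Fisher-scoring update for the log-sigma predictor.
        # The score w.r.t. eta = log(sigma) is r^2 / sigma^2 - 1 and the
        # expected information per observation is 2.
        r2 = (y - X_mu @ beta) ** 2
        score = X_sigma.T @ (r2 / sigma2 - 1.0)
        info = 2.0 * X_sigma.T @ X_sigma
        gamma = gamma + np.linalg.solve(info, score)
    return beta, gamma


# Simulated example: both the mean and the standard deviation depend on x.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0, 1, n)
X = np.column_stack([np.ones(n), x])
sigma_true = np.exp(-1.0 + 1.5 * x)
y = 1.0 + 2.0 * x + sigma_true * rng.normal(size=n)

beta_hat, gamma_hat = fit_hetero_normal(y, X, X)
```

Using a log link for sigma keeps the standard deviation positive without constraints. In a full GAMLSS, both linear predictors would be replaced by additive ones, and further distribution parameters (for example, for skewness) would receive their own predictors.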

In this project, we will develop multilevel GAMLSS and appropriate methods for statistical inference, following an approach recently suggested for mean regression. We will combine the flexibility of GAMLSS with respect to the choice of the response distribution with the flexibility of GAMMs concerning the inclusion of hierarchical random effects. In fact, such a multilevel regression specification goes well beyond common mixed models: multilevel specifications are allowed not only for the random effects but for any type of regression effect contained in the model (most importantly spatial effects, but also cluster-specific or temporal effects). This will allow us to explain temporal, spatial, or hierarchical scaling effects in terms of appropriate covariates. Inference in multilevel GAMLSS is rather challenging, and we will rely on Markov chain Monte Carlo simulation to make inference tractable even for complex model specifications. The project will therefore build considerably on results obtained during the first funding period. The development and, in particular, the implementation of multilevel GAMLSS will be conducted in collaboration with Stefan Lang (University of Innsbruck), who has long-standing experience in Bayesian semiparametric regression and in statistical computing based on Markov chain Monte Carlo simulation.
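The multilevel idea can be illustrated with a toy two-level model in which the scale parameter, and not only the mean, carries cluster-specific effects that are in turn explained by a cluster-level covariate z_i: y_ij ~ N(beta0 + b_i, sigma_i^2), b_i ~ N(0, tau_b^2), and log(sigma_i) ~ N(delta0 + delta1 z_i, tau_s^2). The sketch below draws from the posterior with a Metropolis-within-Gibbs sampler (Gibbs steps for beta0, b_i and delta; random-walk Metropolis for log(sigma_i)). It is a hypothetical illustration, not the project's method: flat priors on beta0 and delta, fixed variances tau_b and tau_s, the proposal step size, and all function names are assumptions made to keep the example short.

```python
import numpy as np


def log_post_eta(eta, resid_ss, m, prior_mean, tau_s):
    """Log full conditional of eta_i = log(sigma_i), up to a constant."""
    loglik = -m * eta - resid_ss / (2.0 * np.exp(2.0 * eta))
    logprior = -0.5 * ((eta - prior_mean) / tau_s) ** 2
    return loglik + logprior


def sample_multilevel_scale(y, z, n_iter=3000, tau_b=1.0, tau_s=0.3,
                            step=0.3, seed=0):
    """Hypothetical Metropolis-within-Gibbs sampler for a toy two-level
    heteroscedastic normal model. y: (K, m) array; z: (K,) covariate."""
    rng = np.random.default_rng(seed)
    K, m = y.shape
    Z = np.column_stack([np.ones(K), z])        # second-level design
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    beta0, b = y.mean(), np.zeros(K)
    eta = np.log(y.std(axis=1) + 1e-8)          # crude start for log(sigma_i)
    delta = ZtZ_inv @ Z.T @ eta
    draws = []
    for _ in range(n_iter):
        w = np.exp(-2.0 * eta)                  # cluster precisions 1 / sigma_i^2
        # Gibbs: beta0 | rest (flat prior), precision-weighted mean.
        prec = m * w.sum()
        mean = (w * (y - b[:, None]).sum(axis=1)).sum() / prec
        beta0 = rng.normal(mean, 1.0 / np.sqrt(prec))
        # Gibbs: b_i | rest, conjugate normal update per cluster.
        prec_b = m * w + 1.0 / tau_b**2
        mean_b = w * (y - beta0).sum(axis=1) / prec_b
        b = rng.normal(mean_b, 1.0 / np.sqrt(prec_b))
        # Metropolis: eta_i | rest. The full conditional factorizes over
        # clusters, so each cluster gets its own accept/reject decision.
        resid_ss = ((y - beta0 - b[:, None]) ** 2).sum(axis=1)
        prior_mean = Z @ delta
        prop = eta + step * rng.normal(size=K)
        log_ratio = (log_post_eta(prop, resid_ss, m, prior_mean, tau_s)
                     - log_post_eta(eta, resid_ss, m, prior_mean, tau_s))
        accept = np.log(rng.uniform(size=K)) < log_ratio
        eta = np.where(accept, prop, eta)
        # Gibbs: delta | eta, a second-level regression with known variance.
        delta_hat = ZtZ_inv @ Z.T @ eta
        delta = rng.multivariate_normal(delta_hat, tau_s**2 * ZtZ_inv)
        draws.append(delta.copy())
    return np.array(draws)


# Simulated example: 200 clusters, 30 observations each.
rng = np.random.default_rng(42)
K, m = 200, 30
z = rng.normal(size=K)
eta_true = -0.5 + 0.4 * z + rng.normal(0, 0.3, K)
b_true = rng.normal(0, 1.0, K)
y = 2.0 + b_true[:, None] + np.exp(eta_true)[:, None] * rng.normal(size=(K, m))

draws = sample_multilevel_scale(y, z)
delta_post_mean = draws[1000:].mean(axis=0)     # discard burn-in
```

Here the covariate z explains part of the between-cluster variation in the scale, which is the kind of hierarchical scaling effect described above. A realistic multilevel GAMLSS would additionally estimate tau_b and tau_s and replace the second-level linear predictor by additive, spatial, or temporal effects.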