Topics for Bachelor Theses, Master Theses and Lab Rotations in Statistics

This page lists different topics that can be turned into bachelor theses, master theses and lab rotations for students in applied statistics, data science, economics, etc., depending on individual qualifications. If you are interested, get in contact with the responsible person listed for the topic.

  • Title: Expected to Benefit Sets in Distributional Regression
    Short description: In many situations, the effect of a treatment is not homogeneous for the complete population of subjects under study, but rather varies heterogeneously across subjects. While often the goal of statistical investigations is to estimate the corresponding heterogeneity of treatment effects, one may also be interested in inverting the relationship to identify those subjects who will benefit the most from a treatment, including appropriate quantification of uncertainty. This leads to so-called "expected to benefit" sets of observations. The goal of this master thesis is to implement and evaluate Bayesian approaches for the identification of expected to benefit sets for distributional regression models.
    Contact: Thomas Kneib (tkneib@uni-goettingen.de)
  • Title: Conformal Prediction in Distributional Regression
    Short description: Conformal prediction provides data analysts with a model-agnostic way of constructing predictions in an online setting where predictions are constructed successively. It utilizes past experience where the quality of predictions is evaluated based on a conformity measure. In this master thesis, the foundations of conformal prediction shall be worked out and contrasted with more conventional statistical approaches for making and evaluating predictions, in the specific context of distributional regression models.
    Contact: Thomas Kneib (tkneib@uni-goettingen.de)
  • Title: Heteroscedastic Uncertainty Estimation
    Short description: Heteroscedastic regression models entail regression specifications not only for the mean but also for the variance of the response variable. While such models are well-established in statistics as a special case of generalized additive models for location, scale and shape, they are also receiving increasing attention in machine learning. Recently it has been reported that probabilistic neural networks may fail in disentangling regression effects on the mean and the variance, see https://openreview.net/pdf?id=aPOpXlnV1T. In this project, the findings of this paper will be replicated and investigated with the aim of (i) understanding potential sources of the problem, and (ii) studying whether similar challenges arise with different estimation approaches. The project will be worked on in collaboration with Alexander März.
    Contact: Thomas Kneib (tkneib@uni-goettingen.de)
  • Title: LASSO regularization and group fixed effects
    Short description: Fixed effects specifications in panel data make it possible to control for various types of unobserved heterogeneity, but considerably inflate the number of parameters to be estimated. To overcome this problem, group fixed effects approaches aim at identifying sub-groups in the data that share the same fixed effects structure. In this thesis, regularization approaches such as the fused LASSO will be investigated with respect to their ability to identify group fixed effects in panel data.
    Contact: Thomas Kneib (tkneib@uni-goettingen.de)
  • Title: Topics of Bayesian statistics in economics
    Short description: We want to explore the rise of Bayesianism and its topics in the last 20 years in the field of economics. We want to distinguish topics in which Bayesian methods were used as opposed to non-Bayesian methods by looking at a large data set of articles in economic science. Therefore, we need to develop appropriate metrics that can be used in the context of machine learning algorithms.
    Contact: Jens Lichter (jens.lichter@uni-goettingen.de)
  • Title: Development of drinking water consumption in Göttingen
    Short description: Modelling drinking water consumption in Göttingen based on the drinking water reservoirs in the Harz. The aim is to analyse drinking water consumption over the course of the year, paying particular attention to extreme events such as very high temperatures.
    Contact: Jens Lichter (jens.lichter@uni-goettingen.de)
  • Title: Analysis of wage and staffing structures in the health care sector of Lower Saxony
    Short description: As part of a (bachelor) thesis, data on wage and staffing structures in hospitals and comparable health care facilities in Lower Saxony are to be collected and analysed.
    Contact: Alexander Silbersdorff (asilbersdorff@uni-goettingen.de)
  • Title: What catches the learning eye
    Short description: Using PyGaze, eye movement data of students watching introductory mathematics and statistics lectures will be recorded and analysed with respect to the students' learning success.
    Contact: Alexander Silbersdorff (asilbersdorff@uni-goettingen.de)
  • Title: Heterogeneous effects of environmental policy on innovation (using China as an example)
    Short description: Using Chinese firm-level data, the effects of a Chinese environmental policy measure on innovation will be investigated. Classical regression methods and machine learning methods such as generalized random forests can be applied. In the course of the work, data may have to be acquired, processed and analyzed. Furthermore, a comprehensive overview of the existing literature has to be given and existing methods have to be applied.
    Contact: Isea Cieply (isea.cieply@uni-goettingen.de)
  • Title: The Gender Pay Gap of Elites: What factors contribute to gender inequalities among leaders in senior management and academia?
    Short description: The work involves extensive research on existing studies on the gender pay gap in academia and higher management. Based on this, suitable data have to be cleaned and analyzed mainly by means of classical regression models. Among others, the Socio-Economic Panel (SOEP) can be used for the data analysis.
    Contact: Isea Cieply (isea.cieply@uni-goettingen.de)
  • Title: Outlier detection in time series using machine learning
    Short description: Tree growth data from dendrometers usually have a very high resolution and measure movements on a micrometer scale. However, dendrometers are very sensitive and need to be reinstalled over time, which can cause point outliers as well as whole sequences of outliers. The growth development of a tree tends to be very similar to that of nearby trees of the same species at a similar age. We have a data set with growth data from multiple trees in close proximity. We therefore want to use a multivariate time series model based on machine learning algorithms to detect outliers.
    Contact: Jens Lichter (jens.lichter@uni-goettingen.de)
  • Title: Uncertainty Estimation in (Medical) Image Classification
    Short description: Over the last decade, neural networks have reached almost every field of science and have become a crucial part of various real-world applications. As their use spreads, confidence in neural network predictions has become increasingly important. However, basic neural networks either do not deliver certainty estimates or suffer from over- or underconfidence, i.e. they are badly calibrated. This thesis investigates and extends existing approaches for measuring uncertainty in deep neural networks applied to (medical) image classification tasks.
    Contact: Michael Schlee (michael.schlee@uni-goettingen.de)
  • Title: Hashtag-weighted topic extraction
    Short description: In recent years, social media platforms have witnessed an exponential growth in user-generated content, leading to a vast amount of information available online. Extracting relevant topics from this vast pool of data has become a crucial task for various applications, including sentiment analysis, trend detection, and opinion mining. Traditional methods for topic extraction rely on techniques such as keyword matching and statistical algorithms, which often fail to capture the dynamic nature and contextual relevance of topics. This master thesis shall investigate the influence of inherent metadata associated with social media content, specifically hashtags, on the accuracy and relevance of topic extraction.
    Contact: Michael Schlee (michael.schlee@uni-goettingen.de)
  • Title: Tree instance segmentation from forest point clouds using deep learning
    Short description: With recent advances in laser scanning, it is possible to create three-dimensional point clouds of the surfaces in a forest. To monitor and understand changes in forest composition and structure, it is often useful to segment this forest point cloud into individual trees. In this work, the aim is to build upon an existing deep-learning-based segmentation method. Possible avenues for research include self- and semi-supervised learning strategies as well as the exploration of new model architectures. This topic can be worked on during a lab rotation or as a master thesis.
    Contact: Jonathan Henrich (jonathan.henrich@uni-goettingen.de)
  • Title: Machine learning applications for image and video analysis in livestock farming
    Short description: Monitoring the behavior of animals is crucial in livestock farming. Among other things, the information collected can be used for the development of new farming methods or assistance systems. With the rise of powerful machine learning methods, there is the potential to increasingly automate monitoring tasks using tools for image and video analysis. Tasks that are of interest include automatic animal tracking and action recognition. This topic can be worked on during a lab rotation or as a bachelor or master thesis.
    Contact: Jonathan Henrich (jonathan.henrich@uni-goettingen.de)
  • Title: Implement Bayesian Discrete Choice Models in Liesel
    Short description: Liesel is a Python framework for efficient probabilistic programming that consists of a model-building library and a library for Markov chain Monte Carlo (MCMC) algorithms. This thesis implements functionality for setting up and sampling discrete choice models with hierarchical priors and mixture-of-normals priors with the Liesel framework and validates their behavior through simulations and comparisons to existing implementations in the R package `bayesm`. Since this thesis has a strong focus on programming in Python, prior programming experience in Python is recommended.
    Contact: Johannes Brachem (brachem@uni-goettingen.de)
  • Title: Bayesian Penalized Transformation Models for Bounded Responses
    Short description: Penalized Transformation Models (PTMs) are a novel form of location-scale regression. They allow researchers to place covariate models on the location and scale of a response variable, while estimating the response's conditional distribution directly from the data. Thus, they do not require the assumption of a parametric distribution, like existing location-scale regression models do. This thesis explores the application of PTMs to response variables that are bounded, meaning, for example, that the response can only take positive values or values between 0 and 1. To this end, the concept of link functions known from Generalized Additive Models is applied to PTMs. The model is implemented in Python using Jax and Liesel. Previous experience with Python is recommended.
    Contact: Johannes Brachem (brachem@uni-goettingen.de)
  • Title: Bayesian Penalized Transformation Models for Count Data
    Short description: Penalized Transformation Models (PTMs) are a novel form of location-scale regression. They allow researchers to place covariate models on the location and scale of a response variable, while estimating the response's conditional distribution directly from the data. Thus, they do not require the assumption of a parametric distribution, like existing location-scale regression models do. This thesis explores the application of PTMs to count data. The model is implemented in Python using Jax and Liesel. Previous experience with Python is recommended.
    Contact: Johannes Brachem (brachem@uni-goettingen.de)
  • Title: Bayesian Penalized Transformation Models with different reference distributions
    Short description: Penalized Transformation Models (PTMs) are a novel form of location-scale regression. They allow researchers to place covariate models on the location and scale of a response variable, while estimating the response's conditional distribution directly from the data. This is achieved by estimating a transformation function that relates the conditional distribution of the data to a fully specified reference distribution. Notably, the reference distribution determines the tail behavior of the model. Commonly, the standard normal distribution is used as the reference distribution. This thesis explores the application of different reference distributions. The model is implemented in Python using Jax and Liesel. Previous experience with Python is recommended.
    Contact: Johannes Brachem (brachem@uni-goettingen.de)
  • Title: Applying a Generalized Linear Mixed Model to an Experiment on Nudging Meal Choices
    Short description: In behavioural economics, the use of small interventions (nudges) to influence people’s behaviour, often in a socially desirable way, is an area of considerable interest. This thesis analyses data from an experiment that investigated a way of nudging Mensa customers towards choosing meat-free meals. Since the outcome is categorical, an ordinary linear model is not viable. Further, it would be desirable to take inter-individual differences into account. This thesis will therefore employ a generalised linear mixed model to analyse the data.
    Contact: Johannes Brachem (brachem@uni-goettingen.de)
  • Title: Distributional Regression using Stochastic Variational Inference and Normalizing Flows
    Short description: In distributional regression, the conditional distribution of the response variable given the covariate information and the vector of model parameters is modelled with a P-parametric probability density function, where each parameter is modelled through a linear predictor and a bijective response function that maps the domain of the predictor into the domain of the parameter. The goal of the thesis is to implement a flexible Stochastic Variational Inference algorithm, a technique for approximating posterior distributions through optimization. The idea is to define a family of densities over the latent variables, indexed by a vector of variational parameters, and then find the settings of the parameters that make the variational distribution close to the posterior by stochastic optimization. In particular, the student will work with Normalizing Flows, a modern technique for approximating a distribution in which a simple initial density is transformed into a more complex one by applying a sequence of invertible transformations until a desired level of complexity is attained.
    Contact: Gianmarco Callegher (gianmarco.callegher@uni-goettingen.de)
  • Title: (Generalized) Linear Models via Stochastic Variational Inference Exploiting Sparse Matrix Representations
    Short description: Stochastic Variational Inference (SVI) is a technique for approximating posterior distributions through optimization. The idea is to define a family of densities over the latent variables, indexed by a vector of variational parameters, and then find the settings of the parameters that make the variational distribution close to the posterior by stochastic optimization. In this project the student will exploit sparse matrix representations of the design and precision matrices to implement a fast Python library for (Generalized) Linear Models based on SVI.
    Contact: Gianmarco Callegher (gianmarco.callegher@uni-goettingen.de)
  • Title: Variational Inference in Liesel
    Short description: Stochastic Variational Inference is a technique for approximating posterior distributions through optimization. The idea is to define a family of densities over the latent variables defined by a vector of variational parameters and then find the settings of the parameters that make the variational distribution close to the posterior by stochastic optimization. In this project the student will extend the [Liesel](https://docs.liesel-project.org/en/latest/) framework by providing an alternative to Goose and the MCMC kernels through VI.
    Contact: Gianmarco Callegher (gianmarco.callegher@uni-goettingen.de)
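
For students considering one of the Stochastic Variational Inference topics above, the basic idea can be sketched in a few lines of plain Python. This is only an illustrative toy under stated assumptions, not the method or code used in any of the projects: the model (normal observations with known noise standard deviation and a normal prior on the mean), the Gaussian variational family, the function names `exact_posterior` and `fit_svi`, and all default values are chosen here purely for illustration. The stochasticity comes from Monte Carlo draws of a reparameterized gradient rather than from subsampling the data, and the step size is tuned for roughly 20 observations.

```python
import math
import random


def exact_posterior(y, prior_sd=10.0, noise_sd=1.0):
    """Closed-form posterior (mean, sd) of mu for the toy model
    y_i ~ N(mu, noise_sd^2), mu ~ N(0, prior_sd^2)."""
    precision = len(y) / noise_sd**2 + 1.0 / prior_sd**2
    mean = (sum(y) / noise_sd**2) / precision
    return mean, math.sqrt(1.0 / precision)


def fit_svi(y, prior_sd=10.0, noise_sd=1.0, steps=3000, draws=32, lr=0.01, seed=0):
    """Fit q(mu) = N(m, s^2) by stochastic gradient ascent on the ELBO.

    The variational parameters are (m, log_s); gradients are estimated with
    the reparameterization mu = m + s * eps, eps ~ N(0, 1).
    """
    rng = random.Random(seed)
    n, sum_y = len(y), sum(y)
    m, log_s = 0.0, 0.0  # initial variational parameters (illustrative choice)
    for _step in range(steps):
        s = math.exp(log_s)
        grad_m, grad_log_s = 0.0, 0.0
        for _draw in range(draws):
            eps = rng.gauss(0.0, 1.0)
            mu = m + s * eps
            # Derivative of log p(y, mu) with respect to mu.
            dlogp = (sum_y - n * mu) / noise_sd**2 - mu / prior_sd**2
            grad_m += dlogp                # chain rule: d mu / d m = 1
            grad_log_s += dlogp * s * eps  # chain rule: d mu / d log_s = s * eps
        grad_m /= draws
        # The entropy of q is log_s + const, so its gradient w.r.t. log_s is 1.
        grad_log_s = grad_log_s / draws + 1.0
        m += lr * grad_m
        log_s += lr * grad_log_s
    return m, math.exp(log_s)


if __name__ == "__main__":
    data_rng = random.Random(1)
    y = [data_rng.gauss(2.0, 1.0) for _ in range(20)]
    print("variational (mean, sd):", fit_svi(y))
    print("exact posterior (mean, sd):", exact_posterior(y))
```

Because this toy model is conjugate, the variational fit can be compared directly with the closed-form posterior. In realistic models no closed form is available, and the gradients would typically be obtained by automatic differentiation, for example with JAX.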