optRF
Optimising Random Forest (optRF)
The Research Project
Random forest is a particularly prominent machine learning method used for prediction and for prediction-based decision-making processes. Although random forest is known to have many advantages, one aspect that is often overlooked is that it is a non-deterministic method: it can produce different models, and therefore different predictions, from the same input data. This can have severe consequences for decision-making processes. The R package optRF models the non-linear relationship between the number of trees and the prediction stability, and uses this model to determine the optimal number of trees for any given data set.
Software
The R package optRF is open source and provides tools to automatically optimise the prediction stability of random forest prediction models. It can be installed and loaded in R, and the help page of its function opt_prediction opened, with:
> install.packages("optRF")   # install the package from CRAN
> library("optRF")            # load the package
> ?opt_prediction             # open the help page of opt_prediction
Further material can be found at:
Publications
A detailed description of the method, as well as a practical introduction to the problem of non-determinism and to the workflow of the R package, can be found in:
- Original publication: optRF: Optimising random forest stability by determining the optimal number of trees. BMC Bioinformatics (2025). DOI: 10.1186/s12859-025-06097-1
- Blog post: "How to Set the Number of Trees in Random Forest - A practical introduction to the optRF package", towardsdatascience.com
Presentations
Presentation slides of selected conference contributions about the research project can be found here:
- Presentation of optRF as a general method for all fields of biometry, presented at the 6th Central European Network (CEN) conference "Power of Data – Shaping the Future of Life Sciences" 2026 in Warsaw (Poland)
Presentation slides
- Presentation of the optRF method applied specifically to genomic selection in wheat breeding, presented at the 8th Conference on Cereal Biotechnology and Breeding (CBB) 2025 in Budapest (Hungary)
Presentation slides