learnMET is a flexible and user-friendly R package for genomic prediction with multi-environment breeding data (genomic and environmental data) using machine learning techniques. The package facilitates the implementation of different types of cross-validation schemes that are relevant in plant breeding (e.g. predicting a new year based on past field trials, predicting missing genotypes in tested environments…). Different machine learning-based prediction methods can be tested and compared, using different sets of predictor variables. The package also integrates the retrieval of environmental data, provided the user supplies geographical coordinates and planting and harvest dates. Daily weather data can be aggregated into environmental covariates using different methods described in the package documentation. The software is available via GitHub, and we provide vignettes explaining how to run the functions. Find out more in our learnMET paper. Developer: Cathy Westhues
G-hat to identify selected complex traits
The G-hat method can be used to identify complex traits that have been subjected to selection. It does this by relating allele frequency change to SNP effect estimates across all genotyped SNPs. See our paper for details. G-hat is available for R and can be installed from CRAN, the Comprehensive R Archive Network, with the "install.packages()" command.
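To make the idea of relating allele frequency change to SNP effects concrete, here is a minimal sketch in base R. It assumes the statistic is the sum, over SNPs, of allele frequency change multiplied by the estimated effect, and it uses a naive permutation test that shuffles effects across SNPs and therefore ignores linkage disequilibrium. The function name `ghat_sketch` and its arguments are hypothetical and are not the interface of the CRAN package; see the paper for the actual test.

```r
# Hypothetical helper for illustration only; NOT the G-hat package's API.
# freq_change: allele frequency change per SNP (same allele as the effects)
# effects:     estimated additive effect per SNP
ghat_sketch <- function(freq_change, effects, n_perm = 1000) {
  stopifnot(length(freq_change) == length(effects))
  # Assumed form of the statistic: sum of (frequency change x effect)
  observed <- sum(freq_change * effects)
  # Naive null distribution: shuffle effects across SNPs (ignores LD)
  null <- replicate(n_perm, sum(freq_change * sample(effects)))
  p_value <- (sum(abs(null) >= abs(observed)) + 1) / (n_perm + 1)
  list(ghat = observed, p_value = p_value)
}

# Simulated example in which frequency change tracks SNP effects
set.seed(1)
effects <- rnorm(500)
freq_change <- 0.02 * effects + rnorm(500, sd = 0.01)
res <- ghat_sketch(freq_change, effects)
```

A positive value suggests that alleles increasing the trait have, on average, risen in frequency; a negative value suggests the opposite.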
GenWin is an R package that defines window or bin boundaries for the analysis of genomic data. Boundaries are based on the inflection points of a cubic smoothing spline fitted to the raw data. Along with defining boundaries, a technique to evaluate results obtained from unequally-sized windows is provided. Applications are particularly pertinent for, though not limited to, genome scans for selection based on variability between populations (e.g. using Wright’s fixation index, Fst, which measures variability in subpopulations relative to the total population).
GenWin is available on CRAN, the Comprehensive R Archive Network.
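The boundary idea can be sketched in base R with `smooth.spline()`. This is a rough illustration under simple assumptions (positions sorted and unique, default smoothing chosen by `smooth.spline()`), not GenWin's own code; `spline_windows` is a hypothetical name.

```r
# Hypothetical helper for illustration only; NOT GenWin's API.
# pos:  marker positions, sorted in increasing order
# stat: per-marker statistic (e.g. Fst)
spline_windows <- function(pos, stat) {
  fit <- smooth.spline(pos, stat)
  # Second derivative of the fitted spline at each marker
  d2 <- predict(fit, pos, deriv = 2)$y
  # An inflection point lies where the second derivative changes sign
  change <- which(diff(sign(d2)) != 0)
  breaks <- (pos[change] + pos[change + 1]) / 2
  # Assign each marker to the window between consecutive boundaries
  window_id <- findInterval(pos, breaks) + 1
  data.frame(pos = pos, stat = stat, window = window_id)
}

set.seed(1)
pos <- seq(1, 1000, by = 5)
stat <- sin(pos / 80) + rnorm(length(pos), sd = 0.1)
win <- spline_windows(pos, stat)
window_means <- tapply(win$stat, win$window, mean)
```

Because the resulting windows contain different numbers of markers, their means are not directly comparable, which is why the package also provides a way to evaluate unequally-sized windows.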
The ohtadstats R package, a work of former lab member Paul Petrowski, can be used to calculate Tomoko Ohta’s partitioning of linkage disequilibrium, termed D-statistics, for pairs of loci. The package is written so that it can be scaled up to form a genome-wide test by applying the function repeatedly across pairs of loci in a genotype table. See our Heredity paper for an example of this package in action.
DriftSimulator.R is an R function for conducting simulations of genetic drift at a single locus. Initial frequency, number of generations, and population demographics can all be manipulated, and plotting is simple. Documentation is in the header of the file. Load into R with “source()”, or by copy-pasting the text of the script.
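For readers who want to see the underlying idea before opening the file, single-locus drift can be sketched as repeated binomial sampling of 2N allele copies each generation (a Wright-Fisher model with constant diploid population size). The function name `simulate_drift` and its arguments are hypothetical and do not reflect the interface of DriftSimulator.R.

```r
# Hypothetical helper for illustration only; NOT the interface of DriftSimulator.R.
# p0:       initial allele frequency
# n_gen:    number of generations
# pop_size: constant diploid population size N
# n_rep:    number of independent replicate populations
simulate_drift <- function(p0, n_gen, pop_size, n_rep = 10) {
  freqs <- matrix(NA_real_, nrow = n_gen + 1, ncol = n_rep)
  freqs[1, ] <- p0
  for (g in seq_len(n_gen)) {
    # Each generation draws 2N allele copies from the previous generation
    counts <- rbinom(n_rep, size = 2 * pop_size, prob = freqs[g, ])
    freqs[g + 1, ] <- counts / (2 * pop_size)
  }
  freqs
}

set.seed(42)
traj <- simulate_drift(p0 = 0.5, n_gen = 100, pop_size = 50)
```

Each column of `traj` is one replicate trajectory, so the matrix can be passed directly to `matplot()` with `type = "l"` to plot all replicates at once.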
DriftSimulatorWithBottlenecks.R is very similar to the above R function for conducting simulations of genetic drift at a single locus, but also enables the user to specify a bottleneck event. Documentation is in the header of the file. Load into R with “source()”, or by copy-pasting the text of the script.
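One simple way to represent a bottleneck, sketched here under the assumption of a Wright-Fisher model, is to give the simulation a vector with one population size per generation. The name `simulate_drift_bottleneck` is hypothetical and is not the interface of the script.

```r
# Hypothetical helper for illustration only; NOT the interface of
# DriftSimulatorWithBottlenecks.R.
# p0:        initial allele frequency
# pop_sizes: diploid population size in each generation
simulate_drift_bottleneck <- function(p0, pop_sizes) {
  p <- numeric(length(pop_sizes) + 1)
  p[1] <- p0
  for (g in seq_along(pop_sizes)) {
    n_copies <- 2 * pop_sizes[g]
    # Sample the next generation's allele copies at this generation's size
    p[g + 1] <- rbinom(1, size = n_copies, prob = p[g]) / n_copies
  }
  p
}

# 20 generations at N = 1000, a 5-generation bottleneck at N = 10, then recovery
set.seed(7)
sizes <- c(rep(1000, 20), rep(10, 5), rep(1000, 20))
p_traj <- simulate_drift_bottleneck(0.3, sizes)
```

Frequency changes are typically much larger during the small-population generations, which is the effect a bottleneck simulation is meant to show.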
VectorFst.R is a simple R function that can be used to calculate locus-by-locus FST values from allele frequency data. Basic documentation is included in the header of the file. Load into R with “source()”, or by copy-pasting the text of the script.
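As an illustration of a locus-by-locus calculation, the sketch below computes FST as (HT - HS) / HT for biallelic loci, weighting populations equally. This is one common estimator and may differ from the one used in VectorFst.R; the function name `fst_by_locus` and the input layout (populations in rows, loci in columns) are assumptions.

```r
# Hypothetical helper for illustration only; NOT the interface of VectorFst.R.
# freqs: matrix of allele frequencies, rows = populations, columns = loci
fst_by_locus <- function(freqs) {
  p_bar <- colMeans(freqs)
  h_t <- 2 * p_bar * (1 - p_bar)            # expected heterozygosity, total
  h_s <- colMeans(2 * freqs * (1 - freqs))  # mean within-population heterozygosity
  # Fst is undefined at loci that are monomorphic across all populations
  ifelse(h_t > 0, (h_t - h_s) / h_t, NA_real_)
}

freqs <- rbind(pop1 = c(0.5, 0.1, 1.0),
               pop2 = c(0.5, 0.9, 0.0))
fst <- fst_by_locus(freqs)
# fst is 0, 0.64, 1 for the three loci
```

The three loci cover the range of outcomes: identical frequencies give 0, divergent frequencies give an intermediate value, and fixed differences give 1.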
ModifiedRogersDistanceFunction.R is a basic function for calculating the modified Rogers’ genetic distance between individuals. The calculation is simple, but I’m not aware of other implementations in R. Apply to a dataframe with individuals in rows and markers in columns. There should be two columns per marker (one column for each allele), coded as 0, 1, or 2.
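The calculation can be sketched in a few lines of base R. The version below assumes the input layout described above, converts allele counts to within-individual frequencies by dividing by 2, and scales the Euclidean distance by the square root of twice the number of markers so that the result lies between 0 and 1. The function name `modified_rogers` is hypothetical and the script itself may be organized differently.

```r
# Hypothetical helper for illustration only; NOT the interface of
# ModifiedRogersDistanceFunction.R.
# geno: data frame, individuals in rows, two columns per marker
#       (one per allele), each holding an allele count of 0, 1, or 2
modified_rogers <- function(geno) {
  x <- as.matrix(geno) / 2                 # allele counts -> frequencies
  n_markers <- ncol(x) / 2
  stopifnot(n_markers == round(n_markers)) # need two columns per marker
  # Euclidean distance over all allele columns, scaled by sqrt(2 * markers)
  dist(x) / sqrt(2 * n_markers)
}

geno <- data.frame(
  m1_A = c(2, 0, 1), m1_B = c(0, 2, 1),
  m2_A = c(2, 0, 2), m2_B = c(0, 2, 0),
  row.names = c("ind1", "ind2", "ind3")
)
d <- modified_rogers(geno)
```

Here `ind1` and `ind2` are homozygous for different alleles at both markers, so their distance is 1, the maximum.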