**A 13: Identifying biological pathways in expression data and modeling these pathways in the prediction equations**

**PhD Student:** Li Zhengcaou

**Supervisors:** Prof. Dr. Henner Simianer, Prof. Dr. Heike Bickeböller, Prof. Dr. Martin Schlather

**Group:** Animal Breeding and Genetics

Research plan

Ph.D. Dissertation: Identifying biological pathways in expression data and modeling these pathways in the prediction equations

Keywords: biological pathways, structural equation models, expression data, prediction equations.

Research Background

Prediction of breeding values is of central importance in livestock improvement. With the development of high-throughput sequencing technologies, several approaches to genomic prediction, such as GBLUP, Bayesian methods and kernel-based methods, have been developed and are widely used. These approaches normally combine molecular marker, phenotypic and pedigree information to calculate genomic breeding values. However, the scope for further improving prediction accuracy with these approaches is quite limited (a few percent). A significant step forward is expected from integrating structural knowledge of the biology underlying the relevant trait into the prediction model. At the moment, this is hampered by two conditions: i) for many relevant traits in farm animals, there is only limited knowledge about the pathways underlying the phenotype; and ii) it is unclear how complex and nonlinear metabolic interactions can be properly accounted for in genomic prediction models.

Aims of this research

I. Retrieving pathway information from suitable data, such as expression or transcriptome data.

II. Modeling biological pathways in the prediction equations.

Introduction

So far, we have used biological pathway information from databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG), a database resource for understanding high-level functions and utilities of biological systems, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies. However, these databases are rather incomplete for most farm animal species. An alternative is to retrieve pathway information from data, such as expression or transcriptome data, with tools such as structural equation models (SEMs). SEMs are multivariate models that account for causal associations between variables and can be used to study recursive and simultaneous relationships among phenotypes in multivariate systems such as genomics, systems biology, and multiple-trait models in quantitative genetics. They were adapted to the mixed-effects model setting of quantitative genetics by Gianola and Sorensen (2004). Subsequently, several researchers developed inference techniques, providing likelihood functions and posterior distributions for Bayesian analysis, and addressed the identifiability issues inherent to structural equation modeling. We will attempt to use SEMs or similar approaches to construct biological pathways in the coming years.
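To make the idea of a recursive SEM concrete, the following is a minimal sketch in Python with NumPy. It is not the mixed-effects formulation of Gianola and Sorensen (2004): it contains no genetic or fixed effects and assumes the causal order of the variables is already known. The three variables, the coefficient matrix `B_true`, and the helper names `simulate_recursive_sem` and `fit_recursive_sem` are hypothetical and chosen only for illustration. The model is y = By + e, where a nonzero entry B[i, j] means that variable j has a direct effect on variable i.

```python
import numpy as np

def simulate_recursive_sem(B, n, rng):
    """Draw n samples from y = B y + e with independent standard-normal residuals e."""
    p = B.shape[0]
    e = rng.standard_normal((n, p))
    # (I - B) y = e  =>  y = (I - B)^{-1} e; rows of the result are samples
    return np.linalg.solve(np.eye(p) - B, e.T).T

def fit_recursive_sem(Y, parents):
    """Estimate structural coefficients by least squares of each variable
    on its assumed parents (the causal structure is taken as given)."""
    p = Y.shape[1]
    B_hat = np.zeros((p, p))
    Yc = Y - Y.mean(axis=0)  # center so that no intercept is needed
    for child, pa in parents.items():
        if pa:
            coef, *_ = np.linalg.lstsq(Yc[:, pa], Yc[:, child], rcond=None)
            B_hat[child, pa] = coef
    return B_hat

rng = np.random.default_rng(0)

# Hypothetical chain of three expression traits: y0 -> y1 -> y2
B_true = np.array([[0.0,  0.0, 0.0],
                   [0.8,  0.0, 0.0],
                   [0.0, -0.5, 0.0]])

Y = simulate_recursive_sem(B_true, n=5000, rng=rng)
B_hat = fit_recursive_sem(Y, parents={0: [], 1: [0], 2: [1]})
print(np.round(B_hat, 2))
```

In this toy setting the least-squares estimates recover the structural coefficients because the system is recursive and the residuals are independent. Simultaneous (cyclic) relationships or correlated residuals raise the identifiability issues mentioned above and would require a different estimation approach.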

Researchers in animal breeding and genetics have taken up new statistical ideas rapidly and have also contributed to the field of biological statistics, significantly so in the cases of BLUP, Bayesian methods, and whole-genome prediction. Such approaches rest on a largely linear view of the genome, i.e., the assumption that a string of bases can produce an accurate genotype-phenotype mapping. However, the path from DNA to protein is not linear because of phenomena such as protein folding, pervasive interactions and feedback in metabolism, and nonlinear enzyme kinetics. DNA and methylation information may be crucial for breeding value assessment, but appropriate environmental modeling (environmentomics) with supplementary omics-type information should also be considered for building more effective prediction machines. A study from human genetics (2) (Wheeler et al. 2014) has indicated that integrating messenger RNA and microRNA expression data substantially increases predictive performance in the context of personalized medicine; this approach is a special case of reproducing kernel Hilbert spaces regression (3) (Gianola & van Kaam 2008). One of our objectives is to integrate expression data into prediction models, which may improve the accuracy of breeding value prediction in farm animals.
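The idea of combining several omics layers in a kernel-based predictor can be sketched as follows. This is an illustration under stated assumptions, not the published OmicKriging implementation: the data are simulated, the kernel weight `w` and the regularization parameter `lam` are fixed by hand rather than estimated, and the helper names `linear_kernel` and `kernel_predict` are hypothetical. One similarity matrix is built per data layer, the matrices are combined as a weighted sum, and the combined kernel is used for kernel ridge prediction of unobserved phenotypes.

```python
import numpy as np

def linear_kernel(X):
    """Similarity matrix from column-standardized features (mean diagonal equals 1)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    return Z @ Z.T / Z.shape[1]

def kernel_predict(K, y, train, test, lam):
    """Kernel ridge prediction of test records from training records."""
    mu = y[train].mean()
    K_tt = K[np.ix_(train, train)]
    alpha = np.linalg.solve(K_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + K[np.ix_(test, train)] @ alpha

rng = np.random.default_rng(1)
n, n_markers, n_genes = 300, 200, 50

# Simulated inputs (assumptions for illustration only)
markers = rng.binomial(2, 0.3, size=(n, n_markers)).astype(float)  # genotypes coded 0/1/2
expression = rng.standard_normal((n, n_genes))                     # expression levels

K_markers = linear_kernel(markers)
K_expr = linear_kernel(expression)

# Toy phenotype influenced by both layers plus noise
y = (markers @ rng.normal(0, 0.1, n_markers)
     + expression @ rng.normal(0, 0.2, n_genes)
     + rng.standard_normal(n))

train = np.arange(0, 200)
test = np.arange(200, n)

w = 0.5  # hypothetical weight on the marker kernel
K_combined = w * K_markers + (1 - w) * K_expr

for name, K in [("markers only", K_markers), ("markers + expression", K_combined)]:
    y_hat = kernel_predict(K, y, train, test, lam=1.0)
    r = np.corrcoef(y_hat, y[test])[0, 1]
    print(f"{name}: predictive correlation = {r:.2f}")
```

In practice the kernel weights and the regularization parameter would be chosen by cross-validation or estimated as variance components, and nonlinear kernels (for example Gaussian kernels) could replace the linear ones.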

According to recent reports, these approaches are still underdeveloped in the field of genomic prediction, and several leading quantitative geneticists regard them as a promising direction. We therefore aim to address these problems, at least in part, in this project.

(1) Valente B.D., Rosa G.J., Gianola D., Wu X.L. & Weigel K. (2013) Is structural equation modeling advantageous for the genetic improvement of multiple traits? Genetics 194, 561-72.

(2) Wheeler H.E., Aquino-Michaels K., Gamazon E.R., Trubetskoy V.V., Dolan M.E., Huang R.S., Cox N.J. & Im H.K. (2014) Poly-omic prediction of complex traits: OmicKriging. Genet Epidemiol 38, 402-15.

(3) Gianola D. & van Kaam J.B. (2008) Reproducing kernel Hilbert spaces regression methods for genomic assisted prediction of quantitative traits. Genetics 178, 2289-303.

(4) Rosa G.J. & Vazquez A.I. (2010) Integrating biological information into the statistical analysis and design of microarray experiments. Animal 4, 165-72.

(5) Rosa G.J., Valente B.D., de los Campos G., Wu X.L., Gianola D. & Silva M.A. (2011) Inferring causal phenotype networks using structural equation models. Genet Sel Evol 43, 6.

(6) Gonzalez-Recio O., Gianola D., Long N., Weigel K.A., Rosa G.J. & Avendano S. (2008) Nonparametric methods for incorporating genomic information into genetic evaluations: an application to mortality in broilers. Genetics 178, 2305-13.

(7) Penagaricano F., Valente B.D., Steibel J.P., Bates R.O., Ernst C.W., Khatib H. & Rosa G.J. (2015) Exploring causal networks underlying fat deposition and muscularity in pigs through the integration of phenotypic, genotypic and transcriptomic data. BMC Syst Biol 9, 58.

(8) Ober U., Erbe M., Long N., Porcu E., Schlather M. & Simianer H. (2011) Predicting genetic values: a kernel-based best linear unbiased prediction with genomic data. Genetics 188, 695-708. doi:10.1534/genetics.111.128694.