Project (Johannes Söding)
Making the petabytes of deposited metagenomics sequences accessible through an ultrafast search method
You will develop and apply a method that, according to our estimates, should be able to search for similar sequences in huge databases several orders of magnitude faster than existing algorithms. The method will make the petabytes of metagenomics sequences stored in public databases accessible, for example to find new enzymes for biotechnology (e.g. new CRISPR-Cas proteins) or new biosynthetic gene clusters that produce antimicrobial compounds. Applications demonstrating the usefulness of the method will be part of the project. The project requires an affinity for programming and prior programming experience beyond lecture exercises (ideally in C++).
Statistical and machine learning methods for residue-residue protein contact prediction and protein structure prediction
The statistical coupling between columns in a multiple protein sequence alignment (MSA) with sufficiently many sequences can be used to predict direct physical contacts between the corresponding amino acid residues. From the reliably predicted contacts, the protein structure can be predicted. The currently best methods train undirected graphical models (Markov random fields) and predict the coupled residues from the strongest edges (indicating statistical coupling) in the model. In this project, you will develop general machine learning algorithms to efficiently train the models using the true likelihood instead of the commonly used pseudolikelihood. You will then develop a Bayesian statistical model that should be able to predict residue contacts with fewer sequences in the MSA than currently possible. These statistical advances should allow us to predict protein structures for the many protein families that contain too few sequences to make reliable contact and structure predictions with current methods.
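To make the idea of statistical coupling between MSA columns concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the project's method: it scores column pairs by plain mutual information, a much simpler stand-in for the Markov random field models described above, and unlike those models it cannot separate direct from indirect couplings. The function names and the four-sequence toy alignment are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def column_mutual_information(msa, i, j):
    """Mutual information (in nats) between columns i and j of an alignment.

    `msa` is a list of equal-length strings, one per aligned sequence.
    """
    n = len(msa)
    pair_counts = Counter((seq[i], seq[j]) for seq in msa)
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)
    mi = 0.0
    for (a, b), c in pair_counts.items():
        # p_ab * log(p_ab / (p_a * p_b)), written with raw counts
        mi += (c / n) * log(c * n / (counts_i[a] * counts_j[b]))
    return mi


def rank_column_pairs(msa):
    """Return all column pairs sorted by decreasing mutual information."""
    length = len(msa[0])
    scores = {
        (i, j): column_mutual_information(msa, i, j)
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy alignment: columns 1 and 2 covary perfectly (C with D, E with K),
# column 0 varies independently of them, and column 3 is fully conserved.
toy_msa = ["ACDA", "AEKA", "GCDA", "GEKA"]

if __name__ == "__main__":
    for (i, j), score in rank_column_pairs(toy_msa):
        print(f"columns {i} and {j}: MI = {score:.3f}")
```

In the toy alignment, the covarying pair of columns ranks first, while pairs involving the conserved or the independent column score zero. Real alignments additionally need sequence weighting and corrections for phylogenetic and sampling noise, which this sketch omits.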
Other project topics within our research interests are possible and can be discussed.
Homepage Research Group: http://www.mpibpc.mpg.de/de/soeding
For more information see for instance: