Abstract (ENG)

The exact localization of prokaryotic translation initiation sites by automated prediction systems is still not completely solved. In this context, two machine learning approaches have been developed: the Oligo Kernel algorithm, a supervised learning method for analyzing signals in biological sequences, and TICO (Translation Initiation site COrrection), a tool for (re-)annotating translation initiation sites with an unsupervised classification scheme.

It is shown that the Oligo Kernel algorithm is well suited for the analysis of biological signals. In a case study on translation initiation sites of the eubacterium Escherichia coli K-12, the high performance of the Oligo classifier is demonstrated on experimentally verified data. A visualization of the discriminative signals facilitates a biologically meaningful interpretation: for E. coli K-12, well-known signals for translation initiation and their inherent variability can be clearly identified. Because the degree of positional smoothing is adjustable, the algorithm can be adapted to the analysis of other biological signals.
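The role of positional smoothing can be illustrated with a minimal sketch of an oligo kernel, assuming the common Gaussian formulation: for every k-mer (oligo) w, a sequence is represented by a sum of Gaussians of width sigma centred on the positions where w occurs, and the kernel is the inner product of these functions. This has the closed form K(x, y) = sqrt(pi) * sigma * sum_w sum_{p in S_x^w} sum_{q in S_y^w} exp(-(p - q)^2 / (4 sigma^2)), where S_x^w is the set of positions of w in x. The function names and default parameters are illustrative assumptions, not taken from the thesis implementation.

```python
import math
from collections import defaultdict


def oligo_positions(seq, k):
    """Map each k-mer (oligo) to the list of 0-based positions where it starts in seq."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def oligo_kernel(x, y, k=3, sigma=1.0):
    """Gaussian-smoothed oligo kernel (illustrative sketch).

    sigma controls positional smoothing: small values only reward oligos at
    (nearly) the same position, large values also reward shifted occurrences.
    """
    px = oligo_positions(x, k)
    py = oligo_positions(y, k)
    total = 0.0
    for oligo, pos_x in px.items():
        pos_y = py.get(oligo)  # .get() does not insert a default entry
        if not pos_y:
            continue
        # Closed-form overlap of two Gaussians centred at p and q.
        for p in pos_x:
            for q in pos_y:
                total += math.exp(-((p - q) ** 2) / (4.0 * sigma ** 2))
    return math.sqrt(math.pi) * sigma * total


if __name__ == "__main__":
    # Same motif, shifted by one position in the second sequence.
    a, b = "ACGT", "TACG"
    for s in (0.1, 1.0, 10.0):
        print(s, oligo_kernel(a, b, k=2, sigma=s))
```

With a small sigma, the shifted occurrences contribute almost nothing, so the kernel is position-specific; as sigma grows, shifted oligos count increasingly, and the kernel approaches (up to the scaling factor) a position-independent comparison of oligo counts. A kernel of this kind could be used with any kernel classifier, for example a support vector machine with a precomputed Gram matrix.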

By post-processing an initial gene annotation obtained from a classical gene finder, the program TICO significantly improves the prediction of prokaryotic translation initiation sites compared to previous approaches. The improvement achieved by such a reannotation amounts to up to 30%. The algorithm also provides a visualization method that presents the discriminative features intuitively. The program can be accessed through a web interface and is freely available as a command-line tool for Linux and Windows.
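Whatever scoring a reannotation tool applies, correcting a start site requires the set of alternative start codons that lie in frame with an annotated gene. The sketch below shows only this candidate-enumeration step for a forward-strand gene; it does not reproduce TICO's unsupervised classification. The start codon set (ATG, GTG, TTG), the function name `candidate_starts` and the 0-based indexing convention are assumptions made for illustration.

```python
# Assumed codon sets for illustration (bacterial code: ATG, GTG, TTG as starts).
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def candidate_starts(genome, stop_pos):
    """Return 0-based positions of in-frame start codons for a forward-strand gene.

    genome:   uppercase DNA string of the forward strand
    stop_pos: 0-based index of the first base of the gene's stop codon

    Scans upstream codon by codon until the previous in-frame stop codon
    (or the sequence start) and collects every start codon on the way.
    """
    candidates = []
    pos = stop_pos - 3
    while pos >= 0:
        codon = genome[pos:pos + 3]
        if codon in STOP_CODONS:
            break  # an upstream in-frame stop ends the open reading frame
        if codon in START_CODONS:
            candidates.append(pos)
        pos -= 3
    return sorted(candidates)


if __name__ == "__main__":
    # TAA | ATG CCC GTG AAA | TGA  -> candidate starts at positions 3 and 9
    genome = "TAAATGCCCGTGAAATGA"
    print(candidate_starts(genome, stop_pos=15))
```

A classifier would then score each candidate position, for example with features of the sequence around it, and select one of them as the corrected translation initiation site.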