Segment and combine approach for Biological Sequence Classification
Proc. IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB 2005), pages 194-201, 2005
This paper presents a new algorithm based on the
segment and combine paradigm, for automatic classification of
biological sequences. It classifies sequences by aggregating the
information about their subsequences predicted by a classifier
derived by machine learning from a random sample of training
subsequences. This generic approach is combined with decision
tree based ensemble methods, scalable both with respect to
sample size and vocabulary size. The method is applied to three
families of problems: DNA sequence recognition, splice junction
detection, and gene regulon prediction. With respect to standard
approaches based on n-grams, it appears competitive in terms of
accuracy, flexibility, and scalability. The paper also highlights the
possibility of exploiting the resulting models to identify interpretable
patterns specific to a given class of biological sequences.
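The segment and combine idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the paper uses decision tree based ensembles as the subsequence classifier, whereas this sketch swaps in a small position-wise naive Bayes model so that it runs with the standard library alone. All names (`sample_segments`, `PositionalNaiveBayes`, `classify_sequence`), the subsequence length, the sample count, and the toy sequences are assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def sample_segments(sequences, labels, length, n_samples, rng):
    """Segment step: draw random fixed-length subsequences.

    Each subsequence inherits the class label of its parent sequence.
    """
    segments = []
    for _ in range(n_samples):
        i = rng.randrange(len(sequences))
        seq = sequences[i]
        start = rng.randrange(len(seq) - length + 1)
        segments.append((seq[start:start + length], labels[i]))
    return segments


class PositionalNaiveBayes:
    """Stand-in subsequence classifier (an assumption, NOT the paper's tree ensemble)."""

    def __init__(self, alphabet="ACGT"):
        self.alphabet = alphabet

    def fit(self, segments):
        self.classes = sorted({label for _, label in segments})
        length = len(segments[0][0])
        self.class_counts = {c: 0 for c in self.classes}
        # counts[c][pos][symbol] = occurrences of symbol at pos in class c
        self.counts = {
            c: [defaultdict(int) for _ in range(length)] for c in self.classes
        }
        for seg, label in segments:
            self.class_counts[label] += 1
            for pos, symbol in enumerate(seg):
                self.counts[label][pos][symbol] += 1
        return self

    def predict_proba(self, seg):
        total = sum(self.class_counts.values())
        log_scores = {}
        for c in self.classes:
            n = self.class_counts[c]
            score = math.log(n / total)
            for pos, symbol in enumerate(seg):
                # Laplace smoothing avoids log(0) for unseen symbols
                score += math.log(
                    (self.counts[c][pos][symbol] + 1) / (n + len(self.alphabet))
                )
            log_scores[c] = score
        # Normalize in a numerically stable way
        top = max(log_scores.values())
        weights = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(weights.values())
        return {c: w / norm for c, w in weights.items()}


def classify_sequence(model, seq, length):
    """Combine step: average subsequence-level class probabilities."""
    n_windows = len(seq) - length + 1
    scores = {c: 0.0 for c in model.classes}
    for start in range(n_windows):
        proba = model.predict_proba(seq[start:start + length])
        for c, p in proba.items():
            scores[c] += p / n_windows
    return max(scores, key=scores.get), scores


# Toy data, invented for this example
rng = random.Random(0)
train_seqs = ["ATATTAATATTATAAT", "TTAATATAATTAATAT",
              "GCGGCCGCGCCGGCGC", "CCGCGGCGCGGCCGCG"]
train_labels = ["AT-rich", "AT-rich", "GC-rich", "GC-rich"]

segments = sample_segments(train_seqs, train_labels, length=4, n_samples=200, rng=rng)
model = PositionalNaiveBayes().fit(segments)
label, scores = classify_sequence(model, "TATAATTATA", length=4)
print(label, scores)
```

Averaging the per-subsequence probabilities is one plausible aggregation rule; the abstract only states that subsequence predictions are aggregated, so the exact rule used in the paper should be checked against the full text.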
BibTeX reference
@InProceedings{GBW05,
author = "Geurts, Pierre and Blanco Cuesta, Antia and Wehenkel, Louis",
title = "Segment and combine approach for Biological Sequence Classification",
booktitle = "Proc. IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB 2005)",
pages = "194--201",
year = "2005",
keywords = "bioinformatics, machine learning",
url = "http://www.montefiore.ulg.ac.be/services/stochastic/pubs/2005/GBW05"
}