Segment and combine approach for Biological Sequence Classification

Pierre Geurts, Antia Blanco Cuesta, Louis Wehenkel
Proc. IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB 2005), pages 194--201, 2005
Download the publication: geurts-cibcb2005.pdf [104 KB]
This paper presents a new algorithm, based on the segment and combine paradigm, for the automatic classification of biological sequences. It classifies a sequence by aggregating the predictions made for its subsequences by a classifier that is learned from a random sample of training subsequences. This generic approach is combined with decision-tree-based ensemble methods, which scale with respect to both sample size and vocabulary size. The method is applied to three families of problems: DNA sequence recognition, splice junction detection, and gene regulon prediction. Compared with standard approaches based on n-grams, it appears competitive in terms of accuracy, flexibility, and scalability. The paper also highlights the possibility of exploiting the resulting models to identify interpretable patterns specific to a given class of biological sequences.
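
The segment and combine idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper uses decision tree based ensembles as the subsequence classifier, whereas the sketch below swaps in a small positional naive Bayes model so that it runs with the Python standard library only. The function names, the window length of 8, the toy "A-rich" / "C-rich" data, and the choice of averaging class probabilities as the aggregation rule are all assumptions made for illustration.

```python
import math
import random


def sample_subsequences(sequences, labels, length, n_samples, rng):
    """Segment step: draw random fixed-length windows from the training
    sequences; each window inherits the label of its parent sequence."""
    samples = []
    for _ in range(n_samples):
        i = rng.randrange(len(sequences))
        seq = sequences[i]
        start = rng.randrange(len(seq) - length + 1)
        samples.append((seq[start:start + length], labels[i]))
    return samples


class PositionalNaiveBayes:
    """Stand-in subsequence classifier (an assumption of this sketch,
    NOT the tree-based ensembles used in the paper)."""

    def fit(self, samples, alphabet):
        self.alphabet = alphabet
        self.classes = sorted({y for _, y in samples})
        self.class_counts = {c: 0 for c in self.classes}
        self.counts = {}  # (class, position, symbol) -> count
        for sub, y in samples:
            self.class_counts[y] += 1
            for pos, sym in enumerate(sub):
                key = (y, pos, sym)
                self.counts[key] = self.counts.get(key, 0) + 1
        return self

    def predict_proba(self, sub):
        total = sum(self.class_counts.values())
        log_scores = {}
        for c in self.classes:
            score = math.log(self.class_counts[c] / total)
            for pos, sym in enumerate(sub):
                # Laplace smoothing avoids zero probabilities.
                num = self.counts.get((c, pos, sym), 0) + 1
                den = self.class_counts[c] + len(self.alphabet)
                score += math.log(num / den)
            log_scores[c] = score
        # Normalize in a numerically stable way.
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}


def classify(model, seq, length):
    """Combine step: sum the class probabilities predicted for every
    window of the sequence and return the class with the largest total
    (equivalent to picking the highest average probability)."""
    totals = {c: 0.0 for c in model.classes}
    for i in range(len(seq) - length + 1):
        for c, p in model.predict_proba(seq[i:i + length]).items():
            totals[c] += p
    return max(totals, key=totals.get)


# Toy data (hypothetical): sequences biased towards 'A' or towards 'C'.
rng = random.Random(0)


def make_seq(weights, n=30):
    return "".join(rng.choices("ACGT", weights=weights, k=n))


train = [make_seq([7, 1, 1, 1]) for _ in range(20)]
train += [make_seq([1, 7, 1, 1]) for _ in range(20)]
labels = ["A-rich"] * 20 + ["C-rich"] * 20

samples = sample_subsequences(train, labels, length=8, n_samples=400, rng=rng)
model = PositionalNaiveBayes().fit(samples, alphabet="ACGT")
print(classify(model, "A" * 30, length=8))
```

Any classifier that outputs class probabilities for a fixed-length window could replace `PositionalNaiveBayes` here; the segment and combine steps themselves do not depend on the choice of base learner.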

BibTeX reference

@InProceedings{GBW05,
  author       = "Geurts, Pierre and Blanco Cuesta, Antia and Wehenkel, Louis",
  title        = "Segment and combine approach for Biological Sequence Classification",
  booktitle    = "Proc. IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB 2005)",
  pages        = "194--201",
  year         = "2005",
  keywords     = "bioinformatics, machine learning",
  url          = "http://www.montefiore.ulg.ac.be/services/stochastic/pubs/2005/GBW05"
}
