Bioinformatics
(In collaboration with )
Biological data classification
Contact: Pierre Geurts, Raphaël Marée, Louis Wehenkel
Because of rapid progress in computer and information technology, large amounts of data are now available in almost every field of application. For example, modern medical instrumentation and acquisition technologies (mass spectrometry, microarrays, sequencing tools, ...) generate large datasets describing patients, animals, tissues, or cells. Analysing such amounts of data is impossible without efficient computer-based tools. Data Mining refers to the application of automatic learning and visualization techniques to help a human expert extract potentially interesting and synthetic knowledge from these large volumes of raw data. Potential medical applications include the automatic design of diagnostic or prognostic tools for a given disease and the identification of potential biomarkers for that disease. Among many other applications, machine learning techniques can also be used to identify coding and non-coding regions in the genome of a given species, or for genetic linkage analysis.
Here at the Systems and Modeling research unit, we analyse existing tools and develop new machine learning methods and methodologies. This fundamental research is often driven by application needs. On the application side, we help users collect, clean, and design their databases. Our approach for a given task is then to apply and compare several modern machine learning techniques (e.g. decision tree based methods, neural networks, support vector machines, Bayesian networks). These algorithms are developed and adapted in-house so as to provide users with a toolbox of machine learning techniques.
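The text does not say how the techniques are compared. One common protocol is k-fold cross-validation, in which every candidate learner is trained and tested on the same folds and ranked by its mean error rate. The sketch below assumes that protocol; the two learners (a majority-class baseline and a 1-nearest-neighbour rule) and all function names are hypothetical placeholders chosen to keep the example dependency-free, not methods from the group's toolbox.

```python
import random
from collections import Counter
from math import dist  # Euclidean distance (Python 3.8+)

def majority_learner(train_X, train_y):
    """Baseline: always predict the most frequent training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda x: label

def one_nn_learner(train_X, train_y):
    """1-nearest-neighbour: copy the label of the closest training sample."""
    def predict(x):
        j = min(range(len(train_X)), key=lambda i: dist(x, train_X[i]))
        return train_y[j]
    return predict

def cross_val_error(learner, X, y, k=5, seed=0):
    """Mean test error of `learner` over k folds (same folds for every learner)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    rates = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = learner([X[i] for i in train], [y[i] for i in train])
        wrong = sum(model(X[i]) != y[i] for i in test)
        rates.append(wrong / len(test))
    return sum(rates) / k

def compare(learners, X, y, k=5):
    """Cross-validated error rate of each named learner."""
    return {name: cross_val_error(fn, X, y, k) for name, fn in learners.items()}

# Toy data (synthetic): two well-separated clusters of 10 samples each.
X = [(0.1 * i, 0.0) for i in range(10)] + [(5 + 0.1 * i, 5.0) for i in range(10)]
y = ["a"] * 10 + ["b"] * 10
scores = compare({"majority": majority_learner, "1-NN": one_nn_learner}, X, y)
print(scores)
```

Fixing the fold assignment with a seed means every learner sees identical train/test partitions, so differences in error rate reflect the learners rather than the split.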
Case study
In an ongoing research project in collaboration with the
laboratory of clinical chemistry and rheumatology, we apply machine
learning techniques to the diagnosis of inflammatory diseases from
proteomic mass spectra. A database containing data from healthy and
diseased patients has been gathered using Surface-Enhanced Laser
Desorption/Ionisation Time-of-Flight Mass Spectrometry (SELDI-TOF-MS).
Several machine learning tools were applied to these data. The results,
in terms of both the accuracy of the diagnostic rule and the identified
biomarkers, are very promising. The methodology is, moreover, generic
and could be applied to data obtained from other medical
instrumentation, such as microarrays.
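One family of tools named on this page is decision tree based ensemble methods. As a minimal, dependency-free sketch of that idea (not the method of the published study), the block below bags one-split trees ("stumps"), each fitted on a bootstrap sample and a random subset of features, classifies by majority vote, and ranks features by how often the ensemble split on them, which is one crude way to shortlist candidate biomarkers. It assumes each spectrum has already been reduced to a fixed-length vector of intensities (e.g. one value per m/z bin); the stump depth, the usage-count importance, the function names, and the toy data are all assumptions for illustration.

```python
import random
from collections import Counter

def fit_stump(X, y, features):
    """Best one-split rule (feature, threshold) among `features`, by training errors."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not right:  # threshold at the maximum value: no real split
                continue
            l_lab, l_n = Counter(left).most_common(1)[0]
            r_lab, r_n = Counter(right).most_common(1)[0]
            errors = len(y) - l_n - r_n
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best

def fit_ensemble(X, y, n_stumps=100, n_features=None, seed=0):
    """Bagged stumps, each fitted on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    p = len(X[0])
    m = n_features or max(1, int(p ** 0.5))
    stumps = []
    for _ in range(n_stumps):
        rows = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        feats = rng.sample(range(p), m)  # random feature subset
        best = fit_stump([X[i] for i in rows], [y[i] for i in rows], feats)
        if best is not None:  # None if every chosen feature is constant
            stumps.append(best[1:])  # keep (feature, threshold, left label, right label)
    return stumps

def predict(stumps, x):
    """Majority vote over all stumps."""
    votes = Counter(l if x[f] <= t else r for f, t, l, r in stumps)
    return votes.most_common(1)[0][0]

def feature_ranking(stumps):
    """Features (e.g. m/z bins) sorted by how often the ensemble split on them."""
    return Counter(f for f, _, _, _ in stumps).most_common()

# Synthetic toy spectra: 5 hypothetical m/z bins, only bin 2 differs between classes.
gen = random.Random(1)
X, y = [], []
for label, offset in (("healthy", 0.0), ("disease", 2.0)):
    for _ in range(10):
        row = [gen.random() for _ in range(5)]
        row[2] = offset + gen.random()  # the only informative bin
        X.append(row)
        y.append(label)

stumps = fit_ensemble(X, y, n_stumps=50, n_features=5)
print(predict(stumps, [0.5, 0.5, 2.5, 0.5, 0.5]))
print(feature_ranking(stumps)[:3])
```

With the default `n_features`, each stump sees only about the square root of the number of bins, which injects the extra randomization typical of tree ensembles; the toy call uses all five bins so the informative one dominates the ranking.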
Publications
- Proteomic mass spectra classification using decision tree based ensemble methods
  P. Geurts, M. Fillet, D. de Seny, M.-A. Meuwis, M.-P. Merville, L. Wehenkel
  Bioinformatics 2005; 21: 3138-3145.
- Discovery of new rheumatoid arthritis biomarkers using SELDI-TOF-MS ProteinChip approach
  Dominique de Seny, Marianne Fillet, Marie-Alice Meuwis, Pierre Geurts, Laurence Lutteri, Clio Ribbens, Vincent Bours, Louis Wehenkel, Jacques Piette, Michel Malaise, Marie-Paule Merville
  Accepted for publication in Arthritis and Rheumatism, 2005.
- Segment and combine approach for Biological Sequence Classification
  Pierre Geurts, Antia Blanco Cuesta, Louis Wehenkel
  To appear in Proc. IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB 2005), 2005.
Final thesis
2004-2005
- Development of a search tool for genetic databases, Benjamin Renwart.
- Analysis of biological sequences using decision trees, Antia Blanco Cuesta.
- Machine learning on biomedical data using kernel-based methods, Christophe Grosfils.
2002-2003
- Application of machine learning to knowledge extraction and management from DNA arrays, François Van Lishout.
2000-2001
- Application of machine learning to the localization of genes with quantitative effects, Estelle Graas.
Life Science Image Classification
Contact: Raphaël Marée, Louis Wehenkel
With improvements in sensor and image acquisition technology, the possibilities for gathering image data about the natural world are numerous. Scientists have investigated the possibility of using these image data together with computer vision technology to solve real-world problems. This has led to some successful applications, such as the classification of blood cells and human radiographs, the identification of animal species or individuals (molluscs, salamanders, ...), and the recognition of seeds. Using databases of images labeled by human experts, scientists have been able to design specific computer vision programs that automatically classify previously unseen images of such natural "objects" or estimate medical parameters.
Here at the Montefiore Institute / GIGA Bioinformatics Unit, we have developed and adapted a new generic Data
Mining approach for image classification. It has been successfully applied
to several types of image classification problems: recognition of
digits, faces, 3D objects, textures, buildings, general-purpose
photographs, ...
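The publication listed for this topic pairs random subwindows with decision trees. A minimal sketch of the subwindow part, assuming grayscale images stored as lists of pixel rows: extract square subwindows at random positions and sizes, resize each to a fixed patch size, give every subwindow the label of its parent image, and classify a new image by a majority vote over its own subwindows. To keep the example short, a 1-nearest-neighbour rule stands in for the decision tree ensemble; `PATCH`, the helper names, and the window counts are hypothetical choices for illustration.

```python
import random
from collections import Counter
from math import dist

PATCH = 4  # every subwindow is resized to PATCH x PATCH pixels (assumed value)

def random_subwindows(img, n, rng):
    """Extract n random square subwindows, each resized to PATCH x PATCH."""
    h, w = len(img), len(img[0])
    out = []
    for _ in range(n):
        size = rng.randint(1, min(h, w))  # random side length
        top = rng.randint(0, h - size)    # random position
        left = rng.randint(0, w - size)
        # Nearest-neighbour resize to PATCH x PATCH, flattened to a vector.
        out.append([img[top + r * size // PATCH][left + c * size // PATCH]
                    for r in range(PATCH) for c in range(PATCH)])
    return out

def build_training_set(images, labels, n_per_image=20, seed=0):
    """Every subwindow inherits the label of the image it came from."""
    rng = random.Random(seed)
    X, y = [], []
    for img, lab in zip(images, labels):
        for vec in random_subwindows(img, n_per_image, rng):
            X.append(vec)
            y.append(lab)
    return X, y

def classify_image(img, X, y, n_windows=20, seed=1):
    """Label each subwindow (1-NN stand-in), then return the majority label."""
    rng = random.Random(seed)
    votes = Counter()
    for vec in random_subwindows(img, n_windows, rng):
        nearest = min(range(len(X)), key=lambda i: dist(vec, X[i]))
        votes[y[nearest]] += 1
    return votes.most_common(1)[0][0]

# Synthetic 8x8 toy images: one dark, one bright.
gen = random.Random(42)
dark = [[gen.randint(0, 50) for _ in range(8)] for _ in range(8)]
bright = [[gen.randint(200, 255) for _ in range(8)] for _ in range(8)]
X, y = build_training_set([dark, bright], ["dark", "bright"])
print(classify_image([[25] * 8 for _ in range(8)], X, y))  # dark
```

Because subwindows vary in size and position, the learner sees both fine detail and larger context from each image, and the final vote makes the image-level decision robust to individual misclassified patches.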
We now intend to apply such techniques to image classification
problems related to the life sciences. As our approach has already been
tested successfully on a large number of problems, we expect good
performance on natural-world images. Because the approach is automatic
and generic, we believe interesting results could be obtained quickly
once human experts provide a set of labeled images.
The first step of such a project is to provide labeled images
(in standard computer formats). Our method then automatically
constructs a model able to classify new images. First results are
obtained quickly, in the form of error rates on a set of test images.
If the evaluation is successful, an autonomous program (classifier)
can then be developed and integrated into the real environment.
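The evaluation step described above reports error rates on images the model has not seen. A minimal sketch, assuming a single stratified hold-out split (each class keeps roughly the same proportion in the learning and test sets) and a plain misclassification rate; the function names, the 30% test fraction, and the labels are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split indices into learning/test sets, keeping class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    learn, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        k = max(1, round(len(idx) * test_fraction))  # at least one test image per class
        test.extend(idx[:k])
        learn.extend(idx[k:])
    return sorted(learn), sorted(test)

def error_rate(predictions, truth):
    """Fraction of test images whose predicted label differs from the expert label."""
    wrong = sum(p != t for p, t in zip(predictions, truth))
    return wrong / len(truth)

labels = ["cell"] * 10 + ["debris"] * 10  # hypothetical expert labels
learn, test = stratified_split(labels)
print(len(learn), len(test))  # 14 6
print(error_rate(["a", "b", "a", "a"], ["a", "b", "b", "a"]))  # 0.25
```

Stratifying matters when some classes are rare: a purely random split could leave a class out of the test set entirely and make the error rate misleading.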
Publications
- Biomedical Image Classification with Random Subwindows and Decision Trees
  Raphaël Marée, Pierre Geurts, Justus Piater, Louis Wehenkel
  To appear in Proc. ICCV Workshop on Computer Vision for Biomedical Image Applications, 2005.
Final thesis
2004-2005
- Automatic classification of powders, Claudio Rudi.
This thesis addresses automatic powder classification using machine learning methods. One possible application is the classification of pharmaceutical powders.