Investigation of Biclustering Methods for High Dimensional ‘Omic’ Data

Dan Lin

Date and place: Tuesday, December 8th, 11:00, Room A (B34, GIGA+1)

Advances in microarray technology make it possible to collect information about transcript abundance in biological samples for thousands of genes simultaneously. Only a small set of these genes participates in a cellular process of interest, and such a process is active only in a subset of the conditions. Clustering genes and conditions simultaneously (biclustering) therefore becomes important.

Dozens of algorithms are currently available for bicluster analysis. However, no systematic comparison has been performed to evaluate the performance of the biclustering tools. A survey of the methods (Madeira and Oliveira 2004) and a comparative study of several of the most widely used techniques (Prelic et al. 2006) have been published, but no conclusive results could be drawn from them. Developers of new heuristic approaches have also attempted some comparisons. Yet the framework used to compare methods is not uniform across papers.

Therefore, a simulation study is under way to investigate the performance of several biclustering methods in terms of consistent identification of the biclusters present in the data. Quality measures for biclusters are discussed. In addition, we investigate the sources of variability in the results and their dependency on the initial values driven by random seeds.
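
As an illustration only, the sketch below shows one way such a simulation could score how well planted biclusters are recovered across runs started from different random seeds. The abstract does not say which quality measure is used; the Jaccard index on bicluster cells, the function names `jaccard` and `recovery`, and the toy biclusters are all assumptions made for this example.

```python
from itertools import product

def jaccard(a, b):
    """Jaccard index of two biclusters, each given as (row_set, col_set)."""
    cells_a = set(product(*a))  # all (row, column) cells of bicluster a
    cells_b = set(product(*b))
    union = cells_a | cells_b
    return len(cells_a & cells_b) / len(union) if union else 0.0

def recovery(true_biclusters, found_biclusters):
    """Average, over planted biclusters, of the best Jaccard match found."""
    if not true_biclusters or not found_biclusters:
        return 0.0
    best = (max(jaccard(t, f) for f in found_biclusters)
            for t in true_biclusters)
    return sum(best) / len(true_biclusters)

# Hypothetical planted biclusters (assumed simulation ground truth).
planted = [({0, 1, 2}, {0, 1}), ({5, 6}, {3, 4, 5})]

# Hypothetical results of one method run with two different random seeds.
run_seed_1 = [({0, 1, 2}, {0, 1}), ({5, 6}, {3, 4})]
run_seed_2 = [({0, 1}, {0, 1})]

scores = [recovery(planted, run) for run in (run_seed_1, run_seed_2)]
spread = max(scores) - min(scores)  # crude indicator of seed dependency
print(scores, spread)
```

A large spread between seeds would suggest that the results depend strongly on the initial values; other measures could be substituted for the Jaccard index without changing the structure of the comparison.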

Some preliminary observations will be discussed, and various topics are put into perspective for further research.