TFE 2011-2012 (final year project)


Using biological networks to search for interacting loci in genome-wide association studies

Because costs have become affordable, we can now routinely perform whole-genome sequencing of individual human genomes. However, the more we know about the genetic etiology of a complex disease, the more we realize how much remains to be learned. Genomes are composed of both protein-coding and non-protein-coding DNA, and we are only beginning to get a clear handle on the mechanisms of gene expression, the initial product of genome expression being the transcriptome and the final product being the proteome. Focusing on one particular platform or data type may miss an obvious signal. Combining different viewpoints on genome sequences, whether derived from the general population or from diseased individuals, will be crucial to increase insight into disease pathways while creating new opportunities for understanding cellular functional architecture. Such an “integrated” genome-wide analysis involves DNA (genomics), RNA expression (transcriptomics), protein expression (proteomics) and DNA methylation (epigenomics), and accounts for existing interactions within and between these omics data sets. An “integrated” approach faces several challenges, including 1) data pre-processing and quality control, 2) high dimensionality requiring complex computational analyses, 3) an elevated multiple testing burden, 4) finding the best way to integrate data, balancing sufficient detail against parsimony, and 5) interpretation (validation) of the final model(s). This project focuses on the interlinked challenges 2) to 4).

Two data sources that are routinely integrated are transcriptome data and genome data. In this thesis, we focus on the high-throughput, genome-wide measurement of gene expression in a natural population of unrelated humans, and on the subsequent association of variation in expression with “expression quantitative trait loci” (eQTLs) on DNA, using oligonucleotide arrays with hundreds of thousands of single-nucleotide polymorphism (SNP) markers that capture most human genetic variation well (Franke et al.). This strategy has been applied successfully to several diseases, such as celiac disease (Hunt et al. 2008, Nat Genet 40, 395-402) and asthma (Moffatt et al. 2007, Nature 448, 470-473): associated genetic variants have been identified that affect levels of gene expression in cis or in trans, providing insight into the biological pathways affected by these diseases. A less common approach is to associate variation in expression with multiple genetic loci or with clusters of genetic loci.
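
As a rough illustration of the single-locus eQTL association described above, the sketch below simulates SNP genotypes and the expression of one gene, tests each SNP against expression, and applies a Bonferroni threshold to account for multiple testing. Everything in it is an assumption made for illustration and not the method this project prescribes: the function names `eqtl_pvalue` and `simulate`, the simulated data, the additive 0/1/2 genotype coding, the normal approximation to the t distribution, and the choice of Bonferroni correction.

```python
import math
import random


def eqtl_pvalue(genotypes, expression):
    """Approximate two-sided p-value for the slope of expression ~ genotype.

    Genotypes are coded additively as minor-allele counts (0, 1 or 2).
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sgg = sum((g - mean_g) ** 2 for g in genotypes)
    see = sum((e - mean_e) ** 2 for e in expression)
    sge = sum((g - mean_g) * (e - mean_e)
              for g, e in zip(genotypes, expression))
    if sgg == 0 or see == 0:
        return 1.0  # monomorphic SNP or constant expression: nothing to test
    r = sge / math.sqrt(sgg * see)  # Pearson correlation
    if abs(r) >= 1.0:
        return 0.0  # perfect linear fit
    t = r * math.sqrt((n - 2) / (1 - r * r))  # t statistic of the slope
    # Assumption: n is large, so the t distribution is close to a normal one.
    return math.erfc(abs(t) / math.sqrt(2))


def simulate(n_individuals=200, n_snps=50, maf=0.3, effect=1.0, seed=1):
    """Hypothetical toy data: SNP 0 influences expression, the others do not."""
    rng = random.Random(seed)
    snps = [
        [(rng.random() < maf) + (rng.random() < maf)  # two alleles per person
         for _ in range(n_individuals)]
        for _ in range(n_snps)
    ]
    expression = [effect * g + rng.gauss(0, 1) for g in snps[0]]
    return snps, expression


snps, expression = simulate()
pvalues = [eqtl_pvalue(s, expression) for s in snps]

# Bonferroni correction: divide the significance level by the number of tests.
threshold = 0.05 / len(snps)
hits = [i for i, p in enumerate(pvalues) if p < threshold]
print("SNPs passing the Bonferroni threshold:", hits)
```

With hundreds of thousands of SNPs and thousands of expression traits, the number of tests grows multiplicatively, and testing pairs or clusters of loci enlarges it further. This is the dimensionality and multiple testing burden that the project sets out to address.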

In practice, the thesis will consist of three main parts:

Depending on the progress made in this project, the work may lead to a genuine scientific publication. This project will allow you to work together with other academic institutions throughout Europe.


Kristel Van Steen (