A Universal Framework for Regulatory Element Discovery across All Genomes and Data Types

Description:

Princeton University Invention # 08-2427

Understanding the mechanistic basis of gene regulation is a central challenge for modern biology, and success in this domain is fundamental to a rational basis for understanding and treating human disease. One class of modern methods attempts to systematically relate gene expression measurements (e.g., from microarrays) to sequence in order to identify short regulatory elements that are causally related to variations in gene expression. These approaches typically search for short sequences that are statistically over-represented in groups of genes found to be coordinately expressed in microarray measurements. Although these methods achieve reasonable success with the relatively small and simple genomes of microbes such as Saccharomyces cerevisiae, they fail when confronted with the large and complex genomes of vertebrates, including mammals. The existing methods suffer from deficiencies in both sensitivity and specificity. Sensitivity is low because the signal-to-noise ratio of short regulatory elements is much lower within the larger regulatory regions of these genomes. Specificity is also a challenge. The space of short motifs within large sequences is very large, and because existing methods search for regulatory elements within one gene group at a time, they readily report many motifs that are over-represented in an individual gene set yet have no causal role in regulation. These predictions are false positives.

Researchers at the Lewis-Sigler Institute for Integrative Genomics, Princeton University, have developed a framework for detecting regulatory DNA and RNA motifs that directly assesses the mutual information between DNA/RNA sequence and measurements related to gene expression. Because this approach can be applied to many groups of co-regulated genes simultaneously, the likelihood of false positives drops dramatically, while sensitivity remains adequate to detect the vast majority of known elements in well-characterized genomes. To the best of our knowledge, this is the first use of mutual information in this context. Relying on the universal concept of mutual information yields several important advantages. Unlike existing motif discovery approaches, the invention is directly applicable to any type of expression data, including single microarray conditions, gene clustering partitions, and in situ hybridization patterns. It does not require the model-related assumptions (e.g., defining a statistical background for genomic sequence) commonly made by other methods. It simultaneously finds DNA motifs in upstream regions and RNA motifs in 3'UTRs and highlights their functional relationships. It is applicable to any genome (and has been applied successfully to many), including large vertebrate and mammalian genomes and genomes with unusual or extreme compositions, such as that of Plasmodium falciparum, the malaria parasite. It yields very few, if any, false-positive predictions. By default, it systematically analyzes the predicted regulatory elements for functional coherence, conservation, positional and orientation biases, cooperativity, and co-localization with other motifs. Finally, it displays the results through an information-rich, user-friendly graphical interface.
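To illustrate the central quantity, the following is a minimal sketch of a plug-in estimate of the mutual information between a motif's presence/absence across genes and a discrete expression label per gene (for example, a cluster index). It is not the invention's implementation. The function names, the exact-substring motif match, and the toy sequences and labels are hypothetical. Continuous expression measurements are assumed to have been discretized into bins beforehand.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits from paired discrete observations."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))  # counts of each (x, y) pair
    px = Counter(x)             # marginal counts of x
    py = Counter(y)             # marginal counts of y
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # Sum p(a,b) * log2( p(a,b) / (p(a) * p(b)) ) over observed pairs only.
        mi += p_ab * log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def motif_profile(sequences, motif):
    """Hypothetical helper: 1 if the motif occurs as an exact substring, else 0."""
    return [1 if motif in seq else 0 for seq in sequences]


if __name__ == "__main__":
    # Toy, made-up upstream sequences and cluster labels for four genes.
    seqs = ["ACGTGA", "TTACGTGT", "GGGCCC", "CCCTTT"]
    clusters = [0, 0, 1, 1]
    print(mutual_information(motif_profile(seqs, "ACGTG"), clusters))
```

Here the motif occurs in exactly the genes of cluster 0, so its presence fully determines the two-way cluster label and the estimate is 1 bit. A motif whose presence were independent of the labels would score near 0. Because the score depends only on a discrete label per gene, the same computation applies unchanged to a single binned condition, a multi-cluster partition, or an expression-pattern annotation.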

References:

Elemento, O., Slonim, N., and Tavazoie, S. (October 26, 2007). A Universal Framework for Regulatory Element Discovery across All Genomes and Data Types. Molecular Cell, Vol. 28, pp. 337-350.

Princeton is currently seeking to license this technology in the area of bioinformatics and genomics research. Patent protection is pending.

For more information on Princeton University invention # 08-2427 please contact:

Laurie Tzodikov

Office of Technology Licensing and Intellectual Property

Princeton University

4 New South Building

Princeton, NJ 08544-0036

(609) 258-7256

(609) 258-1159 fax

tzodikov@princeton.edu