Princeton University Invention # 08-2427
Understanding the mechanistic basis of gene regulation is a central
challenge for modern biology, and success in this domain is fundamental to
understanding and treating human disease on a rational basis. One set of
modern methods attempts to systematically relate gene expression measurements
(e.g., from microarrays) to sequence in order to identify short regulatory
elements that are causally related to variations in gene expression. These
approaches typically aim to identify statistically over-represented short
sequences in groups of genes that are coordinately expressed, as determined by
microarray measurements. Although these methods achieve reasonable success within the
relatively small and simple genomes of microbes such as Saccharomyces
cerevisiae, they fail when confronted with the large and complex genomes of
vertebrates and mammals. The existing methods suffer from deficiencies in
both sensitivity and specificity. Sensitivity is low because the
signal-to-noise ratio of short regulatory elements is much lower within the
larger regulatory regions of these genomes. Specificity is also a challenge.
The space of short motifs within large sequences is very large, and because
existing methods search for regulatory elements within individual groups of
genes, it is easy to find many motifs that are over-represented in a single
gene set but nevertheless play no causal role in regulation; such predictions
are false positives.
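As a concrete illustration of the per-group approach and its specificity problem, the following minimal Python sketch implements a one-sided hypergeometric over-representation test. This is a common choice for this kind of analysis, though not necessarily the test any particular tool uses. The function name `hypergeom_pvalue` and all numbers in the example (genome size, motif counts, and a scan of every 7-mer across 50 clusters at alpha = 0.01) are hypothetical.

```python
from math import comb


def hypergeom_pvalue(n_genes: int, n_with_motif: int,
                     group_size: int, hits_in_group: int) -> float:
    """One-sided P(X >= hits_in_group) for X ~ Hypergeometric.

    n_genes:       genes in the genome (population size)
    n_with_motif:  genes whose regulatory region contains the motif
    group_size:    genes in the co-expressed group being tested
    hits_in_group: genes in that group that contain the motif
    """
    total = comb(n_genes, group_size)
    upper = min(n_with_motif, group_size)
    # Sum the probabilities of observing k = hits_in_group .. upper motif carriers.
    tail = sum(
        comb(n_with_motif, k) * comb(n_genes - n_with_motif, group_size - k)
        for k in range(hits_in_group, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical genome: 6,000 genes, 600 carry the motif (10% background rate);
    # 20 of the 100 genes in one cluster carry it, twice the background expectation.
    print(hypergeom_pvalue(6000, 600, 100, 20))

    # Hypothetical scan: every 7-mer (4**7 = 16,384) tested in each of 50 clusters.
    n_tests = 4 ** 7 * 50  # 819,200 separate tests
    alpha = 0.01
    # If no motif were real, up to about n_tests * alpha (8,192) tests could
    # still pass this threshold by chance alone.
    print(n_tests * alpha)
```

Even if no motif had any regulatory role, a scan of this hypothetical size could flag on the order of 8,000 motif/cluster pairs at this threshold, which is the specificity problem described above.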
Researchers at the Lewis-Sigler Institute for Integrative Genomics,
Princeton University, have developed a framework for detecting regulatory DNA
and RNA motifs that relies on directly assessing the mutual information between
DNA/RNA sequence and measurements related to gene expression. Because this
approach can be applied to many groups of co-regulated genes simultaneously,
the likelihood of false positives drops dramatically, while sensitivity remains
adequate to detect the vast majority of known elements within
well-characterized genomes. To the best of our knowledge, this is the
first use of mutual information in this context. Relying on the
universal concept of mutual information yields several important advantages.
First, unlike existing motif discovery approaches, the invention is
directly applicable to any type of expression data, including single
microarray conditions, gene clustering partitions, and in situ
hybridization patterns. It does not require any of the model-related
assumptions (e.g., a statistical background model of genomic sequence)
commonly made by other methods. It simultaneously finds DNA motifs in upstream
regions and RNA motifs in 3' UTRs and highlights their functional relations. It
is applicable to any genome (and has been applied successfully to many),
including large vertebrate and mammalian genomes and genomes with unusual or
extreme base compositions, such as that of Plasmodium falciparum, the malaria
parasite. It yields very few, if any, false-positive predictions. By default,
it systematically analyzes the functional coherence of the predicted regulatory
elements, their conservation, positional and orientation biases, cooperativity,
and co-localization with other motifs. Finally, it presents the results in an
information-rich, user-friendly graphical interface.
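The core quantity can be illustrated with a minimal Python sketch. It assumes each gene is summarized by a discrete expression label (a cluster index, or a bin of a continuous profile quantized into equally populated bins) and by a binary motif-presence indicator. It computes a plug-in estimate of the mutual information I(M;E) = Σ p(m,e) log2[p(m,e) / (p(m)p(e))] in bits and assesses significance by shuffling the expression labels. The function names, the equal-frequency binning, and the permutation test are illustrative assumptions, not the published implementation.

```python
import math
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def equal_frequency_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Quantize a continuous expression profile into n_bins equally populated bins."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * n_bins // len(values)
    return labels


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X;Y), in bits, from paired discrete observations."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and the same length")
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # Sum over observed (a, b) pairs of p(a,b) * log2(p(a,b) / (p(a) * p(b))),
    # written with raw counts: (c/n) * log2(c * n / (count(a) * count(b))).
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def permutation_pvalue(motif: Sequence[int], expr: Sequence[Hashable],
                       n_shuffles: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Observed MI and an empirical p-value from shuffling expression labels."""
    rng = random.Random(seed)
    observed = mutual_information(motif, expr)
    shuffled = list(expr)
    exceed = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if mutual_information(motif, shuffled) >= observed:
            exceed += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (exceed + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    # Toy data: 20 genes, expression split into 2 equally populated bins;
    # the motif is present in exactly the 10 most highly expressed genes.
    expression = [0.1 * i for i in range(20)]
    bins = equal_frequency_bins(expression, 2)
    motif = [0] * 10 + [1] * 10
    mi, p = permutation_pvalue(motif, bins)
    print(f"I(motif; expression) = {mi:.3f} bits, permutation p = {p:.4f}")
```

Because a single MI value per motif summarizes its association with every expression bin or cluster at once, a scan of this kind performs one test per candidate motif rather than one per motif and gene group. That is one way to see why considering many groups simultaneously can reduce false positives.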
References:
Elemento, O., Slonim, N., and Tavazoie, S. (October 26, 2007). A Universal
Framework for Regulatory Element Discovery across All Genomes and Data Types.
Molecular Cell 28, 337-350.
Princeton is currently seeking to license this technology in the area of
bioinformatics and genomics research. Patent protection is pending.
For more information on Princeton University Invention # 08-2427, please
contact:
Laurie Tzodikov
Office of Technology Licensing and Intellectual Property
Princeton University
4 New South Building
Princeton, NJ 08544-0036
(609) 258-7256
(609) 258-1159 fax
tzodikov@princeton.edu