Tommi Jaakkola, Ph.D.
Department of Electrical Engineering and Computer Science
Computer Science and Electrical Engineering
Ph.D. Brain & Cognitive Sciences • 1997
Massachusetts Institute of Technology
Prof. Jaakkola's group focuses on several complementary areas: statistical inference and estimation, machine learning, and computational biology. The group's research in computational biology is motivated by the need to understand the cellular mechanisms responsible for transcriptional control. This is a problem of enormous scientific and practical importance. Our work has focused first on model organisms such as yeast, with the ultimate goal of understanding regulatory control in more complex human cells.
Understanding the regulatory programs operating in a cell presents a number of challenges, arising in part from the complexity of genetic regulatory systems and in part from the lack of sufficient experimental data probing key aspects of such systems. The landscape of available experimental data is changing rapidly, however. A number of high-throughput biological data sources are already available for elucidating the underlying transcriptional mechanisms. While these measurements are noisy, fragmented, or otherwise incomplete, they nevertheless offer complementary information about the structure and functioning of the biological system. Used in combination, the data can provide a compelling view of the underlying biology.
The cell cycle is a prominent example of a regulatory program. Understanding how the cell cycle is regulated, and what may disrupt the normal progression of a cell through it, would have considerable implications. Resolving the regulatory control underlying the cell cycle in yeast or other organisms presents a number of computational challenges. Time course gene expression profiles, for example, are noisy, irregularly sampled measurements taken across a large number of cells whose cell cycles are only partially synchronized. The length of the cell cycle may also be affected by the experimental protocol used. In a series of papers, we have developed a robust methodology for the reconstruction, comparative alignment, and deconvolution of time course expression profiles.
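To make the alignment problem concrete, here is a minimal sketch of matching two irregularly sampled profiles whose cycle lengths differ. It assumes piecewise-linear interpolation as a simple stand-in for a continuous representation and a brute-force search over linear time maps t -> a*t + b. The function names and the synthetic sine-wave data are hypothetical, and this is not the group's published method.

```python
import math
from bisect import bisect_right


def interpolate(times, values, t):
    """Piecewise-linear value of an irregularly sampled profile at time t.

    Times outside the sampled range are clamped to the nearest endpoint.
    """
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_right(times, t) - 1  # times[i] <= t < times[i + 1]
    w = (t - times[i]) / (times[i + 1] - times[i])
    return (1 - w) * values[i] + w * values[i + 1]


def alignment_error(ref, other, a, b, grid):
    """Mean squared difference between ref(t) and other(a * t + b) on the grid."""
    (rt, rv), (ot, ov) = ref, other
    total = 0.0
    for t in grid:
        diff = interpolate(rt, rv, t) - interpolate(ot, ov, a * t + b)
        total += diff * diff
    return total / len(grid)


def align(ref, other, scales, shifts, n_grid=50):
    """Grid search for the linear time map t -> a * t + b that best matches
    `other` to `ref`. Returns (error, a, b)."""
    start, end = ref[0][0], ref[0][-1]
    grid = [start + (end - start) * k / (n_grid - 1) for k in range(n_grid)]
    best = None
    for a in scales:
        for b in shifts:
            err = alignment_error(ref, other, a, b, grid)
            if best is None or err < best[0]:
                best = (err, a, b)
    return best


if __name__ == "__main__":
    # Hypothetical data: one oscillation sampled at uneven times, with a
    # 60-minute cycle in one experiment and an 80-minute cycle in the other.
    rt = [0, 5, 12, 18, 25, 30, 37, 45, 50, 58, 65, 70, 78, 85, 90, 97, 105, 110, 116, 120]
    ot = [0, 6, 14, 20, 27, 35, 40, 48, 55, 62, 70, 77, 85, 92, 100, 108, 115,
          122, 130, 138, 145, 152, 160]
    ref = (rt, [math.sin(2 * math.pi * t / 60) for t in rt])
    other = (ot, [math.sin(2 * math.pi * t / 80) for t in ot])
    print(align(ref, other, scales=[1.0, 4 / 3, 1.5], shifts=[-10.0, 0.0, 10.0]))
```

On this toy data the search should select a = 4/3 (the ratio of cycle lengths, 80/60) and b = 0. A grid search keeps the sketch dependency-free; a fuller treatment would likely optimize the time map continuously and model measurement noise explicitly.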
We have also developed a new comprehensive methodology for inferring annotated molecular interaction networks, known as physical networks, from the available complementary data sources. In contrast to many competing statistical approaches, these models are readily interpretable and directly verifiable. To infer such rich molecular representations from diverse data sources, it is crucial to articulate clear causal mechanisms (e.g., molecular cascades) that mediate how cells respond to perturbations such as gene deletions. By providing explicit computational descriptions of such mechanisms, and by selecting the appropriate mechanisms in an automated manner, we are able to preserve the interpretability of our models even when they are estimated from diverse data sources.
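As an illustration of what an explicit mechanism for a deletion effect can look like, the sketch below searches for a molecular cascade that is consistent with an observed change. It assumes a directed interaction graph whose edges carry a sign (+1 activation, -1 repression) and treats a deletion as removing the upstream gene's influence, so the predicted change is the negated product of edge signs. The function `explaining_path` and the toy graph are hypothetical; this deterministic breadth-first search is only an illustration, not the group's inference procedure.

```python
from collections import deque


def explaining_path(edges, deleted, affected, observed_change, max_edges=4):
    """Find the shortest simple path from a deleted gene to an affected gene
    whose edge signs are consistent with the observed change.

    edges: dict mapping node -> list of (neighbor, sign), sign +1 or -1.
    observed_change: +1 if `affected` went up after the deletion, -1 if down.
    Returns the path as a list of nodes, or None if no path explains the change.
    """
    queue = deque([(deleted, 1, [deleted])])
    while queue:
        node, sign, path = queue.popleft()
        if node == affected and len(path) > 1:
            # Deleting an upstream gene removes its net effect, so the
            # predicted change is the negated product of edge signs.
            if -sign == observed_change:
                return path
            continue
        if len(path) - 1 >= max_edges:
            continue  # limit cascade length
        for neighbor, edge_sign in edges.get(node, []):
            if neighbor not in path:  # keep paths simple (no cycles)
                queue.append((neighbor, sign * edge_sign, path + [neighbor]))
    return None


if __name__ == "__main__":
    # Toy network: A activates B, B represses C; A activates D, D activates C.
    edges = {"A": [("B", 1), ("D", 1)], "B": [("C", -1)], "D": [("C", 1)]}
    print(explaining_path(edges, "A", "C", +1))  # C rises after deleting A
    print(explaining_path(edges, "A", "C", -1))  # C falls after deleting A
```

In the toy network both cascades connect A to C, so the observed direction of change selects the mechanism: a rise in C is explained by A -> B -> C and a fall by A -> D -> C.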
Beyond reconstructing molecular interaction networks from the limited data currently available, we have automated the design of new experiments, selecting those that would be most informative for resolving the remaining ambiguities. Moreover, we are formulating a theory of automated computational model verification in which, analogously to the verification of any scientific theory, only the validation of (competitively) unique predictions made by the model can provide additional evidence of its correctness. These represent steps toward automated computational discovery.
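One standard way to make "most informative" concrete is sketched below under assumptions not stated in the text: a finite set of candidate network models with prior probabilities, and, for each candidate experiment, a deterministic outcome predicted by each model. Because the outcome is then a function of the model, the expected information gained about the model equals the entropy of the predicted outcome, so the sketch picks the experiment that maximizes that entropy. The model names, experiments, and functions are hypothetical, and this is not claimed to be the group's selection criterion.

```python
import math
from collections import defaultdict


def outcome_entropy(model_probs, predictions):
    """Entropy (bits) of an experiment's predicted outcome.

    model_probs: dict model -> probability.
    predictions: dict model -> outcome that model predicts for this experiment.
    With deterministic predictions, this equals the expected information
    gained about which model is correct.
    """
    mass = defaultdict(float)
    for model, p in model_probs.items():
        mass[predictions[model]] += p
    return -sum(p * math.log2(p) for p in mass.values() if p > 0)


def most_informative(model_probs, experiments):
    """Pick the experiment whose outcome is most uncertain under current beliefs."""
    return max(experiments, key=lambda e: outcome_entropy(model_probs, experiments[e]))


def update(model_probs, predictions, observed):
    """Keep only models consistent with the observed outcome and renormalize."""
    kept = {m: p for m, p in model_probs.items() if predictions[m] == observed}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no candidate model predicts the observed outcome")
    return {m: p / total for m, p in kept.items()}


if __name__ == "__main__":
    probs = {"m1": 0.25, "m2": 0.25, "m3": 0.25, "m4": 0.25}
    experiments = {
        "delete_A": {"m1": "down", "m2": "down", "m3": "up", "m4": "up"},
        "delete_B": {"m1": "down", "m2": "none", "m3": "none", "m4": "none"},
        "delete_C": {"m1": "none", "m2": "none", "m3": "none", "m4": "none"},
    }
    best = most_informative(probs, experiments)
    print(best)
    print(update(probs, experiments[best], "down"))
```

Here `delete_A` splits the four equally likely models evenly (1 bit), `delete_B` yields about 0.81 bits, and `delete_C` cannot distinguish any of them, so `delete_A` is chosen; observing "down" leaves `m1` and `m2` at probability 0.5 each.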
- C-H. Yeang and T. Jaakkola. Modeling the combinatorial functions of multiple transcription factors. In The Ninth Annual International Conference on Research in Computational Molecular Biology, 2005.
- C-H. Yeang, H. Mak, S. McCuine, C. Workman, T. Jaakkola, and T. Ideker. Validation and refinement of gene-regulatory pathways on a network of physical interactions. Genome Biology, 6(7):R62, 2005.
- C-H. Yeang, T. Ideker, and T. Jaakkola. Physical network models. Journal of Computational Biology, 11(2-3):243--263, 2004.
- Z. Bar-Joseph, G. Gerber, I. Simon, D. Gifford, and T. Jaakkola. Comparing continuous representations of time series expression profiles to identify differentially expressed genes. Proceedings of the National Academy of Sciences, 100(18):10146--10151, 2003.
- Z. Bar-Joseph, G. Gerber, T. Lee, N. Rinaldi, J. Yoo, F. Robert, B. Gordon, E. Fraenkel, T. Jaakkola, R. Young, and D. Gifford. Computational discovery of gene modules and regulatory networks. Nature Biotechnology, 21(11):1337--1342, 2003.
- C-H. Yeang and T. Jaakkola. Physical network models and multi-source data integration. In The Seventh Annual International Conference on Research in Computational Molecular Biology, 2003.
- Z. Bar-Joseph, G. Gerber, D. Gifford, T. Jaakkola, and I. Simon. Continuous representations of time series gene expression data. Journal of Computational Biology, 10(3-4):341--356, 2003.
Last Updated: April 16, 2008