Manolis Kellis - The wisdom of crowds
A recent paper out of Manolis Kellis' group shows the wisdom of crowds when it comes to
selecting methods for analyzing gene regulatory networks.
August 10, 2012
Uncovering and modeling gene regulatory networks is one of the longstanding challenges in computational
biology. While many different methods exist for analyzing and reconstructing gene regulatory networks, it is often
difficult to decipher when these techniques will operate successfully, and which method is optimal for exploring
Each year, the Dialogue on Reverse Engineering Assessment and Methods (DREAM) project brings together
researchers from around the world to tackle different challenges in cellular network inference and quantitative
model building in systems biology. These challenges range from assessing computational models for predicting
breast cancer survival to predicting disease phenotypes from systems genetic data.
In 2010, participants were asked to reconstruct regulatory networks for microorganisms as part of a blind
assessment of more than 30 network inference methods on Escherichia coli, Staphylococcus aureus,
Saccharomyces cerevisiae, and in silico microarray data. Participants made predictions about the
different networks and then submitted their results, and information on the different inference methods they used,
to DREAM challenge organizers. Each submission was evaluated to see which techniques for gene network
analysis were the most successful.
Daniel Marbach, a postdoctoral fellow in EECS Associate Professor Manolis Kellis' research group at the MIT
Computer Science and Artificial Intelligence Lab (CSAIL), analyzed the results in a paper that appeared this month
on the cover of Nature Methods. The work was completed in collaboration with Gustavo Stolovitzky at IBM, Jim
Collins and James Costello at Boston University, and Robert Küffner at Ludwig-Maximilians University in Munich.
The results were surprising: rather than one method proving optimal across all datasets, different methods
were strongly favored for different networks, suggesting that no single method can be uniformly
recommended.
Moreover, by grouping methods according to the type of methodology used, the researchers found that similar
approaches led to similar performance patterns across different datasets.
"This project engaged many key players in the network inference community, and taught us a great deal about the
state of the art in the field," Kellis says. "It suggests that some general principles underlie the performance of
different prediction methods, and that they capture different aspects of the underlying networks."
The team then set out to combine these community predictions in order to construct a new predictor that
combines the strengths of individual methods. The results upheld a longstanding belief in the wisdom of the crowd,
showing that the optimal way to analyze datasets is frequently a middle ground that combines several different
"We tried to leverage the wisdom of crowds to construct a method that builds on the strength of complementary
approaches," Marbach says. "We realized that when you combine the predictions of all the teams you get even
more powerful prediction methods that consistently outperform individual approaches over a large range of
In the study, Marbach and his colleagues compared 35 individual methods for inferring gene regulatory networks,
29 of which were submitted through the DREAM project and six of which were common network inference techniques.
By combining different inference methods, they were able to construct high-confidence consensus networks for
Escherichia coli and Staphylococcus aureus and to test 53 novel interactions in E. coli, of which 43 percent were
supported, displaying the power of community-based methods for network inference.
"The novelty that we saw in this study is that you can get this improvement of accuracy when combining different
methods for network inference. While this has been observed in other fields, this is a new result for network
biology," Marbach says. "We found that this community approach performs consistently across very different settings. Therefore, for a new dataset, the best strategy for network inference may be to apply a set of diverse
methods and then combine the resulting predictions."
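The strategy Marbach describes, applying several methods and then combining their predictions, can be illustrated with a minimal sketch. The article does not say how the predictions were combined, so the averaging of ranks used here is an assumption, and the function name, gene names, and toy edge lists are hypothetical.

```python
from collections import defaultdict


def consensus_ranking(predictions):
    """Combine ranked edge lists from several methods by average rank.

    predictions: list of lists. Each inner list holds (regulator, target)
    edges ordered from most to least confident by one method.
    An edge that a method does not list is assigned that method's
    worst rank plus one (an assumption made for this sketch).
    """
    all_edges = set()
    for ranked in predictions:
        all_edges.update(ranked)

    total_rank = defaultdict(float)
    for ranked in predictions:
        position = {edge: i + 1 for i, edge in enumerate(ranked)}
        missing_rank = len(ranked) + 1
        for edge in all_edges:
            total_rank[edge] += position.get(edge, missing_rank)

    n_methods = len(predictions)
    average = {edge: total / n_methods for edge, total in total_rank.items()}
    # Lower average rank means higher consensus confidence.
    # Ties are broken by edge name so the output is deterministic.
    return sorted(average, key=lambda edge: (average[edge], edge))


# Toy predictions from three hypothetical inference methods.
method_a = [("geneA", "geneB"), ("geneA", "geneC"), ("geneD", "geneB")]
method_b = [("geneA", "geneC"), ("geneA", "geneB"), ("geneC", "geneD")]
method_c = [("geneA", "geneB"), ("geneC", "geneD"), ("geneA", "geneC")]

consensus = consensus_ranking([method_a, method_b, method_c])
for edge in consensus:
    print(edge)
```

In this toy example, the edge that two of the three methods rank first ends up at the top of the consensus list, while an edge proposed by only one method falls to the bottom.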
While most research groups work independently to solve complex challenges, the DREAM project engaged many
different groups of researchers to solve the same problem. Each group applied its methods to reconstruct the
regulatory networks for the same three microorganisms using the same datasets, and then submitted its results
and methods of analysis for evaluation.
"The DREAM project enabled rapid sharing of results, direct comparison of the methods, and led to many new
insights on the state of the art, that wouldn't have been possible with the traditional approach of waiting for each
publication," Kellis says. "The community really came together to participate in this study, and everything we
learned relied on the participation and energy of dozens of teams across the world coming together."
The DREAM meeting is organized each year by Dr. Gustavo Stolovitzky and colleagues in conjunction with the
RECOMB satellites on Regulatory Genomics and Systems Biology, co-organized by Prof. Manolis Kellis at MIT
and Dr. Andrea Califano at Columbia University. The joint meeting will be held Nov. 12-15, 2012, in San
Francisco. For more information, please visit: http://recomb-2012.c2b2.columbia.edu/.