CompGen: Using High-Performance Computing to Unlock the Mysteries of Genetics


When scientists first mapped the human genome in 2000, the promise of personalized medicine went from a remote dream to a breakthrough within grasp. Thirteen years later, however, momentum has slowed because the ability to sequence DNA has begun to outpace computing—in particular, the capability of storing, transmitting and, most critically, analyzing the data.

With a $2.6 million grant from the National Science Foundation, University of Illinois researchers from across campus are collaborating to build an instrument that will enable faster, more accurate DNA sequencing and the processing of massive data sets. The potential payoff: a better understanding of the basic processes of life, insight into how evolution works and custom treatments for disease, among other breakthroughs.

“We’re on the cusp of a second genomic revolution, but we need big data to make it happen,” said Gene Robinson, director of Illinois’ Institute for Genomic Biology. Robinson, together with IGB researchers Saurabh Sinha and Victor Jongeneel, is teaming up with researchers in the Coordinated Science Laboratory (CSL) to develop and build the instrument. The CSL team includes electrical and computer engineering professors Steven Lumetta and Ravi Iyer.

“The purpose of CompGen is to open the floodgates for genomic information and transform computing for genomics,” Robinson said.

The team building this unique instrument includes a consortium of 15 companies, universities, and research institutions. They’ll design the instrument’s hardware and software simultaneously, creating a single integrated platform to drive new breakthroughs in genomics. The consortium will also allow for new applied research projects that require the instrument. Already teams are collaborating on improved error correction, genome assembly, and variant calling.

“We’re seeing exponential growth in gene sequencing today. The amount of genome data available for exploration doubles every five months,” said CSL interim director Klara Nahrstedt. “A project like CompGen is perfect for Illinois, requiring an organization that has a deep understanding of both Big Data and of biology and genomics.”
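To put the five-month doubling time in perspective, the short calculation below converts it into a growth factor over any span of months. It is only a back-of-the-envelope sketch: the function name `growth_factor` is made up for illustration, and it assumes the doubling rate quoted above stays constant, which the article does not claim.

```python
def growth_factor(months, doubling_months=5.0):
    """Factor by which a quantity grows over `months` if it doubles
    every `doubling_months` months (assumed constant rate)."""
    return 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    # One year and five years at the quoted doubling rate.
    print(f"1 year:  x{growth_factor(12):.2f}")
    print(f"5 years: x{growth_factor(60):.0f}")
```

At that rate, the volume of data would grow a little more than fivefold in a year and roughly 4,000-fold in five years.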

Iyer agreed: “I believe that genomic data is indeed the most complex of all big data problems and has the potential to be transformational to computer science and engineering in all of its aspects.”

Currently, the world’s most powerful sequencers read an individual’s DNA by chopping the genome’s 3 billion nucleotides, which together encode a person’s genes, into very short strings that machines can process effectively. Researchers must then put the short strings back in the correct order, much like assembling a million-piece puzzle.
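The chop-and-reorder process described above can be illustrated with a toy example. The sketch below is not the CompGen team’s method or any production assembler: it is a minimal greedy overlap merge on a made-up 16-letter sequence, and the function names (`shred`, `overlap`, `greedy_assemble`) are hypothetical. It also assumes error-free reads and a sequence with no repeats, conditions that real genomes do not meet.

```python
def shred(genome, read_len, step):
    """Cut the sequence into overlapping reads of length `read_len`,
    starting a new read every `step` characters."""
    return [genome[i:i + read_len]
            for i in range(0, len(genome) - read_len + 1, step)]


def overlap(a, b, min_len):
    """Length of the longest suffix of `a` that is also a prefix of `b`,
    or 0 if no such overlap is at least `min_len` long."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap
    until no pair overlaps by at least `min_len` characters."""
    reads = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads


if __name__ == "__main__":
    genome = "ATGGCGTACCTTGAAC"          # made-up toy sequence
    reads = shred(genome, read_len=8, step=4)
    print(reads)
    print(greedy_assemble(sorted(reads)))  # reads given out of order
```

Even in this tiny case, every read is compared against every other read on each pass. That all-against-all comparison is one reason the workload grows so quickly as the number of reads climbs into the millions.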

With the CompGen instrument, the goal is to accelerate genomic science by using new computational technologies and techniques to take advantage of the growing availability of genomic data. It will do this by incorporating technologies, such as non-volatile memory and die-stacked memory, that are only beginning to make their way into commercial products.

Lumetta, principal investigator on the project, will lead the instrument’s design, which will combine custom, state-of-the-art technologies that enable the processing, information retrieval and storage of massive data sets. With CompGen’s scaling capabilities, scientists hope to compare large genome collections and explore complex issues such as the impact of climate change on gene expression and ecosystems, as well as social aspects of genomics.

“The machine will be built with genomic applications in mind, with the idea that eventually we’d like to see this technology migrate to a cloud environment,” Lumetta said. “With Illinois’ expertise in genomic research and high-performance computing, we believe that we can have an enormous impact in this field.”

For more information on the broader CompGen initiative, see:

Members of the CompGen project include:

University of Illinois at Urbana-Champaign
Abbott Nutrition
Agilent Technologies
Baylor College of Medicine Human Genome Sequencing Center
Beijing Genomics Institute
Mayo Clinic
Strand Life Sciences
Tata Institute of Fundamental Research
Tezzaron Semiconductor
Washington University’s Genome Institute