University of Illinois Collaborates with Mayo Clinic to Revolutionize Genomic Data Analysis


By Claudia Lutz, IGB

Today’s researchers, working with the advantages of new, sophisticated laboratory technology, have unleashed a river of valuable biomedical data—much more, in fact, than many of them have the tools to properly analyze, or the capacity to store. In 2012, the National Institutes of Health created the Big Data to Knowledge (BD2K) initiative to enable efforts to harness the potential of this flood of information. As part of the first wave of BD2K funding, the University of Illinois at Urbana-Champaign and Mayo Clinic have now received a $9.34M, 4-year award to create one of several new Centers of Excellence for Big Data Computing.

The NIH initiative encompasses a broad range of “big data” types, including collections of high-resolution research images or real-time recordings of complex biological phenomena. The Illinois-Mayo Center, to be located on the Urbana-Champaign campus, will focus on the analytical challenges posed by the rapidly growing body of genomic and transcriptomic data produced by genome-wide, high-throughput experimental technologies.

The Center’s research goal is to create a revolutionary analytical tool that allows any biomedical researcher to place a gene-based data set in the context of “community knowledge,” the entire body of previously published gene-related data. This broad context for individual data sets will offer new functional insights for the genes being studied. The proposed Knowledge Engine for Genomics, or KnowEnG, will be unique in its integration of many disparate sources of gene data to increase its analytical power, as well as in its planned scalability—the tool will be designed to accommodate the continued growth of genomic community knowledge, and the increasing computational infrastructure required to work with genomic data.

To create KnowEnG, the Center will combine the expertise of many units across the U of I campus, including the Carl R. Woese Institute for Genomic Biology (IGB), the Department of Computer Science, the Coordinated Science Laboratory, the College of Engineering, and the National Center for Supercomputing Applications (NCSA). As a leader in biomedical research and structured data collection, Mayo Clinic will play a vital role in the tool's design, testing, and refinement.

The Center will be led by computer scientist and IGB affiliate Jiawei Han, who will serve as Program Director. Other Principal Investigators are computer scientist and IGB member Saurabh Sinha; physicist, bioengineer and IGB member Jun Song; and Richard Weinshilboum, M.D., interim director of the Mayo Clinic Center for Individualized Medicine and director of the center’s Pharmacogenomics Translational Program. C. Victor Jongeneel, Director of Bioinformatics at IGB and NCSA and Director of the High-Performance Biological Computing Group, will serve as Executive Director.

The Center’s ability to work across disciplinary boundaries will be key to its success. Insights drawn from many areas of computer science will strengthen KnowEnG’s design.

“By integrating multiple analytical methods derived from the most advanced data mining and machine learning research, KnowEnG will transform the way biomedical researchers analyze their genome-wide data,” said Han. “The Center will leverage the latest computational techniques used to mine corporate or Internet data to enable the intuitive analysis and exploration of biomedical Big Data.”

The Center will also rely on communication between interface design experts at Illinois and biomedical researchers at Mayo Clinic, who represent KnowEnG’s intended users. Feedback between these two groups will ensure that the tool is valuable, intuitive, and customizable for use in a broad array of experimental contexts.

Describing his excitement for the project, co-PI Sinha explained, “This is [a project] that’s bigger than all of us . . . what I’m most excited about is the actual possibility that this could be a tool which everybody uses in the world.”

In addition to developing KnowEnG, the Center will create a training framework that empowers researchers to use the new tool and engage in bioinformatics research, regardless of their prior computational knowledge. The Center will also participate in a planned nationwide consortium, composed of all the BD2K Centers of Excellence established by the NIH initiative, to exchange insights, contribute to standards for tool development, and help set broad goals for the future of work on Big Data.