CompGen Team Develops Faster Algorithms to Analyze Genomic Associations


The genome-wide association study (GWAS) has great potential to facilitate the identification of genomic loci underlying phenotypic variability and disease susceptibility. Although GWAS have succeeded in elucidating the genomic sources of a wide range of agronomically important traits and human diseases, the statistical approaches employed in a typical GWAS assume a potentially overly simplistic model of the relationship between genes and traits.

2016 CompGen Fellow Angela Chen


While it is possible to employ a more complicated statistical model reflective of biological reality in a GWAS, such models introduce a substantial computational burden. Therefore, a team of CompGen researchers is developing a Java-based program that can perform a GWAS using these sophisticated statistical models in a reasonable amount of time.

Through the support of CompGen, the team is introducing multithreading into the code, which has already resulted in a dramatic decrease in computational time.
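To give a rough sense of why multithreading helps with this kind of analysis, here is a minimal Java sketch. It is not the team's code: the class `PairScanSketch`, its methods, and the toy score (the absolute correlation between the product of two loci's genotypes and the trait) are hypothetical stand-ins for a real statistical model. The point it illustrates is that the test for each pair of loci is independent of every other pair, so the work can be divided among threads.

```java
import java.util.stream.IntStream;

public class PairScanSketch {

    /** Toy score for one locus pair: |correlation| between a[k]*b[k] and the trait. */
    public static double pairScore(int[] a, int[] b, double[] trait) {
        int n = trait.length;
        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
        for (int k = 0; k < n; k++) {
            double x = a[k] * b[k];   // simple interaction term for individual k
            double y = trait[k];
            sx += x; sy += y; sxx += x * x; syy += y * y; sxy += x * y;
        }
        double cov = sxy - sx * sy / n;
        double vx = sxx - sx * sx / n;
        double vy = syy - sy * sy / n;
        // A constant interaction term or trait carries no signal; score it as 0.
        return (vx <= 0 || vy <= 0) ? 0.0 : Math.abs(cov / Math.sqrt(vx * vy));
    }

    /**
     * Scores every pair i < j. genotypes[i] holds locus i across all individuals.
     * Each row i writes only to scores[i], so rows can safely run on separate threads.
     */
    public static double[][] scanPairs(int[][] genotypes, double[] trait, boolean parallel) {
        int m = genotypes.length;
        double[][] scores = new double[m][m];
        IntStream rows = IntStream.range(0, m);
        if (parallel) {
            rows = rows.parallel();   // spread rows across the common thread pool
        }
        rows.forEach(i -> {
            for (int j = i + 1; j < m; j++) {
                scores[i][j] = pairScore(genotypes[i], genotypes[j], trait);
            }
        });
        return scores;
    }

    public static void main(String[] args) {
        // Made-up data: 3 loci, 4 individuals, genotypes coded 0/1/2.
        int[][] g = { {0, 1, 2, 1}, {1, 1, 0, 2}, {2, 0, 1, 1} };
        double[] y = {0.0, 1.0, 0.0, 2.0};
        double[][] s = scanPairs(g, y, true);
        System.out.println("score(0,1) = " + s[0][1]);
    }
}
```

The parallel and sequential versions produce the same scores; only the wall-clock time differs. Because the number of pairs grows roughly with the square of the number of loci, this kind of division of labor matters more as the data set grows.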

“There are billions of data points we’re looking at, and if we’re trying to use complex statistical models to find multiple pairs of interacting genomic loci that are associated with a trait on a typical laptop, it could take up to 1,000 years to process,” said Angela Chen, a 2016 CompGen Fellow and statistics graduate student. “We want to make this analysis computationally faster, so that it is more accessible for scientists to use it. We also want any researcher with potentially minimal computational training to be able to utilize our program and perform a GWAS using this statistical model, and then be able to easily interpret the results within the context of biology.”

Corn field on University of Illinois South Farms.

With faster software developed by the CompGen team, scientists can more easily piece together how genomic variations connect with various traits and diseases, including those in crops.

“Multithreading decreases software run time. For example, it can reduce the analysis time from potentially 10 years to as short as 10 hours, in some cases,” said Chen. “We will continue to refine our use of multithreading—and investigate other techniques—so that we can continue to decrease the computational time.”

The team has shown success in preliminary analyses of their program. With the faster software, scientists can more easily piece together how genomic variations connect with various traits and diseases.

Chen works with CompGen researchers Alexander Lipka, an assistant professor in the Crop Sciences Department, and Liudmila Mainzer, a senior research scientist at NCSA.