The genome-wide association study (GWAS) has great potential to facilitate the identification of genomic loci underlying phenotypic variability and disease susceptibility. Although GWAS have been successful in elucidating the genomic sources of a wide range of agronomically important traits and human diseases, the statistical approaches employed in a typical GWAS assume a potentially oversimplistic model of the relationship between genes and traits.
While it is possible to employ a more complicated statistical model reflective of biological reality in a GWAS, such models introduce a substantial computational burden. Therefore, a team of CompGen researchers is developing a Java-based program that can perform a GWAS using these sophisticated statistical models in a reasonable amount of time.
Through the support of CompGen, the team is introducing multithreading into the code, which has already resulted in a dramatic decrease in computational time.
“There are billions of data points we’re looking at, and if we’re trying to use complex statistical models to find multiple pairs of interacting genomic loci that are associated with a trait on a typical laptop, it could take up to 1,000 years to process,” said Angela Chen, a 2016 CompGen Fellow and statistics graduate student. “We want to make this analysis computationally faster, so that it is more accessible for scientists to use it. We also want any researcher with potentially minimal computational training to be able to utilize our program and perform a GWAS using this statistical model, and then be able to easily interpret the results within the context of biology.”
“Multithreading decreases software run time. For example, it can reduce the analysis time from potentially 10 years to as short as 10 hours, in some cases,” said Chen. “We will continue to refine our use of multithreading—and investigate other techniques—so that we can continue to decrease the computational time.”
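To make the idea concrete, here is a minimal Java sketch of how a scan over pairs of loci can be spread across threads. It is not the team's code or statistical model: the class name `PairwiseScanSketch`, its methods, and the scoring rule (the absolute correlation between the product of two genotype codes and the trait) are all assumptions chosen only for illustration. The point it shows is that with n loci there are n(n-1)/2 pairs to test, each pair can be scored independently, and so the work divides naturally among threads.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical illustration only; not the CompGen team's program or model.
class PairwiseScanSketch {

    /** Absolute Pearson correlation between x and y; returns 0 if either is constant. */
    static double absCorrelation(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int k = 0; k < n; k++) { meanX += x[k]; meanY += y[k]; }
        meanX /= n;
        meanY /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int k = 0; k < n; k++) {
            double dx = x[k] - meanX, dy = y[k] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        if (sxx == 0 || syy == 0) return 0.0;
        return Math.abs(sxy / Math.sqrt(sxx * syy));
    }

    /** Stand-in interaction score for loci i and j (an assumption, not a real GWAS test). */
    static double pairScore(int[][] genotypes, double[] trait, int i, int j) {
        double[] product = new double[trait.length];
        for (int k = 0; k < trait.length; k++) {
            product[k] = genotypes[i][k] * genotypes[j][k];
        }
        return absCorrelation(product, trait);
    }

    /** Returns {i, j} (i < j) for the best-scoring pair; rows are scanned in parallel if requested. */
    static int[] bestPair(int[][] genotypes, double[] trait, boolean parallel) {
        IntStream rows = IntStream.range(0, genotypes.length - 1);
        if (parallel) rows = rows.parallel(); // each first locus i is handled by some worker thread
        double[][] bestPerRow = rows.mapToObj(i -> {
            double bestScore = -1;
            int bestJ = -1;
            for (int j = i + 1; j < genotypes.length; j++) {
                double s = pairScore(genotypes, trait, i, j);
                if (s > bestScore) { bestScore = s; bestJ = j; }
            }
            return new double[] {i, bestJ, bestScore};
        }).toArray(double[][]::new); // toArray keeps row order, so the final pass is deterministic
        double[] best = bestPerRow[0];
        for (double[] candidate : bestPerRow) {
            if (candidate[2] > best[2]) best = candidate;
        }
        return new int[] {(int) best[0], (int) best[1]};
    }

    public static void main(String[] args) {
        // Toy data: 4 loci (rows) by 8 samples (columns), genotype codes 0/1/2.
        int[][] genotypes = {
            {0, 1, 2, 0, 1, 2, 0, 1},
            {0, 0, 1, 1, 2, 2, 1, 0},
            {2, 1, 0, 2, 1, 0, 1, 1},
            {1, 2, 0, 1, 2, 0, 2, 1}
        };
        // Trait constructed as the product of loci 1 and 3, so that pair should score highest.
        double[] trait = {0, 0, 0, 1, 4, 0, 2, 0};
        System.out.println(Arrays.toString(bestPair(genotypes, trait, true)));
    }
}
```

Because no pair's score depends on any other, the parallel and sequential scans return the same pair; only the wall-clock time differs. How much faster a real analysis runs depends on the number of cores and on the cost of the statistical model fitted for each pair, which this toy score does not represent.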
The team has shown success in preliminary analyses of their program. With the faster software, scientists can more easily piece together how genomic variations connect with various traits and diseases.