CompGen Team Builds Ancestral Trees to Determine Disease-Causing Genetic Variants

Many of our most widespread diseases, such as diabetes, cancer, cardiovascular disease, and mental illness, are associated with variants in our genes. How do these variants in our genomes carry across generations, and how do they ultimately affect our health? University of Illinois researchers are trying to unlock the mystery.

Derek Wildman, a CompGen researcher and professor of molecular and integrative physiology

Parsing out ancestry-related genomic variations requires some data crunching. To put it in perspective, each human genome is packaged into 46 chromosomes (23 pairs), and a single chromosome can carry millions of variants across a population: the team found nearly 6.5 million on chromosome 1 alone. Variants can be passed down from generation to generation, creating a map of ancestral genomic history. Each of those variants may play a unique role in our health.

Using novel algorithms, researchers from CompGen, a collaborative computational genomics initiative between the Coordinated Science Laboratory and the Carl R. Woese Institute for Genomic Biology (IGB), are employing the supercomputing power of NCSA’s Blue Waters to scan 2,500 genomes to determine how variants transfer through ancestral ties.

Armed with this information, they can start to understand how our ancestry makes us either susceptible or resilient to diseases.

“How our genomic variants are partitioned across geographic and ethnic diversity is really important, both for understanding human evolutionary history and patterns of migration and globalization, but also very important for understanding health and disease, which is our major focus,” said Derek Wildman, a CompGen researcher and professor of molecular and integrative physiology.

Wildman, who is working with Don Armstrong, a research scientist at IGB, and Monica Uddin, an associate professor of psychology at Illinois, says past research examining ancestry and disease has relied upon self-reported ethnicity, a limiting factor.

This ancestral tree depicts the clustering of nearly 6.5 million chromosome 1 variants, gathered from analyzing 2,504 genomes.

“A lot of disease research has based ethnic categorizations along self-reported ethnicity, but genetic variation is more subtle and complex than socially constructed categories such as race,” said Wildman. “We’re all mixed to varying degrees with different histories, and that complexity, which we can determine using Blue Waters, likely plays a role in our health and disease.”

When looking at what causes disease, it is also important to distinguish between genetic and external factors affecting particular ethnic groups. Wildman, for example, has found that African Americans are at greater risk for pre-term birth than white Americans. The reason for this is something the team is still investigating.

“We’re not sure whether that’s due to environment, psycho-social factors, a history of racism and segregation, or genetics,” said Wildman. “We haven’t been able to tease them apart, but it seems worthwhile to examine all those aspects. Having accurate ancestral trees in relation to genetic variants is a key component.”

To determine disease-causing genetic variants, the team needs to solve another problem that’s emerging in genetics: genomics is quickly becoming the discipline that generates the most data, surpassing other big data producers, like YouTube and Twitter, in scale. That’s where Blue Waters can help.

“There are more possible phylogenetic trees from the 2,500 genomes we’re analyzing than there are electrons in the universe,” said Wildman. “So looking at all of those would be prohibitive, but there are approaches on Blue Waters that allow you to simulate phylogenetic trees and get an idea of what the correct ones are.”
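To get a feel for the scale Wildman describes, here is a minimal sketch, not the team's method. It assumes the standard textbook count of rooted, bifurcating, leaf-labeled trees, (2n − 3)!! for n genomes, and it assumes the commonly cited rough figure of about 10^80 particles in the observable universe as the point of comparison. The function names are hypothetical, chosen only for illustration.

```python
import math


def rooted_tree_count(n_taxa: int) -> int:
    """Exact number of rooted bifurcating labeled trees: (2n - 3)!!.

    Practical only for small n, because the value grows super-exponentially.
    """
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3.
    for k in range(3, 2 * n_taxa - 2, 2):
        count *= k
    return count


def log10_rooted_tree_count(n_taxa: int) -> float:
    """log10 of (2n - 3)!!, computed with log-gamma to avoid huge integers.

    Uses the identity (2m - 1)!! = (2m)! / (2^m * m!) with m = n - 1.
    """
    if n_taxa < 2:
        return 0.0
    m = n_taxa - 1
    ln_count = math.lgamma(2 * m + 1) - m * math.log(2) - math.lgamma(m + 1)
    return ln_count / math.log(10)


# Assumption: ~10^80 is a rough, commonly quoted estimate, not a figure from the article.
LOG10_PARTICLES_IN_UNIVERSE = 80

for n in (4, 10, 50, 2500):
    exponent = log10_rooted_tree_count(n)
    verdict = "more" if exponent > LOG10_PARTICLES_IN_UNIVERSE else "fewer"
    print(f"{n} genomes: about 10^{exponent:.0f} trees ({verdict} than 10^80)")
```

Under these assumptions the count passes 10^80 somewhere between 50 and 60 genomes, and for 2,500 genomes the exponent itself runs into the thousands. That is why exhaustively scoring every tree is off the table and search or sampling methods are used instead.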

Don Armstrong

Armstrong is using algorithmic approaches called maximum likelihood and Bayesian inference to comprehensively sort through and efficiently select the likely ancestral roots of each human genome.

Once the researchers have the ancestral trees, they can map out how our ancestry affects genomic variations, and which variations are markers for disease.

“We’re working to make better maps of ancestral and genomic history and to see the genetic landscape more accurately,” said Wildman. “Ultimately, knowing what diseases you may be susceptible to, based on your genetics, means you can take action and make better informed decisions about your health.”