Illinois CompGen and Harvard T. H. Chan School of Public Health join forces for a detailed review of Isaac, Illumina’s variant calling workflow


Personalized medicine is being adopted at an increasing pace. In the near future, genomics facilities across the country will likely need to sequence hundreds of individual genomes per day.

Given this volume of data, current genomic analysis practices pose serious computational challenges. As a result, efforts in both academia and the private sector have focused on developing alternative workflows that could substantially reduce the computational cost per genome.

Illinois CompGen and Harvard review Illumina's Isaac workflow

One particularly prevalent kind of genomic analysis, called variant calling, aims to find genetic differences (variants) between an individual's genome and a reference genome. These variants may be linked to disease predisposition and other phenotypic traits, which makes them useful in applications ranging from targeted drug therapies to selective crop breeding.

Isaac is an “ultra-fast” variant calling workflow designed by Illumina, Inc. Illumina claims it is six times faster than BWA-GATK, a trusted open-source alternative, with comparable sensitivity and specificity.

Researchers in the Computational Genomics Initiative (CompGen) at Illinois teamed up with the Harvard Chan School to perform an independent review of Isaac, focusing on how accurately it calls variants.

“Isaac is indeed quite fast, providing variant calls in just a few hours on a single, but powerful server,” said Liudmila Sergeevna Mainzer, a research assistant professor for CompGen.

The accuracy results suggest that Isaac improved substantially across several versions released over the past year. Call accuracy is especially high on NA12878, the standard benchmarking genome. However, accuracy drops on other datasets, and exome data tend to yield a high fraction of false positive calls.
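To make terms like "sensitivity" and "fraction of false positive calls" concrete, here is a minimal Python sketch of how a call set can be scored against a trusted truth set, such as the high-confidence calls available for NA12878. It assumes exact matching on chromosome, position, and alleles; the `Variant` type, the `compare_calls` helper, and the toy variants are all hypothetical and are not the report's data or method. Because true negatives (every non-variant position in the genome) vastly outnumber everything else, benchmarks often report the false positive fraction (1 minus precision) rather than specificity.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical simplified variant key: chromosome, position, ref and alt alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def compare_calls(calls: Set[Variant], truth: Set[Variant]) -> dict:
    """Score a call set against a truth set using exact variant matching (a simplification)."""
    tp = len(calls & truth)  # called and present in the truth set
    fp = len(calls - truth)  # called but absent from the truth set
    fn = len(truth - calls)  # in the truth set but missed by the caller
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        # Sensitivity (recall): fraction of true variants that were called.
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        # Fraction of all calls that are false positives (1 - precision).
        "false_positive_fraction": fp / (tp + fp) if tp + fp else 0.0,
    }


# Hypothetical toy data for illustration only, not results from the report.
truth = {
    Variant("chr1", 1000, "A", "G"),
    Variant("chr1", 2000, "C", "T"),
    Variant("chr2", 500, "G", "A"),
    Variant("chr2", 900, "T", "C"),
}
calls = {
    Variant("chr1", 1000, "A", "G"),
    Variant("chr1", 2000, "C", "T"),
    Variant("chr2", 500, "G", "A"),
    Variant("chr3", 42, "A", "C"),
}

print(compare_calls(calls, truth))
```

In this toy case the caller finds 3 of 4 true variants (sensitivity 0.75), and 1 of its 4 calls is a false positive (false positive fraction 0.25). Real comparison tools also normalize how variants are represented before matching, which exact set comparison does not do.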

The researchers published their findings in a report that covers accuracy measurements across many datasets, performance metrics, selected command-line parameters, and documentation.