2014 Genome Informatics: Challenges of Migrating to GRCh38

GRCh38 was released in December 2013 and represents over four years of curation to correct errors and add missing sequence. There are many reasons to believe GRCh38 is a substantial improvement over GRCh37, but migrating to this assembly poses challenges. While the assembly model is the same as for GRCh37, there are now 178 regions with alternate loci, as opposed to 3 in GRCh37. Modern analysis tools have difficulty with these sequences because they cannot distinguish paralogous duplication from the allelic duplication introduced by the alternate loci. Additionally, while de novo gene annotation exists for GRCh38, most large-scale variant calling projects have not used this assembly for analysis. The development of a gold-standard set of variants for GRCh38 will be critical for the widespread adoption of this assembly.
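
To make the alternate-locus problem concrete, the following is a minimal Python sketch of how a tool might separate alternate-locus sequences from the primary assembly before deciding whether a multi-mapped hit is paralogous or allelic. It assumes UCSC-style sequence names for GRCh38 (for example `chr6_GL000250v2_alt`); the function names `classify_sequence`, `parent_chromosome`, and `may_be_allelic` are hypothetical and are not part of any existing tool. The name-based check is only a rough heuristic: a real implementation would need the alternate-locus placement coordinates published with the assembly.

```python
def classify_sequence(name: str) -> str:
    """Classify a GRCh38 sequence by its UCSC-style name (assumed convention)."""
    if name.endswith("_alt"):
        return "alt_locus"      # alternate locus scaffold
    if name.endswith("_fix"):
        return "fix_patch"      # fix patch scaffold
    if name.endswith("_random"):
        return "unlocalized"    # chromosome known, position unknown
    if name.startswith("chrUn_"):
        return "unplaced"       # chromosome unknown
    return "primary"


def parent_chromosome(name: str) -> str:
    """Return the chromosome prefix, e.g. 'chr6' for 'chr6_GL000250v2_alt'."""
    return name.split("_", 1)[0]


def may_be_allelic(seq_a: str, seq_b: str) -> bool:
    """Heuristic (an assumption, not a definitive test): two hits may be
    allelic rather than paralogous when they share a parent chromosome and
    at least one of them lies on an alternate locus. Overlap with the
    alternate locus placement is NOT checked here."""
    same_parent = parent_chromosome(seq_a) == parent_chromosome(seq_b)
    has_alt = "alt_locus" in (classify_sequence(seq_a), classify_sequence(seq_b))
    return same_parent and has_alt
```

For example, hits on `chr6` and `chr6_GL000250v2_alt` would be flagged as possibly allelic, while hits on `chr6` and `chr5` would not. A tool without this kind of awareness would treat both cases as ordinary multi-mapping.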