HISAT 2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (whole-genome, transcriptome, and exome sequencing data) against the general human population (as well as against a single reference genome). Based on generalized BWT [1], we designed and implemented a generalized FM index (GFM), an original approach and its first implementation to the best of our knowledge. In addition to using one global GFM index that represents general population, HISAT 2 uses a large set of small GFM indexes that collectively cover the whole genome (each index representing a genomic region of 56 Kbp, with 55,000 indexes needed to cover human population). These small indexes (called local indexes) combined with several alignment strategies enable effective alignment of sequencing reads. This new indexing scheme is called Hierarchical Generalized FM index (HGFM). We have developed HISAT 2 based on the HISAT [2] and Bowtie 2 [3] implementations.
A few notes:
-
HISAT 2's index (HGFM) size for the human reference genome and 12.3 million common SNPs is 6.2GB. The SNPs consist of 11 million single nucleotide polymorphisms, 728,000 deletions, and 555,000 insertions. Insertions and deletions used in this index are small (usually <20bp). We plan to incorporate structural variations (SV) into this index.
-
The memory footprint of HISAT 2 is relatively low, 6.7GB.
-
The runtime of HISAT 2 is estimated to be slightly slower than HISAT (30–100% slower for some data sets).
-
HISAT 2 provides greater accuracy for alignment of reads containing SNPs.
-
We plan to release an alpha version of HISAT 2 in August this year.
References:
[1] Sirén J, Välimäki N, Mäkinen V (2014) Indexing graphs for path queries with applications in genome research. IEEE/ACM Transactions on Computational Biology and Bioinformatics 11: 375–388. doi: 10.1109/tcbb.2013.2297101
[2] Kim D, Langmead B, and Salzberg SL HISAT: a fast spliced aligner with low memory requirements, Nature methods, 2015
[3] Langmead B, Salzberg SL: Fast gapped-read alignment with Bowtie 2. Nat Methods 2012, 9:357-359