mapquik
is an ultra-fast read mapper based on
The underlying seed constructs (
The mapping performance of mapquik
degrades markedly when identity between reads and the reference is lower than mapquik
is not suitable for mapping PacBio CLR reads, and potentially also Oxford Nanopore reads until base-calling consistently reaches identity levels above
Prerequisites: a working Rust environment.
Clone the repository, and run
rustup install nightly
cargo +nightly build --release
The nightly version of cargo
is required because mapquik
uses experimental language features (such as SIMD and intrinsics).
target/release/mapquik <reads.fq> --reference <reference.fa>
mapquik takes a single FASTA/FASTQ file (gzip-compressed or not) as input. Multi-line sequences are not supported.
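Because multi-line sequences are not supported, a FASTA file whose sequences are wrapped over several lines has to be flattened before mapping. The following is a minimal sketch of one way to do that in Python; it is not part of mapquik, the function names (`open_text`, `read_fasta`, `linearize_fasta`) are hypothetical, and it handles FASTA only (multi-line FASTQ would need a different parser).

```python
import gzip
import io
from typing import Iterator, TextIO, Tuple


def open_text(path: str) -> TextIO:
    """Open a plain or gzip-compressed file in text mode (chosen by the .gz suffix)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def linearize_fasta(src: TextIO, dst: TextIO) -> int:
    """Write each record as one header line plus one sequence line; return the record count."""
    count = 0
    for header, seq in read_fasta(src):
        dst.write(f">{header}\n{seq}\n")
        count += 1
    return count


if __name__ == "__main__":
    # In-memory demonstration; for real files, pass open_text(path) as src.
    wrapped = io.StringIO(">read1\nACGT\nTTGA\n>read2\nGGCC\n")
    flat = io.StringIO()
    linearize_fasta(wrapped, flat)
    print(flat.getvalue(), end="")
```

The flattened file can then be passed to mapquik in place of the original.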
The output of mapquik
is a regular PAF file.
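Since the output is a regular PAF file, it can be read with any PAF-aware tool or a few lines of code. The sketch below parses the 12 mandatory tab-separated PAF columns in Python. The names `PafRecord`, `parse_paf`, and `query_coverage` are hypothetical helpers for illustration, and the sketch makes no assumption about which optional tags, if any, mapquik appends after column 12.

```python
from typing import Iterable, Iterator, NamedTuple


class PafRecord(NamedTuple):
    """The 12 mandatory PAF columns, in order."""
    query_name: str
    query_len: int
    query_start: int
    query_end: int
    strand: str
    target_name: str
    target_len: int
    target_start: int
    target_end: int
    n_match: int
    block_len: int
    mapq: int


def parse_paf(lines: Iterable[str]) -> Iterator[PafRecord]:
    """Yield one PafRecord per non-empty line; optional tags after column 12 are ignored."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        f = line.split("\t")
        if len(f) < 12:
            raise ValueError(f"expected at least 12 PAF columns, got {len(f)}")
        yield PafRecord(
            f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
            f[5], int(f[6]), int(f[7]), int(f[8]),
            int(f[9]), int(f[10]), int(f[11]),
        )


def query_coverage(rec: PafRecord) -> float:
    """Fraction of the read covered by the mapping interval."""
    return (rec.query_end - rec.query_start) / rec.query_len


if __name__ == "__main__":
    # Made-up example line, not actual mapquik output.
    example = "read1\t1000\t10\t990\t+\tchr1\t5000\t100\t1080\t950\t980\t60"
    for rec in parse_paf([example]):
        print(rec.query_name, rec.target_name, rec.strand, query_coverage(rec))
```

To process a real output file, pass an open file handle to `parse_paf` instead of the example list.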
An example reference genome, and a script to simulate reads using pbsim
are provided in the example/
folder. To run mapquik
on a small set of 100 reads, type:
cd example && bash run_ecoli.sh
which will run both mapquik
and minimap2
on 100 simulated reads, and return the output of paftools.js mapeval
on both PAF files.
To simulate a larger set of reads using pbsim and map them, type:
bash simulate_pbsim.sh && bash run_ecoli_full.sh
For further information on usage and parameters, run
target/release/mapquik -h
for a one-line summary of each flag, or run
target/release/mapquik --help
for a lengthy explanation of each flag.
All scripts used to generate the figures and tables in the paper can be found in the experiments/
folder. Specifically, the simulate_chm13.sh
and simulate_maize.sh
scripts can be used similarly to simulate reads.
In order to obtain and map DeepConsensus reads, first run
wget https://storage.googleapis.com/brain-genomics-public/research/deepconsensus/data/v0.3/assembly_analysis/fastqs/HG002_24kb_2SMRT_cells.dc.v0.3.q20.fastq.gz
gunzip -c HG002_24kb_2SMRT_cells.dc.v0.3.q20.fastq.gz | grep -v TOTAL > dc.hg002.fastq
and then map the reads to a reference genome reference.fa in your working directory with mapquik using
target/release/mapquik dc.hg002.fastq --reference reference.fa -p mapquik-dc
mapquik significantly accelerates the seeding and chaining steps for both the human and maize genomes with minimap2
, and on the maize genome, a minimap2
.
mapquik
indexing is minimap2
, which is of independent interest.
mapquik
is freely available under the MIT License.
- Barış Ekim, supervised by Bonnie Berger at the Computer Science and Artificial Intelligence Laboratory (CSAIL) at the Massachusetts Institute of Technology (MIT)
- Rayan Chikhi at the Department of Computational Biology at Institut Pasteur
@article{mapquik,
title={Efficient mapping of accurate long reads in minimizer space with mapquik},
author={Ekim, Bar{\i}{\c{s}} and Sahlin, Kristoffer and Medvedev, Paul and Berger, Bonnie and Chikhi, Rayan},
journal={Genome Research},
pages={gr--277679},
year={2023},
publisher={Cold Spring Harbor Laboratory}
}
Should you have any inquiries, please contact Barış Ekim at baris [at] mit [dot] edu, or Rayan Chikhi at rchikhi [at] pasteur [dot] fr.