Data resources

Taxonomy classifiers for use with q2-feature-classifier

Danger

Pre-trained classifiers that can be used with q2-feature-classifier currently present a security risk. If you use a pre-trained classifier such as the ones provided here, make sure you trust both the person who trained the classifier and the person who provided you with the qza file. This security risk will be addressed in a future version of q2-feature-classifier.

Note

Taxonomic classifiers perform best when they are trained for your specific sample preparation and sequencing parameters, including the primers used for amplification and the length of your sequence reads. Therefore, in general, you should follow the instructions in Training feature classifiers with q2-feature-classifier to train your own taxonomic classifiers (for example, from the marker gene reference databases below).

Naive Bayes classifiers trained on:

Silva 138 99% OTUs full-length sequences (MD5: fddefff8bfa2bbfa08b9cad36bcdf709)
Silva 138 99% OTUs from 515F/806R region of sequences (MD5: 28105eb0f1256bf38b9bb310c701dc4e)
Greengenes 13_8 99% OTUs full-length sequences (MD5: 03078d15b265f3d2d73ce97661e370b1)
Greengenes 13_8 99% OTUs from 515F/806R region of sequences (MD5: 682be39339ef36a622b363b8ee2ff88b)
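The MD5 values listed above can be compared against a downloaded file to confirm that the download is intact. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and any file path you pass in is your own local path rather than a name defined on this page. A matching checksum only shows that the file was not corrupted in transit; it does not address the trust consideration described in the warning above.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large classifier files need little memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, expected_md5):
    """Compare a file's MD5 digest with the value published next to its link."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, calling `matches_published_md5` with the path of a downloaded classifier and the MD5 string shown beside its link should return `True` if the download completed correctly.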

Please cite the following references if you use any of these pre-trained classifiers:

Bokulich, N.A., Robeson, M., Dillon, M.R. bokulich-lab/RESCRIPt. Zenodo. http://doi.org/10.5281/zenodo.3891931
Bokulich, N.A., Kaehler, B.D., Rideout, J.R. et al. Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2’s q2-feature-classifier plugin. Microbiome 6, 90 (2018). https://doi.org/10.1186/s40168-018-0470-z
See the SILVA website and the latest Greengenes publication for the latest citation information for these reference databases.

Note that these classifiers were trained using scikit-learn 0.23.1 and can therefore only be used with scikit-learn 0.23.1. If you observe errors related to scikit-learn version mismatches, ensure that you are using the pre-trained classifiers that were published with the release of QIIME 2 you are using.
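One way to check for a version mismatch before running classification is to compare the installed scikit-learn version with the version named above. This is a small sketch, assuming it is run inside the same environment as QIIME 2; the helper names are illustrative and are not part of QIIME 2 or scikit-learn.

```python
from importlib import metadata

# Version stated in the text above for these pre-trained classifiers.
REQUIRED_SKLEARN = "0.23.1"


def installed_version(package="scikit-learn"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def sklearn_matches(installed, required=REQUIRED_SKLEARN):
    """Return True only when the installed version equals the required one."""
    return installed is not None and installed.strip() == required


def describe_environment():
    """Build a one-line, human-readable summary of the check."""
    found = installed_version()
    status = "matches" if sklearn_matches(found) else "does not match"
    return f"scikit-learn {found} {status} required {REQUIRED_SKLEARN}"
```

Printing the result of `describe_environment()` gives a quick indication of whether the classifiers on this page are expected to load in the current environment.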

Marker gene reference databases

These marker gene reference databases are formatted for use with QIIME 1 and QIIME 2. If you’re using these databases with QIIME 2, you’ll need to import them into artifacts before using them.
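As a rough illustration of the import step, the sketch below builds `qiime tools import` commands for a reference FASTA file and its taxonomy file. The file names are placeholders, and the taxonomy format shown assumes a headerless, tab-separated taxonomy file; adjust both to match the database you downloaded. The commands are only constructed and printed here; running them requires an activated QIIME 2 environment.

```python
import shlex
import subprocess


def import_command(semantic_type, input_path, output_path, input_format=None):
    """Build the argument list for a `qiime tools import` call."""
    cmd = [
        "qiime", "tools", "import",
        "--type", semantic_type,
        "--input-path", input_path,
        "--output-path", output_path,
    ]
    if input_format is not None:
        cmd += ["--input-format", input_format]
    return cmd


# Placeholder file names; replace with the files from your chosen database.
SEQ_CMD = import_command(
    "FeatureData[Sequence]", "reference-seqs.fasta", "reference-seqs.qza"
)
TAX_CMD = import_command(
    "FeatureData[Taxonomy]",
    "reference-taxonomy.txt",
    "reference-taxonomy.qza",
    input_format="HeaderlessTSVTaxonomyFormat",  # assumes no header row
)


def run_import(cmd):
    """Run one import command; raises CalledProcessError if QIIME 2 reports failure."""
    subprocess.run(cmd, check=True)


# Show the commands without executing them.
for command in (SEQ_CMD, TAX_CMD):
    print(shlex.join(command))
```

Calling `run_import(SEQ_CMD)` and `run_import(TAX_CMD)` would then produce the two artifacts, provided the input files exist.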

Greengenes (16S rRNA)

Find more information about Greengenes in the DeSantis (2006) and McDonald (2012) papers.

License information can be found on the Greengenes website. Greengenes data are released under a Creative Commons Attribution-ShareAlike 3.0 License.

Silva (16S/18S rRNA)

QIIME-compatible SILVA releases (up to release 132), as well as the licensing information for commercial and non-commercial use, are available at https://www.arb-silva.de/download/archive/qiime.

We also provide pre-formatted SILVA reference sequence and taxonomy files here that were processed using RESCRIPt. See licensing information below if you use these files.

Silva 138 SSURef NR99 full-length sequences (MD5: de8886bb2c059b1e8752255d271f3010)
Silva 138 SSURef NR99 full-length taxonomy (MD5: f12d5b78bf4b1519721fe52803581c3d)
Silva 138 SSURef NR99 515F/806R region sequences (MD5: a914837bc3f8964b156a9653e2420d22)
Silva 138 SSURef NR99 515F/806R region taxonomy (MD5: e2c40ae4c60cbf75e24312bb24652f2c)

Please cite the following references if you use any of these pre-formatted files:

Bokulich, N.A., Robeson, M., Dillon, M.R. bokulich-lab/RESCRIPt. Zenodo. http://doi.org/10.5281/zenodo.3891931
See the SILVA website for the latest citation information for SILVA.

License Information:

The pre-formatted SILVA reference sequence and taxonomy files above are available under a Creative Commons Attribution 4.0 License (CC-BY 4.0). See the SILVA license for more information.

The files above were downloaded and processed from the SILVA 138 release data using the RESCRIPt plugin and q2-feature-classifier. Sequences were downloaded, reverse-transcribed, and filtered to remove sequences based on length and on the presence of ambiguous nucleotides and/or homopolymers. Taxonomy was parsed to generate even 7-level rank taxonomic labels, including species labels. Sequences and taxonomies were dereplicated using RESCRIPt. Sequences and taxonomies representing the 515F/806R region of the 16S SSU rRNA gene were extracted with q2-feature-classifier, followed by dereplication with RESCRIPt.
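To make the sequence-processing steps described above more concrete, the following is a simplified pure-Python sketch of reverse transcription, quality filtering, and dereplication. It is not RESCRIPt's implementation: the threshold values are illustrative assumptions rather than the settings used to build the files above, and the dereplication here only collapses identical sequences without reconciling their taxonomy.

```python
import re


def reverse_transcribe(seq):
    """Convert an RNA sequence to its DNA alphabet (U becomes T)."""
    return seq.upper().replace("U", "T")


def passes_filters(seq, min_length=900, max_ambiguous=5, max_homopolymer=8):
    """Apply length, ambiguity, and homopolymer filters to one DNA sequence.

    All three thresholds are illustrative assumptions.
    """
    if len(seq) < min_length:
        return False
    # Any character outside A, C, G, T counts as ambiguous.
    if sum(1 for base in seq if base not in "ACGT") > max_ambiguous:
        return False
    # Reject runs of one repeated base longer than max_homopolymer.
    if re.search(r"(.)\1{%d,}" % max_homopolymer, seq):
        return False
    return True


def dereplicate(records):
    """Keep the first (id, sequence) pair seen for each distinct sequence."""
    first_id = {}
    for seq_id, seq in records:
        first_id.setdefault(seq, seq_id)
    return [(seq_id, seq) for seq, seq_id in first_id.items()]


def process(records, **thresholds):
    """Reverse-transcribe, filter, then dereplicate (id, sequence) pairs."""
    dna = [(seq_id, reverse_transcribe(seq)) for seq_id, seq in records]
    kept = [(seq_id, seq) for seq_id, seq in dna if passes_filters(seq, **thresholds)]
    return dereplicate(kept)
```

Passing a list of `(id, sequence)` pairs to `process` returns the pairs that survive all filters, with exact duplicate sequences collapsed to their first occurrence.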

UNITE (fungal ITS)

All releases are available for download at https://unite.ut.ee/repository.php.

Find more information about UNITE at https://unite.ut.ee.

Microbiome bioinformatics benchmarking

Many microbiome bioinformatics benchmarking studies use mock communities (artificial communities constructed by pooling isolated microorganisms together in known abundances). For example, see Bokulich et al. (2013) and Caporaso et al. (2011). Public mock community data can be downloaded from mockrobiota, which is described in Bokulich et al. (2016).

Public microbiome data

Qiita provides access to many public microbiome datasets. If you’re looking for microbiome data for testing or for meta-analyses, Qiita is a good place to start.

SEPP reference databases

The following databases are intended for use with q2-fragment-insertion, and are constructed directly from the SEPP-Refs project.

Silva 128 SEPP reference database (MD5: 7879792a6f42c5325531de9866f5c4de)
Greengenes 13_8 SEPP reference database (MD5: 9ed215415b52c362e25cb0a8a46e1076)