This repository contains examples, demonstrations, and support scripts for building custom sourmash databases, using the new sourmash sketch fromfile command and related additions to sourmash.
See sourmash#1671 for the overall discussion about building databases.
See an example of building a private database.
Another example: building protein and DNA databases starting from genomes.
Building a DNA+protein database from the NCBI genome assembly & proteome files.
Building a DNA+protein database from an NCBI genome assembly file.
fasta-to-fromfile.py - build a fromfile CSV file from a list of FASTA files.
genbank-to-fromfile.py - build a fromfile CSV file from a list of FASTA files downloaded from Genbank.
kiln.py - support library for building fromfile CSVs.
mass-rename.py - a script to bulk-rename sourmash signatures.
mass-merge.py - a script to bulk-merge sourmash signatures by spreadsheet column attribute.
sigs-to-manifest.py - a script to extract and/or update sourmash manifests from many databases.
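
To make the idea of a fromfile CSV concrete, here is a minimal sketch of building one from a list of FASTA files. It is not the implementation used by fasta-to-fromfile.py or kiln.py. The column names (name, genome_filename, protein_filename) are the ones the sourmash documentation lists for sourmash sketch fromfile; everything else is an assumption made for illustration: the helper names build_fromfile_rows and write_fromfile_csv are hypothetical, the identifier is simply the file name up to the first dot, and files ending in .faa or .faa.gz are treated as protein.

```python
import csv
import io
from pathlib import Path

# Column names expected in a fromfile CSV, per the sourmash documentation.
FIELDNAMES = ["name", "genome_filename", "protein_filename"]


def build_fromfile_rows(fasta_paths, protein_suffixes=(".faa", ".faa.gz")):
    """Group FASTA files into one row per identifier (hypothetical helper).

    Assumption: the identifier is the file name up to the first '.', and
    protein files are recognized by their suffix. Real datasets may need
    a different naming rule.
    """
    rows = {}
    for path in map(Path, fasta_paths):
        ident = path.name.split(".")[0]
        row = rows.setdefault(
            ident,
            {"name": ident, "genome_filename": "", "protein_filename": ""},
        )
        is_protein = path.name.endswith(tuple(protein_suffixes))
        key = "protein_filename" if is_protein else "genome_filename"
        row[key] = str(path)
    # dicts preserve insertion order, so rows follow first appearance
    return list(rows.values())


def write_fromfile_csv(rows, fp):
    """Write the rows, with a header line, to an open text file object."""
    writer = csv.DictWriter(fp, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Small demonstration with made-up file names.
demo_rows = build_fromfile_rows(["genomeA.fna.gz", "genomeA.faa.gz", "genomeB.fna"])
demo_out = io.StringIO()
write_fromfile_csv(demo_rows, demo_out)
print(demo_out.getvalue())
```

A CSV written this way is the kind of input that sourmash sketch fromfile reads; consult the sourmash documentation for the command's options before running it on real data.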