GigaDB - Submission Guidelines

GigaDB is a repository supported by China National GeneBank that hosts data and tools associated with articles in GigaScience. As part of your manuscript submission, and in line with the Reporting Standards and FAIRsharing guidelines for data deposition and formatting for papers submitted to GigaScience, we will provide an associated GigaDB dataset to host the data and files required for transparency and reproducibility. GigaDB is an open-access database. As such, all data submitted to GigaDB must be fully consented for public release (for more information about our data policies, please see our Terms of use page).

The workflow diagram below details a standard submission process:

[Figure: Workflow diagram of manuscript and data submission process for GigaScience]

Workflow overview

This workflow diagram outlines the manuscript and data submission process for GigaScience. It covers the steps from initial manuscript submission to the eventual publication of the dataset.

Workflow Steps

Authors submit manuscript

Is it in scope for GigaScience?

Decision: If no, reject. If yes, continue.

Does manuscript include data?

Decision: If no, no further GigaDB involvement. If yes, continue.

Is data available to peer reviewers?

Decision: If no, provide authors with private FTP login, then authors upload all data files to GigaDB private FTP area and continue. If yes, continue.

Editors send manuscript and private FTP login details to reviewers

Does manuscript pass review?

Decision: If no, either reject or author makes revisions to manuscript and/or data in FTP server, and continue. If yes, continue.

Is all data available?

Decision: If no, gather all required data. If yes, continue.

Is all metadata available?

Decision: If no, gather all required metadata. If yes, continue.

Curator uploads metadata to GigaDB

Did authors confirm dataset page?

Decision: If no, authors liaise with curators to ensure dataset page is complete and correct, then again curators upload metadata to GigaDB and generate dataset page. If yes, publish dataset.

When contacted by curators to process the GigaDB dataset, you will be invited to:

Create a GigaDB user account
Upload your prepared data files if not already public (see checklists below)
Supply the appropriate metadata
Proofread and approve the GigaDB pre-publication dataset page

For all datasets, the following information will be required. Most of the details will be imported directly from the GigaScience manuscript submission; other details will be requested by the curators.

Item	Imported directly from manuscript	Description
Submitting author	yes	First Name, Last Name, Email, Institution/Company, ORCID.
Author list	yes	First Name, Last Name, ORCID
Dataset title	yes	Manuscript title prefixed with “Supporting data for”
Dataset description	yes	Manuscript abstract
Funding information	yes	Funding body, program, award ID and awardee
Dataset type	no	Selected from controlled vocabulary
Keywords	no	Please list up to 5 keywords, separated by semicolons. All keywords are converted to lowercase.
Additional information links	no	Any URLs to FTP servers or webpages associated with your dataset, as a semicolon-separated list
Thumbnail image	no	An appropriate image to represent the dataset. Title, Credit, Source and License (CC0 or public domain only) details will be required.
External accessions	no	If any data that you wish to publish in GigaDB has been submitted to an external resource such as EBI or NCBI, please provide the accession(s) as a semicolon-separated list in the format 'SRA:SRPXXXXXX; BioProject:PRJNAXXXXXX'
Protocols.io link	no	Where authors provide their methods via protocols.io, we can embed these in GigaDB datasets. Please provide the published widget URL or DOI.
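
The keyword and external accession formats described in the table above can be checked before sending them to the curators. The following is a minimal sketch only: the function names are hypothetical, it is not part of any GigaDB tooling, and it assumes that each accession is written as a prefix and an identifier separated by a colon, as in the example format shown in the table.

```python
def normalize_keywords(raw: str, limit: int = 5) -> list[str]:
    """Split a semicolon-separated keyword string, trim whitespace and lowercase.

    The limit of 5 follows the table above; raising an error when it is
    exceeded is an assumption of this sketch.
    """
    keywords = [k.strip().lower() for k in raw.split(";") if k.strip()]
    if len(keywords) > limit:
        raise ValueError(f"expected at most {limit} keywords, got {len(keywords)}")
    return keywords


def parse_accessions(raw: str) -> list[tuple[str, str]]:
    """Split 'PREFIX:ACCESSION; PREFIX:ACCESSION' into (prefix, accession) pairs."""
    pairs = []
    for item in raw.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        prefix, sep, accession = item.partition(":")
        if not sep or not prefix.strip() or not accession.strip():
            raise ValueError(f"expected 'PREFIX:ACCESSION', got {item!r}")
        pairs.append((prefix.strip(), accession.strip()))
    return pairs
```

For example, `parse_accessions("SRA:SRPXXXXXX; BioProject:PRJNAXXXXXX")` yields one pair per accession, which makes a missing colon or a stray quote easy to spot.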

For datasets that include biological sample-related data, we expect the sample metadata to be included in the GigaDB dataset. We understand that the level of sample metadata made available is often limited by sample collection restrictions, but authors should make every effort to provide sample metadata that is as comprehensive as possible.

Below is the list of attributes commonly associated with any biological sample. In addition to these we strongly encourage the inclusion of ALL appropriate attributes, and for specific types of data there are a number of standards that we encourage our users to adopt. Please see the Dataset Type specific checklists for recommendations.

Attribute	Requirement	Description
Sample name	mandatory	Use an alphanumeric string to uniquely identify each sample used in your study. You may use BioSample IDs if you have them.
Species tax ID	recommended	Please enter the NCBI Taxonomy ID for the species used in your study. NB this is mandatory for any sequenced samples.
Species name	mandatory	Please enter the binomial (Genus species) name for the species of this sample.
Description	mandatory	Human readable description of the sample. It should be unique within a dataset, i.e. no two samples are identical, so the description should reflect that.
Geographic location (country and/or sea, region)	recommended	The geographical origin of the sample as defined by the country or sea name followed by specific region name. Country or sea names should be chosen from the INSDC country list.
Geographic location (latitude and longitude)	recommended	The geographical origin of the sample as defined by latitude and longitude. The values should be reported in decimal degrees and on WGS84 system e.g. -69.576435, 91.883948
Broad-scale environmental context	recommended	Please add one or more ENVO terms to describe the broad environment in which sampling occurred e.g. cliff [ENVO:00000087]
Local environmental context	recommended	Please add one or more ENVO terms to describe the local environment in which sampling occurred as a semicolon separated list, e.g. digestive tract environment [ENVO:01001033]
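
Two of the sample attributes above lend themselves to a quick automated check: latitude and longitude in decimal degrees, and the uniqueness of sample names and descriptions within a dataset. The sketch below is illustrative only. The function names are hypothetical, and the range limits (latitude within ±90, longitude within ±180) are the usual bounds for decimal degrees rather than something stated in these guidelines.

```python
from collections import Counter


def parse_lat_lon(value: str) -> tuple[float, float]:
    """Parse 'latitude, longitude' in decimal degrees, e.g. '-69.576435, 91.883948'."""
    parts = [p.strip() for p in value.split(",")]
    if len(parts) != 2:
        raise ValueError(f"expected 'latitude, longitude', got {value!r}")
    lat, lon = float(parts[0]), float(parts[1])
    # Assumed bounds: standard decimal-degree ranges.
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return lat, lon


def duplicated_values(values: list[str]) -> list[str]:
    """Return the values that occur more than once, sorted alphabetically."""
    counts = Counter(values)
    return sorted(v for v, n in counts.items() if n > 1)
```

Running `duplicated_values` over the sample name column, and again over the description column, lists any entries that would need to be made unique before submission.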

For all datasets, we expect all data to be available from a stable, public, open-access source, and where appropriate we will link directly to external sources rather than duplicate data files.

However, if there is no established suitable repository for a particular file or data type, we will host it on our servers.

Where possible, all files should be machine readable without the need for proprietary software (e.g. No PDF, Excel or Word documents).

For all files we host, we expect the following details:

Item	Mandatory	Description
File name	yes	The exact name of the file including relative file path. Ideally it should be unique within the dataset. Filenames should only include the following characters: a-z, A-Z, 0-9, underscore (_), hyphen (-), plus (+) and full stop (.). Filenames should not include spaces; we recommend using the underscore (_) in place of spaces.
Description	yes	Short human readable description of the file and its contents
Data type	yes	The type of data in the file, selected from a controlled vocabulary
Format	yes	Most common formats are automatically assigned by file extension, but can be updated manually if required.
MD5 hash value	yes	These are calculated automatically on our server and added to the database on the submitter's behalf.
File-Sample association	no	If the file is derived from a particular sample (in GigaDB), an explicit link can be made between sample(s) and file(s) by adding the Sample ID to the file attributes.
Additional attributes	no	If files have metadata that should be included with them, it can be added as attributes; the most common example is licenses.
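
The file name rules above can be checked locally before upload. The following is a minimal sketch with hypothetical function names, not part of GigaDB. It assumes that a relative file path uses "/" as the separator and that each path component must satisfy the character rule on its own. MD5 values are calculated on the GigaDB server, so the checksum helper is only a way for submitters to compare their local copy against the value shown on the dataset page.

```python
import hashlib
import re
from typing import Iterable

# Allowed characters from the table above: a-z, A-Z, 0-9, _, -, + and .
ALLOWED_COMPONENT = re.compile(r"[A-Za-z0-9_+.\-]+")


def is_valid_file_name(relative_path: str) -> bool:
    """Check every '/'-separated component against the allowed character set."""
    return all(ALLOWED_COMPONENT.fullmatch(part) for part in relative_path.split("/"))


def suggest_file_name(relative_path: str) -> str:
    """Replace spaces with underscores, as recommended in the table above."""
    return relative_path.replace(" ", "_")


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Return the hexadecimal MD5 digest of a sequence of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 digest of a local file, reading it in 1 MiB chunks."""
    with open(path, "rb") as handle:
        return md5_of_chunks(iter(lambda: handle.read(chunk_size), b""))
```

Reading in chunks keeps memory use flat for large sequence or image files, which is the main reason for not loading the whole file at once.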

Due to the nature of scientific publications, the files that need to be provided are usually unique to the individual manuscript. However, there are some commonalities that we have attempted to capture in a set of minimal checklists for the most common dataset types that we receive. These lists are to be treated as a guide only and may change over time.

Please see the Dataset Type specific checklists for recommendations:
