GigaDB - Submission Guidelines

General Submission Guidelines

GigaDB is a China National GeneBank supported repository used to host data and tools associated with articles in GigaScience. As part of your manuscript submission, and in line with the Reporting Standards and FAIRsharing guidelines for data deposition and formatting for papers submitted to GigaScience, we will provide an associated GigaDB dataset to host the data and files required for transparency and reproducibility. GigaDB is an open-access database. As such, all data submitted to GigaDB must be fully consented for public release (for more information about our data policies, please see our Terms of use page).

Workflow

The workflow diagram below details a standard submission process:

Workflow diagram of manuscript and data submission process for GigaScience

Workflow overview

This workflow diagram outlines the manuscript and data submission process for GigaScience. It covers the steps from initial manuscript submission to the eventual publication of the dataset.

Workflow Steps

  1. Authors submit manuscript
  2. Is it in scope for GigaScience?
  3. Decision: If no, reject. If yes, continue.
  4. Does manuscript include data?
  5. Decision: If no, no further GigaDB involvement. If yes, continue.
  6. Is data available to peer reviewers?
  7. Decision: If no, provide the authors with a private FTP login; the authors then upload all data files to the GigaDB private FTP area, and the process continues. If yes, continue.
  8. Editors send manuscript and private FTP login details to reviewers
  9. Does manuscript pass review?
  10. Decision: If no, either reject, or the authors revise the manuscript and/or the data on the FTP server and the process continues. If yes, continue.
  11. Is all data available?
  12. Decision: If no, gather all required data. If yes, continue.
  13. Is all metadata available?
  14. Decision: If no, gather all required metadata. If yes, continue.
  15. Curator uploads metadata to GigaDB
  16. Did authors confirm dataset page?
  17. Decision: If no, authors liaise with curators to ensure the dataset page is complete and correct; curators then upload the metadata to GigaDB again and regenerate the dataset page. If yes, publish dataset.
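
The decision points in the steps above can be read as a chain of checks, each of which either ends the process or hands over to the next. The following is a minimal sketch of that reading, not GigaDB's actual tooling: the `Submission` class, its field names and the returned strings are all hypothetical, and the curator upload in step 15 is folded into the final confirmation check for brevity.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """Hypothetical yes/no answers to the questions in the workflow steps."""
    in_scope: bool
    includes_data: bool
    data_available_to_reviewers: bool
    passes_review: bool
    all_data_available: bool
    all_metadata_available: bool
    authors_confirmed_page: bool


def next_action(s: Submission) -> str:
    """Return the first outstanding action, checking the steps in order."""
    if not s.in_scope:
        return "reject"
    if not s.includes_data:
        return "no further GigaDB involvement"
    if not s.data_available_to_reviewers:
        return "provide private FTP login to authors"
    if not s.passes_review:
        return "reject or revise manuscript and/or data"
    if not s.all_data_available:
        return "gather all required data"
    if not s.all_metadata_available:
        return "gather all required metadata"
    if not s.authors_confirmed_page:
        return "authors liaise with curators on dataset page"
    return "publish dataset"
```

Calling `next_action` repeatedly as each flag is resolved walks a submission through the same order as the numbered steps.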

When curators contact you to process the GigaDB dataset, you will be invited to:

  • Create a GigaDB user account
  • Upload your prepared data files if not already public (see checklists below)
  • Supply the appropriate metadata
  • Proofread and approve the GigaDB pre-publication dataset page

Required metadata

For all datasets, the following information will be required. Most of the details will be imported directly from the GigaScience manuscript submission; other details will be requested by the curators.


Item | Imported directly from manuscript | Description
Submitting author | yes | First Name, Last Name, Email, Institution/Company, ORCID
Author list | yes | First Name, Last Name, ORCID
Dataset title | yes | Manuscript title prefixed with “Supporting data for”
Dataset description | yes | Manuscript abstract
Funding information | yes | Funding body, program, award ID and awardee
Dataset type | no | Selected from controlled vocabulary
Keywords | no | Please list up to 5 keywords, separated by semicolons. All keywords are converted to lowercase.
Additional information links | no | Any URLs to FTP servers or webpages associated with your dataset, as a semicolon-separated list
Thumbnail image | no | An appropriate image to represent the dataset. Title, Credit, Source and License (CC0 or public domain only) details will be required.
External accessions | no | If any data that you wish to publish in GigaDB have been submitted to an external resource such as EBI or NCBI, please provide the accession(s) as a semicolon-separated list in the format 'SRA:SRPXXXXXX; BioProject:PRJNAXXXXXX'
Protocols.io link | no | Where authors provide their methods via protocols.io, we can embed these in GigaDB datasets. Please provide the published widget URL or DOI.
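
Authors who want to check their keyword and accession strings before sending them to the curators could do so along the following lines. This is only an illustrative sketch of the rules stated in the table (semicolon separation, at most 5 keywords, lowercase conversion, 'PREFIX:ACCESSION' pairs); the function names are hypothetical and the curators' own processing may differ.

```python
def parse_keywords(raw: str, limit: int = 5) -> list[str]:
    """Split a semicolon-separated keyword string and lowercase each keyword."""
    keywords = [k.strip().lower() for k in raw.split(";") if k.strip()]
    if len(keywords) > limit:
        raise ValueError(f"expected at most {limit} keywords, got {len(keywords)}")
    return keywords


def parse_accessions(raw: str) -> list[tuple[str, str]]:
    """Split 'PREFIX:ACCESSION; PREFIX:ACCESSION' into (prefix, accession) pairs."""
    pairs = []
    for item in raw.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        prefix, sep, accession = item.partition(":")
        if not sep or not prefix.strip() or not accession.strip():
            raise ValueError(f"expected 'PREFIX:ACCESSION', got {item!r}")
        pairs.append((prefix.strip(), accession.strip()))
    return pairs
```

For example, `parse_accessions("SRA:SRPXXXXXX; BioProject:PRJNAXXXXXX")` yields one pair per external resource, with the placeholder accessions left untouched.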

For datasets that include biological sample-related data, we would expect the sample metadata to be included in the GigaDB dataset. We understand that the level of sample metadata made available is often limited by sample collection restrictions, but authors should make every effort to provide sample metadata that is as comprehensive as possible.

Below is the list of attributes commonly associated with any biological sample. In addition to these we strongly encourage the inclusion of ALL appropriate attributes, and for specific types of data there are a number of standards that we encourage our users to adopt. Please see the Dataset Type specific checklists for recommendations.


Attribute | Requirement | Description
Sample name (absolutely mandatory field) | recommended | Use an alphanumeric string to uniquely identify each sample used in your study. You may use BioSample IDs if you have them.
Species tax ID | recommended | Please enter the NCBI Taxonomy ID for the species used in your study. NB this is mandatory for any sequenced samples.
Species name (absolutely mandatory field) | recommended | Please enter the binomial (Genus species) name for the species of this sample.
Description (absolutely mandatory field) | recommended | Human-readable description of the sample. It should be unique within a dataset, i.e. no two samples are identical, so the description should reflect that.
Geographic location (country and/or sea, region) | recommended | The geographical origin of the sample as defined by the country or sea name followed by the specific region name. Country or sea names should be chosen from the INSDC country list.
Geographic location (latitude and longitude) | recommended | The geographical origin of the sample as defined by latitude and longitude. The values should be reported in decimal degrees and on the WGS84 system, e.g. -69.576435, 91.883948
Broad-scale environmental context | recommended | Please add one or more ENVO terms to describe the broad environment in which sampling occurred, e.g. cliff [ENVO:00000087]
Local environmental context | recommended | Please add one or more ENVO terms to describe the local environment in which sampling occurred as a semicolon-separated list, e.g. digestive tract environment [ENVO:01001033]
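
The coordinate and ENVO formats shown in the table lend themselves to a quick pre-submission check. The sketch below is an assumption-laden illustration rather than an official validator: the function names are hypothetical, the coordinate order is taken to be latitude then longitude (as in the example -69.576435, 91.883948), and an ENVO term is assumed to be a free-text label followed by an identifier of the form [ENVO:digits].

```python
import re

# Assumed shape of a term: "label [ENVO:digits]", as in the table's examples.
ENVO_TERM = re.compile(r"^(?P<label>.+?)\s*\[(?P<id>ENVO:\d+)\]$")


def parse_lat_lon(raw: str) -> tuple[float, float]:
    """Parse 'latitude, longitude' given in decimal degrees."""
    lat_text, sep, lon_text = raw.partition(",")
    if not sep:
        raise ValueError(f"expected 'latitude, longitude', got {raw!r}")
    lat, lon = float(lat_text), float(lon_text)
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return lat, lon


def parse_envo_terms(raw: str) -> list[tuple[str, str]]:
    """Split a semicolon-separated list of ENVO terms into (label, id) pairs."""
    terms = []
    for item in raw.split(";"):
        item = item.strip()
        if not item:
            continue
        match = ENVO_TERM.match(item)
        if match is None:
            raise ValueError(f"expected 'label [ENVO:digits]', got {item!r}")
        terms.append((match.group("label"), match.group("id")))
    return terms
```

The range check only confirms that the values are plausible decimal degrees; it cannot tell whether they were actually recorded on WGS84.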



For all datasets, we expect all data to be available from a stable, public, open-access source and, where appropriate, we will link directly to external sources rather than duplicate data files.

However, if there is no established suitable repository for a particular file or data type, we will host it on our servers.

Where possible, all files should be machine readable without the need for proprietary software (e.g. no PDF, Excel or Word documents).


For all files we host, we expect the following details:



Item | Mandatory | Description
File name | yes | The exact name of the file, including the relative file path. Ideally it should be unique within the dataset. Filenames should only include the following characters: a-z,A-Z,0-9,_,-,+,. Filenames should not include spaces; we recommend using the underscore (_) in place of spaces.
Description | yes | Short human-readable description of the file and its contents
Data type | yes | The type of data in the file, selected from a controlled vocabulary
Format | yes | Most common formats are automatically assigned by file extension, but can be updated manually if required.
MD5 hash value | yes | These are calculated automatically on our server and added to the database on the submitter's behalf.
File-Sample association | no | If the file is derived from a particular sample (in GigaDB), an explicit link can be made between sample(s) and file(s) by adding the Sample ID to the file attributes.
Additional attributes | no | If files have metadata that should be included with them, they can be added as attributes. The most common example is licenses.
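
Before uploading, it can be useful to check file names against the permitted characters and to record an MD5 value locally for comparison with the one calculated on the server. The following sketch uses only the Python standard library; the function names are hypothetical, and it assumes that "/" separates the components of the relative file path.

```python
import hashlib
import re

# Characters permitted in a file name according to the table above.
ALLOWED_NAME = re.compile(r"^[A-Za-z0-9_\-+.]+$")


def invalid_path_parts(relative_path: str) -> list[str]:
    """Return the path components that contain characters outside the allowed set.

    Empty components (from a leading, trailing or doubled "/") are also reported.
    """
    return [part for part in relative_path.split("/") if not ALLOWED_NAME.match(part)]


def md5_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

An empty list from `invalid_path_parts` means every component of the path uses only the allowed characters; reading in chunks keeps memory use flat for large sequence files.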


Due to the nature of scientific publications, the files that need to be provided are usually unique to the individual manuscript. However, there are some commonalities that we have attempted to capture in a set of minimal checklists for the most common dataset types that we receive. These lists are to be treated as a guide only, and they may change over time.

Please see the Dataset Type specific checklists for recommendations.

If you have any questions, please contact us at database@gigasciencejournal.com.