Publish your COVID-19 research data to make it available for the rest of the research community. The data should be deposited in a public repository together with descriptive metadata. For many biological datatypes, there are international databases that can be considered de facto standards.
Submitting data
SciLifeLab (datacentre@scilifelab.se) or NBIS (support@nbis.se) can provide personal consultations on where and how to share data in a public database. Do not hesitate to get in touch with us if you have any questions. Your research group does not have to be affiliated with any particular institution to get our help; we are available to help everyone affiliated with a university in Sweden.
The European Bioinformatics Institute (EBI) hosts many different international data repositories, which should be used where appropriate. For further information, see their COVID-19 Data Portal data submission page. For data types where no suitable international repository is available, your data can be deposited in the SciLifeLab Data Repository, which is run by the SciLifeLab Data Centre. For human data that needs to be stored in a safe environment with controlled access, SciLifeLab can help with publishing and access control.
Below are our data submission guidelines for each specific data type. Further information is available in the data submission information of the European COVID-19 Data Portal.
Genomics & transcriptomics data
We suggest that raw virus sequence data, as well as assembled and annotated genomes, be submitted to ENA. ENA has produced documentation to help with submission (see SARS-CoV-2 submission). To provide further support to users, the Swedish COVID-19 Data Portal team has also produced a detailed tutorial on submission to ENA.
Before raw sequence data (e.g. shotgun sequencing) is submitted, contaminating human reads must be removed. Host (human) sequence data requires restricted access, and NBIS is building a local federated version of the European Genome-phenome Archive (EGA) in Sweden (EGA-SE), which will allow sensitive personal data to be published within a legal framework. Until the local EGA is available, the dataset should remain in the secure analysis environment (e.g., at Bianca on Uppmax). SciLifeLab can help with publishing and access control. In any case, we recommend creating a metadata-only record in the SciLifeLab Data Repository with contact details on how to get access, for which a DOI (i.e., a persistent identifier) can be issued. The DOI can then be used in the article to refer to the dataset. Once the Swedish EGA is operational and the dataset has been deposited there, the access information can be changed to point to the EGA ID. See DOI: 10.17044/NBIS/G000014 for an example.
- The European Nucleotide Archive (ENA)
- ENA SARS-CoV-2 submission tutorial
- SciLifeLab Data Repository for metadata records of sequence data with restricted access
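The removal of contaminating human reads mentioned above can be sketched as follows. This is only an illustration under stated assumptions: it assumes you have already aligned the reads against a human reference with a tool of your choice and collected the IDs of the reads that mapped to it. The function names (`parse_fastq`, `read_id`, `drop_human_reads`) are hypothetical helpers, not part of any ENA or NBIS tooling, and paired-end data would need both mates of a flagged pair removed.

```python
from typing import Iterable, Iterator, Set, Tuple

# One FASTQ record: header, sequence, separator line, quality string.
FastqRecord = Tuple[str, str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records from an iterable of text lines."""
    buffer = []
    for line in lines:
        buffer.append(line.rstrip("\n"))
        if len(buffer) == 4:
            yield (buffer[0], buffer[1], buffer[2], buffer[3])
            buffer = []


def read_id(header: str) -> str:
    """Return the read ID: the header text after '@', up to the first whitespace."""
    parts = header[1:].split()
    return parts[0] if parts else ""


def drop_human_reads(
    records: Iterable[FastqRecord], human_ids: Set[str]
) -> Iterator[FastqRecord]:
    """Keep only records whose read ID is NOT in the set of human-mapped IDs.

    `human_ids` is assumed to come from a prior alignment step against a
    human reference; producing that set is outside the scope of this sketch.
    """
    for record in records:
        if read_id(record[0]) not in human_ids:
            yield record


# Tiny in-memory example (made-up reads, not real data).
example_lines = [
    "@read1 sample=A", "ACGT", "+", "IIII",
    "@read2 sample=A", "TTGA", "+", "IIII",
    "@read3 sample=A", "GGCC", "+", "IIII",
]
kept = list(drop_human_reads(parse_fastq(example_lines), {"read2"}))
print([read_id(record[0]) for record in kept])  # ['read1', 'read3']
```

In practice the filtered records would be written back to a FASTQ file inside the secure environment, and only that filtered file would be submitted.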
Protein data
For a curated list of relevant proteomics repositories, see FAIRsharing using the query ’proteomics’.
We recommend using the PRIDE repository provided by the ProteomeXchange Consortium. The repository accepts protein and peptide identification/quantification data with the accompanying mass spectra evidence and any other related data types. Submission is done using the PX Submission Tool.
Other types of proteomics data should also be made available; for these we recommend the SciLifeLab Data Repository. To make the data useful and ready for analysis and integration, provide a detailed description of the data format and of how the variables are organized. Each protein variable should come with a unique identifier such as a UniProt ID or ENSG ID, and the versions used to link the data should be stated.
- PRIDE repository and PX Submission Tool
- SciLifeLab Data Repository for other types of proteomics data
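The identifier requirement above can be checked before deposition. The following is a minimal sketch, assuming identifiers are plain strings in a column of your data table. The UniProt pattern follows the accession format published in the UniProt help pages; the Ensembl pattern assumes human gene IDs of the form `ENSG` plus 11 digits with an optional version suffix. The function names are hypothetical, and a format match does not prove that the identifier exists in the database.

```python
import re
from typing import Iterable, List

# UniProt accession format (6 or 10 characters), per the UniProt help pages.
UNIPROT_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)

# Assumed format for human Ensembl gene IDs: "ENSG" + 11 digits,
# optionally followed by ".<version>".
ENSEMBL_GENE_RE = re.compile(r"^ENSG[0-9]{11}(?:\.[0-9]+)?$")


def classify_identifier(value: str) -> str:
    """Return 'uniprot', 'ensembl_gene', or 'unknown' based on format only."""
    if UNIPROT_RE.match(value):
        return "uniprot"
    if ENSEMBL_GENE_RE.match(value):
        return "ensembl_gene"
    return "unknown"


def find_unrecognised(identifiers: Iterable[str]) -> List[str]:
    """List identifiers that match neither format, for manual review."""
    return [i for i in identifiers if classify_identifier(i) == "unknown"]


# Example values; the last one has transposed letters and is flagged.
column = ["P0DTC2", "ENSG00000130234", "ENSG00000130234.11", "ENGS00000130234"]
print(find_unrecognised(column))  # ['ENGS00000130234']
```

A check like this catches typos and mixed identifier schemes early; the database release or version used for the mapping still has to be recorded in the dataset description.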
Imaging data
Different public repositories are available depending on the type of image data you have; please see the table at BioImage Archive.
Biochemistry
We suggest that users submit data to ChEMBL, a manually curated database of bioactive molecules with drug-like properties run by EMBL-EBI. It brings together chemical, bioactivity and genomic data to aid the translation of genomic information into effective new drugs.
Health data
In cases where data cannot be deposited in a public repository due to privacy restrictions, we suggest creating a metadata-only record in the SciLifeLab Data Repository with information about what data is available upon request and how such a request can be made. The repository is managed locally by the SciLifeLab Data Centre and can issue a DOI, which can then be cited in the publication.
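To make the idea of a metadata-only record more concrete, here is a hedged sketch of what such a record might contain. The field names, the example values, and the `missing_fields` helper are all hypothetical illustrations; they are not the SciLifeLab Data Repository schema, and the actual form fields should be taken from the repository itself.

```python
import json
from typing import Dict, List

# Hypothetical minimum set of fields for a metadata-only record.
REQUIRED_FIELDS = (
    "title",
    "description",
    "data_availability",   # what data exists and can be requested
    "access_procedure",    # how a request is made and assessed
    "access_contact",      # whom to contact
)


def missing_fields(record: Dict[str, str]) -> List[str]:
    """Return the required fields that are absent or empty in the record."""
    return [
        field for field in REQUIRED_FIELDS
        if not str(record.get(field, "")).strip()
    ]


# Illustrative record with placeholder values (not a real dataset).
record = {
    "title": "Example cohort: clinical measurements (metadata only)",
    "description": "Describes variables and collection period; no data files attached.",
    "data_availability": "Pseudonymised individual-level data available upon request.",
    "access_procedure": "Send a project description; access requires a data sharing agreement.",
    "access_contact": "data-requests@example.org",
}

print(missing_fields(record))          # []
print(json.dumps(record, indent=2))    # human-readable draft of the record
```

Drafting the record this way makes it easy to confirm, before contacting the repository, that the request procedure and contact details are actually written down.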
General data repositories
Most life science data types can be published as raw or processed data in repositories at EMBL-EBI, as described under Data types. When no archive is suitable, use a general-purpose repository such as Figshare or Zenodo. This applies to documents, presentations, figures, protocols, or other information that you want to make public at any stage in the research process. A publication in such a repository is permanent and receives a Digital Object Identifier (DOI).
SciLifeLab supports the publication of COVID-19 related data in the SciLifeLab Data Repository.
If you have datasets related to COVID-19 research that are best suited for this type of repository, please contact the SciLifeLab Data Centre at datacentre@scilifelab.se or visit our repository support page.