This repository creates user-friendly (tidy) TSVs of data from Scopus and Journal Metrics and converts Scopus IDs to NLM journal IDs for PubMed integration. Data pulled from Scopus include journal subject areas and open access status. Data pulled from Journal Metrics include three measures of journal prestige (CiteScore, SJR, SNIP) and a Scopus–ISSN mapping.
Execution is performed by running notebooks in the following order:
1. process-titles.ipynb to process the raw Scopus title list
2. process-metrics.ipynb to process the raw Journal Metrics download
3. pubmed.ipynb to convert Scopus IDs to NLM journal IDs
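The notebook order above could be scripted rather than run by hand. The following is a minimal sketch, not part of this repository: it assumes the `jupyter nbconvert` command is available in the active environment and that the script is run from the directory containing the notebooks. The names `build_command` and `run_all` are hypothetical helpers.

```python
import subprocess

# Notebook order as listed above.
NOTEBOOKS = [
    "process-titles.ipynb",
    "process-metrics.ipynb",
    "pubmed.ipynb",
]


def build_command(notebook):
    """Return an nbconvert command that executes a notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        notebook,
    ]


def run_all(notebooks=NOTEBOOKS):
    """Execute each notebook in order, stopping at the first failure."""
    for notebook in notebooks:
        # check=True raises CalledProcessError if a notebook fails,
        # so later notebooks never run on stale or partial outputs.
        subprocess.run(build_command(notebook), check=True)
```

Calling `run_all()` would execute the three notebooks sequentially. Stopping at the first failure matters here because each later notebook presumably depends on outputs written by the earlier ones.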
The data directory contains the following tidy outputs:
- titles.tsv: IDs and names for titles in Scopus
- title-attributes.tsv: attributes for Scopus titles such as open access status, publisher, and active status (excludes conference proceedings)
- publishers.tsv: number of journals per publisher as well as URL-friendly slugs. Redundant or misspelled publisher names can be manually fixed in publisher-name-patches.tsv.
- title-top-levels.tsv: top-level subject categories for each Scopus title
- asjc-codes.tsv: ASJC (All Science Journal Classification) code definitions
- subject-areas.tsv: ASJC subject areas for each Scopus title
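Because the outputs are plain tab-separated files, they can be loaded with the standard library alone. The sketch below is illustrative and not part of this repository: `read_tsv` is a hypothetical helper, and it makes no assumption about column names, returning whatever headers each file declares.

```python
import csv
import gzip
from pathlib import Path


def read_tsv(path):
    """Read a tab-separated file into a list of dicts keyed by header name.

    Files ending in .gz are decompressed transparently.
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

For example, `read_tsv("data/titles.tsv")` would return one dictionary per Scopus title, with all values kept as strings. The same helper would work for gzip-compressed outputs.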
The data directory also contains the following tidy outputs:
- issn.tsv: a mapping between Scopus titles and ISSNs, including linking ISSNs
- pubmed-map.tsv: a Scopus–NLM journal mapping
- metrics.tsv.gz: metrics for Scopus journals
- pubmed-metrics.tsv.gz: metrics for PubMed journals
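One way to picture how a Scopus–NLM mapping turns Scopus metrics into PubMed metrics is as an inner join on the Scopus ID. The sketch below is an assumption about the general technique, not the notebook's actual implementation: the column names `scopus_id` and `nlm_id` and the function `map_metrics_to_nlm` are hypothetical, and the real files may use different headers.

```python
from collections import defaultdict


def map_metrics_to_nlm(metrics, scopus_to_nlm):
    """Attach NLM journal IDs to Scopus metric rows.

    metrics: list of dicts, each with a hypothetical "scopus_id" key.
    scopus_to_nlm: list of dicts with hypothetical "scopus_id" and
        "nlm_id" keys.

    A Scopus title that maps to several NLM journals yields one row per
    match; titles without a match are dropped (an inner join).
    """
    # Index the mapping so each metric row is resolved in constant time.
    nlm_ids = defaultdict(list)
    for row in scopus_to_nlm:
        nlm_ids[row["scopus_id"]].append(row["nlm_id"])

    joined = []
    for row in metrics:
        for nlm_id in nlm_ids.get(row["scopus_id"], []):
            joined.append({**row, "nlm_id": nlm_id})
    return joined
```

Whether unmatched journals should be dropped or kept with an empty NLM ID is a design choice; this sketch drops them, so the result covers only journals present in both sources.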
This repository is built from the following publicly available inputs in download:
- extlistJuly2021.xlsx: Scopus title list (from "Download Scopus Source List" at source)
- CiteScore 2011-2020 new methodology - May 2021.xlsb: Journal Metrics
- pubmed-journals.tsv: PubMed journal information (source via process-nlm-catalog.ipynb)
- 20210912.ISSN-to-ISSN-L.txt.gz: the "ISSN-L matching table", extracted and compressed from issnltables.zip, which is available upon request from ISSN
This repository uses conda to manage its environment as specified in environment.yml.
Install the environment with:
conda env create --file=environment.yml
Then use conda activate scopus and conda deactivate to activate or deactivate the environment.
All original work in this repository is dedicated to the public domain under CC0 1.0 Universal. Note that this repository incorporates publicly available datasets that were not explicitly released with a public license. The authors of this repository claim no ownership of this content.