An API client (usable as a command-line script or as a Python library) for exporting dataset metadata from CKAN sites to Excel-compatible CSV files.
To install, run:
pip install ckanapi-exporter
ckanapi-exporter --url 'https://demo.ckan.org' \
--column "Title" --pattern '^title$' > output.csv
This searches each dataset on demo.ckan.org for fields matching the regular expression `^title$` (the `--pattern` argument) and puts the values into a column called "Title" in the CSV file (the `--column` argument). It'll create an `output.csv` file something like this:
Title |
---|
Senior Salaries Information |
Demo Data for Open Data in 1 Day - Spending Over £500 |
UK Cat Burglaries |
... |
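As a rough illustration of what matching a field name against a pattern means, here is a minimal Python sketch. It is not the exporter's actual code: the `match_fields` helper and the sample `dataset` dictionary are hypothetical, and the sketch only assumes that a dataset can be treated as a dictionary of field names and values.

```python
import re


def match_fields(dataset, pattern):
    """Return the values of all top-level fields whose names match pattern."""
    regex = re.compile(pattern)
    return [value for key, value in dataset.items() if regex.search(key)]


# Hypothetical dataset, shaped like the metadata a CKAN site returns.
dataset = {
    "title": "UK Cat Burglaries",
    "license_title": "UK Open Government Licence (OGL)",
    "subtitle": "n/a",
}

# Anchored pattern: only the field named exactly "title" matches.
print(match_fields(dataset, "^title$"))

# Unanchored pattern: "title", "license_title" and "subtitle" all match.
print(match_fields(dataset, "title"))
```

The `^` and `$` anchors matter: without them the pattern would also pick up any other field whose name merely contains "title".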
You can add as many columns as you want: just add a `--column` and a `--pattern` argument for each column. The title of the column in the CSV file can be anything you want - it doesn't have to match the name of the field in CKAN. Let's add a second column titled "Rights" that contains the `license_title` fields from the datasets:
ckanapi-exporter --url 'https://demo.ckan.org' \
--column "Title" --pattern '^title$' \
--column "Rights" --pattern '^license_title$' > output.csv
Title | Rights |
---|---|
Senior Salaries Information | Creative Commons Attribution |
Demo Data for Open Data in 1 Day - Spending Over £500 | Creative Commons CCZero |
UK Cat Burglaries | UK Open Government Licence (OGL) |
... | ... |
ckanapi-exporter calls the `package_search` API action, and you can pass query parameters to it with the `--params` argument, whose value is a string formatted as a dictionary. Each key: value pair represents a query parameter passed to the API call.
For example, to export only datasets created within a date range, you can pass the `fq` (filter query) parameter and use `metadata_created` to filter the results:
ckanapi-exporter --url 'https://demo.ckan.org' \
--params "{'fq':'metadata_created:[2017-01-01T00:00:00Z TO 2017-01-31T23:59:59.999Z]'}" \
--column "Title" --pattern '^title$' \
--column "Rights" --pattern '^license_title$' > output.csv
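If you generate the date range programmatically, the dictionary-formatted string can be built in Python rather than typed by hand. The sketch below is only an illustration: `date_range_fq` is a hypothetical helper, and `ast.literal_eval` is used here only to check that the string is a valid dictionary literal, not because the exporter is known to parse it that way.

```python
import ast
from datetime import datetime, timezone


def date_range_fq(field, start, end):
    """Build a range filter such as 'field:[START TO END]' from two datetimes."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return f"{field}:[{start.strftime(fmt)} TO {end.strftime(fmt)}]"


params = {
    "fq": date_range_fq(
        "metadata_created",
        datetime(2017, 1, 1, tzinfo=timezone.utc),
        datetime(2017, 1, 31, 23, 59, 59, tzinfo=timezone.utc),
    )
}

# The string to pass as the value of --params.
params_arg = repr(params)
print(params_arg)

# Sanity check: the string reads back as the same dictionary.
assert ast.literal_eval(params_arg) == params
```

Building the timestamps from `datetime` objects avoids hand-typed values that are out of range, such as a seconds field greater than 59.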
You can apply certain transformations to the values from the datasets. For example, let's add a third column with the first 50 characters of each dataset's description (the `notes` field in the CKAN API):
ckanapi-exporter --url 'https://demo.ckan.org' \
--column "Title" --pattern '^title$' \
--column "Rights" --pattern '^license_title$' \
--column "Description" --pattern '^notes$' --max-length 50 > output.csv
Title | Rights | Description |
---|---|---|
Senior Salaries Information | Creative Commons Attribution | Demo information about senior salaries from 11/04/ |
Demo Data for Open Data in 1 Day - Spending Over £500 | Creative Commons CCZero | Data on spending over £500 generated for Open Data |
UK Cat Burglaries | UK Open Government Licence (OGL) | A record of cat burgalries, listing the cat names, |
... | ... | ... |
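The effect of `--max-length 50` on a value can be pictured as a simple string slice. This is a sketch under that assumption, not the exporter's own implementation; `truncate` is a hypothetical helper and the tail of the sample description is made up for illustration.

```python
def truncate(value, max_length=None):
    """Return value cut to at most max_length characters (unchanged if None)."""
    if max_length is None:
        return value
    return value[:max_length]


# Hypothetical description; only the first 50 characters appear in the table above.
notes = "A record of cat burgalries, listing the cat names, plus a made-up tail"

short = truncate(notes, 50)
print(short)
print(len(short))

# Values shorter than the limit are returned unchanged.
print(truncate("Short description", 50))
```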
Let's add a column containing the formats of each dataset's resources:
ckanapi-exporter --url 'https://demo.ckan.org' \
--column "Title" --pattern '^title$' \
--column "Rights" --pattern '^license_title$' \
--column "Description" --pattern '^notes$' --max-length 50 \
--column Formats --pattern '^resources$' '^format$' > output.csv
This time the pattern has two arguments: `--pattern '^resources$' '^format$'`. This means find the "resources" field of each dataset and then find the "format" field of each resource. When a dataset has more than one resource the formats will be combined into a quoted, comma-separated list in a single table cell. It'll create a CSV file something like this:
Title | Rights | Description | Formats |
---|---|---|---|
Senior Salaries Information | Creative Commons Attribution | Demo information about senior salaries from 11/04/ | XLSX, CSV |
Demo Data for Open Data in 1 Day - Spending Over £500 | Creative Commons CCZero | Data on spending over £500 generated for Open Data | CSV, CSV, CSV, CSV |
UK Cat Burglaries | UK Open Government Licence (OGL) | A record of cat burgalries, listing the cat names, | JPEG, CSV, CSV |
... | ... | ... | ... |
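One way to picture a multi-argument pattern is as a path that is followed down through nested dictionaries and lists. The sketch below illustrates that idea only; `extract` and the sample `dataset` are hypothetical, and the exporter may implement the lookup differently.

```python
import re


def extract(data, patterns):
    """Follow a list of regex patterns down through nested dicts and lists."""
    if not patterns:
        return [data]
    if isinstance(data, list):
        # Apply the remaining patterns to every item, e.g. every resource.
        results = []
        for item in data:
            results.extend(extract(item, patterns))
        return results
    if not isinstance(data, dict):
        return []
    regex = re.compile(patterns[0])
    results = []
    for key, value in data.items():
        if regex.search(key):
            results.extend(extract(value, patterns[1:]))
    return results


# Hypothetical dataset with two resources.
dataset = {
    "title": "Senior Salaries Information",
    "resources": [{"format": "XLSX"}, {"format": "CSV"}],
}

formats = extract(dataset, ["^resources$", "^format$"])
print(", ".join(formats))
```

A single-pattern column such as `^title$` is then just the special case of a path with one step.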
CSV is repeated a lot because lots of the datasets have multiple CSV resources. You can add the `--deduplicate` option to the column to remove the duplication:
ckanapi-exporter --url 'https://demo.ckan.org' \
--column "Title" --pattern '^title$' \
--column "Rights" --pattern '^license_title$' \
--column "Description" --pattern '^notes$' --max-length 50 \
--column Formats --pattern '^resources$' '^format$' --deduplicate \
> output.csv
Title | Rights | Description | Formats |
---|---|---|---|
Senior Salaries Information | Creative Commons Attribution | Demo information about senior salaries from 11/04/ | XLSX, CSV |
Demo Data for Open Data in 1 Day - Spending Over £500 | Creative Commons CCZero | Data on spending over £500 generated for Open Data | CSV |
UK Cat Burglaries | UK Open Government Licence (OGL) | A record of cat burgalries, listing the cat names, | JPEG, CSV |
... | ... | ... | ... |
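Deduplication of this kind can be sketched in a few lines of Python. The example assumes that the first occurrence of each value is kept and the original order is preserved, which matches the table above but is not confirmed as the exporter's exact behaviour; `deduplicate` is a hypothetical helper.

```python
def deduplicate(values):
    """Drop repeated values, keeping the first occurrence of each in order."""
    # dict preserves insertion order, so the keys are the unique values in order.
    return list(dict.fromkeys(values))


formats = ["JPEG", "CSV", "CSV"]
print(", ".join(deduplicate(formats)))
```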