How it works¶

The first objective of the ScrAPD tool set is to create a fully automated pipeline to collect traffic fatality data for the City of Austin.

The pipeline has 2 branches. The first branch (on the left) is for the data extracted from the APD reports and reflects the current data. The second one manages archived data provided by the City of Austin and the state of Texas via TxDoT.

The archived data are published between March and June of the following year (i.e.: CoA pusblished its 2018 data set in March 2019, TxDot dot is supposed to provide the 2018 data around June or July 2019).

pipelines

Current data (APD)¶

APD releases crash reports on the city website.
Every hour, ScrAPD checks for new reports.
If a new report is detected, it is extracted and stored in the raw data set of the current year.
Augmentations are generated and applied to the data set to add missing data (for instance the crash coordinates). This creates the augmented data set.
The raw and augmented data sets are stored on GitHub and data.world in JSON format.

Archived data (CoA and TxDot)¶

CoA and TxDot release their data.
A tool is manually triggered (twice a year) to check for the new data sets.
The new data sets are downloaded.
They are processed and converted to the ScrAPD format.
The data sets are stored on GitHub in JSON format.

FAQ¶

Why do we need two pipelines?¶

The first reason is the difference of delay between sources. APD releases a report when a crash case is closed. CoA and TxDot release their information the year after.

The second reason is that the data sets do not contain exactly the same information and break the queries and graph generation. That's why we have scrapd data sets and archived data sets.

What are augmentations?¶

Augmentations are a mechanim to enhance data sets by adding information from other sources.

An example is the crash coordinates, which are not provided by APD. As a result, we generate an augmentation using an augmenter which converts the crash location to coordinates using the GeoCensus database.

The augmenter specification is defined in the Datasets wiki.