Checklist for Data
New Project
Planning
Purpose?
Audience?
Desired format?
Check Schedule to see priority
* Calculate Timeline
Documentation
* Find old documentation
* Read definitions/previous documentation/official documentation
* Add to or create new documentation
Determine resources needed:
* Identify people who need to be involved
* Have supplies?
* Look at previous years’ format
Make decisions
* File structure
* Choose Medium
* If using Access, build the database schema
* Determine whether needed:
* Communication Strategy
Before Diving In:
* Publish a raw copy on Tableau Private - CRITICAL STEP
* Serves as a 1st back-up
* Review raw data - CRITICAL STEP
* Check record counts of raw data.
* Do counts make sense? Validate with other sources if possible.
* Record the raw values
* Records Count - CRITICAL STEP
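The record-count check above could be sketched roughly as follows. This is a minimal illustration, assuming the raw export is a tab-delimited .txt file with one header row; the function names `count_records` and `counts_match` are hypothetical, and the expected count would come from whatever other source is available for validation.

```python
import csv

def count_records(path, delimiter="\t", has_header=True):
    """Count data rows in a delimited text file, ignoring blank lines."""
    with open(path, newline="", encoding="utf-8") as handle:
        # csv.reader yields an empty list for a blank line, so filter those out.
        rows = [row for row in csv.reader(handle, delimiter=delimiter) if row]
    if has_header and rows:
        return len(rows) - 1
    return len(rows)

def counts_match(path, expected, **kwargs):
    """Compare the file's record count with a count taken from another source."""
    actual = count_records(path, **kwargs)
    return actual == expected, actual
```

Recording the returned count next to the raw file gives a baseline to compare against after every later export, import, or cleaning step.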
* Overview of data
* What is the source?
* Do I have access to the source? If not, do I need access, and can I get it?
* For example, I once started working on a data source that I did not have
access to, and I had to start over with everything when I realized it. I created
a .csv file of their data source (Tableau), which I could then manipulate the
way I needed (join with another table and create row-level security).
* If the data is replaced, check all calculations against the primary.
* For any joins, check that the fields being joined have the same data type
* Review for comments that may need to be expedited to Mandatory
Reporting (depends on the process)
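As an illustration of the join data-type check in the list above, here is a small sketch. It assumes each table has already been loaded as a list of dictionaries; the helper names `key_types` and `check_join_key` are hypothetical. It reports the Python types found in the join field on each side, so a mismatch such as integer IDs on one side and text IDs on the other is caught before the join silently drops rows.

```python
def key_types(rows, key):
    """Return the set of Python type names found in one field."""
    return {type(row[key]).__name__ for row in rows}

def check_join_key(left, right, key):
    """Report whether both tables store the join field as one shared type."""
    left_types = key_types(left, key)
    right_types = key_types(right, key)
    compatible = left_types == right_types and len(left_types) == 1
    return compatible, left_types, right_types

# Example: IDs stored as numbers in one table and as text in the other.
students = [{"id": 101, "name": "A"}, {"id": 102, "name": "B"}]
scores = [{"id": "101", "score": 5}, {"id": "102", "score": 3}]
ok, left_types, right_types = check_join_key(students, scores, "id")
```

In this example `ok` is `False`, because one side holds `int` values and the other holds `str` values. Converting one side to match the other before joining resolves it.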
* Daily (every time you export and/or import)
Export
Save Raw (.txt) & Create Working (.xlsx)
Cleaning
* If errors: check previous imports, re-export if possible.
* Records Count
* Descriptives (light, for cleaning)
* Validate (Random Checks)
* Frequencies
* Make decisions about
* Missing data – creates problems, so it is best to fix rather than delete. Be
consistent.
* Is data self-report or archival data?
* Unclear data – is it clear enough to put in a category? If yes, categorize it;
if not, treat it as missing.
* Multiple answers/no answer (code separately: 98 for 2+ answers, 99 for no answer)
* Unfinished surveys – depends on context. For some surveys we tag on
extra questions at the end that may not be necessary for the overall purpose of the
survey (e.g. for VE-135, the last few questions at Aims were about the program).
* Recoding & calculating
* Document process, errors, and observations for future changes.
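The coding decisions above (98 for two or more answers, 99 for no answer, unclear responses treated as missing) might be applied consistently with a small helper like the one below. This is only a sketch under stated assumptions: the function name `recode_answer` is hypothetical, multiple answers are assumed to arrive as one text value separated by semicolons, and missing is represented as `None`. The set of valid codes must not itself contain 98 or 99.

```python
MULTIPLE_ANSWERS = 98  # respondent marked two or more answers
NO_ANSWER = 99         # respondent left the question blank

def recode_answer(raw, valid_codes, separator=";"):
    """Recode one raw response into a valid code, 98, 99, or None (missing)."""
    text = (raw or "").strip()
    if not text:
        return NO_ANSWER
    # Assumption: multiple answers are exported as "2;4" in a single field.
    parts = [part.strip() for part in text.split(separator) if part.strip()]
    if len(parts) > 1:
        return MULTIPLE_ANSWERS
    try:
        code = int(parts[0]) if parts else None
    except ValueError:
        return None  # unclear text that cannot be placed in a category
    return code if code in valid_codes else None
```

Running every response through one function keeps the treatment consistent across exports, and the function itself doubles as documentation of the decisions made.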
Import
Cleaning:
* Records count
* Fill in missing data & recode
* Descriptives (heavy, for baseline and checks)
* Frequencies
* Charts/Graphs
* Tables
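The frequencies and descriptives steps could be sketched with the Python standard library as follows. This is an illustration rather than a prescribed tool: the function names are hypothetical, the values are assumed to be already recoded integers, and the special codes 98 and 99 (plus `None` for missing) are excluded from the descriptives so they do not distort the mean.

```python
from collections import Counter
from statistics import mean

SPECIAL_CODES = {98, 99}  # multiple answers / no answer

def frequencies(values):
    """Count how often each value occurs, including special codes and None."""
    return Counter(values)

def descriptives(values, special_codes=SPECIAL_CODES):
    """Summarize valid values only, skipping missing and special codes."""
    valid = [v for v in values if v is not None and v not in special_codes]
    if not valid:
        return {"n": 0, "mean": None, "min": None, "max": None}
    return {"n": len(valid), "mean": mean(valid),
            "min": min(valid), "max": max(valid)}

responses = [3, 4, 98, 5, 99, None, 4]
counts = frequencies(responses)
summary = descriptives(responses)
```

The frequency counts should sum to the record count, which ties this step back to the records-count check, and a minimum or maximum outside the expected range is a quick sign of a recoding error.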
Save As (using naming/file convention)
Maintenance
* Back-up
* Compact & Repair Database
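For the back-up step, one possible approach is a timestamped copy of the working file. The sketch below assumes a hypothetical naming convention (`name_YYYYMMDD_HHMMSS.ext`) and a hypothetical function name `backup_file`; it covers only the file copy, not the Compact & Repair step.

```python
import shutil
from datetime import datetime
from pathlib import Path

def backup_file(source, backup_dir, now=None):
    """Copy a file into backup_dir with a timestamp added to its name."""
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming convention: original name plus date and time.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    target = backup_dir / f"{source.stem}_{stamp}{source.suffix}"
    if target.exists():
        raise FileExistsError(f"Backup already exists: {target}")
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    return target
```

Refusing to overwrite an existing backup means an earlier copy is never lost by accident.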