Hands-on Activity 2: Exploring the Semi-structured Data Model of JSON
Learning Goals:
By the end of this activity, you will be able to:
1. Display the nested structure of a JSON file.
2. Extract data from a JSON file.
Instructions:
Step 1. Open a terminal shell. Click the square black icon at the top left of the screen to open a
terminal shell.
Run cd Downloads/lect4data/json to change into the directory containing the JSON file.
Step 2. Look at the JSON file. Let's look at the contents of the JSON file:
more twitter.json
Press the spacebar to go down and q to quit more.
The contents of the file are difficult to read because the data is packed together without line breaks or indentation.
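To see why indentation helps, here is a minimal Python sketch that re-serializes a packed JSON string with one field per line. The sample record and the helper name pretty are hypothetical and used only for illustration; the record is not taken from twitter.json, whose exact layout this activity does not describe.

```python
import json

# Hypothetical packed record standing in for one tweet (not real data from twitter.json).
packed = '{"text":"hello","truncated":false,"entities":{"hashtags":[],"symbols":[]}}'


def pretty(packed_json: str) -> str:
    """Parse a JSON string and return it re-serialized with 2-space indentation."""
    return json.dumps(json.loads(packed_json), indent=2)


# Nested fields such as entities now appear on their own indented lines.
print(pretty(packed))
```

The data is unchanged; only the whitespace differs, so the nesting becomes visible.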
Step 3. View JSON schema. We can view the schema of the JSON file by running json_schema.py:
./json_schema.py twitter.json | more
The top-level fields are contributors, truncated, text, etc. Some fields have nested fields, such as
entities, which contains symbols, media, hashtags, etc. If you go down (press the spacebar), you will see
multiple levels of nesting.
Enter q to quit more.
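The idea of a schema listing can be sketched in a few lines of Python. This is not the source of json_schema.py, which is not shown in this activity; it is a simplified illustration under the assumption that a schema view means printing field names indented by nesting depth. The function name outline and the sample record are hypothetical.

```python
import json


def outline(value, depth=0, lines=None):
    """Collect the field names of nested objects, indented by nesting depth."""
    if lines is None:
        lines = []
    if isinstance(value, dict):
        for key in value:
            lines.append("  " * depth + key)
            outline(value[key], depth + 1, lines)
    elif isinstance(value, list) and value:
        # Assumption: describe a list by the structure of its first element.
        outline(value[0], depth, lines)
    return lines


# Hypothetical record for illustration only (not real data from twitter.json).
sample = json.loads(
    '{"text": "hi", "entities": {"hashtags": [{"text": "x"}], "symbols": []}}'
)
print("\n".join(outline(sample)))
```

Here text and entities are top-level fields, while hashtags and symbols are printed one level deeper because they are nested inside entities.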
Step 4. Extract values in JSON data. We can extract individual values from fields within the JSON
data by running print_json.py:
./print_json.py
The print_json.py script asks for the file name, the tweet number, and the path to extract. The path
gives the location of the field in the schema, with nested field names separated by a slash.
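A path lookup of this kind can be sketched in Python as follows. This is an assumption about the general technique, not the actual code of print_json.py; the function name extract and the sample tweet are hypothetical.

```python
import json


def extract(record, path):
    """Follow a slash-separated path such as 'a/b' through nested objects."""
    value = record
    for key in path.split("/"):
        value = value[key]  # raises KeyError if the field does not exist
    return value


# Hypothetical tweet for illustration only (not real data from twitter.json).
tweet = json.loads('{"text": "hello", "retweeted_status": {"retweet_count": 3}}')

print(extract(tweet, "text"))                            # prints hello
print(extract(tweet, "retweeted_status/retweet_count"))  # prints 3
```

A single field name is a path with one component, so top-level and nested fields are handled by the same loop.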
Let's look at the value for the text field in the 99th tweet. First, enter twitter.json for the filename:
Next, enter 99 for the number:
Next, enter text for the path:
Note: you may remember the field text from the schema:
The result is:
Now let's find the value for retweeted_status/retweet_count in the 99th tweet. The retweet_count
field is nested in the retweeted_status field, so we enter retweeted_status/retweet_count for the
path: