BigQuery
Log in to BigQuery
When you log in to BigQuery using the landing page, you will automatically open your project space.
This is a high-level overview of your project, including the project information and the current
resources being used. From here, you can check your recent activity.
Navigate to your project’s BigQuery Studio by selecting BigQuery from the navigation menu and
BigQuery Studio from the dropdown menu.
BigQuery Studio components
Once you have navigated to BigQuery from the project space, most of the major components of the
BigQuery console will be present: the Navigation pane, the Explorer pane, and the SQL Workspace.
On the console page, find the Navigation pane. This is how you navigate from the project space to
the BigQuery tool. This menu also contains a list of other Google Cloud Platform (GCP) data tools.
During this program, you will focus on BigQuery, but it’s useful to understand that GCP has a
collection of connected tools data professionals use every day.
The Explorer pane lists your current projects and any starred projects you have added to your
console. It’s also where you’ll find the + ADD button, which you can use to add datasets.
This button opens the Add dialog that allows you to open or import a variety of datasets.
Add Public Datasets
BigQuery offers a variety of public datasets from the Google Cloud Public Dataset Program. Scroll
down the Add dialog to the Public Datasets option.
Select Public Datasets. This takes you to the Public Datasets Marketplace, where you can search for
and select public datasets to add to your BigQuery console. For example, search for the "noaa
lightning" dataset in the Marketplace search bar. When you search for this dataset, you will find
NOAA’s Cloud-to-Ground Lightning Strikes data.
Select the dataset to read its description. Select View dataset to create a tab of the dataset’s
information within the SQL workspace.
The Explorer Pane lists the noaa_lightning and other public datasets.
You added the public noaa_lightning dataset to your BigQuery Workspace, so the Explorer pane
displays the noaa_lightning dataset, along with the list of other public datasets. These datasets are
nested under bigquery-public-data. Star bigquery-public-data by navigating to the top of the Explorer
pane and selecting the star next to bigquery-public-data.
Starring bigquery-public-data will enable you to search for and add public datasets by scrolling in the
Explorer pane or by searching for them in the Explorer search bar.
For example, you might want to select a different public dataset. If you select the second dataset,
"austin_311," it will expand to list the table stored in it, “311_service_requests.”
The Explorer pane with the “bigquery-public-data” and “austin_311” datasets expanded, revealing
the “311_service_requests” table
When you select a table, its information is displayed in the SQL Workspace. Select the
311_service_requests table to examine several tabs that describe it, including:
Schema, which describes the column names and data types
Details, which contains additional metadata, such as the creation date of the dataset
Preview, which displays a sample of the table’s data
The final menu pane in your console is the SQL Workspace. This is where you will actually write and
execute queries in BigQuery.
The SQL Workspace also gives you access to your personal and project history, which stores a record
of the queries you’ve run. This can be useful if you want to return to a query to run it again or use
part of it in another query.
In addition to offering access to public datasets, BigQuery also gives you the ability to upload your
own data directly into your workspace. Access this feature by opening the + ADD menu again or by
clicking the three vertical dots next to your project’s name in the Explorer pane. This will give you the
option to create your own dataset and upload your own tables. You will have the opportunity to
upload your own data in an upcoming activity to practice using this feature!
Key takeaways
BigQuery's SQL workspace allows you to search for public datasets, run SQL queries, and even upload
your own data for analysis, offering a range of features to support all kinds of data analysis tasks.
Throughout this program, you will be using BigQuery to practice your SQL skills, so being familiar
with the major components of your BigQuery console will help you navigate it effectively in the
future!
This reading provides you with the steps the instructor performs in the following video, BigQuery in
action. The video focuses on how to create a query to view a small section of data from a large
dataset.
Keep this guide open as you watch the video. It can serve as a helpful reference if you need
additional context or clarification while following the video steps. This is not a graded activity, but
you can complete these steps to practice the skills demonstrated in the video.
To follow along with the examples in this video, log in to your BigQuery account and follow the
instructions in The Explorer pane section of the previous reading, Get Started with BigQuery, to star
bigquery-public-data.
A database is a collection of data stored in a computer system. Query languages such as SQL enable
communication between databases and data analysts. You discovered earlier that a relational
database is made up of several tables that may be joined together to create relationships. Primary
and foreign keys serve as representations of these relationships. To extract data from these tables,
data analysts use queries. To learn more about that, explore BigQuery in action:
1. Log in to BigQuery and go to your console. You should find the Welcome to your SQL
Workspace! landing page open. Select COMPOSE A NEW QUERY in the BigQuery console.
Make sure that no tabs are open so that the entire workspace is displayed, including the
Explorer pane.
2. Enter sunroof in the search bar. In the search results, expand sunroof_solar and then select
the solar_potential_by_postal_code table.
3. Observe the Schema tab to explore the table fields.
4. Select the Preview tab to view the regions, states, yearly sunlight, and more.
Example 2: Writing a query
In order to view the entire dataset, you will need to write a query.
1. The first step is finding out the complete, correct path to the table you want to work with.
Select the ellipses (three vertical dots) by the solar_potential_by_postal_code table, then
select Query. A new tab will populate on your screen. Select the tab. The path to the table
should be written inside two backticks.
2. Select the full path by highlighting the text including the backticks and copy it. (Note: You can
also get the full path to the project, database, and table directly by clicking the ellipses next
to the table's name in the Explorer panel on the left and selecting Copy ID.)
3. Now, click on the plus sign to create a new query. Notice that BigQuery doesn’t automatically
generate a SELECT statement in this window. Enter SELECT and add a space after it.
4. Put an asterisk * after SELECT to indicate you want to return the entire dataset. The asterisk
lets the database know to include all columns. Without this shortcut, you would have to
manually enter every column name!
5. Next, press the Enter/Return key and enter FROM on the second line. FROM indicates where
the data is coming from. After FROM, add another space.
6. Paste in the path to the table that you copied earlier. It will read
`bigquery-public-data.sunroof_solar.solar_potential_by_postal_code`
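Putting the steps together, the complete query from this example reads:

```sql
SELECT
  *
FROM
  `bigquery-public-data.sunroof_solar.solar_potential_by_postal_code`
```

Running this returns every column and every row of the table, so the results pane will contain the full dataset.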
Many of the public databases on BigQuery are living records and, as such, are periodically updated
with new data. Throughout this course (and others in this certificate program), if your results differ
from those you encounter in videos or screenshots, there's a good chance it is due to a data refresh.
You can verify when a table has been refreshed by selecting it from the Explorer panel and clicking
Details. You'll find the date the table was created, when it was last modified, as well as other useful
information.
1. For example, you might only need data from Pennsylvania. You’d begin your query the same
way you just did in the previous examples: Click on the plus sign, enter SELECT, add a space
and an asterisk (*), and then press Enter/Return.
2. Enter FROM on the second line, add a space, and paste in the full table path again.
3. This time, press Enter/Return and add WHERE on a new line. Add a space and enter
state_name, which is a column name in the table.
4. Because you only want data from Pennsylvania, add = and 'Pennsylvania' on the same line as
state_name. In SQL, single quotes represent the beginning and ending of a string.
5. Select Run to execute the query.
6. Review the data on solar potential for Pennsylvania. Scroll through the query results.
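Assembled from these steps, the Pennsylvania query looks like this:

```sql
SELECT
  *
FROM
  `bigquery-public-data.sunroof_solar.solar_potential_by_postal_code`
WHERE
  state_name = 'Pennsylvania'
```

The WHERE clause keeps only the rows whose state_name column exactly matches the string 'Pennsylvania'.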
Keep in mind that SQL queries can be written in a lot of different ways and still return the same
results. You might discover other ways to write these queries!
Think of a database like a well-organized warehouse. In this warehouse, data is stored in tables,
which are like different sections for storing specific types of items. For example, one table might
contain information about customers, while another table might store data about their orders.
Now, imagine you want to find information about a specific customer and their order history. You
wouldn't want to search through the entire warehouse, right? That's where SQL queries come in.
A SQL query is like a request you send to the database, asking it to retrieve specific information from
those tables. It's like giving the warehouse manager a detailed list of what you need, and they fetch it
for you.
SELECT: This part of the query specifies what information you want to retrieve. It's like
saying, "I want to see the customer's name and order date."
FROM: This tells the database where to find the information. You specify the table or tables
containing the data you need. It's like saying, "Get this information from the 'Customers' and
'Orders' tables."
WHERE: This is where you get specific. The "WHERE" clause lets you filter the data based on
certain conditions. For example, you could say, "Show me only the orders placed by
customers living in California."
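As a sketch of how the three pieces fit together, here is a hypothetical query against the warehouse example. The Customers and Orders tables and their column names are invented for illustration only:

```sql
-- Hypothetical tables: Customers(customer_id, name, state)
-- and Orders(order_id, customer_id, order_date)
SELECT
  Customers.name,
  Orders.order_date
FROM
  Orders
  JOIN Customers ON Orders.customer_id = Customers.customer_id
WHERE
  Customers.state = 'California'
```

The JOIN matches each order to its customer through the shared customer_id key, the same primary and foreign key relationship described earlier.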
SQL queries can be much more complex than this, allowing you to perform calculations, sort data,
group information, and much more. The key is understanding the basic building blocks and how they
work together to help you extract meaningful insights from your data.
Keep practicing writing queries, and you'll become more comfortable with the language and its
capabilities. Remember, every data analyst starts somewhere, and with persistence, you'll master
these skills in no time!
You have recently been introduced to BigQuery, a data warehouse on Google Cloud that data
analysts use to query and filter large datasets, aggregate results, and perform complex operations. In this
activity, you will explore the BigQuery interface; upload public data to your console; and write some
simple SQL queries using SELECT, FROM, and WHERE.
By the time you complete this activity, you will be more familiar with writing queries in the BigQuery
interface. This will enable you to practice SQL, which is important for working with databases in your
career as a data analyst.
Step-By-Step Instructions
Follow the instructions to complete each step of the activity. Then answer the questions at the end
of the activity before going to the next course item.
For this activity, you will need a BigQuery account. If you haven’t made one already, follow the
instructions from the Using BigQuery reading.
1. Log in to BigQuery.
2. Select the Go to console button on the BigQuery homepage. This will open a new tab with your
console.
3. Take a moment to explore your console. The Explorer menu includes a search bar you can use to
find resources, pinned projects, and the + ADD button for adding data. The Editor welcome page is
where you can open a query editor, try sample data, add local data files, connect to Google Cloud
Storage, or add other external connections. You can also find your job history, query history, and
saved queries here.
In order to start writing queries, you will need some data to work with. Once you’re familiar with the
BigQuery interface, you can access a public dataset directly from your console.
1. Select the search bar in the Explorer pane.
2. Enter “london bicycle” in the search box and press Enter; this will return the london_bicycles
database from the Greater London Authority. Select the database for more details. If you cannot find
it, make sure you're searching in all projects. The london_bicycles database is in the
bigquery-public-data project.
3. Select the arrow to the left of the london_bicycles database name. This expands the dataset to
reveal two table names: cycle_hire and cycle_stations. Select the cycle_hire table name within the
Explorer pane.
Screenshot of BigQuery. The titles cycle_hire and cycle_stations are shown beneath london_bicycles
in the explorer pane. Cycle_hire is selected.
This will pull the cycle_hire schema into the console. Take a moment to explore the field names and
the associated information.
4. Now, select the PREVIEW tab to find a sample of the data that you’ll be working with.
So far, you’ve learned three basic parts of a query: SELECT, FROM, and WHERE. As a refresher:
SELECT is the section of a query that indicates what data you want SQL to return to you.
FROM is the section of a query that indicates which table the desired data comes from. You
must provide a full path to the table. The path includes the project name, database name,
and table name, each separated by a period.
WHERE is the section of a query that indicates any filters you’d like to apply to your table.
1. Select the blue + button, or QUERY > In new tab, to start a new query.
2. Start your query with a SELECT clause, and indicate which column you want to select from the
table; in this case, you’ll input end_station_name.
3. After you have indicated which column you are selecting, write your FROM clause. Specify the
table you are querying from by inputting the following location:
`bigquery-public-data.london_bicycles.cycle_hire`;
SELECT
end_station_name
FROM
`bigquery-public-data.london_bicycles.cycle_hire`;
This query may take a few seconds to execute. Once it has finished, you will find the list of station
names you requested under the Query Results console pane.
After running the first basic query, try answering a specific question about the data. For example,
how many bike trips lasted for 20 minutes or longer?
1. Select the blue + button, or QUERY > In new tab, to start a new query. Start with your SELECT
statement again. This time, include the two columns duration and start_station_name in the query.
The data in these columns will tell you where each trip started and how long it lasted. Be sure to
separate each column name with a comma.
2. Next, add your FROM statement. You will be using the same table as the previous query: FROM
`bigquery-public-data.london_bicycles.cycle_hire`. Note: The backticks around the table path are
required here because the project name contains hyphens; for simple identifiers without special
characters, backticks are optional.
3. Finally, add a WHERE statement to specify that you want to filter for only bike rides 20 minutes or
longer. If you check the preview of this data, you might notice that the duration is recorded in
seconds, so you’ll specify 1200 seconds in your query. Write that as WHERE duration >= 1200;
SELECT
duration,
start_station_name
FROM
`bigquery-public-data.london_bicycles.cycle_hire`
WHERE
duration >= 1200;
This query may take a few seconds to execute. Once it has finished, you will find a list of rides from
this table that fit your criteria. There are millions of rows with bike trips that are 20 minutes or
longer!
If you’re comfortable using queries to answer questions, try creating and running queries to
complete the tasks below:
Return all the rental_ids, station IDs, and station names that bike_id 1710 started from.
Use the solutions doc to check your work: Intro to BigQuery Solutions
DOCX File
Question 1
At what station did the bike trip with rental_id 57635395 end?
SELECT
end_station_name
FROM
`bigquery-public-data.london_bicycles.cycle_hire`
WHERE
rental_id = 57635395;
Reflection
In this activity, you had an opportunity to get more familiar with BigQuery and writing SQL queries. In
the text box below, write 2-3 sentences (40-60 words) in response to each of the following questions:
How do you think you can use public datasets in BigQuery to help develop your data analysis
skills?
How do you think understanding basic query syntax will help you write more complex
queries in the future?
These are the solutions to the Intro to BigQuery activity. You can refer to this document if you
are troubleshooting your own queries. You can also copy and paste these solutions to test them in
your own console.
1. Return the start station names for rides that started from the station with ID 111.
SELECT
start_station_name
FROM
`bigquery-public-data.london_bicycles.cycle_hire`
WHERE
start_station_id = 111;
2. Return all of the rental_ids, station IDs, and station names that bike_id 1710 started
from.
SELECT
rental_id,
start_station_id,
start_station_name
FROM
`bigquery-public-data.london_bicycles.cycle_hire`
WHERE
bike_id = 1710;
3. Return the bike_model of the bike with bike_id 58782.
SELECT
bike_model
FROM
`bigquery-public-data.london_bicycles.cycle_hire`
WHERE
bike_id = 58782;
With SQL, capitalization usually doesn’t matter. You could enter SELECT or select or SeLeCT. They all
work! But if you use capitalization as part of a consistent style, your queries will look more
professional.
To enter SQL queries like a pro, it is always a good idea to use all caps for clause starters (e.g. SELECT,
FROM, WHERE, etc.). Functions should also be in all caps (e.g. SUM()). Column names should be all
lowercase (refer to the section on snake_case later in this guide). Table names should be in
CamelCase (refer to the section on CamelCase later in this guide). This helps keep your queries
consistent and easier to read while not impacting the data that will be pulled when you run them.
The only time that capitalization does matter is when it is inside quotes (more on quotes below).
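For example, a query written in this style might look like the following (TicketSales and its columns are hypothetical names used only to show the convention):

```sql
-- Clause starters and functions in all caps, columns in snake_case,
-- and the table name in CamelCase; the style changes nothing about
-- the data that is returned
SELECT
  SUM(ticket_count) AS total_tickets
FROM
  TicketSales
WHERE
  event_year = 2020
```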
Vendors of SQL databases may use slightly different variations of SQL. These variations are called SQL
dialects. Some SQL dialects are case sensitive when comparing strings; BigQuery is one of them, and
Vertica is another. Others, such as MySQL and SQL Server, use case-insensitive collations by default.
In a case-insensitive dialect, if you searched for country_code = 'us', it would return all entries that
have 'us', 'uS', 'Us', and 'US'. This isn’t the case with BigQuery. BigQuery is case sensitive, so that
same search would only return entries where the country_code is exactly 'us'. If the country_code is
'US', BigQuery wouldn’t return those entries as part of your result.
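For example, assuming a hypothetical CustomerRecords table, the same filter behaves differently depending on the dialect:

```sql
-- In BigQuery, this matches only rows where country_code is exactly 'us'.
-- In a dialect with case-insensitive defaults (such as MySQL), it would
-- also match 'US', 'Us', and 'uS'.
SELECT
  country_code
FROM
  CustomerRecords  -- hypothetical table
WHERE
  country_code = 'us'
```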
For the most part, it also doesn’t matter if you use single quotes ' ' or double quotes " " when
referring to strings. For example, SELECT is a clause starter. If you put SELECT in quotes like 'SELECT'
or "SELECT", then SQL will treat it as a text string. Your query will return an error because your query
needs a SELECT clause.
But there are two situations where it does matter what kind of quotes you use:
Within each SQL dialect there are rules for what is accepted and what isn’t. But a general rule across
almost all SQL dialects is to use single quotes for strings. This helps get rid of a lot of confusion. So if
we want to reference the country US in a WHERE clause (e.g. country_code = 'US'), then use single
quotes around the string 'US'.
The second situation is when your string has quotes inside it. Suppose you have a table called
FavoriteFoods with a favorite_food column and a friend column that identifies each friend:
friend | favorite_food
Rachel | Shepherd's pie
You might notice how Rachel’s favorite food contains an apostrophe. If you were to use single quotes
in a WHERE clause to find the friend who has this favorite food, it would look like this:
SELECT
friend
FROM
FavoriteFoods
WHERE
favorite_food = 'Shepherd's pie'
This won’t work. If you run this query, you will get an error in return. This is because SQL recognizes
a text string as something that starts with a quote ' and ends with another quote '. So in the bad
query above, SQL thinks that the favorite_food you are looking for is 'Shepherd', because the
apostrophe in Shepherd's ends the string.
Generally speaking, this should be the only time you would use double quotes instead of single
quotes. So your query would look like this instead:
SELECT
friend
FROM
FavoriteFoods
WHERE
favorite_food = "Shepherd's pie"
SQL understands text strings as either starting with a single quote ' or double quote ". Since this
string starts with double quotes, SQL will expect another double quote to signal the end of the string.
This keeps the apostrophe safe, so it will return "Shepherd's pie" and not 'Shepherd'.
Comments as reminders
As you get more comfortable with SQL, you will be able to read and understand queries at a glance.
But it never hurts to have comments in the query to remind yourself of what you are trying to do.
And if you share your query, it also helps others understand it.
For example:
--This is an important query used later to join with the accounts table
SELECT
*
FROM
Publishers
You can use # in place of the two dashes, --, in the above query but keep in mind that # isn’t
recognized in all SQL dialects (MySQL doesn’t recognize #). So it is best to use -- and be consistent
with it. When you add a comment to a query using --, the database query engine will ignore
everything in the same line after --. It will continue to process the query starting on the next line.
It is important to always make sure that the output of your query has easy-to-understand names. If
you create a new column (say from a calculation or from concatenating new fields), the new column
will receive a generic default name (e.g. f0). For example:
SELECT
SUM(tickets),
COUNT(tickets),
SUM(tickets) AS total_tickets,
COUNT(tickets) AS number_of_purchases
FROM
Purchases
Results are:
f0 | f1 | total_tickets | number_of_purchases
8  | 4  | 8             | 4
The first two columns are named f0 and f1 because they weren’t named in the above query. SQL
defaults to f0, f1, f2, f3, and so on. We named the last two columns total_tickets and
number_of_purchases so these column names show up in the query results. This is why it is always
good to give your columns useful names, especially when using functions. After running your query,
you want to be able to quickly understand your results, like the last two columns we described in the
example.
On top of that, you might notice how the column names have an underscore between the words.
Names should never have spaces in them. If total_tickets had a space and looked like total tickets
then SQL would throw a syntax error because it wouldn't know what to do with the second word
(tickets). So, spaces are bad in SQL names. Never use spaces.
The best practice is to use snake_case. This means that 'total tickets', which has a space between the
two words, should be entered as total_tickets with an underscore instead of a space.
You can also use CamelCase capitalization when naming your table. CamelCase capitalization means
that you capitalize the start of each word, like a two-humped (Bactrian) camel. So the table
TicketsByOccasion uses CamelCase capitalization. Please note that the capitalization of the first word
in CamelCase is optional; camelCase is also used. Some people differentiate between the two styles
by calling CamelCase PascalCase, and reserving camelCase for when the first word isn't capitalized,
like a one-humped (Dromedary) camel; for example, ticketsByOccasion.
At the end of the day, CamelCase is a style choice. There are other ways you can name your tables,
including:
All lower or upper case, like ticketsbyoccasion or TICKETSBYOCCASION
With underscores separating each word, like tickets_by_occasion (snake_case)
Keep in mind, the option with all lowercase or uppercase letters can make it difficult to read your
table name, so it isn’t recommended for professional use.
The second option, snake_case, is technically okay. With words separated by underscores, your table
name is easy to read, but it can get very long because you are adding the underscores. It also takes
more time to enter. If you use this table a lot, it can become a chore.
In summary, it is up to you to use snake_case or CamelCase when creating table names. Just make
sure your table name is easy to read and consistent. Also be sure to find out if your company has a
preferred way of naming their tables. If they do, always go with their naming convention for
consistency.
Indentation
As a general rule, you want to keep the length of each line in a query <= 100 characters. This makes
your queries easy to read. For example, check out this query with a line with >100 characters:
SELECT
CASE WHEN genre = 'horror' THEN 'Will not watch' WHEN genre = 'documentary' THEN 'Will watch alone' ELSE 'Will watch with others' END AS watch_category
FROM
MovieTheater
GROUP BY
watch_category
This query is hard to read and just as hard to troubleshoot or edit. Now, here is a query where we
stick to the <= 100 character rule:
SELECT
  CASE
    WHEN genre = 'horror' THEN 'Will not watch'
    WHEN genre = 'documentary' THEN 'Will watch alone'
    ELSE 'Will watch with others'
  END AS watch_category
FROM
  MovieTheater
GROUP BY
  watch_category
Now it is much easier to understand what you are trying to do in the SELECT clause. Sure, both
queries will run without a problem because indentation doesn’t matter in SQL. But proper
indentation is still important to keep lines short. And it will be valued by anyone reading your query,
including yourself!
Multi-line comments
If you make comments that take up multiple lines, you can use -- for each line. Or, if you have more
than two lines of comments, it can be cleaner and easier to use /* to start the comment and */
to close the comment. For example, you can use the -- method like below:
-- Important query used later to join with the accounts table
-- Date: September 15, 2020
SELECT
*
FROM
table
Or you can use the /* */ method like this:
/*
Important query used later to join with the accounts table
Date: September 15, 2020
*/
SELECT
*
FROM
table
In SQL, it doesn’t matter which method you use. SQL ignores comments regardless of what you use:
#, --, or /* and */. So it is up to you and your personal preference. The /* and */ method for
multi-line comments usually looks cleaner and helps separate the comments from the query. But there
isn’t one right or wrong method.
When you join a company, you can expect each company to use their own SQL platform and SQL
dialect. The SQL platform they use (e.g. BigQuery, MySQL, or SQL Server) is where you will enter and
run your SQL queries. But keep in mind that not all SQL platforms provide native script editors to
enter SQL code. SQL text editors give you an interface where you can enter your SQL queries in an
easier and color-coded way. In fact, all of the code we have been working with so far was entered
with an SQL text editor!
If your SQL platform doesn’t have color coding, you might want to think about using a text editor like
Sublime Text or Atom, which display SQL queries with color-coded syntax highlighting.
With Sublime Text, you can also do advanced editing, like deleting indents across multiple lines at the
same time. For example, suppose your query somehow had indents in all the wrong places, making it
really hard to read; you would want to eliminate those indents and start over. In a regular SQL
platform, you would have to go into each line and press BACKSPACE to delete each indent per line.
But in Sublime, you can get rid of all the indents at the same time by selecting all lines and pressing
Command (or CTRL in Windows) + [. This eliminates indents from every line. Then you can select the
lines that you want to indent (i.e., lines 2, 4, and 6) by pressing the Command key (or the CTRL key in
Windows) and selecting those lines. Then while still holding down the Command key (or the CTRL key
in Windows), press ] to indent lines 2, 4, and 6 at the same time. This will clean up your query.
Sublime Text also supports regular expressions. Regular expressions (or regex) can be used to search
for and replace string patterns in queries. We won’t cover regular expressions here, but you might
want to learn more about them on your own because they are a very powerful tool.
Recently, you’ve been thinking about identifying good data sources that would be useful for analysis.
You also spent some time in a previous activity exploring a public dataset in BigQuery and writing
some basic SQL queries. In addition to using public data in BigQuery, your future data career will
involve importing data from other sources. In this activity, you will create a custom dataset, load
your own data into a new table, and write queries about it.
By the time you complete this activity, you will be able to load your own data into BigQuery for
analysis, a skill that will help you analyze data from different sources more effectively.
Step-By-Step Instructions
Follow the instructions to complete each step of the activity. Then answer the questions at the end
of the activity before going to the next course item.
To get started, download the baby names data zip file. This file contains about 7 MB of data about
popular baby names from the U.S. Social Security Administration website.
Select the link to the baby names data zip file to download it.
Unzip the file you downloaded onto your computer to access it on BigQuery. Once you have
unzipped the file, find a .pdf file titled NationalReadMe that contains more information about the
dataset. This dataset tracks the popularity of baby names for each year; you can find text files labeled
by the year they contain. Open yob2014.txt to preview the data. You will notice that, although it’s a
.txt file, the data inside is comma-separated, with three columns. Remember where you saved this
folder so you can reference it later.
Before uploading your .txt file and creating a table to query, you will need to create a dataset to
upload your data into and store your tables.
1. From the BigQuery console, go to the Explorer pane in your workspace and select the three dots
next to your project to open a menu. From here, select Create dataset. Note that unless you have
already specified your own project name, a unique name is assigned to your project by BigQuery,
typically in the format of two words and a number separated by hyphens (e.g. loyal-glass-371423).
2. This will open the Create dataset menu. This is where you will fill out information about the
dataset. Input the Dataset ID as babynames and set the Data location to Multi-region (US). Once
you have finished filling out this information, select the blue CREATE DATASET button at the bottom
of the menu.
Step 4: Create a table
Now that you have a custom dataset stored in your project space, you can add a table to it.
1. Select the newly created babynames dataset. Check the tabs in your Dataset info window and
select the first blue + CREATE TABLE button. This will open another menu in your console.
2. In the Source section, select the Upload option in Create table from. Then select Browse to open
your files. Find and open the yob2014.txt file. Set the file format to .csv. In the Destination section,
in the Table field, name your table names_2014. For Schema, select Edit as text and input the
following code:
name:string,
gender:string,
count:integer
3. This will establish the data types of the three columns in the table. Leave the other parameters as
they are, and select Create table.
Once you have created your table titled names_2014, it will appear in the Explorer pane under the
babynames dataset that you created earlier.
4. Select the table to open it in your workspace. Here, you can check the table schema. Then, go to
the Preview tab to explore your data. The table should have three columns: name, gender, and
count.
Now that your table is set up, you’re ready to start writing queries and answering questions about
this data. For example, let’s say you were interested in the top five baby names for boys in the United
States in 2014.
Step 5: Query your new table
Select COMPOSE NEW QUERY to start a new query for this table. Then, enter this code:
SELECT
  name,
  count
FROM
  `your-project.babynames.names_2014`
WHERE
  gender = 'M'
ORDER BY
  count DESC
LIMIT
  5
NOTE: Making sure that your FROM statement is correct is key to making this query run! The
database needs the query to tell it the location of the table you just uploaded so that it can fetch the
data. It’s like giving the query a map to your table. That map will include your unique BigQuery
project name (replace your-project in the code above with your unique project name), the database
name (babynames), and the table name (names_2014). The location names for each of these
elements are separated by periods. The final result will be something like this:
`loyal-glass-371423.babynames.names_2014`
Note that loyal-glass-371423 is just an example of a project name. You must use your project’s name
in your FROM statement.
This query selects the name and count columns from the names_2014 table. Using the WHERE
clause, you are filtering for a specific gender for your results. Then, you’re sorting how you want your
results to appear with ORDER BY. Because you are ordering by the count in descending order, you
will get names and the corresponding count from largest to smallest. Finally, LIMIT tells SQL to only
return the top five most popular names and their counts.
Once you have input this in your console, select RUN to get your query results.
Optional Step 6: Up for a challenge?
If you are comfortable creating your own custom tables, try uploading more files from the baby
names dataset into tables you can query. For example, you could upload each of the files from 2015
to 2019 to find the top baby names for those years.
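If you do upload more years, the same query pattern applies table by table. The sketch below assumes you created a names_2015 table with the same name:string, gender:string, count:integer schema, and that you substitute your own project name for your-project:

```sql
-- Top five boys' names for 2015, assuming a names_2015 table
-- uploaded with the same schema as names_2014
SELECT
  name,
  count
FROM
  `your-project.babynames.names_2015`
WHERE
  gender = 'M'
ORDER BY
  count DESC
LIMIT
  5
```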
1.
Question 1
Reflection
After running the query on your new table, what was the third-most popular baby name for boys in
2014?
1 point
Jacob
Mason
Noah
William
2.
Question 2
In this activity, you explored public data in BigQuery and used it to create a custom table. In the text
box below, write 2-3 sentences (40-60 words) in response to each of the following questions:
Why is being able to use data from different sources useful as a data analyst?
How can you use BigQuery custom tables and datasets in your future analysis projects?
1 point
Hands-On Activity: Choose the right tool for the job
Activity Overview
As a data analytics professional, part of your job will be understanding which tools are the right ones
for the job. At this point, you’ve gotten familiar with spreadsheets and SQL as potential tools for data
analysis. This activity is an opportunity to apply your new knowledge about data analytics tools to
workplace scenarios. After reading each scenario, you will be prompted to choose whether you
would use a spreadsheet or SQL to work with the data and to explain why you selected that tool.
Review each scenario and provide your reflective response before going to the next course item.
1.
Question 1
You are a junior data analyst at a major online pet supplies business. The product development
department has asked you to identify the top 10 products with the highest sales in the last month so
that they can decide what types of products to develop next. These products come from multiple
tables within the company’s database: one containing product details and another with sales data.
Reflection
Consider whether you would use a spreadsheet or SQL in the scenario above.
1 point
2.
Question 2
In this scenario, you work for a small fitness tech company. Your manager has asked you to
determine which cities have a large population of customers; this will help them decide where to
build the next retail store. You have a .csv file with 300 names, phone numbers, and addresses. The
manager also requested a quick, simple visualization, such as a bar chart, so the team can make
some quick comparisons.
Reflection
Consider whether you would use a spreadsheet or SQL in the scenario above.
Now, write 2-3 sentences (40-60 words) in response to these questions. Identify which tool you
would use and then justify your thinking. Enter your response in the text box below.
1 point
3.
Question 3
The marketing team of a fashion retailer conducted a survey on customer satisfaction. The results are
in a .csv file. The team wants to calculate the average satisfaction score and compare it to the
average score from their last survey. They also want to segment responses based on demographic
data such as age, gender, and location. This segmentation will enable the marketing team to tailor
their marketing strategies to meet the specific needs and preferences of various customer groups.
Reflection
Consider whether you would use a spreadsheet or database in the scenario above.
1 point
4.
Question 4
The manager of an online education platform wants to know which courses have the lowest
completion rates so that they can invest in course enhancements and more targeted student support
for those courses. The data is stored in a relational database with separate tables for user
registration, course enrollment, and course completion.
Reflection
Consider whether you would use a spreadsheet or database in the scenario above.
Now, write 2-3 sentences (40-60 words) in response to these questions. Identify which tool you
would use and then justify your thinking. Enter your response in the text box below.
1 point
Activity Overview
In previous lessons, you learned how to work with data in spreadsheets. Now, you’ll practice using
SQL to work with data in databases. By the time you complete this activity, you’ll be able to use SQL
to write queries that retrieve data from databases. Further, you’ll practice navigating large public
datasets in BigQuery in order to become proficient in SQL queries—an essential skill for your future
career as a data analyst.
Scenario
In this activity, you’ll query a public dataset that contains information about the trees in New York
City in 2005. You’ll also use the AVG function to determine the average diameter of all NYC trees in
2005. You’ll then have the opportunity to write queries using the data for 1995 and 2015 to compare
the average tree diameters.
Step-By-Step Instructions
Follow the instructions to complete each step of the activity. Then answer the questions at the end
of the activity before going to the next course item.
1. Log in to BigQuery Sandbox. On the BigQuery page, click the Go to BigQuery button.
If you have a free-of-charge trial version of BigQuery, you can use that instead.
Note: BigQuery Sandbox frequently updates its user interface. The latest changes may not be
reflected in the screenshots presented in this activity, but the principles remain the same. Adapting
to changes in software updates is an essential skill for data analysts, and it’s helpful for you to
practice troubleshooting. You can also reach out to your community of learners on the discussion
forum for help.
2. If you haven’t done so already, create a BigQuery project. (If you have a project, select it in the
Explorer pane.)
a. In the BigQuery console, select the dropdown list to the right of the Google Cloud logo to open the
Select a project dialog box.
b. In the Select a project dialog box, select the CREATE PROJECT button.
c. Give your project a name that will help you identify it later. You can enter a unique project ID or
use an auto-generated one. You do not need to select an organization.
3. The three main sections of BigQuery are now onscreen: the BigQuery navigation menu; the
Explorer pane, which you can use to search for public datasets and open projects; and the Details
pane, which shows details of the database or dataset you’ve selected in the Explorer pane and
displays windows for you to enter queries.
Notice that you can use the <| symbol in the BigQuery navigation menu section to collapse it. There
is a similar symbol to collapse the Explorer pane.
Follow these steps to find and select the NYC Trees dataset for this activity:
3. A new box opens where you can search public datasets that are available through Google Cloud. In
the Search Marketplace text box, search for New York City Trees.
4. Select the search result NYC Street Trees, then select the View Dataset button.
The Marketplace listing is headed NYC Street Trees, with a link to the City of New York and the
subheading New York City Street Tree Census data. Select the View dataset button to open the dataset.
5. Google Cloud opens a new browser tab displaying BigQuery with the bigquery-public-data
collection open in the Explorer pane. To ensure the bigquery-public-data database remains in your
project’s Explorer pane, select the star next to the dataset.
6. The BigQuery Details pane contains information about the new_york_trees dataset. This
information includes the date the dataset was created, when it was last modified, and the Dataset ID.
The Details pane displays the new_york_trees dataset description, including the Dataset ID, when it
was created, the default table expiration, when it was last modified, the data location, the default
collation and rounding mode, and any labels and tags.
1. In the Explorer pane, select the arrow next to the new_york_trees dataset to display the tables it
contains.
Note: If the new_york_trees dataset is not in the Explorer pane, type new_york_trees into the
Search text box in the Explorer pane. (This will work if you have pinned bigquery-public-data in the
Explorer pane.) If search doesn’t return the needed results, follow the steps above to search for the
new_york_trees dataset.
2. Notice that the new_york_trees dataset contains three tree census tables from 1995, 2005, and
2015. It also contains a table that lists the tree species.
In the Explorer pane, bigquery-public-data is open and the new_york_trees dataset is expanded to show
its four tables: tree_census_1995, tree_census_2005, tree_census_2015, and tree_species.
These are all tables contained in the dataset. Now, examine the data for all trees cataloged in New
York City for three specific years.
3. Select the tree_census_2005 table. BigQuery displays the table’s structure in the Details pane.
4. In the Details pane, select Query > In new tab to open a new query window.
5. Notice that BigQuery populates the Query Window with a SELECT query. This query is incomplete
because it doesn't contain anything between SELECT and FROM.
This SELECT statement in the Details pane is incomplete because the columns to display have not
been specified. So, either list the columns separated by commas or use the asterisk to have BigQuery
return all columns in the table.
1. Type an asterisk * after the SELECT command in line one of the Query Editor. Your query should
now read SELECT * FROM followed by your table location. This command tells BigQuery to return all
of the columns in the tree_census_2005 table.
2. In the Query Editor, select the Run button to run the query. The results will be displayed as a table
in the Query Results pane below the Query Editor.
The Query Results pane displays the populated columns, including row, object ID, cen_year, tree_dbh,
tree_loc, pit_type, soil_lvl, status, spc_latin, and spc_common.
This query returns all columns for the first 1,000 rows from the table. BigQuery returns only the first
1,000 rows because the SELECT query includes a LIMIT 1000 clause. This limits the rows returned to
reduce the processing time required.
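For reference, the auto-generated query you just ran looks something like this; BigQuery fills in the full table path for you:

```sql
SELECT * FROM `bigquery-public-data.new_york_trees.tree_census_2005` LIMIT 1000
```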
3. Next, write a query to find out the average diameter of all NYC trees in 2005. On line 1, replace the
* after the SELECT command with AVG(tree_dbh). Select the Run button to execute the query.
This returns your answer, 12.833 (which means the average diameter of NYC trees in 2005 was
12.833 inches).
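Put together, the full query reads as follows; note that the table path points at Google's public project rather than your own:

```sql
-- Average trunk diameter (tree_dbh), in inches, for the 2005 census
SELECT
  AVG(tree_dbh)
FROM
  `bigquery-public-data.new_york_trees.tree_census_2005`
```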
Now, come up with some questions and answer them with your own SQL queries. For example,
query the 1995 and the 2015 tables to find the average diameter of trees. You can then compare the
average diameters across all three tables to determine whether the trees in NYC have grown, on
average. Note that the field name for tree diameter in the tree_census_1995 table is
diameter.
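As a starting point for that comparison, here is a sketch of the 1995 version of the query, which swaps in the diameter column (the 2015 version uses tree_dbh again, assuming that table follows the same naming as 2005):

```sql
-- Average tree diameter for the 1995 census; note the different column name
SELECT
  AVG(diameter)
FROM
  `bigquery-public-data.new_york_trees.tree_census_1995`
```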
You are also free to choose another publicly available dataset in BigQuery and write your own
queries for extra practice—there are many interesting choices!
1.
Question 1
Reflection
You want to compare tree sizes in 2005 to tree sizes in 2015. In the activity above, you calculated the
average diameter of NYC trees in 2005. Based on the new_york_trees dataset, what is the average
diameter of NYC trees in 2015?
1 point
11.439
12.981
12.334
11.279
2.
Question 2
During this activity, you practiced writing SQL queries to return information from datasets. In the text
box below, write a 2-3 sentence (40-60 words) response to each of the following questions:
What do you think might happen if you wrote each component of a query correctly, but
rearranged the order?
1 point
Practice Quiz: Test your knowledge on using SQL with large datasets
1.
Question 1
Fill in the blank: When using SQL, the _____ clause can be used to filter a dataset of customers to
only include people who have made a purchase in the past month.
1 point
WHERE
FILTER
FROM
SELECT
2.
Question 2
Which cases are most often used for column names in a database table and represent a SQL best
practice? Select all that apply.
1 point
Camel case
Snake case
Sentence case
Lowercase
3.
Question 3
1 point
Sentence case
Snake case
Camel case
Lowercase
4.
Question 4
What can be removed from the following query without preventing it from running or changing the
results?
SELECT *
FROM `Uni_dataset.new_table`
WHERE ID = 'Lawrence'
1 point
Backticks (`)
WHERE
SELECT