Course 2 - 121756
1. Ask
It’s impossible to solve a problem if you don’t know what it is. These are some
things to consider:
2. Prepare
You will decide what data you need to collect in order to answer your questions
and how to organize it so that it is useful. You might use your business task to
decide:
3. Process
Clean data is the best data and you will need to clean up your data to get rid of
any possible errors, inaccuracies, or inconsistencies. This might mean:
What data errors or inaccuracies might get in my way of getting the best
possible answer to the problem I am trying to solve?
How can I clean my data so the information I have is more consistent?
4. Analyze
You will want to think analytically about your data. At this stage, you might sort
and format your data to make it easier to:
Perform calculations
Combine data from multiple sources
Create tables with your results
5. Share
How can I make what I present to the stakeholders engaging and easy to
understand?
What would help me understand this if I were the listener?
6. Act
Now it’s time to act on your data. You will take everything you have learned from
your data analysis and put it to use. This could mean providing your stakeholders
with recommendations based on your findings so they can make data-driven
decisions.
Questions to ask yourself in this step:
How can I use the feedback I received during the share phase (step 5) to
actually meet the stakeholder’s needs and expectations?
These six steps can help you to break the data analysis process into smaller,
manageable parts, which is called structured thinking. This process involves
four basic activities:
When you are starting out in your career as a data analyst, it is normal to feel
pulled in a few different directions with your role and expectations. Following
processes like the ones outlined here and using structured thinking skills can
help get you back on track, fill in any gaps and let you know exactly what you
need.
1. Making predictions.
This problem type involves using data to make an informed decision about how
things may be in the future.
For example, a hospital system might use remote patient monitoring to predict
health events for chronically ill patients. The patients would take their health
vitals at home every day, and that information combined with data about their
age, risk factors, and other important details could enable the hospital's algorithm
to predict future health problems and even reduce future hospitalizations.
2. Categorizing things.
This means assigning information to different groups or clusters based on
common features.
4. Identifying themes
Identifying themes takes categorization a step further by grouping information
into broader concepts.
Let's go back to our manufacturer that has just reviewed data on the shop floor
employees. First, these people are grouped by types and tasks. But now a data
analyst could take those categories and group them into the broader concept of
low productivity and high productivity. This would make it possible for the
business to see who is most and least productive, in order to reward top
performers and provide additional support to those workers who need more
training.
5. Discovering connections
Discovering connections enables data analysts to find similar challenges faced by
different entities, and
then combine data and insights to address them.
Here's what I mean: say a scooter company is experiencing an issue with the
wheels it gets from its wheel supplier. That company would have to stop
production until it could get safe, quality wheels back in stock. But meanwhile,
the wheel company is encountering a problem with the rubber it uses to make
wheels. It turns out its rubber supplier could not find the right materials either. If all
of these entities could talk about the problems they're facing and share data
openly, they would find a lot of similar challenges and better yet, be able to
collaborate to find a solution.
6. Finding patterns.
Data analysts use data to find patterns by using historical data to understand
what happened in the past and is therefore likely to happen again. E-commerce
companies use data to find patterns all the time. Data analysts look at transaction
data to understand customer buying habits at certain points in time throughout
the year. They may find that customers buy more canned goods right before a
hurricane, or they purchase fewer cold-weather accessories like hats and gloves
during warmer months. The e-commerce companies can use these insights to
make sure they stock the right amount of products at these key times.
Making predictions
A company that wants to know the best advertising method to bring in new customers is an
example of a problem requiring analysts to make predictions. Analysts with data on location, type
of media, and number of new customers acquired as a result of past ads can't guarantee future
results, but they can help predict the best placement of advertising to reach the target audience.
Spotting something unusual
A company that sells smart watches that help people monitor their health would be interested in
designing their software to spot something unusual. Analysts who have analyzed aggregated
health data can help product developers determine the right algorithms to spot and set off alarms
when certain data doesn't trend normally.
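As a rough illustration of setting off an alarm when data doesn't trend normally, here is a minimal Python sketch. The readings, the function name flag_unusual, the 7-day baseline, and the threshold of 3 standard deviations are all assumptions made for this example; real health products would use far more careful methods.

```python
from statistics import mean, stdev

def flag_unusual(readings, baseline_size=7, threshold=3.0):
    """Return the indexes of readings that sit far from the baseline.

    The first `baseline_size` readings define what "normal" looks like.
    Later readings are flagged when they are more than `threshold`
    standard deviations away from the baseline mean.
    """
    baseline = readings[:baseline_size]
    center = mean(baseline)
    spread = stdev(baseline)
    flagged = []
    for index, value in enumerate(readings[baseline_size:], start=baseline_size):
        if spread > 0 and abs(value - center) / spread > threshold:
            flagged.append(index)
    return flagged

# Hypothetical resting heart rates (beats per minute), one reading per day.
resting_heart_rate = [62, 64, 63, 61, 65, 63, 62, 64, 63, 95, 62]
print(flag_unusual(resting_heart_rate))  # prints [9]
```

Only the reading of 95 is flagged, because it is the only value far outside the range seen during the baseline week.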
Identifying themes
User experience (UX) designers might rely on analysts to analyze user interaction data. Similar to
problems that require analysts to categorize things, usability improvement projects might require
analysts to identify themes to help prioritize the right product features for improvement. Themes
are most often used to help researchers explore certain aspects of data. In a user study, user
beliefs, practices, and needs are examples of themes.
By now you might be wondering if there is a difference between categorizing things and
identifying themes. The best way to think about it is: categorizing things involves assigning items
to categories; identifying themes takes those categories a step further by grouping them into
broader themes.
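The two steps can be sketched in Python to make the difference concrete. The records, the function names, and the rule "a category is high productivity when its average meets the target" are assumptions made for this example, not something the course prescribes.

```python
from statistics import mean

# Hypothetical records: (employee, task category, percent of output target met).
records = [
    ("Ana", "assembly", 112),
    ("Ben", "assembly", 104),
    ("Chi", "packing", 81),
    ("Dev", "packing", 93),
    ("Eli", "welding", 120),
]

def categorize(rows):
    """Step 1: assign each employee's score to a task category."""
    categories = {}
    for _name, task, score in rows:
        categories.setdefault(task, []).append(score)
    return categories

def identify_themes(categories, target=100):
    """Step 2: group whole categories into broader productivity themes."""
    themes = {"high productivity": [], "low productivity": []}
    for task, scores in categories.items():
        theme = "high productivity" if mean(scores) >= target else "low productivity"
        themes[theme].append(task)
    return themes

themes = identify_themes(categorize(records))
print(themes)
# prints {'high productivity': ['assembly', 'welding'], 'low productivity': ['packing']}
```

Categorizing answers "which group does this item belong to?", while identifying themes answers "what broader idea do these groups share?".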
Discovering connections
A third-party logistics company working with another company to get shipments delivered to
customers on time is a problem requiring analysts to discover connections. By analyzing the wait
times at shipping hubs, analysts can determine the appropriate schedule changes to increase the
number of on-time deliveries.
Finding patterns
Minimizing downtime caused by machine failure is an example of a problem requiring analysts to
find patterns in data. For example, by analyzing maintenance data, they might discover that most
failures happen if regular maintenance is delayed by more than a 15-day window.
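One way to look for this kind of pattern is to compare failure rates for maintenance done within the window against maintenance delayed beyond it. The sketch below uses a made-up maintenance log and a hypothetical helper, failure_rate_by_delay; it only shows the shape of the comparison, not a real analysis.

```python
# Hypothetical maintenance log: (days maintenance was delayed, machine failed afterwards?).
maintenance_log = [
    (0, False), (3, False), (7, False), (12, True), (14, False),
    (16, True), (18, True), (21, False), (25, True), (30, True),
]

def failure_rate_by_delay(log, window_days=15):
    """Compare failure rates for delays within and beyond the window."""
    counts = {"within window": [0, 0], "beyond window": [0, 0]}  # [failures, total]
    for delay_days, failed in log:
        group = "beyond window" if delay_days > window_days else "within window"
        counts[group][0] += int(failed)
        counts[group][1] += 1
    return {
        group: failures / total
        for group, (failures, total) in counts.items()
        if total  # skip empty groups to avoid dividing by zero
    }

rates = failure_rate_by_delay(maintenance_log)
print(rates)  # prints {'within window': 0.2, 'beyond window': 0.8}
```

With this invented log, machines whose maintenance slipped past 15 days failed four times as often, which is the kind of pattern that would justify a closer look.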
Smart Questions
Time-bound questions specify the time to be studied. The time period we want
to study is 1983 to 2004. This limits the range of possibilities and enables the
data analyst to focus on relevant data. Okay, now that you have a general
understanding of SMART questions, there's something else that's very important
to keep in mind when crafting questions: fairness.
Here's an example that breaks down the thought process of turning a problem
question into one or more SMART questions using the SMART method:
Questions should be open-ended. This is the best way to get responses that will
help you accurately qualify or disqualify potential solutions to your specific
problem. So, based on the thought process, possible SMART questions might
be:
On a scale of 1-10 (with 10 being the most important) how important is your
car having four-wheel drive?
What are the top five features you would like to see in a car package?
What features, if included with four-wheel drive, would make you more
inclined to buy the car?
How much more would you pay for a car with four-wheel drive?
Has four-wheel drive become more or less popular in the last three years?
Reports and dashboards are both useful for data visualization. But there
are pros and cons for each of them. A report is a static collection of data given to
stakeholders periodically. A dashboard, on the other hand, monitors live,
incoming data. Let's talk about reports first. Reports are great for giving
snapshots of high level historical data for an organization. There are some
downsides to keep in mind too. Reports need regular maintenance and aren't
very visually appealing. Because they aren't automatic or dynamic, reports don't
show live, evolving data. For a live reflection of incoming data, you'll want to
design a dashboard. Dashboards are great for a lot of reasons: they give your
team more access to information being recorded, you can interact with data
by playing with filters, and because they're dynamic, they have long-term value.
But dashboards do have some cons too. For one thing, they take a lot of time to
design and can actually be less efficient than reports, if they're not used very
often. If the base table breaks at any point, they need a lot of maintenance to get
back up and running again. Dashboards can sometimes overwhelm people with
information too. If you aren't used to looking through data on a dashboard, you
might get lost in it.
A pivot table is a data summarization tool that is used in data processing.
Pivot tables are used to summarize, sort, re-organize, group, count, total, or
average data stored in a database. They allow users to transform columns into
rows and rows into columns.
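Spreadsheets build pivot tables for you, but the idea can be sketched in a few lines of Python. The sales rows and the helper names pivot_sum and transpose are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical sales rows: (region, quarter, revenue).
sales = [
    ("North", "Q1", 100), ("North", "Q2", 150),
    ("South", "Q1", 80), ("South", "Q2", 120),
    ("North", "Q1", 50),
]

def pivot_sum(rows):
    """Total the values with regions as rows and quarters as columns."""
    table = defaultdict(lambda: defaultdict(int))
    for row_key, column_key, value in rows:
        table[row_key][column_key] += value
    return {row_key: dict(columns) for row_key, columns in table.items()}

def transpose(table):
    """Swap rows and columns: quarters become rows, regions become columns."""
    flipped = defaultdict(dict)
    for row_key, columns in table.items():
        for column_key, value in columns.items():
            flipped[column_key][row_key] = value
    return dict(flipped)

by_region = pivot_sum(sales)
by_quarter = transpose(by_region)
print(by_region)   # prints {'North': {'Q1': 150, 'Q2': 150}, 'South': {'Q1': 80, 'Q2': 120}}
print(by_quarter)  # prints {'Q1': {'North': 150, 'South': 80}, 'Q2': {'North': 150, 'South': 120}}
```

The five detail rows collapse into one total per region and quarter, and the same summary can be read either by region or by quarter.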
Creating a dashboard
Here is a process you can follow to create a dashboard:
1. Identify the stakeholders who need to see the data and how they will use it
Use these tips to help make your dashboard design clear, easy to follow, and
simple:
You have a lot of options here and it all depends on what data story you
are telling. If you need to show a change of values over time, line charts or bar
graphs might be the best choice. If your goal is to show how each part
contributes to the whole amount being reported, a pie or donut chart is probably
a better choice.
Filters show certain data while hiding the rest of the data in a dashboard.
This can be a big help to identify patterns while keeping the original data intact. It
is common for data analysts to use and share the same dashboard, but manage
their part of it with a filter.
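The key property of a filter is that it changes what you see, not what is stored. A minimal Python sketch of that idea, using invented order rows and a hypothetical apply_filter helper:

```python
# Hypothetical dashboard rows shared by several analysts.
orders = [
    {"region": "East", "product": "scooter", "units": 12},
    {"region": "West", "product": "helmet", "units": 30},
    {"region": "East", "product": "helmet", "units": 7},
]

def apply_filter(rows, **criteria):
    """Return only the rows matching every criterion; the input list is not changed."""
    return [
        row for row in rows
        if all(row.get(field) == wanted for field, wanted in criteria.items())
    ]

east_view = apply_filter(orders, region="East")
print(len(east_view), len(orders))  # prints 2 3
```

Each analyst can build a view like east_view for their own part of the data while the shared rows in orders stay intact.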
Types of Dashboard
Strategic dashboards
Operational dashboards
Mathematical Thinking
Small data can be really small. These kinds of data tend to be made up of
data sets concerned with specific metrics over a short, well defined period of
time.
Big data, on the other hand, has larger, less specific data sets covering a
longer period of time. They usually have to be broken down to be analyzed. Big
data is useful for looking at large-scale questions and problems, and it helps
companies make big decisions.
A lot of organizations deal with data overload and way too much unimportant
or irrelevant information.
Important data can be hidden deep down with all of the non-important data,
which makes it harder to find and use. This can lead to slower and more
inefficient decision-making time frames.
The data you need isn’t always easily accessible.
Current technology tools and solutions still struggle to provide measurable
and reportable data. This can lead to unfair algorithmic bias.
There are gaps in many big data business solutions.
Now for the good news! Here are some benefits that come with big data:
When large amounts of data can be stored and analyzed, it can help
companies identify more efficient ways of doing business and save a lot of
time and money.
Big data helps organizations spot the trends of customer buying patterns and
satisfaction levels, which can help them create new products and solutions
that will make customers happy.
By analyzing big data, businesses get a much better understanding of current
market conditions, which can help them stay ahead of the competition.
As in our earlier social media example, big data helps companies keep track
of their online presence—especially feedback, both good and bad, from
customers. This gives them the information they need to improve and protect
their brand.
When thinking about the benefits and challenges of big data, it helps to think
about the three Vs: volume, variety, and velocity. Volume describes the
amount of data. Variety describes the different kinds of data. Velocity describes
how fast the data can be processed. Some data analysts also consider a fourth
V: veracity. Veracity refers to the quality and reliability of the data. These are all
important considerations related to processing huge, complex data sets.
Module 3
Click the gray triangle above row number 1 and to the left of Column A to
select all cells in the spreadsheet.
From the main menu, click Home, and then click Conditional Formatting to
select Highlight Cell Rules > More Rules.
For Select a Rule Type, choose Use a formula to determine which cells to
format.
Click the Format button, select the Fill tab, select yellow (or any other color),
and then click OK.
Click OK to close the format rule window.
Problem Domain
The specific area of analysis that encompasses every area of activity affecting or
affected by the problem
Now, as a data analyst, your scope of work will be a bit more technical
and include those basic items we just mentioned, but you'll also focus on things
like data preparation, validation, analysis of quantitative and qualitative datasets,
initial results, and maybe even some visuals to really get the point across.
Deliverables are items or tasks you will complete before you can finish
the project.
Milestones are significant tasks you will confirm along your timeline to
help everyone know the project is on track.
Deliverables: What work is being done, and what things are being created
as a result of this project? When the project is complete, what are you
expected to deliver to the stakeholders? Be specific here. Will you collect
data for this project? How much, or for how long?
Avoid vague statements. For example, “fixing traffic problems” doesn’t specify
the scope. This could mean anything from filling in a few potholes to building a
new overpass. Be specific! Use numbers and aim for hard, measurable goals
and objectives. For example: “Identify top 10 issues with traffic patterns within the
city limits, and identify the top 3 solutions that are most cost-effective for reducing
traffic congestion.”
Milestones: This is closely related to your timeline. What are the major
milestones for progress in your project? How do you know when a given part
of the project is considered complete?
Milestones can be identified by you, by stakeholders, or by other team members
such as the Project Manager. Smaller examples might include incremental steps
in a larger project like “Collect and process 50% of required data (100 survey
responses)”, but may also be larger examples like “complete initial data analysis
report” or “deliver completed dashboard visualizations and analysis reports to
stakeholders”.
Timeline: Your timeline will be closely tied to the milestones you create for
your project. The timeline is a way of mapping expectations for how long
each step of the process should take. The timeline should be specific enough
to help all involved decide if a project is on schedule. When will the
deliverables be completed? How long do you expect the project will take to
complete? If all goes as planned, how long do you expect each component of
the project will take? When can we expect to reach each milestone?
Reports: Good SOWs also set boundaries for how and when you’ll give
status updates to stakeholders. How will you communicate progress with
stakeholders and sponsors, and how often? Will progress be reported
weekly? Monthly? When milestones are completed? What information will
status reports contain?
At a minimum, any SOW should answer all the relevant questions in the above
areas. Note that these areas may differ depending on the project. But at their
core, the SOW document should always serve the same purpose by containing
information that is specific, relevant, and accurate. If something changes in the
project, your SOW should reflect those changes.
SOWs should also contain information specific to what is and isn’t considered
part of the project. The scope of your project is everything that you are expected
to complete or accomplish, defined to a level of detail that doesn’t leave any
ambiguity or confusion about whether a given task or item is part of the project or
not.
Notice how the previous example about studying traffic congestion defined its
scope as the area within the city limits. This doesn’t leave any room for confusion
— stakeholders need only to refer to a map to tell if a stretch of road or
intersection is part of the project or not. Defining requirements can be trickier
than it sounds, so it’s important to be as specific as possible in these documents,
and to use quantitative statements whenever possible.
For example, assume that you’re assigned to a project that involves studying the
environmental effects of climate change on the coastline of a city: How do you
define what parts of the coastline you are responsible for studying, and which
parts you are not?
In this case, it would be important to define the area you’re expected to study
using GPS locations, or landmarks. Using specific, quantifiable statements will
help ensure that everyone has a clear understanding of what’s expected.
“The best thing you can do for the fairness and accuracy of your data, is to make
sure you start with an accurate representation of the population, and collect the
data in the most appropriate, and objective way. Then, you'll have the facts so
you can pass on to your team”
Context can turn raw data into meaningful information. It is very important for
data analysts to contextualize their data. This means giving the data perspective
by defining it. To do this, you need to identify:
Who: The person or organization that created, collected, and/or funded the
data collection
What: The things in the world that data could have an impact on
Where: The origin of the data
When: The time when the data was created or collected
Why: The motivation behind the creation or collection
How: The method used to create or collect it
Module 4
Your data analysis project should answer the business task and create
opportunities for data-driven decision-making. That's why it is so important to
focus on project stakeholders. As a data analyst, it is your responsibility to
understand and manage your stakeholders’ expectations while keeping the
project goals front and center.
You might remember that stakeholders are people who have invested time,
interest, and resources into the projects that you are working on. This can be a
pretty broad group, and your project stakeholders may change from project to
project. But there are three common stakeholder groups that you might find
yourself working with: the executive team, the customer-facing team, and the
data science team.
Let’s get to know more about the different stakeholders and their goals. Then
we'll learn some tips for communicating with them effectively.
1. Executive team
For example, you might find yourself working with the vice president of human
resources on an analysis project to understand the rate of employee absences. A
marketing director might look to you for competitive analyses. Part of your job will
be balancing what information they will need to make informed decisions with
their busy schedule.
But you don’t have to tackle that by yourself. Your project manager will be
overseeing the progress of the entire team, and you will be giving them more
regular updates than someone like the vice president of HR. They are able to
give you what you need to move forward on a project, including getting approvals
from the busy executive team. Working closely with your project manager can
help you pinpoint the needs of the executive stakeholders for your project, so
don’t be afraid to ask them for guidance.
2. Customer-facing team
Let’s say a customer-facing team is working with you to build a new version of a
company’s most popular product. Part of your work might involve collecting and
sharing data about consumers’ buying behavior to help inform product features.
Here, you want to be sure that your analysis and presentation focus on what is
actually in the data, not on what your stakeholders hope to find.
When you're working with each group of stakeholders, from the executive team
to the customer-facing team to the data science team, you'll often have to go
beyond the data. Use the following tips to communicate clearly, establish trust,
and deliver your findings across groups.
Discuss goals. Stakeholder requests are often tied to a bigger project or goal.
When they ask you for something, take the opportunity to learn more. Start a
discussion. Ask about the kind of results the stakeholder wants. Sometimes, a
quick chat about goals can help set expectations and plan the next steps.
Feel empowered to say “no.” Let’s say you are approached by a marketing
director who has a “high-priority” project and needs data to back up their
hypothesis. They ask you to produce the analysis and charts for a presentation
by tomorrow morning. Maybe you realize their hypothesis isn’t fully formed and
you have helpful ideas about a better way to approach the analysis. Or maybe
you realize it will take more time and effort to perform the analysis than
estimated. Whatever the case may be, don’t be afraid to push back when you
need to.
Stakeholders don’t always realize the time and effort that goes into collecting and
analyzing data. They also might not know what they actually need. You can help
stakeholders by asking about their goals and determining whether you can
deliver what they need. If you can’t, have the confidence to say “no,” and provide
a respectful explanation. If there’s an option that would be more helpful, point the
stakeholder toward those resources. If you find that you need to prioritize other
projects first, discuss what you can prioritize and when. When your stakeholders
understand what needs to be done and what can be accomplished in a given
timeline, they will usually be comfortable resetting their expectations. You should
feel empowered to say no; just remember to give context so others understand
why.
Plan for the unexpected. Before you start a project, make a list of potential
roadblocks. Then, when you discuss project expectations and timeline with your
stakeholders, give yourself some extra time for problem-solving at each stage of
the process.
Know your project. Keep track of your discussions about the project over email
or reports, and be ready to answer questions about how certain aspects are
important for your organization. Get to know how your project connects to the
rest of the company and get involved in providing the most insight possible. If you
have a good understanding about why you are doing an analysis, it can help you
connect your work with other goals and be more effective at solving larger
problems.
Start with words and visuals. It is common for data analysts and stakeholders
to interpret things in different ways while assuming the other is on the same
page. This illusion of agreement has been historically identified as a cause of
projects going back-and-forth a number of times before a direction is finally
nailed down. To help avoid this, start with a description and a quick visual of what
you are trying to convey. Stakeholders have many points of view and may prefer
to absorb information in words or pictures. Work with them to make changes and
improvements from there. The faster everyone agrees, the faster you can
perform the first analysis to test the usefulness of the project, measure the
feedback, learn from the data, and implement changes.
After the next report is completed, you can also send out a project update
offering more information. The email could look like this:
Telling a Story
Compare the same types of data: Data can get mixed up when you chart it
for visualization. Be sure to compare the same types of data and double
check that any segments in your chart definitely display different metrics.
Visualize with care: A 0.01% drop in a score can look huge if you zoom in
close enough. To make sure your audience sees the full story clearly, it is a
good idea to start your Y-axis at 0.
Leave out needless graphs: If a table can show your story at a glance, stick
with the table instead of a pie chart or a graph. Your busy audience will
appreciate the clarity.
Test for statistical significance: Sometimes two data sets will look
different, but you will need a way to test whether the difference is real and
important. So remember to run statistical tests to see how much confidence
you can place in that difference.
Pay attention to sample size: Gather lots of data. If a sample size is small,
a few unusual responses can skew the results. If you find that you have too
little data, be careful about using it to form judgments. Look for opportunities
to collect more data, then chart those trends over longer periods.
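The tips on statistical significance and sample size can be illustrated together. The sketch below uses a permutation test, which is one of several possible tests and is chosen here only because it needs nothing beyond the Python standard library. The scores, the function name permutation_p_value, and the number of rounds are assumptions made for this example.

```python
import random

def average(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, rounds=5000, seed=0):
    """Estimate how often randomly shuffled groups differ as much as the real ones.

    A small result means the observed gap is unlikely to be due to chance alone.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    observed = abs(average(group_a) - average(group_b))
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)
    at_least_as_large = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        gap = abs(average(pooled[:size_a]) - average(pooled[size_a:]))
        if gap >= observed:
            at_least_as_large += 1
    return at_least_as_large / rounds

# Hypothetical satisfaction scores before and after a product change.
before = [72, 75, 78, 74, 77, 73, 76, 79]
after = [81, 84, 80, 85, 83, 82, 86, 79]
p_large_sample = permutation_p_value(before, after)

# The same comparison with only two responses per group.
p_small_sample = permutation_p_value([72, 79], [81, 80])

print(p_large_sample, p_small_sample)
```

With eight responses per group, the estimated p-value is well below the common 0.05 cutoff, so the difference is unlikely to be chance. With only two responses per group, about a third of random shuffles produce a gap just as large, so the same-looking difference should not be trusted.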