Google Data Analyst
Data is a collection of facts.
The six phases of the data analysis process:
Ask: business challenge, objective, or question
Prepare: data generation, collection, storage, and data management
Process: data cleaning and data integrity
Analyze: data exploration, visualization, and analysis
Share: communicating and interpreting results
Act: putting insights to work to solve the problem
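To make the six phases concrete, here is a minimal sketch in Python that walks a toy dataset through
each one. The order records, the question, and the function names are all made up for illustration;
they are assumptions, not part of the course material.

```python
from statistics import mean

# Ask: state the business question (hypothetical example).
question = "Which region has the highest average order value?"

# Prepare: collect raw records. This data is invented and deliberately messy.
raw_orders = [
    {"region": "North", "amount": "120.0"},
    {"region": "north ", "amount": "80.0"},   # inconsistent region name
    {"region": "South", "amount": "200.0"},
    {"region": "South", "amount": None},      # missing value
    {"region": "East", "amount": "150.0"},
]

def process(records):
    """Process: drop rows with missing amounts and normalize region names."""
    cleaned = []
    for row in records:
        if row["amount"] is None:
            continue
        cleaned.append({
            "region": row["region"].strip().title(),
            "amount": float(row["amount"]),
        })
    return cleaned

def analyze(records):
    """Analyze: compute the average order amount per region."""
    by_region = {}
    for row in records:
        by_region.setdefault(row["region"], []).append(row["amount"])
    return {region: mean(amounts) for region, amounts in by_region.items()}

def share(averages):
    """Share: turn the results into short, readable summary lines."""
    return [f"{region}: {avg:.2f}" for region, avg in sorted(averages.items())]

def act(averages):
    """Act: pick the region to focus on (here, the highest average)."""
    return max(averages, key=averages.get)

averages = analyze(process(raw_orders))
summary = share(averages)
focus_region = act(averages)
```

Each function stands in for one phase, so the order of the calls mirrors the order of the phases.
In real work each phase is far larger than a single function, but the hand-off from one to the next
is the same idea.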
Data analysis is the collection, transformation, and organization of data in order to draw conclusions,
make predictions, and drive informed decision-making.
Data science, the discipline of making data useful, is an umbrella term that encompasses three
disciplines:
Machine learning: for when you want to automate, in other words, make many, many, many
decisions under uncertainty.
Statistics: for when you want to make a few important decisions under uncertainty.
Analytics: for when you don't know how many decisions you want to make before you begin, and
what you're looking for is inspiration. You want to encounter your unknown unknowns and
understand your world.
EMC Corporation's data analytics process
EMC Corporation's data analytics process is cyclical with six steps:
1. Discovery
2. Pre-processing data
3. Model planning
4. Model building
5. Communicate results
6. Operationalize
The phases aren’t static milestones; each step connects and leads to the next, and eventually
repeats. Key questions help analysts test whether they have accomplished enough to move forward
and ensure that teams have spent enough time on each of the phases and don’t start modeling
before the data is ready. It is a little different from the data analysis process on which this program
is based, but it has some core ideas in common: the first phase is about discovering and
asking questions; data has to be prepared before it can be analyzed and used; and findings
should be shared and acted on.
SAS's iterative process
An iterative data analysis process was created by a company called SAS, a leading data analytics
solutions provider. It can be used to produce repeatable, reliable, and predictive results:
1. Ask
2. Prepare
3. Explore
4. Model
5. Implement
6. Act
7. Evaluate
The SAS model emphasizes its cyclical nature by visualizing the process as an infinity symbol. The
process has seven steps, many of which mirror the other models, like ask, prepare, model, and act.
But this process is also a little different: it includes a step after the act phase designed to help
analysts evaluate their solutions and potentially return to the ask phase again.
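The loop back from evaluate to ask can be sketched in a few lines of Python. This is only an
illustration of the cyclical ordering described above; the names SAS_PHASES and run_iterations are
hypothetical and do not come from SAS.

```python
from itertools import cycle, islice

# The seven steps listed above, in order.
SAS_PHASES = ["Ask", "Prepare", "Explore", "Model", "Implement", "Act", "Evaluate"]

def run_iterations(passes):
    """Return the phases visited over `passes` full trips around the loop."""
    return list(islice(cycle(SAS_PHASES), passes * len(SAS_PHASES)))

visited = run_iterations(2)

# The phase that follows the first Evaluate is Ask again.
next_after_evaluate = visited[visited.index("Evaluate") + 1]
```

Because the phases are cycled rather than run once, the step after Evaluate is always Ask, which is
the point the infinity-symbol picture is making.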
Project-based data analytics process
A project-based data analytics process has five simple steps:
1. Identifying the problem
2. Designing data requirements
3. Pre-processing data
4. Performing data analysis
5. Visualizing data
It doesn’t include a sixth phase, the act phase. However, it still covers many of the same steps
described: it begins with identifying the problem, prepares and processes data before analysis, and
ends with data visualization.
Big data analytics process
Authors Thomas Erl, Wajid Khattak, and Paul Buhler proposed a big data analytics process in their
book, Big Data Fundamentals: Concepts, Drivers & Techniques. Their process suggests phases divided
into nine steps:
1. Business case evaluation
2. Data identification
3. Data acquisition and filtering
4. Data extraction
5. Data validation and cleaning
6. Data aggregation and representation
7. Data analysis
8. Data visualization
9. Utilization of analysis results
This process appears to have three or four more steps than the previous models. In reality, the
authors have just broken down what has been referred to as prepare and process into smaller steps.
The process emphasizes the individual tasks required for gathering, preparing, and cleaning data
before the analysis phase.
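One way to see this is to line the nine steps up against the six phases. The mapping below is a rough
reading made for these notes, not something taken from the book, so treat each pairing as an
assumption.

```python
from collections import Counter

# Assumed mapping of each big data step to one of the six phases.
# The pairings are an interpretation for illustration, not the authors' own.
BIG_DATA_STEPS = [
    ("Business case evaluation", "Ask"),
    ("Data identification", "Prepare"),
    ("Data acquisition and filtering", "Prepare"),
    ("Data extraction", "Prepare"),
    ("Data validation and cleaning", "Process"),
    ("Data aggregation and representation", "Process"),
    ("Data analysis", "Analyze"),
    ("Data visualization", "Share"),
    ("Utilization of analysis results", "Act"),
]

# Count how many of the nine steps fall under each phase.
steps_per_phase = Counter(phase for _, phase in BIG_DATA_STEPS)

# Steps that cover gathering, preparing, and cleaning data.
prep_steps = steps_per_phase["Prepare"] + steps_per_phase["Process"]
```

Under this reading, five of the nine steps sit inside prepare and process, which is where the extra
steps come from.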
What is the data ecosystem?
Data ecosystems are made up of various elements that interact with one another in order to
produce, manage, store, organize, analyze, and share data. These elements include hardware and
software tools, and the people who use them.
Data can also be found in something called the cloud. The cloud is a place to keep data online, rather
than on a computer hard drive. So instead of storing data somewhere inside your organization's
network, that data is accessed over the internet.
The difference between data scientists and data analysts:
Data science is defined as creating new ways of modeling and understanding the unknown by using
raw data.
Data scientists create new questions using data, while analysts find answers to existing questions by
creating insights from data sources.
Google Cloud Data Analyst
The cloud is important because it reflects the direction in which computation is heading.
Cloud computing is the practice of using on-demand computing resources as services hosted over the
Internet. The "over the Internet" part is what makes it the cloud. It eliminates the need for
organizations to find, set up, or manage resources themselves, and they only pay for what they use.
It's the unique infrastructure of a cloud computing model that makes all of this possible. This
infrastructure has four main components:
Hardware: Types of hardware include servers, processors and memory, network switches,
routers and cables, firewalls and load balancers, cooling systems, and power supplies. These
are the physical items needed to keep things running.
Storage
Network
Virtualization