1.1 Introduction To Data Science
Objectives
Defining Data Science
Fundamentals of Data Science
The Many Paths to Data Science
Advice for New Data Scientists
Defining Data Science
Data domain
Math and Statistics
Computer Science
Defining Data Science
The name came up in the 1980s and 1990s, when some professors
who were looking into the statistics curriculum thought it would
be better to call it data science.
Data Science is a process, not an event.
It is the process of using data to understand different things, to
understand the world.
Data science is the process of uncovering the insights and
trends that are hiding behind data. It's when you translate data
into a story.
With these insights, you can make strategic choices for a company
or an institution.
Defining Data Science
Data science is a field about the processes and systems used to
extract data from its various forms, whether structured or
unstructured.
Data science is the study of data.
What is data science? It is one's attempt to work with data to
find answers to the questions one is exploring.
If you have data and curiosity, and you are working with the
data, manipulating it, and exploring it, then the very exercise
of analyzing the data to get some answers from it is data
science.
Defining Data Science
Why is data science relevant today?
We have tons of data available
We have algorithms
We have free, open-source software
We can store datasets at a very low cost
The tools to work with data, the availability of data, and the
ability to store and analyze data are all cheap, all available,
and all ubiquitous.
There's never been a better time to be a data scientist.
Fundamentals of Data Science
Data analysis is a significant component of data science.
Data analysis isn't new.
What is new is the vast quantity of data available from massively
varied sources:
log files
email
social media
sales data
patient information files
sports performance data
sensor data
security cameras
…
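As a small illustration of working with one of these sources, the sketch below turns raw log lines into structured records that can be counted. The log format, the sample lines, and the names `LINE_PATTERN` and `parse_line` are assumptions made up for this example; real log files vary widely.

```python
import re
from collections import Counter

# Hypothetical log lines; the "date time LEVEL message" layout is an assumption.
LOG_LINES = [
    "2024-01-05 09:12:44 ERROR payment timeout",
    "2024-01-05 09:13:02 INFO user login",
    "2024-01-05 09:15:30 ERROR payment timeout",
    "not a log line",
]

# One named group per field we want to extract from a line.
LINE_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<message>.+)$"
)


def parse_line(line):
    """Return a dict of fields for a well-formed line, or None otherwise."""
    match = LINE_PATTERN.match(line)
    return match.groupdict() if match else None


# Keep only the lines that match the assumed format.
records = [r for r in map(parse_line, LOG_LINES) if r is not None]

# A first, very simple analysis: how many entries are there per level?
level_counts = Counter(r["level"] for r in records)
print(level_counts)
```

Lines that do not match the assumed format are skipped rather than guessed at, which keeps the later counts trustworthy.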
Fundamentals of Data Science
Computing power is needed to make a useful analysis and
reveal new knowledge.
Data science can help organizations understand their
environments, analyze existing issues, and reveal previously
hidden opportunities.
Data scientists use data analysis to add to the knowledge of the
organization by investigating data and exploring the best way to
use it to provide value to the business.
Many organizations will use data science to focus on a specific
problem, and so it's essential to clarify the question that the
organization wants answered.
Fundamentals of Data Science
Data science steps in general:
Clarify the question. This first and most crucial step defines how
the data science project progresses: good data scientists are curious
people who ask questions to clarify the business need.
Identify the data. The next questions are: "What data do we need to
solve the problem, and where will that data come from?" Data
scientists can analyze structured and unstructured data from many
sources.
Explore the data. Using multiple models to explore the data reveals
patterns.
Communicate the results. Data scientists can use powerful data
visualization tools to help stakeholders understand the nature of the
results and the recommended action to take.
Data science is changing the way we work; it's changing the way we
use data and it’s changing the way organizations understand the
world.
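The general steps on this slide can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the business question, the numbers, and the helper names `fit_line` and `mean_abs_error` are all made up for the example, and a plain text summary stands in for a real visualization tool.

```python
from statistics import mean

# Step 1: clarify the question (a hypothetical business question).
QUESTION = "Does weekly advertising spend help predict weekly sales?"

# Step 2: identify the data. These values are invented for illustration.
ad_spend = [10.0, 20.0, 30.0, 40.0, 50.0]  # hypothetical, in thousands
sales = [25.0, 45.0, 65.0, 85.0, 105.0]    # hypothetical, in thousands


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_abs_error(predictions, actual):
    """Average absolute difference between predictions and observed values."""
    return mean(abs(p - a) for p, a in zip(predictions, actual))


# Step 3: explore the data with more than one model.
baseline = [mean(sales)] * len(sales)          # model A: always predict the average
slope, intercept = fit_line(ad_spend, sales)   # model B: a straight-line trend
line = [slope * x + intercept for x in ad_spend]

errors = {
    "average only": mean_abs_error(baseline, sales),
    "linear trend": mean_abs_error(line, sales),
}

# Step 4: communicate the result in terms a stakeholder can act on.
best = min(errors, key=errors.get)
print(QUESTION)
for name, err in errors.items():
    print(f"  {name}: mean absolute error = {err:.1f}")
print(f"Best model on this sample: {best}")
```

The errors here are measured on the same handful of points used for fitting, so the comparison only shows the shape of the process; a real project would evaluate on data held back from fitting.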
The Many Paths to Data Science
As a named profession, data science did not really exist until
around 2009 to 2011.
Statistics is a classic form of data science. We studied math and
statistics; that is how we started.
Strategic consulting firms use data science to make decisions.
What we were doing was data science:
Analyzing electronic point of sale data for retail manufacturers
Transportation research
There are many paths to data science
The Many Paths to Data Science
An example path to data science: transportation research.
Building large models to forecast traffic on streets
Trying to determine congestion and greenhouse gas emissions or tailpipe
emissions
Working with very large data sets, for example household samples of, say,
150,000 households and half a million trips
Building even bigger models that involved data and analytics
Advice for New Data Scientists
An aspiring data scientist should be curious, extremely
argumentative, and judgmental.
Curiosity is an absolute must. If you are not curious, you will
not know what to do with the data.
Be judgmental, because without preconceived notions about
things you would not know where to begin.
Be argumentative, because if you can argue and plead a case,
you can at least start somewhere. Then you learn from the data
and modify your assumptions and hypotheses.
Advice for New Data Scientists
Do not be afraid of starting from the wrong point. You may say,
"I thought I believed this, but now with data I know this." This
gives you a learning process.
The other thing a data scientist needs is some comfort and
flexibility with analytics platforms: some software, some
computing platform.
The last thing a data scientist needs is the ability to “tell a
story”. Once you have your analytics and your tabulations, you
should be able to tell a great story from them. If you do not,
your findings will remain hidden and buried, and nobody will
know about them.
Advice for New Data Scientists
Each field, such as health, education, or business, has different
characteristics and therefore requires a different set of skills.
First figure out what you are interested in and what your
competitive advantage is.
Your competitive advantage is your understanding of some
aspect of life in which you exceed others.
You need to understand which platforms to learn; those tools
will be specific to the industry that you are interested in.
The next thing would be to apply your skills to real problems,
and then tell the rest of the world what you can do with it.
Summary
Defining Data Science
Fundamentals of Data Science
The Many Paths to Data Science
Advice for New Data Scientists
Q&A