Module 2
Course Outcome
Textbook: Doing Data Science, Cathy O’Neil and Rachel Schutt, O'Reilly
Media, Inc., 2013
• Rachel, while at Google, learned from former Bell Labs statisticians that
EDA is essential even with huge datasets.
https://www.youtube.com/watch?v=JXaf2I6C0Ho
About RealDirect
• Find Market Data: Since there’s no internal data yet, use external
data from sources like GitHub’s Rolling Sales Update.
— Josh Wills
Three Basic Machine Learning Algorithms:
• Linear Regression
• k-Nearest Neighbours (k-NN)
• k-means
Linear Regression
• The fitted line can help predict one thing (like a test score) if you know
the other thing (like hours studied).
• It's like finding the best-fitting path that the dots generally follow.
Linear Regression (cont.)
• Equation of linear regression:
y = β0 + β1x
y: Dependent variable (outcome)
x: Independent variable (predictor)
β0: Intercept (the value of y when x=0)
β1: Slope (the change in y for a one-unit
change in x)
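A minimal sketch of how β0 and β1 might be estimated from data, using the standard least-squares closed form. The function name fit_line and the hours/scores values are hypothetical, made up only to illustrate the equation; they are not from the textbook.

```python
def fit_line(xs, ys):
    """Return (beta0, beta1) for the least-squares line y = beta0 + beta1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta1 = sxy / sxx
    # Intercept: chosen so the line passes through the point of means.
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1


# Hypothetical data: hours studied (x) and test score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 55, 59, 59]

beta0, beta1 = fit_line(hours, scores)
# Predicted score for a student who studies 6 hours.
predicted = beta0 + beta1 * 6
print(f"intercept={beta0:.2f}, slope={beta1:.2f}, prediction={predicted:.2f}")
```

Here the slope is read as the expected change in score per extra hour studied, and the intercept as the expected score at zero hours.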
Example 1:
• A social networking site charges a monthly subscription fee of
$25.
• The first four data points are (1, 25), (10, 250), (100, 2500), and
(200, 5000).
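The data points above can be checked against the line y = β0 + β1x with a short sketch. It assumes each pair is (number of users, total monthly revenue in dollars), which the slide does not state explicitly; the names FEE, points, and predict_revenue are illustrative only.

```python
FEE = 25  # monthly subscription fee in dollars, from the example

# Assumed meaning of each pair: (number of users, total revenue in dollars).
points = [(1, 25), (10, 250), (100, 2500), (200, 5000)]


def predict_revenue(users, beta0=0, beta1=FEE):
    """Evaluate y = beta0 + beta1 * x with intercept 0 and slope equal to the fee."""
    return beta0 + beta1 * users


# Residual = observed revenue minus the revenue the line predicts.
residuals = [y - predict_revenue(x) for x, y in points]
print(residuals)  # [0, 0, 0, 0]
```

Every residual is zero, so under this assumption the four points lie exactly on the line with β0 = 0 and β1 = 25.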