Intro to Machine Learning
Data Science Club NYU
Sumedh Garimella, November 6, 2024
About Me
Sumedh Garimella
• Data Scientist at Capgemini
• Python, Azure, GenAI
• Georgia Tech
• BS CS ’22, MS CS ’23
• Previous experience
• Interned at Coca-Cola, Bank of America, NCR
• Hobbies
• EDM Producer, DJ
• Cooking
What is Machine Learning?
• subset of Artificial Intelligence
• lets computers learn from data and build
models instead of following hard-coded rules
• Supervised vs Unsupervised Learning
• Unsupervised: let the computer find its own
structure (e.g., clusters) in unlabeled data
• Supervised: provide labeled training data so
the model learns how inputs relate to the target variable(s)
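A minimal pure-Python sketch of the distinction, using made-up 1-D data: the supervised part uses the labels to learn one centroid per class, while the unsupervised part (a tiny k-means with k=2) only ever sees the raw values. The data and the function names `fit_nearest_centroid`, `predict`, and `two_means` are illustrative assumptions, not code from the talk.

```python
# Hypothetical 1-D toy data (e.g., a single numeric feature per email)
points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
labels = ["ham", "ham", "ham", "spam", "spam", "spam"]  # used only by the supervised part


def fit_nearest_centroid(xs, ys):
    """Supervised: use the labels to compute the mean feature value of each class."""
    centroids = {}
    for label in set(ys):
        members = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = sum(members) / len(members)
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def two_means(xs, iterations=10):
    """Unsupervised: 1-D k-means with k=2; labels are never consulted."""
    c0, c1 = min(xs), max(xs)  # start the two centers at the extremes
    for _ in range(iterations):
        group0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        group1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        if group0:  # keep the old center if a group ends up empty
            c0 = sum(group0) / len(group0)
        if group1:
            c1 = sum(group1) / len(group1)
    return c0, c1


centroids = fit_nearest_centroid(points, labels)
print(predict(centroids, 7.5))  # spam
print(two_means(points))        # (1.5, 8.5)
```

Both approaches end up with centers at 1.5 and 8.5, but only the supervised one can attach the names "ham" and "spam" to them, because only it was given labels.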
Classification
• Prediction of discrete values
• Use Cases:
• Spam vs non-spam
• Positive vs negative for
diseases
• Genre classification
• Churn prediction
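To make "prediction of discrete values" concrete, here is a hedged sketch of one simple classifier, k-nearest neighbors, in plain Python. The spam features, the training points, and `knn_predict` are hypothetical; the point is only that the output is a label rather than a number. It assumes Python 3.8+ for `math.dist`.

```python
import math
from collections import Counter

# Hypothetical training set: (features, label).
# Features might be (number of links, number of ALL-CAPS words) in an email.
train = [
    ((0, 1), "not spam"),
    ((1, 0), "not spam"),
    ((1, 1), "not spam"),
    ((6, 5), "spam"),
    ((7, 8), "spam"),
    ((8, 6), "spam"),
]


def knn_predict(train, query, k=3):
    """Predict a discrete label by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict(train, (0, 2)))  # not spam
print(knn_predict(train, (7, 7)))  # spam
```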
Classification Metrics
• Precision
• What % of positive predictions are actually
positive?
• Recall
• What % of actual positives are correctly
identified?
• F1 score
• Harmonic mean of precision and recall
• Cross-Entropy
• Measures how far predicted probabilities
are from the actual labels (lower is better)
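A small pure-Python sketch of these metrics for a binary problem, assuming 1 is the positive class: precision is TP / (TP + FP), recall is TP / (TP + FN), F1 is their harmonic mean, and binary cross-entropy averages the negative log-probability assigned to the true label. The function names and toy labels are illustrative assumptions.

```python
import math


def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall, and F1, treating 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average negative log-likelihood of the true labels under predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]  # 2 true positives, 1 false positive, 1 false negative
print(precision_recall_f1(y_true, y_pred))  # all three are 2/3 here
print(binary_cross_entropy([1, 0], [0.9, 0.1]))  # confident and correct, so the loss is small
```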
Regression
• Prediction of continuous values
• Use Cases:
• Stock price forecasting
• Score predictions
• Infectious disease tracking
• Public sentiment analysis
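As a hedged illustration of predicting a continuous value, here is ordinary least squares for a single feature (fitting y = slope * x + intercept) in plain Python. The hours-studied and score numbers are invented for the example, and `fit_line` is a hypothetical helper, not a library function.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical "score prediction" data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 68, 76, 84]
slope, intercept = fit_line(hours, scores)
print(slope, intercept)       # 8.0 44.0
print(slope * 6 + intercept)  # 92.0
```

The prediction for 6 hours (92.0) is a number on a continuous scale, which is what separates regression from classification.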
Regression Metrics
• Mean Absolute Error (MAE)
• Average absolute error; penalizes every error
in proportion to its size
• Mean Squared Error (MSE)
• Squares each error, so larger errors are penalized more heavily
• R^2 Score
• Measures fit of regression model to data: share of the target's variance the model explains (1 = perfect fit)
• Root Mean Squared Error (RMSE)
• Square root of MSE, in the same units as the target
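A minimal pure-Python sketch computing all four regression metrics on invented predictions that contain one large miss. `regression_metrics` is a hypothetical helper; it assumes `y_true` is not constant, since R^2 divides by the total variance.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, and R^2 (assumes y_true is not constant)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n      # every error counts linearly
    mse = sum(e ** 2 for e in errors) / n      # squaring magnifies large errors
    rmse = math.sqrt(mse)                      # back in the target's units
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical predictions; the last one is off by 4
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 13.0])
print(metrics["MAE"], metrics["MSE"])  # 1.5 4.5
```

The single error of 4 makes up 4 of the 6 units summed for MAE but 16 of the 18 summed for MSE, which is why MSE (and RMSE) react much more strongly to large misses.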
Correlation
• Measures the strength and direction of the
relationship between an independent variable
and a dependent variable
• i.e. how strongly predictor variable A
tracks target variable X
• Pearson’s Correlation
Coefficient (PCC)
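A hedged pure-Python sketch of Pearson's r: the covariance of two variables divided by the product of their standard deviations, giving a value between -1 and 1. The data and `pearson_r` are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors in covariance and standard deviation cancel, so raw sums suffice.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical data: predictor A and target X
a = [1, 2, 3, 4, 5]
x = [2, 4, 5, 4, 5]
print(round(pearson_r(a, x), 3))  # 0.775
```

Values near +1 or -1 indicate a strong linear relationship and values near 0 indicate little linear relationship. On Python 3.10+, the standard library's `statistics.correlation(a, x)` computes the same coefficient and can serve as a cross-check.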
Exercises and Live Coding
Q&A
Thank you!