WIP - ML-22-DEC Weekend

The document outlines a data science course curriculum covering topics like statistics, machine learning algorithms like KNN and Naive Bayes, linear regression, and data preprocessing. The course includes conceptual overview, code demonstrations, and exercises for each major topic. Sessions cover basics like statistics, Python environment setup, Pandas, Numpy, and plotting, before moving to specific algorithms.

Day # Topic Sub-topics

Course overview
Organization interest in Data science/ML/Python
Core spaces of DS, AI, ML, DL
BIG DATA definition, Evolution, drivers
V’s of BIG DATA
BIG DATA - Frameworks
Data Science Roles, skills
Overview
Intro on Data Science with Python

Understand DS process framework

Basic Stats
•Observations and Variables
•Types of Variables
•Central Tendency
•Distribution of the Data

Sampling distribution - overview

Sampling distribution - use case (means of 2 groups)

Sampling distribution - use case (proportion)

•Confidence Intervals
•Confidence Intervals
•Confidence Intervals

•Multi-collinearity
demo of code samples
demo of code samples
demo of code samples
demo of code samples

Hypothesis Tests

STATS
overview, code demo

STATS
Pearson’s Correlation Coefficient - theory
code - demo
Spearman’s Rank Correlation
code - demo
Student’s t-test - theory
code - demo
Paired Student’s t-test
code - demo
Chi2 - theory
code - demo
code - demo
Kendall-Tau - theory
code - demo
Analysis of Variance Test (ANOVA) - theory
code - demo
code - demo

Normality Tests
using visual plots

using statistical tests


1. Shapiro-Wilk Test
2. D’Agostino’s K^2 Test
3. Anderson-Darling Test

PREPARING DATA TABLES


CLEANING THE DATA
code - demo - outlier detection

code - demo - isolation forest

REMOVING OBSERVATIONS AND VARIABLES


TRANSFORMATION
NEW FREQUENCY DISTRIBUTION
CONVERTING TEXT TO NUMBERS
code - demo on pandas' categorical to numeric conversion features

code - demo on scikit-learn's categorical to numeric conversion features

CONVERTING CONTINUOUS DATA TO CATEGORIES
code - demo on binning
COMBINING VARIABLES
SCALING/NORMALIZATION
Min-Max
DS - framework
Standard scaler
Robust scaler

PREPARING UNSTRUCTURED DATA

UNDERSTANDING RELATIONSHIPS
Summary tables
Specific calculations
Visualization tools

GENERATING GROUPS **
Theory on clustering

Basic concepts (distance measures, linkage measures)

Concepts on Hierarchical clustering


Concepts on partition based clustering
Concepts on density based clustering

code demo -

Overview of Machine learning


DATA - types - continuous, discrete, ordinal
DATA - Predictors and response variables

What is training data and testing data. Validation set


Overfitting, Underfitting
Model complexity
Model evaluation methods
Supervised
Unsupervised
Reinforcement learning

Machine Learning Metrics - classification - confusion matrix


Day 1
Conceptual points Metrics - classification - Precision, recall

Metrics - classification - F1-score, TPR, FPR, ROC, AUC

Metrics - regression - MAE, MSE, RMSE, MAPE, MPE


Metrics - regression - R2 and adjusted R2

Bias Vs Variance (Concepts)

Parametric vs non parametric data model

Install Anaconda and Jupyter notebook


Some of the important data types

basic operators

Python: Environment
Day 1 Setup & Basic
Python Data structures - lists, tuples, sets, Strings and dicts
Functions & lambda functions

List comprehensions

Generators and Iterators

Exception handling

Python - Sample programs


Pandas - overview
Pandas - data types
Series

Dataframes
Day 2 PANDAS aggregation
merge
quick-tips
Exercises
Day 2 PANDAS

Numpy - Basics
Numpy - Basics
Day 2 Numpy
Numpy - Basics

Plotting - basics
Plotting - basics
Day 2 plotting
Plotting from pandas
Parallel plots
Intro to ML
Overview of KNN
Day 2 KNN, Overview

Overview of KNN, evaluation methods

Pre-req (ML Evaluation methods)

KNN - Evaluation method 1 (custom code)


KNN - Evaluation method 2 (custom code)
Day 2 KNN, Evaluation methods KNN - Evaluation method 3 (custom code)
KNN - Evaluation method 4 (custom code)

KNN - scikit-learn - basic usage

Pre-req - ML - metrics (accuracy, precision, recall etc)


KNN - using scikit-learn

KNN - tuning parameters

KNN - optimum K

KNN

Pre-processing/ Data wrangling/EDA


Pre-processing/ Data wrangling/EDA
Pre-processing/ Data wrangling/EDA
Pre-processing/ Data wrangling/EDA
Pre-processing/ Data wrangling/EDA

KNN, Data
Day 3 preprocessing
KNN - advanced tuning (algorithm, recommendation)
KNN, Data
Day 3
preprocessing

KNN - speeding up (Using KD tree and ball tree)

Excel example of KD tree


Excel example of Ball tree

KD Tree - sample code (scikit learn)

KNN - tree algorithms - overview

KNN - considerations (normality, parametric/non)

KNN - for regression


KNN - for regression (boston data)
KNN - for regression (boston data)

KNN - questions

Pre-processing/ Data wrangling/EDA


Pre-processing/ Data wrangling/EDA

Day 3 Machine Learning/


Data Science

Naïve Bayes - conditional probability

NB - custom code - flu - no flu


NB - custom code - golf
NB - custom code - income
Day 4 Naïve Bayes
NB - sklearn - iris
NB - sklearn - titanic

NB vs KNN
NB - on adult income data, EDA, Preproc, Normality

Regression - Linear /multiple variables

Lin Reg - Simple use of sklearn

Lin Reg - using advertising data


Lin Reg - using auto data
Lin Reg - using glass data

Lin Reg - Assumptions

Lin Reg - Residual plots


demo - normality test (Statistical test)

Lin Reg - Residual plots

Lin Reg - Manual calculations of coeff, draw line, compare with scikit-learn

Day 4 Linear Regression Lin Reg - Regression metrics


R^2 - explanation

Lin Reg - Using statsmodels (OLS)

Lin Reg - P values, correlation, feature selection


code

Lin Reg - Using gradient descent to form a linear equation (optimization)

Multi-collinearity - how to handle this

Coding Categorical Variables in Regression Models

polynomial regression
Polynomial
Regression

Overview

Simple code using statsmodels

Example code -
Stepwise Regression

Regularization - overview

Linear regression - OLS - SSE coeff using Matrix


code
Meshgrid, contour plots

Vectors - Norms

Linear regression - contour lines

Describe Ridge Regression


Regularization
Describe Lasso Regression
Describe ElasticNet

Code examples (Linear, Ridge, Lasso, Logistic)


Code examples (Linear, Ridge, Lasso, Logistic)

Ridge Regression - overview (intuitive)


Ridge - basic
Ridge - with housing data

Ridge Regression
Alpha selection (using Yellowbrick)

Lasso Regression - overview (intuitive)

Lasso Regression

Lasso Regression - Hitters data, Compare Ridge and Lasso


Lasso - simple example

ElasticNet Regression overview (intuitive)

ElasticNet - basic example

ElasticNet Regression

Logistic regression - overview

Logit - how sigmoid function works

Logit - basic sklearn


logit - using titanic dataset
logit - using bank loan data

Logit reg vs lin reg

Day 4 Logistic Regression


Logit - iris - tuning parameters
Logit - iris - overfitting
Day 4 Logistic Regression
Logit - iris - CV. Multinomial

Logit - Multi-nomial
code (using iris data)

odds, log of odds, probability - overview


odds - excel

code

Logit - Ordinal regression


code
code

Decision tree - overview

Basics - Impurity, Entropy, Gini


entropy
Excel example (IG and Gini)

Decision tree - attribute selection (IG, Gini)

Decision tree - sklearn

Simple DT explanation

Decision tree - sklearn - iris data


Decision tree - sklearn - iris data - graph

Day 5 Decision Tree Metrics - AUC ROC


Metrics - AUC ROC
code
code

Decision tree - sklearn - titanic - tuning parameter


code

scikit learn parameters


Decision tree - regressor

Decision tree - regressor

Random Forest - overview

Random Forest - basic sklearn


Random Forest - basic sklearn - iris data

Random Forest - basic sklearn - temp data


Random Forest - basic sklearn - cancer data

metric - log loss for classification

Day 5 Random Forest


Random Forest - basic sklearn - titanic - tuning

Out of Bag error rate - overview

code example - demo

code example - comprehensive demo

Random Forest - both classification and regression

Overview

code
Grid Search
code

SVM - overview
Types of SVMs

Intuitive maths behind SVMs

Day 5 SVM (SVC & SVR)


SVM - Basic
Day 5 SVM (SVC & SVR) SVM - Iris
SVM - bill auth

SVM - both classification and regression

SVM - kernel use /scenarios

Vectors, Vector norms

Matrix Operations

Eigen Value/ Eigen Vectors


Why COV
MATHS
Derivatives

Significance to DS/ML

Overview - Intuitive understanding

Gradient descent - basic understanding (toy example)

housing price prediction

Types of Gradient Descent

gradient descent - basic concepts, maths

Gradient Descent (foundation for advanced machine learning algorithms and deep learning)

Use of gradient descent (basic) for linear regression

All variants of gradient descent - compare


UNSUPERVISED learning - overview
Clustering
1- hierarchical clustering (2) - overview
Agglomerative clustering
Basics
code

Agglomerative clustering (scipy and sklearn)

Agglomerative clustering - categorical data


Agglomerative clustering - Mixed data

Divisive clustering
Basics
code

2- KMEANS - Overview
KMEANS - understanding (using excel)
ML-K-MEANS-00-basics.ipynb
ML-K-MEANS-01-basics.ipynb

KMEANS - iris data

K-means on delivery drivers data

K-means - synthetic data

KMEANS - on Iris data


Day 5 Unsupervised learning KMEANS - on titanic data

KMEANS - on xclara data

KMEANS - categorical data


KMEANS - Mixed data

3 - DBSCAN - overview
code

UNSUP - metrics
code - demo on metric for clusterings

code
code (KNN vs DBSCAN)

3.1 - HDBSCAN - variation of DBSCAN

Comparison of algorithms

4 - Association Rules

OPTICS algorithm
Anomaly detection
Local Outlier Factor
Neural Networks (overview)
Autoencoders (overview)
Deep Belief Nets
Hebbian Learning
Generative Adversarial Networks
Self-organizing map

Hyper-parameter tuning - overview

Machine Learning

- Feature selection
Overview
Filter method
Filter - Variance threshold 00
Filter - Variance threshold 01
Filter - correlation threshold

Filter - Information value (using bank data)

chi2 test - overview (using excel)

Filter - select K best


Filter - select K best

Wrapper method
SFS - wine quality
SFS - Iris
SBS - Iris
SFFS - Iris
SBFS - Iris
SFS - using parameters for all types
SFS - with regression problems
SFS - with grid search
SFS - select k-best features

RFE - using pima dataset


RFE - using iris dataset
Feature Engg
All combined

Dimensionality reduction
- Feature Extraction
PCA - overview

PCA - Maths
PCA - Maths - Eigen values, vectors

PCA - basics
PCA - basics
PCA - basics

PCA - basics - Generic code (PC variances graph)

PCA - basics

PCA - with kidney disease data

PCA - with iris data

SVD (Singular Value Decomposition)

Day 5
SVD - Examples
Day 5
Linear Discriminant Analysis
LDA - examples
LDA - examples

Smoothing/ Ensemble techniques


•Basic Ensemble Techniques
•Max Voting
•Averaging
•Weighted Average

•Advanced Ensemble Techniques


•Stacking - custom code
•Stacking - mlxtend - classifier
•Stacking - mlxtend - classifier - proba
•Stacking
•Stacking
•Stacking
•Stacking
•Stacking
•Stacking -regression

•Blending

•Bagging

Ensemble methods •Boosting - overview


Gradient Boosting

ensemble.GradientBoostingRegressor

GB- titanic data, tuning


GB - ca housing data, tuning
GB - boston data, regression

AdaBoost
XGBoost

•Algorithms based on Bagging and Boosting


•Bagging meta-estimator
•Random Forest
•AdaBoost
•GBM
•XGB
•Light GBM
CatBoost

Generic sample example - all models


Text Mining – overview

NLP Package - NLTK (install)


NLTK - Overview and basic code snippets
Bag of words (BOW)
Vectorization
Vectorization

Regular expression - basics and example

Day 5 Text Mining/ NLP Text classification


Basics (pos - neg dataset)
Basics (ham - spam dataset)
Bag of words (BOW)
classification - ham - spam
classification - yelp ratings

Vectorization
Vectorization
Vectorization
Vectorization

Sentiment Analysis - overview & challenges

Sentiment Analysis - basics


Sentiment Analysis - IMDB movie reviews

NLP Package - wordcloud


NLP Package - wordcloud
Day 5 Text Mining/ NLP

Sentiment Analysis -US GOP debate


Sentiment Symposium Tutorial
Topic modelling - overview

Clustering - basics
Clustering - basics
Clustering - basics

Topic modelling - package - GENSIM


Topic modelling - package - GENSIM
Topic modelling - package - GENSIM
Day 5 Text Mining/ NLP
Topic modelling - overview
Topic modelling - basics
Topic modelling - basics
Topic modelling - basics
Topic modelling - Simple example
Topic modelling - 20-newsgroups
Topic modelling - brown

Next Week
EDA/Preproc steps
Naïve Bayes
SVM
UNSUP - Kmeans
UNSUP - Agglomerative
UNSUP - DBSCAN
Ensemble learning
Pre-processing/EDA
Project review
Feature Engg (PCA, )
Text analytics

Eig V, Eig Vector (PCA)


L1/L2 Norm - Ridge/Lasso
Gradient Descent
Time Series
Boosting (GD boosting)
PySpark (ML)
Sentiment Analysis (sarcasm, ….)
Teaching Aids Learning Objective
1 page view on DS with Python - overview
Intro/ overview slides

Simplilearn - ebook detailed run of the main topics


BIG DATA, Data Science
slides - 00 - 00 DS - Introduction.pptx framework,

code - demo - sampling-distribution-1

code - sampling-distribution-2-diff-SAMPLE-MEANS

code - sampling-distribution-3-SAMPLE-PROPORTION

overview
code - estimation-CI-0.ipynb
code - estimation-CI-1-t-DISTRIBUTION

slide - 00 - 01 DS - basic STATS.pptx


code - python-multi-collinearity-00-VIF.ipynb
code - python-multi-collinearity-01-determinants.ipynb
code - python-multi-collinearity-03-loan.ipynb
refer to the feature selection methods - FILTER (VIF)
code - hypothesis-testing-01-A SINGLE POPULATION MEAN

code - EDA-Relationship-10-1-CORR-iris.ipynb

code - EDA-Relationship-10-3-t-test-iris.ipynb

code - EDA-Relationship-10-4-chi2-census.ipynb
code - EDA-Relationship-10-5-chi2-titanic.ipynb

code - EDA-Relationship-10-2-kendall-tau-iris.ipynb

code - EDA-Relationship-10-6-ANOVA-1-way.ipynb
code - EDA-Relationship-10-7-ANOVA-1-way.ipynb

using demo code for Naïve Bayes algo


code - ML-NB-12-adult-income-EDA-normality-tests

code - ML-NB-12-adult-income-EDA-normality-tests
code - ML-NB-12-adult-income-EDA-normality-tests
code - ML-NB-12-adult-income-EDA-normality-tests

EDA-cleaning-the-data-outlier-detection-00.ipynb

needs idea on decision trees/ensemble methods

EDA-cleaning-the-data-outlier-detection-01-Isolation Forest.ipynb
code - EDA-Transformation-01-categorical-to-numeric.ipynb

code - EDA-Transformation-01-categorical-to-numeric.ipynb

code - EDA-Transformation-02-scaling.ipynb
code - EDA-Transformation-02-scaling.ipynb
code - EDA-Transformation-02-scaling.ipynb

slides - 00 - 04 DS -understanding GROUPS.pptx

slides - 00 - 04 DS -understanding GROUPS.pptx

slides - 00 - 04 DS -understanding GROUPS.pptx


slides - 00 - 04 DS -understanding GROUPS.pptx
slides - 00 - 04 DS -understanding GROUPS.pptx

Refer to UNSUP learning section below

slides - 10 - 0 - ML.pptx
slides - 10 - 0 - ML.pptx
slides - 10 - 0 - ML.pptx

slides - 10 - 0 - ML.pptx
slides - 10 - 0 - ML.pptx
slides - 10 - 0 - ML.pptx
slides - 10 - 0 - ML.pptx
slides - 10 - 0 - ML.pptx
slides - 10 - 0 - ML.pptx
slides - 10 - 0 - ML.pptx

slides - 10 - 1 - ML - Metrics.pptx
slides - 10 - 1 - ML - Metrics.pptx

slides - 10 - 1 - ML - Metrics.pptx

slides - 10 - 1 - ML - Metrics.pptx
slides - 10 - 1 - ML - Metrics.pptx

slides - 10 - 0 - ML.pptx

slides - 10 - 0 - ML.pptx

python-core-01-basics.ipynb

python-data-structures-01-DICT
python-data-structures-02-LIST
python-data-structures-03-SETS
python-data-structures-04-STRINGS
python-data-structures-05-TUPLES
python-core-02-functions.ipynb

python-core-03-generators
python-core-04-Iterators
python-core-05-exception-handling
python-core-25-programs
python-core-26-easy-programs

05 - 05 Python - PANDAS
python-PANDAS-00-data types
python-PANDAS-01-series
python-PANDAS-02-dataframes1
python-PANDAS-03-dataframes2
python-PANDAS-04-aggregation1
python-PANDAS-06-merge
python-PANDAS-05-quick-tips
python-PANDAS-20-exercises
python-NUMPY-00
python-NUMPY-01
python-NUMPY-02

python-PLOTS-basic
python-PLOTS-map
python-PLOTS-pandas
python-PLOTS-parallel
slides - 10 - 0 DS with Python - ML
ebook - (410)

LOOCV, Hold-out method, K-fold CV


10 - 1 DS with Python - ML - SUP - KNN

slides - 10 - 0 - ML.pptx

ML-KNN-01-input-test Python, KNN internal workings


ML-KNN-02-leave-one-out Python, KNN internal workings
ML-KNN-03-random-n-train-test Python, KNN internal workings
ML-KNN-04-k-fold-CV Python, KNN internal workings

Python, KNN internal workings


ML-KNN-06-sklearn-basics

slides - 10 - 1 DS with Python - ML - Metrics Accuracy, classification report


ML-KNN-07-sklearn Prepare data, train test split, accuracy
ML-KNN-08-sklearn-tuning stratified K-fold, cross_val_score functionalities of sklearn

ML-KNN-09-optimum-K-neighbors using a for loop to find the best K, which results in the lowest error
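
The "loop over K and keep the lowest error" idea can be sketched as follows. This is a minimal illustration, not the course notebook: the iris dataset, the range 1 to 25, and 5-fold cross-validation are all assumptions chosen for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Assumed dataset for illustration; the course notebook may use another one.
X, y = load_iris(return_X_y=True)

errors = {}
for k in range(1, 26):
    model = KNeighborsClassifier(n_neighbors=k)
    # Mean accuracy over 5 folds; the error rate is its complement.
    scores = cross_val_score(model, X, y, cv=5)
    errors[k] = 1.0 - scores.mean()

# K with the lowest cross-validated error (ties resolve to the smallest K).
best_k = min(errors, key=errors.get)
print(f"best K = {best_k}, CV error = {errors[best_k]:.3f}")
```

Plotting `errors` against K gives the usual error curve used to discuss underfitting at large K and overfitting at small K.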
ML-KNN-10-iris Scaling, histograms - separability

slides - 00 - 03 DS - Pre-proc-EDA scaling


ML-EDA-01-categorical-to-numeric categorical to numeric values
ML-EDA-02-pandas
ML-EDA-11-gdp plots
ML-EDA-12-scaling scaling
orientation on all the scikit learn params of KNN
slides - 10 - 1 DS with Python - ML - SUP - KNN brute-force, KD tree, Ball-tree
neighbors. Metric, weights

slides - 10 - 1 DS with Python - ML - SUP - KNN Understand KD tree and ball tree, compare with brute force algo

excel - kd-tree.xlsx
excel - Ball-tree.xlsx

code - ML-KNN-24-KD-tree-basics-00
code - ML-KNN-24-KD-tree-basics-01
slides - 10 - 1 DS with Python - ML - SUP - KNN Brute-force, KD tree, Ball-tree - how they work

slides - 10 - 1 DS with Python - ML - SUP - KNN Important considerations around the KNN algo

code - ML-KNN-20-regression.ipynb
code - ML-KNN-21-regression-boston-housing-00.ipynb KNN reg vs lin reg
code - ML-KNN-21-regression-boston-housing-01.ipynb

10 - 1 DS with Python - ML - SUP - KNN

ML-kNN-EDA-iris
ML-kNN-EDA-tips

Theory on conditional probability


slides - 10 - 1 DS with Python - ML - SUP - NB
excel - ML-NB-sample.xlsx Fruit example
ML-NB-01-flu-no-flu custom code - NB
ML-NB-02-play-golf-or-not custom code - NB
ML-NB-04-income custom code - NB

ML-NB-10-iris
ML-NB-11-titanic

slides - 10 - 1 DS with Python - ML - SUP - NB


code - ML-NB-12-adult-income-EDA-normality-tests.ipynb Normality tests, transformation

slides - 10 - 3 DS with Python - ML - SUP - Linear Regression Theory on linear regression


excel - ML-LINREG-ex1.xlsx excel examples on regression

ML-LINREG-00-basics
regression, feature importance
regression metrics
ML-LINREG-10-advertising
ML-LINREG-12-auto-mpg regression
ML-LINREG-13-glass regression

slides - 10 - 3 DS with Python - ML - SUP - Linear Regression when to use, assumptions

code - ML-LINREG-15-auto-mpg-residuals.ipynb Test of normality

VIS-REG-00-residual-plot using the Yellowbrick package

code - ML-LINREG-16-head-size-1-sklearn.ipynb

code- ML-LINREG-16-head-size-2-custom-OLS-code.ipynb

slides - 10 - 3 DS with Python - ML - SUP - Linear Regression Metric for regression


Jupyter demo - ML-LINREG-04-r2-adj-r2 Metric for regression

code - ML-LINREG-02-STATSmodels-1-OLS.ipynb

slides - 10 - 3 DS with Python - ML - SUP - Linear Regression


code - ML-LINREG-17-corr-p-value-breast-cancer.ipynb Feature selection

(try after gradient descent)

ML-LINREG-25-optimize-advertising

(understanding of multi-collinearity is needed)


code - ML-LINREG-18-multi-coll-cement.ipynb

slides - 10 - 3 DS with Python - ML - SUP - Linear Regression Dummy and Effect Coding
slides - 10 - 3 DS with Python - ML - SUP - Linear Regression

ML-POLY-REG-basics-00.ipynb
ML-POLY-REG-basics-01.ipynb
ML-POLY-REG-basics-02.ipynb
ML-POLY-REG-basics-03.ipynb
ML-POLY-REG-basics-04-basis-fn-regression.ipynb

slides - 10 - 4 - ML - SUP - Stepwise Regression.pptx

ML-STEPWISE-REG-00-basics.ipynb

ML-STEPWISE-REG-using-lin-reg-10-house-prices.ipynb

slides - 10 - 5 - ML - SUP - Regularization

ML-LINREG-06-SSE-using-matrix-formula.ipynb
ML-LINREG-07-using-matrix-formula
python-PLOTS-meshgrid-contour-00

00 - 12 DS - Maths.pptx

ML-LINREG-30-grad-des-contour-lines.ipynb

(below sections)
(below sections)
(below sections)

ML-SUP-REG-10-community-Ridge-Lasso-Logistic.ipynb
ML-SUP-REG-11-boston-housing-Ridge-Lasso-ElasticNet.ipynb

slides - 10 - 3 - ML - SUP - Ridge Regression.pptx


ML-RIDGE-REG-00-basics.ipynb
ML-RIDGE-REG-10-boston-housing.ipynb
ML-RIDGE-REG-11-random-data-ridge-vs-linear.ipynb

VIS-REG-02-Alpha-Selection

slides - 10 - 3 - ML - SUP - Lasso regression.pptx

ML-LASSO-00-basics.ipynb
ML-LASSO-01-basics.ipynb
ML-LASSO-feature-selection-10-boston-housing.ipynb
ML-LASSO-feature-selection-11-hitters-RIDGE-compare.ipynb Ridge to show shrinking, Lasso for feature selection

ML-LASSO-12-wine-taste.ipynb
ML-LASSO-13-wine-taste-detailed.ipynb
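
The contrast noted for this block (Ridge shrinks coefficients, Lasso can be used for feature selection) can be sketched on synthetic data. The dataset, the `alpha` values, and the variable names below are assumptions made for illustration, not taken from the course notebooks.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Synthetic data (assumed): 10 features, only 3 of which drive the target.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=10.0, random_state=0)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=100.0).fit(X, y)   # L2 penalty
lasso = Lasso(alpha=5.0).fit(X, y)     # L1 penalty

# Ridge pulls the whole coefficient vector toward zero without zeroing entries.
ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)

# Lasso sets some coefficients exactly to zero, which acts as feature selection.
n_zero_ridge = int(np.sum(ridge.coef_ == 0.0))
n_zero_lasso = int(np.sum(lasso.coef_ == 0.0))

print(f"coefficient norm: OLS={ols_norm:.1f}, Ridge={ridge_norm:.1f}")
print(f"coefficients exactly zero: Ridge={n_zero_ridge}, Lasso={n_zero_lasso}")
```

The features that keep a non-zero Lasso coefficient are the ones the model "selects"; increasing `alpha` removes more of them.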

ML-ElasticNet-00-basics.ipynb
ML-ElasticNet-10-house-prices.ipynb
ML-ElasticNet-11-house-prices-detailed.ipynb

slides - 10 - 4 DS with Python - ML - SUP - Logistic Regression Theory on logistic regression

sigmoid-function.ipynb

ML-LOGIT-00-intro-glass-categorical
ML-LOGIT-11-titanatic
ML-LOGIT-12-bank-deposit-plan

slides - 10 - 4 DS with Python - ML - SUP - Logistic Regression

Idea of L1, L2 regularization (Ridge and Lasso)


ML-LOGIT-15-iris-tuning.ipynb
ML-LOGIT-17-iris-overfitting-reg.ipynb

ML-LOGIT-16-iris-CV.ipynb Multinomial logit

slides - 10 - 4 DS with Python - ML - SUP - Logistic Regression


ML-LOGIT-21-multinomial-iris.ipynb

slides - 10 - 4 DS with Python - ML - SUP - Logistic Regression


odds-odds ratio-log of odds-coeff.xlsx

ML-LOGIT-20-multinomial-glass.ipynb prob, odds, log of odds, coeff
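
The relationship between probability, odds, log of odds, and a logistic regression coefficient can be worked through in a few lines. The function names, the intercept `z0`, and the coefficient `b` are hypothetical values chosen only for this sketch.

```python
import math

def odds(p):
    """Odds in favour of an event with probability p."""
    return p / (1.0 - p)

def log_odds(p):
    """Logit: the natural log of the odds."""
    return math.log(odds(p))

def prob_from_log_odds(z):
    """Sigmoid: maps a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

p = 0.8
print(odds(p), log_odds(p), prob_from_log_odds(log_odds(p)))

# Hypothetical model: log-odds = z0 + b * x. Raising x by one unit
# multiplies the odds by exp(b), whatever the starting point.
z0, b = -1.0, 0.7
odds_before = odds(prob_from_log_odds(z0))
odds_after = odds(prob_from_log_odds(z0 + b))
odds_ratio = odds_after / odds_before
print(odds_ratio, math.exp(b))
```

This is why logistic regression coefficients are usually read as odds ratios after exponentiation.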

slides - 10 - 4 DS with Python - ML - SUP - Logistic Regression


ML-LOGIT-25-ORD-basics.ipynb
ML-LOGIT-25-ORD-housing.ipynb

slides - 10 - 10 - ML - SUP - Decision Tree.pptx Theory on decision tree

slides - 10 - 10 - ML - SUP - Decision Tree.pptx


excel - entropy.xlsx
excel - Infornation gain.xlsx
Entropy and Gini functional form
code - ML-DECTREE-04-split-criteria-intuition

slides - 10 - 10 - ML - SUP - Decision Tree.pptx

ML-DECTREE-00-basics
ML-DECTREE-01-basics-clf-reg.ipynb
ML-DECTREE-02-basics-explain-tree.ipynb
ML-DECTREE-03-balance-scale-data-implement-tree.ipynb
ML-DECTREE-10-iris
ML-DECTREE-11-iris-graph

slides - 10 - 1 DS with Python - ML - Metrics ROC/AUC


excel - ROCAUC.xlsx
CLF-ROCAUC00.ipynb
CLF-ROCAUC01.ipynb

slides - 10 - 10 - ML - SUP - Decision Tree.pptx


code - ML-DECTREE-12-titanic-tuning

slides - 10 - 10 - ML - SUP - Decision Tree.pptx


Overview using slides and excel
slides - 10 - 10 - ML - SUP - Decision Tree.pptx
excel - dec-tree-reg.xlsx

code - ML-DECTREE-20-Regression-boston-housing.ipynb
code - ML-DECTREE-21-Regression-dummy-data.ipynb

slides - 10 - 11 - ML - SUP - Random Forest.pptx

ML-RANDOM-FOREST-01 feature importances


ML-RANDOM-FOREST-10-iris basic use of RF
ML-RANDOM-FOREST-11-temps preproc, feature importances
ML-RANDOM-FOREST-12-breast-cancer preproc, feature importances, ROC, AUC

understand the log loss metric concept log loss metric


understand the warm_start parameter of the RF
ML-RANDOM-FOREST-13-titanic-param-tuning preproc, feature importances, ROC, AUC, param tuning, warm_start

slides - 10 - 11 - ML - SUP - Random Forest.pptx

code - ML-RF-20-OOB-00-basics.ipynb concept of warm_start and oob_score
code - ML-RF-21-OOB-10-breast-cancer.ipynb GridSearchCV, OOB, number of estimators, AUC
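
A minimal sketch of how `warm_start` and `oob_score` work together: the forest is grown in stages, and the out-of-bag error is read after each stage without a separate validation set. The dataset and the tree counts are assumptions for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Assumed dataset for illustration.
X, y = load_breast_cancer(return_X_y=True)

# warm_start=True keeps the trees already fitted and only adds new ones;
# oob_score=True scores each sample using trees that did not see it.
rf = RandomForestClassifier(n_estimators=25, warm_start=True,
                            oob_score=True, bootstrap=True, random_state=0)

oob_errors = {}
for n in (25, 50, 100, 200):
    rf.set_params(n_estimators=n)
    rf.fit(X, y)
    oob_errors[n] = 1.0 - rf.oob_score_

for n, err in oob_errors.items():
    print(f"{n:>3} trees: OOB error = {err:.3f}")
```

The OOB error typically flattens as trees are added, which gives a cheap way to pick the number of estimators.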

slides - 50 - 5 DS with Python - TM - Grid Search.pptx

ML-GRID-search-00.ipynb
ML-GRID-search-10-iris.ipynb

slides - 10 - 7 DS with Python - ML - SUP - SVM Theoretical understanding


slides - 10 - 7 DS with Python - ML - SUP - SVM Theoretical understanding

slides - 10 - 7 DS with Python - ML - SUP - SVM Theoretical understanding


ML-SVM-00-basics
ML-SVM-10-iris
ML-SVM-11-billauth

00 - 12 DS - Maths.pptx

code - ML-FE-PCA-00-eigen vectors values


code - ML-FE-PCA-00-why-COV.ipynb

slides - 10 - 39 - ML - GRADIENT DESCENT.pptx


GRADIENT-DESCENT-00-Intro.ipynb basic derivative idea & Gradient descent

ML-LINREG-ex1.xlsx (GRADIENT DESCENT worksheet) Partial derivative, learning rate

slides - 10 - 39 - ML - GRADIENT DESCENT.pptx

GRADIENT-DESCENT-00-Intro.ipynb Maths
GRADIENT-DESCENT-01-learning_rate.ipynb learning rate, high low value
GRADIENT-DESCENT-03-basics.ipynb
GRADIENT-DESCENT-04-basics.ipynb
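
The effect of a low versus a high learning rate can be seen on a one-variable toy problem. The function f(w) = (w - 3)^2, the starting point, and the three rates below are assumptions chosen for this sketch, not values from the notebooks.

```python
def gradient_descent(lr, steps=50, w0=0.0):
    """Minimise f(w) = (w - 3)^2 with plain gradient descent."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)   # derivative of (w - 3)^2
        w -= lr * grad           # step against the gradient
    return w

# Too small: still far from the minimum at w = 3 after 50 steps.
# Moderate: converges. Too large: every step overshoots and the iterate diverges.
for lr in (0.01, 0.1, 1.1):
    print(f"lr={lr}: w after 50 steps = {gradient_descent(lr):.4f}")
```

Each update multiplies the distance to the minimum by (1 - 2 * lr), so for this function any rate above 1.0 makes that factor larger than 1 in magnitude and the iterate moves away from the minimum.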

GRADIENT-DESCENT-10-city-pop-food-truck.ipynb

ML-LINREG-16-students-3-gradient-descent.ipynb Lin reg (using gradient descent)

GRADIENT-DESCENT-11-housing-all-variants.ipynb
slides - 10 - 0 - ML.pptx Theoretical understanding

slides - 20 - 1 - ML - UNSUP - Hierarchical Clustering

ML-UNSUP-agg-00.ipynb
ML-UNSUP-agg-01-basics.ipynb

code - ML-UNSUP-agg-01-scipy-scikit-basics Comparison

slides - 20 - 1 - ML - UNSUP - 01- KMEANS.pptx


k-means.xlsx Theoretical on KMEANS

code - ML-K-MEANS-10-iris-sklearn.ipynb elbow method

code - ML-K-MEANS-11-delv-drivers.ipynb change the n_clusters


code - ML-K-MEANS-16-assumptions K-means does not do well with non-isotropic data sets
n_init set to only 1 (default is 10), the number of times the algorithm is run with different centroid seeds. Initialization options:
1. K-Means++
2. Random choice of points from the samples
3. User-provided points (vectors)
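
The three initialization options and the role of `n_init` can be compared with a short sketch. The blob data, the seed points, and the dictionary keys are assumptions for illustration. `n_init` is passed explicitly because its default differs between scikit-learn releases (10 in the version these notes describe, `"auto"` in recent ones).

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic, well-separated blobs (assumed data).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

# Hypothetical user-provided starting centroids: simply the first three samples.
seeds = X[:3]

models = {
    "k-means++": KMeans(n_clusters=3, init="k-means++", n_init=1, random_state=0),
    "random": KMeans(n_clusters=3, init="random", n_init=1, random_state=0),
    "user": KMeans(n_clusters=3, init=seeds, n_init=1),
    # Ten random restarts; the run with the lowest inertia is kept.
    "random x10": KMeans(n_clusters=3, init="random", n_init=10, random_state=0),
}

inertias = {name: km.fit(X).inertia_ for name, km in models.items()}
for name, value in inertias.items():
    print(f"{name:>10}: inertia = {value:.1f}")
```

With a single run, a poor starting position can leave K-means in a worse local optimum; more restarts or k-means++ seeding reduce that risk.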

code - ML-K-MEANS-15-bad-init.ipynb
code - ML-K-MEANS-14-titanic-some-tuning.ipynb Few tuning tips, accuracy
code - ML-K-MEANS-13-xclara-custom-sklearn custom k-means, sklearn version, weakness of k-means - varying number of clusters
slides - 20 - 1 - ML - UNSUP - DBSCAN
code - ML-UNSUP-DBSCAN-00

slides - 10 - 1 - ML - Metrics.pptx
code - ML-UNSUP-metrics

code - ML-UNSUP-DBSCAN-01-metrics-tuning Try with various settings of 0.3, 3, 1, 0.1 .. Discuss noise etc
code - ML-UNSUP-DBSCAN-03-clustering-compare
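
A sketch of the "try various settings and discuss noise" exercise, assuming the listed values refer to DBSCAN's `eps` parameter. The blob data, the scaling step, and `min_samples=5` are also assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumed), standardised so eps is on a comparable scale.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)
X = StandardScaler().fit_transform(X)

summary = {}
for eps in (0.1, 0.3, 1.0, 3.0):
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    # DBSCAN labels noise points as -1; they do not count as a cluster.
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = int(np.sum(labels == -1))
    summary[eps] = (n_clusters, n_noise)
    print(f"eps={eps}: clusters={n_clusters}, noise points={n_noise}")
```

A small `eps` marks many points as noise, while a very large `eps` merges everything into a single cluster.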

slides - 20 - 1 - ML - UNSUP - DBSCAN


ML-UNSUP-DBSCAN-03-clustering-compare

ML-UNSUP-DBSCAN-03-clustering-compare

slides -

ML-ENSEMBLE-models-00-pima

slides - 00 - 02 DS - Feature Engg.pptx

code - ML-FS-filter-VarianceThreshold-00.ipynb
code - ML-FS-filter-VarianceThreshold-01.ipynb
code - ML-FS-filter-corr-Threshold-00.ipynb

code - ML-FS-Filter-Information-Value-01 IV based feature selection

chi-ex1.xlsx

ML-FS-filter-selectKbest-chi2-10-iris.ipynb
ML-FS-filter-selectKbest-chi2-11-pima.ipynb

ML-FS-wrapper-SFS-10-wine-quality-RF
ML-FS-wrapper-SFS-11-iris-KNN
ML-FS-wrapper-SBS-11-iris-KNN
ML-FS-wrapper-SFFS-11-iris-KNN
ML-FS-wrapper-SBFS-11-iris-KNN
ML-FS-wrapper-SFS-12-iris-KNN-all-types
ML-FS-wrapper-SFS-13-regression-boston-data
ML-FS-wrapper-SFS-14-Grid-search-iris
ML-FS-wrapper-SFS-16-best-k-feature-wine

ML-FS-wrapper-RFE-10-pima.ipynb
ML-FS-wrapper-RFE-11-iris.ipynb

ML-FS-99-11-breast-cancer-using-RF.ipynb

slides - 00 - 02 DS - Feature Engg

code - ML-FE-PCA-00-MATHS-detailed
code - ML-FE-PCA-00-eigen vectors values.ipynb

ML-FE-PCA-00-Basics
ML-FE-PCA-01-PCA
ML-FE-PCA-02-PCA
iris, explained variance, varying PCs
ML-FE-PCA-03-PCA
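
The "explained variance, varying PCs" objective can be sketched on iris as follows. Standardizing the features before PCA and the variable names are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)  # PCA is sensitive to feature scale

# Keep all four components so the full variance breakdown is visible.
pca = PCA(n_components=4).fit(X_scaled)
ratios = pca.explained_variance_ratio_
cumulative = np.cumsum(ratios)

for n_pcs, total in enumerate(cumulative, start=1):
    print(f"first {n_pcs} PC(s) explain {total:.1%} of the variance")
```

Reading the cumulative curve is the usual way to decide how many principal components to keep.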

ML-FE-PCA-04-PCA-inner-working-wip

code - ML-FE-PCA-10-PCA-kidney-disease

code - ML-FE-PCA-12-workings-of-pca-iris.ipynb
maths using numpy/scipy
slides - overview
slides - 00 - 02 DS - Feature Engg
ML-FE-LDA-00-basics.ipynb
ML-FE-LDA-11-iris.ipynb

slides - overview
ML-ENSEMBLE-00-scoring-methods.ipynb
ML-ENSEMBLE-00-scoring-methods.ipynb
ML-ENSEMBLE-00-scoring-methods.ipynb

slides - overview
ML-ENSEMBLE-02-stacking-custom.ipynb
ML-ENSEMBLE-03-stacking.ipynb
ML-ENSEMBLE-04-stacking-proba-as-meta-features.ipynb
ML-ENSEMBLE-05-stacking-GridSearch.ipynb
ML-ENSEMBLE-06-stackingCV-classification.ipynb
ML-ENSEMBLE-07-stackingCV-classification-proba-as-meta-features.ipynb
ML-ENSEMBLE-08-stackingCV-classification-grid-search.ipynb
ML-ENSEMBLE-10-stacking-Reg.ipynb

ML-ENSEMBLE-01-blending.ipynb

ML-ENSEMBLE-30-bagging-00.ipynb

slides - overview

first go through gradient descent concepts and code


code - ML-ENSEMBLE-60-GB-00-basics-boston-housing.ipynb test deviance

code - ML-ENSEMBLE-60-GB-10-titanic-tuning.ipynb
code - ML-ENSEMBLE-60-GB-11-ca-housing-wip.ipynb
code - ML-ENSEMBLE-60-GB-12-boston-housing-regression.ipynb

slides - overview
ML-ENSEMBLE-models-00-pima.ipynb

ebook - lesson 9 - page no 446 Overview on Text mining


slides - 50 - 0 DS with Python - TM - Introduction

slides - installation of NLTK


TM-lib-NLTK-00-basics token, POS, vector, BOW …

slides - 50 - 0 DS with Python - TM - Vectorizer


TM-vectorizer-count-00-parameters

Regular expressions - https://regex101.com

slides - 50 - 1 DS with Python - TM - Text Classification


TM-CLSFN-00-pos-neg
TM-CLSFN-01-sms-ham-spam
TM-CLSFN-03-BOW-dtm
TM-CLSFN-12-sms-ham-spam
TM-CLSFN-13-yelp-ratings

slides - 50 - 0 DS with Python - TM - Vectorizer


TM-vectorizer-TF-IDF-00-parameters
TM-vectorizer-TF-IDF-01-example
TM-vectorizer-TF-IDF-Count Compare tfidf with count

TM-Sentiment-00-basics
TM-Sentiment-10-IMDb-movie-reviews

TM-lib-WordCloud-00-basics
TM-lib-WordCloud-01-basics

TM-Sentiment-11-US-GOP-Debate
http://sentiment.christopherpotts.net/

TM-CLUSTERING-00-basics
TM-CLUSTERING-01-basics
TM-CLUSTERING-03-detailed

TM-lib-GENSIM-00-basics.ipynb
TM-lib-GENSIM-01-basics.ipynb
TM-lib-GENSIM-02-basics.ipynb

TM-TOPIC-modelling-00.ipynb
TM-TOPIC-modelling-01.ipynb
TM-TOPIC-modelling-02.ipynb
TM-TOPIC-modelling-10-Simple
TM-TOPIC-modelling-11-20-newsgroups
TM-TOPIC-modelling-12-brown

0.5
0.25
0.5
1
0.5
0.25
0.25
3.25
Status
2019, revisit HDBSCAN
Linear Algebra
Vectors, Matrices, and Systems of Linear Equations
Linear Transformations
Determinants and Eigenvalues
Inner Product Spaces, Orthogonal Projection, Least Squares
Singular Value Decomposition
Matrices

Advanced Linear Algebra


Matrix Equalities and Inequalities

Calculus
Gradient Descent & Derivatives
Multivariate Calculus
Khan Academy Calculus
Khan Academy – Basic Matrix operations
Khan Academy – Linear Algebra
