1 Introduction
ELL409
Tanmoy Chakraborty
IIT Delhi, India
https://tanmoychak.com/
Logistics
• Course Instructor: Tanmoy Chakraborty (NLP)
https://tanmoychak.com/
• Journals
• IEEE Transactions on Pattern Analysis and Machine Intelligence
• IEEE Trans. on Neural Networks and Learning Systems
• Journal of Machine Learning Research (JMLR)
Course Directives
• Class Time: Mon and Thu, 8-9:30
• Office Hour: as per requirement (email me to schedule an appointment)
• TA Hour: TBD
• Room: LH114
• Marks distribution (tentative):
• Midterm: 15%
• Endterm: 25%
• Quiz (4): 20%
• Assignment (3): 20%
• Project: 15%
• Per-day quiz: 5%
• Audit: A- (discouraged!!)
• Grading Scheme: TBD
• Attendance: 75%
Term Project (15%)
• A fresh idea that leads to a full-fledged system
• Each group should consist of (max) 3 students
• Students are encouraged to propose their own project ideas by Aug 10
• Send your ideas to the TA
• Best Project Award
• You need to:
• gather data
• develop models
• evaluate your models
• prepare presentation
• write tech report
• Deliverables:
1. Project proposal (3%), 2 pages + ppt, end of mid sem. Will include problem definition, background, proposed solution sketch
2. Final project report (5%), max 8 pages ACM format, end of endsem
3. Repo of dataset and source code (2%)
4. Final project presentation (5%)
• Students are encouraged to publish their projects in good conferences/journals
List of [potential] Projects
• TBD
https://bsw.iitd.ac.in/faqs.php#0
Acknowledgment
The course and slides are inspired by the following:
• http://cs229.stanford.edu/
• http://www.cs.cmu.edu/~ninamf/courses/601sp15/index.html
• https://dspace.mit.edu/handle/1721.1/46320
• https://nipunbatra.github.io/ml2020/lectures/
• https://sites.google.com/a/iiitd.ac.in/ml-cse-343-543/
• https://cs.ccsu.edu/~markov/ccsu_courses/MachineLearning.html
• https://nptel.ac.in/courses/106/105/106105152/
and many blogs, online articles, scholarly papers, lecture notes, etc. available on
the web.
What is ML?
• The term “Machine Learning” was coined by Arthur Samuel in 1959.
• Samuel's checkers-playing program was among the world's first
successful self-learning programs
https://christophm.github.io/interpretable-ml-book/terminology.html
History
• 1950s
• Samuel’s checkers-playing program
• 1960s
• Neural network: Rosenblatt’s perceptron
• Pattern recognition
• Minsky and Papert showed the limitations of the perceptron
• 1970s
• Symbolic concept induction
• Expert systems and knowledge acquisition bottleneck
• Quinlan’s ID3
• NLP (symbolic)
History
• 1980s
• Advanced decision tree and rule learning
• Learning and planning for problem solving
• Resurgence of NN
• PAC learning
• Focus on experimental methodology
• 1990s
• SVM
• Data Mining
• Adaptive agents and web applications
• Text learning
• RL
• Ensembles
• Bayes nets
Evidence that ML is booming
Year    # of papers
2021    9,122
2022    10,411
2023    12,345
2024    ~22,000
• An algorithm can be considered as a recipe that defines the inputs, the output and
all the steps needed to get from the inputs to the output.
• Why unsupervised learning is useful
• Finds all kinds of unknown patterns in data.
• Helps find features that can be useful for categorization.
• It takes place in real time, so all the input data is analyzed and labeled in
the presence of learners.
• It is easier to get unlabeled data from a computer than labeled data, which
needs manual intervention.
Metrics to evaluate classification: Confusion Matrix
• Table that describes the performance of a classification model by
grouping predictions into 4 categories.
• True Positives: we correctly predicted they do have diabetes
• True Negatives: we correctly predicted they don’t have diabetes
• False Positives: we incorrectly predicted they do have diabetes (Type I error)
• False Negatives: we incorrectly predicted they don’t have diabetes (Type II error)
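The four categories above can be counted directly from paired labels. The following is a minimal sketch, not a reference implementation: the function name `confusion_counts`, the encoding (1 = has diabetes, 0 = does not) and the toy data are all assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (TP, TN, FP, FN) for binary labels.

    `positive` is the label treated as the positive class
    (assumed here to mean "has diabetes").
    """
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # correctly predicted positive
        elif predicted != positive and actual != positive:
            tn += 1  # correctly predicted negative
        elif predicted == positive and actual != positive:
            fp += 1  # Type I error
        else:
            fn += 1  # Type II error
    return tp, tn, fp, fn


# Hypothetical labels: 1 = has diabetes, 0 = does not
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")  # TP=3 TN=3 FP=1 FN=1
```

The four counts always sum to the number of examples, which is a quick sanity check on any confusion matrix.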
Precision = TP / ( TP + FP )
Recall = TP / ( TP + FN )
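As a small worked sketch of the two formulas, the counts below are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here, not something the slides specify.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP); 0.0 if nothing was predicted positive (assumed convention)."""
    predicted_positive = tp + fp
    return tp / predicted_positive if predicted_positive > 0 else 0.0


def recall(tp, fn):
    """Recall = TP / (TP + FN); 0.0 if there are no actual positives (assumed convention)."""
    actual_positive = tp + fn
    return tp / actual_positive if actual_positive > 0 else 0.0


# Hypothetical counts from a diabetes classifier
tp, fp, fn = 30, 10, 20

print(precision(tp, fp))  # 0.75 -> 30 of the 40 predicted positives were right
print(recall(tp, fn))     # 0.6  -> 30 of the 50 actual positives were found
```

Precision answers "of those we flagged, how many were right?", while recall answers "of those who are actually positive, how many did we find?".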
Metrics to evaluate classification: Confusion Matrix
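▪ F allows us to trade off precision against recall:
F = ( 1 + β² ) · Precision · Recall / ( β² · Precision + Recall )
where β < 1 emphasizes precision, β > 1 emphasizes recall, and β = 1 gives the
balanced F1 = 2 · Precision · Recall / ( Precision + Recall )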
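To make the trade-off concrete, here is a minimal sketch assuming the standard F-beta form, F = (1 + β²) · P · R / (β² · P + R), where P is precision and R is recall. The function name `f_beta`, the zero-denominator convention and the example values are assumptions for illustration.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily, beta < 1 weights precision
    more heavily, and beta = 1 gives the balanced F1 score.
    Returns 0.0 when both inputs are zero (assumed convention).
    """
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + b2) * precision * recall / denominator


# Hypothetical classifier with precision 0.75 and recall 0.6
p, r = 0.75, 0.6
for beta in (0.5, 1.0, 2.0):
    print(f"beta={beta}: F={f_beta(p, r, beta):.4f}")
```

Because precision is higher than recall in this example, the score rises as β shrinks (more weight on precision) and falls as β grows (more weight on recall).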
Metrics to evaluate regression models
• R Square (R²)
measures how much of the variability in the dependent variable is explained by the
model. For a least-squares linear fit it equals the square of the correlation
coefficient (R), which is why it is called R Square.
For such a fit, the R Square value lies between 0 and 1, and a larger value indicates
a better fit between predicted and actual values.
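The "explained variability" idea can be sketched with the usual definition R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the actual values from their mean. This formula is the standard one and is assumed here; the slides do not spell it out. The function name and data are hypothetical.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot.

    Assumes y_true is not constant; otherwise SS_tot is zero
    and the ratio is undefined.
    """
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # total variation
    return 1 - ss_res / ss_tot


# Hypothetical actual values and model predictions
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.8, 5.1, 7.2, 8.9]

print(round(r_squared(y_true, y_pred), 3))  # 0.995
```

A model that always predicts the mean of the actual values scores 0 under this definition, and a perfect model scores 1.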
ML Applications
ML is turning farmers into farming technologists
• Optimizing automated irrigation systems
• Detecting leaks or damage to irrigation
systems
• Crop and soil monitoring
• Detecting disease and pests
• Monitoring livestock health
• Intelligent pesticide application
• Yield mapping and predictive analytics
• Sorting harvested products
• Surveillance
ML for COVID-19 Diagnosis
https://www.medscape.com/viewarticle/900746?form=fpf
https://www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-racist
Gender Bias in Google Translation
ML Challenges
https://towardsdatascience.com/top-8-challenges-for-machine-learning-practitioners-c4c0130701a1
MLE and MAP
Board work