
Machine Learning

ELL409

Tanmoy Chakraborty
IIT Delhi, India
https://tanmoychak.com/
Logistics
• Course Instructor: Tanmoy Chakraborty (NLP)
https://tanmoychak.com/

• Guest Lecture: TBD (possibly from the industry)


• TAs: Sahil, Aswini, Palash, Prottoy, Vaibhav, Soumyodeep, Anand

• Course page: https://lcs2-iitd.github.io/ELL409-2401/


• Discussion forum: MS Team
• For assignment submission: Moodle

• Group Email: TBD


Books
• Machine Learning. Tom M. Mitchell. McGraw Hill, 1997.
• Pattern Recognition and Machine Learning. Christopher M. Bishop.
Springer, 2006.
• Understanding Machine Learning: From Theory to Algorithms. Shai
Shalev-Shwartz and Shai Ben-David. Cambridge University Press,
2014.
• Pattern Classification. Richard Duda, Peter Hart and David Stork.
Second Ed., Wiley 2006.
Prerequisite
• Basic computer science principles
• Big-O notation
• Comfortably write non-trivial code in Python/numpy
• Probability
• Random Variables
• Expectations
• Distributions
• Linear Algebra & Multivariate/Matrix Calculus
• Matrix algebra (*)
• Gradients and Hessians
• Eigenvalues and eigenvectors
Tentative Syllabus
• Introduction
• Concept learning
• Decision Trees
• Regression
• Support Vector Machine
• Bias, Variance and ensemble learning
• Instance based learning
• Gaussian Discriminant Analysis and Naive Bayes
• Artificial Neural Networks
• Introduction to Deep Learning
• RNN, Backpropagation, CNN
• Conclusion

More about this course
• This course is meant for beginners
• A lot of maths and derivations will be covered
• Fewer slides will be used for delivering the lecture; more board work
• Introductory neural networks will be covered
Journals/Conferences
• Conferences
• NeurIPS: Neural Information Processing Systems
• ICML: International Conference on Machine Learning
• ICLR: International Conference on Learning Representations
• AAAI: Association for the Advancement of Artificial Intelligence
• IJCAI: International Joint Conference on AI
• SIGKDD: Special Interest Group on Knowledge Discovery and Data Mining
• CVPR: Computer Vision and Pattern Recognition

• Journals
• IEEE Transactions on Pattern Analysis and Machine Intelligence
• IEEE Trans. on Neural Networks and Learning Systems
• Journal of Machine Learning Research (JMLR)
Course Directives
• Class Time: Mon and Thu, 8-9:30
• Office Hour: as per requirement (email me to schedule an appointment)
• TA Hour: TBD
• Room: LH114
• Marks distribution (tentative):
  • Midterm: 15%
  • Endterm: 25%
  • Quiz (4): 20%
  • Assignment (3): 20%
  • Project: 15%
  • Per-day quiz: 5%
• Audit: A- (discouraged!!)
• Grading Scheme: TBD
• Attendance: 75%
Term Project (15%)
• A fresh idea that leads to a full-fledged system
• Each group should consist of (max) 3 students
• Students are encouraged to propose their own project ideas by Aug 10
• Send your ideas to the TA
• Best Project Award
• You need to:
  • gather data
  • develop models
  • evaluate your models
  • prepare presentation
  • write tech report
• Deliverables:
  1. Project proposal (3%): 2 pages + ppt, due end of mid sem. Will include problem definition, background, proposed solution sketch.
  2. Final project report (5%): max 8 pages, ACM format, due end of endsem.
  3. Repo of dataset and source code (2%)
  4. Final project presentation (5%)

Students are encouraged to publish their projects in good conferences/journals.
List of [potential] Projects
• TBD

Feel free to meet me to discuss more about the project ideas


DO NOT PLAGIARIZE !
Academic integrity is of utmost importance. If anyone is found cheating/plagiarizing, it will
result in a negative penalty (and possibly even more: an F grade or even DisCo).
Collaborate. But do NOT cheat.
• Assignments to be done individually.
• Do not share any part of code.
• Do not copy any part of report from any online resources or published works.
• If you reuse others’ work, always cite it.
• If you discuss an assignment with others, or discuss your project with anyone outside your group, mention their names in the report.
• Do not use GenAI tools (like ChatGPT).
We will check all submitted assignment code files for pairwise plagiarism.
We will also check the probability that any submitted content is AI generated.
Project reports will be checked for plagiarism against all web resources.

https://bsw.iitd.ac.in/faqs.php#0
Acknowledgment
The course and slides are inspired by the following:
• http://cs229.stanford.edu/
• http://www.cs.cmu.edu/~ninamf/courses/601sp15/index.html
• https://dspace.mit.edu/handle/1721.1/46320
• https://nipunbatra.github.io/ml2020/lectures/
• https://sites.google.com/a/iiitd.ac.in/ml-cse-343-543/
• https://cs.ccsu.edu/~markov/ccsu_courses/MachineLearning.html
• https://nptel.ac.in/courses/106/105/106105152/

and many blogs, online articles, scholarly papers, lecture notes, etc. available on
the web.
What is ML?
• Term “Machine Learning” coined by Arthur
Samuel in 1959.
• Samuel Checkers-playing Program was among the world's first
successful self-learning programs

“Field of study that give computers the ability to learn


without being explicitly programmed” - Arthur Samuel
[1959]
What is ML?
Study of algorithms that
• improve their performance P
• at some task T
• with experience E

well-defined learning task: <P,T,E>

Tom Mitchell, 1998
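
For instance, following Mitchell's own textbook example, Samuel's checkers program mentioned earlier can be cast in this form: T = playing checkers, P = percentage of games won, and E = games played against itself.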


What is ML?
• Machine Learning is a set of methods that allow computers to learn
from data to make and improve predictions.

• Machine learning is a paradigm shift from "normal programming", where all instructions must be explicitly given to the computer, to "indirect programming", which takes place by providing data.

https://christophm.github.io/interpretable-ml-book/terminology.html
History
• 1950s
• Samuel’s checker-playing program
• 1960s
• Neural network: Rosenblatt’s perceptron
• Pattern recognition
• Minsky proved the limitations of the perceptron
• 1970s:
• Symbolic concept induction
• Expert systems and knowledge acquisition bottleneck
• Quinlan’s ID3
• NLP (symbolic)
History
• 1980s
• Advanced decision tree and rule learning
• Learning and planning for problem solving
• Resurgence of NN
• PAC learning
• Focus on experimental methodology
• 1990s
• SVM
• Data Mining
• Adaptive agents and web applications
• Text learning
• RL
• Ensembles
• Bayes Nets
Evidence that ML is booming

Year # of papers
2021 9,122
2022 10,411
2023 12,345
2024 ~22,000

Neural Information Processing Systems (NeurIPS)


https://medium.com/@dcharrezt/neurips-2019-stats-c91346d31c8f
Terminology: Algorithm
• An Algorithm is a set of rules that a machine follows to achieve a
particular goal.

• Can be considered as a recipe that defines the inputs, the output and
all the steps needed to get from the inputs to the output.

• E.g., cooking recipes are algorithms:
  • the ingredients are the inputs
  • the cooked food is the output
  • the preparation and cooking steps are the algorithm instructions
Terminology: Blackbox Model
• A system that does not reveal its internal mechanisms.

• “Black box” models cannot be understood by looking at their parameters (e.g., a neural network).

• The opposite of a black box is sometimes referred to as a White Box, also called an interpretable model.
Terminology: Dataset
• A Dataset is a table with the data from which the machine learns.

• The dataset contains the features and the target to predict.

• When used to induce a model, the dataset is called training data.

• An Instance is a row in the dataset.

• The Features are the inputs used for prediction or classification. A feature is a column in the dataset (see the sketch below).
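
As a minimal illustration (the values below are made up, and numpy is assumed only because the prerequisites already mention it), this vocabulary maps directly onto arrays:

import numpy as np

# A tiny hypothetical dataset: 4 instances (rows), 3 features (columns),
# plus a target vector to predict. All numbers are made up for illustration.
X = np.array([
    [5.1, 3.5, 1.4],   # instance 0
    [4.9, 3.0, 1.4],   # instance 1
    [6.3, 3.3, 6.0],   # instance 2
    [5.8, 2.7, 5.1],   # instance 3
])                      # shape: (n_instances, n_features)
y = np.array([0, 0, 1, 1])   # target: one label per instance

one_instance = X[0]      # a row    = one instance
one_feature  = X[:, 0]   # a column = one feature
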
Terminology: Variables
• Categorical variable: Contain a finite number of categories or distinct
groups. Categorical data might not have a logical order. E.g., categorical
predictors include gender, material type, and payment method.

• Discrete variable: Numeric variables that have a countable number of values between any two values. A discrete variable is always numeric. E.g., the number of customer complaints or the number of flaws or defects.

• Continuous variable: Numeric variables that have an infinite number of values between any two values. A continuous variable can be numeric or date/time. E.g., the length of a part or the date and time a payment is received.
Categorical vs Discrete
Terminology: Variables
• Independent variables (also referred to as Features) are the inputs for the process that is being analyzed.

• Dependent variables are the output of the process.


Classification and Regression
• Classification is about predicting a label and regression is about
predicting a quantity.

• Classification is the task of predicting a discrete class label. Regression is the task of predicting a continuous quantity.
Classification
• Predicting a categorical output.
• Binary classification predicts one of two possible outcomes (e.g., is
the email spam or not spam?)
• Multi-class classification predicts one of multiple possible outcomes
(e.g. is this a photo of a cat, dog, horse or human?)

Classification Threshold: The lowest probability value at which we’re comfortable asserting a positive classification. For example, if the predicted probability of being diabetic is > 50%, return True; otherwise return False (see the sketch below).
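
A minimal Python sketch of applying such a threshold, assuming a model has already produced class probabilities (the numbers are made up):

import numpy as np

# Hypothetical predicted probabilities of the positive class ("is diabetic")
probs = np.array([0.10, 0.45, 0.51, 0.80, 0.30])

threshold = 0.5                   # the classification threshold
predictions = probs > threshold   # True = positive class, False = negative
print(predictions)                # [False False  True  True False]
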
Clustering/ Unsupervised Method
• Unsupervised methods do not require any labeled sensor data.
• Instead, they try to automatically find interesting activity patterns in
unlabeled sensor data.
• Mostly heuristic based

• Why it is useful
  • Finds all kinds of unknown patterns in data.
  • Helps to find features which can be useful for categorization.
  • It takes place in real time, so the input data can be analyzed and labeled in the presence of learners.
  • It is easier to get unlabeled data from a computer than labeled data, which needs manual intervention.
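
The slides do not name a specific algorithm here; as one hedged illustration, a bare-bones k-means sketch in plain numpy (made-up data, no handling of empty clusters), which groups unlabeled points purely from their geometry:

import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    # Alternate between (1) assigning each point to its nearest centroid and
    # (2) recomputing each centroid as the mean of its assigned points.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids

# Unlabeled data with two obvious groups (values made up for illustration)
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
              [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]])
labels, centroids = kmeans(X, k=2)   # no labels were needed to find the groups
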
Metrics to evaluate classification: Confusion Matrix
• Table that describes the performance of a classification model by
grouping predictions into 4 categories.
• True Positives: we correctly predicted they do have diabetes
• True Negatives: we correctly predicted they don’t have diabetes
• False Positives: we incorrectly predicted they do have diabetes (Type I error)
• False Negatives: we incorrectly predicted they don’t have diabetes (Type II error)

Precision = TP / ( TP + FP )
Recall = TP / ( TP + FN )
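
A small Python sketch of these four counts and the two metrics above, using made-up labels (1 = has diabetes, 0 = does not):

import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # hypothetical ground truth
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])   # hypothetical model predictions

tp = np.sum((y_pred == 1) & (y_true == 1))    # true positives  = 3
tn = np.sum((y_pred == 0) & (y_true == 0))    # true negatives  = 3
fp = np.sum((y_pred == 1) & (y_true == 0))    # false positives = 1 (Type I)
fn = np.sum((y_pred == 0) & (y_true == 1))    # false negatives = 1 (Type II)

precision = tp / (tp + fp)    # 3 / 4 = 0.75
recall    = tp / (tp + fn)    # 3 / 4 = 0.75
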
Metrics to evaluate classification: F Measure
• F allows us to trade off precision against recall:

  F = 1 / ( α·(1/P) + (1 − α)·(1/R) ) = ( (β² + 1)·P·R ) / ( β²·P + R )

  where α ∈ [0, 1] and thus β² ∈ [0, ∞], with β² = (1 − α)/α.

• Most frequently used: balanced F with β = 1 (i.e., α = 0.5).
• This is the harmonic mean of P and R: F₁ = 2·P·R / (P + R).
• What value range of β weights recall higher than precision? (See the sketch below.)
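
A tiny sketch of the formula above with made-up precision/recall values, showing how β > 1 pulls the score toward recall:

def f_beta(precision, recall, beta=1.0):
    # F measure as defined above: (beta^2 + 1)·P·R / (beta^2·P + R)
    b2 = beta ** 2
    return (b2 + 1) * precision * recall / (b2 * precision + recall)

p, r = 0.9, 0.5                   # made-up values: high precision, low recall
print(f_beta(p, r, beta=1))       # ~0.643, the harmonic mean of P and R
print(f_beta(p, r, beta=2))       # ~0.549, pulled toward the (lower) recall
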
Metrics to evaluate regression models
• R Square: measures how much of the variability in the dependent variable can be explained by the model. It is the square of the correlation coefficient (R), which is why it is called R Square.

• The R Square value lies between 0 and 1; a larger value indicates a better fit between predictions and actual values (see the sketch below).
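
A minimal sketch computing R Square as 1 − SS_res / SS_tot (which coincides with the squared correlation coefficient for a least-squares linear fit with an intercept); all numbers are made up:

import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])   # hypothetical actual values
y_pred = np.array([2.8, 5.3, 6.9, 9.4])   # hypothetical model predictions

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r_square = 1 - ss_res / ss_tot                   # ≈ 0.985, close to a perfect fit
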
ML Applications
ML is turning farmers into farming technologists
• Optimizing automated irrigation systems
• Detecting leaks or damage to irrigation
systems
• Crop and soil monitoring
• Detecting disease and pests
• Monitoring livestock health
• Intelligent pesticide application
• Yield mapping and predictive analytics
• Sorting harvested products
• Surveillance
ML for COVID-19 Diagnosis

• Chest X-rays and CT scans to detect abnormalities.


• Symptom Checker Tools
• Predict the likelihood of a patient having COVID-19
• Clinical Decision Support
• Telemedicine
• Contact Tracing
• Data Integration
Where ML went wrong
IBM’s “Watson for Oncology” Cancelled After $62 Million and Unsafe Treatment Recommendations
https://www.medscape.com/viewarticle/900746?form=fpf

Microsoft’s Tay chatbot turned racist on Twitter
https://www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-racist
Gender Bias in Google Translation
ML Challenges

https://towardsdatascience.com/top-8-challenges-for-machine-learning-practitioners-c4c0130701a1
MLE and MAP
Board work
