Framing A Machine Learning Problem

Uploaded by Edward Bwogi

Framing a Machine Learning Problem

Facilitators:
Rahman, Brian, Eva, Andrew, George,
Mark, Peter, Confred
Today's Agenda

• Defining an ML problem and proposing a solution
• Identifying good ML problems
• Deciding on ML
• Formulating a problem as an ML problem

ML Bootcamp Sept 16 - Oct 7, 2023

Defining an ML problem and proposing a solution

Defining an ML problem

ML is the process of training a piece of software (a model)
to make predictions by learning from data

Branches of ML
• Supervised learning
• Unsupervised / self-supervised learning
• Reinforcement learning

Kinds of ML problems

Supervised and unsupervised ML problems fall under
multiple categories

ML problem type | Description | Example
----------------|-------------|--------
Classification | Predict a label for a previously unseen example | Tell an image of a dog from one of a cat, or a bicycle from a motorbike
Regression | Predict numerical values | Predict the price of houses
Clustering (unsupervised) | Group similar examples | Find the most relevant documents
Association rule learning (unsupervised) | Infer likely association patterns in data | If you buy a bed, you are likely to buy a mattress too
Structured output | Create complex output | Bounding boxes in image recognition
Ranking | Identify position on a scale or status | Search result ranking in a search engine

Check Your Understanding


https://developers.google.com/machine-learning/problem-framing/cases#check-your-understanding

The ML Mindset


"Machine Learning changes the way
you think about a problem. The focus
shifts from a mathematical science to a
natural science, running experiments
and using statistics, not logic, to
analyse its results." - Peter Norvig -
Google Research Director


Experimental Design


Scientific method
• It is helpful to think of the ML process as an
experiment where we run test after test after test
to converge on a workable model
• Like an experiment, the process can be exciting,
challenging, and ultimately worthwhile

Step | Example
-----|--------
1. Set the research goal | I want to predict how heavy traffic will be on a given day.
2. Make a hypothesis | I think the weather forecast is an informative signal for traffic prediction.
3. Collect the required data | Collect historical traffic data and weather data for each day.
4. Test your hypothesis | Train a model using this data to predict traffic.
5. Analyze the results you get | Is this model better than existing systems for traffic prediction?
6. Draw a conclusion | I should (not) use this model to make traffic predictions, because of X, Y, and Z.
7. Refine your hypothesis and repeat | Could time of year be a helpful signal for traffic prediction?

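
The seven steps can be sketched as a small experiment. This is only an illustration, not a real traffic model: the data is synthetic, the linear rain/traffic relationship is an assumption, and "always predict the average" stands in for the existing system.

```python
import random

random.seed(0)  # fixed seed so the experiment is repeatable

# Step 3 (collect data): synthetic (rain_mm, traffic_index) pairs.
# The linear relationship plus noise is an assumption for illustration.
days = []
for _ in range(200):
    rain = random.uniform(0, 30)
    traffic = 50 + 1.5 * rain + random.gauss(0, 5)
    days.append((rain, traffic))

train, test = days[:150], days[150:]  # hold out the last 50 days


def fit_line(points):
    """Ordinary least squares for y = a + b * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def mean_abs_error(predict, points):
    return sum(abs(predict(x) - y) for x, y in points) / len(points)


# Step 4 (test the hypothesis): train a model on the training days.
a, b = fit_line(train)
train_mean = sum(y for _, y in train) / len(train)

# Step 5 (analyze): compare with the stand-in "existing system".
baseline_mae = mean_abs_error(lambda x: train_mean, test)
model_mae = mean_abs_error(lambda x: a + b * x, test)

# Step 6 (conclude): decide using held-out error only.
use_model = model_mae < baseline_mae
print(f"baseline MAE={baseline_mae:.2f}, model MAE={model_mae:.2f}, use model: {use_model}")
```

Step 7 would repeat the same loop with another candidate signal, such as time of year, added as a feature.
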
Identifying good problems for ML

Characteristics of a good ML problem
• Clear use case
  - Start with the problem, not the solution. Make sure you aren't treating ML as a
    hammer for your problems
• Focus on problems that would be difficult to solve with traditional
  programming, e.g.
  - Smart Reply: automated email replies that save the user time
  - Google Photos: find a specific photo by keyword search without
    manual tagging
  - ML solves problems by examining patterns in data and adapting to them
• Ask yourself the following questions:
  - What is the problem being faced?
  - Would it be a good problem for ML?

Identifying good problems for ML

Characteristics of a good ML problem...
• Know the problem before focusing on the data
  - Be prepared to have your assumptions challenged
• Once you have a clear understanding of the problem, list potential
  solutions to test in order to generate the best model
  - Expect to try out a few solutions before you
    land on a good working model
• Exploratory data analysis (EDA) helps you understand your data, but you
  can't claim that the patterns you find generalize until you check
  them against previously unseen data
  - Failing to check could lead you in the wrong direction or reinforce
    stereotypes or bias

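
One way to keep "previously unseen data" truly unseen is to set it aside before exploring. The following is a minimal sketch, assuming the examples fit in a Python list; the helper name `holdout_split` and the 20% holdout fraction are illustrative choices, not something the slides prescribe.

```python
import random


def holdout_split(rows, holdout_fraction=0.2, seed=0):
    """Shuffle a copy of rows and set aside a fraction as unseen data."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    # First part is for exploration and training, second part stays untouched.
    return shuffled[n_holdout:], shuffled[:n_holdout]


rows = list(range(100))  # stand-in for 100 examples
explore, unseen = holdout_split(rows)
print(len(explore), len(unseen))
```

Patterns found while exploring `explore` are only hypotheses until they also hold on `unseen`.
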
Identifying good problems for ML

Characteristics of a good ML problem...
• Data, data, more data
  - ML requires a lot of relevant data
• Data collected specifically for your task is most useful
  - In practice, secondary data is used in the majority of applications
• How much is a lot? It depends on the ML problem
  - More data generally improves your model (e.g. its robustness) and
    its predictive power. A good rule of thumb is to have at least
    thousands of examples for basic linear models, and hundreds of
    thousands for neural networks. If you have less data, consider a
    non-ML solution first and/or transfer learning methods

Identifying good problems for ML

Characteristics of a good ML problem...
• Predictive power
  - Your features should contain predictive power
• Ensure your data set contains relevant features that
  correlate with the phenomenon being investigated
  - e.g. is bedroom count a good predictor of house prices?
  - Don't try out features arbitrarily without a hypothesis
• Your goal is to build a model that generalizes well to
  previously unseen samples, and this is possible only
  if you use the right features

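
A quick first check of whether a feature carries signal is its correlation with the target. The sketch below computes the Pearson correlation by hand. All numbers are made up for illustration, and `listing_digit` is a deliberately arbitrary feature with no hypothesis behind it.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical data: 8 houses.
bedrooms = [1, 2, 2, 3, 3, 4, 4, 5]
price_thousands = [90, 130, 120, 170, 185, 210, 240, 300]
listing_digit = [3, 7, 1, 9, 0, 4, 6, 2]  # arbitrary feature, no hypothesis

r_bedrooms = pearson(bedrooms, price_thousands)
r_digit = pearson(listing_digit, price_thousands)
print(f"bedrooms: {r_bedrooms:.2f}, listing digit: {r_digit:.2f}")
```

A high correlation on a handful of rows is only a hint. Whether the feature helps still has to be confirmed on previously unseen samples.
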
Identifying good problems for ML

Characteristics of a good ML problem...
• Predictions vs. decisions
  - Aim to make decisions, not just predictions
• Your product should take action on the output of the ML model
  - ML is better at making decisions than at deriving insight from
    data (for the latter, use statistical approaches)
  - Ensure predictions allow you to take a useful action, e.g.
    a model that predicts the likelihood of clicking certain videos
    could allow a system to pre-fetch the videos most likely to
    be clicked

Examples of prediction / decision pairs

Prediction | Decision
-----------|---------
What video the learner wants to watch next | Show those videos in the recommendation bar
Probability someone will click on a search result | If P(click) > 0.12, prefetch the web page
What fraction of a video ad the user will watch | If a small fraction, don't show the user the ad
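
The second prediction/decision pair can be written as a few lines of code. This is a sketch only: the 0.12 threshold comes from the slide, while the function name, the URLs, and the probabilities are hypothetical stand-ins for real model output.

```python
PREFETCH_THRESHOLD = 0.12  # threshold from the prediction/decision pair


def pages_to_prefetch(click_probabilities, threshold=PREFETCH_THRESHOLD):
    """Turn predicted click probabilities into a prefetch decision per page."""
    return [url for url, p in click_probabilities.items() if p > threshold]


# Hypothetical model output: URL -> predicted P(click).
predicted = {
    "https://example.com/a": 0.30,
    "https://example.com/b": 0.12,
    "https://example.com/c": 0.05,
}
to_prefetch = pages_to_prefetch(predicted)
print(to_prefetch)
```

The model only supplies the probabilities. The threshold is a product decision that trades wasted bandwidth against faster page loads.
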

Hard ML problems

Clustering
• What does each cluster mean in an unsupervised
learning problem? E.g. if your model indicates that
the user is in the blue cluster, you'll have to
determine what the blue cluster represents
• Semi-supervised learning may help
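
A common first step towards working out what "the blue cluster" represents is to summarize the features of its members. The sketch below assumes cluster assignments already exist; the feature names, the values, and the `cluster_profiles` helper are all hypothetical.

```python
from collections import defaultdict


def cluster_profiles(rows, cluster_ids):
    """Average each feature per cluster to help a human name the cluster."""
    grouped = defaultdict(list)
    for row, cluster_id in zip(rows, cluster_ids):
        grouped[cluster_id].append(row)
    profiles = {}
    for cluster_id, members in grouped.items():
        profiles[cluster_id] = {
            feature: sum(m[feature] for m in members) / len(members)
            for feature in members[0]
        }
    return profiles


# Hypothetical users and the output of some clustering model.
users = [
    {"videos_per_week": 20, "avg_minutes": 3},
    {"videos_per_week": 24, "avg_minutes": 5},
    {"videos_per_week": 2, "avg_minutes": 45},
    {"videos_per_week": 4, "avg_minutes": 55},
]
assignments = ["blue", "blue", "red", "red"]

profiles = cluster_profiles(users, assignments)
print(profiles)
```

Here a person might label blue as "frequent viewers of short clips", but that name is an interpretation added by the analyst, not something the model produced.
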

Hard ML problems...

Anomaly detection
• How do you decide what constitutes an anomaly
in order to get labeled data?
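
The difficulty shows up even in a very simple detector: what counts as an anomaly depends on a threshold that someone has to choose. The sketch below uses a z-score rule as one possible definition, not the definition; the data and both thresholds are made up.

```python
from statistics import mean, pstdev


def flag_anomalies(values, z_threshold):
    """Flag values whose distance from the mean exceeds z_threshold std devs."""
    mu, sigma = mean(values), pstdev(values)
    return [v for v in values if abs(v - mu) / sigma > z_threshold]


# Hypothetical daily login counts.
daily_logins = [10, 11, 9, 10, 12, 10, 11, 30, 10, 9, 16]

strict = flag_anomalies(daily_logins, z_threshold=2.0)
loose = flag_anomalies(daily_logins, z_threshold=0.5)
print("strict:", strict)
print("loose:", loose)
```

The two thresholds give different label sets for the same data, so the "ground truth" used for training is itself a judgment call.
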

Hard ML problems...

Causation
• ML can identify correlations: mutual
relationships or connections between two or
more things. Determining causation (one event
or factor causing another) is harder. It is easy to
see that something happened, but much harder
to understand why it happened
• You can't determine causation from
observational data alone; you need to run
experiments

Hard ML problems...

No data
• If you have no data to train a model, then ML
cannot help you. Without data, use a simple,
heuristic, rule-based system
• Some new products with no training data start
with a heuristic rule system, and obtain training
data only after users interact with it
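
A heuristic-first product might look like the sketch below: a fixed rule makes the recommendations while every user interaction is logged for later training. The "newest first" rule, the field names, and the in-memory log are assumptions made for illustration.

```python
interaction_log = []  # grows into a training set as users interact


def recommend_newest(catalog, k=2):
    """Heuristic rule: recommend the k most recently uploaded videos."""
    return sorted(catalog, key=lambda video: video["uploaded"], reverse=True)[:k]


def record_interaction(video_id, clicked):
    """Store what the user did so it can become a labeled example later."""
    interaction_log.append({"video_id": video_id, "clicked": clicked})


# Hypothetical catalog; ISO dates sort correctly as strings.
catalog = [
    {"id": "v1", "uploaded": "2023-09-01"},
    {"id": "v2", "uploaded": "2023-09-20"},
    {"id": "v3", "uploaded": "2023-09-10"},
]

shown = recommend_newest(catalog)
record_interaction(shown[0]["id"], clicked=True)
record_interaction(shown[1]["id"], clicked=False)
print(interaction_log)
```
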

Deciding to use ML

Set yourself up for success by thinking about these
things before trying to frame a problem for ML
• Start clearly and simply: what would you like the ML model to
  do for you?
  - e.g. I want the ML model to predict the price of a house
• What is your ideal outcome?
  - e.g. tourism recommendations: my ideal outcome is to suggest
    tourism destinations that tourists find attractive and worth their
    time and money
• Success and failure metrics
  - Make them quantifiable and measurable. What output would you like
    the ML model to produce (based on the type of ML problem)?

Formulate problem as an ML problem
Suggested approach for framing an ML problem:
1) Articulate your problem
2) Start simple
3) Identify your data sources
4) Design your data for the model
5) Determine where data comes from
6) Determine easily obtained inputs
7) Ability to learn
8) Think about potential bias

Articulate your problem

Is it a classification, regression, clustering, or
anomaly detection problem?

Articulate your problem

Write down a succinct problem statement
• e.g. Our problem is best framed as 3-class, single-
label classification, which predicts whether a video
will be in one of three classes {very popular,
somewhat popular, not popular} 28 days after
being uploaded
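
The problem statement implies a rule that turns a 28-day view count into exactly one of the three classes. The sketch below shows one possible rule. The two cut-off values are hypothetical, since the slides do not say where the class boundaries lie.

```python
# Hypothetical cut-offs; the slides do not specify the class boundaries.
VERY_POPULAR_MIN_VIEWS = 1_000_000
SOMEWHAT_POPULAR_MIN_VIEWS = 10_000


def popularity_label(views_after_28_days):
    """Map a 28-day view count to exactly one of the three classes."""
    if views_after_28_days >= VERY_POPULAR_MIN_VIEWS:
        return "very popular"
    if views_after_28_days >= SOMEWHAT_POPULAR_MIN_VIEWS:
        return "somewhat popular"
    return "not popular"


labels = [popularity_label(v) for v in (2_500_000, 40_000, 120)]
print(labels)
```

Because every view count falls into exactly one branch, the task stays single-label.
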

Start simple

Simplify the problem further if possible, e.g.
• We will predict whether an uploaded video is likely
to become popular or not (binary classification)
• We will predict an uploaded video's popularity in
terms of the number of views it will receive within a
28-day window (regression)

Start by using the simplest model (baseline) possible for
your ML problem
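
For the binary version of the problem, about the simplest baseline is one that always predicts the most common training label. This is a sketch with made-up labels; the helper names are illustrative.

```python
from collections import Counter


def fit_majority_baseline(train_labels):
    """Return a predictor that always outputs the most common training label."""
    majority_label = Counter(train_labels).most_common(1)[0][0]
    return lambda example: majority_label


def accuracy(predict, examples, labels):
    correct = sum(1 for x, y in zip(examples, labels) if predict(x) == y)
    return correct / len(labels)


# Hypothetical labels: most uploaded videos do not become popular.
train_labels = ["not popular"] * 8 + ["popular"] * 2
test_examples = ["video A", "video B", "video C", "video D"]
test_labels = ["not popular", "popular", "not popular", "not popular"]

baseline = fit_majority_baseline(train_labels)
baseline_accuracy = accuracy(baseline, test_examples, test_labels)
print(baseline_accuracy)
```

Any later model has to beat this number to justify its extra complexity.
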

Identify your data sources

Provide answers to the following questions about your
labels:
• How much labeled data do you have?
• What is the source of your label?
• Is your label closely connected to the decision you will be
making?

Example
• Our data set consists of 100,000 examples of past
uploaded videos, with popularity data and video descriptions.

Design your Data for the Model

Identify the data that your ML system should
use to make predictions (input -> output):

Title | Channel | Upload time | Uploader's recent videos | Output (label)
------|---------|-------------|--------------------------|---------------
My silly cat | Alice | 2018-03-21 08:00 | Another cat video, yet another cat | Very popular
A snake video | Bob | 2018-04-03 12:00 | None | Not popular
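
One row of this input/output design could be represented in code roughly as follows. The class and field names are illustrative assumptions; the two rows are the examples from the slide.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class VideoExample:
    # Inputs
    title: str
    channel: str
    upload_time: datetime
    recent_videos: List[str]
    # Output (label); None until popularity has been measured
    label: Optional[str] = None


examples = [
    VideoExample("My silly cat", "Alice", datetime(2018, 3, 21, 8, 0),
                 ["Another cat video", "yet another cat"], "very popular"),
    VideoExample("A snake video", "Bob", datetime(2018, 4, 3, 12, 0),
                 [], "not popular"),
]
print(examples[0])
```
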

Determine Where Data Comes From

Assess how much work it will take to develop a data
pipeline to construct each column for a row. When does
the example output become available for training
purposes?

Example
• We applied the labels {very popular, somewhat popular, not
popular} to each video that fell within a determined range of
views and "thumbs ups", and determined keyword descriptions
for each video. Hand-generating descriptions is not sustainable,
so we are considering adding a keyword description to the
upload form.
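
For the video example, the output only becomes available once the 28-day popularity window from the problem statement has passed. A pipeline could check this as sketched below; the function name is an assumption, and the dates are examples.

```python
from datetime import datetime, timedelta

LABEL_WINDOW = timedelta(days=28)  # popularity is measured 28 days after upload


def label_is_available(upload_time, now):
    """A video can only be used for training once its 28-day window has passed."""
    return now - upload_time >= LABEL_WINDOW


upload = datetime(2018, 3, 21, 8, 0)
too_early = label_is_available(upload, now=datetime(2018, 4, 3, 12, 0))
ready = label_is_available(upload, now=datetime(2018, 4, 18, 8, 0))
print(too_early, ready)
```
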

Determine Easily Obtained Inputs

Pick 1-3 inputs that are easy to obtain and that
you believe would produce a reasonable initial
outcome
• Consider the engineering cost to develop a data
pipeline to prepare the inputs, and the expected
benefit of having each input in the model

Ability to Learn

Will the ML model be able to learn? List aspects
of your problem that might cause difficulty
learning. For example:
• The data set doesn't contain enough positive labels.
• The training data doesn't contain enough examples.
• The labels are too noisy.
• The system memorizes the training data, but has
difficulty generalizing to new cases.
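
Two of these risks can be checked with very little code: the share of positive labels, and the gap between training and held-out accuracy as a sign of memorization. The sketch below uses made-up labels and accuracy figures, and what counts as "too few" or "too large" is left to the reader.

```python
def positive_fraction(labels, positive_label):
    """Share of examples carrying the positive label."""
    return sum(1 for label in labels if label == positive_label) / len(labels)


def generalization_gap(train_accuracy, test_accuracy):
    """A large positive gap suggests memorizing rather than generalizing."""
    return train_accuracy - test_accuracy


# Hypothetical label set: only 2 of 100 videos are very popular.
labels = ["not popular"] * 98 + ["very popular"] * 2
share = positive_fraction(labels, "very popular")

# Hypothetical accuracies from a training run.
gap = generalization_gap(train_accuracy=0.99, test_accuracy=0.70)
print(f"positive share={share:.2f}, train/test gap={gap:.2f}")
```
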

Think About Potential Bias

Many datasets are biased in some way. These
biases may adversely affect training and the
predictions made, e.g.
• A biased data source may not translate across
multiple contexts
• The training sets may not be representative of the
ultimate users of the models and may therefore
provide them with a negative experience
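
One rough check for the second point is to compare how groups are represented in the training set with how they are represented among the intended users. In the sketch below, grouping by language and all of the counts are hypothetical; real projects would choose attributes relevant to their own users.

```python
from collections import Counter


def group_shares(groups):
    """Fraction of items belonging to each group."""
    counts = Counter(groups)
    return {group: count / len(groups) for group, count in counts.items()}


def representation_gap(train_groups, user_groups):
    """Training share minus user share per group; negative means under-represented."""
    train, users = group_shares(train_groups), group_shares(user_groups)
    return {g: train.get(g, 0.0) - users.get(g, 0.0) for g in set(train) | set(users)}


# Hypothetical language codes of training examples vs. actual users.
train_langs = ["en"] * 90 + ["sw"] * 10
user_langs = ["en"] * 50 + ["sw"] * 40 + ["lg"] * 10

gap = representation_gap(train_langs, user_langs)
print(gap)
```

Groups with a strongly negative gap are the users most likely to get the negative experience described above.
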

Conclusion

It is important to frame your problem properly
for ML


Not all problems require or need to be solved
using ML


Quiz

Complete the quiz at this link:
https://elearning.umu.ac.ug/mod/quiz/attempt.php?attempt=15240&cmid=17874
