Research Methods in Machine Learning - Copy
Research Methods in Machine Learning - Copy
Machine Learning
Tom Dietterich
Distinguished Professor (Emeritus)
Oregon State University
Corvallis, OR USA
New in ML 2019 1
Plan for Today
• Choosing and Solving a Research Problem
• Research Life Cycle
• Exercise 1: What is the position of your project in the life cycle?
• Corresponding Skills
• Exercise 2: Skills inventory
• Write a Successful (NeurIPS) Paper
• Process
• Structure
• Analysis of an Example paper
• Exercise 3: Parsing it into the provided structure
• Writing tips
• Wrap up
New in ML 2019 2
Download
• These slides: http://web.engr.oregonstate.edu/~tgd/talks/new-in-ml-
2019.pdf
• This paper: https://arxiv.org/abs/1809.03113
New in ML 2019 3
Choosing and Solving a Research Problem
Research Life Cycle
Exploratory Initial Refinement Competing Mapping the Engineering
Research Solutions & Evaluation Solutions & Solution & Technology
Comparative Space Transfer
Evaluation
New in ML 2019 4
Example 1: Representation and Algorithms for
Probabilistic Graphical Models
New in ML 2019 5
Example 2: Adversarial Test Queries
Exploratory Initial Refinement Competing Mapping the Engineering
Research Solutions & Evaluation Solutions & Solution & Technology
Comparative Space Transfer
Evaluation
New in ML 2019 6
Exploratory Research
• Defining new problems, new constraints, new opportunities, new
approaches
• Example:
• Multiple-Instance Learning: Labeled bags of instances
• Adversarial examples
• Break out of established paradigms by changing the problem definition
• Examples:
• Transfer learning and domain adaptation: multiple, related learning problems
• Feed forward neural networks: beyond traditional statistical models
• Risks:
• It might not be an important problem
• Need to convince readers it is an important problem
• “Science advances funeral by funeral” (Paul Samuelson gisting Max Planck)
• Benefits
• It is a critical path to major progress in a field
New in ML 2019 7
Drivers of Exploratory Research
• Novel applications
• Multiple instance learning grew out of attempting to apply ML to
drug design
• Mathematical advances and insights
• Support vector machines combined two previous directions
• mathematical programming to classification (Mangasarian et al)
• Vapnik’s insight that the hinge loss is convex
• Kimeldorf & Wahba: representer theorem for spline kernels
• Random Forests grew out of Breiman’s intuitions concerning the
bias-variance tradeoff and stabilization methods
• Importing ideas from other fields
• Convex optimization
• Variational methods from physics
• Mathematics: theory of statistics, information theory, control
theory, ODEs, real analysis, functional analysis, etc.
• Frustration
• AutoML grew out of the pain of tweaking hyper-parameters
New in ML 2019 8
Initial Solutions
• Provide an initial solution to a problem
• Often very narrow or overly complex
• Examples:
• First paper on PAC learning (Valiant, 1984) proved a result for very limited and
impractical cases: k-CNF and monotone DNF
• First paper on multiple instance learning (Dietterich et al, 1997) presented a very
baroque algorithm that combined kernel density estimation with axis-parallel
rectangles
• First paper on Bayesian networks (Pearl 1985) described simple message passing for
tree-structured networks
• Notes
• It is often difficult to propose a new problem definition without also proposing an
initial solution
• Exception: Adversarial examples
• “Nothing stimulates good research like a bad paper about an interesting problem”
(Dietterich)
New in ML 2019 9
Refinement and Evaluation
• Develop refinements of the initial solution
• Fast gradient sign method made it easier to study adversarial examples
• Study the generality and scope of the phenomenon
• Demonstration that adversarial examples exist for many ML classifiers
• random forests, SVMs, etc.
• Demonstration that adversarial examples transfer across classifier types
• Demonstration that simple defenses can easily be evaded
• Develop refinements of the initial evaluation metrics
• Notes:
• The initial authors have a competitive advantage here, if they can grasp it
• Otherwise, it can be a race (favors large groups, not PhD students)
• Lots of creativity is required to ask the right questions about generality and scope
New in ML 2019 10
Competing Solutions and Comparative Biggio & Roli, 2018
Evaluation
• Sequences of improvements and
alternatives are published
• Each is typically compared to
previous methods
• Periodically, it is valuable to conduct
a careful benchmark comparison
New in ML 2019 11
The Incremental Improvement Space can be Very
Crowded
• Example: Generative Adversarial
Networks
• Risks:
• Small improvements are rarely
worthwhile (unless they also provide
some general insight)
• Depends heavily on metrics which may
not reflect real applications (AUC, BLEU)
• Can get scooped easily
• Favors large teams (not PhD students)
• Advantages:
• It is easy
• It feels like we are making progress
• Notes:
• Improvements should be guided by
principles: Don’t search in the space of
mechanisms https://github.com/hindupuravinash/the-gan-zoo
New in ML 2019 12
The Illusion of Progress
• Evaluation metrics for GANs are
notoriously “soft”
• It seems that a lot of effort was
expended for relatively little gain
New in ML 2019 13
But Progress Can Also Be Real
• Recht et al. constructed new
test sets for ImageNet and
evaluated a wide range of
published networks
“accuracy gains on the original test sets translate to larger
• Performance was not as good gains on the new test sets”
as on the original test set, but
this is probably due to the
new test sets being more
difficult
New in ML 2019 14
Mapping the Solution Space
• Can we understand the design space
for a problem?
• Place all algorithms into a single
framework
• What are the key design decisions?
• What are lower bounds on the best any
method can do?
• Example:
• Wainwright developed a comprehensive
theory of the geometry of message Wainwright et al (2008)
passing in Bayesian networks http://www.maths.dur.ac.uk/lms/087/Talks/wainwright.pdf
New in ML 2019 15
Engineering and Technology Transfer
New in ML 2019 16
Exercise 1: Life Cycle Position
Exploratory Initial Refinement Competing Mapping the Engineering
Research Solutions & Evaluation Solutions & Solution & Technology
Comparative Space Transfer
Evaluation
New in ML 2019 17
Different Life Cycle Phases Require Different Skills
Exploratory Initial Solutions Refinement & Competing Mapping Engineering &
Research Evaluation Solutions Solution Space Deployment
Reading X X X X X
Literature
Analysis X X X
Techniques
Theorem X X X
Formulation
Algorithm design X X X X
Coding in DL X X X X
Frameworks
Experiment X X X
design
Story Telling X X X X X X
English Skills X X X X X X
Giving Talks X X X X X X
New in ML 2019 18
Exercise 2: Skills Inventory
SKILLS • Working alone (or in groups) list the skills you need for
Reading
your project
Literature
Analysis
• These can be more specific than my list
Techniques
Theorem
• Assess your skill level for each of them
Formulation
Algorithm design
• Today (or, more likely, later) develop a plan for addressing
any skill gaps
Coding & Testing
• Taking classes (math background, story telling, English skills)
Coding in DL • Studying examples (theorems, proof techniques, code on github)
Frameworks
Experiment • Many universities provide tutoring with writing
design
• Practice (giving talks)
Story Telling
English Skills
Giving Talks
New in ML 2019 19
Part 2: Writing Good Papers
• You have chosen an important problem that matches your interests
and skill set
• You have results
• Time to publish!
New in ML 2019 20
Paper = Claim + Evidence + Story
• Introduction:
• What problem are you attacking?
• Why is it important?
• What is known already? (summary)
• What aspects are still unsolved? What are the shortcomings of existing
solutions? (summary)
• What claim(s) are you making?
• What evidence will you present?
• What conclusions do you draw? “No suspense”
• Current state of knowledge about the problem
• Review of existing work
• Existing solutions and their shortcomings
New in ML 2019 21
Body: Theoretical Claims and Evidence
• Notation and definitions
• Previous results you will be using
• Qualitative analysis: What kind of result can we expect for this kind of
problem?
• Statement of result (theorem)
• Sketch of proof (usually put full proof in appendix)
• Discussion of assumptions and limitations of the result
• Comparison with related results, especially if they are not directly
comparable
New in ML 2019 22
Body: Algorithm/Method Paper
• Definitions and notation
• Qualitative analysis: What kind of result can we expect for this kind of
problem?
• Description of previous algorithm ideas that you will be using
• Overview of your approach: what is the key insight?
• Description of the algorithm with pseudo-code
• Discussion of configuration and hyper-parameter tuning
• Discussion of asymptotic computational complexity (if it is non-obvious or
looks like it might raise scaling issues)
• Discussion of assumptions and limitations of the approach
New in ML 2019 23
Body: Experimental Evidence
• Goal of the experiment (e.g., what are the research questions you the
experiments try to answer?)
• Experiment structure
• Data sets
• Algorithms being compared (including baselines and oracles)
• Manipulations (independent variables being manipulated)
• Evaluation metrics
• Analytical plan (e.g., statistical testing)
• Results of the experiments
• Always include an assessment of uncertainty (confidence intervals, posterior
distribution, statistical tests)
• Discussion of the results
• Explain the relationship between the results and the research questions and claims
of the paper
New in ML 2019 24
Concluding Remarks
• The actual conclusions should be in the introduction
• Concluding remarks can discuss the broader significance of the claims
as well as open problems
New in ML 2019 25
Telling a Story
• For a complex theorem or a complex algorithm, you will want to
“build it” incrementally
• Example:
• Describe a clean, simplified algorithm
• may only work for special cases
• may not be computationally tractable
• Then introduce refinements and approximations
• how to handle more complex cases
• approximations that make it feasible
New in ML 2019 26
The writing process
New in ML 2019 27
Developing the Story
• It is often difficult to structure the story
• When multiple claims and experiments are Try giving a talk
interdependent, you need to find a
sequential order in which to present them
• I find it useful to try giving a talk
• Forces a sequential order
• I interleave this with trying to write the
introduction
Try writing the
• Alternatively create a poster and then introduction
explain it to five different people
• Helps find holes, figure out what questions
people have
• Note: The story is NOT about the
sequence in which the research was
done
New in ML 2019 28
The Abstract and Introduction
• These are the first things you write...
…and the last things you write
New in ML 2019 29
Mistakes to avoid
• Popularity is not a good reason to work on a topic.
• “Adversarial examples have received a lot of attention lately” NO!
• “Adversarial examples demonstrate the vulnerability of machine learning
system to cyberattack” YES!
• Don’t hype the novelty – State the result
• “We show here, for the first time, that ...” NO!
• “Our method provides a non-trivial robustness guarantee on Imagenet, which
has been beyond the capability of previous methods...” YES!
New in ML 2019 30
Paper-Driven Revision of the Method
• Advice I once received from Peter Hart:
• Sometimes as you are writing the paper, you realize it
would be a lot easier to write if the algorithm (or the
experiments) had been slightly different
• If so, fix the algorithm (redo-the experiments) so it is
easier to describe
New in ML 2019 31
Exercise 3: Analyzing a Paper
• Download
https://arxiv.org/abs/1809.03113
• Skim the paper and fill out the following
page
New in ML 2019 32
Paper Analysis
Facet Notes
Problem:
Importance:
Claims:
State of Knowledge:
Evidence: Theoretical
Evidence: Empirical
Story Structure
New in ML 2019 33
Test time robustness
• Given a query 𝑥𝑥𝑞𝑞
• Run it through the network 𝑀𝑀 times, each
time adding a different Gaussian
perturbation 𝛿𝛿𝑚𝑚 and observe the resulting
prediction 𝑦𝑦𝑚𝑚
• Predict the most common prediction
𝑀𝑀
1
𝑝𝑝𝑗𝑗 = � 𝕀𝕀 𝑓𝑓̂ 𝑥𝑥𝑞𝑞 + 𝛿𝛿𝑚𝑚 = 𝑗𝑗 𝛿𝛿𝑚𝑚 ∼ 𝑁𝑁 0, 𝜎𝜎 2 𝐼𝐼
𝑀𝑀
𝑚𝑚=1
New in ML 2019 35
Analysis
• Problem: Adversarial examples at test time
• Importance: No justification (assumes reader already agrees)
• Claims:
• Claim 1: A better adversarial robustness guarantee than [18]
• Claim 2: Training strategy inspired by the analysis that improves the bounds in
practice
• Evidence:
• Formal statement of robustness result with proof
• Experimental evaluation of Stability Training with Noise (STN)
New in ML 2019 36
State of Knowledge
• Robustness to changing distributions
• Criticism: "divergence between distributions is rarely used as an empirical
measure of strength of adversarial attacks" is weak. Popularity is not a good
scientific reason to study something
• Existing guarantees only work under narrow conditions
• single hidden layer ReLU
• feed-forward only
• These methods have been generalized somewhat. [18] connects
adversarial robustness to differential privacy but the bound is loose
• Previous analysis has used concentration of measure; we use Renyi
divergence instead
New in ML 2019 37
Story Structure
• Previous work
• Preliminaries
• Notation
• Concepts (Renyi Divergence)
• State of the art in provable robustness
• State of the art in empirical robustness
• Test time robustness:
• Main theoretical result and proof
• Training time robustness:
• Stability training with noise
• Experiments
• Test time certificates of robustness
• Training time experimental robustness (with and without test time robustness)
New in ML 2019 38
Life Cycle Stage
• Competing Solutions
• Is it time for a comprehensive evaluation?
• Create a web site where controlled experiments can be run?
New in ML 2019 39
Exercise 4: To do at home
• Outline one of your papers using this framework
New in ML 2019 40
Writing Hints to Study At Home
New in ML 2019 41
Citations in Text
• Do not treat a citation as a word in a sentence
• WRONG: “[5] has shown that decision trees can match the accuracy of MLPs”
• RIGHT: “Dietterich [5] has shown that decision trees can match the accuracy
of MLPs”
• This treats people as doing the research rather than papers
• Make captions self-contained so that the reader can understand a
figure by reading the caption
New in ML 2019 42
Dietterich’s Rules of English
1. Avoid "use". Try "apply", 5. Obey parallel form: "The project
"employ", "select", "perform", seeks to develop new methods
"execute", "choose", "evaluate", and to implement them.“
etc. • Parallelize on infinitives ("to develop",
"to implement"), on noun phrases
2. "Utilize" should refer only to ("seeks to develop new algorithms,
resources ("...fully utilize memory new implementations, and new
bandwidth...") results"), on relative clauses ("a new
method that will optimize productivity,
3. Avoid contractions. that will account for computation
4. Use English equivalents of Latin requirements, and that will minimize
phrases outside of parentheses. communication costs.") and on
prepositional phrases (see this
• Replace "etc." with "and so on", "i.e." sentence itself).
with "that is", "e.g." with "for
example", and "vs." with "versus".
New in ML 2019 43
Dietterich’s Rules of English: Common Word
Problems
• "affect" (verb) versus "effect" (noun). identity of the subject of the sentence.
• "that" (introduces restrictive relative • "between" (relates 2 things) versus
clause) versus "which" (introduces "among" (relates >2 things).
unrestrictive relative clause). • Possessive pronouns. Compare "it's"
• Example: "The First iteration that finds and "its", "who's" and "whose". The
a non-null element causes an error possessive forms are "its" and "whose".
message to be displayed.". The phrase The others are contractions ("it's"
"that finds a non-null element" helps means "it is", "who's" means "who is").
identify the iteration in question. If we • Use "or" only when you mean it. Often
omit this phrase, the meaning is lost. "and" is clearer.
• However, consider "Our Meiko CS-2, "led" is the past tense of "lead". "lead"
which was funded by a grant from NSF, • (pronounced like "led") is a chemical
has sixteen high-speed processors." The element with a rather low melting
phrase "which was funded by a grant point.
from NSF" tells us something incidental
to the main clause. It can be deleted
without creating confusion about the
New in ML 2019 44
Dietterich’s Rules of English: Common Syntax
Problems
• A colon must be preceded by a complete clause. "There are • Semicolons. These are used to separate two closely-related
three methods: walking, running, and flying." is correct. "The complete sentences. "Processor speed must be more than a
three methods are: walking, running, and flying." is wrong. single number describing a computer; it must be a function of
the work being done."
• Commas separate complete clauses (typically introduced by
"and", "but", "therefore", "because", "since", etc.). "This • "em" dash (“—”). These are very emphatic separators. They can
proposal shows important problems, and it presents several separate complete sentences or just sentence fragments.
solutions." If the "it" is deleted, the comma preceding the "Vector units—such as the 100Mflops units on the Meiko CS-
"and" should be deleted also. 2—complicate the analysis." "It is difficult to see how to
proceed—something must be done!"
• Commas set off lead-in phrases. "In this proposal, we
discuss...". • Hyphens. These are used to prevent ambiguity, especially for
compound adjectives. "low-latency connection", "run-time
• Commas separate lists of three or more items. "Walking, performance", "machine-learning algorithm", and "problem-
running, and flying." solving system" are examples. Note that when these are not
• Commas set off non-restrictive clauses. "This proposal, which used as adjectives, they are not hyphenated. "The connection
was written for CS519, is excellent." has low latency." "The code is executed at run time." Beware of
the word "speedup". It is never hyphenated. "Speedup" is a
• Commas break up competing adjectives. "A large, very red car" quantity (e.g., "a speedup of 25.") or an adjective ("speedup
or "Object-based, portable, programming environment.“ learning"). "Speed up" is a verb (e.g., "We must find a way to
speed up this algorithm.").
• The word "each" is wonderful. It lets you switch from plural to
singular to avoid ambiguity. "The computer contains 16
processors, each of which has two vector units."
New in ML 2019 45
Wrap Up
• Research Life Cycle
• Skills
New in ML 2019 46