
Symbolic Regression for Interpretable Machine Learning

Miles Cranmer
Assistant Prof, DAMTP & IoA, University of Cambridge

(Illustration: Kouzou Sakai for Quanta Magazine)
What I want:
I want an AI scientist.

Machine learning research:

• Driven mostly by computer vision/NLP benchmarks
• Motivated by industry interests, robotics
• Attempts to reach “human-level performance”
• Narrow stepping-stone benchmarks along the way.

Problem:
• Much of ML applied to science takes such approaches, and replaces the datasets with scientific ones.
What I want:
Instead of vision/language, want AI to reach human-level performance at research in the natural sciences.

What needs to happen?

• Natural science is not a regression problem. Need understanding.
• We need to be able to use machine learning for discovering universal concepts and theories, and representing them in human language.
How?
Traditional approach to physics:

Data (low dim.) → Describe → Theory
(may be summary statistics)
Empirical fit: Kepler’s third law

P² ∝ a³

→ Newton’s law of gravitation, to explain it

Empirical fit: Planck’s law

B = (2hν³/c²) · (exp(hν / (k_B T)) − 1)⁻¹

→ Quantum mechanics (partially), to explain it

Empirical fit: neural network weights

→ ???, to explain them
How?
Traditional approach to science:

Data (low dim.) → Describe → Theory
(may be summary statistics)

Era of AI?

Data (high dim.) → Compress → Neural Net? → Distill → Theory
Key point

Neural nets trained on big datasets can find new insights.

The remaining challenge is distilling the insights to our language.
Outline

• Interpretability
• Symbolic regression
• Symbolic distillation
• Examples
• Future
CV/NLP strategy of interpretability
Typically involves feature importance

(Figures: Omeiza et al., 2019; Ribeiro et al., 2016)
Science already has a modeling language

Computer Vision: ???
Science: mathematical expressions

We should build interpretations in this existing language: mathematical expressions!
• For physical problems, even if it is not the “true" expression, analytic models can often generalize better than neural networks! (See M. Cranmer+2020)

• This is a type of inductive bias: searching for models represented as sparse combinations of analytic operators that hold geometrical and physical significance

  +   : translation
  ×   : length => area => volume
  exp : solution to a common ODE, y′ ~ y
Symbolic regression
Symbolic regression is a machine learning task where the goal is to find analytic expressions that optimize some objective.
• Popularized by Koza (1990s); and its use in science by Lipson (2000s)

(Illustration: Kouzou Sakai for Quanta Magazine)
SOTA = genetic algorithm
Jointly optimize accuracy & complexity

Complexity is user-defined, but usually = number of nodes
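As an illustration, here is a minimal sketch of that complexity measure (a toy expression tree, not PySR’s internals):

from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                       # e.g. "+", "cos", "x3", "2.1"
    children: list = field(default_factory=list)

def complexity(node: Node) -> int:
    """Complexity = total number of nodes in the expression tree."""
    return 1 + sum(complexity(c) for c in node.children)

# cos(2.1 * x3) - x4  has complexity 6:
expr = Node("-", [Node("cos", [Node("*", [Node("2.1"), Node("x3")])]),
                  Node("x4")])
print(complexity(expr))  # 6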
High-level open-source frameworks:

github.com/MilesCranmer/SymbolicRegression.jl/ (main search code; MLJ interface)

github.com/MilesCranmer/PySR/ (Scikit-Learn wrapper)

Build your own symbolic regression algorithm!
github.com/SymbolicML/DynamicExpressions.jl/
github.com/SymbolicML/DynamicQuantities.jl/
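A minimal usage sketch of the Scikit-Learn-style PySR interface; the operator lists and iteration count here are illustrative choices, not recommendations:

import numpy as np
from pysr import PySRRegressor

X = np.random.randn(200, 4)
y = np.cos(2.1 * X[:, 2]) - X[:, 3]   # ground truth to rediscover

model = PySRRegressor(
    niterations=40,
    binary_operators=["+", "-", "*", "/"],
    unary_operators=["cos", "exp"],
)
model.fit(X, y)
print(model.sympy())   # best expression on the accuracy/complexity front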
Age-Regularized Multi-Population Evolution in PySR

Cranmer, 2023 - arxiv.org/abs/2305.01582

Model discovery at scale:
• Each island evolves independently on a single core.
• Scale up to ~1000s of cores (= 1000s of independent populations)
• Asynchronous migration between populations

(Figure: migration step between islands)
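In PySR this island layout is exposed through constructor parameters; a sketch (the values are illustrative, and exact defaults vary by version):

from pysr import PySRRegressor

model = PySRRegressor(
    populations=64,        # number of independent "islands"
    population_size=33,    # members evolving within each island
    procs=8,               # worker processes; migration is asynchronous
    niterations=100,
)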
Python API
Dimensional constraints
Custom objectives
“Can I make it so that my equation has exactly 2 sinusoids?” Yes!
https://arxiv.org/abs/2305.01582
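A sketch of dimensional constraints via the documented X_units/y_units keywords; the unit strings are illustrative and their exact syntax may differ across PySR versions:

import numpy as np
from pysr import PySRRegressor

X = np.random.rand(100, 2)        # column 0: mass, column 1: velocity
y = 0.5 * X[:, 0] * X[:, 1] ** 2  # kinetic energy

model = PySRRegressor(binary_operators=["+", "*"], niterations=40)
model.fit(X, y, X_units=["kg", "m/s"], y_units="kg * m^2 / s^2")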
Selection of user-contributed
publications that have used
symbolic distillation/PySR/
SymbolicRegression.jl:
astroautomata.com/PySR/papers
We can use Symbolic Regression to Distill a Neural Network into an Analytic Expression

How this works:
Cranmer et al., 2019, 2020 – Work with: Alvaro Sanchez-Gonzalez, Shirley Ho, Peter Battaglia, Kyle Cranmer, David Spergel, Rui Xu

1. Train NN normally, and freeze parameters.
2. Record inputs/outputs of the network over the training set: (x1, y1), (x2, y2), …
3. Fit the inputs/outputs of the neural network with PySR, e.g. [y]1 = cos(2.1 ⋅ [x]3) − [x]4
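A runnable sketch of the three steps, using a small sklearn MLP as a stand-in for the frozen network (any trained model with a predict() works the same way):

import numpy as np
from sklearn.neural_network import MLPRegressor
from pysr import PySRRegressor

X = np.random.randn(500, 4)
y = np.cos(2.1 * X[:, 2]) - X[:, 3]

# 1. Train the NN normally ("freezing" = not updating it afterwards).
nn = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000).fit(X, y)

# 2. Record the network's input/output pairs over the training set.
y_nn = nn.predict(X)

# 3. Fit those input/output pairs with PySR.
sr = PySRRegressor(binary_operators=["+", "-", "*"], unary_operators=["cos"])
sr.fit(X, y_nn)
print(sr.sympy())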
Full Symbolic Distillation

f: learns features?   g: uses features for calculation?

Re-train g, to pick up any errors in the approximation of f

(g ∘ f)(x1, x2, x3, x4) = …

Fully-interpretable approximation of the original neural network!

(Searching over n² expressions → searching over 2n expressions)
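A toy sketch of the piecewise search; the latents and outputs below are synthetic stand-ins for what would be recorded from the two halves of a real network:

import numpy as np
from pysr import PySRRegressor

X = np.random.randn(500, 4)
z = (X[:, 0] * X[:, 1]).reshape(-1, 1)  # stand-in for recorded latents of f
y = np.cos(z[:, 0])                     # stand-in for recorded outputs of g

# Two searches of size ~n each, instead of one search of size ~n^2:
f_sr = PySRRegressor(binary_operators=["+", "*"]).fit(X, z)
g_sr = PySRRegressor(binary_operators=["+", "*"],
                     unary_operators=["cos"]).fit(z, y)

# Re-train g on f's *symbolic* outputs, so g absorbs f's approximation error.
z_sym = np.asarray(f_sr.predict(X)).reshape(-1, 1)
g_sr.fit(z_sym, y)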
Inductive bias
• Introducing some form of inductive bias is needed to eliminate the functional degeneracy. For example:

xi → f → Σi → g → y

(the latent space between f and g could have some aggregation over a set)
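A minimal numerical sketch of that structure, y = g(Σi f(xi)), with hand-picked f and g just to show the shape of the computation:

import numpy as np

def f(x):
    # per-element encoder (hand-picked stand-in for a learned network)
    return np.stack([x, x ** 2], axis=-1)

def g(h):
    # decoder acting on the aggregated latent
    return h[..., 0] + 0.5 * h[..., 1]

def model(xs):
    # permutation-invariant: y = g(sum_i f(x_i))
    latent = f(xs).sum(axis=0)
    return g(latent)

print(model(np.array([1.0, 2.0, 3.0])))  # 13.0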
Recall:
Traditional approach to science:

Data (low dim.) → Describe → Theory
(may be summary statistics)

Era of AI?

Data (high dim.) → Compress → Neural Net? → Distill → Theory
Some examples:
with Alvaro Sanchez Gonzalez, Peter Battaglia, Rui Xu, Kyle Cranmer,
David Spergel, Shirley Ho; (NeurIPS 2020)
Knowledge Discovery
• Predict the dark matter properties in a
simulation with a graph neural network:

Self-supervised
(predict neighbors)
Example 2:
Discovering Orbital Mechanics

Can we learn Newton’s law of gravity by modelling the solar system with a graph neural network?

Unknown masses, and unknown dynamical model.

“Rediscovering orbital mechanics with machine learning” (2022)
Pablo Lemos, Niall Jeffrey, Miles Cranmer, Shirley Ho, Peter Battaglia
Next: interpretation

Approximate the relation between latent spaces of the network with PySR
Interpretation Results for f

Accuracy/Complexity Tradeoff*

Model selection score: −d(log(error)) / d(complexity)

*from Cranmer+2020; similar to Schmidt & Lipson, 2009

(Figure: error vs. complexity Pareto front)
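A sketch of that selection heuristic applied to a toy Pareto front (the numbers are made up for illustration):

import numpy as np

def best_by_score(complexities, errors):
    c = np.asarray(complexities, dtype=float)
    e = np.asarray(errors, dtype=float)
    score = -np.diff(np.log(e)) / np.diff(c)  # -d(log error)/d(complexity)
    return int(np.argmax(score)) + 1          # index of the selected candidate

# toy Pareto front: (complexity, error) for increasingly complex expressions
comp = [1, 3, 5, 8, 12]
err  = [1.0, 0.8, 0.05, 0.04, 0.039]
print(comp[best_by_score(comp, err)])  # 5 — biggest drop in log(error)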
Test the symbolic model:

Why isn’t this working well?
• Let’s look at the mass values in comparison with the true masses:
Solution: re-optimize vi!

• The vi were optimized for the neural network.
• The symbolic formula is not a *perfect* approximation of the network.
• Thus: we need to re-optimize vi for the symbolic function f!
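A toy sketch of that re-optimization step, with a hypothetical symbolic law F = vi · vj / r² standing in for the recovered expression:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
r = rng.uniform(1.0, 5.0, size=200)          # pairwise distances
pair = rng.integers(0, 3, size=(200, 2))     # which two bodies interact
true_v = np.array([1.0, 0.5, 2.0])           # "true" masses
F_obs = true_v[pair[:, 0]] * true_v[pair[:, 1]] / r**2

def loss(v):
    # mean squared error of the symbolic law F = v_i * v_j / r^2
    F_pred = v[pair[:, 0]] * v[pair[:, 1]] / r**2
    return np.mean((F_pred - F_obs) ** 2)

v_opt = minimize(loss, x0=np.ones(3)).x
print(v_opt)  # recovers the true values, up to a global sign symmetry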


• “Complexity” of an operator = number of clock cycles on FPGA
• Approximate neural net with small expression = 90% accuracy
• 5 ns inference time on FPGA!
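PySR’s complexity_of_operators option lets “complexity” mirror hardware cost; the cycle counts below are illustrative assumptions, not measurements:

from pysr import PySRRegressor

model = PySRRegressor(
    binary_operators=["+", "*", "/"],
    unary_operators=["exp"],
    # weight each operator by its (assumed) FPGA clock-cycle cost:
    complexity_of_operators={"+": 1, "*": 2, "/": 8, "exp": 16},
)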
Discussion/Future
• Is a pure neural net approach to AI for science (i.e., no interpretation) possible? How would you get the same level of generalization as we have had from theory?
• General relativity was derived from only a few postulates/data points, yet can predict the existence of black holes. Is it hopeless to expect that level of generalization from foundation models?
• How do we distill very large models, like large language models, into the language of science?
• These models may have learned some new unifying principles across domains. How can we find them?
• Can you use this symbolic regression technique to interpret language models directly?
FAQ: Why not fit directly?
• Constraints:
  • Neural networks require ~1M evaluations of a loss function to train.
  • Genetic algorithm-based symbolic regression requires ~1B evaluations to find a complex+accurate expression.
  • Need the symbolic regression loss to be extremely efficient!
• Offline vs online learning:
  • Full loss is too expensive.
  • So, we do “online” learning of the neural net, and then fit the inputs/outputs of the network afterwards.
