
Symbolic Regression for Interpretable Machine Learning

Miles Cranmer
Assistant Prof, DAMTP & IoA, University of Cambridge

(Illustration: Kouzou Sakai for Quanta Magazine)
What I want:
I want an AI scientist.

Machine learning research:

• Driven mostly by computer vision/NLP benchmarks
• Motivated by industry interests, robotics
• Attempts to reach “human-level performance”
• Narrow stepping-stone benchmarks along the way.

Problem:
• Much of ML applied to science takes such approaches, and replaces the datasets with scientific ones.
What I want:
Instead of vision/language, want AI to reach human-level performance at research in the natural sciences.

What needs to happen?

• Natural science is not a regression problem. Need understanding.
• We need to be able to use machine learning for discovering universal concepts and theories, and representing them in human language.
How?
Traditional approach to physics:

Data (low dim.) → Describe → Theory
(may be summary statistics)
Empirical fit: Kepler’s third law

P² ∝ a³

→ Newton’s law of gravitation, to explain it

Empirical fit: Planck’s law

B = (2hν³/c²) · (exp(hν / (k_B T)) − 1)⁻¹

→ Quantum mechanics (partially), to explain it

Empirical fit: neural network weights

→ ???, to explain them
How?
Traditional approach to science:

Data (low dim.) → Describe → Theory
(may be summary statistics)

Era of AI?

Data (high dim.) → Compress → Neural Net? → Distill → Theory
Key point

Neural nets trained on big datasets can find new insights.

The remaining challenge is distilling the insights to our language.
Outline

• Interpretability
• Symbolic regression
• Symbolic distillation
• Examples
• Future
CV/NLP strategy of interpretability
Typically involves feature importance

(Figures: Omeiza et al., 2019; Ribeiro et al., 2016)
Science already has a modeling language

Computer Vision: ???
Science: mathematical expressions

We should build interpretations in this existing language: mathematical expressions!
• For physical problems, even if it is not the “true" expression, analytic models can often generalize better than neural networks! (See M. Cranmer+2020)

• This is a type of inductive bias: searching for models represented as sparse combinations of analytic operators that hold geometrical and physical significance

  +   : translation
  ×   : length => area => volume
  exp : solution to a common ODE, y′ ~ y
Symbolic regression
Symbolic regression is a machine learning task where the goal is to find analytic expressions that optimize some objective.
• Popularized by Koza (1990s); and its use in science by Lipson (2000s)

(Illustration: Kouzou Sakai for Quanta Magazine)
SOTA = genetic algorithm
Jointly optimize accuracy & complexity

Complexity is user-defined, but usually = number of nodes
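As an illustration, here is a minimal sketch of that complexity measure (a toy expression tree, not PySR’s internals):

from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                       # e.g. "+", "cos", "x3", "2.1"
    children: list = field(default_factory=list)

def complexity(node: Node) -> int:
    """Complexity = total number of nodes in the expression tree."""
    return 1 + sum(complexity(c) for c in node.children)

# cos(2.1 * x3) - x4  has complexity 6:
expr = Node("-", [Node("cos", [Node("*", [Node("2.1"), Node("x3")])]),
                  Node("x4")])
print(complexity(expr))  # 6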
High-level open-source frameworks:

github.com/MilesCranmer/SymbolicRegression.jl/ (main search code; MLJ interface)

github.com/MilesCranmer/PySR/ (Scikit-Learn wrapper)

Build your own symbolic regression algorithm!
github.com/SymbolicML/DynamicExpressions.jl/
github.com/SymbolicML/DynamicQuantities.jl/
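A minimal usage sketch of the Scikit-Learn-style PySR interface; the operator lists and iteration count here are illustrative choices, not recommendations:

import numpy as np
from pysr import PySRRegressor

X = np.random.randn(200, 4)
y = np.cos(2.1 * X[:, 2]) - X[:, 3]   # ground truth to rediscover

model = PySRRegressor(
    niterations=40,
    binary_operators=["+", "-", "*", "/"],
    unary_operators=["cos", "exp"],
)
model.fit(X, y)
print(model.sympy())   # best expression on the accuracy/complexity front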
Age-Regularized Multi-Population Evolution in PySR

Cranmer, 2023 - arxiv.org/abs/2305.01582

Model discovery at scale:
• Each island evolves independently on a single core.
• Scale up to ~1000s of cores (= 1000s of independent populations)
• Asynchronous migration between populations

(Figure: migration step between islands)
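In PySR this island layout is exposed through constructor parameters; a sketch (the values are illustrative, and exact defaults vary by version):

from pysr import PySRRegressor

model = PySRRegressor(
    populations=64,        # number of independent "islands"
    population_size=33,    # members evolving within each island
    procs=8,               # worker processes; migration is asynchronous
    niterations=100,
)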
Python API
Dimensional constraints
Custom objectives
“Can I make it so that my equation has exactly 2 sinusoids?” Yes!
https://arxiv.org/abs/2305.01582
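A sketch of dimensional constraints via the documented X_units/y_units keywords; the unit strings are illustrative and their exact syntax may differ across PySR versions:

import numpy as np
from pysr import PySRRegressor

X = np.random.rand(100, 2)        # column 0: mass, column 1: velocity
y = 0.5 * X[:, 0] * X[:, 1] ** 2  # kinetic energy

model = PySRRegressor(binary_operators=["+", "*"], niterations=40)
model.fit(X, y, X_units=["kg", "m/s"], y_units="kg * m^2 / s^2")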
Selection of user-contributed
publications that have used
symbolic distillation/PySR/
SymbolicRegression.jl:
astroautomata.com/PySR/papers
We can use Symbolic Regression to Distill a Neural Network into an Analytic Expression

How this works:
Cranmer et al., 2019, 2020 – Work with: Alvaro Sanchez-Gonzalez, Shirley Ho, Peter Battaglia, Kyle Cranmer, David Spergel, Rui Xu

1. Train NN normally, and freeze parameters.
2. Record inputs/outputs of the network over the training set: (x1, y1), (x2, y2), …
3. Fit the inputs/outputs of the neural network with PySR, e.g. [y]1 = cos(2.1 ⋅ [x]3) − [x]4
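A runnable sketch of the three steps, using a small sklearn MLP as a stand-in for the frozen network (any trained model with a predict() works the same way):

import numpy as np
from sklearn.neural_network import MLPRegressor
from pysr import PySRRegressor

X = np.random.randn(500, 4)
y = np.cos(2.1 * X[:, 2]) - X[:, 3]

# 1. Train the NN normally ("freezing" = not updating it afterwards).
nn = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000).fit(X, y)

# 2. Record the network's input/output pairs over the training set.
y_nn = nn.predict(X)

# 3. Fit those input/output pairs with PySR.
sr = PySRRegressor(binary_operators=["+", "-", "*"], unary_operators=["cos"])
sr.fit(X, y_nn)
print(sr.sympy())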
Full Symbolic Distillation

f: learns features?   g: uses features for calculation?

Re-train g, to pick up any errors in the approximation of f

(g ∘ f)(x1, x2, x3, x4) = …

Fully-interpretable approximation of the original neural network!

(Searching over n² expressions → searching over 2n expressions)
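A toy sketch of the piecewise search; the latents and outputs below are synthetic stand-ins for what would be recorded from the two halves of a real network:

import numpy as np
from pysr import PySRRegressor

X = np.random.randn(500, 4)
z = (X[:, 0] * X[:, 1]).reshape(-1, 1)  # stand-in for recorded latents of f
y = np.cos(z[:, 0])                     # stand-in for recorded outputs of g

# Two searches of size ~n each, instead of one search of size ~n^2:
f_sr = PySRRegressor(binary_operators=["+", "*"]).fit(X, z)
g_sr = PySRRegressor(binary_operators=["+", "*"],
                     unary_operators=["cos"]).fit(z, y)

# Re-train g on f's *symbolic* outputs, so g absorbs f's approximation error.
z_sym = np.asarray(f_sr.predict(X)).reshape(-1, 1)
g_sr.fit(z_sym, y)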
Inductive bias
• Introducing some form of inductive bias is needed to eliminate the functional degeneracy. For example:

xi → f → Σi → g → y

(the latent space between f and g could have some aggregation over a set)
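A minimal numerical sketch of that structure, y = g(Σi f(xi)), with hand-picked f and g just to show the shape of the computation:

import numpy as np

def f(x):
    # per-element encoder (hand-picked stand-in for a learned network)
    return np.stack([x, x ** 2], axis=-1)

def g(h):
    # decoder acting on the aggregated latent
    return h[..., 0] + 0.5 * h[..., 1]

def model(xs):
    # permutation-invariant: y = g(sum_i f(x_i))
    latent = f(xs).sum(axis=0)
    return g(latent)

print(model(np.array([1.0, 2.0, 3.0])))  # 13.0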
Recall:
Traditional approach to science:

Data (low dim.) → Describe → Theory
(may be summary statistics)

Era of AI?

Data (high dim.) → Compress → Neural Net? → Distill → Theory
Some examples:
with Alvaro Sanchez Gonzalez, Peter Battaglia, Rui Xu, Kyle Cranmer,
David Spergel, Shirley Ho; (NeurIPS 2020)
Knowledge Discovery
• Predict the dark matter properties in a
simulation with a graph neural network:

Self-supervised
(predict neighbors)
Example 2:
Discovering Orbital Mechanics

Can we learn Newton’s law of gravity by modelling the solar system with a graph neural network?

Unknown masses, and unknown dynamical model.

“Rediscovering orbital mechanics with machine learning” (2022)
Pablo Lemos, Niall Jeffrey, Miles Cranmer, Shirley Ho, Peter Battaglia
Next: interpretation

Approximate the relation between latent spaces of the network with PySR
Interpretation Results for f

Accuracy/Complexity Tradeoff*

Model selection score: −d(log(error)) / d(complexity)

*from Cranmer+2020; similar to Schmidt & Lipson, 2009

(Figure: error vs. complexity Pareto front)
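A sketch of that selection heuristic applied to a toy Pareto front (the numbers are made up for illustration):

import numpy as np

def best_by_score(complexities, errors):
    c = np.asarray(complexities, dtype=float)
    e = np.asarray(errors, dtype=float)
    score = -np.diff(np.log(e)) / np.diff(c)  # -d(log error)/d(complexity)
    return int(np.argmax(score)) + 1          # index of the selected candidate

# toy Pareto front: (complexity, error) for increasingly complex expressions
comp = [1, 3, 5, 8, 12]
err  = [1.0, 0.8, 0.05, 0.04, 0.039]
print(comp[best_by_score(comp, err)])  # 5 — biggest drop in log(error)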
Test the symbolic model:

Why isn’t this working well?
• Let’s look at the mass values in comparison with the true masses:
Solution: re-optimize vi!

• The vi were optimized for the neural network.
• The symbolic formula is not a *perfect* approximation of the network.
• Thus: we need to re-optimize vi for the symbolic function f!
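A toy sketch of that re-optimization step, with a hypothetical symbolic law F = vi · vj / r² standing in for the recovered expression:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
r = rng.uniform(1.0, 5.0, size=200)          # pairwise distances
pair = rng.integers(0, 3, size=(200, 2))     # which two bodies interact
true_v = np.array([1.0, 0.5, 2.0])           # "true" masses
F_obs = true_v[pair[:, 0]] * true_v[pair[:, 1]] / r**2

def loss(v):
    # mean squared error of the symbolic law F = v_i * v_j / r^2
    F_pred = v[pair[:, 0]] * v[pair[:, 1]] / r**2
    return np.mean((F_pred - F_obs) ** 2)

v_opt = minimize(loss, x0=np.ones(3)).x
print(v_opt)  # recovers the true values, up to a global sign symmetry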


• “Complexity” of an operator = number of clock cycles on FPGA
• Approximate neural net with small expression = 90% accuracy
• 5 ns inference time on FPGA!
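PySR’s complexity_of_operators option lets “complexity” mirror hardware cost; the cycle counts below are illustrative assumptions, not measurements:

from pysr import PySRRegressor

model = PySRRegressor(
    binary_operators=["+", "*", "/"],
    unary_operators=["exp"],
    # weight each operator by its (assumed) FPGA clock-cycle cost:
    complexity_of_operators={"+": 1, "*": 2, "/": 8, "exp": 16},
)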
Discussion/Future
• Is a pure neural net approach to AI for science (i.e., no interpretation) possible? How would you get the same level of generalization as we have had from theory?
• General relativity was derived from only a few postulates/data points, yet can predict the existence of black holes. Is it hopeless to expect that level of generalization from foundation models?
• How do we distill very large models, like large language models, into the language of science?
• These models may have learned some new unifying principles across domains. How can we find them?
• Can you use this symbolic regression technique to interpret language models directly?
FAQ: Why not fit directly?
• Constraints:
  • Neural networks require ~1M evaluations of a loss function to train.
  • Genetic algorithm-based symbolic regression requires ~1B evaluations to find a complex+accurate expression.
  • Need the symbolic regression loss to be extremely efficient!
• Offline vs online learning:
  • Full loss is too expensive.
  • So, we do “online” learning of the neural net, and then fit the inputs/outputs of the network afterwards.
