Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl

Cranmer, Miles

PySR is an open-source library for practical symbolic regression, a type of machine learning that aims to discover human-interpretable symbolic models. PySR was developed to democratize and popularize symbolic regression for the sciences, and is built on a high-performance distributed backend, a flexible search algorithm, and interfaces with several deep learning packages. PySR's internal search algorithm is a multi-population evolutionary algorithm, which consists of a unique evolve-simplify-optimize loop, designed for optimization of unknown scalar constants in newly discovered empirical expressions. PySR's backend is the highly optimized Julia library SymbolicRegression.jl, which can also be used directly from Julia. It is capable of fusing user-defined operators into SIMD kernels at runtime, performing automatic differentiation, and distributing populations of expressions to thousands of cores across a cluster. In describing this software, we also introduce a new benchmark, "EmpiricalBench," to quantify the applicability of symbolic regression algorithms in science. This benchmark measures recovery of historical empirical equations from original and synthetic datasets.
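
To make the idea of optimizing unknown scalar constants in candidate expressions more concrete, the following is a minimal sketch in plain Python. It is not PySR's implementation or API: the candidate expression shapes, the helper names (`mse`, `fit_constant`), the golden-section search, and the noise-free synthetic data (generated from Kepler's third law, P = a^1.5 in years and astronomical units) are all assumptions chosen for illustration. A real search would also evolve and simplify the expression structures rather than take them from a fixed list.

```python
import math

# Synthetic, noise-free data from P = a**1.5 (illustrative values only).
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0]
ys = [x ** 1.5 for x in xs]

# Hypothetical candidate expression shapes, each with one unknown constant c.
CANDIDATES = {
    "c * x": lambda x, c: c * x,
    "x ** c": lambda x, c: x ** c,
    "exp(c * x)": lambda x, c: math.exp(c * x),
}


def mse(f, c, xs, ys):
    """Mean squared error of expression f with constant c on the data."""
    return sum((f(x, c) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_constant(f, xs, ys, lo=-5.0, hi=5.0, tol=1e-8):
    """Golden-section search for the constant in [lo, hi] that minimizes MSE.

    This stands in for the "optimize" step; it assumes the loss has a
    single minimum in the interval, which is not guaranteed in general.
    """
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c1 = b - inv_phi * (b - a)
        c2 = a + inv_phi * (b - a)
        if mse(f, c1, xs, ys) < mse(f, c2, xs, ys):
            b = c2  # minimum lies in [a, c2]
        else:
            a = c1  # minimum lies in [c1, b]
    c = (a + b) / 2
    return c, mse(f, c, xs, ys)


# Fit the constant of every candidate, then keep the lowest-loss expression.
results = {name: fit_constant(f, xs, ys) for name, f in CANDIDATES.items()}
best_name = min(results, key=lambda name: results[name][1])
best_c, best_loss = results[best_name]

print(f"best expression: {best_name} with c = {best_c:.4f}")
```

In this toy setting all candidates have the same size, so selection by loss alone is enough; a practical search would typically also weigh expression complexity against accuracy.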

Publication:

arXiv e-prints

Pub Date:

May 2023

DOI:

10.48550/arXiv.2305.01582

arXiv:

arXiv:2305.01582

Bibcode:

2023arXiv230501582C

Keywords:

Astrophysics - Instrumentation and Methods for Astrophysics;
Computer Science - Machine Learning;
Computer Science - Neural and Evolutionary Computing;
Computer Science - Symbolic Computation;
Physics - Data Analysis, Statistics and Probability

E-Print:

24 pages, 5 figures, 3 tables. Feedback welcome. Paper source found at https://github.com/MilesCranmer/pysr_paper
