Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl
Abstract
PySR is an open-source library for practical symbolic regression, a type of machine learning that aims to discover human-interpretable symbolic models. PySR was developed to democratize and popularize symbolic regression for the sciences, and is built on a high-performance distributed backend, a flexible search algorithm, and interfaces with several deep learning packages. PySR's internal search algorithm is a multi-population evolutionary algorithm built around a unique evolve-simplify-optimize loop, designed for optimization of unknown scalar constants in newly discovered empirical expressions. PySR's backend is the highly optimized Julia library SymbolicRegression.jl, which can also be used directly from Julia. SymbolicRegression.jl can fuse user-defined operators into SIMD kernels at runtime, perform automatic differentiation, and distribute populations of expressions to thousands of cores across a cluster. In describing this software, we also introduce a new benchmark, "EmpiricalBench," to quantify the applicability of symbolic regression algorithms in science. This benchmark measures recovery of historical empirical equations from original and synthetic datasets.
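To make the evolve-simplify-optimize idea concrete, the following is a minimal, standard-library-only sketch of a multi-population evolutionary search over tiny expression trees. It is not PySR's or SymbolicRegression.jl's implementation: the tuple-based tree representation, every function name (`evaluate`, `mutate`, `simplify`, `optimize`, `search`), the tournament and migration rules, and all parameter values are assumptions made for illustration, and a simple coordinate search stands in for whatever numerical optimizer a real library would use to tune constants.

```python
import random

# Toy expression trees as nested tuples (a representation assumed for this sketch):
#   ("x",)  ("const", c)  ("add", a, b)  ("mul", a, b)

def evaluate(expr, x):
    kind = expr[0]
    if kind == "x":
        return x
    if kind == "const":
        return expr[1]
    left, right = evaluate(expr[1], x), evaluate(expr[2], x)
    return left + right if kind == "add" else left * right

def loss(expr, xs, ys):
    """Mean squared error of expr on the data."""
    residuals = [evaluate(expr, x) - y for x, y in zip(xs, ys)]
    return sum(r * r for r in residuals) / len(residuals)

def random_leaf(rng):
    return ("x",) if rng.random() < 0.5 else ("const", rng.uniform(-2.0, 2.0))

def mutate(expr, rng):
    """Evolve: descend into a child, swap in a new leaf, or wrap in an operator."""
    if expr[0] in ("add", "mul") and rng.random() < 0.6:
        if rng.random() < 0.5:
            return (expr[0], mutate(expr[1], rng), expr[2])
        return (expr[0], expr[1], mutate(expr[2], rng))
    if rng.random() < 0.5:
        return random_leaf(rng)
    return (rng.choice(["add", "mul"]), expr, random_leaf(rng))

def simplify(expr):
    """Simplify: fold any operator whose two children are constants."""
    if expr[0] in ("x", "const"):
        return expr
    left, right = simplify(expr[1]), simplify(expr[2])
    if left[0] == "const" and right[0] == "const":
        return ("const", evaluate((expr[0], left, right), 0.0))
    return (expr[0], left, right)

def constants(expr):
    """List the constants of expr in left-to-right order."""
    if expr[0] == "const":
        return [expr[1]]
    if expr[0] == "x":
        return []
    return constants(expr[1]) + constants(expr[2])

def with_constants(expr, values):
    """Rebuild expr, taking new constants from the iterator `values` in order."""
    if expr[0] == "const":
        return ("const", next(values))
    if expr[0] == "x":
        return expr
    left = with_constants(expr[1], values)
    right = with_constants(expr[2], values)
    return (expr[0], left, right)

def optimize(expr, xs, ys, rounds=40):
    """Optimize: derivative-free coordinate search over the scalar constants."""
    consts, best, step = constants(expr), loss(expr, xs, ys), 1.0
    for _ in range(rounds):
        improved = False
        for i in range(len(consts)):
            for delta in (step, -step):
                trial = consts[:i] + [consts[i] + delta] + consts[i + 1:]
                trial_loss = loss(with_constants(expr, iter(trial)), xs, ys)
                if trial_loss < best:
                    consts, best, improved = trial, trial_loss, True
                    break
        if not improved:
            step /= 2.0  # no coordinate move helped, so refine the step size
    return with_constants(expr, iter(consts))

def search(xs, ys, n_populations=3, pop_size=8, generations=30, seed=0):
    rng = random.Random(seed)
    score = lambda e: loss(e, xs, ys)
    populations = [[random_leaf(rng) for _ in range(pop_size)]
                   for _ in range(n_populations)]
    for _ in range(generations):
        for pop in populations:
            # Tournament: the fittest of three random members is the parent.
            ranked = sorted(rng.sample(range(pop_size), 3), key=lambda i: score(pop[i]))
            child = optimize(simplify(mutate(pop[ranked[0]], rng)), xs, ys)
            # The child replaces the weakest contender only if it is no worse.
            if score(child) <= score(pop[ranked[-1]]):
                pop[ranked[-1]] = child
        # Migration: copy the best expression found so far into each population.
        best = min((e for pop in populations for e in pop), key=score)
        for pop in populations:
            pop[rng.randrange(pop_size)] = best
    return min((e for pop in populations for e in pop), key=score)

if __name__ == "__main__":
    xs = [0.5 * i for i in range(-4, 5)]
    ys = [2.0 * x + 1.0 for x in xs]
    best = search(xs, ys)
    print(best, loss(best, xs, ys))
```

Each candidate passes through the three stages in order: `mutate` changes its structure, `simplify` folds redundant constants, and `optimize` tunes the remaining constants before the candidate competes for a slot. Keeping several populations with occasional migration is one simple way to preserve diversity. In a distributed setting each population could be evolved on a separate worker, which this single-process sketch does not attempt.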
- Publication: arXiv e-prints
- Pub Date: May 2023
- DOI: 10.48550/arXiv.2305.01582
- arXiv: arXiv:2305.01582
- Bibcode: 2023arXiv230501582C
- Keywords: Astrophysics - Instrumentation and Methods for Astrophysics; Computer Science - Machine Learning; Computer Science - Neural and Evolutionary Computing; Computer Science - Symbolic Computation; Physics - Data Analysis; Statistics and Probability
- E-Print: 24 pages, 5 figures, 3 tables. Feedback welcome. Paper source found at https://github.com/MilesCranmer/pysr_paper