PyPMML-Spark is a Python PMML scoring library for PySpark, exposed as a SparkML Transformer. It is the Python API for PMML4S-Spark.
- Java >= 1.8
- Python 2.7 or >= 3.5
| Module | PySpark |
|---|---|
| pypmml-spark | PySpark >= 3.0.0 |
| pypmml-spark2 | PySpark >= 2.4.0, < 3.0.0 |
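
The version table above can be turned into a small check. The following is a minimal sketch, not part of PyPMML-Spark: the helper name `pick_package` is hypothetical, and it only encodes the two rows of the table.

```python
import re


def pick_package(pyspark_version):
    """Return the pip package name that matches a PySpark version string.

    Hypothetical helper: it mirrors the table above and nothing more.
    """
    # Read the leading "major.minor" part, e.g. "3.4.1" -> (3, 4).
    match = re.match(r"(\d+)\.(\d+)", pyspark_version)
    if match is None:
        raise ValueError("Unrecognized PySpark version: %r" % pyspark_version)
    major, minor = int(match.group(1)), int(match.group(2))

    if major >= 3:
        return "pypmml-spark"    # PySpark >= 3.0.0
    if (major, minor) >= (2, 4):
        return "pypmml-spark2"   # PySpark >= 2.4.0, < 3.0.0
    raise ValueError("PySpark %s is older than 2.4.0" % pyspark_version)
```

With PySpark installed, the argument could be taken from `pyspark.__version__`.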
pip install pypmml-spark
Or install the latest version from GitHub:
pip install --upgrade git+https://github.com/autodeployai/pypmml-spark.git
After installation, one more step is needed before the library can be used: Spark must be able to find the jars shipped in the package pypmml_spark.jars. There are several ways to do that:
- The easiest way is to run the script link_pmml4s_jars_into_spark.py that is delivered with pypmml-spark:

      link_pmml4s_jars_into_spark.py
- Use Spark's config options to specify the dependent jars, e.g. --jars, or spark.driver.extraClassPath and spark.executor.extraClassPath. See the Spark documentation for details about those parameters.
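
For the config-option route, the values have to be assembled from the jar files themselves. The sketch below shows one way to build them, under stated assumptions: the helper names (`find_jars`, `jars_option`, `classpath_option`, `build_session`) are hypothetical, and `jars_dir` is whatever directory holds the jars on your machine. `spark.jars` is the configuration counterpart of `--jars` and takes a comma-separated list, while the `extraClassPath` settings take a classpath joined with the platform path separator.

```python
import glob
import os


def find_jars(jars_dir):
    """Return the sorted paths of all .jar files directly inside jars_dir."""
    return sorted(glob.glob(os.path.join(jars_dir, "*.jar")))


def jars_option(jars_dir):
    """Build the comma-separated value used by --jars / spark.jars."""
    return ",".join(find_jars(jars_dir))


def classpath_option(jars_dir):
    """Build the value for spark.driver.extraClassPath and
    spark.executor.extraClassPath (joined with the OS path separator)."""
    return os.pathsep.join(find_jars(jars_dir))


def build_session(jars_dir, app_name="pmml-scoring"):
    """Create a SparkSession that loads the jars found in jars_dir.

    PySpark is imported lazily so the helpers above work without it.
    """
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .config("spark.jars", jars_option(jars_dir))
        .getOrCreate()
    )
```

Assuming the jars live in a `jars` directory inside the installed `pypmml_spark` package, `jars_dir` could be computed as `os.path.join(os.path.dirname(pypmml_spark.__file__), "jars")`. Note that the `extraClassPath` values must point at paths that exist on the driver and executor machines respectively.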
- Load a model from various sources, e.g. filename, string, or array of bytes.

      from pypmml_spark import ScoreModel

      # The model is from http://dmg.org/pmml/pmml_examples/KNIME_PMML_4.1_Examples/single_iris_dectree.xml
      model = ScoreModel.fromFile('single_iris_dectree.xml')
- Call transform(dataset) to run a batch score against an input dataset.

      # The data is from http://dmg.org/pmml/pmml_examples/Iris.csv
      df = spark.read.csv('Iris.csv', header='true')
      score_df = model.transform(df)
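
The first step mentions three kinds of sources but only shows a filename. The sketch below dispatches on the source type. It rests on an assumption: only `ScoreModel.fromFile` appears in this document, and the names `fromString` and `fromBytes` are assumed by analogy, so check them against the installed version. The helper `load_score_model` and its `score_model_cls` parameter are hypothetical.

```python
import os


def load_score_model(source, score_model_cls=None):
    """Load a PMML model from a file path, a PMML string, or raw bytes.

    score_model_cls defaults to pypmml_spark.ScoreModel; it is a parameter
    only so the dispatch logic can be tried without a Spark installation.
    """
    if score_model_cls is None:
        # Imported lazily: needs pypmml-spark and a working Spark setup.
        from pypmml_spark import ScoreModel as score_model_cls

    if isinstance(source, (bytes, bytearray)):
        # ASSUMPTION: ScoreModel.fromBytes exists and accepts bytes.
        return score_model_cls.fromBytes(source)
    if os.path.isfile(source):
        # Shown in this document: load from a file name.
        return score_model_cls.fromFile(source)
    # ASSUMPTION: ScoreModel.fromString exists and accepts PMML text.
    return score_model_cls.fromString(source)
```

A string is treated as a file name only if such a file exists; anything else is passed on as PMML text. The returned model is then used with `transform(dataset)` exactly as in the second step.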
See the PMML4S project. PMML4S is a PMML scoring library for Scala. It provides both Scala and Java Evaluator APIs for PMML.
See the PyPMML project. PyPMML is a Python PMML scoring library; it is the Python API for PMML4S.
See the PMML4S-Spark project. PMML4S-Spark is a PMML scoring library for Spark, exposed as a SparkML Transformer.
See the AI-Serving project. AI-Serving serves AI/ML models in the open standard formats PMML and ONNX over both HTTP (REST API) and gRPC endpoints.
If you have any questions about the PyPMML-Spark library, please open an issue on this repository.
Feedback and contributions to the project, no matter what kind, are always very welcome.
PyPMML-Spark is licensed under the Apache License 2.0.