JSON encoding refactor and orjson encoding by jonmmease · Pull Request #2955 · plotly/plotly.py · GitHub

JSON encoding refactor and orjson encoding #2955

Merged
merged 49 commits into from
May 27, 2021
Commits
40b9af1
WIP accelerated encoding with orjson
jonmmease Dec 5, 2020
f79e318
support fig to dict in io without cloning
jonmmease Dec 5, 2020
55720de
Merge branch 'master' into orjson_encoding
jonmmease Dec 5, 2020
7b3593a
fix clone default
jonmmease Dec 5, 2020
da915d6
Add pio.json.config object to configure default encoder
jonmmease Dec 5, 2020
7b235ef
default_encoder to default_engine
jonmmease Dec 5, 2020
7895b6a
blacken
jonmmease Dec 5, 2020
ce05a68
Handle Dash objects in to_json
jonmmease Dec 6, 2020
4ef6510
add JSON encoding tests
jonmmease Dec 31, 2020
101ba85
add testing of from_plotly_json
jonmmease Dec 31, 2020
67d3670
Better error message when orjson not installed and orjson engine requ…
jonmmease Dec 31, 2020
02c00da
Add orjson as optional testing dependency
jonmmease Dec 31, 2020
99ea6a1
Replace Python 3.5 CI tests with 3.8
jonmmease Dec 31, 2020
d44ec26
Try only install orjson with Python 3.6+
jonmmease Dec 31, 2020
b7d8422
Don't test orjson engine when orjson not installed
jonmmease Dec 31, 2020
ddcd6f5
Try new 3.8.7 docker image since prior guess doesn't exist
jonmmease Dec 31, 2020
33359f3
greater than!
jonmmease Dec 31, 2020
c7c1819
Bump scikit image version for Python 3.8 compatibility
jonmmease Dec 31, 2020
a8d52ab
Try to help Python 2 from getting confused about which json module to…
jonmmease Dec 31, 2020
619838f
Update pandas for Python 3
jonmmease Dec 31, 2020
7c7a272
Revert 3.8 CI updates. Too much for this PR
jonmmease Dec 31, 2020
1708703
Doh
jonmmease Dec 31, 2020
66cab10
Don't skip copying during serialization
jonmmease Dec 31, 2020
56a8945
Rename new JSON functions:
jonmmease Jan 2, 2021
0a51020
Ensure cleaned numpy arrays are contiguous
jonmmease Jan 2, 2021
4e9d64e
Use to_json_plotly in html and orca logic
jonmmease Jan 8, 2021
d4068de
Add orjson documentation dependency
jonmmease Jan 8, 2021
58b7192
Handle pandas Timestamp scalars in orjson engine
jonmmease Jan 8, 2021
974fcba
Rework date and string encoding, add and fix tests
jonmmease Jan 8, 2021
a651a63
default JSON engine to "auto"
jonmmease Jan 8, 2021
af1d88d
Fix expected JSON in html export (no spaces)
jonmmease Jan 8, 2021
1d6acc3
Merge remote-tracking branch 'origin/master' into orjson_encoding
jonmmease Jan 8, 2021
d51fd94
blacken
jonmmease Jan 8, 2021
042c54c
Fix expected JSON in matplotlylib test
jonmmease Jan 8, 2021
ddc1b8f
Fix expected JSON in html repr test
jonmmease Jan 8, 2021
d7928b0
Merge remote-tracking branch 'origin/master' into orjson_encoding
jonmmease Jan 13, 2021
76cc625
Don't drop timezones during serialization, just let Plotly.js ignore …
jonmmease Jan 13, 2021
453461d
Merge branch 'numpy_date_serialization' into orjson_encoding
jonmmease Jan 13, 2021
84ba4b5
no need to skip legacy tests now
jonmmease Jan 13, 2021
340aed3
Only try `datetime_as_string` on datetime kinded numpy arrays
jonmmease Jan 13, 2021
6cea61d
Don't store object or unicode numpy arrays in figure. Coerce to lists
jonmmease Jan 21, 2021
93815c1
Try orjson encoding without cleaning first
jonmmease Jan 21, 2021
242d1fa
Merge remote-tracking branch 'origin/master' into orjson_encoding
jonmmease Jan 21, 2021
8a3a4b3
blacken
jonmmease Jan 21, 2021
1de750a
remove scratch file
jonmmease Jan 21, 2021
81f73d5
Remove unused clone
jonmmease Jan 21, 2021
80be8bd
Remove the new "json" encoder
jonmmease Jan 22, 2021
cb54f88
Reorder dict cleaning for performance
jonmmease Jan 22, 2021
1fbfa0d
Merge remote-tracking branch 'origin/master' into orjson_encoding
jonmmease Apr 29, 2021
WIP accelerated encoding with orjson
jonmmease committed Dec 5, 2020
commit 40b9af19edc60e2d5b1eb5630f321938bbb21dee
2 changes: 2 additions & 0 deletions packages/python/plotly/_plotly_utils/utils.py
@@ -61,8 +61,10 @@ def encode(self, o):
         # We catch false positive cases (e.g. strings such as titles, labels etc.)
         # but this is ok since the intention is to skip the decoding / reencoding
         # step when it's completely safe
+
         if not ("NaN" in encoded_o or "Infinity" in encoded_o):
             return encoded_o
+
         # now:
         # 1. `loads` to switch Infinity, -Infinity, NaN to None
         # 2. `dumps` again so you get 'null' instead of extended JSON
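
The comments in this hunk describe a two-step round trip for producing strict JSON. A minimal, standard-library-only sketch of that idea follows; `dumps_strict` is a hypothetical helper name used for illustration, not a function from plotly, and `coerce_to_strict` is written here in the same spirit as the function of that name in this PR.

```python
import json


def coerce_to_strict(const):
    # json.loads passes the literal token text to parse_constant
    if const in ("Infinity", "-Infinity", "NaN"):
        return None
    return const


def dumps_strict(obj, **opts):
    """Hypothetical helper: dump obj, turning non-finite floats into null."""
    encoded = json.dumps(obj, **opts)

    # Fast path: a substring check. It can give false positives (a title
    # containing "NaN"), which only costs an unnecessary round trip.
    if not ("NaN" in encoded or "Infinity" in encoded):
        return encoded

    # Slow path: loads maps the extended constants to None, dumps emits null
    decoded = json.loads(encoded, parse_constant=coerce_to_strict)
    return json.dumps(decoded, **opts)


result = dumps_strict({"y": [1.0, float("nan"), float("inf")]}, separators=(",", ":"))
print(result)  # {"y":[1.0,null,null]}
```

The substring test keeps the common case (no non-finite values) to a single `json.dumps` call, which appears to be the motivation stated in the hunk's comments.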
29 changes: 21 additions & 8 deletions packages/python/plotly/plotly/basedatatypes.py
@@ -3273,7 +3273,7 @@ def _perform_batch_animate(self, animation_opts):

     # Exports
     # -------
-    def to_dict(self):
+    def to_dict(self, clone=True):
         """
         Convert figure to a dictionary

@@ -3286,31 +3286,41 @@ def to_dict(self):
         """
         # Handle data
         # -----------
-        data = deepcopy(self._data)
+        if clone:
+            data = deepcopy(self._data)
+        else:
+            data = self._data

         # Handle layout
         # -------------
-        layout = deepcopy(self._layout)
+        if clone:
+            layout = deepcopy(self._layout)
+        else:
+            layout = self._layout

         # Handle frames
         # -------------
         # Frame key is only added if there are any frames
         res = {"data": data, "layout": layout}
-        frames = deepcopy([frame._props for frame in self._frame_objs])
+        if clone:
+            frames = deepcopy([frame._props for frame in self._frame_objs])
+        else:
+            frames = [frame._props for frame in self._frame_objs]
+
         if frames:
             res["frames"] = frames

         return res

-    def to_plotly_json(self):
+    def to_plotly_json(self, clone=True):
         """
         Convert figure to a JSON representation as a Python dict

         Returns
         -------
         dict
         """
-        return self.to_dict()
+        return self.to_dict(clone=clone)

     @staticmethod
     def _to_ordered_dict(d, skip_uid=False):
@@ -5524,15 +5534,18 @@ def on_change(self, callback, *args, **kwargs):
         # -----------------
         self._change_callbacks[arg_tuples].append(callback)

-    def to_plotly_json(self):
+    def to_plotly_json(self, clone=False):
         """
         Return plotly JSON representation of object as a Python dict

         Returns
         -------
         dict
         """
-        return deepcopy(self._props if self._props is not None else {})
+        if clone:
+            return deepcopy(self._props if self._props is not None else {})
+        else:
+            return self._props if self._props is not None else {}

     @staticmethod
     def _vals_equal(v1, v2):
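
The `clone` flag added in this file trades safety for speed: skipping `deepcopy` avoids copying large arrays, but the caller then shares state with the figure. The sketch below illustrates that trade-off with a hypothetical stand-in class, `TinyFigure`, which is not the plotly class and only mimics the `to_dict(clone=...)` signature shown in the diff.

```python
from copy import deepcopy


class TinyFigure:
    """Hypothetical stand-in for a figure; NOT the plotly class."""

    def __init__(self, data):
        self._data = data

    def to_dict(self, clone=True):
        # clone=True returns an independent copy, clone=False a shared view
        data = deepcopy(self._data) if clone else self._data
        return {"data": data}


fig = TinyFigure([{"type": "scatter", "y": [1, 2, 3]}])

safe = fig.to_dict()                 # independent copy
shared = fig.to_dict(clone=False)    # same underlying objects as fig._data

# Mutating the copy leaves the figure untouched
safe["data"][0]["y"].append(99)
unchanged = fig._data[0]["y"] == [1, 2, 3]

# Mutating the shared view changes the figure itself
shared["data"][0]["y"].append(4)
mutated = fig._data[0]["y"] == [1, 2, 3, 4]

print(unchanged, mutated)  # True True
```

Under this reading, `clone=False` is only safe when the result is consumed immediately and never mutated, for example when it is passed straight to a serializer.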
180 changes: 170 additions & 10 deletions packages/python/plotly/plotly/io/_json.py
@@ -2,12 +2,28 @@

 from six import string_types
 import json
+import decimal


 from plotly.io._utils import validate_coerce_fig_to_dict, validate_coerce_output_type
+from _plotly_utils.utils import iso_to_plotly_time_string
+from _plotly_utils.optional_imports import get_module
+from _plotly_utils.basevalidators import ImageUriValidator


-def to_json(fig, validate=True, pretty=False, remove_uids=True):
+def coerce_to_strict(const):
+    """
+    This is used to ultimately *encode* into strict JSON, see `encode`
+
+    """
+    # before python 2.7, 'true', 'false', 'null', were include here.
+    if const in ("Infinity", "-Infinity", "NaN"):
+        return None
+    else:
+        return const
+
+
+def to_json(fig, validate=True, pretty=False, remove_uids=True, engine="auto"):
     """
     Convert a figure to a JSON string representation

@@ -32,7 +48,7 @@ def to_json(fig, validate=True, pretty=False, remove_uids=True):
     str
         Representation of figure as a JSON string
     """
-    from _plotly_utils.utils import PlotlyJSONEncoder
+    orjson = get_module("orjson", should_load=True)

     # Validate figure
     # ---------------
@@ -44,16 +60,77 @@ def to_json(fig, validate=True, pretty=False, remove_uids=True):
         for trace in fig_dict.get("data", []):
             trace.pop("uid", None)

+    # Determine json engine
+    if engine == "auto":
+        if orjson is not None:
+            engine = "orjson"
+        else:
+            engine = "json"
+    elif engine not in ["orjson", "json", "legacy"]:
+        raise ValueError("Invalid json engine: %s" % engine)
+
+    modules = {"sage_all": get_module("sage.all", should_load=False),
+               "np": get_module("numpy", should_load=False),
+               "pd": get_module("pandas", should_load=False),
+               "image": get_module("PIL.Image", should_load=False)}
+
+    orjson = get_module("orjson", should_load=True)
+
     # Dump to a JSON string and return
     # --------------------------------
-    opts = {"sort_keys": True}
-    if pretty:
-        opts["indent"] = 2
-    else:
-        # Remove all whitespace
-        opts["separators"] = (",", ":")
-
-    return json.dumps(fig_dict, cls=PlotlyJSONEncoder, **opts)
+    if engine in ("json", "legacy"):
+        opts = {"sort_keys": True}
+        if pretty:
+            opts["indent"] = 2
+        else:
+            # Remove all whitespace
+            opts["separators"] = (",", ":")
+
+        if engine == "json":
+            cleaned = clean_to_json_compatible(
+                fig, numpy_allowed=False,
+                non_finite_allowed=False,
+                datetime_allowed=False,
+                modules=modules,
+            )
+            encoded_o = json.dumps(cleaned, **opts)
+
+            if not ("NaN" in encoded_o or "Infinity" in encoded_o):
+                return encoded_o
+
+            # now:
+            # 1. `loads` to switch Infinity, -Infinity, NaN to None
+            # 2. `dumps` again so you get 'null' instead of extended JSON
+            try:
+                new_o = json.loads(encoded_o, parse_constant=coerce_to_strict)
+            except ValueError:
+
+                # invalid separators will fail here. raise a helpful exception
+                raise ValueError(
+                    "Encoding into strict JSON failed. Did you set the separators "
+                    "valid JSON separators?"
+                )
+            else:
+                return json.dumps(new_o, **opts)
+        else:
+            from _plotly_utils.utils import PlotlyJSONEncoder
+            return json.dumps(fig_dict, cls=PlotlyJSONEncoder, **opts)
+    elif engine == "orjson":
+        opts = (orjson.OPT_SORT_KEYS
+                | orjson.OPT_SERIALIZE_NUMPY
+                | orjson.OPT_OMIT_MICROSECONDS
+                )
+
+        if pretty:
+            opts |= orjson.OPT_INDENT_2
+
+        cleaned = clean_to_json_compatible(
+            fig, numpy_allowed=True,
+            non_finite_allowed=True,
+            datetime_allowed=True,
+            modules=modules,
+        )
+        return orjson.dumps(cleaned, option=opts).decode("utf8")


 def write_json(fig, file, validate=True, pretty=False, remove_uids=True):
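
The engine selection at the top of this hunk can be read in isolation. The sketch below restates it as a small standalone function, assuming only the standard library; `resolve_engine` and its `orjson_available` parameter are hypothetical names introduced for illustration, whereas the commit itself performs the check inline with `get_module`.

```python
import importlib.util

VALID_ENGINES = ("orjson", "json", "legacy")


def resolve_engine(engine="auto", orjson_available=None):
    """Hypothetical helper mirroring the 'auto' engine selection in this hunk."""
    if orjson_available is None:
        # find_spec returns None when the top-level module is not installed
        orjson_available = importlib.util.find_spec("orjson") is not None

    if engine == "auto":
        # Prefer the accelerated encoder, fall back to the stdlib one
        return "orjson" if orjson_available else "json"
    if engine not in VALID_ENGINES:
        raise ValueError("Invalid json engine: %s" % engine)
    return engine


print(resolve_engine("auto", orjson_available=False))  # json
print(resolve_engine("legacy"))                        # legacy
```

As in the hunk, an explicit `engine="orjson"` request is accepted here even when orjson is missing; a later commit in this PR ("Better error message when orjson not installed and orjson engine requ…") addresses that case.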
@@ -194,3 +271,86 @@ def read_json(file, output_type="Figure", skip_invalid=False):
     # Construct and return figure
     # ---------------------------
     return from_json(json_str, skip_invalid=skip_invalid, output_type=output_type)
+
+
+def clean_to_json_compatible(obj, **kwargs):
+    # Try handling value as a scalar value that we have a conversion for.
+    # Return immediately if we know we've hit a primitive value
+
+    # unpack kwargs
+    numpy_allowed = kwargs.get("numpy_allowed", False)
+    non_finite_allowed = kwargs.get("non_finite_allowed", False)
+    datetime_allowed = kwargs.get("datetime_allowed", False)
+
+    modules = kwargs.get("modules", {})
+    sage_all = modules["sage_all"]
+    np = modules["np"]
+    pd = modules["pd"]
+    image = modules["image"]
+
+    # Plotly
+    try:
+        obj = obj.to_plotly_json(clone=False)
+    except (TypeError, NameError, ValueError):
+        # Try without clone for backward compatibility
+        obj = obj.to_plotly_json()
+    except AttributeError:
+        pass
+
+    # Sage
+    if sage_all is not None:
+        if obj in sage_all.RR:
+            return float(obj)
+        elif obj in sage_all.ZZ:
+            return int(obj)
+
+    # numpy
+    if np is not None:
+        if obj is np.ma.core.masked:
+            return float("nan")
+        elif numpy_allowed and isinstance(obj, np.ndarray) and obj.dtype.kind in ("b", "i", "u", "f"):
+            return obj
+
+    # pandas
+    if pd is not None:
+        if obj is pd.NaT:
+            return None
+        elif isinstance(obj, pd.Series):
+            if numpy_allowed and obj.dtype.kind in ("b", "i", "u", "f"):
+                return obj.values
+            elif datetime_allowed and obj.dtype.kind == "M":
+                return obj.dt.to_pydatetime().tolist()
+
+
+    # datetime and date
+    if not datetime_allowed:
+        try:
+            # Is this cleanup still needed?
+            return iso_to_plotly_time_string(obj.isoformat())
+        except AttributeError:
+            pass
+
+    # Try .tolist() convertible
+    try:
+        # obj = obj.tolist()
+        return obj.tolist()
+    except AttributeError:
+        pass
+
+    # Do best we can with decimal
+    if isinstance(obj, decimal.Decimal):
+        return float(obj)
+
+    # PIL
+    if image is not None and isinstance(obj, image.Image):
+        return ImageUriValidator.pil_image_to_uri(obj)
+
+    # Recurse into lists and dictionaries
+    if isinstance(obj, dict):
+        return {k: clean_to_json_compatible(v, **kwargs) for k, v in obj.items()}
+    elif isinstance(obj, (list, tuple)):
+        if obj:
+            # Must process list recursively even though it may be slow
+            return [clean_to_json_compatible(v, **kwargs) for v in obj]
+
+    return obj
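
`clean_to_json_compatible` follows a "convert scalars first, then recurse into containers" pattern. A much reduced, standard-library-only sketch of the same pattern is shown below. `clean_basic` is a hypothetical name, it handles only dates, `Decimal`, dicts, lists and tuples, and it uses plain `isoformat()` where the commit calls `iso_to_plotly_time_string`, so its output is not guaranteed to match plotly's.

```python
import datetime
import decimal
import json


def clean_basic(obj, datetime_allowed=False):
    """Hypothetical, simplified cousin of clean_to_json_compatible."""
    # Dates become ISO strings unless the encoder can take them natively
    if not datetime_allowed and isinstance(obj, (datetime.date, datetime.datetime)):
        return obj.isoformat()

    # Do the best we can with Decimal: lossy conversion to float
    if isinstance(obj, decimal.Decimal):
        return float(obj)

    # Recurse into containers; tuples are emitted as lists
    if isinstance(obj, dict):
        return {k: clean_basic(v, datetime_allowed) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [clean_basic(v, datetime_allowed) for v in obj]

    return obj


raw = {
    "x": (datetime.date(2021, 5, 27), datetime.date(2021, 5, 28)),
    "y": [decimal.Decimal("1.5"), 2],
}
cleaned = clean_basic(raw)
text = json.dumps(cleaned, sort_keys=True, separators=(",", ":"))
print(text)  # {"x":["2021-05-27","2021-05-28"],"y":[1.5,2]}
```

The `datetime_allowed` switch plays the same role as in the diff: an encoder that serializes datetimes itself (the orjson branch above passes `datetime_allowed=True`) can skip the string conversion.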