⚡️ Speed up `_get_all_json_refs()` by 34% in `pydantic/json_schema.py` #9650

misrasaurabh1 · 2024-06-12T20:44:25Z

Change Summary

📄 `_get_all_json_refs()` in `pydantic/json_schema.py`

📈 Performance improved by 34% (0.34x faster)

⏱️ Runtime went down from 544 microseconds to 405 microseconds

Explanation and details

Here is the optimized version of the provided Python program. The key optimization here leverages a stack-based approach to avoid the overhead associated with recursive function calls, which can be significant when dealing with deeply nested structures.

Improvements.

Stack-Based Iteration: By using an explicit stack, we avoid the overhead of recursive function calls.
Stack Extension: The extend method is utilized to add multiple items to the stack, reducing the number of loop iterations.

This approach ensures better performance, especially for deeply nested JSON structures, by keeping the processing within the iteration loop instead of the function call stack.

Correctness verification

The new optimized code was tested for correctness. The results are listed below.

🔘 (none found) − ⚙️ Existing Unit Tests

✅ 15 Passed − 🌀 Generated Regression Tests

(click to show generated tests)

# imports
from typing import Any, NewType

import pytest  # used for our unit tests

# function to test
JsonRef = NewType('JsonRef', str)
from pydantic.json_schema import _get_all_json_refs

# unit tests

def test_single_ref_in_flat_dict():
    """Test single $ref in a flat dictionary."""
    input_data = {'$ref': 'reference1'}
    expected_output = {JsonRef('reference1')}
    assert _get_all_json_refs(input_data) == expected_output

def test_multiple_refs_in_flat_dict():
    """Test multiple $refs in a flat dictionary."""
    input_data = {'$ref': 'reference1', 'key': {'$ref': 'reference2'}}
    expected_output = {JsonRef('reference1'), JsonRef('reference2')}
    assert _get_all_json_refs(input_data) == expected_output

def test_nested_dicts_with_refs():
    """Test nested dictionaries with $refs."""
    input_data = {'key1': {'$ref': 'reference1', 'key2': {'$ref': 'reference2'}}}
    expected_output = {JsonRef('reference1'), JsonRef('reference2')}
    assert _get_all_json_refs(input_data) == expected_output

def test_nested_lists_with_refs():
    """Test nested lists with $refs."""
    input_data = {'key1': [{'key2': {'$ref': 'reference1'}}, {'key3': {'$ref': 'reference2'}}]}
    expected_output = {JsonRef('reference1'), JsonRef('reference2')}
    assert _get_all_json_refs(input_data) == expected_output

def test_combination_of_dicts_and_lists():
    """Test combination of dictionaries and lists with $refs."""
    input_data = {'key1': [{'key2': {'$ref': 'reference1'}}, {'key3': {'key4': {'$ref': 'reference2'}}}]}
    expected_output = {JsonRef('reference1'), JsonRef('reference2')}
    assert _get_all_json_refs(input_data) == expected_output

def test_empty_dict():
    """Test empty dictionary."""
    input_data = {}
    expected_output = set()
    assert _get_all_json_refs(input_data) == expected_output

def test_empty_list():
    """Test empty list."""
    input_data = []
    expected_output = set()
    assert _get_all_json_refs(input_data) == expected_output

def test_no_refs_present():
    """Test dictionary with no $refs."""
    input_data = {'key1': 'value1', 'key2': {'key3': 'value3'}}
    expected_output = set()
    assert _get_all_json_refs(input_data) == expected_output

def test_non_string_ref_value():
    """Test dictionary with non-string $ref value."""
    input_data = {'$ref': 123, 'key': {'$ref': 'reference1'}}
    expected_output = {JsonRef('reference1')}
    assert _get_all_json_refs(input_data) == expected_output

def test_deeply_nested_structures():
    """Test deeply nested structures with $refs."""
    input_data = {'key1': {'key2': {'key3': {'key4': {'$ref': 'reference1'}}}}}
    expected_output = {JsonRef('reference1')}
    assert _get_all_json_refs(input_data) == expected_output

def test_large_json_like_structure():
    """Test large JSON-like structure with multiple $refs."""
    input_data = {'key1': [{'key2': {'$ref': 'reference1'}}, {'key3': {'key4': {'$ref': 'reference2'}}}, {'key5': [{'key6': {'$ref': 'reference3'}}, {'key7': {'$ref': 'reference4'}}]}]}
    expected_output = {JsonRef('reference1'), JsonRef('reference2'), JsonRef('reference3'), JsonRef('reference4')}
    assert _get_all_json_refs(input_data) == expected_output

def test_non_dict_non_list_input():
    """Test non-dictionary, non-list input."""
    input_data = 'string'
    expected_output = set()
    assert _get_all_json_refs(input_data) == expected_output

def test_mixed_types_in_list():
    """Test list with mixed types including $refs."""
    input_data = [{'key': {'$ref': 'reference1'}}, 'string', 123, {'$ref': 'reference2'}]
    expected_output = {JsonRef('reference1'), JsonRef('reference2')}
    assert _get_all_json_refs(input_data) == expected_output

def test_large_number_of_nested_references():
    """Test large number of nested references."""
    input_data = {'key': [{'key': [{'key': [{'key': [{'key': [{'key': {'$ref': 'reference1'}}]}]}]}]}]}
    expected_output = {JsonRef('reference1')}
    assert _get_all_json_refs(input_data) == expected_output

def test_large_flat_structure():
    """Test large flat structure with many $refs."""
    input_data = {f'key{i}': {'$ref': f'reference{i}'} for i in range(1000)}
    expected_output = {JsonRef(f'reference{i}') for i in range(1000)}
    assert _get_all_json_refs(input_data) == expected_output

🔘 (none found) − ⏪ Replay Tests

This optimization was discovered automatically using codeflash.ai

Checklist

The pull request title is a good summary of the changes - it will be used in the changelog
Unit tests for the changes exist
Tests pass on CI
Documentation reflects the changes where applicable
My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

Here is the optimized version of the provided Python program. The key optimization here leverages a stack-based approach to avoid the overhead associated with recursive function calls, which can be significant when dealing with deeply nested structures. ### Improvements. 1. **Stack-Based Iteration**: By using an explicit stack, we avoid the overhead of recursive function calls. 2. **Stack Extension**: The `extend` method is utilized to add multiple items to the stack, reducing the number of loop iterations. This approach ensures better performance, especially for deeply nested JSON structures, by keeping the processing within the iteration loop instead of the function call stack.

…6-05T11.55.21

codspeed-hq · 2024-06-12T20:47:31Z

CodSpeed Performance Report

Merging #9650 will not alter performance

_{Comparing misrasaurabh1:codeflash/optimize-_get_all_json_refs-2024-06-05T11.55.21 (488d819) with main (8128b17)}

Summary

✅ 13 untouched benchmarks

sydney-runkle · 2024-06-12T21:59:47Z

This looks great! I'd like to add a test for a complicated json schema generation function + merge that into main to get a benchmark, then see what codspeed thinks with this improvement. I can add that test tomorrow!

vigneshmanick · 2024-06-13T21:06:27Z

This and the other 2 PRs by the OP look more like an advertisement for this code scanning tool with which the changes were generated . The title, PR description and code change all look auto generated.

misrasaurabh1 · 2024-06-13T21:54:39Z

Hi @vigneshmanick, yes this optimization was auto generated and validated with Codeflash, but before opening this PR I also manually vetted it and I am in touch with the Pydantic team to confirm if these optimizations are good.

vigneshmanick · 2024-06-14T11:19:25Z

The benchmarks haven't changed for this PR (and the others too)
There have been no new benchmarks added for this or the other PRs
No new tests have been added for the changes created
These PRs don't close any open issues from pydantic or pydantic core. (atleast don't see any linked ones)

…6-05T11.55.21

sydney-runkle

This is pretty awesome. Super excited about this kind of capability @misrasaurabh1.

I'm ok with merging this as is, given the pretty exhaustive test suite that was generated + attached to this PR.

codeflash-ai bot and others added 2 commits June 5, 2024 11:55

Merge branch 'main' into codeflash/optimize-_get_all_json_refs-2024-0…

d478ebb

…6-05T11.55.21

github-actions bot added the relnotes-fix Used for bugfixes. label Jun 12, 2024

sydney-runkle added relnotes-performance Used for performance improvements. and removed relnotes-fix Used for bugfixes. labels Jun 12, 2024

Merge branch 'main' into codeflash/optimize-_get_all_json_refs-2024-0…

488d819

…6-05T11.55.21

sydney-runkle approved these changes Jun 20, 2024

View reviewed changes

sydney-runkle merged commit 6b0e7de into pydantic:main Jun 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

⚡️ Speed up `_get_all_json_refs()` by 34% in `pydantic/json_schema.py` #9650

⚡️ Speed up `_get_all_json_refs()` by 34% in `pydantic/json_schema.py` #9650

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

⚡️ Speed up _get_all_json_refs() by 34% in pydantic/json_schema.py #9650

⚡️ Speed up _get_all_json_refs() by 34% in pydantic/json_schema.py #9650

Uh oh!

Conversation

Change Summary

📄 _get_all_json_refs() in pydantic/json_schema.py

Explanation and details

Improvements.

Correctness verification

🔘 (none found) − ⚙️ Existing Unit Tests

✅ 15 Passed − 🌀 Generated Regression Tests

🔘 (none found) − ⏪ Replay Tests

Checklist

Uh oh!

Uh oh!

CodSpeed Performance Report

Merging #9650 will not alter performance

Summary

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

⚡️ Speed up `_get_all_json_refs()` by 34% in `pydantic/json_schema.py` #9650

⚡️ Speed up `_get_all_json_refs()` by 34% in `pydantic/json_schema.py` #9650

📄 `_get_all_json_refs()` in `pydantic/json_schema.py`