10BC0 ⚡️ Speed up `_get_all_json_refs()` by 34% in `pydantic/json_schema.py` by misrasaurabh1 · Pull Request #9650 · pydantic/pydantic · GitHub
[go: up one dir, main page]

Skip to content

Conversation

@misrasaurabh1
Copy link
Contributor

Change Summary

📄 _get_all_json_refs() in pydantic/json_schema.py

📈 Performance improved by 34% (0.34x faster)

⏱️ Runtime went down from 544 microseconds to 405 microseconds

Explanation and details

Here is the optimized version of the provided Python program. The key optimization here leverages a stack-based approach to avoid the overhead associated with recursive function calls, which can be significant when dealing with deeply nested structures.

Improvements.

  1. Stack-Based Iteration: By using an explicit stack, we avoid the overhead of recursive function calls.
  2. Stack Extension: The extend method is utilized to add multiple items to the stack, reducing the number of loop iterations.

This approach ensures better performance, especially for deeply nested JSON structures, by keeping the processing within the iteration loop instead of the function call stack.

Correctness verification

The new optimized code was tested for correctness. The results are listed below.

🔘 (none found) − ⚙️ Existing Unit Tests

✅ 15 Passed − 🌀 Generated Regression Tests

(click to show generated tests)
# imports
from typing import Any, NewType

import pytest  # used for our unit tests

# function to test
JsonRef = NewType('JsonRef', str)
from pydantic.json_schema import _get_all_json_refs

# unit tests

def test_single_ref_in_flat_dict():
    """Test single $ref in a flat dictionary."""
    input_data = {'$ref': 'reference1'}
    expected_output = {JsonRef('reference1')}
    assert _get_all_json_refs(input_data) == expected_output

def test_multiple_refs_in_flat_dict():
    """Test multiple $refs in a flat dictionary."""
    input_data = {'$ref': 'reference1', 'key': {'$ref': 'reference2'}}
    expected_output = {JsonRef('reference1'), JsonRef('reference2')}
    assert _get_all_json_refs(input_data) == expected_output

def test_nested_dicts_with_refs():
    """Test nested dictionaries with $refs."""
    input_data = {'key1': {'$ref': 'reference1', 'key2': {'$ref': 'reference2'}}}
    expected_output = {JsonRef('reference1'), JsonRef('reference2')}
    assert _get_all_json_refs(input_data) == expected_output

def test_nested_lists_with_refs():
    """Test nested lists with $refs."""
    input_data = {'key1': [{'key2': {'$ref': 'reference1'}}, {'key3': {'$ref': 'reference2'}}]}
    expected_output = {JsonRef('reference1'), JsonRef('reference2')}
    assert _get_all_json_refs(input_data) == expected_output

def test_combination_of_dicts_and_lists():
    """Test combination of dictionaries and lists with $refs."""
    input_data = {'key1': [{'key2': {'$ref': 'reference1'}}, {'key3': {'key4': {'$ref': 'reference2'}}}]}
    expected_output = {JsonRef('reference1'), JsonRef('reference2')}
    assert _get_all_json_refs(input_data) == expected_output

def test_empty_dict():
    """Test empty dictionary."""
    input_data = {}
    expected_output = set()
    assert _get_all_json_refs(input_data) == expected_output

def test_empty_list():
    """Test empty list."""
    input_data = []
    expected_output = set()
    assert _get_all_json_refs(input_data) == expected_output

def test_no_refs_present():
    """Test dictionary with no $refs."""
    input_data = {'key1': 'value1', 'key2': {'key3': 'value3'}}
    expected_output = set()
    assert _get_all_json_refs(input_data) == expected_output

def test_non_string_ref_value():
    """Test dictionary with non-string $ref value."""
    input_data = {'$ref': 123, 'key': {'$ref': 'reference1'}}
    expected_output = {JsonRef('reference1')}
    assert _get_all_json_refs(input_data) == expected_output

def test_deeply_nested_structures():
    """Test deeply nested structures with $refs."""
    input_data = {'key1': {'key2': {'key3': {'key4': {'$ref': 'reference1'}}}}}
    expected_output = {JsonRef('reference1')}
    assert _get_all_json_refs(input_data) == expected_output

def test_large_json_like_structure():
    """Test large JSON-like structure with multiple $refs."""
    input_data = {'key1': [{'key2': {'$ref': 'reference1'}}, {'key3': {'key4': {'$ref': 'reference2'}}}, {'key5': [{'key6': {'$ref': 'reference3'}}, {'key7': {'$ref': 'reference4'}}]}]}
    expected_output = {JsonRef('reference1'), JsonRef('reference2'), JsonRef('reference3'), JsonRef('reference4')}
    assert _get_all_json_refs(input_data) == expected_output

def test_non_dict_non_list_input():
    """Test non-dictionary, non-list input."""
    input_data = 'string'
    expected_output = set()
    assert _get_all_json_refs(input_data) == expected_output

def test_mixed_types_in_list():
    """Test list with mixed types including $refs."""
    input_data = [{'key': {'$ref': 'reference1'}}, 'string', 123, {'$ref': 'reference2'}]
    expected_output = {JsonRef('reference1'), JsonRef('reference2')}
    assert _get_all_json_refs(input_data) == expected_output

def test_large_number_of_nested_references():
    """Test large number of nested references."""
    input_data = {'key': [{'key': [{'key': [{'key': [{'key': [{'key': {'$ref': 'reference1'}}]}]}]}]}]}
    expected_output = {JsonRef('reference1')}
    assert _get_all_json_refs(input_data) == expected_output

def test_large_flat_structure():
    """Test large flat structure with many $refs."""
    input_data = {f'key{i}': {'$ref': f'reference{i}'} for i in range(1000)}
    expected_output = {JsonRef(f'reference{i}') for i in range(1000)}
    assert _get_all_json_refs(input_data) == expected_output

🔘 (none found) − ⏪ Replay Tests

This optimization was discovered automatically using codeflash.ai

Checklist

  • The pull request title is a good summary of the changes - it will be used in the changelog
  • Unit tests for the changes exist
  • Tests pass on CI
  • Documentation reflects the changes where applicable
  • My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

codeflash-ai bot and others added 2 commits June 5, 2024 11:55
Here is the optimized version of the provided Python program. The key optimization here leverages a stack-based approach to avoid the overhead associated with recursive function calls, which can be significant when dealing with deeply nested structures.



### Improvements.
1. **Stack-Based Iteration**: By using an explicit stack, we avoid the overhead of recursive function calls.
2. **Stack Extension**: The `extend` method is utilized to add multiple items to the stack, reducing the number of loop iterations.

This approach ensures better performance, especially for deeply nested JSON structures, by keeping the processing within the iteration loop instead of the function call stack.
@github-actions github-actions bot added the relnotes-fix Used for bugfixes. label Jun 12, 2024
@codspeed-hq
Copy link
codspeed-hq bot commented Jun 12, 2024

CodSpeed Performance Report

Merging #9650 will not alter performance

Comparing misrasaurabh1:codeflash/optimize-_get_all_json_refs-2024-06-05T11.55.21 (488d819) with main (8128b17)

Summary

✅ 13 untouched benchmarks

@sydney-runkle sydney-runkle added relnotes-performance Used for performance improvements. and removed relnotes-fix Used for bugfixes. labels Jun 12, 2024
@sydney-runkle
Copy link
Contributor

This looks great! I'd like to add a test for a complicated json schema generation function + merge that into main to get a benchmark, then see what codspeed thinks with this improvement. I can add that test tomorrow!

@vigneshmanick
Copy link

This and the other 2 PRs by the OP look more like an advertisement for this code scanning tool with which the changes were generated . The title, PR description and code change all look auto generated.

@misrasaurabh1
Copy link
Contributor Author

Hi @vigneshmanick, yes this optimization was auto generated and validated with Codeflash, but before opening this PR I also manually vetted it and I am in touch with the Pydantic team to confirm if these optimizations are good.

@vigneshmanick
Copy link
  1. The benchmarks haven't changed for this PR (and the others too)
  2. There have been no new benchmarks added for this or the other PRs
  3. No new tests have been added for the changes created
  4. These PRs don't close any open issues from pydantic or pydantic core. (atleast don't see any linked ones)

Copy link
Contributor
@sydney-runkle sydney-runkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty awesome. Super excited about this kind of capability @misrasaurabh1.

I'm ok with merging this as is, given the pretty exhaustive test suite that was generated + attached to this PR.

@sydney-runkle sydney-runkle merged commit 6b0e7de into pydantic:main Jun 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

relnotes-performance Used for performance improvements.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

0