Partial fix for 1078: Enhance DataFrame Formatter Configuration with Memory and Display Controls by kosiew · Pull Request #1119 · apache/datafusion-python
Merged on May 5, 2025: 42 commits into apache:main from the formatter-options branch.


@kosiew (Contributor) commented Apr 28, 2025

Which issue does this PR close?

Partial fix for #1078.

Rationale for this change

This change improves the flexibility and performance of DataFrame rendering in notebooks and other rich display environments.
It introduces fine-grained control over memory usage, the number of rows displayed, and HTML output optimization, making exploration of large datasets more efficient and user-friendly.
It also cleans up the validation logic for formatter settings and makes support for custom style providers more robust.

What changes are included in this PR?

  • Added max_memory_bytes, min_rows_display, and repr_rows parameters to the DataFrame HTML formatter.
  • Updated Python configure_formatter API and documentation to expose new parameters.
  • Improved internal validation for formatter parameters (_validate_positive_int, _validate_bool).
  • Introduced FormatterConfig in Rust to carry display configuration across DataFrame rendering.
  • Updated the Rust function collect_record_batches_to_display to respect the new memory and row limits dynamically.
  • Added tests covering memory limits, row controls, and style provider usage.
  • Updated documentation explaining memory and performance optimizations, including use_shared_styles.
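The parameter validation described above can be illustrated with a small, self-contained Python sketch. It is not the code from this PR: the helper names `_validate_positive_int` and `_validate_bool` come from the description, but their signatures and the error types raised here are assumptions, and `DisplaySettings` is a hypothetical stand-in for the fields that `FormatterConfig` carries. The default values are illustrative only.

```python
from dataclasses import dataclass
from typing import Any


def _validate_positive_int(value: Any, param_name: str) -> None:
    """Raise ValueError unless value is an int greater than zero (assumed behaviour)."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(f"{param_name} must be a positive integer, got {value!r}")


def _validate_bool(value: Any, param_name: str) -> None:
    """Raise TypeError unless value is a real bool (assumed behaviour)."""
    if not isinstance(value, bool):
        raise TypeError(f"{param_name} must be a boolean, got {value!r}")


@dataclass(frozen=True)
class DisplaySettings:
    """Hypothetical mirror of the display limits; not the library's class."""

    max_memory_bytes: int = 2 * 1024 * 1024  # illustrative default: 2 MiB
    min_rows_display: int = 10               # illustrative default
    repr_rows: int = 20                      # illustrative default
    use_shared_styles: bool = True

    def __post_init__(self) -> None:
        # Validate every field once, at construction time.
        _validate_positive_int(self.max_memory_bytes, "max_memory_bytes")
        _validate_positive_int(self.min_rows_display, "min_rows_display")
        _validate_positive_int(self.repr_rows, "repr_rows")
        _validate_bool(self.use_shared_styles, "use_shared_styles")
```

For example, `DisplaySettings(repr_rows=0)` raises `ValueError`, while `DisplaySettings(max_memory_bytes=4 * 1024 * 1024)` is accepted. Rejecting `bool` explicitly matters because `True` passes an `isinstance(value, int)` check.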

Are these changes tested?

✅ Yes, additional tests have been added:

  • Validation of new parameters in test_html_formatter_memory_and_rows.
  • Verification of custom style provider behavior combined with formatter parameters.
  • Edge case testing for extreme values (e.g., very high/low limits).

Are there any user-facing changes?

✅ Yes:

  • Users can now configure how much memory and how many rows are used when displaying DataFrames.
  • Improved error messages for invalid formatter configurations.
  • Better performance when rendering large numbers of DataFrames in Jupyter notebooks or other rich environments.
  • Documentation updated to reflect the new options available.
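The interaction between the memory budget and the row limits can be sketched as a simplified Python model of the behaviour described for the Rust `collect_record_batches_to_display`; it is not a port of that function. Assumptions: batches are represented by a hypothetical `Batch(num_rows, nbytes)` tuple, `repr_rows` is treated as an upper bound on collected rows, `min_rows_display` takes precedence over the memory budget, and the size of a sliced batch is estimated proportionally.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Batch(NamedTuple):
    """Hypothetical stand-in for a record batch: row count plus estimated bytes."""

    num_rows: int
    nbytes: int


def collect_for_display(
    batches: Iterable[Batch],
    max_memory_bytes: int,
    min_rows_display: int,
    repr_rows: int,
) -> Tuple[List[Batch], bool]:
    """Return the batches to render and whether any data was left out."""
    # If the minimum exceeds the cap, the minimum wins.
    row_cap = max(repr_rows, min_rows_display)
    collected: List[Batch] = []
    rows = 0
    size = 0

    for batch in batches:
        # Continue while under both budgets, or while still below the minimum.
        under_budget = size < max_memory_bytes and rows < repr_rows
        if not (under_budget or rows < min_rows_display):
            return collected, True  # stopped early; this batch is not shown
        take = min(batch.num_rows, row_cap - rows)
        if take < batch.num_rows:
            # Slice the batch and scale its size estimate proportionally.
            collected.append(Batch(take, batch.nbytes * take // batch.num_rows))
            return collected, True
        collected.append(batch)
        rows += batch.num_rows
        size += batch.nbytes

    return collected, False  # every batch fit
```

With ten 5-row batches of 500 bytes each, `max_memory_bytes=1_000_000`, `min_rows_display=3` and `repr_rows=12`, the sketch returns 12 rows (the third batch sliced to 2 rows) and flags the output as truncated.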

kosiew and others added 30 commits April 27, 2025 18:58
…pache#1108)

* docs: enhance user guide with detailed DataFrame operations and examples

* move /docs/source/api/dataframe.rst into user-guide

* docs: remove DataFrame API documentation

* docs: fix formatting inconsistencies in DataFrame user guide

* Two minor corrections to documentation rendering

---------

Co-authored-by: Tim Saucer <timsaucer@gmail.com>
kosiew force-pushed the formatter-options branch from 6588c8c to d5a3f1f (April 28, 2025 08:37)
kosiew force-pushed the formatter-options branch from 347282c to a9668c7 (April 28, 2025 14:02)
kosiew force-pushed the formatter-options branch from 583bc9b to 714aa70 (April 28, 2025 14:11)
@timsaucer (Contributor) commented:

Incredible work. Thank you!

timsaucer merged commit 7d8bcd8 into apache:main on May 5, 2025 (17 checks passed).