Partial fix for 1078: Enhance DataFrame Formatter Configuration with Memory and Display Controls by kosiew · Pull Request #1119 · apache/datafusion-python · GitHub
Merged on May 5, 2025 (42 commits)

Commits
f9b78fa
feat: add configurable max table bytes and min table rows for DataFra…
kosiew Apr 27, 2025
4d8fa38
Revert "feat: add configurable max table bytes and min table rows for…
kosiew Apr 27, 2025
a9178fe
feat: add FormatterConfig for configurable DataFrame display options
kosiew Apr 27, 2025
d0209cf
refactor: simplify attribute extraction in get_formatter_config function
kosiew Apr 27, 2025
2ef013f
refactor: remove hardcoded constants and use FormatterConfig for disp…
kosiew Apr 27, 2025
bea52a3
refactor: simplify record batch collection by using FormatterConfig f…
kosiew Apr 27, 2025
ce15f1d
feat: add max_memory_bytes, min_rows_display, and repr_rows parameter…
kosiew Apr 27, 2025
e089d7b
feat: add tests for HTML formatter row display settings and memory limit
kosiew Apr 27, 2025
a6792c9
refactor: extract Python formatter retrieval into a separate function
kosiew Apr 27, 2025
af678b5
Revert "feat: add tests for HTML formatter row display settings and m…
kosiew Apr 27, 2025
4090fd2
feat: add tests for HTML formatter row and memory limit configurations
kosiew Apr 27, 2025
174374a
Revert "feat: add tests for HTML formatter row and memory limit confi…
kosiew Apr 27, 2025
c10d7eb
feat: add tests for new parameters and validation in DataFrameHtmlFor…
kosiew Apr 28, 2025
620d25e
Reorganize tests
kosiew Apr 28, 2025
82ddd28
refactor: rename and restructure formatter functions for clarity and …
kosiew Apr 28, 2025
302f785
feat: implement PythonFormatter struct and refactor formatter retriev…
kosiew Apr 28, 2025
2936455
refactor: improve comments and restructure FormatterConfig usage in P…
kosiew Apr 28, 2025
877226a
Add DataFrame usage guide with HTML rendering customization options (…
kosiew Apr 27, 2025
ccf2549
Update documentation
kosiew Apr 28, 2025
194ed46
refactor: streamline HTML rendering documentation
kosiew Apr 28, 2025
8667128
refactor: extract validation logic into separate functions for clarity
kosiew Apr 28, 2025
15435bf
Implement feature X to enhance user experience and optimize performance
kosiew Apr 28, 2025
412ab3a
feat: add validation method for FormatterConfig to ensure positive in…
kosiew Apr 28, 2025
9a1f59f
add comment - ensure minimum rows are collected even if memory or row…
kosiew Apr 28, 2025
5e3dd06
Merge branch 'main' into formatter-options
kosiew Apr 28, 2025
bfb2b12
Update html_formatter documentation
kosiew Apr 28, 2025
1fdc2c2
update tests
kosiew Apr 28, 2025
4da231b
remove unused type hints from imports in html_formatter.py
kosiew Apr 28, 2025
43a3183
remove redundant tests for DataFrameHtmlFormatter and clean up assert…
kosiew Apr 28, 2025
b52166b
refactor get_attr function to support generic default values
kosiew Apr 28, 2025
9f68c83
build_formatter_config_from_python return PyResult
kosiew Apr 28, 2025
2c0565a
fix ruff errors
kosiew Apr 28, 2025
946dcdc
trigger ci
kosiew Apr 28, 2025
b5ab123
fix: remove redundant newline in test_custom_style_provider_html_form…
kosiew Apr 28, 2025
d5a3f1f
add more tests
kosiew Apr 28, 2025
6cf0689
trigger ci
kosiew Apr 28, 2025
a9668c7
Merge branch 'main' into formatter-options
kosiew Apr 28, 2025
714aa70
Fix ruff errors
kosiew Apr 28, 2025
001988e
fix clippy error
kosiew Apr 29, 2025
2314613
feat: add validation for parameters in configure_formatter
kosiew Apr 29, 2025
f5bec5b
test: add tests for invalid parameters in configure_formatter
kosiew Apr 29, 2025
2da8da9
Fix ruff errors
kosiew Apr 29, 2025
52 changes: 45 additions & 7 deletions docs/source/user-guide/dataframe.rst
@@ -75,13 +75,17 @@ You can customize how DataFrames are rendered in HTML by configuring the formatt

     # Change the default styling
     configure_formatter(
-        max_rows=50,  # Maximum number of rows to display
-        max_width=None,  # Maximum width in pixels (None for auto)
-        theme="light",  # Theme: "light" or "dark"
-        precision=2,  # Floating point precision
-        thousands_separator=",",  # Separator for thousands
-        date_format="%Y-%m-%d",  # Date format
-        truncate_width=20  # Max width for string columns before truncating
+        max_cell_length=25,  # Maximum characters in a cell before truncation
+        max_width=1000,  # Maximum width in pixels
+        max_height=300,  # Maximum height in pixels
+        max_memory_bytes=2097152,  # Maximum memory for rendering (2MB)
+        min_rows_display=20,  # Minimum number of rows to display
+        repr_rows=10,  # Number of rows to display in __repr__
+        enable_cell_expansion=True,  # Allow expanding truncated cells
+        custom_css=None,  # Additional custom CSS
+        show_truncation_message=True,  # Show message when data is truncated
+        style_provider=None,  # Custom styling provider
+        use_shared_styles=True  # Share styles across tables
     )

 The formatter settings affect all DataFrames displayed after configuration.
@@ -113,6 +117,25 @@ For advanced styling needs, you can create a custom style provider:
     # Apply the custom style provider
     configure_formatter(style_provider=MyStyleProvider())

+Performance Optimization with Shared Styles
+-------------------------------------------
+The ``use_shared_styles`` parameter (enabled by default) optimizes performance when displaying
+multiple DataFrames in notebook environments:
+
+.. code-block:: python
+    from datafusion.html_formatter import StyleProvider, configure_formatter
+    # Default: Use shared styles (recommended for notebooks)
+    configure_formatter(use_shared_styles=True)
+
+    # Disable shared styles (each DataFrame includes its own styles)
+    configure_formatter(use_shared_styles=False)
+
+When ``use_shared_styles=True``:
+- CSS styles and JavaScript are included only once per notebook session
+- This reduces HTML output size and prevents style duplication
+- Improves rendering performance with many DataFrames
+- Applies consistent styling across all DataFrames
+
 Creating a Custom Formatter
 ---------------------------

@@ -177,3 +200,18 @@ You can also use a context manager to temporarily change formatting settings:

     # Back to default formatting
     df.show()
+
+Memory and Display Controls
+---------------------------
+
+You can control how much data is displayed and how much memory is used for rendering:
+
+.. code-block:: python
+
+    configure_formatter(
+        max_memory_bytes=4 * 1024 * 1024,  # 4MB maximum memory for display
+        min_rows_display=50,  # Always show at least 50 rows
+        repr_rows=20  # Show 20 rows in __repr__ output
+    )
+
+These parameters help balance comprehensive data display against performance considerations.
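
The Memory and Display Controls section above does not show how the limits interact, and the collection logic itself is not part of the Python diff. The sketch below is one plausible reading of the commit note about ensuring that minimum rows are collected even when a limit is reached: keep collecting batches until the byte budget is spent, but never stop before `min_rows_display` rows are available. The function `collect_for_display` and the `(row_count, size_in_bytes)` batch tuples are hypothetical stand-ins, not DataFusion APIs.

```python
from typing import Iterable, List, Tuple

# Hypothetical stand-in for a record batch: (row count, approximate size in bytes).
Batch = Tuple[int, int]


def collect_for_display(
    batches: Iterable[Batch],
    max_memory_bytes: int = 2 * 1024 * 1024,  # 2 MB, the default shown in the docs
    min_rows_display: int = 20,
) -> List[Batch]:
    """Collect batches within a byte budget, but keep going until
    at least min_rows_display rows have been gathered."""
    collected: List[Batch] = []
    rows = 0
    used = 0
    for num_rows, size in batches:
        over_budget = used + size > max_memory_bytes
        # Stop only when the budget is exhausted AND the row minimum is met.
        if over_budget and rows >= min_rows_display:
            break
        collected.append((num_rows, size))
        rows += num_rows
        used += size
    return collected
```

With five batches of 10 rows and 1,000,000 bytes each, the defaults stop after two batches (20 rows, under the 2 MB budget). Raising `min_rows_display` to 50 makes the sketch exceed the budget so that all 50 rows are collected.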
104 changes: 84 additions & 20 deletions python/datafusion/html_formatter.py
@@ -27,6 +27,36 @@
 )


+def _validate_positive_int(value: Any, param_name: str) -> None:
+    """Validate that a parameter is a positive integer.
+
+    Args:
+        value: The value to validate
+        param_name: Name of the parameter (used in error message)
+
+    Raises:
+        ValueError: If the value is not a positive integer
+    """
+    if not isinstance(value, int) or value <= 0:
+        msg = f"{param_name} must be a positive integer"
+        raise ValueError(msg)
+
+
+def _validate_bool(value: Any, param_name: str) -> None:
+    """Validate that a parameter is a boolean.
+
+    Args:
+        value: The value to validate
+        param_name: Name of the parameter (used in error message)
+
+    Raises:
+        TypeError: If the value is not a boolean
+    """
+    if not isinstance(value, bool):
+        msg = f"{param_name} must be a boolean"
+        raise TypeError(msg)
+
+
 @runtime_checkable
 class CellFormatter(Protocol):
     """Protocol for cell value formatters."""
@@ -91,6 +121,9 @@ class DataFrameHtmlFormatter:
         max_cell_length: Maximum characters to display in a cell before truncation
         max_width: Maximum width of the HTML table in pixels
         max_height: Maximum height of the HTML table in pixels
+        max_memory_bytes: Maximum memory in bytes for rendered data (default: 2MB)
+        min_rows_display: Minimum number of rows to display
+        repr_rows: Default number of rows to display in repr output
         enable_cell_expansion: Whether to add expand/collapse buttons for long cell
             values
         custom_css: Additional CSS to include in the HTML output
@@ -108,6 +141,9 @@ def __init__(
         max_cell_length: int = 25,
         max_width: int = 1000,
         max_height: int = 300,
+        max_memory_bytes: int = 2 * 1024 * 1024,  # 2 MB
+        min_rows_display: int = 20,
+        repr_rows: int = 10,
         enable_cell_expansion: bool = True,
         custom_css: Optional[str] = None,
         show_truncation_message: bool = True,
@@ -124,6 +160,12 @@ def __init__(
             Maximum width of the displayed table in pixels.
         max_height : int, default 300
             Maximum height of the displayed table in pixels.
+        max_memory_bytes : int, default 2097152 (2MB)
+            Maximum memory in bytes for rendered data.
+        min_rows_display : int, default 20
+            Minimum number of rows to display.
+        repr_rows : int, default 10
+            Default number of rows to display in repr output.
         enable_cell_expansion : bool, default True
             Whether to allow cells to expand when clicked.
         custom_css : str, optional
@@ -139,7 +181,8 @@ def __init__(
         Raises:
         ------
         ValueError
-            If max_cell_length, max_width, or max_height is not a positive integer.
+            If max_cell_length, max_width, max_height, max_memory_bytes,
+            min_rows_display, or repr_rows is not a positive integer.
         TypeError
             If enable_cell_expansion, show_truncation_message, or use_shared_styles is
             not a boolean,
@@ -148,27 +191,17 @@ def __init__(
             protocol.
         """
         # Validate numeric parameters
-
-        if not isinstance(max_cell_length, int) or max_cell_length <= 0:
-            msg = "max_cell_length must be a positive integer"
-            raise ValueError(msg)
-        if not isinstance(max_width, int) or max_width <= 0:
-            msg = "max_width must be a positive integer"
-            raise ValueError(msg)
-        if not isinstance(max_height, int) or max_height <= 0:
-            msg = "max_height must be a positive integer"
-            raise ValueError(msg)
+        _validate_positive_int(max_cell_length, "max_cell_length")
+        _validate_positive_int(max_width, "max_width")
+        _validate_positive_int(max_height, "max_height")
+        _validate_positive_int(max_memory_bytes, "max_memory_bytes")
+        _validate_positive_int(min_rows_display, "min_rows_display")
+        _validate_positive_int(repr_rows, "repr_rows")

         # Validate boolean parameters
-        if not isinstance(enable_cell_expansion, bool):
-            msg = "enable_cell_expansion must be a boolean"
-            raise TypeError(msg)
-        if not isinstance(show_truncation_message, bool):
-            msg = "show_truncation_message must be a boolean"
-            raise TypeError(msg)
-        if not isinstance(use_shared_styles, bool):
-            msg = "use_shared_styles must be a boolean"
-            raise TypeError(msg)
+        _validate_bool(enable_cell_expansion, "enable_cell_expansion")
+        _validate_bool(show_truncation_message, "show_truncation_message")
+        _validate_bool(use_shared_styles, "use_shared_styles")

         # Validate custom_css
         if custom_css is not None and not isinstance(custom_css, str):
@@ -183,6 +216,9 @@ def __init__(
         self.max_cell_length = max_cell_length
         self.max_width = max_width
         self.max_height = max_height
+        self.max_memory_bytes = max_memory_bytes
+        self.min_rows_display = min_rows_display
+        self.repr_rows = repr_rows
         self.enable_cell_expansion = enable_cell_expansion
         self.custom_css = custom_css
         self.show_truncation_message = show_truncation_message
@@ -597,6 +633,9 @@ def configure_formatter(**kwargs: Any) -> None:
         **kwargs: Formatter configuration parameters like max_cell_length,
             max_width, max_height, enable_cell_expansion, etc.

+    Raises:
+        ValueError: If any invalid parameters are provided
+
     Example:
         >>> from datafusion.html_formatter import configure_formatter
         >>> configure_formatter(
@@ -606,6 +645,31 @@ def configure_formatter(**kwargs: Any) -> None:
         ...     use_shared_styles=True
         ... )
     """
+    # Valid parameters accepted by DataFrameHtmlFormatter
+    valid_params = {
+        "max_cell_length",
+        "max_width",
+        "max_height",
+        "max_memory_bytes",
+        "min_rows_display",
+        "repr_rows",
+        "enable_cell_expansion",
+        "custom_css",
+        "show_truncation_message",
+        "style_provider",
+        "use_shared_styles",
+    }
+
+    # Check for invalid parameters
+    invalid_params = set(kwargs) - valid_params
+    if invalid_params:
+        msg = (
+            f"Invalid formatter parameters: {', '.join(invalid_params)}. "
+            f"Valid parameters are: {', '.join(valid_params)}"
+        )
+        raise ValueError(msg)
+
+    # Create and set formatter with validated parameters
     set_formatter(DataFrameHtmlFormatter(**kwargs))
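
Two details of the validation in this file are easy to miss, and the standalone sketch below illustrates them. First, `bool` is a subclass of `int` in Python, so an `isinstance(value, int)` check like the one in `_validate_positive_int` accepts `True`. Second, joining a set yields names in arbitrary order, so the error message may list parameters differently between runs. The names `validate_strict_positive_int`, `check_kwargs`, and `VALID_PARAMS` are hypothetical and not part of this PR; the stricter check and the sorted output are suggestions, not what the merged code does.

```python
from typing import Any

# Subset of the accepted names, for illustration only.
VALID_PARAMS = {"max_cell_length", "max_memory_bytes", "min_rows_display", "repr_rows"}


def validate_positive_int(value: Any, param_name: str) -> None:
    # Same condition as the helper in the diff: True passes because bool is an int.
    if not isinstance(value, int) or value <= 0:
        raise ValueError(f"{param_name} must be a positive integer")


def validate_strict_positive_int(value: Any, param_name: str) -> None:
    # Hypothetical stricter variant: reject bool before the int check.
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(f"{param_name} must be a positive integer")


def check_kwargs(**kwargs: Any) -> None:
    # Sorting makes the message deterministic regardless of set ordering.
    invalid = sorted(set(kwargs) - VALID_PARAMS)
    if invalid:
        raise ValueError(f"Invalid formatter parameters: {', '.join(invalid)}")
```

Calling `check_kwargs(max_rows=50, theme="light")` raises a `ValueError` naming both unknown parameters, which is also what the new check in `configure_formatter` would do with the keyword arguments removed from the documentation example.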

