Add Configurable HTML Table Formatter for DataFusion DataFrames in Py… · chenkovsky/datafusion-python@818975b · GitHub

Commit 818975b

Add Configurable HTML Table Formatter for DataFusion DataFrames in Python (apache#1100)
* feat: add configurable HTML formatter for DataFrames
* fix: update schema iteration in DataFrameHtmlFormatter to use correct format
* refactor: remove unused constant MAX_LENGTH_CELL_WITHOUT_MINIMIZE in PyTableProvider
* refactor: improve HTML rendering structure in DataFrameHtmlFormatter
  - Added List import to typing for type hints.
  - Refactored format_html method to modularize HTML component generation.
  - Created separate methods for building HTML header, table container, header, body, expandable cells, regular cells, and footer for better readability and maintainability.
  - Updated table_uuid generation to use f-string for consistency.
  - Ensured all HTML components are returned as lists for efficient joining.
* doc: enhance docstrings for DataFrameHtmlFormatter methods to clarify usage
* refactor: enhance DataFrameHtmlFormatter with customizable cell and header styles
  - Added methods `get_cell_style()` and `get_header_style()` to allow subclasses to customize the CSS styles for table cells and headers.
  - Updated `_build_table_header()` and `_build_regular_cell()` methods to utilize the new styling methods for improved maintainability.
  - Introduced a registry for custom type formatters in `DataFrameHtmlFormatter` to enable flexible formatting of cell values based on their types.
  - Enhanced `_format_cell_value()` to check for registered formatters before defaulting to string conversion, improving extensibility.
* refactor: enhance DataFrameHtmlFormatter with custom cell and header builders
  - Introduced CellFormatter and StyleProvider protocols for better extensibility.
  - Added DefaultStyleProvider class with default CSS styles for cells and headers.
  - Updated DataFrameHtmlFormatter to support custom cell and header builders.
  - Refactored methods to utilize the new style provider for consistent styling.
  - Improved documentation for methods and classes to clarify usage and customization options.
* doc: expand module docstring for DataFrameHtmlFormatter with usage examples and customization options
* refactor: streamline HTML formatter by removing extensive docstring examples and enhancing cell formatting methods
  - Removed lengthy examples from the docstring of DataFrameHtmlFormatter to improve readability.
  - Added methods for extracting and formatting cell values, enhancing the clarity and maintainability of the code.
  - Updated cell building methods to utilize the new formatting logic, ensuring consistent application of styles and behaviors.
  - Introduced a reset fixture for tests to ensure the formatter is returned to default settings after each test case.
  - Added tests for HTML formatter configuration, custom style providers, type formatters, custom cell builders, and complex customizations to ensure robust functionality.
* refactor: improve cell rendering logic in DataFrameHtmlFormatter by utilizing raw values for custom cell builders and optimizing expandable cell creation
* refactor: enhance HTML representation in DataFrame by integrating latest formatter and improving cell value formatting logic
* refactor: improve HTML formatting logic in DataFrame by separating data collection and schema retrieval for clarity
* refactor: enhance reset_formatter fixture to preserve original formatter configuration during tests
* refactor: add debug utilities for HTML formatter integration testing and enhance debugging output in DataFrameHtmlFormatter
* refactor: implement HTML formatter patch for DataFrame and enhance value retrieval in cell formatting
* fix: correct typo in file extension check for parquet files in test_write_compressed_parquet
* test: add test for DataFrame._repr_html_ to validate HTML output structure
* refactor: remove monkeypatch for DataFrame._repr_html_ and associated logic
* refactor: simplify _repr_html_ method in DataFrame to directly call internal representation
* refactor: remove debug utilities for HTML formatter integration in DataFrame
* refactor: remove debug print statements from DataFrameHtmlFormatter and add HTML formatter integration tests
  - Removed debug print statements from format_html, _build_table_body, and get_formatter methods in DataFrameHtmlFormatter to clean up the code.
  - Introduced a new debug_utils.py file containing a function to check HTML formatter integration.
  - Updated __init__.py to include configure_formatter for easier access.
  - Enhanced DataFrame class to include a docstring for _repr_html_ method.
  - Added comprehensive tests for HTML formatter configuration, custom style providers, type formatters, and cell/header builders in test_dataframe.py.
* refactor: streamline imports and enhance HTML formatter integration in tests
  - Removed redundant import of `configure_formatter` in `__init__.py`.
  - Added `configure_formatter` to `__all__` in `__init__.py` for better module exposure.
  - Cleaned up import statements in `html_formatter.py` for clarity.
  - Consolidated import statements in `test_dataframe.py` for improved readability.
  - Simplified the `reset_formatter` fixture by removing unnecessary imports and comments.
* refactor: remove redundant imports and debug print statements in HTML formatter tests
* refactor: add reset_formatter function to reset global HTML formatter state
  - Implemented reset_formatter to create a new default DataFrame HTML formatter and update the global reference.
  - Added clean_formatter_state fixture in tests to ensure a fresh formatter state for each test case.
  - Updated test cases to use clean_formatter_state instead of the previous reset_formatter implementation.
* refactor: enhance DataFrameHtmlFormatter initialization with parameter validation
* test: add custom cell builder test for HTML formatter with value-based styling
* test: enhance DataFrame HTML representation tests for structure and values
* feat: enhance DataFrameHtmlFormatter with shared styles support and reset functionality
  - Added `use_shared_styles` parameter to control loading of styles/scripts.
  - Implemented logic to conditionally include styles based on `use_shared_styles`.
  - Updated the constructor to validate `use_shared_styles` as a boolean.
  - Introduced `reset_styles_loaded_state` function to reset the styles loaded state.
  - Modified `reset_formatter` to reset the `_styles_loaded` flag.
* refactor: update footer comment in DataFrameHtmlFormatter to clarify content
* test: enhance HTML representation test to accommodate span-wrapped values
* docs: add usage examples to formatter functions in html_formatter.py
* test: add HTML formatter tests for shared styles functionality
* feat: add method to check if styles are loaded and enhance schema validation in DataFrameHtmlFormatter
* refactor: streamline custom cell builder in HTML formatter tests for clarity and maintainability
* fix ruff errors
* chore: update license header in html_formatter.py for compliance
* refactor: improve HTML formatter tests by updating import statements and enhancing regex patterns for body data
* fix clippy errors
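
The commit message describes two extension points: a style provider exposing `get_cell_style()` and `get_header_style()`, and a registry of per-type formatters consulted before falling back to string conversion. The following is a minimal, standalone sketch of that pattern, not the code from this commit. `MiniHtmlFormatter`, `PlainStyleProvider`, the CSS values, and the exact method signatures are assumptions made for illustration; only the names `StyleProvider`, `get_cell_style`, `get_header_style`, and `format_html` come from the commit message.

```python
from html import escape
from typing import Any, Callable, Dict, List, Optional, Protocol, Type


class StyleProvider(Protocol):
    """Anything that can supply CSS for header and body cells."""

    def get_cell_style(self) -> str: ...

    def get_header_style(self) -> str: ...


class PlainStyleProvider:
    """Hypothetical default provider; the CSS values are made up."""

    def get_cell_style(self) -> str:
        return "border: 1px solid black; padding: 8px;"

    def get_header_style(self) -> str:
        return "border: 1px solid black; padding: 8px; font-weight: bold;"


class MiniHtmlFormatter:
    """Hypothetical stand-in for a configurable HTML table formatter."""

    def __init__(self, style_provider: Optional[StyleProvider] = None) -> None:
        self.style_provider = style_provider or PlainStyleProvider()
        # Registry mapping a Python type to a function that renders it.
        self._type_formatters: Dict[Type[Any], Callable[[Any], str]] = {}

    def register_formatter(
        self, type_class: Type[Any], formatter: Callable[[Any], str]
    ) -> None:
        self._type_formatters[type_class] = formatter

    def _format_cell_value(self, value: Any) -> str:
        # Registered formatters win; their output is inserted unescaped.
        for type_class, formatter in self._type_formatters.items():
            if isinstance(value, type_class):
                return formatter(value)
        # Fallback: plain string conversion, HTML-escaped.
        return escape(str(value))

    def format_html(self, columns: List[str], rows: List[List[Any]]) -> str:
        # Collect fragments in a list and join once at the end.
        header_style = self.style_provider.get_header_style()
        cell_style = self.style_provider.get_cell_style()
        parts = ["<table>", "<thead><tr>"]
        for name in columns:
            parts.append(f'<th style="{header_style}">{escape(name)}</th>')
        parts.append("</tr></thead><tbody>")
        for row in rows:
            parts.append("<tr>")
            for value in row:
                cell = self._format_cell_value(value)
                parts.append(f'<td style="{cell_style}">{cell}</td>')
            parts.append("</tr>")
        parts.append("</tbody></table>")
        return "".join(parts)


formatter = MiniHtmlFormatter()
formatter.register_formatter(float, lambda v: f"{v:.2f}")
html_out = formatter.format_html(["name", "score"], [["a<b", 1.5]])
print(html_out)
```

Keeping styles behind a small protocol means a caller can swap the look of the table without subclassing the formatter, while the type registry keeps value rendering independent of layout.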
1 parent 09b929a commit 818975b

4 files changed, +1061 -114 lines changed

python/datafusion/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -45,6 +45,7 @@
     Expr,
     WindowFrame,
 )
+from .html_formatter import configure_formatter
 from .io import read_avro, read_csv, read_json, read_parquet
 from .plan import ExecutionPlan, LogicalPlan
 from .record_batch import RecordBatch, RecordBatchStream
@@ -76,6 +77,7 @@
     "col",
     "column",
     "common",
+    "configure_formatter",
     "expr",
     "functions",
     "lit",
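
The diff above exports `configure_formatter` from the package, and the commit message mentions a global formatter that can be reset and whose constructor validates `use_shared_styles` as a boolean. A rough, self-contained sketch of such a module-level configure/reset pattern might look like the following. It does not import or reproduce the library's code: `FormatterSettings`, the `max_cell_length` field, and its default of 25 are hypothetical, and the real function signatures may differ.

```python
from dataclasses import dataclass


@dataclass
class FormatterSettings:
    """Hypothetical settings object; field names are illustrative only."""

    max_cell_length: int = 25  # assumed option, not taken from the commit
    use_shared_styles: bool = True

    def __post_init__(self) -> None:
        # Validate eagerly so a bad configuration fails at construction time.
        length = self.max_cell_length
        if isinstance(length, bool) or not isinstance(length, int) or length <= 0:
            raise ValueError("max_cell_length must be a positive integer")
        if not isinstance(self.use_shared_styles, bool):
            raise TypeError("use_shared_styles must be a boolean")


# Single module-level instance consulted by all rendering calls.
_default_formatter = FormatterSettings()


def get_formatter() -> FormatterSettings:
    """Return the formatter currently in effect."""
    return _default_formatter


def configure_formatter(**kwargs) -> None:
    """Replace the global formatter with one built from keyword options."""
    global _default_formatter
    # Construction raises before assignment, so a failed call
    # leaves the previous formatter in place.
    _default_formatter = FormatterSettings(**kwargs)


def reset_formatter() -> None:
    """Restore a fresh default formatter, e.g. between test cases."""
    global _default_formatter
    _default_formatter = FormatterSettings()


configure_formatter(max_cell_length=10, use_shared_styles=False)
print(get_formatter())
reset_formatter()
print(get_formatter())
```

Replacing the whole object rather than mutating it in place keeps a failed `configure_formatter` call from leaving a half-updated global, which is also what makes a per-test reset fixture straightforward.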
