Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage diff:

| | main | #294 | +/- |
|---|---|---|---|
| Coverage | 100.00% | 100.00% | |
| Files | 54 | 55 | +1 |
| Lines | 3121 | 3250 | +129 |
| Hits | 3121 | 3250 | +129 |
Pull request overview
This PR adds a new dy.infer_schema() function (addressing Issue #232) that generates dataframely schema code from a Polars DataFrame. The function inspects a DataFrame's column types and null counts to produce schema class definitions with appropriate column types and nullable annotations.
Changes:
- New dataframely/_generate_schema.py module implementing infer_schema() with three return modes (print to stdout, return as a string, or return an executable Schema class), plus supporting helper functions for code generation.
- Public API export of infer_schema in dataframely/__init__.py.
- New test file tests/test_infer_schema.py covering basic types, nullable detection, datetime types, nested types, invalid identifiers, and round-trip validation.
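The core idea (one column declaration per DataFrame column, nullable only when nulls were observed) can be sketched without Polars or dataframely. This is a minimal sketch, not the PR's implementation: `ColumnInfo` and `render_schema` are hypothetical, column names are assumed to already be valid identifiers, and each column's dataframely type name is assumed to be resolved already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnInfo:
    """Hypothetical per-column summary gathered from a DataFrame."""

    name: str         # assumed to already be a valid Python identifier
    column_type: str  # pre-resolved dataframely type name, e.g. "Int64"
    null_count: int   # e.g. the result of series.null_count()


def render_schema(columns: list[ColumnInfo], schema_name: str = "Schema") -> str:
    """Return the source code of a dataframely schema class as a string."""
    lines = [f"class {schema_name}(dy.Schema):"]
    for col in columns:
        # Same rule as the PR: nullable only if at least one null was observed.
        nullable = col.null_count > 0
        lines.append(f"    {col.name} = dy.{col.column_type}(nullable={nullable})")
    if not columns:
        lines.append("    pass")  # keep the generated class syntactically valid
    return "\n".join(lines) + "\n"
```

For example, `render_schema([ColumnInfo("id", "Int64", 0), ColumnInfo("name", "String", 3)], "Users")` yields a `Users` class with a non-nullable `id` and a nullable `name`. Returning plain source text keeps the function free of side effects.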
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| dataframely/_generate_schema.py | New module with the infer_schema() function and helpers for inferring a schema from DataFrame columns, handling type mapping, identifier sanitization, and code generation. |
| dataframely/__init__.py | Exports infer_schema in the public API (import and __all__). |
| tests/test_infer_schema.py | Tests for the string output mode across all supported types and round-trip validation via the schema return mode. |
Hello @borchero, is this implementation close to something mergeable?
Hey @gab23r, thanks for the PR! I think the core functionality is pretty solid, but I'd like us to tune the API a little and align the structure of the code more closely with what we usually do in this repo. Could you also add an entry to the FAQ docs page?
df: pl.DataFrame,
schema_name: str = ...,
*,
return_type: None = ...,
I'd prefer not to have the function print to the screen by default. Arguably, this is just syntactic sugar for print(dy.infer_schema(...)), right? I think it would be OK to let users print the result themselves.
df: pl.DataFrame,
schema_name: str = ...,
*,
return_type: Literal["schema"],
What's the use case for creating a dynamic Schema object?
# Ensure it's not empty
if not result:
    result = "_column"
What's the use case for supporting empty column names? Shouldn't the user just set a name beforehand?
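One option, in the spirit of this question, is to reject empty names outright instead of substituting `_column`. The sketch below assumes that non-identifier names are kept as an `alias` (as the PR's `_format_args(alias=alias)` call suggests). `to_identifier` is hypothetical, it conservatively replaces every non-ASCII-word character, and collisions between sanitized names (e.g. "a b" and "a_b") are not handled.

```python
import keyword
import re
from typing import Optional


def to_identifier(column_name: str) -> tuple[str, Optional[str]]:
    """Map a column name to (attribute name, alias); alias is None if unchanged."""
    if not column_name:
        # Assumption: ask the user to rename the column rather than invent a name.
        raise ValueError("column names must be non-empty; rename the column first")
    # Replace every character outside [0-9A-Za-z_] with an underscore.
    ident = re.sub(r"[^0-9A-Za-z_]", "_", column_name)
    if ident[0].isdigit():
        ident = f"_{ident}"  # identifiers cannot start with a digit
    if keyword.iskeyword(ident):
        ident = f"{ident}_"  # e.g. "class" -> "class_"
    alias = None if ident == column_name else column_name
    return ident, alias
```

A column named "unit price" would become the attribute `unit_price` with alias "unit price", while "price" passes through unchanged with no alias.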
if dtype == pl.Null():
    return f"dy.Any({_format_args(alias=alias)})"
This case is already covered by the fallback.
dtype = series.dtype
nullable = series.null_count() > 0

# Simple types
This code feels quite repetitive; can we make it more concise?
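One way to make the simple-type handling more concise, sketched under the assumption that most simple dataframely column types share their name with the corresponding Polars dtype (Int8 through UInt64, Float32/Float64, String, Date, Time) and that Boolean maps to Bool: replace per-type branches with a set lookup plus a small rename table, and let everything else fall back to Any, which also absorbs an explicit Null branch. `column_type_name` and both tables are hypothetical, and the type list should be checked against the installed dataframely version.

```python
# Polars dtype names assumed to have a dataframely column type of the same name.
_SAME_NAME = frozenset(
    {
        "Int8", "Int16", "Int32", "Int64",
        "UInt8", "UInt16", "UInt32", "UInt64",
        "Float32", "Float64",
        "String", "Date", "Time",
    }
)

# Dtypes whose dataframely name differs from the Polars name.
_RENAMED = {"Boolean": "Bool"}


def column_type_name(dtype_name: str) -> str:
    """Resolve a simple (non-parametrized) Polars dtype name to a dataframely type name."""
    if dtype_name in _SAME_NAME:
        return dtype_name
    # Anything unmapped, including "Null", falls through to the generic Any column.
    return _RENAMED.get(dtype_name, "Any")
```

Parametrized dtypes such as Datetime or List still need dedicated handling for their arguments; the table only removes the boilerplate for the simple cases.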
import dataframely as dy


class TestInferSchema:
We tend not to use test classes for organizing tests in this repo. Please check out the other tests and adopt a similar organization pattern.
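A minimal sketch of the flat layout this comment points to, assuming the repo's other test modules use plain module-level functions: pytest collects `test_`-prefixed functions directly, so no `TestInferSchema` class is needed. `infer_nullable` is a hypothetical function under test standing in for the PR's nullability rule.

```python
def infer_nullable(null_count: int) -> bool:
    """Hypothetical function under test: nullable iff any nulls were observed."""
    return null_count > 0


# Module-level test functions, collected by pytest through their test_ prefix.
def test_column_without_nulls_is_not_nullable() -> None:
    assert infer_nullable(0) is False


def test_column_with_nulls_is_nullable() -> None:
    assert infer_nullable(3) is True
```

Related cases can also be folded into a single function with `pytest.mark.parametrize` instead of being grouped under a class.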
Fixes: #232
This adds the dy.infer_schema() function to generate dataframely schema code from a Polars DataFrame, with a return_type parameter:
- None (default): prints the schema to stdout for quick exploration
- "string": returns the schema code as a string
- "schema": returns an actual Schema class for direct use
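The return_type switch pairs naturally with `typing.overload`, which the quoted signatures in the review also use: each `Literal` value gets its own overload, so a type checker knows that "string" yields `str` and "schema" yields a class. The sketch below is hypothetical: `generate` is not the PR's function, the stdout mode is left out, and the "schema" branch builds a plain class with `type()` as a stand-in for a real dataframely Schema subclass.

```python
from typing import Literal, Union, overload


@overload
def generate(columns: dict[str, str], *, return_type: Literal["string"]) -> str: ...


@overload
def generate(columns: dict[str, str], *, return_type: Literal["schema"]) -> type: ...


def generate(
    columns: dict[str, str], *, return_type: Literal["string", "schema"]
) -> Union[str, type]:
    """Hypothetical generator whose static return type follows the literal argument."""
    if return_type == "string":
        body = [f"    {name} = dy.{type_name}()" for name, type_name in columns.items()]
        return "\n".join(["class Schema(dy.Schema):", *body]) + "\n"
    if return_type == "schema":
        # Stand-in for a real Schema subclass: a plain class recording the type names.
        return type("Schema", (), dict(columns))
    raise ValueError(f"unknown return_type: {return_type!r}")
```

With these overloads, `generate({"id": "Int64"}, return_type="string")` is typed as `str` without any cast at the call site.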
Not supported (potential future enhancements)