feat: Add `dy.infer_schema` by gab23r · Pull Request #294 · Quantco/dataframely

gab23r · 2026-03-05T09:52:34Z

Fixes: #232

Adds a `dy.infer_schema()` function that generates dataframely schema code from a Polars DataFrame.

- Supports three output modes via the `return_type` parameter:
  - `None` (default): prints the schema to stdout for quick exploration
  - `"string"`: returns the schema code as a string
  - `"schema"`: returns an actual `Schema` class for direct use
- Handles all Polars types, including nested types (List, Array, Struct), with proper inner nullability detection
- Automatically handles invalid Python identifiers and keywords using aliases

This adds the following usage:

>>> import polars as pl
>>> import dataframely as dy
>>> df = pl.DataFrame({
...     "name": ["Alice", "Bob"],
...     "age": [25, 30],
...     "score": [95.5, None],
... })
>>> dy.infer_schema(df, "PersonSchema")
class PersonSchema(dy.Schema):
    name = dy.String()
    age = dy.Int64()
    score = dy.Float64(nullable=True)
>>> schema = dy.infer_schema(df, "PersonSchema", return_type="schema")
>>> schema.is_valid(df)
True

Not supported (potential future enhancements)

- Assess the min/max length of string values to suggest `min_length`/`max_length` constraints
- Suggest `Enum` if a column has fewer than 10-20 distinct string values
- Suggest `Categorical` if there are 50-100 distinct string values in a dataframe with >100k rows
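
As a rough illustration of the first two ideas, here is a minimal sketch in plain Python. The function `suggest_string_constraints`, its return format, and the default threshold of 10 are all assumptions made for illustration. None of this is part of the PR, and it works on a plain list of values rather than a Polars column.

```python
from typing import Optional, Sequence


def suggest_string_constraints(
    values: Sequence[Optional[str]], enum_threshold: int = 10
) -> dict:
    """Suggest constraints for one string column (hypothetical helper).

    `enum_threshold` is an assumed cut-off taken from the low end of the
    "10-20 distinct values" range mentioned above.
    """
    present = [v for v in values if v is not None]
    if not present:
        # Nothing to measure, so make no suggestion at all.
        return {}
    suggestion: dict = {
        "min_length": min(len(v) for v in present),
        "max_length": max(len(v) for v in present),
    }
    distinct = sorted(set(present))
    if len(distinct) < enum_threshold:
        # Few distinct values: an Enum might describe the column better.
        suggestion["enum_values"] = distinct
    return suggestion


result = suggest_string_constraints(["a", "bb", None, "a"])
```

The Categorical heuristic would additionally need the row count of the whole dataframe, which this per-column sketch leaves out.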

codecov · 2026-03-05T09:54:17Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (b3edd6a) to head (7ee32cf).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##              main      #294    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           54        55     +1     
  Lines         3121      3250   +129     
==========================================
+ Hits          3121      3250   +129


Copilot

Pull request overview

This PR adds a new dy.infer_schema() function (addressing Issue #232) that generates dataframely schema code from a Polars DataFrame. The function inspects a DataFrame's column types and null counts to produce schema class definitions with appropriate column types and nullable annotations.

Changes:

- New `dataframely/_generate_schema.py` module implementing `infer_schema()` with three return modes (print to stdout, return as string, or return as an executable `Schema` class), plus supporting helper functions for code generation.
- Public API export of `infer_schema` in `dataframely/__init__.py`.
- New test file `tests/test_infer_schema.py` covering basic types, nullable detection, datetime types, nested types, invalid identifiers, and round-trip validation.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| `dataframely/_generate_schema.py` | New module with `infer_schema()` function and helpers for inferring schema from DataFrame columns, handling type mapping, identifier sanitization, and code generation. |
| `dataframely/__init__.py` | Exports `infer_schema` in the public API (`import` and `__all__`). |
| `tests/test_infer_schema.py` | Tests for string output mode across all supported types and round-trip validation via schema return mode. |


gab23r · 2026-03-10T10:54:28Z

Hello @borchero, is this implementation close to something mergeable?

AndreasAlbertQC

Hey @gab23r, thanks for the PR! I think the core functionality is pretty solid, but I'd like us to tune the API a little and align the structure of the code more closely with what we usually do in this repo. Could you also add an entry to the FAQ docs page?

AndreasAlbertQC · 2026-03-12T08:30:59Z

dataframely/_generate_schema.py

+    df: pl.DataFrame,
+    schema_name: str = ...,
+    *,
+    return_type: None = ...,


I'd prefer not to have the function print to the screen by default. Arguably, this is just syntactic sugar for `print(dy.infer_schema(...))`, right? I think it would be ok to let the user print themselves.
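
To make the suggestion concrete, here is a minimal sketch of a generator that only returns a string and leaves printing to the caller. `render_schema_code` and its `(type name, nullable)` input format are hypothetical stand-ins, not the PR's actual code. The sketch only builds text and does not import dataframely or Polars.

```python
def render_schema_code(
    schema_name: str, columns: dict[str, tuple[str, bool]]
) -> str:
    """Return schema source code as a string; never print.

    Hypothetical helper: `columns` maps a column name to a
    (dataframely column type name, nullable) pair.
    """
    lines = [f"class {schema_name}(dy.Schema):"]
    for name, (type_name, nullable) in columns.items():
        # Only emit `nullable=True` when it is needed, as in the PR example.
        args = "nullable=True" if nullable else ""
        lines.append(f"    {name} = dy.{type_name}({args})")
    return "\n".join(lines)


code = render_schema_code(
    "PersonSchema",
    {
        "name": ("String", False),
        "age": ("Int64", False),
        "score": ("Float64", True),
    },
)
print(code)  # the caller decides whether to print, log, or write to a file
```

With this shape, the default-print mode becomes a one-liner on the caller's side, and the function has a single return type.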

AndreasAlbertQC · 2026-03-12T08:31:36Z

dataframely/_generate_schema.py

+    df: pl.DataFrame,
+    schema_name: str = ...,
+    *,
+    return_type: Literal["schema"],


What's the use case for creating a dynamic Schema object?

AndreasAlbertQC · 2026-03-12T08:36:28Z

dataframely/_generate_schema.py

+    # Ensure it's not empty
+    if not result:
+        result = "_column"


What's the use case for supporting empty column names? Shouldn't the user just set some name before?
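
One way to read this suggestion is to reject empty names outright and only sanitize names that are non-empty. The sketch below uses the standard library's `str.isidentifier()` and `keyword.iskeyword()`. The function name `to_attribute_name` and the exact replacement rules are assumptions for illustration and may differ from what the PR does.

```python
import keyword
import re
from typing import Optional


def to_attribute_name(column_name: str) -> tuple[str, Optional[str]]:
    """Return (attribute name, alias); alias is None if no alias is needed.

    Hypothetical helper: empty names raise instead of being replaced
    with a placeholder such as "_column".
    """
    if not column_name:
        raise ValueError("Column names must be non-empty; rename the column first.")
    if column_name.isidentifier() and not keyword.iskeyword(column_name):
        return column_name, None
    # Replace every character outside [A-Za-z0-9_] with an underscore.
    attr = re.sub(r"\W", "_", column_name, flags=re.ASCII)
    if attr[0].isdigit():
        # Identifiers cannot start with a digit.
        attr = f"_{attr}"
    if keyword.iskeyword(attr):
        # Keywords such as `class` get a trailing underscore.
        attr = f"{attr}_"
    return attr, column_name
```

Collisions between sanitized names (for example `a b` and `a_b`) are not handled in this sketch.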

AndreasAlbertQC · 2026-03-12T08:38:47Z

dataframely/_generate_schema.py

+    if dtype == pl.Null():
+        return f"dy.Any({_format_args(alias=alias)})"


This case is already covered by the fallback.

AndreasAlbertQC · 2026-03-12T08:39:18Z

dataframely/_generate_schema.py

+    dtype = series.dtype
+    nullable = series.null_count() > 0
+
+    # Simple types


This code feels quite repetitive. Can we make it more concise?
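
A common way to cut this kind of repetition is a lookup table with a single fallback, which would also make a dedicated `pl.Null` branch unnecessary. The sketch below is an assumption about what that could look like, not the PR's code: it keys on dtype names as plain strings instead of Polars dtypes, lists only the column types that appear in this PR, and `format_args` is a stand-in for the PR's `_format_args`, whose body is not shown here.

```python
from typing import Optional

# Hypothetical table from a dtype name to a dataframely column type.
_SIMPLE_TYPES = {
    "String": "dy.String",
    "Int64": "dy.Int64",
    "Float64": "dy.Float64",
}


def format_args(**kwargs: object) -> str:
    """Render keyword arguments, skipping those that are None or False."""
    return ", ".join(
        f"{key}={value!r}"
        for key, value in kwargs.items()
        if value is not None and value is not False
    )


def column_code(
    dtype_name: str, nullable: bool, alias: Optional[str] = None
) -> str:
    """Render one column definition from the table, falling back to dy.Any."""
    constructor = _SIMPLE_TYPES.get(dtype_name)
    if constructor is None:
        # Unknown dtypes (including "Null") share one fallback branch.
        return f"dy.Any({format_args(alias=alias)})"
    return f"{constructor}({format_args(nullable=nullable, alias=alias)})"
```

Nested types such as List, Array, and Struct would still need their own handling, since they recurse into inner dtypes.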

AndreasAlbertQC · 2026-03-12T08:40:55Z

tests/test_infer_schema.py

+import dataframely as dy
+
+
+class TestInferSchema:


We tend not to use test classes for organizing tests in this repo. Please check out the other tests and adopt a similar organization pattern.

mvp infer schema

fa3b9fa

Copilot AI review requested due to automatic review settings March 5, 2026 09:52

gab23r requested review from AndreasAlbertQC, borchero and delsner as code owners March 5, 2026 09:52

Copilot started reviewing on behalf of gab23r March 5, 2026 09:53

Copilot AI reviewed Mar 5, 2026

gabriel added 2 commits March 5, 2026 11:20

increase code coverage

6c19bfa

copilot

f0e07fb

gab23r changed the title ~~Feat: Add dy.infer_schema~~ feat: Add dy.infer_schema Mar 5, 2026

github-actions bot added the enhancement New feature or request label Mar 5, 2026

pragma: no cover

7ee32cf

AndreasAlbertQC requested changes Mar 12, 2026

gab23r wants to merge 4 commits into Quantco:main from gab23r:infer-schema


3 participants
