`feat`: Better Polars Schema support by jqnatividad · Pull Request #2703 · dathere/qsv · GitHub

feat: Better Polars Schema support #2703

Merged (7 commits) on Apr 19, 2025

jqnatividad (Collaborator)
  • schema: add Polars schema generation (resolves #2700, "schema: add a Polars mode to infer a Polars schema")
  • sqlp: major refactoring for better Polars schema support
    • use the pschema.json file when valid and available
    • more robust input processing that auto-retries
      • use the pschema if available
      • if not available, use --infer-len to infer the schema
      • if loading the input fails, try to infer the schema and load the file with the inferred schema
      • if it still fails, scan the whole file to infer the schema and try again
      • only fail if all of the above fail
    • removed the fast-path optimization that automatically used the read_csv table function when there is only one input; it is prone to failure with real-world data because the read_csv table function has a default infer-schema length of only 100 rows
  • joinp: will now use an existing valid Polars schema so long as
    • --cache-schema is not -1 or -2 and
    • --infer-len is left at its default value (10000)
  • pivotp: will now use an existing valid Polars schema unless --infer-len is set to a non-default value (the default is 10000)
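
The retry order described for sqlp, and the joinp condition for reusing a cached schema, can be sketched roughly as follows. This is only an illustration in Python, not qsv's actual Rust implementation: `load_with_fallbacks`, `use_cached_schema`, the `load`/`infer_schema` callables, and the toy data are all hypothetical names invented here, and the mapping of the bullets onto three attempts is one possible reading of the list above.

```python
from typing import Any, Callable, List, Optional, Tuple

DEFAULT_INFER_LEN = 10000  # default --infer-len value quoted in the PR text


def use_cached_schema(cache_schema: int, infer_len: int) -> bool:
    """joinp rule from the PR text: reuse a valid cached schema only when
    --cache-schema is not -1 or -2 and --infer-len is at its default."""
    return cache_schema not in (-1, -2) and infer_len == DEFAULT_INFER_LEN


def load_with_fallbacks(
    load: Callable[[Any], Any],
    infer_schema: Callable[[Optional[int]], Any],
    cached_schema: Any = None,
    infer_len: int = DEFAULT_INFER_LEN,
) -> Tuple[str, Any]:
    """Try each schema source in order; raise only if every attempt fails."""
    attempts: List[Tuple[str, Callable[[], Any]]] = []
    if cached_schema is not None:
        # Prefer the cached pschema when one is available.
        attempts.append(("cached pschema", lambda: cached_schema))
    # Otherwise (or after a failure) infer from the first infer_len rows.
    attempts.append(("sampled inference", lambda: infer_schema(infer_len)))
    # Last resort: infer from the whole file (None means "all rows" here).
    attempts.append(("full-file scan", lambda: infer_schema(None)))

    errors = []
    for name, get_schema in attempts:
        try:
            return name, load(get_schema())
        except ValueError as exc:
            errors.append(f"{name}: {exc}")
    raise ValueError("all attempts failed: " + "; ".join(errors))


# Toy stand-in for a CSV column whose non-numeric value appears late.
ROWS = ["1", "2", "3", "n/a"]


def infer_schema(n_rows: Optional[int]) -> str:
    sample = ROWS if n_rows is None else ROWS[:n_rows]
    return "int" if all(v.isdigit() for v in sample) else "str"


def load(schema: str) -> list:
    # int("n/a") raises ValueError, mimicking a failed typed load.
    return [int(v) for v in ROWS] if schema == "int" else list(ROWS)


strategy, data = load_with_fallbacks(load, infer_schema, infer_len=2)
```

With the toy data, a two-row sample infers an integer column, the typed load fails on the late `n/a` value, and the full-file scan then succeeds. This mirrors the failure mode the PR attributes to short inference windows.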

jqnatividad merged commit db2373b into master on Apr 19, 2025
14 of 15 checks passed
jqnatividad deleted the 2700-schema-polars-mode branch on April 19, 2025, 14:32
