Add DataFrame API Documentation for DataFusion Python by kosiew · Pull Request #1132 · apache/datafusion-python · GitHub

Add DataFrame API Documentation for DataFusion Python #1132


Open: kosiew wants to merge 9 commits into base: main

@kosiew (Contributor) commented May 23, 2025

Which issue does this PR close?

Rationale for this change

This PR adds documentation for the DataFrame API in DataFusion's Python library, covering how to create, transform, and render DataFrames, particularly in rich environments such as Jupyter notebooks. It helps users understand and use the DataFrame class, including its HTML rendering customization options.

What changes are included in this PR?

  • Added a new API reference page: docs/source/api/dataframe.rst that covers:

    • DataFrame creation from multiple sources (CSV, Parquet, JSON, Pandas, Arrow, etc.)
    • Common operations (select, filter, join, aggregate, limit, etc.)
    • Terminal operations (collect, to_pandas, show, etc.)
    • Detailed customization for HTML rendering
    • Best practices and advanced usage examples
  • Added a new API index page at docs/source/api/index.rst and updated the Sphinx TOC in index.rst to include it.

  • Updated conf.py to exclude certain module-level docs from auto-generation.

  • Minor formatting fix to a header in functions.rst.

  • Minor indentation correction in dataframe.rst under "Performance Optimization with Shared Styles".

Are these changes tested?

These changes pertain to documentation and do not include functional code. Validation is done via manual Sphinx doc builds. Existing unit tests for the related Python functionality remain unchanged.

Are there any user-facing changes?

Yes, new user-facing documentation is added under the API Reference section, offering detailed guidance and examples for using DataFrame and customizing HTML rendering in notebook environments. This enhances developer experience and usability.

@renato2099

As a user, these docs seem great! Looking forward to having them merged!

@kosiew
Contributor Author
kosiew commented Jun 2, 2025

@timsaucer, this is ready for review.

Contributor
@timsaucer timsaucer left a comment


At a high level, this looks fantastic. I am getting a few errors when I try to build documentation locally. Are you able to run the ./docs/build.sh script without error?


df = ctx.table("your_table")

* From various data sources:
Contributor


It might be nice to cross reference to the io pages.

Contributor Author
@kosiew kosiew Jun 3, 2025


Agree.


Successfully merging this pull request may close these issues.

Add a dataframe.rst for #1100
3 participants