8000 feat/making global context accessible for users by jsai28 · Pull Request #1060 · apache/datafusion-python · GitHub
[go: up one dir, main page]

Skip to content

feat/making global context accessible for users #1060

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jsai28
Copy link
Contributor
@jsai28 jsai28 commented Mar 12, 2025

Which issue does this PR close?

This addresses #1045 - making global context more accessible.

Rationale for this change

This change allows users to more easily access the global context and register functions with it.

What changes are included in this PR?

The get_global_ctx method is made public and added to the python wrapper code in context.py. A new class method in SessionContext allows users to retrieve the global context. Changes were also made to the read_* functions in io.py to use the global context from the python wrapper.

Are there any user-facing changes?

Yes, a new class method to SessionContext. Documentation will be added upon review.

@jsai28
Copy link
Contributor Author
jsai28 commented Mar 12, 2025

@timsaucer Hopefully I've implemented what you're looking for in #1045. I went ahead and changed the read_* functions in io.py to use the global context from this python wrapper but not sure if that is required.

I think the unit tests could use a little work but hoping to receive your feedback first.

Comment on lines 471 to 472
_global_instance = None

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why move this from the rust code to the python code?

@@ -498,6 +500,18 @@ def __init__(

self.ctx = SessionContextInternal(config, runtime)

@classmethod
def global_ctx(cls) -> "SessionContextInternal":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to return the wrapper SessionContext since end users may be getting this and would want to use the associated methods from the wrapper classes. It would mean updating your methods in read accordingly.

@@ -308,7 +308,7 @@ impl PySessionContext {

#[classmethod]
#[pyo3(signature = ())]
fn _global_ctx(_cls: &Bound<'_, PyType>) -> PyResult<Self> {
fn global_ctx(_cls: &Bound<'_, PyType>) -> PyResult<Self> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you have it here where you moved the single entry over to the python side, this method goes unused. I would recommend you leave this line as is, but up in the python code you call this method instead of creating _global_instance

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your comments, just to summarize whats needed here:

  1. Expose the global context (_global_ctx -> global_ctx), which I've currently done.
  2. A python wrapper should be created for the global context (in the SessionContext class) which calls the above function and wraps it in SessionContext so that users can still use the other associated methods in this class, but with the global context. This should be a class method so that users dont have to instantiate SessionContext first.
  3. The read_* functions (read_parquet, etc) should use the global context from this python wrapper instead of using the one from the internal implementation.

Am I interpreting this correctly? Sorry if I'm overthinking this 😅. I've updated the PR, currently the test_read_csv and test_read_csv_list tests fail so I'm looking into that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this description looks correct.

@timsaucer
Copy link
Contributor

Can you rebase on main to resolve the conflicts? You'll probably have to handle the updated ruff rules in your code.

@jsai28
Copy link
Contributor Author
jsai28 commented Mar 12, 2025

Ah looks like I needed to run ruff format along with ruff check.

Copy link
Contributor
@timsaucer timsaucer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for all of the work on this!

8000
@timsaucer timsaucer merged commit 3dcf7c7 into apache:main Mar 13, 2025
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0