[DISCUSSION] We need a Hero for datafusion-python

What this project could be

I think this project needs someone who wants to build a world-class Python dataframe library and user experience to take the helm. Here is why I think this is a compelling opportunity to build a great piece of technology and have a wide impact across the data analytics space:

I think this project could be one of the most widely used data analysis libraries out there. Imagine a system that offers BOTH a fast dataframe API (à la pola.rs) AND first-class SQL support (à la DuckDB), both screaming fast thanks to all the effort that goes into https://github.com/apache/arrow-datafusion, while also being easy to plug into the ecosystem (Arrow / Parquet) and extensible (UDFs, UDAFs, etc.).

DataFusion already posts great benchmark numbers, and I will post the DataFusion 28.0.0 benchmark results when we have them.

How is this different than the mission of DataFusion?

DataFusion is a great project but is currently focused on building the core analytic engine:

DataFusion is a very fast, extensible query engine for building high-quality data-centric systems in Rust, using the Apache Arrow in-memory format.

This repository contains basic Python bindings, but the user experience (UX) could be improved in many ways.

The opportunity

This would be a great opportunity for someone to:

- Build some really cool technology
- Learn how to help grow an open source project and community, with help and guidance from the rest of the DataFusion community
- Learn about analytic database technology, Arrow, etc.
- Influence the direction of development in DataFusion
