[DISCUSSION] We need a Hero for datafusion-python · Issue #440 · apache/datafusion-python · GitHub
Closed
@alamb


I think this project needs someone to take the helm who wants to build a world-class Python dataframe library and user experience. Below I argue why I think this is a compelling opportunity to build a great piece of technology and have a wide impact across the data analytics space:

What this project could be

I think this project could be one of the most widely used data analysis libraries out there. Imagine a system that offers BOTH a fast dataframe API (à la pola.rs) and first-class SQL support (à la DuckDB), where both are screaming fast (thanks to all the effort that goes into https://github.com/apache/arrow-datafusion), easy to plug into the ecosystem (Arrow / Parquet), and extensible (UDFs, UDAFs, etc.).

DataFusion already posts great benchmark numbers, and I will post the DataFusion 28.0.0 benchmark results when we have them.

How is this different from the mission of DataFusion?

DataFusion is a great project but is currently focused on building the core analytic engine:

DataFusion is a very fast, extensible query engine for building high-quality data-centric systems in Rust, using the Apache Arrow in-memory format.


This repository contains basic Python bindings, but the user experience (UX) could be improved in many ways.

The opportunity

This would be a great opportunity for someone to:

  1. Build some really cool technology
  2. Learn how to help grow an open source project and community, with support and guidance from the rest of the DataFusion community
  3. Learn about analytic database technology, Arrow, etc.
  4. Influence the direction of development in DataFusion
