remove SQL on polars/pandas/cudf experiment (#602) · 3ok/datafusion-python@1ff22bb

Commit 1ff22bb

remove SQL on polars/pandas/cudf experiment (apache#602)
1 parent 76ecf56 commit 1ff22bb

8 files changed. Lines changed: 10 additions & 525 deletions

README.md

Lines changed: 10 additions & 21 deletions
@@ -24,7 +24,16 @@
 
 This is a Python library that binds to [Apache Arrow](https://arrow.apache.org/) in-memory query engine [DataFusion](https://github.com/apache/arrow-datafusion).
 
-DataFusion's Python bindings can be used as an end-user tool as well as providing a foundation for building new systems.
+DataFusion's Python bindings can be used as a foundation for building new data systems in Python. Here are some examples:
+
+- [Dask SQL](https://github.com/dask-contrib/dask-sql) uses DataFusion's Python bindings for SQL parsing, query
+planning, and logical plan optimizations, and then transpiles the logical plan to Dask operations for execution.
+- [DataFusion Ballista](https://github.com/apache/arrow-ballista) is a distributed SQL query engine that extends
+DataFusion's Python bindings for distributed use cases.
+
+It is also possible to use these Python bindings directly for DataFrame and SQL operations, but you may find that
+[Polars](http://pola.rs/) and [DuckDB](http://www.duckdb.org/) are more suitable for this use case, since they have
+more of an end-user focus and are more actively maintained than these Python bindings.
 
 ## Features
 
@@ -35,20 +44,6 @@ DataFusion's Python bindings can be used as an end-user tool as well as providin
 - Serialize and deserialize query plans in Substrait format.
 - Experimental support for transpiling SQL queries to DataFrame calls with Polars, Pandas, and cuDF.
 
-## Comparison with other projects
-
-Here is a comparison with similar projects that may help understand when DataFusion might be suitable and unsuitable
-for your needs:
-
-- [DuckDB](http://www.duckdb.org/) is an open source, in-process analytic database. Like DataFusion, it supports
-very fast execution, both from its custom file format and directly from Parquet files. Unlike DataFusion, it is
-written in C/C++ and it is primarily used directly by users as a serverless database and query system rather than
-as a library for building such database systems.
-
-- [Polars](http://pola.rs/) is one of the fastest DataFrame libraries at the time of writing. Like DataFusion, it
-is also written in Rust and uses the Apache Arrow memory model, but unlike DataFusion it does not provide full SQL
-support, nor as many extension points.
-
 ## Example Usage
 
 The following example demonstrates running a SQL query against a Parquet file using DataFusion, storing the results
@@ -143,12 +138,6 @@ See [examples](examples/README.md) for more information.
 
 - [Serialize query plans using Substrait](./examples/substrait.py)
 
-### Executing SQL against DataFrame Libraries (Experimental)
-
-- [Executing SQL on Polars](./examples/sql-on-polars.py)
-- [Executing SQL on Pandas](./examples/sql-on-pandas.py)
-- [Executing SQL on cuDF](./examples/sql-on-cudf.py)
-
 ## How to install (from pip)
 
 ### Pip
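
The Dask SQL entry in the README diff describes the underlying pattern: DataFusion handles SQL parsing, planning, and logical plan optimization, and the logical plan is then transpiled into operations in another library. The experimental Polars/Pandas/cuDF feature removed by this commit transpiled SQL queries into DataFrame calls in a similar spirit. Below is a minimal, stdlib-only sketch of that translation step under stated assumptions. It is not the code deleted here and does not use DataFusion's plan API. The node classes `TableScan`, `Filter`, and `Projection`, the `transpile` function, and the list-of-dicts stand-in for a DataFrame are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Union

# Stand-in for a DataFrame: a list of row dicts. A real translator would emit
# Polars, pandas, cuDF, or Dask calls here instead (assumption for illustration).
Rows = List[Dict[str, Any]]


# Hypothetical logical-plan node types, loosely modelled on the operators a SQL
# planner produces. These are NOT DataFusion's actual plan classes.
@dataclass
class TableScan:
    table_name: str


@dataclass
class Filter:
    child: Plan
    predicate: Callable[[Dict[str, Any]], bool]


@dataclass
class Projection:
    child: Plan
    columns: List[str]


Plan = Union[TableScan, Filter, Projection]


def transpile(plan: Plan, tables: Dict[str, Rows]) -> Rows:
    """Translate a plan bottom-up: evaluate the child first, then this operator."""
    if isinstance(plan, TableScan):
        # Leaf: read the registered table (a real backend might read Parquet here).
        return list(tables[plan.table_name])
    if isinstance(plan, Filter):
        rows = transpile(plan.child, tables)
        return [row for row in rows if plan.predicate(row)]
    if isinstance(plan, Projection):
        rows = transpile(plan.child, tables)
        return [{col: row[col] for col in plan.columns} for row in rows]
    # Fail loudly on operators the translator does not understand.
    raise NotImplementedError(f"unsupported logical operator: {type(plan).__name__}")


# Hand-built plan for: SELECT a FROM t WHERE b > 4
tables = {"t": [{"a": 1, "b": 4}, {"a": 2, "b": 5}, {"a": 3, "b": 6}]}
plan = Projection(Filter(TableScan("t"), lambda row: row["b"] > 4), ["a"])
print(transpile(plan, tables))  # [{'a': 2}, {'a': 3}]
```

The explicit `NotImplementedError` branch reflects the main cost of this approach: every operator and expression type the planner can emit needs its own translation before the target library can execute it.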

datafusion/context.py

Lines changed: 0 additions & 142 deletions
This file was deleted.

datafusion/cudf.py

Lines changed: 0 additions & 97 deletions
This file was deleted.

datafusion/pandas.py

Lines changed: 0 additions & 93 deletions
This file was deleted.
