Vortex
An extensible ecosystem for compressed columnar data. It spans in-memory arrays, on-disk file formats, over-the-wire protocols, and integrations with query engines, all built around the latest research from the database community.
Where to start
Get started with Vortex in Python, Rust, or Java. Convert from Parquet, compress your data, and query it.
Integrate Vortex with DataFusion, DuckDB, Spark, Trino, or Ray for accelerated queries over compressed data.
Learn how DTypes, Arrays, Encodings, Layouts, and the Scan API fit together as building blocks.
Write your own encodings, layouts, compute functions, or extension types from Rust or Python.
Build a query engine connector or data source using the Scan API, C FFI, or C++ wrapper.
Explore the crate architecture, async runtime, session system, and integration internals. Build and benchmark locally.
Highlights
Compressed arrays: Operate directly on compressed data with encodings like FastLanes, FSST, and ALP; many operations need no decompression.
Extensible file format: Zero-allocation reads, FlatBuffer metadata for O(1) column access, and optional WASM decompression kernels for forward compatibility.
Query engine integration: Filter and projection pushdown through the Scan API, with native integrations for DataFusion, DuckDB, Spark, Trino, and Ray.
Language bindings: First-class support for Python (PyO3), Java (JNI + Spark/Trino connectors), and C/C++ (FFI).
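To make the "operate directly on compressed data" idea concrete, here is a minimal, self-contained sketch. It does not use the Vortex API, and it implements none of the encodings named above. It assumes a toy frame-of-reference encoding, chosen only for illustration, and the names `FrameOfReference`, `encode`, `decode`, and `greater_than` are hypothetical. The point is that a comparison filter can be rewritten to run on the stored offsets, so the original values are never materialized.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrameOfReference:
    """Hypothetical toy encoding: each value is stored as (value - reference)."""
    reference: int      # minimum of the original values
    offsets: List[int]  # small non-negative deltas from the reference


def encode(values: List[int]) -> FrameOfReference:
    # Assumes a non-empty list; min() raises ValueError on an empty one.
    reference = min(values)
    return FrameOfReference(reference, [v - reference for v in values])


def decode(encoded: FrameOfReference) -> List[int]:
    # Full decompression, shown only so the round trip can be checked.
    return [encoded.reference + o for o in encoded.offsets]


def greater_than(encoded: FrameOfReference, threshold: int) -> List[bool]:
    # Rewrite `value > threshold` as `offset > threshold - reference`,
    # so the comparison runs on the stored offsets without decoding.
    shifted = threshold - encoded.reference
    return [o > shifted for o in encoded.offsets]


values = [1_000_003, 1_000_010, 1_000_001, 1_000_042]
encoded = encode(values)

# Filter evaluated on the compressed form.
mask = greater_than(encoded, 1_000_005)
print(mask)
```

The offsets here fit in a handful of bits instead of a full integer width, which is where the space saving comes from in this toy scheme. A real encoding would also bit-pack the offsets and operate on them in vectorized batches.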