Vortex

Vortex is an extensible ecosystem for compressed columnar data. It spans in-memory arrays, on-disk file formats, over-the-wire protocols, and query-engine integrations, all built around the latest research from the database community.

Where to start

Read & write Vortex files

Get started with Vortex in Python, Rust, or Java. Convert from Parquet, compress your data, and query it.

Quickstart

Use with a query engine

Integrate Vortex with DataFusion, DuckDB, Spark, Trino, or Ray for accelerated queries over compressed data.

User Guides

Understand the architecture

Learn how DTypes, Arrays, Encodings, Layouts, and the Scan API fit together as building blocks.

Concepts
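One way to picture how these building blocks relate is to separate a logical type (what the values mean) from a physical encoding (how they are stored). The sketch below assumes that split. `DType`, `PlainArray`, `RunEndArray`, and `to_list` are hypothetical names for illustration, not Vortex's actual Rust or Python API. It shows two encodings of the same logical `i64` column that expose identical values through a shared interface.

```python
import bisect
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class DType:
    """Hypothetical logical type: what the values mean, independent of storage."""
    name: str
    nullable: bool = False


@dataclass
class PlainArray:
    """Hypothetical physical encoding 1: one stored value per row."""
    dtype: DType
    values: List[int]

    def __len__(self) -> int:
        return len(self.values)

    def scalar_at(self, i: int) -> int:
        return self.values[i]


@dataclass
class RunEndArray:
    """Hypothetical physical encoding 2: run values plus the exclusive end row of each run."""
    dtype: DType
    ends: List[int]    # strictly increasing; last entry equals the logical length
    values: List[int]  # one value per run

    def __len__(self) -> int:
        return self.ends[-1] if self.ends else 0

    def scalar_at(self, i: int) -> int:
        if not 0 <= i < len(self):
            raise IndexError(i)
        # The first run whose end is greater than i contains row i.
        return self.values[bisect.bisect_right(self.ends, i)]


Array = Union[PlainArray, RunEndArray]


def to_list(arr: Array) -> List[int]:
    """Materialize logical values through the shared interface, whatever the encoding."""
    return [arr.scalar_at(i) for i in range(len(arr))]


i64 = DType("i64")
plain = PlainArray(i64, [7, 7, 7, 2, 2, 9])
runs = RunEndArray(i64, ends=[3, 5, 6], values=[7, 2, 9])
```

Both arrays share the same `DType`, so code written against the logical type does not need to know which encoding it received. Only encoding-aware code (for example, a compute kernel specialized for run-end data) needs to look at the physical representation.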

Extend Vortex

Write your own encodings, layouts, compute functions, or extension types from Rust or Python.

Developer Guide

Create an engine integration

Build a query engine connector or data source using the Scan API, C FFI, or C++ wrapper.

Developer Guide
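A connector benefits from pushdown because the data source, rather than the engine, skips data the query cannot use. The sketch below illustrates that general idea. It uses per-chunk min/max statistics to prune whole chunks for a range filter, then reads only the projected columns at matching rows. `Chunk` and `scan` are hypothetical names, and this is not the Vortex Scan API or its C FFI. It is just a minimal, self-contained model of filter and projection pushdown.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Chunk:
    """Hypothetical horizontal slice of a table with per-column min/max statistics."""
    columns: Dict[str, List[int]]
    stats: Dict[str, Tuple[int, int]] = field(init=False)

    def __post_init__(self) -> None:
        # Computed once (e.g. at write time) so a reader can prune without scanning values.
        self.stats = {name: (min(vals), max(vals)) for name, vals in self.columns.items()}


def scan(chunks: List[Chunk], projection: List[str], filter_col: str,
         lo: int, hi: int) -> Tuple[List[Dict[str, int]], int]:
    """Return (rows, chunks_skipped) for rows where lo <= filter_col <= hi."""
    rows: List[Dict[str, int]] = []
    skipped = 0
    for chunk in chunks:
        cmin, cmax = chunk.stats[filter_col]
        if cmax < lo or cmin > hi:
            skipped += 1  # filter pushdown: statistics prove no row in this chunk matches
            continue
        keys = chunk.columns[filter_col]
        selected = [i for i, v in enumerate(keys) if lo <= v <= hi]
        # Projection pushdown: read only the requested columns, only at matching rows.
        for i in selected:
            rows.append({name: chunk.columns[name][i] for name in projection})
    return rows, skipped
```

For example, with one chunk holding prices 10 to 30 and another holding prices 40 to 60, a filter of `15 <= price <= 35` never touches the second chunk at all.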

Internals

Explore the crate architecture, async runtime, session system, and integration internals. Build and benchmark locally.

Developer Guide

Highlights

Compressed arrays: Operate directly on compressed data with encodings like FastLanes, FSST, and ALP — no decompression needed for many operations.
Extensible file format: Zero-allocation reads, FlatBuffer metadata for O(1) column access, and optional WASM decompression kernels for forward compatibility.
Query engine integration: Filter and projection pushdown through the Scan API, with native integrations for DataFusion, DuckDB, Spark, Trino, and Ray.
Language bindings: First-class support for Python (PyO3), Java (JNI + Spark/Trino connectors), and C/C++ (FFI).
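To make "operate directly on compressed data" concrete, the sketch below combines frame-of-reference and bit-packing. These are two of the lightweight techniques in the FastLanes family. It stores each value as a small offset from the column minimum and evaluates a `>` predicate by shifting the threshold into offset space, never adding the reference back to rebuild the original values. This is an assumption-laden simplification. Real FastLanes kernels use an interleaved, SIMD-friendly bit layout rather than one Python integer, and `FoRBitPacked` is a hypothetical name, not a Vortex type.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FoRBitPacked:
    """Hypothetical frame-of-reference + bit-packed integer array (non-empty input)."""
    reference: int  # minimum value; every stored offset is value - reference
    bit_width: int  # bits per packed offset
    length: int
    packed: int     # all offsets concatenated into one Python int, least significant first

    @classmethod
    def encode(cls, values: List[int]) -> "FoRBitPacked":
        ref = min(values)
        offsets = [v - ref for v in values]
        width = max(max(offsets).bit_length(), 1)
        packed = 0
        for i, off in enumerate(offsets):
            packed |= off << (i * width)
        return cls(ref, width, len(values), packed)

    def _offset(self, i: int) -> int:
        # Extract the i-th packed field; this yields the offset, not the original value.
        mask = (1 << self.bit_width) - 1
        return (self.packed >> (i * self.bit_width)) & mask

    def decode(self) -> List[int]:
        return [self.reference + self._offset(i) for i in range(self.length)]

    def greater_than(self, threshold: int) -> List[bool]:
        """Evaluate value > threshold in offset space, without decoding values."""
        t = threshold - self.reference
        if t < 0:
            # Every value is >= reference > threshold, so skip per-element work.
            return [True] * self.length
        return [self._offset(i) > t for i in range(self.length)]
```

For the values `[1000, 1003, 1001, 1007]`, the offsets `[0, 3, 1, 7]` fit in 3 bits each instead of 64, and `greater_than(1002)` compares those offsets against `2`.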