:::{warning}
The Python API is not yet complete and is subject to change. Many operations available in
the Rust API are not yet exposed. See {doc}`/api/python/index` for the full reference.
:::

Install the package from PyPI with pip:

```bash
pip install vortex-data
```

Or add it to a project managed by uv:

```bash
uv add vortex-data
```

{func}`~vortex.array` constructs a Vortex array from Python values:

```python
>>> import vortex as vx
>>> arr = vx.array([1, 2, 3, 4])
>>> arr.dtype
int(64, nullable=False)
>>> len(arr)
4
```

Python's {obj}`None` represents a missing value and makes the dtype nullable:

```python
>>> arr = vx.array([1, 2, None, 4])
>>> arr.dtype
int(64, nullable=True)
```

A list of {class}`dict` produces a struct array. Missing values may appear at any level:

```python
>>> arr = vx.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': None, 'age': 31},
...     None,
... ])
>>> arr.dtype
struct({"age": int(64, nullable=True), "name": utf8(nullable=True)}, nullable=True)
```

{func}`~vortex.array` also accepts {class}`pyarrow.Array`, {class}`pyarrow.Table`,
{class}`pandas.DataFrame`, and {class}`range` objects.

DType factory functions are available at the top level of the `vortex` module:

```python
>>> vx.int_(32)
int(32, nullable=False)
>>> vx.utf8(nullable=True)
utf8(nullable=True)
>>> vx.list_(vx.float_(64))
list(float(64, nullable=False), nullable=False)
>>> vx.struct({'x': vx.int_(32), 'y': vx.int_(32)})
struct({"x": int(32, nullable=False), "y": int(32, nullable=False)}, nullable=False)
```

Available types: {func}`~vortex.null`, {func}`~vortex.bool_`,
{func}`~vortex.int_`, {func}`~vortex.uint`, {func}`~vortex.float_`,
{func}`~vortex.decimal`, {func}`~vortex.utf8`, {func}`~vortex.binary`,
{func}`~vortex.struct`, {func}`~vortex.list_`,
{func}`~vortex.fixed_size_list`, {func}`~vortex.date`,
{func}`~vortex.time`, {func}`~vortex.timestamp`.

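Arrays support element access, slicing, gathering, filtering, and element-wise comparison:

```python
>>> arr = vx.array([10, 20, 30, 40, 50])
>>> # Read a single element as a Python value.
>>> arr.scalar_at(0).as_py()
10
>>> arr.to_arrow_array().to_pylist()
[10, 20, 30, 40, 50]
>>> # Slice the half-open range [1, 3).
>>> arr.slice(1, 3).to_arrow_array().to_pylist()
[20, 30]
>>> # Gather elements by position.
>>> indices = vx.array([0, 2, 4])
>>> arr.take(indices).to_arrow_array().to_pylist()
[10, 30, 50]
>>> # Keep the elements where the mask is True.
>>> mask = vx.array([True, False, True, False, True])
>>> arr.filter(mask).to_arrow_array().to_pylist()
[10, 30, 50]
>>> # Compare two arrays element by element.
>>> other = vx.array([10, 25, 25, 45, 50])
>>> (arr > other).to_arrow_array().to_pylist()
[False, False, True, False, False]
```
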
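
As a rough mental model, the positional semantics of these operations can be reproduced on plain Python lists. The sketch below is illustrative only: the helper names (`slice_values`, `take_values`, `filter_values`, `greater_than`) are hypothetical, are not part of the `vortex` package, and say nothing about how Vortex implements the operations internally.

```python
# Plain-Python model of slice, take, filter, and comparison.
# All helper names here are hypothetical; none of them belong to vortex.

def slice_values(values, start, stop):
    """Return the half-open range [start, stop), like arr.slice(start, stop)."""
    return values[start:stop]

def take_values(values, indices):
    """Gather elements by position, like arr.take(indices)."""
    return [values[i] for i in indices]

def filter_values(values, mask):
    """Keep elements whose mask entry is True, like arr.filter(mask)."""
    if len(values) != len(mask):
        raise ValueError("mask must have the same length as values")
    return [value for value, keep in zip(values, mask) if keep]

def greater_than(left, right):
    """Compare two equal-length sequences element by element, like arr > other."""
    if len(left) != len(right):
        raise ValueError("both sides must have the same length")
    return [a > b for a, b in zip(left, right)]

values = [10, 20, 30, 40, 50]
print(slice_values(values, 1, 3))                                  # [20, 30]
print(take_values(values, [0, 2, 4]))                              # [10, 30, 50]
print(filter_values(values, [True, False, True, False, True]))     # [10, 30, 50]
print(greater_than(values, [10, 25, 25, 45, 50]))                  # [False, False, True, False, False]
```

The printed results match the Vortex outputs shown above, which makes the model a convenient way to check expectations before running an operation on a real array.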
The `vortex.expr` module provides expressions for filtering and projecting. These
are primarily used with {meth}`.VortexFile.scan` and {meth}`.VortexFile.to_arrow` but can also be
applied directly to an array:

```python
>>> import vortex.expr as ve
>>> arr = vx.array([
...     {'name': 'Alice', 'age': 30},
...     {'name': 'Bob', 'age': 25},
...     {'name': 'Carol', 'age': 35},
... ])
>>> expr = ve.column('age') > 28
>>> arr.apply(expr).to_arrow_array().to_pylist()
[True, False, True]
```

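
Note that `ve.column('age') > 28` does not compute a boolean; it builds an expression object that is evaluated later, when it is applied to data. The following sketch shows one way such deferred expressions can be built with operator overloading. The `Column`, `GreaterThan`, and `apply` names are hypothetical stand-ins written for this illustration and are not the `vortex.expr` implementation.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for illustration; not the vortex.expr classes.

@dataclass(frozen=True)
class Column:
    """A reference to a named field of each row."""
    name: str

    def __gt__(self, threshold):
        # Build a deferred comparison instead of evaluating anything now.
        return GreaterThan(self, threshold)

    def evaluate(self, row):
        return row[self.name]

@dataclass(frozen=True)
class GreaterThan:
    """A deferred 'column > threshold' comparison."""
    column: Column
    threshold: object

    def evaluate(self, row):
        return self.column.evaluate(row) > self.threshold

def apply(rows, expr):
    """Evaluate an expression against every row."""
    return [expr.evaluate(row) for row in rows]

rows = [
    {'name': 'Alice', 'age': 30},
    {'name': 'Bob', 'age': 25},
    {'name': 'Carol', 'age': 35},
]
expr = Column('age') > 28   # an expression object, not a bool
print(apply(rows, expr))    # [True, False, True]
```

Because the expression is plain data until it is evaluated, the same object can in principle be handed to a reader and used to decide what to load, which is the role expressions play in scans.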
{func}`~vortex.open` lazily opens a Vortex file for reading:

```python
>>> import pyarrow.parquet as pq
>>> # Write a Vortex file from a Parquet table so there is something to open.
>>> vx.io.write(pq.read_table("_static/example.parquet"), 'example.vortex')
>>>
>>> f = vx.open('example.vortex')
>>> len(f)
1000
```

Use {meth}`.VortexFile.scan` to read data with optional projection, filtering, and limit:

```python
>>> result = f.scan(['tip_amount'], limit=3).read_all()
>>> result.to_arrow_array()
<pyarrow.lib.StructArray object at ...>
-- is_valid: all not null
-- child 0 type: double
  [
    0,
    5.1,
    16.54
  ]
```

{class}`.ArrayIterator` streams batches of arrays from a scan or other source. It supports
iteration, collecting into a single array, and conversion to Arrow.

{meth}`.ArrayIterator.read_all` collects all batches into a single in-memory {class}`.Array`:

```python
>>> arr = f.scan(['tip_amount'], limit=5).read_all()
>>> len(arr)
5
```

{meth}`.ArrayIterator.to_arrow` converts to a {class}`pyarrow.RecordBatchReader` for use with
Arrow-based tools:

```python
>>> reader = f.scan(['tip_amount']).to_arrow()
>>> reader.schema
tip_amount: double
>>> table = reader.read_all()
>>> len(table)
1000
```

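
The difference between streaming batches and collecting them can be pictured with a plain Python generator. This is a conceptual sketch only: `scan_batches`, `read_all`, the `batch_size` parameter, and the synthetic rows are all assumptions made for illustration, and real Vortex batch sizes and scan internals will differ.

```python
from itertools import islice

# Hypothetical helpers and synthetic data; not the vortex scan implementation.

def scan_batches(rows, columns=None, limit=None, batch_size=2):
    """Lazily yield batches of rows, applying a projection and a row limit."""
    selected = iter(rows) if limit is None else islice(rows, limit)
    batch = []
    for row in selected:
        # Projection: keep only the requested columns.
        batch.append(row if columns is None else {name: row[name] for name in columns})
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter, batch

def read_all(batches):
    """Collect every batch into a single in-memory list."""
    return [row for batch in batches for row in batch]

# Synthetic rows invented for this example.
rows = [{"trip_id": i, "tip_amount": float(i)} for i in range(5)]

# Streaming: one batch is materialized at a time.
for batch in scan_batches(rows, columns=["tip_amount"], limit=3):
    print(len(batch))  # prints 2, then 1

# Collecting: everything ends up in memory at once.
print(read_all(scan_batches(rows, columns=["tip_amount"], limit=3)))
# [{'tip_amount': 0.0}, {'tip_amount': 1.0}, {'tip_amount': 2.0}]
```

Iterating keeps memory use bounded by the batch size, while collecting is more convenient when the result is known to be small.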
Arrays convert to other formats:

| Method | Result |
|---|---|
| {meth}`.Array.to_arrow_array` | {class}`pyarrow.Array` |
| {meth}`.Array.to_arrow_table` | {class}`pyarrow.Table` |
| {meth}`.Array.to_numpy` | {class}`numpy.ndarray` |
| {meth}`.Array.to_pandas` | {class}`pandas.DataFrame` |
| {meth}`.Array.to_pylist` | {class}`list` |