polars.datatypes.Categories

class polars.datatypes.Categories(
name: str | None = None,
namespace: str = '',
physical: PolarsDataType = UInt32,
)

A named collection of categories for Categorical.

Two Categories objects are considered equal (and will use the same physical mapping of categories to strings) if they have the same name, namespace and physical backing type, even if they are created in separate calls to Categories.

Warning

This functionality is currently considered unstable. It may be changed at any point without it being considered a breaking change.

Parameters:
name

The name of this Categories. If set to None or an empty string, this refers to the global categories.

namespace

An optional namespace for this Categories. Defaults to the empty string. If the name is None or an empty string, indicating the global categories, the namespace must also be empty.

physical : {UInt8, UInt16, UInt32}

The physical type used to represent the categories. Defaults to UInt32.

See also

Categorical

Examples

A Categories instance can be indexed using either string or integer keys:

>>> fruit = pl.Categories("fruit")
>>> s = pl.Series(["apple", "banana", "orange"], dtype=pl.Categorical(fruit))
>>> fruit[0]
'apple'
>>> fruit["apple"]
0

All Categories objects with the same name, namespace and physical type share the same mapping, even if they’re created separately:

>>> fruit2 = pl.Categories("fruit")
>>> fruit2["banana"]
1

Note that the Categories instance is only a weak reference to the actual mapping stored in Polars; if no actual data exists using this mapping (like a Series or DataFrame), the mapping is cleaned up by Polars:

>>> del s
>>> fruit["apple"] is None
True

To keep a mapping persistent, keep alive some object that uses it, e.g. keepalive = pl.Series([], dtype=pl.Categorical(fruit)).

__init__(
name: str | None = None,
namespace: str = '',
physical: PolarsDataType = UInt32,
) -> None

Methods

__init__([name, namespace, physical])

is_global()

Returns whether this refers to the global categories.

name()

The name of this Categories.

namespace()

The namespace of this Categories.

physical()

The physical type used to represent the categories.

random([namespace, physical])

Creates a new Categories with a random name.