Expose named_struct in python · Issue #692 · apache/datafusion-python · GitHub
Expose named_struct in python #692
Closed
@timsaucer

Description

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently the only way to create a struct of expressions is datafusion.functions.struct, which assigns the fixed field names c0, c1, and so on. These names are difficult to work with. The Rust implementation has a named_struct function that would serve the purpose.

Describe the solution you'd like
Ideally, the name of each field in a struct would come from the name of its expression. It would be great to do something like:

df = df.with_column("d", F.struct(col("a"), col("b"), col("c")))

And then the struct would contain field names a, b, and c.

From a brief look at the code, this may not be simple to implement. If it is not feasible, I would at least like the named_struct function to be exposed in the Python code.

Describe alternatives you've considered
I have not considered any alternatives beyond the two described above.

Additional context
Minimal example showing current state:

from datafusion import SessionContext, col, functions as F
import pyarrow as pa

ctx = SessionContext()

batch = pa.RecordBatch.from_arrays(
    [pa.array([1, 2, 3]), pa.array([4, 5, 6]), pa.array([7, 8, 9])],
    names=["a", "b", "c"],
)

df = ctx.create_dataframe([[batch]])

df = df.with_column("d", F.struct(col("a"), col("b"), col("c")))

df.show()

This produces:

DataFrame()
+---+---+---+-----------------------+
| a | b | c | d                     |
+---+---+---+-----------------------+
| 1 | 4 | 7 | {c0: 1, c1: 4, c2: 7} |
| 2 | 5 | 8 | {c0: 2, c1: 5, c2: 8} |
| 3 | 6 | 9 | {c0: 3, c1: 6, c2: 9} |
+---+---+---+-----------------------+
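To make the difference between the two naming schemes concrete, here is a minimal sketch in plain Python. It does not use the datafusion API at all: `model_struct` and `model_named_struct` are hypothetical helpers that only model the field naming, with a dict standing in for a struct value. The first mirrors the positional c0, c1, ... names shown above; the second mirrors the requested behavior, where each value is paired with an explicit field name.

```python
from typing import Any


def model_struct(*values: Any) -> dict[str, Any]:
    """Hypothetical model of the current behavior: fields are named by position."""
    return {f"c{i}": value for i, value in enumerate(values)}


def model_named_struct(*pairs: tuple[str, Any]) -> dict[str, Any]:
    """Hypothetical model of the requested behavior: each field keeps its given name."""
    return dict(pairs)


# Same data as the record batch in the example above, one dict per row.
rows = [
    {"a": 1, "b": 4, "c": 7},
    {"a": 2, "b": 5, "c": 8},
    {"a": 3, "b": 6, "c": 9},
]

current = [model_struct(r["a"], r["b"], r["c"]) for r in rows]
requested = [
    model_named_struct(("a", r["a"]), ("b", r["b"]), ("c", r["c"])) for r in rows
]

print(current[0])    # {'c0': 1, 'c1': 4, 'c2': 7}
print(requested[0])  # {'a': 1, 'b': 4, 'c': 7}
```

With positional names, any downstream code that reads a field has to know the original argument order. With explicit names, a field can be looked up by the column it came from.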

Labels: enhancement (New feature or request)