Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently we can only create a struct of expressions using datafusion.functions.struct, which assigns fixed field names of c0, c1, and so on. This is difficult to work with. In the Rust implementation there is a named_struct function which would serve the purpose.
Describe the solution you'd like
In an ideal world, the name of the field in a struct would come from the name of the expression. It would be great to do something like
df = df.with_column("d", F.struct(col("a"), col("b"), col("c")))
And then the struct would contain field names a, b, and c.
From a brief look at the code, this may not be simple to implement. If it is not feasible, I would at least like to expose the named_struct function in the Python code.
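To make the desired naming concrete, here is a minimal sketch of the argument pattern that named_struct expects: alternating field-name literals and value expressions, with each field name taken from the column it wraps. It only builds a SQL expression string in plain Python. The helper name named_struct_sql is hypothetical, and it assumes the named_struct SQL function is available in the DataFusion version in use.

```python
def named_struct_sql(columns):
    """Build a named_struct(...) SQL expression whose field names
    match the column names (hypothetical helper, not a datafusion API)."""
    if not columns:
        raise ValueError("at least one column is required")
    parts = []
    for name in columns:
        # Field name as a SQL string literal, escaping single quotes.
        literal = "'" + name.replace("'", "''") + "'"
        # Column reference as a quoted identifier, escaping double quotes.
        ident = '"' + name.replace('"', '""') + '"'
        parts.append(f"{literal}, {ident}")
    return "named_struct(" + ", ".join(parts) + ")"


expr = named_struct_sql(["a", "b", "c"])
print(expr)
```

Assuming the data has been registered as a table, the resulting string could be embedded in a query passed to ctx.sql. A Python binding for named_struct would let the same pairing be expressed directly with col and literal expressions.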
Describe alternatives you've considered
I have not considered any alternatives beyond the two described above.
Additional context
Minimal example showing current state:
from datafusion import SessionContext, col, functions as F
import pyarrow as pa
ctx = SessionContext()
batch = pa.RecordBatch.from_arrays(
    [pa.array([1, 2, 3]), pa.array([4, 5, 6]), pa.array([7, 8, 9])],
    names=["a", "b", "c"],
)
df = ctx.create_dataframe([[batch]])
df = df.with_column("d", F.struct(col("a"), col("b"), col("c")))
df.show()
Creates
DataFrame()
+---+---+---+-----------------------+
| a | b | c | d                     |
+---+---+---+-----------------------+
| 1 | 4 | 7 | {c0: 1, c1: 4, c2: 7} |
| 2 | 5 | 8 | {c0: 2, c1: 5, c2: 8} |
| 3 | 6 | 9 | {c0: 3, c1: 6, c2: 9} |
+---+---+---+-----------------------+