Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently, DataFrame.collect() returns a list of all of the buffered RecordBatches. This is often not desirable: a user may, for example, want to write the result out to disk as it is materialized, to save memory.
Describe the solution you'd like
It would be great to have a to_arrow_batches() method that returns a RecordBatchReader and defers execution of each batch until it is requested from the RecordBatchReader.
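To make the requested behaviour concrete, here is a minimal pure-Python sketch of the difference between an eager collect() and a deferred reader. Nothing here is the actual DataFusion or Arrow API: LazyBatchReader, run_query, and the module-level collect() and to_arrow_batches() functions are hypothetical stand-ins, and a batch is modelled as a plain list of integers.

```python
from typing import Callable, Iterator, List, Optional

Batch = List[int]  # hypothetical stand-in for a RecordBatch

executed: List[int] = []  # records which batches have been computed so far


def run_query() -> Iterator[Batch]:
    """Hypothetical query: computes one batch per iteration step."""
    for i in range(3):
        executed.append(i)  # mark that batch i was actually computed
        yield [i * 10, i * 10 + 1]


class LazyBatchReader:
    """Hypothetical stand-in for a RecordBatchReader: batches on demand."""

    def __init__(self, produce: Callable[[], Iterator[Batch]]) -> None:
        self._produce = produce
        self._batches: Optional[Iterator[Batch]] = None

    def __iter__(self) -> "LazyBatchReader":
        return self

    def __next__(self) -> Batch:
        if self._batches is None:
            # Execution only starts when the first batch is requested.
            self._batches = self._produce()
        return next(self._batches)


def collect() -> List[Batch]:
    """Eager behaviour: every batch is computed and buffered up front."""
    return list(run_query())


def to_arrow_batches() -> LazyBatchReader:
    """Deferred behaviour: nothing runs until the reader is consumed."""
    return LazyBatchReader(run_query)
```

With a reader like this, a caller could loop over the batches and write each one to disk before the next is computed, so only one batch needs to be held in memory at a time.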
Describe alternatives you've considered
Additional context