[SPARK-54179][SQL][FOLLOW-UP] Add Dataframe API support for Tuple sketches #54041
+3,679
−7
What changes were proposed in this pull request?
This PR adds DataFrame API support for Tuple sketch functions that were previously only available in Spark SQL. The following functions are now accessible through the DataFrame API:
- tuple_sketch_agg_* - Aggregation functions for creating tuple sketches
- tuple_union_agg_* - Union aggregation functions
- tuple_intersection_agg_* - Intersection aggregation functions
- tuple_sketch_estimate_* - Estimation functions for tuple sketches
- tuple_sketch_summary_* - Summary functions for tuple sketches
- tuple_sketch_theta_* - Theta extraction functions
- tuple_union_* - Union operations
- tuple_intersection_* - Intersection operations
- tuple_difference_* - Difference operations

This is a follow-up to SPARK-54179 that completes the DataFrame API parity for tuple sketch operations.
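For reviewers less familiar with tuple sketches, the operation families listed above (aggregate, estimate, summary, theta, union, intersection, difference) can be illustrated with a toy model. This is a minimal pure-Python sketch, not Spark's implementation and not the DataFrame API added here: the class `ToyTupleSketch`, its method names, the SHA-256 based hash, and the choice to sum summaries are all assumptions made for illustration only.

```python
import hashlib

HASH_SPACE = 2 ** 64  # hashes are integers in [0, HASH_SPACE)


def _hash64(key):
    """Map a key to a pseudo-uniform 64-bit integer (illustrative hash choice)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ToyTupleSketch:
    """Hypothetical toy tuple sketch: retains at most k hashed keys, each with a summed summary."""

    def __init__(self, k=1024, theta=HASH_SPACE, entries=None):
        self.k = k
        self.theta = theta                  # only hashes strictly below theta are retained
        self.entries = dict(entries or {})  # hash -> summary value
        self._trim()

    def _trim(self):
        # Drop anything at or above theta, then shrink to the k smallest hashes,
        # lowering theta to the (k+1)-th smallest hash seen.
        self.entries = {h: s for h, s in self.entries.items() if h < self.theta}
        if len(self.entries) > self.k:
            ordered = sorted(self.entries)
            self.theta = ordered[self.k]
            self.entries = {h: self.entries[h] for h in ordered[:self.k]}

    def update(self, key, value):
        # Aggregation step: repeated keys accumulate into one summary (sum is an assumption).
        h = _hash64(key)
        if h < self.theta:
            self.entries[h] = self.entries.get(h, 0.0) + value
            if len(self.entries) > self.k:
                self._trim()

    def theta_fraction(self):
        return self.theta / HASH_SPACE

    def estimate(self):
        # Distinct-key estimate: retained entries scaled up by the sampling fraction.
        return len(self.entries) / self.theta_fraction()

    def summary(self):
        # Sum of the retained summaries (not scaled by theta).
        return sum(self.entries.values())

    def union(self, other):
        merged = dict(self.entries)
        for h, s in other.entries.items():
            merged[h] = merged.get(h, 0.0) + s
        return ToyTupleSketch(min(self.k, other.k), min(self.theta, other.theta), merged)

    def intersection(self, other):
        common = {h: s + other.entries[h] for h, s in self.entries.items() if h in other.entries}
        return ToyTupleSketch(min(self.k, other.k), min(self.theta, other.theta), common)

    def difference(self, other):
        kept = {h: s for h, s in self.entries.items() if h not in other.entries}
        return ToyTupleSketch(min(self.k, other.k), min(self.theta, other.theta), kept)


# Two small sketches with overlapping keys; both stay below k, so results are exact here.
a = ToyTupleSketch()
b = ToyTupleSketch()
for i in range(100):
    a.update(f"user-{i}", 1.0)
for i in range(50, 150):
    b.update(f"user-{i}", 2.0)

print(a.union(b).estimate())
print(a.intersection(b).estimate())
print(a.difference(b).estimate())
```

Once a sketch holds more than k distinct keys, theta drops below the full hash space and `estimate()` becomes approximate, which is the trade-off the real sketches make to bound memory.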
Why are the changes needed?
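Tuple sketch functions were previously usable only from Spark SQL. Exposing them through the DataFrame API lets DataFrame users call them directly and broadens adoption of the sketch feature.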
Does this PR introduce any user-facing change?
Yes. It provides the DataFrame API for Tuple sketches.
How was this patch tested?
Added tests to DataFrameAggregateSuite.scala.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Sonnet 4.5