[SPARK-54179][SQL][FOLLOW-UP] Add Dataframe API support for Tuple sketches by cboumalh · Pull Request #54041 · apache/spark · GitHub

@cboumalh
Contributor

What changes were proposed in this pull request?

This PR adds DataFrame API support for Tuple sketch functions that were previously only available in Spark SQL. The following functions are now accessible through the DataFrame API:

  • tuple_sketch_agg_* - Aggregation functions for creating tuple sketches
  • tuple_union_agg_* - Union aggregation functions
  • tuple_intersection_agg_* - Intersection aggregation functions
  • tuple_sketch_estimate_* - Estimation functions for tuple sketches
  • tuple_sketch_summary_* - Summary functions for tuple sketches
  • tuple_sketch_theta_* - Theta extraction functions
  • tuple_union_* - Union operations
  • tuple_intersection_* - Intersection operations
  • tuple_difference_* - Difference operations

This is a follow-up to SPARK-54179 that completes the DataFrame API parity for tuple sketch operations.
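For readers unfamiliar with what these function families compute, here is a minimal, self-contained sketch in plain Python. It is not Spark code and does not use the DataFrame functions added by this PR. It assumes a simplified tuple sketch that keeps the k smallest key hashes, attaches a double summary to each key, and combines summaries by summing. The names `ToyTupleSketch`, `hash_key`, `union`, `intersection`, and `difference` are hypothetical and exist only for this illustration.

```python
import hashlib

MAX_HASH = 2 ** 64  # toy hashes are 64-bit integers in [0, MAX_HASH)


def hash_key(key):
    """Map a key to a 64-bit integer (toy stand-in for a sketch hash)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ToyTupleSketch:
    """Hypothetical toy sketch: retained key hashes, each with a summed summary."""

    def __init__(self, k=4096, theta=MAX_HASH, entries=None):
        self.k = k                          # maximum number of retained hashes
        self.theta = theta                  # only hashes below theta are retained
        self.entries = dict(entries or {})  # hash -> summary value
        self._trim()

    def _trim(self):
        # Drop hashes at or above theta, then keep only the k smallest hashes.
        self.entries = {h: v for h, v in self.entries.items() if h < self.theta}
        if len(self.entries) > self.k:
            ordered = sorted(self.entries)
            self.theta = ordered[self.k]    # the (k+1)-th smallest becomes theta
            self.entries = {h: self.entries[h] for h in ordered[: self.k]}

    def update(self, key, value):
        # Aggregation step: add value to the summary stored for this key.
        h = hash_key(key)
        if h < self.theta:
            self.entries[h] = self.entries.get(h, 0.0) + value
            self._trim()

    def estimate(self):
        # Distinct-key estimate: retained count scaled by the sampled fraction.
        return len(self.entries) / (self.theta / MAX_HASH)

    def summary_sum(self):
        # Sum of the summaries of all retained keys.
        return sum(self.entries.values())


def union(a, b):
    # Keys in either sketch; summaries of shared keys are summed.
    merged = dict(a.entries)
    for h, v in b.entries.items():
        merged[h] = merged.get(h, 0.0) + v
    return ToyTupleSketch(min(a.k, b.k), min(a.theta, b.theta), merged)


def intersection(a, b):
    # Keys in both sketches; summaries are summed.
    shared = {h: v + b.entries[h] for h, v in a.entries.items() if h in b.entries}
    return ToyTupleSketch(min(a.k, b.k), min(a.theta, b.theta), shared)


def difference(a, b):
    # Keys in a that are absent from b; summaries come from a only.
    only_a = {h: v for h, v in a.entries.items() if h not in b.entries}
    return ToyTupleSketch(min(a.k, b.k), min(a.theta, b.theta), only_a)


a = ToyTupleSketch()
for user, amount in [("u1", 1.0), ("u2", 2.0), ("u3", 3.0)]:
    a.update(user, amount)

b = ToyTupleSketch()
for user, amount in [("u2", 10.0), ("u3", 20.0), ("u4", 30.0)]:
    b.update(user, amount)

union_ab = union(a, b)
intersection_ab = intersection(a, b)
difference_ab = difference(a, b)
print(union_ab.estimate(), intersection_ab.estimate(), difference_ab.estimate())
```

With only a handful of keys the toy sketch stays below its capacity, so the estimates are exact counts. Once more than k distinct keys arrive, theta drops below the maximum and the estimate becomes approximate.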

Why are the changes needed?

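Tuple sketch functions were previously callable only from Spark SQL. Exposing them through the DataFrame API lets DataFrame users call them directly and broadens adoption of the sketch feature.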

Does this PR introduce any user-facing change?

Yes. It exposes the following Tuple sketch functions through the DataFrame API:

  * tuple_sketch_agg_*
  * tuple_union_agg_*
  * tuple_intersection_agg_*
  * tuple_sketch_estimate_*
  * tuple_sketch_summary_*
  * tuple_sketch_theta_*
  * tuple_union_*
  * tuple_intersection_*
  * tuple_difference_*

How was this patch tested?

Added tests to DataFrameAggregateSuite.scala.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.5

@github-actions

JIRA Issue Information

=== New Feature SPARK-54179 ===
Summary: Add Native Support for Apache Tuple Sketches
Assignee: Christopher Boumalhab
Status: Resolved
Affected: ["4.2.0"]


This comment was automatically generated by GitHub Actions
