[SPARK-55228][SPARK-55230][SQL][CONNECT] Implement Dataset.zipWithIndex in Scala API by fangchenli · Pull Request #54014 · apache/spark

@fangchenli
Contributor

What changes were proposed in this pull request?

Implement Dataset.zipWithIndex in the Scala API.

Why are the changes needed?

Align the Dataset API with the RDD API, which already provides zipWithIndex.

Does this PR introduce any user-facing change?

Yes. This PR adds Dataset.zipWithIndex() and Dataset.zipWithIndex(indexColName: String).
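
For readers unfamiliar with the RDD method that this PR mirrors, here is a minimal sketch of the index assignment that RDD.zipWithIndex documents: indices follow partition order first, then element order within each partition. It is plain Scala with no Spark dependency. The object name ZipWithIndexSketch and the modelling of partitions as nested Seq values are illustrative assumptions, and whether the new Dataset method uses the same scheme internally is not stated in this PR description.

```scala
// Illustrative only: a plain-Scala model of the indexing scheme documented
// for RDD.zipWithIndex. This is NOT the code added by the PR.
object ZipWithIndexSketch {

  // Models a distributed collection as a sequence of partitions (assumption).
  def zipWithIndex[T](partitions: Seq[Seq[T]]): Seq[(T, Long)] = {
    // Starting offset of each partition = total size of all earlier partitions.
    // scanLeft yields one extra trailing element, which zip below discards.
    val offsets: Seq[Long] = partitions.map(_.size.toLong).scanLeft(0L)(_ + _)

    partitions.zip(offsets).flatMap { case (part, offset) =>
      // Within a partition, the index is the offset plus the local position.
      part.zipWithIndex.map { case (elem, pos) => (elem, offset + pos) }
    }
  }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(Seq("a", "b"), Seq.empty[String], Seq("c"))
    // Prints (a,0), (b,1), (c,2), one pair per line.
    zipWithIndex(partitions).foreach(println)
  }
}
```

The sketch computes the partition sizes before assigning any index, which is why the RDD version needs to run a job to count elements when there is more than one partition.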

How was this patch tested?

Unit tests added.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.5

@github-actions
github-actions bot commented Jan 27, 2026

JIRA Issue Information

=== Sub-task SPARK-55228 ===
Summary: Implement Dataset.zipWithIndex in Scala API
Assignee: None
Status: Open
Affected: ["4.2.0"]

=== Sub-task SPARK-55230 ===
Summary: Implement Dataset.zipWithIndex on Spark Connect Scala Client
Assignee: None
Status: Open
Affected: ["4.2"]


This comment was automatically generated by GitHub Actions

Member
@ueshin ueshin left a comment

Otherwise, LGTM.
cc @zhengruifeng

Co-authored-by: Takuya UESHIN <ueshin@happy-camper.st>
@fangchenli fangchenli marked this pull request as ready for review January 27, 2026 23:12
@zhengruifeng zhengruifeng changed the title [SPARK-55228][SQL] Implement Dataset.zipWithIndex in Scala API [SPARK-55228][SPARK-55230][SQL] Implement Dataset.zipWithIndex in Scala API Jan 28, 2026
@zhengruifeng
Contributor

Added SPARK-55230 to the title, since this PR also supports the Scala client.

}

/** @inheritdoc */
def zipWithIndex(indexColName: String): DataFrame = {
Contributor

Since the Scala client is supported in this PR, we may want to add some tests in

src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala
org/apache/spark/sql/connect/ClientE2ETestSuite.scala

The tests can be added in a follow-up PR, though.

Contributor
@zhengruifeng zhengruifeng left a comment

Thanks for working on this.

@zhengruifeng
Contributor

@fangchenli The linter and doc build failed, PTAL.

@zhengruifeng zhengruifeng changed the title [SPARK-55228][SPARK-55230][SQL] Implement Dataset.zipWithIndex in Scala API [SPARK-55228][SPARK-55230][SQL][CONNECT] Implement Dataset.zipWithIndex in Scala API Jan 28, 2026
@github-actions github-actions bot added the BUILD label Jan 28, 2026
4 participants
