[SPARK-55228][SPARK-55230][SQL][CONNECT] Implement Dataset.zipWithIndex in Scala API #54014

fangchenli · 2026-01-27T22:09:19Z

What changes were proposed in this pull request?

Implement Dataset.zipWithIndex in the Scala API.

Why are the changes needed?

Align the Dataset API with the RDD API, which already provides zipWithIndex.

Does this PR introduce any user-facing change?

Yes. This PR adds Dataset.zipWithIndex() and Dataset.zipWithIndex(indexColName: String).
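
For readers who want a feel for the intended behavior, here is a minimal sketch of how a comparable result can be obtained today through the existing RDD API. The object name `ZipWithIndexSketch`, the helper `zipWithIndexViaRdd`, and the column name `index` are hypothetical and used only for illustration. This is not the implementation in this PR, and it applies to classic Spark sessions only, since it goes through `df.rdd`.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField}

object ZipWithIndexSketch {

  // Hypothetical helper: appends a Long index column by round-tripping
  // through RDD.zipWithIndex. Not the code added by this PR.
  def zipWithIndexViaRdd(df: DataFrame, indexColName: String): DataFrame = {
    // RDD.zipWithIndex pairs each element with its position (a Long),
    // ordered by partition index and then by position within the partition.
    val indexedRows = df.rdd.zipWithIndex().map { case (row, idx) =>
      Row.fromSeq(row.toSeq :+ idx)
    }
    // Extend the original schema with the new non-nullable index column.
    val schema = df.schema.add(StructField(indexColName, LongType, nullable = false))
    df.sparkSession.createDataFrame(indexedRows, schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("zipWithIndex-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("a", "b", "c").toDF("value")
    zipWithIndexViaRdd(df, "index").show()

    spark.stop()
  }
}
```

Assuming the signatures listed above, the equivalent call after this PR would presumably be `df.zipWithIndex("index")`, without the manual round trip through `Row` objects. The column name used by the no-argument overload is not stated in this description.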

How was this patch tested?

Unit tests added.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.5

github-actions · 2026-01-27T22:09:30Z

JIRA Issue Information

=== Sub-task SPARK-55228 ===
Summary: Implement Dataset.zipWithIndex in Scala API
Assignee: None
Status: Open
Affected: ["4.2.0"]

=== Sub-task SPARK-55230 ===
Summary: Implement Dataset.zipWithIndex on Spark Connect Scala Client
Assignee: None
Status: Open
Affected: ["4.2"]

This comment was automatically generated by GitHub Actions

ueshin

Otherwise, LGTM.
cc @zhengruifeng

sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala

Co-authored-by: Takuya UESHIN <ueshin@happy-camper.st>

zhengruifeng · 2026-01-28T01:52:57Z

Added SPARK-55230 to the title, since this PR also supports the Scala client.

zhengruifeng · 2026-01-28T01:55:08Z

sql/connect/common/src/main/scala/org/apache/spark/sql/connect/Dataset.scala

  }

+  /** @inheritdoc */
+  def zipWithIndex(indexColName: String): DataFrame = {


Since the Scala client is supported in this PR, we may want to add some tests in:

src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala
org/apache/spark/sql/connect/ClientE2ETestSuite.scala

The tests can be added in a follow-up PR, though.

zhengruifeng

Thanks for working on this

zhengruifeng · 2026-01-28T02:59:02Z

@fangchenli the linter and doc build failed, PTAL

…index

[SPARK-55228][SQL] Implement Dataset.zipWithIndex in Scala API

d270fb1

github-actions bot added SQL CONNECT labels Jan 27, 2026

ueshin approved these changes Jan 27, 2026

sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala (outdated, resolved)

sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala (outdated, resolved)

Update sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala

cb71b7a

Co-authored-by: Takuya UESHIN <ueshin@happy-camper.st>

fangchenli marked this pull request as ready for review January 27, 2026 23:12

fangchenli added 2 commits January 27, 2026 15:15

move tests to correct location

0db3e91

fix deprecated method

7ce850e

zhengruifeng changed the title ~~[SPARK-55228][SQL] Implement Dataset.zipWithIndex in Scala API~~ [SPARK-55228][SPARK-55230][SQL] Implement Dataset.zipWithIndex in Scala API Jan 28, 2026

zhengruifeng reviewed Jan 28, 2026

zhengruifeng approved these changes Jan 28, 2026

HyukjinKwon approved these changes Jan 28, 2026

fangchenli added 2 commits January 27, 2026 21:31

Merge remote-tracking branch 'upstream/master' into dataset-zip-with-…

8a942b8

…index

exclude new method from Mima

f44459c

zhengruifeng changed the title ~~[SPARK-55228][SPARK-55230][SQL] Implement Dataset.zipWithIndex in Scala API~~ [SPARK-55228][SPARK-55230][SQL][CONNECT] Implement Dataset.zipWithIndex in Scala API Jan 28, 2026

github-actions bot added the BUILD label Jan 28, 2026
