HADOOP-19464. S3A: Restore Compatibility with EMRFS FileSystem by shameersss1 · Pull Request #7410 · apache/hadoop · GitHub

HADOOP-19464. S3A: Restore Compatibility with EMRFS FileSystem #7410

Merged
merged 2 commits into from
Feb 25, 2025

shameersss1
Contributor

Description of PR

After HADOOP-19278, the S3N folder marker _$folder$ is no longer skipped when listing S3 directories. As a result, the S3A filesystem can fail to read data written by the legacy Hadoop S3N filesystem and by AWS EMR's EMRFS (S3 filesystem), which causes compatibility issues and poses a risk for migrations to S3A.
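
To make the behaviour described above concrete, here is a minimal sketch of skipping legacy folder markers while listing. It is not the S3A implementation: the class `FolderMarkerFilter`, its methods, and the sample keys are hypothetical, and the only detail taken from the PR is the `_$folder$` suffix.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a stand-in for listing logic, not the actual S3A code. */
final class FolderMarkerFilter {
    /** Suffix named in the PR description as the S3N folder marker. */
    static final String FOLDER_MARKER_SUFFIX = "_$folder$";

    /** True if the object key looks like a legacy folder marker rather than data. */
    static boolean isFolderMarker(String key) {
        return key.endsWith(FOLDER_MARKER_SUFFIX);
    }

    /** Returns the keys a listing would surface as files, preserving input order. */
    static List<String> visibleFiles(List<String> keys) {
        List<String> out = new ArrayList<>();
        for (String key : keys) {
            if (!isFolderMarker(key)) {
                out.add(key);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical keys as an EMRFS-style writer might leave them.
        List<String> keys = List.of(
            "table/part-0000.parquet",
            "table/year=2024_$folder$",
            "table/year=2024/part-0001.parquet");
        System.out.println(visibleFiles(keys));
    }
}
```

If markers were surfaced as ordinary files instead, a reader would presumably try to open zero-byte objects such as `table/year=2024_$folder$` as data, which is the kind of failure the description refers to.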

How was this patch tested?

Added the integration test ITestEMRFSCompatibility and ran the other unit and integration tests in the us-east-1 region.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

Contributor
@steveloughran steveloughran left a comment

suggested some new tests, to verify semantics across more operations

* This test verifies that the EMRFS or legacy S3N filesystem compatibility with
* S3A works as expected.
*/
public class ITestEMRFSCompatibility extends AbstractS3ATestBase {
Contributor

are there other tests here? I think I'd like

  • list parent path reports an empty dir
  • delete parent dir results in a getFileStatus(parent) => 404, and same for marker.

What does dir rename do? I know for normal / markers we only create a dir marker if there's nothing underneath, but that's just an optimisation. Here we'd want:

touch parent/src/subdir/$folder$
mv parent/src parent/dest
isDir(parent/dest/subdir)
isNotFound(parent/src)
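
The checks requested above can be sketched against a toy in-memory key store. This is a model for illustration, not the test added in the PR and not S3A code: `MarkerStoreSketch` and all of its methods are hypothetical, rename is modelled as copy then delete, and the marker for `subdir` is assumed to be the key `subdir_$folder$`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy in-memory object store; a hypothetical model, not S3A code. */
final class MarkerStoreSketch {
    static final String MARKER = "_$folder$";
    private final TreeSet<String> keys = new TreeSet<>();

    /** Stores an object key (file or marker). */
    void put(String key) {
        keys.add(key);
    }

    /** A path is a directory if it has a legacy marker or any key beneath it. */
    boolean isDirectory(String path) {
        if (keys.contains(path + MARKER)) {
            return true;
        }
        String prefix = path + "/";
        for (String k : keys) {
            if (k.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /** True if the path is a file or a directory; false models a 404. */
    boolean exists(String path) {
        return keys.contains(path) || isDirectory(path);
    }

    /** Rename as copy-then-delete of every key under src, plus src's own marker. */
    void rename(String src, String dest) {
        List<String> moved = new ArrayList<>();
        for (String k : keys) {
            if (k.startsWith(src + "/") || k.equals(src + MARKER)) {
                moved.add(k);
            }
        }
        for (String k : moved) {
            keys.remove(k);
            keys.add(dest + k.substring(src.length()));
        }
    }

    /** Recursive delete of a path, its marker, and everything beneath it. */
    void delete(String path) {
        keys.removeIf(k -> k.equals(path)
            || k.equals(path + MARKER)
            || k.startsWith(path + "/"));
    }

    public static void main(String[] args) {
        MarkerStoreSketch store = new MarkerStoreSketch();
        store.put("parent/src/subdir_$folder$");      // touch the marker
        store.rename("parent/src", "parent/dest");    // mv parent/src parent/dest
        System.out.println(store.isDirectory("parent/dest/subdir"));
        System.out.println(store.exists("parent/src"));
        store.delete("parent");
        System.out.println(store.exists("parent"));
    }
}
```

In this model the marker travels with the rename, so the destination subdirectory is still recognised as a directory, and a recursive delete of the parent leaves nothing behind for a later status check to find.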

Contributor Author

@steveloughran - It is a good call out.

> list parent path reports an empty dir

This is already covered.

> delete parent dir results in a getFileStatus(parent) => 404, and same for marker.

This is not happening right now (even with hadoop-3.4.1).

Contributor Author

Added additional coverage

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 31s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 34m 46s trunk passed
+1 💚 compile 0m 43s trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 compile 0m 35s trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 checkstyle 0m 32s trunk passed
+1 💚 mvnsite 0m 42s trunk passed
+1 💚 javadoc 0m 43s trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 34s trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 spotbugs 1m 9s trunk passed
+1 💚 shadedclient 33m 42s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 30s the patch passed
+1 💚 compile 0m 34s the patch passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javac 0m 34s the patch passed
+1 💚 compile 0m 27s the patch passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 javac 0m 27s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 21s the patch passed
+1 💚 mvnsite 0m 32s the patch passed
+1 💚 javadoc 0m 29s the patch passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 26s the patch passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 spotbugs 1m 7s the patch passed
+1 💚 shadedclient 33m 53s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 3m 17s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 38s The patch does not generate ASF License warnings.
117m 22s
Subsystem Report/Notes
Docker ClientAPI=1.48 ServerAPI=1.48 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7410/1/artifact/out/Dockerfile
GITHUB PR #7410
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux d28f75e7ac98 5.15.0-131-generic #141-Ubuntu SMP Fri Jan 10 21:18:28 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / be20556
Default Java Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7410/1/testReport/
Max. process+thread count 547 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7410/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@shameersss1
Contributor Author

@steveloughran - Thanks a lot for the review.
I have added more tests covering list and rename (copy + delete).

@shameersss1 shameersss1 changed the title HADOOP-19464: Skip S3N Folder Marker During File Listing HADOOP-19464: Make S3A FileSystem Compatible With Legacy S3N & EMRFS FileSystem Feb 21, 2025
@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 54s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 42m 36s trunk passed
+1 💚 compile 0m 44s trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 compile 0m 33s trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 checkstyle 0m 29s trunk passed
+1 💚 mvnsite 0m 40s trunk passed
+1 💚 javadoc 0m 40s trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 31s trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 spotbugs 1m 9s trunk passed
+1 💚 shadedclient 38m 49s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 33s the patch passed
+1 💚 compile 0m 44s the patch passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javac 0m 44s the patch passed
+1 💚 compile 0m 29s the patch passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 javac 0m 29s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 22s the patch passed
+1 💚 mvnsite 0m 36s the patch passed
+1 💚 javadoc 0m 29s the patch passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 28s the patch passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 spotbugs 1m 16s the patch passed
+1 💚 shadedclient 42m 15s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 3m 40s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 41s The patch does not generate ASF License warnings.
139m 43s
Subsystem Report/Notes
Docker ClientAPI=1.48 ServerAPI=1.48 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7410/2/artifact/out/Dockerfile
GITHUB PR #7410
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 971f923d295a 5.15.0-131-generic #141-Ubuntu SMP Fri Jan 10 21:18:28 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / b26863b
Default Java Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7410/2/testReport/
Max. process+thread count 525 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7410/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@shameersss1
Contributor Author

@steveloughran Gentle reminder for the review
Thanks

@steveloughran
Contributor

+1 pending you saying what your test run parameters were for the full test suite, +what kind of bucket. Really someone needs to make an s3 express bucket their default bucket, shouldn't they...

@shameersss1
Contributor Author

Thanks a lot @steveloughran
I ran the integration tests with the command mvn -Dparallel-tests -DtestsThreadCount=8 clean verify against a standard bucket in us-east-1, and the tests are passing.

I will set up an S3 Express bucket for future testing. Please let me know if you need any testing against that bucket as well.

@steveloughran steveloughran merged commit bb07ff8 into apache:trunk Feb 25, 2025
4 checks passed
@steveloughran steveloughran changed the title HADOOP-19464: Make S3A FileSystem Compatible With Legacy S3N & EMRFS FileSystem HADOOP-19464. S3A: Restore Compatibility with EMRFS FileSystem Feb 25, 2025
adideshpande pushed a commit to adideshpande/hadoop that referenced this pull request Feb 27, 2025
…e#7410)

After HADOOP-19278, The S3N folder marker _$folder$ is not skipped during
listing of S3 directories. This can lead to the S3A filesystem failing to read
data written by the legacy Hadoop S3N filesystem and AWS EMR's EMRFS
("S3" filesystem)

Contributed by Syed Shameerur Rahman
YanivKunda pushed a commit to YanivKunda/hadoop that referenced this pull request Mar 23, 2025
…e#7410)

After HADOOP-19278, The S3N folder marker _$folder$ is not skipped during
listing of S3 directories. This can lead to the S3A filesystem failing to read
data written by the legacy Hadoop S3N filesystem and AWS EMR's EMRFS
("S3" filesystem)

Contributed by Syed Shameerur Rahman