HADOOP-19554. LocalDirAllocator still doesn't always recover from directory deletion by steveloughran · Pull Request #7651 · apache/hadoop · GitHub

HADOOP-19554. LocalDirAllocator still doesn't always recover from directory deletion #7651


Merged

Conversation

steveloughran
Contributor

HADOOP-19554. LocalDirAllocator still doesn't always recover from directory deletion

  • Pull up recreation logic for all branch paths
  • Lots of logging info
  • Exception test includes the list of paths.
  • tests for the failure recovery and reporting changes
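A minimal sketch of the behaviour the bullets above describe, assuming a simplified allocator: every candidate directory is recreated if it has gone missing, and the failure message lists all paths that were tried. The class and method names (`DirRecoverySketch`, `pickWritableDir`) are hypothetical and are not taken from `LocalDirAllocator`.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not the actual LocalDirAllocator code. */
final class DirRecoverySketch {

  /**
   * Return the first candidate directory that exists (or can be recreated)
   * and is writable. On total failure the exception lists every path tried.
   */
  static File pickWritableDir(List<File> candidates) throws IOException {
    List<String> tried = new ArrayList<>();
    for (File dir : candidates) {
      tried.add(dir.getAbsolutePath());
      // Recreate the directory if it was deleted. mkdirs() also returns
      // false when the directory already exists, so re-check afterwards.
      if (!dir.isDirectory() && !dir.mkdirs() && !dir.isDirectory()) {
        continue; // could not recreate this one; move on to the next
      }
      if (dir.canWrite()) {
        return dir;
      }
    }
    throw new IOException("No writable local directory among " + tried);
  }
}
```

Doing the recreation at the top of the loop, rather than in one branch only, is what makes recovery apply to every path through the method in this sketch.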

How was this patch tested?

New and modified unit test cases.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 20s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 23m 10s trunk passed
+1 💚 compile 9m 16s trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 compile 8m 29s trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 checkstyle 0m 40s trunk passed
+1 💚 mvnsite 1m 0s trunk passed
+1 💚 javadoc 0m 49s trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 29s trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 spotbugs 1m 35s trunk passed
-1 ❌ shadedclient 26m 29s branch has errors when building and testing our client artifacts.
_ Patch Compile Tests _
-1 ❌ mvninstall 0m 6s /patch-mvninstall-hadoop-common-project_hadoop-common.txt hadoop-common in the patch failed.
-1 ❌ compile 0m 20s /patch-compile-root-jdkUbuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04.txt root in the patch failed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04.
-1 ❌ javac 0m 20s /patch-compile-root-jdkUbuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04.txt root in the patch failed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04.
-1 ❌ compile 0m 18s /patch-compile-root-jdkPrivateBuild-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06.txt root in the patch failed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06.
-1 ❌ javac 0m 18s /patch-compile-root-jdkPrivateBuild-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06.txt root in the patch failed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06.
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 26s /results-checkstyle-hadoop-common-project_hadoop-common.txt hadoop-common-project/hadoop-common: The patch generated 1 new + 23 unchanged - 0 fixed = 24 total (was 23)
-1 ❌ mvnsite 0m 33s /patch-mvnsite-hadoop-common-project_hadoop-common.txt hadoop-common in the patch failed.
-1 ❌ javadoc 0m 14s /patch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04.txt hadoop-common in the patch failed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04.
+1 💚 javadoc 0m 17s the patch passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
-1 ❌ spotbugs 1m 18s /new-spotbugs-hadoop-common-project_hadoop-common.html hadoop-common-project/hadoop-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
-1 ❌ shadedclient 26m 32s patch has errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 0m 9s /patch-unit-hadoop-common-project_hadoop-common.txt hadoop-common in the patch failed.
+0 🆗 asflicense 0m 15s ASF License check generated no output?
102m 36s
Reason Tests
SpotBugs module:hadoop-common-project/hadoop-common
Exceptional return value of java.io.File.mkdirs() ignored in org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(String, long, Configuration, boolean) At LocalDirAllocator.java:[line 433]
Subsystem Report/Notes
Docker ClientAPI=1.49 ServerAPI=1.49 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7651/1/artifact/out/Dockerfile
GITHUB PR #7651
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux e68470bb8f81 5.15.0-136-generic #147-Ubuntu SMP Sat Mar 15 15:53:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / da6917e
Default Java Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7651/1/testReport/
Max. process+thread count 550 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7651/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.
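The SpotBugs finding in the report above concerns an unchecked `File.mkdirs()` result. One common way to address that kind of warning is sketched below; `MkdirsCheckSketch` and `ensureDirectory` are hypothetical names, and this is not necessarily how the patch resolved it.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical helper illustrating the SpotBugs finding; not the patch itself. */
final class MkdirsCheckSketch {

  /** Create the directory if needed, failing loudly instead of ignoring mkdirs(). */
  static void ensureDirectory(File dir) throws IOException {
    // mkdirs() returns false on failure and also when the directory already
    // exists, so a false result is only an error if the directory is missing.
    if (!dir.mkdirs() && !dir.isDirectory()) {
      throw new IOException("Failed to create directory " + dir.getAbsolutePath());
    }
  }
}
```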

@steveloughran steveloughran changed the title from "HADOOP-19554. LocalDirAllocator still doesn't always recover from directory deletio" to "HADOOP-19554. LocalDirAllocator still doesn't always recover from directory deletion" on Apr 28, 2025
Contributor
@ahmarsuhail ahmarsuhail left a comment


+1, LGTM.

@@ -393,6 +396,8 @@ int getCurrentDirectoryIndex() {
*/
public Path getLocalPathForWrite(String pathStr, long size,
Configuration conf, boolean checkWrite) throws IOException {
LOG.debug("searchng for directory for file at {}, size = {}; checkWrite={}",
Contributor


Minor typo: "searching".

@steveloughran
Contributor Author

I've decided not to target 3.4.2 with this

  • HDFS uses this, so needs care and attention
  • for s3a, we can fall back to using memory buffering and a shorter queue of pending blocks/stream
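For the s3a fallback mentioned above, a rough sketch of the settings involved, assuming the standard S3A options `fs.s3a.fast.upload.buffer` and `fs.s3a.fast.upload.active.blocks`. The values are illustrative only, and the holder class `S3aMemoryBufferSketch` is hypothetical; a plain map is used so the example stays self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical holder for illustrative S3A settings; values are assumptions. */
final class S3aMemoryBufferSketch {

  /** Settings that move S3A upload buffering from local disk to memory. */
  static Map<String, String> memoryBufferSettings() {
    Map<String, String> settings = new LinkedHashMap<>();
    // Buffer pending upload blocks in off-heap memory instead of local dirs.
    settings.put("fs.s3a.fast.upload.buffer", "bytebuffer");
    // Fewer queued blocks per stream bounds memory use; 2 is illustrative.
    settings.put("fs.s3a.fast.upload.active.blocks", "2");
    return settings;
  }
}
```

In a real deployment each entry would be applied with `Configuration.set(key, value)` or placed in `core-site.xml`.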

…ectory deletion

Logging now builds up the history string and debug logging, with
the history logged at ERROR when there's a failure to allocate.

goal: find out WTF went wrong even when debug logging was off
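The idea in this commit message can be sketched as follows, under the assumption of a simplified allocator: each step is appended to a history string and logged at debug level, and the whole history is logged at error level only when allocation fails. The sketch uses `java.util.logging` (`FINE` and `SEVERE` standing in for debug and ERROR) to stay self-contained; the names `AllocationHistorySketch` and `allocate` are hypothetical.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical illustration of history-based failure logging. */
final class AllocationHistorySketch {
  private static final Logger LOG = Logger.getLogger("AllocationHistorySketch");

  /** Try each directory in turn; return the first usable one, or null. */
  static String allocate(List<String> dirs, Predicate<String> usable,
      StringBuilder history) {
    for (String dir : dirs) {
      boolean ok = usable.test(dir);
      String step = "checked " + dir + ": " + (ok ? "usable" : "rejected");
      history.append(step).append("; ");
      LOG.log(Level.FINE, step); // debug-level detail, normally switched off
      if (ok) {
        return dir;
      }
    }
    // On failure the accumulated history is logged at the error level,
    // so it is visible even when debug logging was off.
    LOG.log(Level.SEVERE, "allocation failed: {0}", history);
    return null;
  }
}
```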
@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 8m 22s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 23m 44s trunk passed
+1 💚 compile 8m 25s trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 compile 7m 19s trunk passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 checkstyle 0m 42s trunk passed
+1 💚 mvnsite 0m 55s trunk passed
+1 💚 javadoc 0m 41s trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 32s trunk passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 spotbugs 1m 29s trunk passed
+1 💚 shadedclient 21m 40s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 31s the patch passed
+1 💚 compile 8m 2s the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javac 8m 2s the patch passed
+1 💚 compile 7m 28s the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 javac 7m 28s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 40s /results-checkstyle-hadoop-common-project_hadoop-common.txt hadoop-common-project/hadoop-common: The patch generated 1 new + 23 unchanged - 0 fixed = 24 total (was 23)
+1 💚 mvnsite 0m 53s the patch passed
+1 💚 javadoc 0m 44s the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 36s the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 spotbugs 1m 29s the patch passed
+1 💚 shadedclient 20m 57s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 13m 11s hadoop-common in the patch passed.
+1 💚 asflicense 0m 42s The patch does not generate ASF License warnings.
130m 13s
Subsystem Report/Notes
Docker ClientAPI=1.49 ServerAPI=1.49 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7651/2/artifact/out/Dockerfile
GITHUB PR #7651
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 0d1d951ea4d5 5.15.0-136-generic #147-Ubuntu SMP Sat Mar 15 15:53:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 3334658
Default Java Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7651/2/testReport/
Max. process+thread count 3149 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7651/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Contributor
@cnauroth cnauroth left a comment


+1. Thanks @steveloughran !

@steveloughran steveloughran merged commit 4769feb into apache:trunk May 12, 2025
4 checks passed
asf-gitbox-commits pushed a commit that referenced this pull request May 12, 2025
… from directory deletion (#7651)"

not the final commit

This reverts commit 4769feb.
@steveloughran
Contributor Author

Aah, I left out the last commit. Reverted that and will create a new PR with the final changes, which are:

  • logging at error before throwing
  • not throwing the full history (too much, too confusing)
  • but reporting the local hostname, so if there is a failure trickling up the logs, it's clearer where to look
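A small sketch of what those final changes could look like, assuming a simplified failure path: the detailed history is logged at error level, while the thrown exception stays short and names the local host. The message wording and the names `HostnameInErrorSketch`, `allocationFailure`, and `fail` are hypothetical, and `java.util.logging` stands in for the project's logger.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical illustration; message text is not taken from the patch. */
final class HostnameInErrorSketch {
  private static final Logger LOG = Logger.getLogger("HostnameInErrorSketch");

  /** Local hostname, or "unknown" if it cannot be resolved. */
  static String localHostname() {
    try {
      return InetAddress.getLocalHost().getHostName();
    } catch (UnknownHostException e) {
      return "unknown";
    }
  }

  /** Short exception naming the host, without the full allocation history. */
  static IOException allocationFailure(String path) {
    return new IOException("No usable local directory for " + path
        + " on host " + localHostname());
  }

  /** Log the detailed history at error level, then throw the short exception. */
  static void fail(String path, String history) throws IOException {
    IOException e = allocationFailure(path);
    LOG.log(Level.SEVERE, e.getMessage() + "; history: " + history);
    throw e;
  }
}
```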
