HADOOP-19348. Integrate analytics accelerator into S3A. by ahmarsuhail · Pull Request #7334 · apache/hadoop · GitHub

HADOOP-19348. Integrate analytics accelerator into S3A. #7334


Closed

Conversation

ahmarsuhail
Contributor
@ahmarsuhail ahmarsuhail commented Jan 28, 2025

Description of PR

Initial integration of analytics accelerator.

How was this patch tested?

Tested in us-east-2, with:

    <property>
        <name>fs.s3a.scale.test.enabled</name>
        <value>true</value>
    </property>

    <property>
        <name>fs.s3a.input.stream.type</name>
        <value>analytics</value>
    </property>

One failure in the unit test TestS3AUnbuffer, which needs to be skipped. No other failures.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@ahmarsuhail ahmarsuhail marked this pull request as draft January 28, 2025 13:29
@ahmarsuhail ahmarsuhail changed the title /HADOOP-19348. Integrate analytics accelerator into S3A. HADOOP-19348. Integrate analytics accelerator into S3A. Jan 28, 2025
@ahmarsuhail ahmarsuhail force-pushed the feature-HADOOP-19363-analytics-accelerator-s3 branch 2 times, most recently from e18d0a4 to d45beae Compare January 31, 2025 14:57
@ahmarsuhail
Contributor Author

Few things to discuss here:

  • Now that we're using S3A's async client, which already has the execution interceptors attached, a lot of tests fail as out-of-span operations get rejected. Since we don't support auditing right now, can we recommend that anyone running with AAL turned on also turns off fs.s3a.audit.reject.out.of.span.operations?

  • The async client from the current SDK version doesn't do ranged GETs if multipartEnabled is set on it. For ranged GETs, either upgrade the SDK or temporarily disable multipartEnabled when AAL is enabled, similar to
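A hedged sketch of the workaround configuration discussed in the first bullet, combining the two property names that appear in this thread (whether they must be paired like this is an assumption of this sketch, not something the PR confirms):

```xml
<!-- assumption: only needed while AAL lacks audit support -->
<property>
  <name>fs.s3a.input.stream.type</name>
  <value>analytics</value>
</property>
<property>
  <name>fs.s3a.audit.reject.out.of.span.operations</name>
  <value>false</value>
</property>
```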

private static final String LOGICAL_IO_PREFIX = "logicalio";

@Test
public void testConnectorFrameWorkIntegration() throws IOException {
Contributor Author

small parquet file, src/test/parquet

can we read the file (~10s of KB)?
does it complete, or hang without completing?

malformed footer

Contributor
@mukund-thakur mukund-thakur left a comment

some old comments about javadoc

public void testOverwriteExistingFile() throws Throwable {
// Will remove this when Analytics Accelerator supports overwrites
skipIfAnalyticsAcceleratorEnabled(this.createConfiguration(),
"Analytics Accelerator does not support overwrites yet");
Contributor

Analytics Accelerator is about read optimizations right? How does this relate to overwrite?
Is it because the file will be changed? You mean it doesn't support the RemoteFileChangedException?

@@ -65,6 +66,8 @@ protected Configuration createConfiguration() {
*/
@Test
public void testNotFoundFirstRead() throws Exception {
skipIfAnalyticsAcceleratorEnabled(getConfiguration(),
"Temporarily disabling to fix Exception handling on Analytics Accelerator");
Contributor

needs to be enabled.

@ahmarsuhail ahmarsuhail force-pushed the feature-HADOOP-19363-analytics-accelerator-s3 branch 2 times, most recently from e18d0a4 to 0d1f291 Compare February 7, 2025 14:57
@ahmarsuhail ahmarsuhail force-pushed the feature-HADOOP-19363-analytics-accelerator-s3 branch from a3c7498 to f408ec5 Compare February 11, 2025 16:23
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 0s Docker mode activated.
-1 ❌ patch 0m 20s #7334 does not apply to trunk. Rebase required? Wrong Branch? See https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help.
Subsystem Report/Notes
GITHUB PR #7334
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7334/13/console
versions git=2.34.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

import org.apache.hadoop.fs.s3a.VectoredIOContext;

/**
* Requirements for requirements for streams from this factory,
Contributor

java doc correction.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 0s Docker mode activated.
-1 ❌ patch 0m 20s #7334 does not apply to trunk. Rebase required? Wrong Branch? See https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help.
Subsystem Report/Notes
GITHUB PR #7334
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7334/14/console
versions git=2.34.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@ahmarsuhail
Contributor Author

Commit 99fbdeb means this will no longer build as is: the AAL change that adds a new constructor letting you pass in file information (awslabs/analytics-accelerator-s3#223) must be merged and released first (WIP!)

To test this currently, set the branch to commit: 038a692

import java.io.EOFException;
import java.io.IOException;

import org.apache.hadoop.fs.FSExceptionMessages;
Contributor

nit, imports are out of order


package org.apache.hadoop.fs.s3a.impl.streams;

import org.apache.hadoop.conf.Configuration;
Contributor

usual nit: import ordering, and I'd prefer an explicit import of those Constants which are being used

@Override
public void bind(final FactoryBindingParameters factoryBindingParameters) throws IOException {
super.bind(factoryBindingParameters);
this.s3SeekableInputStreamFactory = new LazyAutoCloseableReference<>(createS3SeekableInputStreamFactory());
Contributor

can you chop this line down? it's too wide for side-by-side reviews

@@ -115,7 +115,7 @@ public class RequestFactoryImpl implements RequestFactory {
/**
* Callback to prepare requests.
*/
private final PrepareRequest requestPreparer;
private PrepareRequest requestPreparer;
Contributor

this doesn't need to be non-final any more, I shall fix in my PR

Contributor
@steveloughran steveloughran left a comment

+1 pending:

  • those little nits
  • my PR in
  • new release of your library (which I've just been looking at...may need a bit of resilience there, especially to premature -1 calls)

@ahmarsuhail
Contributor Author

nice, thanks for the review!

What do you mean by premature -1 calls?

@steveloughran
Contributor

sometimes a read can return -1 due to network errors, not EOF. In that situation (look at read()) we abort the stream so it doesn't go back into the pool, then ask for a new one. Apparently before the abort() you could get back the same stream again, even though it was now failing. Inevitably, this is a consequence of the stream's long retention of the same connection; if connections were returned after 60s this'd be less likely
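The distinction described here (a -1 caused by a network error versus a true EOF) can be sketched with a position check against the known object length. This is an illustrative stand-in, not the S3A implementation; guardedRead and the close()-as-abort are hypothetical names:

```java
import java.io.IOException;
import java.io.InputStream;

public class PrematureEofGuard {

  /**
   * Read into buf, treating a -1 that arrives before contentLength as a
   * transient failure rather than a genuine EOF.
   *
   * @return bytes read, or -1 only at the true end of the object
   * @throws IOException on a premature -1
   */
  public static int guardedRead(InputStream in, long pos, long contentLength,
      byte[] buf, int off, int len) throws IOException {
    int n = in.read(buf, off, len);
    if (n < 0 && pos < contentLength) {
      // Do not trust this stream again: close it (a stand-in for the
      // abort() described above) so it cannot go back into a pool.
      in.close();
      throw new IOException(
          "Premature end of stream at " + pos + " of " + contentLength);
    }
    return n;
  }
}
```

In the real connector the abort and reopen would be driven by the retry policy; this only shows the position check that distinguishes the two cases.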

@ahmarsuhail ahmarsuhail force-pushed the feature-HADOOP-19363-analytics-accelerator-s3 branch from 99fbdeb to b92a661 Compare February 24, 2025 10:54
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 18m 7s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 23 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 5m 40s Maven dependency ordering for branch
+1 💚 mvninstall 37m 0s trunk passed
+1 💚 compile 18m 7s trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 compile 17m 2s trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 checkstyle 4m 58s trunk passed
+1 💚 mvnsite 3m 14s trunk passed
+1 💚 javadoc 2m 34s trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 2m 59s trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+0 🆗 spotbugs 0m 51s branch/hadoop-project no spotbugs output file (spotbugsXml.xml)
+1 💚 shadedclient 40m 29s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 34s Maven dependency ordering for patch
-1 ❌ mvninstall 0m 21s /patch-mvninstall-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
-1 ❌ compile 16m 13s /patch-compile-root-jdkUbuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04.txt root in the patch failed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04.
-1 ❌ javac 16m 13s /patch-compile-root-jdkUbuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04.txt root in the patch failed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04.
-1 ❌ compile 16m 24s /patch-compile-root-jdkPrivateBuild-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06.txt root in the patch failed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06.
-1 ❌ javac 16m 24s /patch-compile-root-jdkPrivateBuild-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06.txt root in the patch failed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06.
-1 ❌ blanks 0m 0s /blanks-eol.txt The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
-0 ⚠️ checkstyle 4m 40s /results-checkstyle-root.txt root: The patch generated 70 new + 10 unchanged - 0 fixed = 80 total (was 10)
-1 ❌ mvnsite 0m 44s /patch-mvnsite-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
-1 ❌ javadoc 0m 37s /patch-javadoc-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04.txt hadoop-aws in the patch failed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04.
-1 ❌ javadoc 1m 21s /results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06.txt hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0)
+0 🆗 spotbugs 0m 32s hadoop-project has no data from spotbugs
-1 ❌ spotbugs 0m 38s /patch-spotbugs-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
+1 💚 shadedclient 41m 46s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 0m 32s hadoop-project in the patch passed.
+1 💚 unit 15m 21s hadoop-common in the patch passed.
-1 ❌ unit 0m 39s /patch-unit-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
+1 💚 asflicense 0m 59s The patch does not generate ASF License warnings.
268m 23s
Subsystem Report/Notes
Docker ClientAPI=1.48 ServerAPI=1.48 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7334/15/artifact/out/Dockerfile
GITHUB PR #7334
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint
uname Linux 02465e99e881 5.15.0-130-generic #140-Ubuntu SMP Wed Dec 18 17:59:53 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / b92a661
Default Java Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7334/15/testReport/
Max. process+thread count 1517 (vs. ulimit of 5500)
modules C: hadoop-project hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7334/15/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@ahmarsuhail ahmarsuhail marked this pull request as ready for review February 24, 2025 15:34
@ahmarsuhail
Contributor Author

@steveloughran @mukund-thakur this is now ready for final reviews.

Have addressed previous review comments, and just did a 0.0.4 release for AAL; it's not available in Maven just yet, but should be there soon.

Have a failing unit test, fixing that now.

@steveloughran
Contributor

will test tomorrow, and do a local hadoop build with this set up as my default stream to see how that works.

Now, one thing I don't think I got right was the ability to switch stream type for test runs based on a -D maven option. Does it work for you? If not, what does work?

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 48s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 xmllint 0m 1s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 23 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 5m 44s Maven dependency ordering for branch
+1 💚 mvninstall 39m 48s trunk passed
+1 💚 compile 18m 53s trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 compile 16m 20s trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 checkstyle 4m 45s trunk passed
+1 💚 mvnsite 3m 2s trunk passed
+1 💚 javadoc 2m 32s trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 2m 5s trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+0 🆗 spotbugs 0m 37s branch/hadoop-project no spotbugs output file (spotbugsXml.xml)
+1 💚 shadedclient 42m 29s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 33s Maven dependency ordering for patch
-1 ❌ mvninstall 0m 22s /patch-mvninstall-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
-1 ❌ compile 16m 46s /patch-compile-root-jdkUbuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04.txt root in the patch failed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04.
-1 ❌ javac 16m 46s /patch-compile-root-jdkUbuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04.txt root in the patch failed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04.
-1 ❌ compile 15m 24s /patch-compile-root-jdkPrivateBuild-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06.txt root in the patch failed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06.
-1 ❌ javac 15m 24s /patch-compile-root-jdkPrivateBuild-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06.txt root in the patch failed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06.
-1 ❌ blanks 0m 0s /blanks-eol.txt The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
-0 ⚠️ checkstyle 4m 36s /results-checkstyle-root.txt root: The patch generated 67 new + 10 unchanged - 0 fixed = 77 total (was 10)
-1 ❌ mvnsite 0m 40s /patch-mvnsite-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
-1 ❌ javadoc 0m 38s /patch-javadoc-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04.txt hadoop-aws in the patch failed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04.
-1 ❌ javadoc 0m 43s /results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06.txt hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0)
+0 🆗 spotbugs 0m 30s hadoop-project has no data from spotbugs
-1 ❌ spotbugs 0m 38s /patch-spotbugs-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
+1 💚 shadedclient 42m 11s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 0m 34s hadoop-project in the patch passed.
+1 💚 unit 15m 20s hadoop-common in the patch passed.
-1 ❌ unit 0m 42s /patch-unit-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
+1 💚 asflicense 1m 2s The patch does not generate ASF License warnings.
253m 8s
Subsystem Report/Notes
Docker ClientAPI=1.48 ServerAPI=1.48 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7334/16/artifact/out/Dockerfile
GITHUB PR #7334
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint
uname Linux 9ac62bbd2357 5.15.0-130-generic #140-Ubuntu SMP Wed Dec 18 17:59:53 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 5a2db0a
Default Java Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7334/16/testReport/
Max. process+thread count 1294 (vs. ulimit of 5500)
modules C: hadoop-project hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7334/16/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 24s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 23 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 6m 6s Maven dependency ordering for branch
+1 💚 mvninstall 18m 27s trunk passed
+1 💚 compile 8m 20s trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 compile 7m 16s trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 checkstyle 1m 58s trunk passed
+1 💚 mvnsite 2m 2s trunk passed
+1 💚 javadoc 1m 45s trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 1m 34s trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+0 🆗 spotbugs 0m 31s branch/hadoop-project no spotbugs output file (spotbugsXml.xml)
+1 💚 shadedclient 20m 14s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 24s Maven dependency ordering for patch
+1 💚 mvninstall 1m 1s the patch passed
+1 💚 compile 7m 59s the patch passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javac 7m 59s the patch passed
+1 💚 compile 7m 19s the patch passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 javac 7m 19s the patch passed
-1 ❌ blanks 0m 0s /blanks-eol.txt The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
-0 ⚠️ checkstyle 1m 54s /results-checkstyle-root.txt root: The patch generated 66 new + 10 unchanged - 0 fixed = 76 total (was 10)
+1 💚 mvnsite 1m 57s the patch passed
+1 💚 javadoc 1m 45s the patch passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 1m 29s the patch passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+0 🆗 spotbugs 0m 26s hadoop-project has no data from spotbugs
+1 💚 shadedclient 20m 21s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 0m 25s hadoop-project in the patch passed.
+1 💚 unit 13m 1s hadoop-common in the patch passed.
+1 💚 unit 2m 36s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 42s The patch does not generate ASF License warnings.
137m 12s
Subsystem Report/Notes
Docker ClientAPI=1.48 ServerAPI=1.48 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7334/18/artifact/out/Dockerfile
GITHUB PR #7334
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint
uname Linux 1671b9fceb53 5.15.0-130-generic #140-Ubuntu SMP Wed Dec 18 17:59:53 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / b2b045f
Default Java Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7334/18/testReport/
Max. process+thread count 1329 (vs. ulimit of 5500)
modules C: hadoop-project hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7334/18/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 12m 23s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 xmllint 0m 1s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 23 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 6m 28s Maven dependency ordering for branch
+1 💚 mvninstall 31m 48s trunk passed
+1 💚 compile 15m 29s trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 compile 13m 36s trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 checkstyle 4m 12s trunk passed
+1 💚 mvnsite 3m 12s trunk passed
+1 💚 javadoc 2m 46s trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 2m 22s trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+0 🆗 spotbugs 0m 45s branch/hadoop-project no spotbugs output file (spotbugsXml.xml)
+1 💚 shadedclient 35m 1s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 35s Maven dependency ordering for patch
-1 ❌ mvninstall 0m 21s /patch-mvninstall-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
-1 ❌ compile 14m 1s /patch-compile-root-jdkUbuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04.txt root in the patch failed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04.
-1 ❌ javac 14m 1s /patch-compile-root-jdkUbuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04.txt root in the patch failed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04.
-1 ❌ compile 12m 58s /patch-compile-root-jdkPrivateBuild-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06.txt root in the patch failed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06.
-1 ❌ javac 12m 58s /patch-compile-root-jdkPrivateBuild-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06.txt root in the patch failed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06.
-1 ❌ blanks 0m 0s /blanks-eol.txt The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
-0 ⚠️ checkstyle 4m 9s /results-checkstyle-root.txt root: The patch generated 66 new + 10 unchanged - 0 fixed = 76 total (was 10)
-1 ❌ mvnsite 0m 42s /patch-mvnsite-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
-1 ❌ javadoc 0m 45s /patch-javadoc-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04.txt hadoop-aws in the patch failed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04.
-1 ❌ javadoc 0m 48s /results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06.txt hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0)
+0 🆗 spotbugs 0m 37s hadoop-project has no data from spotbugs
-1 ❌ spotbugs 0m 42s /patch-spotbugs-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
+1 💚 shadedclient 36m 22s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 0m 36s hadoop-project in the patch passed.
+1 💚 unit 14m 48s hadoop-common in the patch passed.
-1 ❌ unit 0m 42s /patch-unit-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
+1 💚 asflicense 1m 1s The patch does not generate ASF License warnings.
233m 22s
Subsystem Report/Notes
Docker ClientAPI=1.48 ServerAPI=1.48 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7334/17/artifact/out/Dockerfile
GITHUB PR #7334
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint
uname Linux afbc519884bb 5.15.0-131-generic #141-Ubuntu SMP Fri Jan 10 21:18:28 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 2fa69eb
Default Java Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7334/17/testReport/
Max. process+thread count 2149 (vs. ulimit of 5500)
modules C: hadoop-project hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7334/17/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Contributor
@steveloughran steveloughran left a comment

Minimal changes to production code; once they are addressed and my test failure fixed, this is good to merge.

The comments on the tests do also need addressing, but they can be done as a followup unless you really want to do them today.

/**
* Is this S3A FS instance using analytics accelerator?
*/
private boolean isAnalyticsAccelaratorEnabled;
Contributor

FWIW I've got a draft plan in my head to move all these bools and other config options into some config class under store...probably start with the Conditional Create stuff where the new flags go in. Microsoft's reflection-based resolution is what I'm looking at.

@@ -202,6 +202,11 @@ final class S3ClientCreationParameters {
*/
private boolean fipsEnabled;

/**
* Is analytics accelerator enabled
Contributor

nit, add ?

super(InputStreamType.Analytics, parameters);
S3ObjectAttributes s3Attributes = parameters.getObjectAttributes();
this.inputStream = s3SeekableInputStreamFactory.createStream(S3URI.of(s3Attributes.getBucket(), s3Attributes.getKey()), buildOpenStreamInformation(parameters));
getS3AStreamStatistics().streamOpened(InputStreamType.Analytics);
Contributor

IDE is unhappy; make superclass getS3AStreamStatistics() final

Contributor Author

S3APrefetchingInputStream overrides getS3AStreamStatistics, so we can't make it final.

Contributor

aah. no worries then

* @return true if the given {@code capability} is supported by this stream, false otherwise.
*/
@Override
public boolean hasCapability(String capability) {
Contributor

cut: the superclass does implement two capabilities now, leak detection and IOStatistics

try {
bytesRead = inputStream.readTail(buf, off, len);
} catch (IOException ioe) {
onReadFailure(ioe);
Contributor

I think recovery will need some future work, either here or underneath. Complex to do and test


ConnectorConfiguration connectorConfiguration =
new ConnectorConfiguration(conf, ANALYTICS_ACCELERATOR_CONFIGURATION_PREFIX);
Assertions.assertThatExceptionOfType(IllegalArgumentException.class)
Contributor

LambdaTestUtils.intercept: if an exception isn't raised you'll get the configuration.toString() of what was built
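A hedged, self-contained stand-in for the suggested pattern (mimicking the shape of org.apache.hadoop.test.LambdaTestUtils.intercept rather than depending on it), showing why its failure message is more useful: it carries the toString() of whatever was unexpectedly built:

```java
import java.util.concurrent.Callable;

public class InterceptDemo {

  /**
   * Evaluate the callable and expect it to throw clazz. If it returns
   * instead, fail with the toString() of the result, which is far more
   * informative than a bare "expected exception" message.
   */
  public static <E extends Throwable> E intercept(
      Class<E> clazz, Callable<?> eval) throws Exception {
    Object result;
    try {
      result = eval.call();
    } catch (Throwable t) {
      if (clazz.isInstance(t)) {
        return clazz.cast(t);
      }
      throw new AssertionError("Expected " + clazz.getName()
          + " but caught " + t, t);
    }
    throw new AssertionError("Expected " + clazz.getName()
        + " but the call returned: " + result);
  }
}
```

The real helper has more overloads (expected message text, timeouts); this only illustrates the diagnostic advantage the reviewer is pointing at.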

import static org.apache.hadoop.fs.s3a.S3ATestUtils.getS3AInputStream;
import static org.apache.hadoop.fs.s3a.S3ATestUtils.removeBaseAndBucketOverrides;
import static org.apache.hadoop.fs.s3a.S3ATestUtils.streamType;
import static org.apache.hadoop.fs.s3a.S3ATestUtils.*;
Contributor:

revert

public void testInputStreamStatisticRead() throws Throwable {
// Analytics accelerator currently does not support IOStatistics, this will be added as
// part of https://issues.apache.org/jira/browse/HADOOP-19364
skipIfAnalyticsAcceleratorEnabled(createConfiguration(),
Contributor:

getConfiguration().

@@ -42,6 +44,10 @@ public class ITestS3AFileSystemStatistic extends AbstractS3ATestBase {
*/
@Test
public void testBytesReadWithStream() throws IOException {
// Analytics accelerator currently does not support IOStatistics, this will be added as
// part of https://issues.apache.org/jira/browse/HADOOP-19364
skipIfAnalyticsAcceleratorEnabled(createConfiguration(),
Contributor:

getConfiguration()

@steveloughran

I got failures on the vector read stuff. As I noted in the tests, a separate contract tests for the analytics stream is needed.

Prefetch was set everywhere

  <property>
    <name>fs.s3a.input.stream.type</name>
    <value>prefetch</value>
  </property>

note you can subclass assumeEnabled() for contract tests, invoke the superclass and then add more assumptions.
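A standalone illustration of that pattern: the subclass runs the superclass checks first, then adds its own assumption. Class and method names here are illustrative stand-ins, not the real Hadoop contract-test hierarchy, and in JUnit the skip would use `Assume` rather than an exception.

```java
// Illustrative sketch only: a contract test subclass invoking the
// superclass's enablement check, then adding a stream-type assumption.
class BaseContractTest {
  protected void assumeEnabled() {
    // base-level check, e.g. "are the S3A contract tests enabled at all?"
  }
}

class AnalyticsStreamContractTest extends BaseContractTest {
  private final String streamType;

  AnalyticsStreamContractTest(String streamType) {
    this.streamType = streamType;
  }

  @Override
  protected void assumeEnabled() {
    super.assumeEnabled();
    // extra assumption: only run against the analytics stream.
    // In a real JUnit test this would be Assume.assumeTrue(...) to skip.
    if (!"analytics".equals(streamType)) {
      throw new IllegalStateException(
          "skipped: analytics stream not configured");
    }
  }
}
```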

I have suggested using getConfiguration() in some tests, but I see that actually we don't export that from the contract tests. I think we should, but there'll be problems with subclasses. Better to have the superclass setup run then ask for its conf from the contract

public void setup() throws Exception {
  super.setup();
  conf = getContract().getConf();
}

or the filesystem's getFileSystem().getConf(), which gives you the fully resolved config of the FS

[ERROR] ITestS3AContractVectoredRead.testVectoredReadWholeFilePlusOne
[ERROR]   Run 1: Expected a java.io.EOFException to be thrown, but got the result: : "triggered read of 1 ranges against org.apache.hadoop.fs.FSDataInputStream@114f7a7e: ObjectInputStream{streamType=Prefetch, uri='s3a://stevel-london/job-00-fork-0009/test/vectored_file.txt', contentLength=65536, inputPolicy=random, vectoredIOContext=VectoredIOContext{minSeekForVectorReads=0, maxReadSizeForVectorReads=2097152, vectoredActiveRangeReads=4}} org.apache.hadoop.fs.s3a.prefetch.S3APrefetchingInputStream@3a0c0511{counters=((stream_read_version_mismatches=0) (stream_read_bytes_discarded_in_close=0) (stream_read_operations=1) (stream_read_fully_operations=0) (stream_read_block_read.failures=0) (stream_read_seek_forward_operations=0) (stream_read_vectored_combined_ranges=0) (stream_read_seek_policy_changed=2) (stream_read_total_bytes=65536) (action_http_get_request.failures=0) (stream_read_exceptions=0) (stream_read_vectored_read_bytes_discarded=0) (stream_read_remote_stream_aborted.failures=0) (stream_read_bytes_discarded_in_abort=0) (stream_read_seek_bytes_discarded=0) (stream_file_cache_eviction=0) (stream_read_bytes=65536) (stream_evict_blocks_from_cache=0) (stream_read_opened=1) (action_file_opened=1) (stream_leaks=0) (stream_read_operations_incomplete=0) (stream_read_prefetch_operations=0) (action_executor_acquired.failures=0) (stream_read_remote_stream_drain.failures=0) (stream_read_vectored_incoming_ranges=0) (action_http_get_request=1) (stream_read_seek_operations=0) (stream_read_analytics_opened=0) (stream_read_block_acquire_read=0) (stream_read_prefetch_operations.failures=0) (stream_read_closed=1) (action_file_opened.failures=0) (stream_read_close_operations=0) (stream_file_cache_eviction.failures=0) (stream_read_bytes_backwards_on_seek=0) (action_executor_acquired=0) (stream_read_seek_backward_operations=0) (stream_read_remote_stream_drain=1) (stream_read_unbuffered=0) (stream_aborted=0) (stream_read_seek_bytes_skipped=0) (stream_read_remote_stream_aborted=0) 
(stream_read_block_read=1) (stream_read_block_acquire_read.failures=0) (stream_read_vectored_operations=0));
gauges=((stream_read_active_prefetch_operations=0) (stream_read_active_memory_in_use=0) (stream_read_blocks_in_cache=0) (stream_read_gauge_input_policy=1));
minimums=((action_file_opened.min=0) (stream_read_remote_stream_aborted.failures.min=-1) (stream_read_remote_stream_drain.failures.min=-1) (stream_read_remote_stream_aborted.min=-1) (stream_read_block_acquire_read.failures.min=-1) (action_http_get_request.min=45) (stream_read_block_acquire_read.min=-1) (stream_read_block_read.failures.min=-1) (stream_read_prefetch_operations.min=-1) (stream_read_prefetch_operations.failures.min=-1) (action_executor_acquired.min=-1) (stream_file_cache_eviction.min=-1) (action_http_get_request.failures.min=-1) (stream_read_remote_stream_drain.min=0) (stream_read_block_read.min=170) (stream_file_cache_eviction.failures.min=-1) (action_executor_acquired.failures.min=-1) (action_file_opened.failures.min=-1));
maximums=((stream_read_prefetch_operations.max=-1) (action_executor_acquired.max=-1) (stream_read_block_read.failures.max=-1) (action_http_get_request.max=45) (stream_file_cache_eviction.failures.max=-1) (stream_read_prefetch_operations.failures.max=-1) (stream_read_block_acquire_read.max=-1) (action_http_get_request.failures.max=-1) (action_file_opened.failures.max=-1) (stream_read_block_acquire_read.failures.max=-1) (action_executor_acquired.failures.max=-1) (stream_read_remote_stream_drain.failures.max=-1) (action_file_opened.max=0) (stream_read_remote_stream_drain.max=0) (stream_read_remote_stream_aborted.max=-1) (stream_file_cache_eviction.max=-1) (stream_read_block_read.max=170) (stream_read_remote_stream_aborted.failures.max=-1));
means=((stream_read_remote_stream_drain.mean=(samples=1, sum=0, mean=0.0000)) (stream_read_prefetch_operations.mean=(samples=0, sum=0, mean=0.0000)) (stream_read_block_acquire_read.failures.mean=(samples=0, sum=0, mean=0.0000)) (stream_read_block_read.mean=(samples=1, sum=170, mean=170.0000)) (action_executor_acquired.failures.mean=(samples=0, sum=0, mean=0.0000)) (action_file_opened.mean=(samples=1, sum=0, mean=0.0000)) (stream_read_remote_stream_aborted.failures.mean=(samples=0, sum=0, mean=0.0000)) (action_file_opened.failures.mean=(samples=0, sum=0, mean=0.0000)) (action_http_get_request.failures.mean=(samples=0, sum=0, mean=0.0000)) (stream_read_prefetch_operations.failures.mean=(samples=0, sum=0, mean=0.0000)) (stream_read_block_acquire_read.mean=(samples=0, sum=0, mean=0.0000)) (stream_file_cache_eviction.mean=(samples=0, sum=0, mean=0.0000)) (stream_file_cache_eviction.failures.mean=(samples=0, sum=0, mean=0.0000)) (action_http_get_request.mean=(samples=1, sum=45, mean=45.0000)) (action_executor_acquired.mean=(samples=0, sum=0, mean=0.0000)) (stream_read_remote_stream_drain.failures.mean=(samples=0, sum=0, mean=0.0000)) (stream_read_block_read.failures.mean=(samples=0, sum=0, mean=0.0000)) (stream_read_remote_stream_aborted.mean=(samples=0, sum=0, mean=0.0000)));
}"
[ERROR]   Run 2: Expected a java.io.EOFException to be thrown, but got the result: : "triggered read of 1 ranges against org.apache.hadoop.fs.FSDataInputStream@5b4f1c30: ObjectInputStream{streamType=Prefetch, uri='s3a://stevel-london/job-00-fork-0009/test/vectored_file.txt', contentLength=65536, inputPolicy=random, vectoredIOContext=VectoredIOContext{minSeekForVectorReads=0, maxReadSizeForVectorReads=2097152, vectoredActiveRangeReads=4}} org.apache.hadoop.fs.s3a.prefetch.S3APrefetchingInputStream@57150126{counters=((stream_read_vectored_read_bytes_discarded=0) (stream_read_vectored_combined_ranges=0) (stream_read_seek_operations=0) (stream_read_seek_bytes_discarded=0) (stream_read_opened=1) (stream_read_remote_stream_drain=1) (action_executor_acquired.failures=0) (stream_evict_blocks_from_cache=0) (stream_read_vectored_incoming_ranges=0) (stream_read_fully_operations=0) (stream_read_block_read.failures=0) (stream_read_bytes_discarded_in_abort=0) (stream_read_bytes_discarded_in_close=0) (action_executor_acquired=0) (stream_read_operations=1) (action_http_get_request=1) (stream_read_bytes=65536) (stream_read_operations_incomplete=0) (stream_aborted=0) (action_file_opened.failures=0) (stream_file_cache_eviction.failures=0) (stream_read_prefetch_operations.failures=0) (stream_read_bytes_backwards_on_seek=0) (stream_read_exceptions=0) (stream_read_remote_stream_drain.failures=0) (action_file_opened=1) (stream_read_block_acquire_read.failures=0) (stream_read_close_operations=0) (stream_read_unbuffered=0) (stream_read_seek_bytes_skipped=0) (stream_read_vectored_operations=0) (stream_read_version_mismatches=0) (stream_read_block_read=1) (stream_read_total_bytes=65536) (stream_read_seek_policy_changed=2) (stream_read_seek_backward_operations=0) (stream_read_analytics_opened=0) (stream_leaks=0) (stream_read_prefetch_operations=0) (stream_read_seek_forward_operations=0) (stream_read_closed=1) (stream_file_cache_eviction=0) (action_http_get_request.failures=0) 
(stream_read_remote_stream_aborted=0) (stream_read_remote_stream_aborted.failures=0) (stream_read_block_acquire_read=0));
gauges=((stream_read_active_prefetch_operations=0) (stream_read_active_memory_in_use=0) (stream_read_blocks_in_cache=0) (stream_read_gauge_input_policy=1));
minimums=((action_executor_acquired.failures.min=-1) (action_file_opened.failures.min=-1) (stream_read_block_acquire_read.failures.min=-1) (action_http_get_request.min=55) (action_file_opened.min=0) (stream_read_prefetch_operations.failures.min=-1) (stream_read_block_acquire_read.min=-1) (action_http_get_request.failures.min=-1) (stream_read_prefetch_operations.min=-1) (stream_read_block_read.min=192) (stream_read_remote_stream_aborted.failures.min=-1) (stream_read_remote_stream_aborted.min=-1) (stream_read_block_read.failures.min=-1) (stream_read_remote_stream_drain.failures.min=-1) (stream_file_cache_eviction.failures.min=-1) (stream_read_remote_stream_drain.min=0) (stream_file_cache_eviction.min=-1) (action_executor_acquired.min=-1));
maximums=((action_file_opened.max=0) (stream_read_remote_stream_aborted.max=-1) (stream_read_block_acquire_read.failures.max=-1) (action_http_get_request.failures.max=-1) (action_executor_acquired.failures.max=-1) (stream_read_remote_stream_drain.failures.max=-1) (stream_file_cache_eviction.failures.max=-1) (stream_file_cache_eviction.max=-1) (stream_read_prefetch_operations.failures.max=-1) (action_http_get_request.max=55) (stream_read_block_read.max=192) (stream_read_remote_stream_drain.max=0) (action_file_opened.failures.max=-1) (action_executor_acquired.max=-1) (stream_read_prefetch_operations.max=-1) (stream_read_block_read.failures.max=-1) (stream_read_block_acquire_read.max=-1) (stream_read_remote_stream_aborted.failures.max=-1));
means=((action_executor_acquired.failures.mean=(samples=0, sum=0, mean=0.0000)) (stream_read_remote_stream_aborted.failures.mean=(samples=0, sum=0, mean=0.0000)) (stream_read_prefetch_operations.failures.mean=(samples=0, sum=0, mean=0.0000)) (stream_read_remote_stream_aborted.mean=(samples=0, sum=0, mean=0.0000)) (stream_read_prefetch_operations.mean=(samples=0, sum=0, mean=0.0000)) (stream_file_cache_eviction.failures.mean=(samples=0, sum=0, mean=0.0000)) (action_http_get_request.failures.mean=(samples=0, sum=0, mean=0.0000)) (action_http_get_request.mean=(samples=1, sum=55, mean=55.0000)) (stream_read_block_read.mean=(samples=1, sum=192, mean=192.0000)) (stream_file_cache_eviction.mean=(samples=0, sum=0, mean=0.0000)) (action_file_opened.mean=(samples=1, sum=0, mean=0.0000)) (stream_read_remote_stream_drain.failures.mean=(samples=0, sum=0, mean=0.0000)) (stream_read_block_acquire_read.failures.mean=(samples=0, sum=0, mean=0.0000)) (action_executor_acquired.mean=(samples=0, sum=0, mean=0.0000)) (stream_read_remote_stream_drain.mean=(samples=1, sum=0, mean=0.0000)) (stream_read_block_read.failures.mean=(samples=0, sum=0, mean=0.0000)) (action_file_opened.failures.mean=(samples=0, sum=0, mean=0.0000)) (stream_read_block_acquire_read.mean=(samples=0, sum=0, mean=0.0000)));
}"

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 0s Docker mode activated.
-1 ❌ patch 0m 20s #7334 does not apply to trunk. Rebase required? Wrong Branch? See https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help.
Subsystem Report/Notes
GITHUB PR #7334
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7334/19/console
versions git=2.34.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.


@ahmarsuhail

@steveloughran addressed most of your comments, will just double check everything tomorrow morning and let you know once it's ready for another review.

ITestS3AContractVectoredRead.testVectoredReadWholeFilePlusOne will fail because fs.contract.vector-io-early-eof-check is true by default. I set it to false with conf.set("fs.contract.vector-io-early-eof-check", "false"); in the new ITestS3AContractAnalyticsStreamVectoredRead and the tests pass.
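As a standalone sketch of that per-test override (the `Configuration` class below is a minimal stand-in for Hadoop's, and the test class name mirrors but is not the real one):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for Hadoop's Configuration, enough to show the pattern.
class Configuration {
  private final Map<String, String> props = new HashMap<>();
  void set(String key, String value) { props.put(key, value); }
  String get(String key, String defaultValue) {
    return props.getOrDefault(key, defaultValue);
  }
}

// Sketch of the contract test overriding its configuration factory to
// relax the early-EOF check for the analytics stream.
class AnalyticsStreamVectoredReadTestSketch {
  Configuration createConfiguration() {
    Configuration conf = new Configuration();
    // the early-EOF contract check defaults to true; disable it so
    // testVectoredReadWholeFilePlusOne passes against the analytics stream
    conf.set("fs.contract.vector-io-early-eof-check", "false");
    return conf;
  }
}
```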

used getContract().getConf() for the conf in the contract tests

@steveloughran

thanks. just been analytics-enabling storediag (!) and discovered it was actually prefetch stream i'd bonded to. Will retest now.

with the storediag tests, can I ask that telemetry logs at debug level. thanks,

2025-02-25 18:29:05,250 [main] INFO  s3.telemetry (LogHelper.java:logAtLevel(47)) - [2025-02-25T18:29:05.249Z] [  start] [-2646ti39lg28u] block.manager.make.range.available(generation=0, thread_id=1, range=0-0, etag="74ab891e3bf169d399fd679c7d45bb08", uri=s3://stevel-london/temp/subdir/dir-ebf44484-b3c2-4855-9a4d-b3ae45a56f1b/file, range.effective=0-65535)

@steveloughran (Contributor) left a comment:

+1

LGTM; tested s3 london, also tested storediag to two third-party stores...though not the full Itests.

one final change to add before the merge:

Add stream_read_analytics_opened to org.apache.hadoop.fs.s3a.Statistic as a counter on line 325 (just after /* Stream Reads */)

    STREAM_READ_ANALYTICS_OPENED(
        StreamStatisticNames.STREAM_READ_ANALYTICS_OPENED,
        "Count of times the analytics input stream was opened",
        TYPE_COUNTER),

This will add it to FS stats and collect from the input stream on close().

I noticed this in storediag; analytics stats don't get retained in the fs.

super(InputStreamType.Analytics, parameters);
S3ObjectAttributes s3Attributes = parameters.getObjectAttributes();
this.inputStream = s3SeekableInputStreamFactory.createStream(
    S3URI.of(s3Attributes.getBucket(), s3Attributes.getKey()),
    buildOpenStreamInformation(parameters));
getS3AStreamStatistics().streamOpened(InputStreamType.Analytics);
Contributor:

aah. no worries then

* can contain multiple row groups, this allows for further parallelisation, as each row group
* can be processed independently.
*
* @throws IOException any IO problem
Contributor:

just cut this line; it's out of date and needless on a test case

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 0s Docker mode activated.
-1 ❌ patch 0m 20s #7334 does not apply to trunk. Rebase required? Wrong Branch? See https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help.
Subsystem Report/Notes
GITHUB PR #7334
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7334/22/console
versions git=2.34.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@ahmarsuhail

@steveloughran - addressed latest comments.

I'll move AAL logs to debug before the AAL next release.

Do you know why yetus is complaining: "#7334 does not apply to trunk. Rebase required? Wrong Branch? See.." — wondering if this is because this branch was originally based on top of your input stream branch

@ahmarsuhail ahmarsuhail force-pushed the feature-HADOOP-19363-analytics-accelerator-s3 branch from 944fbbb to a8da9b2 Compare February 26, 2025 12:50
@steveloughran

@ahmarsuhail If this surfaces again, try squashing the commits. yetus applies each patch in sequence, and if files are added, renamed, deleted etc sometimes things just "stay around" or generate spurious conflicts on the way.

I see #7433 does this
