docs(clp-package): Add instructions for ingesting logs from S3-compatible object storage. #1796
base: main
Conversation
Walkthrough: Adds documentation clarifications for S3-compatible endpoints: accepts HTTP-style object/key-prefix URLs for the s3-object and s3-key-prefix modes, notes that CLP auto-determines the endpoint URL during compression, and documents a Web UI limitation for extracted streams on custom S3 endpoints. (Example URL forms are sketched below.)
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~5–10 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
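For concreteness, here is a rough sketch of the two HTTP-style URL forms the walkthrough refers to. The host, port, bucket, and key values are made-up placeholders, and the YAML keys are just labels for this sketch rather than actual CLP configuration fields.

```yaml
# Hypothetical placeholders -- substitute your own endpoint, bucket, and object keys.
s3-object: "http://minio.example.internal:9000/my-logs/app/2025-12-01.log"  # compress a single object
s3-key-prefix: "http://minio.example.internal:9000/my-logs/app/"            # compress everything under the prefix
```

Both forms follow the path-style layout (`http://<host>:<port>/<bucket-name>/...`) shown in the documentation changes reviewed below.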
Actionable comments posted: 2
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
- docs/src/user-docs/guides-using-object-storage/clp-usage.md (2 hunks)
- docs/src/user-docs/guides-using-object-storage/object-storage-config.md (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/clp-py-utils/clp_py_utils/clp_config.py:580-585
Timestamp: 2025-12-10T23:32:24.363Z
Learning: In PR #1767, custom S3 endpoint support was added to CLP. The S3Config.region_code field is now optional (NonEmptyStr | None) because custom S3-compatible endpoints (MinIO, LocalStack, etc.) use path-style URLs and don't require AWS region codes. Only AWS S3 endpoints require region_code. Presto integration still requires region_code because it only works with AWS S3.
Learnt from: haiqi96
Repo: y-scope/clp PR: 852
File: components/clp-package-utils/clp_package_utils/scripts/native/compress.py:151-160
Timestamp: 2025-04-25T20:46:20.140Z
Learning: For S3 URLs without region specifications (legacy global endpoints), either assign a default region (us-east-1) or throw a clear error message requiring region specification in the URL. This addresses validation issues in components like S3InputConfig that require a non-nullable region string.
📚 Learning: 2025-12-10T23:32:24.363Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/clp-py-utils/clp_py_utils/clp_config.py:580-585
Timestamp: 2025-12-10T23:32:24.363Z
Learning: In PR #1767, custom S3 endpoint support was added to CLP. The S3Config.region_code field is now optional (NonEmptyStr | None) because custom S3-compatible endpoints (MinIO, LocalStack, etc.) use path-style URLs and don't require AWS region codes. Only AWS S3 endpoints require region_code. Presto integration still requires region_code because it only works with AWS S3.
Applied to files:
- docs/src/user-docs/guides-using-object-storage/clp-usage.md
- docs/src/user-docs/guides-using-object-storage/object-storage-config.md
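To make the learning above concrete, here is a minimal, hypothetical contrast between a custom S3-compatible endpoint and AWS S3. Only `endpoint_url` (mentioned later in this thread) and `region_code` come from the discussion; every other key name is an assumption made for illustration.

```yaml
# Custom S3-compatible endpoint (e.g. MinIO or LocalStack): path-style URLs,
# so region_code can be omitted.
s3_config:                                            # hypothetical section name
  endpoint_url: "http://minio.example.internal:9000"  # hypothetical host/port
  bucket: "my-logs"                                    # hypothetical key name
---
# AWS S3: no custom endpoint, but region_code is still required
# (and Presto integration only works with AWS S3).
s3_config:
  region_code: "us-east-2"
  bucket: "my-logs"
```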
🪛 LanguageTool
docs/src/user-docs/guides-using-object-storage/clp-usage.md
[style] ~34-~34: To form a complete sentence, be sure to include a subject or ‘there’.
Context: ...et-name>/*` can also be an object stored in an S3-compa...
(MISSING_IT_THERE)
[style] ~89-~89: To form a complete sentence, be sure to include a subject or ‘there’.
Context: ...et-name>/*` can also be an object stored in an S3-compa...
(MISSING_IT_THERE)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
- GitHub Check: check-generated
🔇 Additional comments (1)
docs/src/user-docs/guides-using-object-storage/object-storage-config.md (1)
3-6: Documentation additions properly positioned and consistent with S3-compatible endpoint support. The clarification about S3-compatible storages using equivalent bucket policies and automatic endpoint URL determination during compression is well-placed and aligns with the custom S3 endpoint support added in PR #1767 (per learnings, where region_code is now optional for non-AWS endpoints).
* `<object-url>` can also be an object stored in an S3-compatible storage:
  * `http://<host>:<port>/<bucket-name>/<object-key>`
Fix incomplete sentence structure on line 34.
The bullet point should form a complete sentence. The current phrasing "* <object-url> can also be an object stored..." lacks proper sentence structure.
🔎 Suggested fix:
- * `<object-url>` can also be an object stored in an S3-compatible storage:
+ * `<object-url>` can also point to an object stored in an S3-compatible storage:
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- * `<object-url>` can also be an object stored in an S3-compatible storage:
-   * `http://<host>:<port>/<bucket-name>/<object-key>`
+ * `<object-url>` can also point to an object stored in an S3-compatible storage:
+   * `http://<host>:<port>/<bucket-name>/<object-key>`
🤖 Prompt for AI Agents
In docs/src/user-docs/guides-using-object-storage/clp-usage.md around lines
34–35, the bullet is a fragment; rewrite it as a complete sentence that states
that the object URL can reference an object stored in S3-compatible storage and
include the example URL format (host:port/bucket-name/object-key) inline so the
bullet reads as a full, self-contained sentence.
unrelated to the current PR, but i found we haven't updated the webui code to support custom s3 endpoints?
though technically we haven't been mentioning that the custom endpoint feature can be used for streams... lol. as long as we're explicit that the custom endpoint works only for logs inputs, i think it's fine to skip the webui support for now
anyways, i created an issue to track: #1797
Technically, we don't officially support storing archives and streams on S3-compatible storages. The PR title and feature request were "Add support for ingesting logs from S3 compatible endpoints."
With that being said, users can actually add an "endpoint_url: ..." field in clp-config.yaml to enable this feature. I did a very quick test, and the only issue I discovered was that extracting streams didn't work in the webui.
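To illustrate the comment above: a rough sketch of what adding `endpoint_url` to clp-config.yaml might look like. Only the `endpoint_url` field itself comes from the comment; the surrounding section and key names are guesses for illustration and may not match the real schema.

```yaml
# clp-config.yaml -- hypothetical excerpt; section and key names are guesses.
logs_input:
  type: "s3"
  s3_config:
    endpoint_url: "http://minio.example.internal:9000"  # custom S3-compatible endpoint
    # region_code omitted -- optional for non-AWS endpoints per the PR #1767 learning above
```

As noted in this thread, the only gap observed with such a setup was stream extraction through the webui, tracked in #1797.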
sure. let's mention the temporary webui limitation in docs/src/user-docs/guides-using-object-storage/object-storage-config.md if we don't plan to add the webui config in the coming release
@junhaoliao updated based on our offline discussion
Description
Add instructions for ingesting logs from S3-compatible object storage.
Checklist
breaking change.
Validation performed
`task docs:serve` serves the doc correctly.