feat(integration-tests): package integration test jobs (do not review) #1766
Conversation
Important: Review skipped. Draft detected; please check the settings in the CodeRabbit UI.

Walkthrough

This pull request introduces comprehensive integration test infrastructure for CLP package testing, including test data directories with ground-truth datasets, job configuration definitions for compression and post-compression workflows, Presto cluster support, fixture enhancements for dynamic port assignment and configuration management, and utility functions for test job orchestration and validation.

Changes
Sequence Diagram(s)

sequenceDiagram
    actor Test
    participant JobDispatch as Job Dispatch Logic
    participant CompressionJob as Compression Job
    participant Archive as CLP Archive
    participant PostCompJob as Post-Compression Job
    participant GroundTruth as Ground Truth Validator
    participant PrestoFilter as Presto Filter Job
    Test->>JobDispatch: dispatch_test_jobs(package_instance)
    JobDispatch->>JobDispatch: Clean initial archive directory
    loop For each compression job
        JobDispatch->>CompressionJob: run_package_compression_script()
        CompressionJob->>Archive: Compress logs into archive
        Archive-->>CompressionJob: Archive created
        loop For each dependent post-compression job
            JobDispatch->>PostCompJob: run_package_script()
            PostCompJob->>Archive: Extract/query archive
            Archive-->>PostCompJob: Output generated
            PostCompJob->>GroundTruth: _compare_output_with_ground_truth()
            GroundTruth-->>PostCompJob: Validation result
        end
        loop For each dependent presto-filter job
            JobDispatch->>PrestoFilter: run_presto_filter()
            PrestoFilter->>Archive: Execute Presto filter query
            Archive-->>PrestoFilter: Query results
            PrestoFilter->>GroundTruth: Validate against ground truth
            GroundTruth-->>PrestoFilter: Validation result
        end
        JobDispatch->>Archive: Clean up archive directory
    end
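The dispatch loop in the diagram above can be sketched in plain Python. The dataclasses and callables below are simplified, hypothetical stand-ins, not the PR's actual `PackageCompressionJob`/`PackagePostCompressionJob` models; only the control flow (compress, run dependents, compare against ground truth, clean up) mirrors the diagram:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


# Hypothetical job models; the PR defines richer dataclasses in
# integration-tests/tests/utils/config.py.
@dataclass
class PostCompressionJob:
    job_name: str
    run: Callable[[], str]  # runs the post-compression script, returns its output
    ground_truth: str       # expected output for validation


@dataclass
class CompressionJob:
    job_name: str
    compress: Callable[[], None]  # compresses logs into the archive
    dependents: list[PostCompressionJob] = field(default_factory=list)


def dispatch_test_jobs(jobs: list[CompressionJob]) -> list[str]:
    """Run each compression job, then run and validate its dependent jobs.

    Returns the names of dependent jobs whose output did not match the
    ground truth (an empty list means everything validated).
    """
    failures: list[str] = []
    for job in jobs:
        job.compress()  # "Compress logs into archive"
        for dep in job.dependents:
            output = dep.run()  # "Extract/query archive"
            if output != dep.ground_truth:  # ground-truth comparison
                failures.append(dep.job_name)
        # The real flow cleans up the archive directory here before
        # the next compression job starts.
    return failures
```

Nesting dependent jobs under their compression job (rather than flattening all jobs into one list) matches the diagram's design: each archive exists only for the duration of its dependents' runs.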
sequenceDiagram
    actor Test
    participant Fixture as Package Fixture
    participant PortUtils as Port Assignment
    participant PackageConfig as PackageConfig
    participant PrestoCluster as Presto Cluster
    participant CLP as CLP Package
    Test->>Fixture: fixt_package_config (setup)
    Fixture->>PortUtils: assign_ports_from_base(BASE_PORT, clp_config)
    PortUtils->>PortUtils: Discover port requirements
    PortUtils->>PortUtils: Validate ports are free
    PortUtils->>PackageConfig: Assign ports to components
    PackageConfig-->>Fixture: Configuration ready
    Fixture->>Fixture: build_package_job_list(mode, filter)
    Fixture->>PackageConfig: Set package_job_list
    PackageConfig-->>Fixture: Config with jobs
    Fixture->>Fixture: yield config (fixture ready)
    Test->>Fixture: fixt_package_instance (setup)
    alt mode is "clp-json-presto"
        Fixture->>PrestoCluster: start_presto_cluster(package_config)
        PrestoCluster->>PrestoCluster: Construct split-filter mappings
        PrestoCluster->>PrestoCluster: Generate split-filter.json
        PrestoCluster->>CLP: Start Presto services
        CLP-->>PrestoCluster: Services running
    end
    Fixture->>Fixture: yield instance (fixture ready)
    Test->>Test: Execute test_clp_package()
    Fixture->>Fixture: fixt_package_instance (teardown)
    alt mode is "clp-json-presto"
        Fixture->>PrestoCluster: stop_presto_cluster()
        PrestoCluster->>CLP: Stop Presto services
        PrestoCluster->>PrestoCluster: Reset split-filter.json
    end
    Fixture-->>Test: Cleanup complete
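The port-assignment step shown in the fixture diagram (discover requirements, validate ports are free, assign to components) can be approximated with a socket probe. The function name echoes the diagram's `assign_ports_from_base`, but this signature and the component-name handling are illustrative assumptions, not the PR's implementation:

```python
import socket


def _port_is_free(port: int) -> bool:
    """Return True if we can bind the port on localhost right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind(("127.0.0.1", port))
            return True
        except OSError:
            return False


def assign_ports_from_base(base_port: int, components: list[str]) -> dict[str, int]:
    """Walk upward from base_port, assigning each component the next free port."""
    assignments: dict[str, int] = {}
    candidate = base_port
    for name in components:
        while not _port_is_free(candidate):
            candidate += 1
        assignments[name] = candidate
        candidate += 1
    return assignments
```

Note that probe-then-assign is inherently racy (another process can grab a port between the check and actual use), which is usually acceptable for test orchestration on a quiet host.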
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention during review:
Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Actionable comments posted: 14
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (50)
- integration-tests/.pytest.ini (2 hunks)
- integration-tests/tests/conftest.py (1 hunks)
- integration-tests/tests/data/__init__.py (1 hunks)
- integration-tests/tests/data/ground-truths/filter-describe-postgresql.jsonl (1 hunks)
- integration-tests/tests/data/ground-truths/filter-select-distinct-postgresql.jsonl (1 hunks)
- integration-tests/tests/data/jobs/compression/__init__.py (1 hunks)
- integration-tests/tests/data/jobs/compression/compression_jobs.py (1 hunks)
- integration-tests/tests/data/jobs/post_compression/__init__.py (1 hunks)
- integration-tests/tests/data/jobs/post_compression/admin_tools_jobs.py (1 hunks)
- integration-tests/tests/data/jobs/post_compression/search_jobs.py (1 hunks)
- integration-tests/tests/data/jobs/presto/__init__.py (1 hunks)
- integration-tests/tests/data/jobs/presto/presto_filters.py (1 hunks)
- integration-tests/tests/data/logs/json-multifile/README.md (1 hunks)
- integration-tests/tests/data/logs/json-multifile/logs/sts-135-2011-07-08.jsonl (1 hunks)
- integration-tests/tests/data/logs/json-multifile/logs/sts-135-2011-07-09.jsonl (1 hunks)
- integration-tests/tests/data/logs/json-multifile/logs/sts-135-2011-07-11.jsonl (1 hunks)
- integration-tests/tests/data/logs/json-multifile/logs/sts-135-2011-07-19.jsonl (1 hunks)
- integration-tests/tests/data/logs/json-multifile/logs/sts-135-2011-07-21.jsonl (1 hunks)
- integration-tests/tests/data/logs/json-singlefile-1/README.md (1 hunks)
- integration-tests/tests/data/logs/json-singlefile-1/logs/artemis_ii.jsonl (1 hunks)
- integration-tests/tests/data/logs/json-singlefile-2/README.md (1 hunks)
- integration-tests/tests/data/logs/json-singlefile-2/logs/new-horizons.jsonl (1 hunks)
- integration-tests/tests/data/logs/text-multifile/README.md (1 hunks)
- integration-tests/tests/data/logs/text-multifile/logs/apollo-17_day01.txt (1 hunks)
- integration-tests/tests/data/logs/text-multifile/logs/apollo-17_day02.txt (1 hunks)
- integration-tests/tests/data/logs/text-multifile/logs/apollo-17_day03.txt (1 hunks)
- integration-tests/tests/data/logs/text-multifile/logs/apollo-17_day04.txt (1 hunks)
- integration-tests/tests/data/logs/text-multifile/logs/apollo-17_day05.txt (1 hunks)
- integration-tests/tests/data/logs/text-multifile/logs/apollo-17_day06.txt (1 hunks)
- integration-tests/tests/data/logs/text-multifile/logs/apollo-17_day07.txt (1 hunks)
- integration-tests/tests/data/logs/text-multifile/logs/apollo-17_day08.txt (1 hunks)
- integration-tests/tests/data/logs/text-multifile/logs/apollo-17_day09.txt (1 hunks)
- integration-tests/tests/data/logs/text-multifile/logs/apollo-17_day10.txt (1 hunks)
- integration-tests/tests/data/logs/text-multifile/logs/apollo-17_day11.txt (1 hunks)
- integration-tests/tests/data/logs/text-multifile/logs/apollo-17_day12.txt (1 hunks)
- integration-tests/tests/data/logs/text-multifile/logs/apollo-17_day13.txt (1 hunks)
- integration-tests/tests/data/logs/text-singlefile/README.md (1 hunks)
- integration-tests/tests/data/logs/text-singlefile/logs/voyager-1.txt (1 hunks)
- integration-tests/tests/fixtures/integration_test_logs.py (1 hunks)
- integration-tests/tests/fixtures/package_config.py (2 hunks)
- integration-tests/tests/fixtures/package_instance.py (2 hunks)
- integration-tests/tests/test_package_start.py (2 hunks)
- integration-tests/tests/utils/asserting_utils.py (2 hunks)
- integration-tests/tests/utils/clp_job_utils.py (1 hunks)
- integration-tests/tests/utils/clp_mode_utils.py (5 hunks)
- integration-tests/tests/utils/config.py (7 hunks)
- integration-tests/tests/utils/docker_utils.py (2 hunks)
- integration-tests/tests/utils/package_utils.py (3 hunks)
- integration-tests/tests/utils/port_utils.py (1 hunks)
- integration-tests/tests/utils/utils.py (2 hunks)
🧰 Additional context used
🧠 Learnings (13)
📓 Common learnings
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/pyproject.toml:11-13
Timestamp: 2025-08-16T10:24:29.316Z
Learning: In the CLP project's integration-tests directory, the integration tests package is not intended to be shipped or distributed - only the tests are leveraged for local testing purposes, so console script entry points should not be included in pyproject.toml.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 1125
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:267-291
Timestamp: 2025-09-15T22:20:40.750Z
Learning: For CLP compression jobs, the team has decided to fail the entire job immediately upon encountering any invalid input path, rather than continuing to process valid paths. This decision was made during PR #1125 development.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
📚 Learning: 2025-09-28T15:00:22.170Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 1340
File: components/job-orchestration/job_orchestration/executor/compress/compression_task.py:528-528
Timestamp: 2025-09-28T15:00:22.170Z
Learning: In components/job-orchestration/job_orchestration/executor/compress/compression_task.py, there is a suggestion to refactor from passing logger as a parameter through multiple functions to creating a ClpCompressor class that takes the logger as a class member, with current helper functions becoming private member functions.
Applied to files:
- integration-tests/tests/data/jobs/compression/__init__.py
- integration-tests/tests/data/jobs/compression/compression_jobs.py
- integration-tests/tests/utils/clp_job_utils.py
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Applied to files:
integration-tests/tests/data/jobs/compression/compression_jobs.py
📚 Learning: 2025-08-20T08:36:38.391Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/__init__.py:1-1
Timestamp: 2025-08-20T08:36:38.391Z
Learning: In the CLP project, __init__.py files should include short module-level docstrings describing the package purpose, as required by their ruff linting configuration, rather than being left empty.
Applied to files:
- integration-tests/tests/data/__init__.py
- integration-tests/tests/data/jobs/presto/__init__.py
📚 Learning: 2025-08-16T10:24:29.316Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/pyproject.toml:11-13
Timestamp: 2025-08-16T10:24:29.316Z
Learning: In the CLP project's integration-tests directory, the integration tests package is not intended to be shipped or distributed - only the tests are leveraged for local testing purposes, so console script entry points should not be included in pyproject.toml.
Applied to files:
- integration-tests/tests/data/__init__.py
- integration-tests/.pytest.ini
- integration-tests/tests/utils/clp_job_utils.py
- integration-tests/tests/utils/package_utils.py
- integration-tests/tests/utils/config.py
📚 Learning: 2025-08-17T16:10:38.722Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Applied to files:
- integration-tests/tests/data/__init__.py
- integration-tests/.pytest.ini
- integration-tests/tests/fixtures/integration_test_logs.py
- integration-tests/tests/utils/asserting_utils.py
- integration-tests/tests/fixtures/package_instance.py
- integration-tests/tests/fixtures/package_config.py
- integration-tests/tests/test_package_start.py
- integration-tests/tests/utils/package_utils.py
📚 Learning: 2025-08-08T21:15:10.905Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/test_identity_transformation.py:87-95
Timestamp: 2025-08-08T21:15:10.905Z
Learning: In the CLP project's integration tests (Python code), variable names should use "logs" (plural) rather than "log" (singular) when referring to test logs or log-related entities, as this aligns with the naming conventions used throughout the codebase.
Applied to files:
integration-tests/tests/data/__init__.py
📚 Learning: 2025-07-28T08:33:57.487Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/test_identity_transformation.py:133-137
Timestamp: 2025-07-28T08:33:57.487Z
Learning: In CLP integration tests, the `run_and_assert` utility function should only be used for commands that test CLP package functionality (like clp/clp-s compression/decompression operations). For processing or helper commands (like jq, sort, etc.), direct `subprocess.run` calls should be used instead, as they serve different purposes and don't need the same error handling patterns.
Applied to files:
- integration-tests/tests/data/__init__.py
- integration-tests/tests/utils/clp_job_utils.py
- integration-tests/tests/utils/package_utils.py
📚 Learning: 2025-08-18T05:43:07.868Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1184
File: components/core/cmake/Modules/FindLibLZMA.cmake:21-24
Timestamp: 2025-08-18T05:43:07.868Z
Learning: In the CLP project, all supplied `<lib>_ROOT` variables will live within the `CLP_CORE_DEPS_DIR` as part of their architectural design. This means that using CLP_CORE_DEPS_DIR for library discovery in custom Find modules is the intended approach, and prioritizing individual `<lib>_ROOT` variables over CLP_CORE_DEPS_DIR is not necessary.
Applied to files:
integration-tests/.pytest.ini
📚 Learning: 2025-05-06T10:07:04.654Z
Learnt from: kirkrodrigues
Repo: y-scope/clp PR: 881
File: components/core/tools/scripts/lib_install/macos/install-all.sh:11-12
Timestamp: 2025-05-06T10:07:04.654Z
Learning: In CLP installation scripts, temporary directories with downloaded files should not be automatically cleaned up on failure (e.g., with EXIT traps) to preserve artifacts for debugging purposes.
Applied to files:
integration-tests/tests/fixtures/package_config.py
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.
Applied to files:
integration-tests/tests/utils/clp_mode_utils.py
📚 Learning: 2025-11-10T05:19:56.600Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1575
File: components/clp-py-utils/clp_py_utils/clp_config.py:602-607
Timestamp: 2025-11-10T05:19:56.600Z
Learning: In the y-scope/clp repository, the `ApiServer` class in `components/clp-py-utils/clp_py_utils/clp_config.py` does not need a `transform_for_container()` method because no other containerized service depends on the API server - it's only accessed from the host, so no docker-network communication is expected.
Applied to files:
integration-tests/tests/utils/clp_mode_utils.py
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.
Applied to files:
integration-tests/tests/utils/clp_mode_utils.py
🧬 Code graph analysis (8)
integration-tests/tests/data/jobs/post_compression/search_jobs.py (1)
integration-tests/tests/utils/config.py (1)
PackagePostCompressionJob(181-203)
integration-tests/tests/data/jobs/compression/compression_jobs.py (1)
integration-tests/tests/utils/config.py (1)
PackageCompressionJob(157-177)
integration-tests/tests/utils/docker_utils.py (1)
integration-tests/tests/utils/utils.py (1)
get_binary_path(13-23)
integration-tests/tests/fixtures/integration_test_logs.py (2)
integration-tests/tests/fixtures/path_configs.py (1)
integration_test_path_config(20-24)
integration-tests/tests/utils/config.py (2)
IntegrationTestPathConfig(330-350)
IntegrationTestLogs(354-378)
integration-tests/tests/data/jobs/presto/presto_filters.py (1)
integration-tests/tests/utils/config.py (1)
PrestoFilterJob(208-221)
integration-tests/tests/fixtures/package_instance.py (2)
integration-tests/tests/utils/package_utils.py (2)
start_presto_cluster(91-167)
stop_presto_cluster(190-217)
integration-tests/tests/fixtures/package_config.py (1)
fixt_package_config(30-82)
integration-tests/tests/data/jobs/post_compression/admin_tools_jobs.py (1)
integration-tests/tests/utils/config.py (1)
PackagePostCompressionJob(181-203)
integration-tests/tests/test_package_start.py (4)
integration-tests/tests/utils/asserting_utils.py (3)
validate_presto_running(100-115)
validate_running_mode_correct(77-97)
validate_package_running(46-74)
integration-tests/tests/utils/clp_job_utils.py (1)
dispatch_test_jobs(141-181)integration-tests/tests/utils/config.py (1)
PackageInstance(274-326)integration-tests/tests/fixtures/package_instance.py (1)
fixt_package_instance(23-61)
🪛 LanguageTool
integration-tests/tests/data/logs/text-multifile/logs/apollo-17_day10.txt
[uncategorized] ~4-~4: This expression is usually spelled with a hyphen.
Context: ... navigation sightings compare well with ground based tracking solutions; all primary systems...
(BASED_HYPHEN)
integration-tests/tests/data/logs/text-multifile/logs/apollo-17_day03.txt
[style] ~10-~10: ‘in brief’ might be wordy. Consider a shorter alternative.
Context: ...-17 d03 e009 attitude control jets fire in brief pulses to maintain thermal attitudes; a...
(EN_WORDINESS_PREMIUM_IN_BRIEF)
integration-tests/tests/data/logs/text-singlefile/logs/voyager-1.txt
[uncategorized] ~18-~18: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...ay detector records gradual increase in high energy particles as solar influence weakens wi...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
[uncategorized] ~22-~22: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...ay detector records gradual increase in high energy particles as solar influence weakens wi...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
[uncategorized] ~26-~26: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...ay detector records gradual increase in high energy particles as solar influence weakens wi...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
[uncategorized] ~30-~30: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...ay detector records gradual increase in high energy particles as solar influence weakens wi...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
[uncategorized] ~34-~34: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...ay detector records gradual increase in high energy particles as solar influence weakens wi...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
[uncategorized] ~38-~38: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...ay detector records gradual increase in high energy particles as solar influence weakens wi...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
[uncategorized] ~40-~40: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...pdated to Voyager Interstellar Mission; long term science focus shifts to heliospheric bo...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
[uncategorized] ~44-~44: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...pdated to Voyager Interstellar Mission; long term science focus shifts to heliospheric bo...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
[uncategorized] ~48-~48: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...pdated to Voyager Interstellar Mission; long term science focus shifts to heliospheric bo...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
[uncategorized] ~52-~52: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...pdated to Voyager Interstellar Mission; long term science focus shifts to heliospheric bo...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
[misspelling] ~112-~112: This expression is normally spelled as one or with a hyphen.
Context: ...ary crossing signatures continues using multi year combined instrument observations 2013-1...
(EN_COMPOUNDS_MULTI_YEAR)
[misspelling] ~116-~116: This expression is normally spelled as one or with a hyphen.
Context: ...ary crossing signatures continues using multi year combined instrument observations 2015-0...
(EN_COMPOUNDS_MULTI_YEAR)
[misspelling] ~120-~120: This expression is normally spelled as one or with a hyphen.
Context: ...ary crossing signatures continues using multi year combined instrument observations 2016-0...
(EN_COMPOUNDS_MULTI_YEAR)
[misspelling] ~124-~124: This expression is normally spelled as one or with a hyphen.
Context: ...ary crossing signatures continues using multi year combined instrument observations 2017-1...
(EN_COMPOUNDS_MULTI_YEAR)
[misspelling] ~129-~129: This word is normally spelled with a hyphen.
Context: ...e forces selective instrument shutdown; power sharing strategy revised to extend mission life...
(EN_COMPOUNDS_POWER_SHARING)
[misspelling] ~133-~133: This word is normally spelled with a hyphen.
Context: ...e forces selective instrument shutdown; power sharing strategy revised to extend mission life...
(EN_COMPOUNDS_POWER_SHARING)
[misspelling] ~137-~137: This word is normally spelled with a hyphen.
Context: ...e forces selective instrument shutdown; power sharing strategy revised to extend mission life...
(EN_COMPOUNDS_POWER_SHARING)
integration-tests/tests/data/logs/text-multifile/logs/apollo-17_day08.txt
[uncategorized] ~7-~7: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...and commemorative plaques documented in high resolution imagery; all primary systems nominal; c...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
[style] ~14-~14: ‘prior to’ might be wordy. Consider a shorter alternative.
Context: ...7 apollo-17 d08 e013 crew rests briefly prior to ascent timeline for launch from Moon; a...
(EN_WORDINESS_PREMIUM_PRIOR_TO)
integration-tests/tests/data/logs/text-multifile/logs/apollo-17_day12.txt
[uncategorized] ~15-~15: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...-17 d12 e014 crew reports readiness for high speed entry and ocean splashdown; all primary...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
🪛 markdownlint-cli2 (0.18.1)
integration-tests/tests/data/logs/text-multifile/README.md
4-4: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: package-image
- GitHub Check: build (ubuntu-24.04)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
- GitHub Check: build (macos-15)
🔇 Additional comments (52)
integration-tests/tests/data/__init__.py (1)
1-1: Module-level docstring is appropriate. The file correctly includes a short, descriptive docstring identifying the package purpose, which satisfies the ruff linting requirements per project standards.
integration-tests/tests/data/logs/json-singlefile-2/logs/new-horizons.jsonl (1)
1-150: Acknowledging test data file as requested. The PR title indicates that this should not be reviewed. The JSONL format is valid, timestamps are monotonically increasing, and the schema is consistent across all 150 records. The test data appears suitable for integration testing purposes.
integration-tests/tests/data/ground-truths/filter-select-distinct-postgresql.jsonl (1)
1-3: LGTM! The ground-truth data is valid JSON with appropriate PostgreSQL error severity values for testing filter/select/distinct operations.
integration-tests/tests/data/ground-truths/filter-describe-postgresql.jsonl (1)
1-22: LGTM! The ground-truth data is valid JSON with appropriate PostgreSQL column definitions for testing describe operations.
integration-tests/tests/data/jobs/presto/__init__.py (1)
1-1: LGTM! The module docstring follows the project's linting requirements for package initializers.
integration-tests/tests/data/logs/json-singlefile-2/README.md (1)
1-16: LGTM! The README metadata is consistent and clearly documents the test dataset characteristics.
integration-tests/tests/data/logs/json-singlefile-1/README.md (1)
1-16: LGTM! The README metadata accurately reflects the artemis_ii.jsonl data file, with timestamps matching the first and last entries.
integration-tests/.pytest.ini (1)
14-17: LGTM! The new environment variables provide clear test path configuration for repository, data, jobs, and logs directories.
integration-tests/tests/data/jobs/compression/__init__.py (1)
1-1: LGTM! The module docstring follows the project's linting requirements for package initializers.
integration-tests/tests/data/logs/json-multifile/logs/sts-135-2011-07-09.jsonl (1)
1-8: LGTM! The JSONL data has consistent structure with properly ordered timestamps and sequential line indices.
integration-tests/tests/data/logs/text-singlefile/README.md (2)
3-4: Verify the log type - inconsistency between README documentation and directory naming. The README states the type is "JSON", but the directory is named "text-singlefile", which suggests the logs should be text-based rather than JSON format. Inspect the actual log files in this directory to determine the correct type and update the README or directory name accordingly.
6-7: Verify the file count in the text-singlefile directory. The README states "Number of files: 5", but this needs verification against the actual file count in integration-tests/tests/data/logs/text-singlefile/. Check the directory to confirm the actual number of files matches the documented value.
integration-tests/tests/data/logs/json-singlefile-1/logs/artemis_ii.jsonl (1)
1-150: Verify suffix inconsistency pattern and testing intent. The analysis confirms 35 of 150 entries contain the suffix "This log documents Artemis II integrated systems performance during the crewed lunar test mission." while 115 do not, creating an irregular mixed-format pattern. However, line 128 was incorrectly listed in the review as containing the suffix when it does not. Determine whether this mixed formatting is intentional for testing log parsing robustness or requires standardization to a consistent format.
integration-tests/tests/data/logs/json-multifile/logs/sts-135-2011-07-08.jsonl (1)
1-8: Test data is well-structured and properly formatted. The JSONL file contains properly formatted JSON objects with consistent fields and sequential timestamps appropriate for testing CLP package functionality.
integration-tests/tests/data/logs/json-multifile/logs/sts-135-2011-07-19.jsonl (1)
1-8: Test data is well-structured and properly formatted. The JSONL file contains properly formatted JSON objects representing STS-135 mission day 11 events with consistent structure and sequential timestamps.
integration-tests/tests/data/logs/json-multifile/logs/sts-135-2011-07-11.jsonl (1)
1-8: Test data is well-structured and properly formatted. The JSONL file contains properly formatted JSON objects representing STS-135 mission day 3 events with consistent structure and sequential timestamps.
integration-tests/tests/fixtures/integration_test_logs.py (1)
46-58: Fixture implementation follows established patterns. The `spark_event_logs` fixture consistently follows the same pattern as existing fixtures (`hive_24hr`, `postgresql`), using the shared `_download_and_extract_dataset` helper function with appropriate parameters.
integration-tests/tests/data/logs/text-singlefile/logs/voyager-1.txt (1)
1-150: Test data is properly formatted for integration testing. The text file contains well-structured Voyager-1 mission logs with consistent formatting (ISO timestamps, spacecraft identifier, event type, and descriptions). The static analysis hints regarding hyphenation are stylistic suggestions for the log message content and are not applicable to test data.
integration-tests/tests/data/jobs/post_compression/__init__.py (1)
1-1: Package initializer is appropriate. The file properly establishes the `post_compression` package namespace with a descriptive docstring.
integration-tests/tests/data/jobs/presto/presto_filters.py (1)
1-5: Scaffolding is properly structured. The module provides a well-typed, empty dictionary for Presto filter job definitions, following good patterns for test job configuration infrastructure.
integration-tests/tests/data/logs/json-multifile/README.md (1)
integration-tests/tests/data/logs/json-multifile/README.md (1)
1-16: Documentation clearly describes the test dataset. The README provides accurate metadata about the json-multifile logs, including type, file count, events per file, and timestamp range, which aligns with the actual JSONL files added in this PR.
integration-tests/tests/conftest.py (1)
3-35: Pytest options and BASE_PORT ini wiring look coherent. The added ini key and CLI options match downstream usage (`NO_JOBS`, `JOB_NAME_CONTAINS`, `BASE_PORT`) and provide clear help text; configuration and typing look sound.
3-3: New Docker helper correctly narrows containers by name prefix. The combination of `docker ps --filter name=...` with a `re.fullmatch(re.escape(prefix) + r"\d+")` check gives a precise prefix + digits match while avoiding regex pitfalls; error handling is consistent with the existing compose helper.
Also applies to: 39-64
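A minimal, hypothetical sketch of the prefix-plus-digits matching described above; only the `re.fullmatch(re.escape(prefix) + r"\d+", name)` pattern comes from the reviewed code, the helper name is invented:

```python
import re


def matches_container_name(name: str, prefix: str) -> bool:
    """True only for names of the exact form <prefix><digits>.

    re.escape prevents dots or dashes in the prefix from being
    interpreted as regex metacharacters, and fullmatch rejects any
    characters after the trailing digits.
    """
    return re.fullmatch(re.escape(prefix) + r"\d+", name) is not None
```

This pairs naturally with `docker ps --filter name=<prefix>`: Docker's `name` filter does substring matching, so a post-filter like this tightens the results to exact prefix-plus-digits names.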
integration-tests/tests/test_package_start.py (1)
7-9: End-to-end package, mode, and job validation flow reads solid. Validating package components, then mode correctness, then conditionally checking Presto for `clp-json-presto`, and finally dispatching jobs only when `package_job_list` is present aligns cleanly with the fixture/job-list design and avoids unnecessary work when `--no-jobs` is used.
Also applies to: 10-12, 14-16, 25-44
integration-tests/tests/data/logs/json-multifile/logs/sts-135-2011-07-21.jsonl (1)
1-8: JSONL log entries are well-formed and consistent.

All records share the expected schema, timestamps are strictly increasing, and `line_index` values are unique and ordered, which should work well for multifile log test scenarios.

integration-tests/tests/data/jobs/post_compression/search_jobs.py (1)
1-5: Search job registry skeleton matches existing job-modelling pattern.

Defining `PACKAGE_SEARCH_JOBS: dict[str, PackagePostCompressionJob] = {}` mirrors the other job registries and gives a clear, type-safe place to add search jobs later without affecting current test behaviour.

integration-tests/tests/utils/utils.py (1)
8-11: YAML loading helper is robust and consistent with its contract.

`load_yaml_to_dict` cleanly distinguishes parse/IO failures (`ValueError`) from structural issues (`TypeError` on a non-mapping), and the typing/return contract (`dict[str, Any]`) matches the enforced top-level mapping requirement.

Also applies to: 69-93
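A sketch of a loader with the contract described above — `ValueError` for parse/IO failures, `TypeError` for a non-mapping top level. It assumes PyYAML is available and is not the PR's actual implementation:

```python
from pathlib import Path
from typing import Any

import yaml  # PyYAML (assumed dependency)


def load_yaml_to_dict(yaml_file_path: Path) -> dict[str, Any]:
    """Load a YAML file whose top level must be a mapping."""
    try:
        with open(yaml_file_path, "r") as f:
            content = yaml.safe_load(f)
    except (OSError, yaml.YAMLError) as err:
        # Parse and IO failures are surfaced uniformly as ValueError.
        raise ValueError(f"Failed to load YAML from {yaml_file_path}: {err}") from err
    if not isinstance(content, dict):
        # Structural violations (e.g. a top-level list) are a TypeError.
        raise TypeError(f"Top level of {yaml_file_path} is not a mapping.")
    return content
```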
integration-tests/tests/data/jobs/post_compression/admin_tools_jobs.py (1)
1-5: Admin-tools job registry skeleton is aligned with the job model.

The `PACKAGE_ADMIN_TOOLS_JOBS: dict[str, PackagePostCompressionJob] = {}` definition follows the same pattern as other job registries and safely acts as a placeholder until specific admin-tools jobs are added.

integration-tests/tests/utils/asserting_utils.py (2)
9-21: LGTM!

The new imports are well-organized and correctly support the validation utilities for mode verification and Presto container checks.
77-97: LGTM!

The `validate_running_mode_correct` function correctly validates the shared config against the intended configuration by comparing mode signatures. The error handling for `ValidationError` is appropriate.

integration-tests/tests/data/jobs/compression/compression_jobs.py (1)
1-38: LGTM!

The compression job definitions are well-structured and follow the `PackageCompressionJob` dataclass schema. The dictionary keys appropriately match the `job_name` fields, which aids maintainability and lookup consistency.

integration-tests/tests/fixtures/package_config.py (4)
3-26: LGTM!

The imports are well-organized and the logger setup follows the standard pattern for the test infrastructure.
45-54: LGTM!

Good defensive error handling for the `BASE_PORT` configuration value. The explicit error message clearly indicates the expected type.
59-70: LGTM!

The job list construction correctly respects the `NO_JOBS` and `JOB_NAME_CONTAINS` options, allowing flexible test filtering.
75-82: LGTM!

The enhanced teardown properly cleans up both the temporary config file and the generated data/tmp directories, ensuring test isolation.
integration-tests/tests/utils/clp_mode_utils.py (4)
44-56: LGTM!

The new `clp-json-presto` mode configuration correctly sets up the Presto query engine with `CLP_S` storage, disables results cache retention (appropriate for Presto-based queries), and configures the Presto coordinator.
97-114: LGTM!

The `compute_mode_signature` function provides a clean way to compare mode-defining aspects of configurations. The tuple captures the essential fields that determine operating-mode behaviour.
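One way such a signature could look. The field names here are assumptions chosen for illustration, not the PR's actual schema; the point is that two configs compare equal exactly when their mode-defining fields agree:

```python
from typing import Any, NamedTuple


class ModeSignature(NamedTuple):
    """Illustrative: the fields assumed to determine an operating mode."""

    query_engine: str
    storage_engine: str
    deployment_type: str


def compute_mode_signature(config: dict[str, Any]) -> ModeSignature:
    """Project a config dict onto its mode-defining fields for comparison."""
    return ModeSignature(
        query_engine=config.get("query_engine", ""),
        storage_engine=config.get("storage_engine", ""),
        deployment_type=config.get("deployment_type", ""),
    )
```

Because `NamedTuple` instances compare field-by-field, equality of signatures directly answers "are these two configs the same mode?".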
139-145: LGTM!

The refactored initialization correctly builds the component list by extending from `CLP_BASE_COMPONENTS` and conditionally adding query components and the reducer for the `FULL` deployment type.
91-94: Consider using a constant for the Presto worker component name.

`PRESTO_COORDINATOR_COMPONENT_NAME` is imported from `clp_py_utils.clp_config`, but `"presto-worker"` is hardcoded. Verify whether a `PRESTO_WORKER_COMPONENT_NAME` constant exists in `clp_py_utils.clp_config` and use it for consistency with the coordinator component name.

integration-tests/tests/utils/port_utils.py (6)
9-18: LGTM!

Constants are well-defined with clear naming. The valid port range correctly excludes privileged ports (0-1023) and includes the full range up to 65535.
21-33: LGTM!

The `PortAssignment` dataclass is well-structured with clear documentation. Using `Any` for `component` is appropriate given the dynamic nature of CLP config components.
36-58: LGTM!

The `assign_ports_from_base` function correctly discovers port requirements, validates the range, checks availability, and assigns ports sequentially. The logic is sound and well-organized.
61-76: LGTM!

Port availability checking logic is correct. The error message is informative and actionable.
128-141: Socket resource handling is correct.

The context manager properly ensures the socket is closed. Catching `OSError` covers both address-in-use and other binding failures appropriately.
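The bind-and-close availability check described above can be sketched as follows (the function name is illustrative, not the PR's actual helper):

```python
import socket


def is_port_available(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if `port` can currently be bound on `host`."""
    # The context manager guarantees the probe socket is closed on all paths.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind((host, port))
        except OSError:
            # Covers EADDRINUSE as well as other binding failures
            # (e.g. permission errors on privileged ports).
            return False
        return True
```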
144-163: LGTM!

Bounds validation logic is correct. The error message clearly indicates the issue and provides both the required and valid ranges.
integration-tests/tests/utils/package_utils.py (2)
258-263: Placeholder assertions should be tracked.

The `assert True` statements with TODO comments are placeholders that effectively skip validation. While the TODOs reference specific PRs/issues, these could easily be forgotten.

Consider adding `pytest.skip()` with a reason instead, which makes the skip visible in test reports:

 if compression_job.mode in ("clp-json", "clp-json-presto"):
-    # TODO: Waiting for PR 1299 to be merged.
-    assert True
+    pytest.skip("Decompression validation pending PR #1299")
 elif compression_job.mode == "clp-text":
-    # TODO: As of the morning of Dec. 10, clp-text decompression is broken.
-    assert True
+    pytest.skip("clp-text decompression is currently broken")
91-167: Good orchestration of Presto cluster startup.

The function correctly validates paths, generates split filters from compression jobs, writes configuration, and starts the Docker Compose cluster.
integration-tests/tests/utils/config.py (5)
156-178: LGTM!

The `PackageCompressionJob` dataclass is well-structured with clear field documentation. `frozen=True` ensures immutability, which is appropriate for job definitions.
180-204: LGTM!

`PackagePostCompressionJob` correctly models post-compression dependencies with a reference to the parent `PackageCompressionJob`. The `requires_archive_id` default is sensible.
206-231: LGTM!

The `PrestoFilterJob` and `PackageJobList` dataclasses are well-defined. The comment explaining why Presto filters have their own class is helpful.
130-153: LGTM!

New script path properties follow the established pattern and provide convenient access to package scripts. The paths are consistent with the CLP package structure.
283-302: Good addition of `shared_config_file_path` validation.

The new field is properly initialized and validated during `__post_init__`, ensuring the shared config file exists before the instance is considered valid.
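The frozen-dataclass-with-validation pattern praised here can be sketched as follows. The class and field names are illustrative, not the PR's actual definitions:

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class PackagePaths:
    """Illustrative sketch: validate required files when the instance is created."""

    shared_config_file_path: Path

    def __post_init__(self) -> None:
        # Runs after field assignment even for frozen dataclasses, so the
        # instance can never exist in an invalid state.
        if not self.shared_config_file_path.exists():
            raise FileNotFoundError(
                f"Shared config file not found: {self.shared_config_file_path}"
            )
```

Reading fields inside `__post_init__` is allowed for frozen dataclasses; only assignment is blocked, which is why validation (rather than mutation) fits naturally here.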
integration-tests/tests/data/logs/text-multifile/logs/apollo-17_day10.txt
# Description of logs

**Type**
Unstructured text
Remove trailing whitespace from line 4.
The line ends with a single trailing space, which violates markdown formatting standards.
Apply this diff to fix the trailing whitespace:
 **Type**
-Unstructured text 
+Unstructured text

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
**Type**
Unstructured text
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
4-4: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
🤖 Prompt for AI Agents
integration-tests/tests/data/logs/text-multifile/README.md around line 4: the
line "Unstructured text " contains a single trailing space; remove the trailing
whitespace at the end of line 4 so the line ends with "Unstructured text" (no
extra spaces) and save the file.
@@ -1,5 +1,6 @@
"""Fixtures that start and stop CLP package instances for integration tests."""

import logging
🧹 Nitpick | 🔵 Trivial
Fixture behaviour for NO_JOBS and Presto looks correct, with a small opportunity to preserve error detail
The skip condition when package_job_list is None and not no_jobs matches how the job list is built and avoids spinning up instances for modes with no matching jobs, and the Presto start/stop tied to mode_name == "clp-json-presto" in try/finally ensures cleanup. One optional improvement would be to catch RuntimeError as err and include err in the pytest.fail message (or log it) so callers keep the original failure context alongside the helpful BASE_PORT hint.
Also applies to: 13-17, 19-19, 23-26, 31-31, 35-40, 46-60
🤖 Prompt for AI Agents
In integration-tests/tests/fixtures/package_instance.py around lines 3, 13-17,
19, 23-26, 31, 35-40 and 46-60, the tests currently call pytest.fail (or
otherwise re-raise) on RuntimeError without preserving the original exception
details; update each try/except that catches RuntimeError to use "except
RuntimeError as err" and include the err in the failure message (e.g.,
pytest.fail(f"...: {err}") or log the err before failing) so the original
exception context is preserved alongside the existing BASE_PORT hint and cleanup
logic.
def validate_presto_running() -> None:
    """
    Validate that a Presto cluster is running.

    :param package_instance:
    """
    required_components = CLP_PRESTO_COMPONENTS

    for component in required_components:
        prefix = f"presto-clp-{component}-"
        running_matches = list_running_containers_with_prefix(prefix)
        if len(running_matches) == 0:
            pytest.fail(
                f"No running container found for component '{component}' "
                f"(expected name prefix '{prefix}')."
            )
Remove stale docstring parameter.
The docstring references :param package_instance: but the function takes no parameters.
 def validate_presto_running() -> None:
     """
     Validate that a Presto cluster is running.
-
-    :param package_instance:
     """
     required_components = CLP_PRESTO_COMPONENTS

🤖 Prompt for AI Agents
required_components = CLP_PRESTO_COMPONENTS🤖 Prompt for AI Agents
In integration-tests/tests/utils/asserting_utils.py around lines 100 to 115, the
function validate_presto_running() has a stale docstring parameter reference
":param package_instance:" that does not match the function signature; remove
that incorrect param line from the docstring and, if desired, replace it with a
brief description of the function or relevant params (none in this case) so the
docstring accurately reflects the function signature.
        str(package_config.path_config.clp_package_dir),
    ]
    # fmt: on
    subprocess.run(setup_presto_cmd, check=True)
🧹 Nitpick | 🔵 Trivial
Consider adding timeouts to subprocess calls.
The subprocess.run calls for Presto setup, start, and stop don't have timeouts. If the Docker daemon is unresponsive or the commands hang, the test will block indefinitely.
-    subprocess.run(setup_presto_cmd, check=True)
+    subprocess.run(setup_presto_cmd, check=True, timeout=DEFAULT_CMD_TIMEOUT_SECONDS)

Apply similarly to lines 167 and 202.
Also applies to: 167-167, 202-202
🤖 Prompt for AI Agents
In integration-tests/tests/utils/package_utils.py around lines 112, 167, and
202, the subprocess.run invocations for Presto setup/start/stop lack timeouts
and can hang indefinitely; add a reasonable timeout argument (e.g., timeout=30
or configurable via a constant/ENV var) to each subprocess.run call and wrap
them in try/except to catch subprocess.TimeoutExpired to perform cleanup or
raise a clear exception; ensure any resources (docker containers/processes) are
cleaned up or the test fails fast when a timeout occurs.
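A hedged sketch of the suggested timeout handling. The constant name follows the diff above; the wrapper function itself is illustrative, not the PR's code:

```python
import subprocess

DEFAULT_CMD_TIMEOUT_SECONDS = 30  # Illustrative default; tune per environment.


def run_with_timeout(
    cmd: list[str], timeout: float = DEFAULT_CMD_TIMEOUT_SECONDS
) -> None:
    """Run `cmd`, failing fast with a clear error if it hangs."""
    try:
        subprocess.run(cmd, check=True, timeout=timeout)
    except subprocess.TimeoutExpired as err:
        # Re-raise with context so the test fails fast instead of blocking
        # indefinitely; callers can add cleanup (e.g. stopping containers) here.
        raise RuntimeError(
            f"Command timed out after {timeout}s: {' '.join(cmd)}"
        ) from err
```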
if package_compression_jobs is not None:
    for package_compression_job in package_compression_jobs:
        flags = package_compression_job.flags
        if flags is not None:
            dataset_name = flags["--dataset"]
            timestamp_key = flags["--timestamp-key"]

        # Skip jobs that do not specify both dataset_name and timestamp_key.
        if dataset_name is None or timestamp_key is None:
            continue

        dataset_timestamp_dict[dataset_name] = timestamp_key
Potential UnboundLocalError if flags is None.
When flags is None, the variables dataset_name and timestamp_key are never assigned, but the code at lines 130-131 still checks and uses them. This will raise an UnboundLocalError.
Apply this diff to fix the scoping issue:
 if package_compression_jobs is not None:
     for package_compression_job in package_compression_jobs:
         flags = package_compression_job.flags
-        if flags is not None:
-            dataset_name = flags["--dataset"]
-            timestamp_key = flags["--timestamp-key"]
-
-            # Skip jobs that do not specify both dataset_name and timestamp_key.
-            if dataset_name is None or timestamp_key is None:
+        if flags is None:
+            continue
+
+        dataset_name = flags.get("--dataset")
+        timestamp_key = flags.get("--timestamp-key")
+
+        # Skip jobs that do not specify both dataset_name and timestamp_key.
+        if dataset_name is None or timestamp_key is None:
             continue

         dataset_timestamp_dict[dataset_name] = timestamp_key

This also uses .get() to avoid KeyError if the flags exist but don't contain the expected keys.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if package_compression_jobs is not None:
    for package_compression_job in package_compression_jobs:
        flags = package_compression_job.flags
        if flags is None:
            continue

        dataset_name = flags.get("--dataset")
        timestamp_key = flags.get("--timestamp-key")

        # Skip jobs that do not specify both dataset_name and timestamp_key.
        if dataset_name is None or timestamp_key is None:
            continue

        dataset_timestamp_dict[dataset_name] = timestamp_key
🤖 Prompt for AI Agents
In integration-tests/tests/utils/package_utils.py around lines 122 to 133, the
code reads flags = package_compression_job.flags but only assigns dataset_name
and timestamp_key when flags is not None, yet later checks those variables
unconditionally which can raise UnboundLocalError; fix by initializing
dataset_name and timestamp_key to None before the flags check, then if flags is
not None assign dataset_name = flags.get("--dataset") and timestamp_key =
flags.get("--timestamp-key") and keep the existing skip/continue logic so jobs
missing flags or keys are ignored.
def _compare_output_with_ground_truth(output: str, ground_truth_file_path: Path) -> bool:
    with tempfile.TemporaryDirectory() as temporary_output_directory_name:
        # Write output to tempfile.
        output_file_path = Path(temporary_output_directory_name) / "output.tmp"
        output_file_path.write_text(output)

        # Compare tempfile with ground truth.
        return is_dir_tree_content_equal(output_file_path, ground_truth_file_path)
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for the is_dir_tree_content_equal function definition
rg -n "def is_dir_tree_content_equal" --type py -A 20

Repository: y-scope/clp
Length of output: 166
🌐 Web query:
is_dir_tree_content_equal function Python file directory comparison
💡 Result:
Here’s a concise, robust implementation to compare two directory trees by names and file contents (recursively), using Python’s filecmp utilities:
Code:
import os
import filecmp

def is_dir_tree_content_equal(dir1: str, dir2: str) -> bool:
    """Return True if dir trees rooted at dir1 and dir2 have the same
    filenames, subdirectories and file contents (recursively)."""
    dcmp = filecmp.dircmp(dir1, dir2)
    if dcmp.left_only or dcmp.right_only or dcmp.funny_files:
        return False
    # Compare common files by content.
    match, mismatch, errors = filecmp.cmpfiles(dir1, dir2, dcmp.common_files, shallow=False)
    if mismatch or errors:
        return False
    # Recurse into common subdirectories.
    for sub in dcmp.common_dirs:
        if not is_dir_tree_content_equal(os.path.join(dir1, sub), os.path.join(dir2, sub)):
            return False
    return True
Notes:
- Uses shallow=False so file contents are compared (not just stat). See Python’s filecmp docs. [1]
- A similar recursive approach is shown and discussed on Stack Overflow. [2]
Sources:
[1] Python filecmp documentation.
[2] Stack Overflow — recursive directory comparison example.
Function compares single file, but uses directory comparison utility.
_compare_output_with_ground_truth writes output to a single file but uses is_dir_tree_content_equal for comparison. The underlying utility is designed for directory tree comparison using filecmp.dircmp(), which expects directory paths. Comparing a single file path with this function is semantically incorrect and may fail at runtime. Consider either wrapping the file comparison in a directory structure or using a dedicated file comparison function.
🤖 Prompt for AI Agents
In integration-tests/tests/utils/package_utils.py around lines 344 to 351, the
helper writes the output to a single temp file but then calls
is_dir_tree_content_equal (a directory tree comparator), which is semantically
wrong and can fail; change the comparison to use a proper file comparison
instead (e.g., filecmp.cmp or pathlib.read_text() equality) or, if you want to
keep using the directory comparator, create a temporary directory mirroring the
ground-truth layout (place the temp file under the same relative name) before
calling is_dir_tree_content_equal; update the function to perform one of these
fixes and return the boolean result of the correct file-or-directory comparison.
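One minimal fix along the lines suggested: compare the single files directly with `filecmp.cmp` instead of the directory-tree comparator. The function name mirrors the original for illustration; this is a sketch, not the PR's code:

```python
import filecmp
import tempfile
from pathlib import Path


def compare_output_with_ground_truth(output: str, ground_truth_file_path: Path) -> bool:
    """Compare `output` against a single ground-truth file by content."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Write the output to a temp file so both sides of the comparison
        # are on-disk files.
        output_file_path = Path(tmp_dir) / "output.tmp"
        output_file_path.write_text(output)
        # shallow=False forces a byte-by-byte content comparison rather than
        # a stat-based one.
        return filecmp.cmp(output_file_path, ground_truth_file_path, shallow=False)
```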
def run_presto_filter(
    presto_filter_job: PrestoFilterJob,
) -> None:
    """
    Run a Presto filter.

    :param request:
    :param presto_filter:
    :param package_instance:
Docstring parameters don't match function signature.
The docstring references request, presto_filter, and package_instance parameters, but the function only accepts presto_filter_job.
 def run_presto_filter(
     presto_filter_job: PrestoFilterJob,
 ) -> None:
     """
     Run a Presto filter.
-    :param request:
-    :param presto_filter:
-    :param package_instance:
+    :param presto_filter_job: The Presto filter job to execute.
     """

🤖 Prompt for AI Agents
In integration-tests/tests/utils/package_utils.py around lines 354 to 362, the
function run_presto_filter(presto_filter_job) has a docstring listing parameters
request, presto_filter, and package_instance that do not match the actual
signature; update the docstring to document the single parameter
presto_filter_job (and its type/purpose) and remove or rename the incorrect
parameter entries so the docstring accurately reflects the function signature
and expected argument.
Description
Checklist
breaking change.
Validation performed
Summary by CodeRabbit
New Features
Tests