-
Notifications
You must be signed in to change notification settings - Fork 171
feat(cli): add analyze command and LLM context features #254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Replace direct console.log() and chalk imports across CLI commands with centralized output utilities (output(), log(), getChalk()) from output.ts. This ensures --no-color, --debug, and --quiet flags are consistently respected by all commands. Files updated: - stats, dag, lint, run, open, inspect, diff, convert commands - output-renderer.ts, utils/metrics.ts - cli.ts (outputError callback) Added tests verifying --no-color and --quiet behavior in stats, dag, and lint commands.
- Extend `run` command to accept .ipynb, .py (percent/marimo), and .qmd files with automatic conversion to DeepnoteFile before execution - Add `--open` flag to `run` command to open project in Deepnote Cloud after successful execution - Add `--open` flag to `convert` command to open converted .deepnote file in Deepnote Cloud New utilities: - format-converter.ts: resolves and converts any supported format - open-in-cloud.ts: reusable upload and browser open logic Includes 31 new tests covering format conversion, multi-format run, and integration tests for .ipynb, .py, and .qmd execution.
- Add shared analysis utilities (analysis.ts) extracting reusable functions from lint.ts and stats.ts for stats, lint, and DAG analysis - Add new `analyze` command for comprehensive project analysis including quality score, structure info, dependency analysis, and suggestions - Add `--context` flag to run command that includes analysis info (stats, lint issues, variable definitions/usages) in output - Add auto-diagnosis on failure: when JSON/TOON output is enabled and execution fails, automatically include upstream dependencies, related lint issues, and used variables to help LLMs understand failures - Refactor lint.ts and stats.ts to use shared analysis utilities, reducing code duplication by ~550 lines
📝 WalkthroughWalkthroughAdds an analyze CLI command plus a centralized analysis utilities module that computes project stats, lint/DAG issues, failure diagnosis, and suggestions. Refactors lint and stats commands to delegate to the new analysis utilities. Extends run to optionally attach analysis context, auto-diagnose failed blocks, and persist execution snapshots. Adds output-persistence utilities and exports for input block types and SQL env helpers. Adds extensive tests for analysis, persistence, completions, and the new analyze command. Sequence Diagram(s)sequenceDiagram
participant User as User/CLI
participant CLI as CLI Command (analyze)
participant FS as File System
participant Util as Analysis Utilities
participant Output as Output Formatter
User->>CLI: analyze <path> --output json
CLI->>FS: read .deepnote file
FS-->>CLI: file content
CLI->>Util: analyzeProject(file, options)
Util->>Util: computeProjectStats
Util->>Util: checkForIssues (lint + DAG)
Util->>Util: buildBlockMap
Util-->>CLI: AnalysisResult
CLI->>CLI: buildAnalyzeResult (score, structure, suggestions)
CLI->>Output: outputAnalysis(result, format)
Output-->>User: formatted report
sequenceDiagram
participant User as User/CLI
participant CLI as CLI Command (run)
participant FS as File System
participant Executor as Project Executor
participant Util as Analysis Utilities
participant Persist as Snapshot Persistence
User->>CLI: run <path> --context
CLI->>FS: read .deepnote file
FS-->>CLI: file content
CLI->>Executor: executeProject(file)
Executor-->>CLI: execution results (block outputs)
alt context enabled or failures
CLI->>Util: analyzeProject(file)
Util-->>CLI: analysis + DAG
CLI->>Util: diagnoseBlockFailure for failed blocks
Util-->>CLI: failure diagnoses
CLI->>CLI: attach context to RunResult
end
CLI->>Persist: saveExecutionSnapshot(sourcePath, file, outputs, timing)
Persist->>FS: write YAML snapshot
FS-->>Persist: snapshot saved
Persist-->>CLI: snapshot path
CLI-->>User: run result (with optional context & snapshot info)
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #254 +/- ##
==========================================
- Coverage 92.82% 91.98% -0.84%
==========================================
Files 81 84 +3
Lines 4808 5054 +246
Branches 1303 1366 +63
==========================================
+ Hits 4463 4649 +186
- Misses 345 405 +60 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@packages/cli/src/commands/analyze.ts`:
- Around line 331-339: The dependencies block is currently only rendered when
result.dependencies.imports.length > 0, which hides
result.dependencies.missingIntegrations when imports are empty; change the guard
to render the Dependencies section if either result.dependencies.imports.length
> 0 or result.dependencies.missingIntegrations.length > 0, and inside still
print the Imports line only when imports exist and the Missing line only when
missingIntegrations exist (using the same output(...) calls and c.red/c.bold
formatting as before).
In `@packages/cli/src/commands/run.ts`:
- Around line 104-123: The BlockWithContext interface duplicates fields already
defined on BlockResult; update BlockWithContext to extend BlockResult (e.g.,
"interface BlockWithContext extends BlockResult") and remove the duplicated
properties (id, type, label, success, durationMs, outputs) so only the
additional/contextual fields (error?, defines?, uses?, issues?) remain; ensure
the type name BlockResult is imported or in scope where BlockWithContext is
declared.
- Around line 826-844: The mapping over blockResults currently does O(n²) work
because dag.nodes.find(...) and lint.issues.filter(...) are called for every
block; pre-build lookup maps (e.g., nodeById from dag.nodes keyed by id, and
issuesByBlockId grouping lint.issues by blockId) before computing
blocksWithContext, then replace dag.nodes.find(...) with nodeById.get(block.id)
and lint.issues.filter(...) with issuesByBlockId.get(block.id) to produce
defines, uses, and issues (map to {code,message,severity} or undefined) in the
BlockWithContext assembly and assign to result.blocks.
In `@packages/cli/src/utils/analysis.test.ts`:
- Around line 117-130: The test "detects missing integrations" relies on
process.env keys (e.g., SQL_MY_DATABASE / SQL_MY_DB) and can be flaky in CI;
before calling checkForIssues in that test (and the similar one at lines
239-256) capture and clear those env vars, run the test logic (createTestFile +
checkForIssues + assertions), then restore the original env values afterward;
locate the test by its description string and the use of createTestFile and
checkForIssues to add a small save/clear/restore around process.env to make the
test deterministic.
In `@packages/cli/src/utils/analysis.ts`:
- Around line 341-367: The DFS currently returns early which leaves
recursionStack populated and skips other neighbors; modify dfs(nodeId, path) so
it does not return immediately on finding a cycle or when a recursive call
reports one—instead use a local foundCycle boolean, set foundCycle = true when
either dfs(neighbor, ...) returns true or when detecting
recursionStack.has(neighbor), continue iterating remaining neighbors, and only
after the loop call recursionStack.delete(nodeId) and then return foundCycle;
preserve the existing logic that adds nodes to cycleBlocks (use path and
neighbor) but ensure recursionStack cleanup always runs before returning.
- Add vi.unstubAllGlobals() to afterEach in run.test.ts to prevent test leaks - Add debug logging for temp directory cleanup failures in run.ts - Add error handling for parseQuartoFormat, parseMarimoFormat, and parsePercentFormat in format-converter.ts with user-friendly error messages
Add test that verifies openDeepnoteInCloud is called when converting to .deepnote format with --open flag.
- Reset process.exitCode in multi-format integration tests to prevent exit code leakage between test cases - Add error handling for .deepnote file parsing in format-converter, wrapping read/decode/deserialize in try/catch and throwing FileResolutionError for consistency with other formats - Refactor convert.test.ts to use existing simple.deepnote fixture instead of duplicated inline YAML
- Save/restore extglob state when using extended glob patterns in bash completion for run and convert commands - Add extglob and shopt to cspell dictionary - Rename openDeepnoteInCloud to openDeepnoteFileInCloud for clarity - Rename open-in-cloud.ts to open-file-in-cloud.ts
Resolved conflicts in --context flag for run command.
After every deepnote run, outputs are now automatically saved to a
snapshot file at snapshots/{slug}_{projectId}_latest.snapshot.deepnote.
This makes execution outputs persistent and available for later
inspection, which is useful for LLM use cases where outputs need
to be analyzed from previous runs.
- Add output-persistence.ts with mergeOutputsIntoFile and saveExecutionSnapshot
- Update run.ts to save snapshot after every execution
- Snapshot saving is best-effort (errors logged but dont fail run)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@packages/cli/src/utils/output-persistence.test.ts`:
- Around line 159-189: The test expectations hardcode POSIX-style paths which
fail on Windows; update the three tests in this block to wrap expected snapshot
paths with path.resolve (or use resolve imported from 'path') so they match the
getSnapshotPath output on all platforms; locate the tests referencing
getSnapshotPath and createTestFile and replace the hardcoded strings in the
expect(...).toBe(...) assertions with resolve('<same path string>') (or build
the expected path via path.join) to normalize separators across OSes.
In `@packages/cli/src/utils/output-persistence.ts`:
- Around line 58-68: The merge currently spreads the executionCount only when
result.executionCount is non-null, which leaves an existing block.executionCount
intact when the new result omits it; update the merge in the notebook.blocks.map
handler (the block mapping that uses outputsByBlockId.get) to explicitly remove
the stale executionCount when the new result has no executionCount: create a
merged object like const merged = { ...block, outputs: result.outputs }; then if
(result.executionCount != null) set merged.executionCount =
result.executionCount; else delete merged.executionCount; and return merged —
this ensures old executionCount is cleared when the current result omits it.
- Show missing integrations in analyze output even when imports are empty - Have BlockWithContext extend BlockResult to avoid field duplication - Pre-build lookup maps in run.ts for O(n) block context mapping - Isolate env vars in analysis tests for CI determinism - Fix DFS stack cleanup to avoid false circular dependency errors - Use resolve() in tests for cross-platform path handling - Clear stale executionCount when merging outputs into blocks
The example snapshots should be tracked in git as they serve as reference outputs for the example notebooks.
| @@ -0,0 +1,2 @@ | |||
| # Ignore snapshot directories created by tests | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dinohamzic we should probably undo this, commit the snaphots and also use them for tests in ci. But this is out of scope for this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@packages/cli/src/utils/analysis.ts`:
- Around line 696-732: The current triple-quote detection only checks
trimmed.startsWith('"""')/("'''") and misses triple-quoted strings that begin
mid-line; update the logic in the code-block handling (variables: lines,
inMultilineString, multilineDelimiter, trimmed, loc) to detect delimiters
anywhere in the line (use indexOf/includes) and handle opening and closing on
the same line: when not inMultilineString, if a delimiter appears anywhere,
treat it as the start unless the same delimiter also appears later on the same
line (in which case it’s an inline triple-quoted string and should not flip
inMultilineString but still be ignored for LOC); when inMultilineString, check
for the delimiter anywhere to close it and resume counting; ensure
multilineDelimiter is set to the found delimiter ('"""' or "'''") so closing
matches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/cli/src/commands/run.ts (1)
483-489: DuplicategetIntegrationEnvVarName.Same function in
analysis.ts:509-514. Import from there or extract to shared module.
🤖 Fix all issues with AI agents
In `@packages/cli/src/commands/analyze.ts`:
- Around line 164-222: The current calculateLongestChain uses an array and
queue.shift() (O(n)) for BFS; replace it with an O(1) queue implementation —
e.g., keep the same queue array but maintain a head index (increment head
instead of shift) or swap to a deque implementation — and update all places that
read from queue (where queue.shift() is used) to r
4D1C
ead queue[head++] and adjust
emptiness checks to head < queue.length; ensure any places that push use
queue.push(...) and that memory is reclaimed (slice/replace array when head
grows large) to avoid leaks.
In `@packages/cli/src/utils/analysis.ts`:
- Around line 509-514: The getIntegrationEnvVarName function is duplicated;
extract it into a single shared utility and have both modules import/re-export
it: create or add to a common utils module (e.g., export function
getIntegrationEnvVarName) and replace the duplicate implementations in
analysis.ts (function getIntegrationEnvVarName) and run.ts with an import of
that shared symbol, or alternatively export it from one file and import it from
the other to remove duplication; ensure the exported function name and behavior
remain identical and update any imports where getIntegrationEnvVarName is used.
| const { absolutePath } = await resolvePathToDeepnoteFile(path) | ||
|
|
||
| debug('Reading file contents...') | ||
| const rawBytes = await fs.readFile(absolutePath) | ||
| const yamlContent = decodeUtf8NoBom(rawBytes) | ||
|
|
||
| debug('Parsing .deepnote file...') | ||
| const deepnoteFile = deserializeDeepnoteFile(yamlContent) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no action: I'm just making a note to myself to refactor things like this (also eg. error handling, --output flag shared logic, etc.) to single function invocations, as we are duplicating similar logic in multiple commands
I'll dedicate a bit of time to this after this fast PRs cycle slows down a bit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"I'll dedicate a bit of time to this after this fast PRs cycle slows down a bit" I hope it won't slow down :)
| /** | ||
| * Count lines of code in a block's content. | ||
| */ | ||
| function countLinesOfCode(block: DeepnoteBlock): number { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to easily extend ast-analyzer.py (in the reactivity package) to return LOC as well, instead of having to manually compute them like this.
packages/cli/src/utils/analysis.ts
Outdated
| /** | ||
| * Extract imported module names from a code block. | ||
| */ | ||
| function extractImports(block: DeepnoteBlock): string[] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also this, we should see if we can get this from the reactivity lib.
| export async function saveExecutionSnapshot( | ||
| sourcePath: string, | ||
| file: DeepnoteFile, | ||
| blockOutputs: BlockExecutionOutput[], | ||
| timing: ExecutionTiming | ||
| ): Promise<SaveSnapshotResult> { | ||
| // Merge outputs into the file | ||
| const fileWithOutputs = mergeOutputsIntoFile(file, blockOutputs, timing) | ||
|
|
||
| // Split into source and snapshot (we only need the snapshot) | ||
| const { snapshot } = splitDeepnoteFile(fileWithOutputs) | ||
|
|
||
| // Determine snapshot path | ||
| const snapshotDir = getSnapshotDir(sourcePath) | ||
| const slug = slugifyProjectName(file.project.name) || 'project' | ||
| const snapshotFilename = generateSnapshotFilename(slug, file.project.id, 'latest') | ||
| const snapshotPath = resolve(snapshotDir, snapshotFilename) | ||
|
|
||
| // Create snapshot directory if it doesn't exist | ||
| await fs.mkdir(snapshotDir, { recursive: true }) | ||
|
|
||
| // Serialize and write snapshot | ||
| const snapshotYaml = stringify(snapshot) | ||
| await fs.writeFile(snapshotPath, snapshotYaml, 'utf-8') | ||
|
|
||
| debug(`Saved execution snapshot to: ${snapshotPath}`) | ||
|
|
||
| return { snapshotPath } | ||
| } | ||
|
|
||
| /** | ||
| * Gets the path where a snapshot would be saved for a given source file. | ||
| * | ||
| * @param sourcePath - Path to the source file | ||
| * @param file - The DeepnoteFile | ||
| * @returns The snapshot file path | ||
| */ | ||
| export function getSnapshotPath(sourcePath: string, file: DeepnoteFile): string { | ||
| const snapshotDir = getSnapshotDir(sourcePath) | ||
| const slug = slugifyProjectName(file.project.name) || 'project' | ||
| const snapshotFilename = generateSnapshotFilename(slug, file.project.id, 'latest') | ||
| return resolve(snapshotDir, snapshotFilename) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Artmann is snapshot related logic in this file something that we can find in another package?
if not, I suppose it would be good to move this outside of packages/cli, as we will need to reuse it in the extension? (but can be done later)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we have code for writing and reading snapshots in the extension, and we have code for reading them in convert. It would be good to have it together in one place :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Chris. I'm giving this a try but it's a big change. I'll merge this PR as is, and then I'll open a new one with the refactor immediately after.
- Add shell completions for `analyze` command (bash, zsh, fish) - Use c 4E22 onvertToEnvironmentVariableName from @deepnote/blocks to reduce duplication - Export INPUT_BLOCK_TYPES from @deepnote/blocks and use in analysis.ts - Replace inline createTestFile() with fixture in output-persistence tests - Add output-persistence-test.deepnote fixture - Add loadRootFixture helper to fixture-loader Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@packages/cli/src/commands/run.ts`:
- Around line 787-789: Snapshot replacement currently only matches lowercase
extensions; update the snapshot path computation in the block using
convertedFile.wasConverted and absolutePath to handle uppercase extensions by
applying a case-insensitive replace (use the regex with the /i flag) or by
normalizing the extension (e.g., get the extension via
path.extname(absolutePath).toLowerCase() and replace accordingly) so that
snapshotSourcePath is correct for .PY/.IPYNB/.QMD in any case.
- Around line 490-492: Duplicate helper getIntegrationEnvVarName found in run.ts
and analysis.ts; remove the local definition in run.ts and export the shared
function from analysis.ts, then import it where needed. Specifically, in
analysis.ts export getIntegrationEnvVarName (ensure it uses
convertToEnvironmentVariableName from its module scope or import it), and in
run.ts delete the local getIntegrationEnvVarName definition and add an import
for the exported function; update any references to keep the same function name
and ensure TypeScript imports/exports compile.
Remove duplicate getIntegrationEnvVarName from run.ts and export the shared function from analysis.ts instead. This eliminates code duplication while maintaining the same behavior. Co-authored-by: Cursor <cursoragent@cursor.com>
- Use importedModules from DAG analysis instead of separate regex-based extractImports function in analysis.ts - Add linesOfCode to ast-analyzer.py output for each block type - Update stats command to use analyzeProject for accurate imports - Update TypeScript types and Zod schema to include linesOfCode This eliminates code duplication and uses the more accurate AST-based import extraction from the reactivity package. Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/reactivity/src/scripts/ast-analyzer.py (1)
228-412: Align LOC semantics with stats (comments/blank lines).
count_lines_of_codecurrently counts every line, while TS stats exclude comments/docstrings. If this field is surfaced alongside stats, numbers will drift. Consider mirroring the TS rules or renaming the field to clarify intent.🛠️ Suggested alignment with TS LOC rules
-def count_lines_of_code(content): +def count_lines_of_code(block): """ Count lines of code in a block's content. - Returns the total number of lines (including empty lines and comments). + Returns the number of code lines (excludes comments/blank lines). """ - if not content: + content = block.get("content", "") + if not content: return 0 - return len(content.split("\n")) + lines = content.split("\n") + block_type = block.get("type") + + if block_type == "sql": + return sum(1 for line in lines if line.strip() and not line.strip().startswith("--")) + + if block_type in ("code", None): + loc = 0 + in_multiline = False + delimiter = "" + for line in lines: + trimmed = line.strip() + if in_multiline: + if delimiter in trimmed: + in_multiline = False + continue + if trimmed.startswith('"""') or trimmed.startswith("'''"): + delim = '"""' if trimmed.startswith('"""') else "'''" + if trimmed.count(delim) == 1: + in_multiline = True + delimiter = delim + continue + if trimmed and not trimmed.startswith("#"): + loc += 1 + return loc + + return sum(1 for line in lines if line.strip())- content = block.get("content", "") - loc = count_lines_of_code(content) + loc = count_lines_of_code(block) ... - content = block.get("content", "") - loc = count_lines_of_code(content) + loc = count_lines_of_code(block)
analyzecommand for comprehensive project analysis including quality score, structure info, dependency analysis, and suggestions--contextflag to run command that includes analysis info (stats, lint issues, variable definitions/usages) in outputSummary by CodeRabbit
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.