mrorigo/codesmart

CodeSmart

CodeSmart is a Git-centric, local-first codebase intelligence system with:

  • Deterministic repository indexing
  • Commit/ref/working-tree target resolution with on-demand indexing
  • Symbol and semantic label queries
  • Deterministic call-edge indexing and call-graph diffing
  • Code and dependency diffing across Git targets
  • Optional background GLiNER2 entity enrichment (non-blocking)
  • MCP server support (tools, resources, roots)

Design stance:

  • Git targets are first-class for agent workflows (working_tree, commit, ref).
  • Snapshot IDs are internal implementation artifacts; target-based APIs are preferred.
  • Short SHA forms are emitted in MCP outputs for token-efficient agent usage.

Benchmark Results (Single Snapshot)

The latest benchmark run reports quality and performance metrics from a single pass.

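| Benchmark       | Type      | Files | Symbols | Full Index Time | Index Throughput | Query p95 | Curated Query Hits | Curated File Hits | Auto Probe Hits |
|-----------------|-----------|-------|---------|-----------------|------------------|-----------|--------------------|-------------------|-----------------|
| spectral-cortex | Curated   | 20    | 190     | 1.13s           | 17.8 files/s     | 2.07ms    | 9/9                | 3/3               | 80/80           |
| pocketmesh      | Curated   | 30    | 73      | 1.12s           | 26.7 files/s     | 1.59ms    | 10/10              | 3/3               | 36/36           |
| fathom          | Curated   | 7     | 24      | 1.14s           | 6.1 files/s      | 3.25ms    | 7/7                | 3/3               | 7/7             |
| turbopack       | Perf-only | 1330  | 6872    | 4.13s           | 322.2 files/s    | 2.41ms    | N/A                | N/A               | N/A             |
| VS Code         | Perf-only | 6618  | 41333   | 26.78s          | 247.1 files/s    | 1.57ms    | N/A                | N/A               | N/A             |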

Notes:

  • spectral-cortex, pocketmesh, and fathom are used for strict quality gating.
  • Large monorepo benchmarks (turbopack, VS Code) are used primarily for throughput and latency tracking.
  • Full benchmark pipeline docs: scripts/eval/README.md.
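
The Index Throughput column appears to be files divided by full index time. The sketch below recomputes it from the published figures as a sanity check. The `throughput` helper is illustrative and not part of CodeSmart, and because the published times are rounded, a recomputed value can differ from the table in the last digit.

```python
# Recompute index throughput (files per second) from the published benchmark figures.
# ROWS holds (benchmark, files, full index time in seconds) copied from the results.
# `throughput` is an illustrative helper, not a CodeSmart API.
ROWS = [
    ("spectral-cortex", 20, 1.13),
    ("pocketmesh", 30, 1.12),
    ("fathom", 7, 1.14),
    ("turbopack", 1330, 4.13),
    ("VS Code", 6618, 26.78),
]


def throughput(files: int, seconds: float) -> float:
    """Files indexed per second of full-index wall time."""
    return files / seconds


for name, files, seconds in ROWS:
    print(f"{name}: {throughput(files, seconds):.1f} files/s")
```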

Commands used:

python3 scripts/eval/clone_and_index_eval_repos.py --mode full --clean-indexes
python3 scripts/eval/run_graph_eval.py --strict

GLiNER2 Enrichment

CodeSmart supports asynchronous ML entity enrichment with a GLiNER2 adapter. This runs in the background after normal indexing, so time-to-usable remains fast.

Install real GLiNER2 support with:

python3 -m pip install -e ".[ml]"

  • Baseline index stays deterministic and immediately queryable.
  • Enrichment augments snapshots with ml_entities.
  • CLI + MCP can query enrichment outputs without blocking index workflows.

Latest real-model enrichment eval (GLiNER2 enforced):

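| Repo            | Worker Avg | Exact Phrase Coverage | Path Coverage (symbol-miss probes) | Candidate Coverage (practical lift) |
|-----------------|------------|-----------------------|------------------------------------|-------------------------------------|
| spectral-cortex | 1467.96ms  | 1/3 (0.333)           | 3/3 (1.000)                        | 3/3 (1.000)                         |
| pocketmesh      | 977.34ms   | 0/3 (0.000)           | 3/3 (1.000)                        | 3/3 (1.000)                         |
| fathom          | 4040.95ms  | 0/3 (0.000)           | 3/3 (1.000)                        | 3/3 (1.000)                         |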

Interpretation:

  • GLiNER2 often returns shorter semantic spans rather than exact long probe phrases.
  • Exact phrase coverage can be low while practical file-level candidate lift remains high.
  • Use candidate_coverage for practical enrichment utility; keep exact phrase coverage as a strict regression signal.
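
To make the difference between the two metrics concrete, here is a minimal sketch under assumed definitions: exact phrase coverage counts probes whose full phrase appears verbatim among extracted spans, and candidate coverage counts probes whose expected file appears in the candidate set. These definitions, the function names, and the sample data are assumptions for illustration; the eval script's actual logic may differ.

```python
# Illustrative metric definitions (assumed, not taken from run_enrich_eval.py).

def exact_phrase_coverage(probes, entity_spans):
    """Fraction of probe phrases that appear verbatim among extracted spans."""
    spans = {s.lower() for s in entity_spans}
    hits = sum(1 for p in probes if p.lower() in spans)
    return hits / len(probes)


def candidate_coverage(expected_files, candidate_files):
    """Fraction of probes whose expected file is in that probe's candidate set."""
    hits = sum(
        1
        for probe, path in expected_files.items()
        if path in candidate_files.get(probe, set())
    )
    return hits / len(expected_files)


# Hypothetical probes: the model returns shorter spans than two of the phrases.
probes = ["delete user account permanently", "rotate api key", "retry failed upload"]
spans = ["delete user", "rotate api key", "failed upload"]
expected = {probes[0]: "src/account.py", probes[1]: "src/keys.py", probes[2]: "src/upload.py"}
candidates = {
    probes[0]: {"src/account.py", "src/admin.py"},
    probes[1]: {"src/keys.py"},
    probes[2]: {"src/upload.py", "src/retry.py"},
}

epc = exact_phrase_coverage(probes, spans)
cc = candidate_coverage(expected, candidates)
print(f"exact phrase coverage: {epc:.3f}, candidate coverage: {cc:.3f}")
```

In this made-up data the shorter spans still lead to the right files, which is the pattern the interpretation above describes.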

Commands used:

python3 scripts/eval/run_enrich_eval.py --repos-root eval-repos --output eval-repos/indexes/enrich-eval-report.json --require-real-model --min-coverage 0.0

Case Study Facts (Monorepos)

Two grounded case studies were run on large real-world repositories to measure fan-out bug-fix recall.

  • VS Code study: 5 scenarios, 27 ground-truth files total.
  • Turbopack study: 4 scenarios, 38 ground-truth files total.
  • CodeSmart symbol-only retrieval: 0/65 recall on these fan-out phrase tasks.
  • CodeSmart deterministic baseline (symbol + literal index): 54/65 recall (0.831).
  • CodeSmart enriched retrieval: 65/65 recall (1.000).
  • Vector-style top-k baseline:
    • VS Code: 20/27 at k=50 (0.741 recall).
    • Turbopack: 26/38 at k=50 (0.684 recall), 10/38 at k=20 (0.263 recall).

Bottom line: for fan-out completeness, deterministic structured + literal indexing closes most of the baseline gap, and GLiNER2 enrichment closes the rest in these studies.
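
The recall figures above follow directly from the counts. As a worked check (the `recall` helper is illustrative; all counts are the ones reported in this section):

```python
# Recompute the reported recall values from the case-study counts.
GROUND_TRUTH = {"VS Code": 27, "Turbopack": 38}
TOTAL = sum(GROUND_TRUTH.values())  # 65 ground-truth files across both studies


def recall(found: int, total: int) -> float:
    """Share of ground-truth files that were retrieved."""
    return found / total


print(f"symbol-only:            {recall(0, TOTAL):.3f}")   # 0.000
print(f"deterministic baseline: {recall(54, TOTAL):.3f}")  # 0.831
print(f"enriched:               {recall(65, TOTAL):.3f}")  # 1.000
print(f"vector, VS Code k=50:   {recall(20, 27):.3f}")     # 0.741
print(f"vector, Turbopack k=50: {recall(26, 38):.3f}")     # 0.684
print(f"vector, Turbopack k=20: {recall(10, 38):.3f}")     # 0.263
```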

Baseline Literal Index Eval

Deterministic literal indexing was evaluated directly against rg-based ground truth across 6 monorepo fan-out probes.

  • Symbol-only baseline: 0/54 recall (0.000)
  • Hybrid deterministic baseline (find_symbols + find_text): 54/54 recall (1.000)
  • Hybrid precision: 1.000

Command used:

python3 scripts/eval/run_baseline_literal_eval.py --repos-root eval-repos --indexes-root eval-repos/indexes --output eval-repos/indexes/baseline-literal-eval-report.json
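
Conceptually, the hybrid baseline is the union of symbol hits and literal-text hits, scored against the rg-derived file set. The sketch below shows that scoring with plain sets. The file paths are invented, and how the eval script gathers hits is not shown here, so treat the structure as an assumption.

```python
from __future__ import annotations


def precision_recall(retrieved: set[str], truth: set[str]) -> tuple[float, float]:
    """Return (precision, recall) of a retrieved file set against ground truth."""
    true_positives = len(retrieved & truth)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical probe: a user-facing phrase that is not a symbol name.
truth = {"src/dialogs/confirm.ts", "src/i18n/en.json"}  # e.g. files matched by rg
symbol_hits: set[str] = set()                           # find_symbols finds nothing
text_hits = {"src/dialogs/confirm.ts", "src/i18n/en.json"}  # find_text hits

hybrid = symbol_hits | text_hits
print("symbol-only:", precision_recall(symbol_hits, truth))
print("hybrid:     ", precision_recall(hybrid, truth))
```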

Scope-Guided Retrieval Eval

Scope-aware retrieval is now evaluated directly on real phrase-driven fan-out tasks from VS Code and Turbopack.

Latest results (9 tasks):

  • recall delta (scoped - unscoped): +0.111
  • precision delta (scoped - unscoped): +0.043
  • median candidate reduction: 0.153 (15.3% fewer retrieved files)

Command used:

python3 scripts/eval/run_scope_eval.py --strict

Interpretation:

  • In this run, scoping raised both recall and precision while shrinking the candidate set.
  • get_stats.available_scopes is the recommended first-step signal before broad retrieval calls.
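
As a rough illustration of the candidate-reduction metric, assume it is the relative drop in retrieved files per task, with the median taken across tasks. That definition and the task counts below are assumptions for the example, not values from the eval.

```python
from statistics import median


def candidate_reduction(unscoped_count: int, scoped_count: int) -> float:
    """Relative shrink of the candidate set when a scope is applied."""
    return (unscoped_count - scoped_count) / unscoped_count


# Hypothetical (unscoped, scoped) retrieved-file counts for three tasks.
tasks = [(40, 34), (100, 80), (20, 20)]

reductions = [candidate_reduction(u, s) for u, s in tasks]
median_reduction = median(reductions)
print(f"median candidate reduction: {median_reduction:.3f}")
```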

CLI

codesmart index --root . --db .codesmart/index.db --json
codesmart stats --db .codesmart/index.db --json
codesmart find-symbol run --db .codesmart/index.db --json
codesmart find-text "This action is irreversible!" --db .codesmart/index.db --mode substring --json
codesmart label security_operation --db .codesmart/index.db --json
codesmart callers <symbol_id> --db .codesmart/index.db --json
codesmart callees <symbol_id> --db .codesmart/index.db --json
codesmart diff <from_snapshot> <to_snapshot> --db .codesmart/index.db --json
codesmart call-diff <from_snapshot> <to_snapshot> --db .codesmart/index.db --json
codesmart snapshot list --db .codesmart/index.db --json
codesmart snapshot show <snapshot_id> --db .codesmart/index.db --json
codesmart snapshot delete <snapshot_id> --db .codesmart/index.db --force --json
codesmart doctor --db .codesmart/index.db --root . --json
codesmart hooks install --root . --db .codesmart/index.db --mode incremental --json
codesmart hooks status --root . --json
codesmart hooks uninstall --root . --json
codesmart enrich run --db .codesmart/index.db --snapshot <snapshot_id> --json
codesmart enrich worker --db .codesmart/index.db --once --json
codesmart ml-entities --db .codesmart/index.db --snapshot <snapshot_id> --limit 50 --json
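
Because these commands accept --json, they are straightforward to drive from a script. The helper below is a minimal sketch that assumes each command prints a single JSON document to stdout and exits non-zero on failure; neither behavior is confirmed here. It only covers commands that take --db, and `build_cmd` and `run_codesmart` are illustrative names.

```python
import json
import subprocess

DB = ".codesmart/index.db"


def build_cmd(*args: str, db: str = DB) -> list[str]:
    """Build argv for a codesmart subcommand that takes --db and --json."""
    return ["codesmart", *args, "--db", db, "--json"]


def run_codesmart(*args: str, db: str = DB):
    """Run a subcommand and parse stdout as JSON (assumed output format)."""
    proc = subprocess.run(
        build_cmd(*args, db=db),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(proc.stdout)


# Example (requires codesmart on PATH and an existing index):
#   stats = run_codesmart("stats")
#   symbols = run_codesmart("find-symbol", "run")
print(" ".join(build_cmd("find-symbol", "run")))
```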

MCP

Run over stdio:

codesmart serve-mcp --root . --db .codesmart/index.db

Run over Streamable HTTP (single endpoint):

codesmart serve-mcp-http --root . --db .codesmart/index.db --host 127.0.0.1 --port 8765 --endpoint /mcp

Implemented methods:

  • initialize
  • tools/list
  • tools/call
  • resources/list
  • resources/read
  • resources/templates/list
  • roots/list

Current high-value query tools include:

  • resolve_target, ensure_indexed
  • find_symbols, find_by_label, find_text, find_ml_entities
  • get_callers, get_callees, analyze_change_impact
  • diff_code, diff_dependencies, scan_doc_drift, find_doc_drift

Symbol IDs are compact opaque identifiers (for example, sym_a1b2c3d4e5f67890). Treat them as stable handles to pass into symbol, callers, and callees.

get_stats is also a primary discovery step for agents and users:

  • use top_labels to choose high-signal label sweeps
  • use available_scopes to scope all follow-up queries
  • keep output bounded with label_limit/scope_limit (MCP) or --label-limit/--scope-limit (CLI)

Call-Graph Support

Deterministic call-graph extraction currently covers:

  • Python
  • TypeScript / JavaScript
  • Go
  • Rust
  • Java

Run parity checks with:

python3 scripts/eval/run_callgraph_parity_eval.py
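
For intuition about call-graph diffing, once call edges are extracted deterministically, a diff between two snapshots reduces to set differences over (caller, callee) pairs. The sketch below shows that idea only. The edge representation and function name are assumptions, not CodeSmart's internal format.

```python
from __future__ import annotations

Edge = tuple[str, str]  # (caller, callee); assumed representation


def diff_call_edges(before: set[Edge], after: set[Edge]) -> dict[str, list[Edge]]:
    """Return call edges added and removed between two snapshots, sorted for stable output."""
    return {
        "added": sorted(after - before),
        "removed": sorted(before - after),
    }


# Hypothetical edges from two snapshots of the same module.
old_edges = {("main", "load_config"), ("main", "run")}
new_edges = {("main", "load_config"), ("run", "validate")}

edge_diff = diff_call_edges(old_edges, new_edges)
print(edge_diff)
```

Sorting the output keeps the diff deterministic across runs, in keeping with the deterministic indexing described above.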

Testing

uv run lint
uv run test
.venv/bin/python -m ruff check .
python3 -m unittest discover -s tests -p 'test_*.py' -v

Call-Graph Parity Eval

The parity eval command is listed under Call-Graph Support. Reference:

  • docs/ref/CALL_GRAPH_LANGUAGE_PARITY.md
