CodeSmart is a Git-centric, local-first codebase intelligence system with:
- Deterministic repository indexing
- Commit/ref/working-tree target resolution with on-demand indexing
- Symbol and semantic label queries
- Deterministic call-edge indexing and call-graph diffing
- Code and dependency diffing across Git targets
- Optional background GLiNER2 entity enrichment (non-blocking)
- MCP server support (`tools`, `resources`, `roots`)

Design stance:
- Git targets are first-class for agent workflows (`working_tree`, `commit`, `ref`).
- Snapshot IDs are internal implementation artifacts; target-based APIs are preferred.
- Short SHA forms are emitted in MCP outputs for token-efficient agent usage.
The latest benchmark run combines quality and performance metrics in one pass.
| Benchmark | Type | Files | Symbols | Full Index Time | Index Throughput | Query p95 | Curated Query Hits | Curated File Hits | Auto Probe Hits |
|---|---|---|---|---|---|---|---|---|---|
| spectral-cortex | Curated | 20 | 190 | 1.13s | 17.8 files/s | 2.07ms | 9/9 | 3/3 | 80/80 |
| pocketmesh | Curated | 30 | 73 | 1.12s | 26.7 files/s | 1.59ms | 10/10 | 3/3 | 36/36 |
| fathom | Curated | 7 | 24 | 1.14s | 6.1 files/s | 3.25ms | 7/7 | 3/3 | 7/7 |
| turbopack | Perf-only | 1330 | 6872 | 4.13s | 322.2 files/s | 2.41ms | N/A | N/A | N/A |
| VS Code | Perf-only | 6618 | 41333 | 26.78s | 247.1 files/s | 1.57ms | N/A | N/A | N/A |

Notes:
- `spectral-cortex`, `pocketmesh`, and `fathom` are used for strict quality gating.
- Large monorepo benchmarks (`turbopack`, VS Code) are used primarily for throughput and latency tracking.
- Full benchmark pipeline docs: `scripts/eval/README.md`.
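
The throughput column appears to be the file count divided by the full index time. The sketch below checks that reading against the rows of the table; the formula is an assumption (the benchmark scripts are not shown here), and the small gaps are consistent with the times being rounded to two decimals.

```python
# Rows copied from the benchmark table:
# (name, files, full index time in seconds, reported throughput in files/s).
ROWS = [
    ("spectral-cortex", 20, 1.13, 17.8),
    ("pocketmesh", 30, 1.12, 26.7),
    ("fathom", 7, 1.14, 6.1),
    ("turbopack", 1330, 4.13, 322.2),
    ("VS Code", 6618, 26.78, 247.1),
]


def throughput(files: int, seconds: float) -> float:
    """Files indexed per second (assumed definition: files / wall-clock time)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return files / seconds


def relative_gap(derived: float, reported: float) -> float:
    """Relative difference between a derived value and the reported one."""
    return abs(derived - reported) / reported


if __name__ == "__main__":
    for name, files, seconds, reported in ROWS:
        derived = throughput(files, seconds)
        print(f"{name}: derived {derived:.1f} files/s, reported {reported} files/s")
```

The three curated repositories all index in roughly 1.1s despite different file counts, so their throughput figures may reflect fixed startup cost more than steady-state indexing speed.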
Commands used:
python3 scripts/eval/clone_and_index_eval_repos.py --mode full --clean-indexes
python3 scripts/eval/run_graph_eval.py --strict

CodeSmart supports asynchronous ML entity enrichment with a GLiNER2 adapter. This runs in the background after normal indexing, so time-to-usable remains fast.
Install real GLiNER2 support with:
python3 -m pip install -e ".[ml]"

- Baseline index stays deterministic and immediately queryable.
- Enrichment augments snapshots with `ml_entities`.
- CLI + MCP can query enrichment outputs without blocking index workflows.
Latest real-model enrichment eval (GLiNER2 enforced):
| Repo | Worker Avg | Exact Phrase Coverage | Path Coverage (symbol-miss probes) | Candidate Coverage (practical lift) |
|---|---|---|---|---|
| spectral-cortex | 1467.96ms | 1/3 (0.333) | 3/3 (1.000) | 3/3 (1.000) |
| pocketmesh | 977.34ms | 0/3 (0.000) | 3/3 (1.000) | 3/3 (1.000) |
| fathom | 4040.95ms | 0/3 (0.000) | 3/3 (1.000) | 3/3 (1.000) |
Interpretation:
- GLiNER2 often returns shorter semantic spans rather than exact long probe phrases.
- Exact phrase coverage can be low while practical file-level candidate lift remains high.
- Use `candidate_coverage` for practical enrichment utility; keep exact phrase coverage as a strict regression signal.
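
To illustrate why the two numbers can diverge, here is a minimal sketch of how exact phrase coverage and candidate coverage might be computed. The `Probe` structure, the normalization, and both metric definitions are assumptions for illustration; the actual logic lives in `scripts/eval/run_enrich_eval.py` and may differ. The probe data is invented.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Probe:
    """One evaluation probe (hypothetical structure)."""
    phrase: str                                   # long probe phrase
    expected_files: Set[str]                      # files a good answer should surface
    entity_texts: List[str] = field(default_factory=list)    # spans the model returned
    candidate_files: Set[str] = field(default_factory=set)   # files reached via those spans


def _norm(text: str) -> str:
    """Lowercase and collapse whitespace so comparison ignores formatting."""
    return " ".join(text.lower().split())


def exact_phrase_coverage(probes: List[Probe]) -> float:
    """Share of probes whose full phrase was returned verbatim as an entity."""
    if not probes:
        return 0.0
    hits = sum(
        1 for p in probes
        if _norm(p.phrase) in {_norm(t) for t in p.entity_texts}
    )
    return hits / len(probes)


def candidate_coverage(probes: List[Probe]) -> float:
    """Share of probes where at least one expected file is among the candidates."""
    if not probes:
        return 0.0
    hits = sum(1 for p in probes if p.expected_files & p.candidate_files)
    return hits / len(probes)


if __name__ == "__main__":
    probes = [
        Probe("retry with exponential backoff", {"net/retry.py"},
              ["retry with exponential backoff"], {"net/retry.py"}),
        Probe("rotate the signing key on startup", {"auth/keys.py"},
              ["signing key"], {"auth/keys.py", "auth/jwt.py"}),
        Probe("flush the write-ahead log before exit", {"db/wal.py"},
              ["write-ahead log"], {"db/wal.py"}),
    ]
    print("exact phrase coverage:", round(exact_phrase_coverage(probes), 3))
    print("candidate coverage:", round(candidate_coverage(probes), 3))
```

In this toy data only one probe is matched verbatim, yet every probe still reaches an expected file through a shorter span, which is the pattern described in the interpretation above.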
Commands used:
python3 scripts/eval/run_enrich_eval.py --repos-root eval-repos --output eval-repos/indexes/enrich-eval-report.json --require-real-model --min-coverage 0.0

Two grounded case studies were run on large real-world repositories to measure fan-out bug-fix recall.
- VS Code study: 5 scenarios, 27 ground-truth files total.
- Turbopack study: 4 scenarios, 38 ground-truth files total.
- CodeSmart symbol-only retrieval: `0/65` recall on these fan-out phrase tasks.
- CodeSmart deterministic baseline (symbol + literal index): `54/65` recall (0.831).
- CodeSmart enriched retrieval: `65/65` recall (1.000).
- Vector-style top-k baseline:
  - VS Code: `20/27` at `k=50` (`0.741` recall).
  - Turbopack: `26/38` at `k=50` (`0.684` recall), `10/38` at `k=20` (`0.263` recall).
Bottom line: for fan-out completeness, deterministic structured + literal indexing closes most of the baseline gap, and GLiNER2 enrichment closes the rest in these studies.
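
The recall figures above are pooled counts: hits summed over scenarios, divided by the total number of ground-truth files (27 + 38 = 65). A minimal sketch of that calculation, including a top-k cutoff like the one applied to the vector-style baseline, might look as follows. The function names and file names are hypothetical, and pooling across scenarios is an assumption inferred from the totals.

```python
from typing import Iterable, Optional, Sequence, Tuple


def recall_hits(
    ranked: Sequence[str],
    truth: Iterable[str],
    k: Optional[int] = None,
) -> Tuple[int, int]:
    """Return (hits, ground-truth size) for one scenario, optionally cut at top-k."""
    truth_set = set(truth)
    top = list(ranked) if k is None else list(ranked)[:k]
    hits = len(truth_set.intersection(top))
    return hits, len(truth_set)


def pooled_recall(per_scenario: Iterable[Tuple[int, int]]) -> Tuple[int, int, float]:
    """Sum hits and ground-truth sizes across scenarios, then divide once."""
    pairs = list(per_scenario)
    hits = sum(h for h, _ in pairs)
    total = sum(t for _, t in pairs)
    return hits, total, (hits / total if total else 0.0)


if __name__ == "__main__":
    # Invented scenarios: (ranked retrieval output, ground-truth files).
    scenarios = [
        (["a.ts", "b.ts", "c.ts", "d.ts"], {"a.ts", "c.ts", "z.ts"}),
        (["m.rs", "n.rs"], {"m.rs", "n.rs"}),
    ]
    for k in (2, None):
        result = pooled_recall(recall_hits(r, t, k) for r, t in scenarios)
        print(f"k={k}: {result}")
```

A tighter cutoff can only lower or preserve recall for a fixed ranking, which matches the drop from `k=50` to `k=20` in the Turbopack row.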
Deterministic literal indexing was evaluated directly against rg-based ground truth across 6 monorepo fan-out probes.
- Symbol-only baseline: `0/54` recall (0.000)
- Hybrid deterministic baseline (`find_symbols + find_text`): `54/54` recall (1.000)
- Hybrid precision: `1.000`
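
As a rough illustration of how the hybrid baseline is scored, the sketch below unions the files returned by a symbol query and a literal text query, then compares that set with a ground-truth set. The helper names and file paths are hypothetical, and how the real eval derives its ground truth from `rg` is not shown here.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], truth: Set[str]) -> Tuple[float, float]:
    """Set-based precision and recall; empty sets score 0.0 by convention."""
    true_positives = len(retrieved & truth)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def hybrid(symbol_files: Set[str], text_files: Set[str]) -> Set[str]:
    """Union of symbol-index hits and literal-index hits."""
    return symbol_files | text_files


if __name__ == "__main__":
    # Invented probe: a UI string that appears in files but in no symbol name.
    truth = {"ui/dialog.ts", "ui/confirm.ts", "i18n/en.json"}
    symbol_files: Set[str] = set()          # symbol lookup finds nothing
    text_files = {"ui/dialog.ts", "ui/confirm.ts", "i18n/en.json"}

    print("symbol-only:", precision_recall(symbol_files, truth))
    print("hybrid:     ", precision_recall(hybrid(symbol_files, text_files), truth))
```

Phrase-style probes rarely match symbol names, so in this toy case the literal index contributes all of the recall while the union adds no false positives.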
Command used:
python3 scripts/eval/run_baseline_literal_eval.py --repos-root eval-repos --indexes-root eval-repos/indexes --output eval-repos/indexes/baseline-literal-eval-report.json

Scope-aware retrieval is now evaluated directly on real phrase-driven fan-out tasks from VS Code and Turbopack.
Latest results (9 tasks):
- recall delta (scoped - unscoped): `+0.111`
- precision delta (scoped - unscoped): `+0.043`
- median candidate reduction: `0.153` (15.3% fewer retrieved files)
Command used:
python3 scripts/eval/run_scope_eval.py --strict

Interpretation:
- Scoping improves ranking quality while reducing candidate-set size.
- `get_stats.available_scopes` is the recommended first-step signal before broad retrieval calls.
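
The three scope metrics can be read as per-task comparisons between a scoped and an unscoped run. The sketch below shows one plausible way to derive them. It assumes the deltas are means of per-task differences and that candidate reduction is `1 - scoped/unscoped` per task; `scripts/eval/run_scope_eval.py` may aggregate differently, and the task data is invented.

```python
from statistics import median
from typing import Dict, FrozenSet, List, NamedTuple


class TaskResult(NamedTuple):
    """Retrieved file sets for one task (hypothetical structure)."""
    truth: FrozenSet[str]
    unscoped: FrozenSet[str]
    scoped: FrozenSet[str]


def precision(retrieved: FrozenSet[str], truth: FrozenSet[str]) -> float:
    return len(retrieved & truth) / len(retrieved) if retrieved else 0.0


def recall(retrieved: FrozenSet[str], truth: FrozenSet[str]) -> float:
    return len(retrieved & truth) / len(truth) if truth else 0.0


def summarize(tasks: List[TaskResult]) -> Dict[str, float]:
    """Mean recall/precision deltas and median candidate-set reduction."""
    if not tasks:
        raise ValueError("no tasks to summarize")
    recall_deltas = [recall(t.scoped, t.truth) - recall(t.unscoped, t.truth) for t in tasks]
    precision_deltas = [
        precision(t.scoped, t.truth) - precision(t.unscoped, t.truth) for t in tasks
    ]
    # Skip tasks with an empty unscoped set to avoid dividing by zero.
    reductions = [1 - len(t.scoped) / len(t.unscoped) for t in tasks if t.unscoped]
    return {
        "recall_delta": sum(recall_deltas) / len(tasks),
        "precision_delta": sum(precision_deltas) / len(tasks),
        "median_candidate_reduction": median(reductions) if reductions else 0.0,
    }


if __name__ == "__main__":
    tasks = [
        TaskResult(frozenset({"a", "b"}), frozenset({"a", "c", "d", "e"}),
                   frozenset({"a", "b", "c"})),
        TaskResult(frozenset({"x"}), frozenset({"x", "y"}), frozenset({"x"})),
    ]
    for key, value in summarize(tasks).items():
        print(f"{key}: {value:+.3f}")
```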
codesmart index --root . --db .codesmart/index.db --json
codesmart stats --db .codesmart/index.db --json
codesmart find-symbol run --db .codesmart/index.db --json
codesmart find-text "This action is irreversible!" --db .codesmart/index.db --mode substring --json
codesmart label security_operation --db .codesmart/index.db --json
codesmart callers <symbol_id> --db .codesmart/index.db --json
codesmart callees <symbol_id> --db .codesmart/index.db --json
codesmart diff <from_snapshot> <to_snapshot> --db .codesmart/index.db --json
codesmart call-diff <from_snapshot> <to_snapshot> --db .codesmart/index.db --json
codesmart snapshot list --db .codesmart/index.db --json
codesmart snapshot show <snapshot_id> --db .codesmart/index.db --json
codesmart snapshot delete <snapshot_id> --db .codesmart/index.db --force --json
codesmart doctor --db .codesmart/index.db --root . --json
codesmart hooks install --root . --db .codesmart/index.db --mode incremental --json
codesmart hooks status --root . --json
codesmart hooks uninstall --root . --json
codesmart enrich run --db .codesmart/index.db --snapshot <snapshot_id> --json
codesmart enrich worker --db .codesmart/index.db --once --json
codesmart ml-entities --db .codesmart/index.db --snapshot <snapshot_id> --limit 50 --json

Run over stdio:
codesmart serve-mcp --root . --db .codesmart/index.db

Run over Streamable HTTP (single endpoint):
codesmart serve-mcp-http --root . --db .codesmart/index.db --host 127.0.0.1 --port 8765 --endpoint /mcp

Implemented methods:
- `initialize`
- `tools/list`
- `tools/call`
- `resources/list`
- `resources/read`
- `resources/templates/list`
- `roots/list`
Current high-value query tools include:
- `resolve_target`, `ensure_indexed`
- `find_symbols`, `find_by_label`, `find_text`, `find_ml_entities`
- `get_callers`, `get_callees`, `analyze_change_impact`
- `diff_code`, `diff_dependencies`, `scan_doc_drift`, `find_doc_drift`
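
MCP messages are JSON-RPC 2.0, so a client talks to either transport by sending request objects for the methods listed above. The sketch below only builds those objects; it does not open a connection. The protocol version string, the client name, and the `query` argument passed to `find_symbols` are assumptions for illustration. The real argument names should be read from each tool's input schema in the `tools/list` response.

```python
import json
from itertools import count
from typing import Any, Dict, Optional

_ids = count(1)  # monotonically increasing JSON-RPC request ids


def rpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request object."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def initialize_request() -> Dict[str, Any]:
    """MCP handshake request; version and client info here are placeholders."""
    return rpc_request("initialize", {
        "protocolVersion": "2025-03-26",   # assumed; the server may negotiate another
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.0.1"},  # hypothetical
    })


def tool_call(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build a tools/call request for a named tool."""
    return rpc_request("tools/call", {"name": name, "arguments": arguments})


if __name__ == "__main__":
    messages = [
        initialize_request(),
        rpc_request("tools/list"),
        tool_call("find_symbols", {"query": "run"}),  # "query" is a hypothetical argument
    ]
    for message in messages:
        # Over stdio, each message would be written as one line of JSON.
        print(json.dumps(message))
```

Over the HTTP transport, the same objects would be POSTed to the configured endpoint (`/mcp` on `127.0.0.1:8765` in the command above).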

Symbol IDs are compact opaque identifiers (for example, `sym_a1b2c3d4e5f67890`).
Treat them as stable handles to pass into `symbol`, `callers`, and `callees`.
`get_stats` is also a primary discovery step for agents and users:
- use `top_labels` to choose high-signal label sweeps
- use `available_scopes` to scope all follow-up queries
- keep output bounded with `label_limit` / `scope_limit` (MCP) or `--label-limit` / `--scope-limit` (CLI)
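
A script can follow the same stats-first pattern through the CLI. The sketch below builds a bounded `codesmart stats` command and picks a few scopes from the parsed output. It assumes the limit flags belong to the `stats` subcommand and take integers, and that the JSON output exposes `available_scopes` as a top-level list; the actual output shape is not documented here, so adjust the parsing to what the command returns.

```python
import json
import subprocess
from typing import Any, Dict, List


def stats_command(
    db: str = ".codesmart/index.db",
    label_limit: int = 10,
    scope_limit: int = 10,
) -> List[str]:
    """Argument vector for a bounded stats call (limit values are examples)."""
    return [
        "codesmart", "stats", "--db", db, "--json",
        "--label-limit", str(label_limit),
        "--scope-limit", str(scope_limit),
    ]


def run_stats(**kwargs: Any) -> Dict[str, Any]:
    """Run the CLI and parse its JSON output; raises if the command fails."""
    proc = subprocess.run(
        stats_command(**kwargs), capture_output=True, text=True, check=True
    )
    return json.loads(proc.stdout)


def pick_scopes(stats: Dict[str, Any], max_scopes: int = 3) -> List[Any]:
    """First few entries of available_scopes (assumed to be a list)."""
    scopes = stats.get("available_scopes") or []
    return list(scopes)[:max_scopes]


if __name__ == "__main__":
    stats = run_stats(label_limit=5, scope_limit=5)
    print("scopes to query first:", pick_scopes(stats))
```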
Deterministic call-graph extraction currently covers:
- Python
- TypeScript / JavaScript
- Go
- Rust
- Java
Run parity checks with:
python3 scripts/eval/run_callgraph_parity_eval.py

uv run lint
uv run test
.venv/bin/python -m ruff check .
python3 -m unittest discover -s tests -p 'test_*.py' -v
python3 scripts/eval/run_callgraph_parity_eval.py

Reference: `docs/ref/CALL_GRAPH_LANGUAGE_PARITY.md`