feat: add filesystem, thinking, and structured reasoning tools by chhot2u · Pull Request #43 · cocoindex-io/cocoindex-code · GitHub

feat: add filesystem, thinking, and structured reasoning tools #43

Open
chhot2u wants to merge 7 commits into cocoindex-io:main from chhot2u:feat/filesystem-tools

Conversation

chhot2u commented Mar 9, 2026

Summary

  • Add fast filesystem tools: find_files, read_file, write_file, edit_file, grep_code, directory_tree
  • Add core thinking tools: sequential_thinking, extended_thinking, ultra_thinking, learning_loop, self_improve, reward_thinking
  • Add structured reasoning tools with effort_mode (low/medium/high) support:
    • evidence_tracker — attach typed, weighted evidence to ultra_thinking hypotheses
    • premortem — structured pre-failure risk analysis (5 phases)
    • inversion_thinking — Munger-style invert-then-reinvert reasoning (6 phases)
    • effort_estimator — three-point PERT estimation with confidence intervals

Effort Mode

All 4 new structured reasoning tools support an effort_mode parameter:

Mode     Behavior
low      Minimal validation, fewer phases, single-point estimates
medium   Standard depth, full phase support, PERT + 68% CI
high     Exhaustive analysis, auto-population, 95% CI
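
To make the effort modes concrete, here is a minimal sketch of three-point PERT estimation with a mode-dependent confidence interval. It assumes the textbook PERT formulas (mean = (O + 4M + P) / 6, standard deviation = (P - O) / 6) and z-scores of 1.0 for the 68% interval and 1.96 for the 95% interval. The names `pert_estimate`, `PertEstimate`, and `Z_BY_MODE` are hypothetical; the PR does not show the actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PertEstimate:
    mean: float
    std_dev: float
    ci_low: Optional[float]   # None when no interval is computed
    ci_high: Optional[float]


# Assumed mapping from effort mode to z-score (not taken from the PR).
# "low" yields a single-point estimate, so it has no interval.
Z_BY_MODE = {"low": None, "medium": 1.0, "high": 1.96}


def pert_estimate(optimistic: float, likely: float, pessimistic: float,
                  effort_mode: str = "medium") -> PertEstimate:
    """Three-point PERT estimate with an effort-dependent confidence interval."""
    if effort_mode not in Z_BY_MODE:
        raise ValueError(f"unknown effort_mode: {effort_mode!r}")
    if not optimistic <= likely <= pessimistic:
        raise ValueError("expected optimistic <= likely <= pessimistic")

    # Classic PERT weighting: the most likely value counts four times.
    mean = (optimistic + 4 * likely + pessimistic) / 6
    std_dev = (pessimistic - optimistic) / 6

    z = Z_BY_MODE[effort_mode]
    if z is None:
        return PertEstimate(mean, std_dev, None, None)
    return PertEstimate(mean, std_dev, mean - z * std_dev, mean + z * std_dev)
```

For example, `pert_estimate(2, 4, 12, "medium")` gives a mean of 5.0 with an interval of roughly 3.33 to 6.67, while `"high"` widens the interval around the same mean.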

Testing

  • 159 tests total (53 new for structured reasoning tools, 75 for filesystem, 31 for core thinking)
  • All passing, ruff-clean, no new dependencies

Commits

  1. 01bd322 — fast filesystem tools (find_files, read_file, grep_code, directory_tree)
  2. 6c66601 — write_file tool
  3. 8286e5d — edit_file tool
  4. 4c9c253 — core thinking tools (sequential, extended, ultra, learning loop, RL)
  5. f5f57ec — evidence_tracker, premortem, inversion_thinking, effort_estimator with effort_mode

root added 3 commits March 9, 2026 03:57
…rectory_tree)

Add 4 new MCP tools for fast filesystem operations that complement
the existing semantic search tool:

- find_files: glob-based file discovery with language/path filters
- read_file: direct file reading with optional line range
- grep_code: regex text search with context lines
- directory_tree: project structure listing

Includes path traversal protection, binary file detection, excluded
directory filtering, and 41 new tests covering all tools.
All existing tests continue to pass.

Adds write_file MCP tool that creates or overwrites files within the
codebase root. Features auto-creation of parent directories, 1 MB size
limit, path traversal protection, and write-then-read roundtrip safety.

Includes 9 new tests (65 total, all passing).

Adds edit_file MCP tool for surgical edits: finds old_string in a file
and replaces with new_string. Requires unique match by default (safety),
with replace_all option for bulk renames. Supports multiline strings,
deletion (replace with empty), and insertion (replace anchor text).

Includes 10 new tests (75 total, all passing).
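
The unique-match rule described in this commit message can be sketched as follows. This is an illustration of the stated behavior only, operating on an in-memory string; the function name `apply_edit` and the error messages are assumptions, not the PR's code.

```python
def apply_edit(content: str, old_string: str, new_string: str,
               replace_all: bool = False) -> str:
    """Replace old_string with new_string, requiring a unique match by default."""
    if not old_string:
        raise ValueError("old_string must not be empty")

    count = content.count(old_string)
    if count == 0:
        raise ValueError("old_string not found")
    if count > 1 and not replace_all:
        # Refusing ambiguous edits is the safety property: the caller must
        # either add more surrounding context or opt in to replace_all.
        raise ValueError(
            f"old_string matches {count} times; pass replace_all=True"
        )

    # With a unique match this replaces exactly one occurrence.
    # An empty new_string deletes the matched text.
    return content.replace(old_string, new_string)
```

Insertion falls out of the same function: pass an anchor as `old_string` and the anchor plus the new text as `new_string`.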

georgeh0 (Member) commented Mar 9, 2026

Thanks for the PR! Could you elaborate a bit on when these tools are needed?

I think most of them are already native capabilities of coding agents like Claude Code. In which cases do we need the agent to do these through MCP?

…ing loop, RL)

Add 6 new MCP tools for structured reasoning and self-improvement:

- sequential_thinking: step-by-step problem solving with branching/revision
- extended_thinking: deep analysis with automatic checkpoints
- ultra_thinking: phased hypothesis generation, verification, synthesis
- learning_loop: reflect on sessions and extract learnings to JSONL
- self_improve: recommend strategies ranked by historical reward
- reward_thinking: reinforcement learning feedback signals

Includes ThinkingEngine with persistent memory, 31 new tests (119 total,
all passing), and ruff-clean code.
@chhot2u chhot2u changed the title feat: add fast filesystem tools (find_files, read_file, grep_code, directory_tree) feat: add fast filesystem tools and advanced thinking tools Mar 9, 2026

chhot2u (Author) commented Mar 9, 2026

Great question! Here's the breakdown:

Filesystem Tools — Why MCP instead of native agent?

You're right that Claude Code / Cursor etc. have native file ops. But not every MCP client does. cocoindex-code is designed to work with any MCP-compatible client (OpenCode, Continue, Zed, custom agents, etc.). The filesystem tools make it a self-contained codebase assistant — one MCP server gives you search + read + write + grep without depending on the client having those built in.

Also, these tools are codebase-aware by default: they respect .git, node_modules, __pycache__, build/ exclusions, detect binary files, enforce path traversal security, and detect languages — things that raw native file ops don't do.
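
As a rough illustration of what "codebase-aware" could mean in practice, the sketch below shows one common way to implement the three checks named above: path traversal protection, excluded-directory filtering, and binary detection. The helper names, the NUL-byte heuristic, and the 8 KB sample size are assumptions for illustration; the PR does not show how the tools implement these checks.

```python
from pathlib import Path

# Directory names taken from the comment above.
EXCLUDED_DIRS = {".git", "node_modules", "__pycache__", "build"}


def resolve_inside_root(root: Path, user_path: str) -> Path:
    """Resolve user_path against root, rejecting anything that escapes root."""
    root = root.resolve()
    candidate = (root / user_path).resolve()
    # resolve() collapses ".." segments and follows symlinks, so the check
    # also covers absolute paths and symlinks pointing outside the root.
    if not candidate.is_relative_to(root):  # requires Python 3.9+
        raise PermissionError(f"path escapes codebase root: {user_path}")
    return candidate


def is_excluded(root: Path, path: Path) -> bool:
    """True if any component of path (relative to root) is an excluded dir."""
    relative = path.resolve().relative_to(root.resolve())
    return any(part in EXCLUDED_DIRS for part in relative.parts)


def looks_binary(data: bytes, sample_size: int = 8192) -> bool:
    """Heuristic: treat content with a NUL byte near the start as binary."""
    return b"\x00" in data[:sample_size]
```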

Example — agent using cocoindex-code to explore an unfamiliar codebase:

Agent: directory_tree(path="", max_depth=2)         → see project structure
Agent: find_files(pattern="*.go", languages=["go"])  → find all Go files
Agent: grep_code(pattern="func main", include="*.go") → find entry points
Agent: read_file(path="cmd/server/main.go", start_line=1, end_line=50) → read the entrypoint
Agent: search(query="how is authentication handled") → semantic search

Thinking Tools — What agents can't do natively

The thinking tools are not about basic chain-of-thought (which LLMs do naturally). They add three things agents can't do on their own:

1. Persistent learning across sessions

Native agents forget everything between conversations. The learning_loop + self_improve tools persist strategy scores to disk (thinking_memory.jsonl). Over time, the agent learns which reasoning approaches work best for this specific codebase.

Example — RL loop over multiple sessions:

Session 1:
  Agent: sequential_thinking(thought="...", session_id="abc", ...) × 5 steps
  Agent: learning_loop(session_id="abc", strategy_used="divide_and_conquer", 
         outcome_tags=["success"], reward=0.9, insights=["Breaking into subproblems worked well"])

Session 2:
  Agent: self_improve(top_k=3) → returns: divide_and_conquer (avg_reward=0.9), ...
  Agent knows to prefer "divide_and_conquer" for similar problems

Session 3 (bad outcome):
  Agent: reward_thinking(session_id="xyz", reward=-0.5)  → strategy scores adjust downward
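
The persistence idea behind this loop can be sketched in a few lines: append one JSON record per session to a JSONL file, then rank strategies by average reward when asked for recommendations. The record fields and the function names `record_learning` and `top_strategies` are hypothetical; only the file name thinking_memory.jsonl and the ranking by historical reward come from this thread.

```python
import json
from collections import defaultdict
from pathlib import Path


def record_learning(memory_file: Path, session_id: str,
                    strategy: str, reward: float) -> None:
    """Append one learning record as a single JSON line."""
    entry = {"session_id": session_id, "strategy": strategy, "reward": reward}
    with memory_file.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def top_strategies(memory_file: Path, top_k: int = 3) -> list[tuple[str, float]]:
    """Return up to top_k (strategy, average_reward) pairs, best first."""
    rewards = defaultdict(list)
    if memory_file.exists():
        with memory_file.open(encoding="utf-8") as f:
            for line in f:
                if line.strip():  # tolerate blank lines
                    entry = json.loads(line)
                    rewards[entry["strategy"]].append(entry["reward"])

    averages = [(name, sum(vals) / len(vals)) for name, vals in rewards.items()]
    averages.sort(key=lambda pair: pair[1], reverse=True)
    return averages[:top_k]
```

Because the file is append-only, a negative reward recorded later lowers a strategy's average without rewriting earlier records.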

2. Structured hypothesis testing (ultra_thinking)

Native LLMs mix exploration and verification in a single stream. ultra_thinking forces a phased approach: explore → hypothesize → verify → synthesize. The server tracks hypotheses and verification status separately.

Example — debugging a complex issue:

Agent: ultra_thinking(thought="Examining the error trace...", phase="explore", 
       thought_number=1, total_thoughts=8, session_id="debug-1", next_thought_needed=True)
Agent: ultra_thinking(thought="Could be a race condition in the queue", phase="hypothesize",
       hypothesis="Race condition in queue.Submit()", thought_number=2, ..., next_thought_needed=True)
Agent: ultra_thinking(thought="Found mutex guard at line 142, but not on the cancel path",
       phase="verify", confidence=0.8, thought_number=3, ..., next_thought_needed=True)
→ returns: verification_status="supported", hypotheses=["Race condition in queue.Submit()"]
Agent: ultra_thinking(thought="The fix is to hold recorderMu through cancellation",
       phase="synthesize", thought_number=4, ..., next_thought_needed=False)
→ returns: synthesis="Synthesis of hypotheses: Race condition in queue.Submit()"
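
One way to picture the server-side bookkeeping in this example is a small session object that records hypotheses and verification status per phase. The sketch below is a guess at the shape of that state, not the PR's ThinkingEngine: the class name `UltraSession`, the 0.5 confidence threshold, and the "unverified"/"unsupported" labels are all assumptions. Only the phase names and the two returned strings shown above come from the thread.

```python
from dataclasses import dataclass, field
from typing import Optional

PHASES = ("explore", "hypothesize", "verify", "synthesize")


@dataclass
class UltraSession:
    hypotheses: list[str] = field(default_factory=list)
    verification_status: str = "unverified"   # assumed initial label
    synthesis: Optional[str] = None

    def step(self, phase: str, hypothesis: Optional[str] = None,
             confidence: Optional[float] = None) -> None:
        """Record one thought, updating state according to its phase."""
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase!r}")

        if phase == "hypothesize" and hypothesis:
            self.hypotheses.append(hypothesis)
        elif phase == "verify" and confidence is not None:
            # Assumed rule: confidence of 0.5 or more counts as support.
            self.verification_status = (
                "supported" if confidence >= 0.5 else "unsupported"
            )
        elif phase == "synthesize":
            self.synthesis = (
                "Synthesis of hypotheses: " + "; ".join(self.hypotheses)
            )
```

Keeping hypotheses and verification status in separate fields is what lets the server report them independently of the free-text thoughts.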

3. Long reasoning with checkpoints (extended_thinking)

For deep analysis (50+ steps), native agents lose coherence. extended_thinking provides automatic checkpoint summaries at configurable intervals so the agent can re-anchor.

Example:

Agent: extended_thinking(thought="Step 5 analysis...", thought_number=5,
       total_thoughts=20, depth_level="exhaustive", checkpoint_interval=5, ...)
→ returns: checkpoint_summary="Checkpoint at step 5: 5 thoughts, 0 branches"
  (agent can use this to summarize progress and stay on track)
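
The checkpoint behavior in this example amounts to emitting a summary whenever the step number is a multiple of the interval. A minimal sketch, assuming that rule and reusing the summary wording from the returned value above (the function name `checkpoint_summary` and its signature are hypothetical):

```python
from typing import Optional


def checkpoint_summary(thought_number: int, checkpoint_interval: int,
                       branch_count: int = 0) -> Optional[str]:
    """Return a summary string at every interval boundary, else None."""
    if checkpoint_interval <= 0:
        raise ValueError("checkpoint_interval must be positive")
    if thought_number % checkpoint_interval != 0:
        return None  # not a checkpoint step
    return (
        f"Checkpoint at step {thought_number}: "
        f"{thought_number} thoughts, {branch_count} branches"
    )
```

With `checkpoint_interval=5`, steps 5, 10, 15, and 20 of a 20-step session would each produce a summary; all other steps return nothing.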

TL;DR

  • Filesystem tools: make cocoindex-code a complete, standalone MCP server for any client
  • Thinking tools: add persistent learning, structured hypothesis testing, and checkpoint-based deep reasoning — capabilities that go beyond what stateless LLM agents can do natively

…imator tools

Add 4 new thinking tools with effort_mode (low/medium/high) support:

- evidence_tracker: attach typed, weighted evidence to ultra_thinking
  hypotheses (code_ref, data_point, external, assumption, test_result)
- premortem: structured pre-failure risk analysis with 5 phases
  (describe_plan, imagine_failure, identify_causes, rank_risks, mitigate)
- inversion_thinking: Munger-style invert-then-reinvert reasoning with
  6 phases (define_goal, invert, list_failure_causes, rank_causes,
  reinvert, action_plan)
- effort_estimator: three-point PERT estimation with confidence intervals
  (68% CI at medium, 95% CI at high effort)

Includes 53 new tests (159 total passing), all ruff-clean.
@chhot2u chhot2u changed the title feat: add fast filesystem tools and advanced thinking tools feat: add filesystem, thinking, and structured reasoning tools Mar 10, 2026
root added 2 commits March 11, 2026 18:02
…mode support

Add code_intelligence_tools (find_definition, find_references, list_symbols,
code_metrics, rename_symbol, search) and patch_tools (apply_patch, large_write).
Extend thinking_tools with plan_optimizer, effort_estimator, inversion_thinking,
premortem, and evidence_tracker with configurable effort modes. Register all new
tools in server. Add comprehensive tests.

2 participants
