feat: add filesystem, thinking, and structured reasoning tools by chhot2u · Pull Request #43 · cocoindex-io/cocoindex-code

chhot2u · 2026-03-09T03:57:52Z

Summary

Add fast filesystem tools: find_files, read_file, write_file, edit_file, grep_code, directory_tree
Add core thinking tools: sequential_thinking, extended_thinking, ultra_thinking, learning_loop, self_improve, reward_thinking
Add structured reasoning tools with effort_mode (low/medium/high) support:
- evidence_tracker — attach typed, weighted evidence to ultra_thinking hypotheses
- premortem — structured pre-failure risk analysis (5 phases)
- inversion_thinking — Munger-style invert-then-reinvert reasoning (6 phases)
- effort_estimator — three-point PERT estimation with confidence intervals

Effort Mode

All 4 new structured reasoning tools support an effort_mode parameter:

| Mode | Behavior |
| --- | --- |
| `low` | Minimal validation, fewer phases, single-point estimates |
| `medium` | Standard depth, full phase support, PERT + 68% CI |
| `high` | Exhaustive analysis, auto-population, 95% CI |
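
To make the three modes concrete, here is a minimal sketch of three-point PERT estimation. It is not the code from this PR: the function name `pert_estimate`, the `PertEstimate` fields, and the mapping of `medium`/`high` to z-scores of 1.0 and 1.96 are assumptions chosen to match the 68% and 95% intervals in the table. Treating `low` as "return the most likely value with no spread" is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PertEstimate:
    expected: float
    std_dev: float
    ci_low: float
    ci_high: float


# Assumed mapping: about 68% of a normal distribution lies within 1 sigma,
# about 95% within 1.96 sigma.
Z_BY_MODE = {"medium": 1.0, "high": 1.96}


def pert_estimate(optimistic: float, likely: float, pessimistic: float,
                  effort_mode: str = "medium") -> PertEstimate:
    """Three-point PERT estimate (hypothetical helper, not the PR's API)."""
    if not optimistic <= likely <= pessimistic:
        raise ValueError("expected optimistic <= likely <= pessimistic")
    if effort_mode == "low":
        # Single-point estimate: report the most likely value, no interval.
        return PertEstimate(likely, 0.0, likely, likely)
    if effort_mode not in Z_BY_MODE:
        raise ValueError(f"unknown effort_mode: {effort_mode!r}")
    # Standard PERT formulas: weighted mean and a range/6 standard deviation.
    expected = (optimistic + 4 * likely + pessimistic) / 6
    std_dev = (pessimistic - optimistic) / 6
    z = Z_BY_MODE[effort_mode]
    return PertEstimate(expected, std_dev,
                        expected - z * std_dev, expected + z * std_dev)
```

For example, `pert_estimate(2, 4, 12)` gives an expected value of 5.0 with a standard deviation of 10/6, so the interval at `medium` is roughly 3.33 to 6.67.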

Testing

159 tests total (53 new for structured reasoning tools, 75 for filesystem, 31 for core thinking)
All passing, ruff-clean, no new dependencies

Commits

01bd322 — fast filesystem tools (find_files, read_file, grep_code, directory_tree)
6c66601 — write_file tool
8286e5d — edit_file tool
4c9c253 — core thinking tools (sequential, extended, ultra, learning loop, RL)
f5f57ec — evidence_tracker, premortem, inversion_thinking, effort_estimator with effort_mode

…rectory_tree)

Add 4 new MCP tools for fast filesystem operations that complement the existing semantic search tool:
- find_files: glob-based file discovery with language/path filters
- read_file: direct file reading with optional line range
- grep_code: regex text search with context lines
- directory_tree: project structure listing

Includes path traversal protection, binary file detection, excluded directory filtering, and 41 new tests covering all tools. All existing tests continue to pass.
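
As an illustration of "regex text search with context lines", the sketch below searches a list of lines and returns each match with its neighbours. The names `grep_lines` and `Match` are hypothetical and the real grep_code tool may shape its results differently; this only shows the technique.

```python
import re
from dataclasses import dataclass


@dataclass
class Match:
    line_number: int   # 1-based, as editors display it
    line: str
    before: list[str]  # up to `context` lines preceding the match
    after: list[str]   # up to `context` lines following the match


def grep_lines(lines: list[str], pattern: str, context: int = 0) -> list[Match]:
    """Return every line matching `pattern`, with surrounding context lines."""
    regex = re.compile(pattern)
    matches = []
    for i, line in enumerate(lines):
        if regex.search(line):
            matches.append(Match(
                line_number=i + 1,
                line=line,
                before=lines[max(0, i - context):i],
                after=lines[i + 1:i + 1 + context],
            ))
    return matches
```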

Adds write_file MCP tool that creates or overwrites files within the codebase root. Features auto-creation of parent directories, 1 MB size limit, path traversal protection, and write-then-read roundtrip safety. Includes 9 new tests (65 total, all passing).
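
A rough sketch of the safety checks described in this commit, under stated assumptions: the function signature, the `MAX_WRITE_BYTES` constant, and the choice of exceptions are illustrative, and "roundtrip safety" is interpreted here as reading the file back and comparing bytes, which may not be what the PR does.

```python
from pathlib import Path

MAX_WRITE_BYTES = 1024 * 1024  # 1 MB limit from the description; exact constant assumed


def write_file(root: Path, rel_path: str, content: str) -> int:
    """Write `content` under `root`, refusing oversized or escaping paths."""
    data = content.encode("utf-8")
    if len(data) > MAX_WRITE_BYTES:
        raise ValueError(f"content exceeds {MAX_WRITE_BYTES} bytes")

    # Resolve both paths so "..", absolute paths, and symlinks cannot escape.
    root = root.resolve()
    target = (root / rel_path).resolve()
    if not target.is_relative_to(root):  # Python 3.9+
        raise PermissionError(f"{rel_path!r} is outside the codebase root")

    target.parent.mkdir(parents=True, exist_ok=True)  # auto-create parents
    target.write_bytes(data)

    # Assumed meaning of "write-then-read roundtrip": verify what landed on disk.
    if target.read_bytes() != data:
        raise OSError("roundtrip verification failed")
    return len(data)
```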

Adds edit_file MCP tool for surgical edits: finds old_string in a file and replaces with new_string. Requires unique match by default (safety), with replace_all option for bulk renames. Supports multiline strings, deletion (replace with empty), and insertion (replace anchor text). Includes 10 new tests (75 total, all passing).
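
The unique-match rule can be sketched as a pure string function. `apply_edit` is a hypothetical name and the error messages are invented; only the behaviour (unique match by default, `replace_all` for bulk edits, empty `new_string` for deletion) follows the description above.

```python
def apply_edit(text: str, old_string: str, new_string: str,
               replace_all: bool = False) -> tuple[str, int]:
    """Replace `old_string` in `text`; return (new_text, replacements made)."""
    if not old_string:
        raise ValueError("old_string must not be empty")

    count = text.count(old_string)  # non-overlapping occurrences
    if count == 0:
        raise ValueError("old_string not found")
    if count > 1 and not replace_all:
        # Refuse ambiguous edits so the wrong occurrence is never changed.
        raise ValueError(f"old_string matches {count} times; "
                         "add context or pass replace_all=True")

    if replace_all:
        return text.replace(old_string, new_string), count
    return text.replace(old_string, new_string, 1), 1
```

Insertion falls out of the same function: pass an anchor as `old_string` and the anchor plus the new text as `new_string`.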

georgeh0 · 2026-03-09T06:10:12Z

Thanks for the PR! Can you elaborate a little on when these tools are needed?

I think most of them are already native capabilities of coding agents like Claude Code. So in which cases do we need the agent to do these through MCP?

…ing loop, RL)

Add 6 new MCP tools for structured reasoning and self-improvement:
- sequential_thinking: step-by-step problem solving with branching/revision
- extended_thinking: deep analysis with automatic checkpoints
- ultra_thinking: phased hypothesis generation, verification, synthesis
- learning_loop: reflect on sessions and extract learnings to JSONL
- self_improve: recommend strategies ranked by historical reward
- reward_thinking: reinforcement learning feedback signals

Includes ThinkingEngine with persistent memory, 31 new tests (119 total, all passing), and ruff-clean code.

chhot2u · 2026-03-09T15:33:41Z

Great question! Here's the breakdown:

Filesystem Tools — Why MCP instead of native agent?

You're right that Claude Code / Cursor etc. have native file ops. But not every MCP client does. cocoindex-code is designed to work with any MCP-compatible client (OpenCode, Continue, Zed, custom agents, etc.). The filesystem tools make it a self-contained codebase assistant — one MCP server gives you search + read + write + grep without depending on the client having those built in.

Also, these tools are codebase-aware by default: they respect .git, node_modules, __pycache__, build/ exclusions, detect binary files, enforce path traversal security, and detect languages — things that raw native file ops don't do.
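
To show what "codebase-aware" could mean in practice, here is a small sketch of a directory walk that prunes excluded directories and skips binary files. The NUL-byte heuristic, the 8 KB sample size, and the names `looks_binary` and `iter_text_files` are assumptions for illustration, not the PR's implementation.

```python
import os
from collections.abc import Iterator
from pathlib import Path

# Directory names taken from the comment above; the real list may be longer.
EXCLUDED_DIRS = {".git", "node_modules", "__pycache__", "build"}


def looks_binary(path: Path, sample_size: int = 8192) -> bool:
    """Heuristic (assumed): a NUL byte in the first bytes means binary."""
    with path.open("rb") as f:
        return b"\0" in f.read(sample_size)


def iter_text_files(root: Path) -> Iterator[Path]:
    """Yield text files under `root` as paths relative to it."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning in place stops os.walk from descending into excluded dirs.
        dirnames[:] = sorted(d for d in dirnames if d not in EXCLUDED_DIRS)
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if not looks_binary(path):
                yield path.relative_to(root)
```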

Example — agent using cocoindex-code to explore an unfamiliar codebase:

Agent: directory_tree(path="", max_depth=2)         → see project structure
Agent: find_files(pattern="*.go", languages=["go"])  → find all Go files
Agent: grep_code(pattern="func main", include="*.go") → find entry points
Agent: read_file(path="cmd/server/main.go", start_line=1, end_line=50) → read the entrypoint
Agent: search(query="how is authentication handled") → semantic search

Thinking Tools — What agents can't do natively

The thinking tools are not about basic chain-of-thought (which LLMs do naturally). They add three things agents can't do on their own:

1. Persistent learning across sessions

Native agents forget everything between conversations. The learning_loop + self_improve tools persist strategy scores to disk (thinking_memory.jsonl). Over time, the agent learns which reasoning approaches work best for this specific codebase.

Example — RL loop over multiple sessions:

Session 1:
  Agent: sequential_thinking(thought="...", session_id="abc", ...) × 5 steps
  Agent: learning_loop(session_id="abc", strategy_used="divide_and_conquer", 
         outcome_tags=["success"], reward=0.9, insights=["Breaking into subproblems worked well"])

Session 2:
  Agent: self_improve(top_k=3) → returns: divide_and_conquer (avg_reward=0.9), ...
  Agent knows to prefer "divide_and_conquer" for similar problems

Session 3 (bad outcome):
  Agent: reward_thinking(session_id="xyz", reward=-0.5)  → strategy scores adjust downward
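
The persistence idea behind this loop can be sketched in a few lines: append one JSON object per session to a JSONL file, then rank strategies by average reward. The record fields and the helper names `record_learning` and `top_strategies` are hypothetical; the actual schema of thinking_memory.jsonl is not shown in this thread.

```python
import json
from collections import defaultdict
from pathlib import Path


def record_learning(memory_file: Path, session_id: str, strategy: str,
                    reward: float, insights: list[str]) -> None:
    """Append one session outcome as a JSON line (field names assumed)."""
    entry = {"session_id": session_id, "strategy": strategy,
             "reward": reward, "insights": insights}
    with memory_file.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def top_strategies(memory_file: Path, top_k: int = 3) -> list[tuple[str, float]]:
    """Return up to `top_k` (strategy, average reward) pairs, best first."""
    rewards: dict[str, list[float]] = defaultdict(list)
    if memory_file.exists():
        with memory_file.open(encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    entry = json.loads(line)
                    rewards[entry["strategy"]].append(entry["reward"])
    averages = [(name, sum(vals) / len(vals)) for name, vals in rewards.items()]
    return sorted(averages, key=lambda pair: pair[1], reverse=True)[:top_k]
```

Because the file is append-only, a negative reward recorded in a later session lowers that strategy's average without rewriting earlier entries.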

2. Structured hypothesis testing (ultra_thinking)

Native LLMs mix exploration and verification in a single stream. ultra_thinking forces a phased approach: explore → hypothesize → verify → synthesize. The server tracks hypotheses and verification status separately.

Example — debugging a complex issue:

Agent: ultra_thinking(thought="Examining the error trace...", phase="explore", 
       thought_number=1, total_thoughts=8, session_id="debug-1", next_thought_needed=True)
Agent: ultra_thinking(thought="Could be a race condition in the queue", phase="hypothesize",
       hypothesis="Race condition in queue.Submit()", thought_number=2, ..., next_thought_needed=True)
Agent: ultra_thinking(thought="Found mutex guard at line 142, but not on the cancel path",
       phase="verify", confidence=0.8, thought_number=3, ..., next_thought_needed=True)
→ returns: verification_status="supported", hypotheses=["Race condition in queue.Submit()"]
Agent: ultra_thinking(thought="The fix is to hold recorderMu through cancellation",
       phase="synthesize", thought_number=4, ..., next_thought_needed=False)
→ returns: synthesis="Synthesis of hypotheses: Race condition in queue.Submit()"
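
A minimal sketch of the server-side bookkeeping this implies: hypotheses and verification status live in session state rather than in the model's text stream. The class name `UltraSession`, the 0.7 confidence threshold, and the "inconclusive" label are assumptions; only the phase names and the shape of the returned strings come from the example above.

```python
from dataclasses import dataclass, field
from typing import Optional

PHASES = ("explore", "hypothesize", "verify", "synthesize")


@dataclass
class UltraSession:
    hypotheses: list[str] = field(default_factory=list)
    verification_status: str = "unverified"

    def step(self, phase: str, thought: str,
             hypothesis: Optional[str] = None,
             confidence: Optional[float] = None) -> dict:
        """Record one thought and return the tracked state for the agent."""
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase!r}")
        result: dict = {"phase": phase}
        if phase == "hypothesize" and hypothesis:
            self.hypotheses.append(hypothesis)
        elif phase == "verify" and confidence is not None:
            # Threshold is an assumption made for this sketch.
            self.verification_status = (
                "supported" if confidence >= 0.7 else "inconclusive")
        elif phase == "synthesize":
            result["synthesis"] = ("Synthesis of hypotheses: "
                                   + "; ".join(self.hypotheses))
        result["hypotheses"] = list(self.hypotheses)
        result["verification_status"] = self.verification_status
        return result
```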

3. Long reasoning with checkpoints (extended_thinking)

In long analyses (50+ steps), native agents tend to lose coherence. extended_thinking emits an automatic checkpoint summary at a configurable interval so the agent can re-anchor on what it has established so far.

Example:

Agent: extended_thinking(thought="Step 5 analysis...", thought_number=5, total_thoughts=20,
       depth_level="exhaustive", checkpoint_interval=5, ...)
→ returns: checkpoint_summary="Checkpoint at step 5: 5 thoughts, 0 branches"
(agent can use this to summarize progress and stay on track)
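
The checkpoint rule itself is simple. A sketch, assuming a checkpoint fires whenever the thought number is a multiple of the interval (the helper name and that trigger condition are assumptions; the summary format copies the example above):

```python
from typing import Optional


def checkpoint_summary(thought_number: int, checkpoint_interval: int,
                       branch_count: int = 0) -> Optional[str]:
    """Return a summary string at checkpoint steps, otherwise None."""
    if checkpoint_interval <= 0:
        raise ValueError("checkpoint_interval must be positive")
    if thought_number % checkpoint_interval != 0:
        return None
    return (f"Checkpoint at step {thought_number}: "
            f"{thought_number} thoughts, {branch_count} branches")
```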

TL;DR

Filesystem tools: make cocoindex-code a complete, standalone MCP server for any client

Thinking tools: add persistent learning, structured hypothesis testing, and checkpoint-based deep reasoning — capabilities that go beyond what stateless LLM agents can do natively

…imator tools

Add 4 new thinking tools with effort_mode (low/medium/high) support:
- evidence_tracker: attach typed, weighted evidence to ultra_thinking hypotheses (code_ref, data_point, external, assumption, test_result)
- premortem: structured pre-failure risk analysis with 5 phases (describe_plan, imagine_failure, identify_causes, rank_risks, mitigate)
- inversion_thinking: Munger-style invert-then-reinvert reasoning with 6 phases (define_goal, invert, list_failure_causes, rank_causes, reinvert, action_plan)
- effort_estimator: three-point PERT estimation with confidence intervals (68% CI at medium, 95% CI at high effort)

Includes 53 new tests (159 total passing), all ruff-clean.
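
One way typed, weighted evidence could be represented is sketched below. Only the five evidence type names come from the commit message. The `Evidence` fields, the 0 to 1 weight range, the `supports` flag, and the `support_score` aggregation are all assumptions made to illustrate the idea, not the tool's actual scoring.

```python
from dataclasses import dataclass

EVIDENCE_TYPES = {"code_ref", "data_point", "external", "assumption", "test_result"}


@dataclass(frozen=True)
class Evidence:
    evidence_type: str
    text: str
    weight: float          # assumed range: 0.0 to 1.0
    supports: bool = True  # False means the evidence contradicts the hypothesis

    def __post_init__(self) -> None:
        if self.evidence_type not in EVIDENCE_TYPES:
            raise ValueError(f"unknown evidence type: {self.evidence_type!r}")
        if not 0.0 <= self.weight <= 1.0:
            raise ValueError("weight must be between 0.0 and 1.0")


def support_score(evidence: list[Evidence]) -> float:
    """Weighted share of supporting evidence; 0.5 when there is none."""
    total = sum(e.weight for e in evidence)
    if total == 0:
        return 0.5
    return sum(e.weight for e in evidence if e.supports) / total
```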

…mode support

Add code_intelligence_tools (find_definition, find_references, list_symbols, code_metrics, rename_symbol, search) and patch_tools (apply_patch, large_write). Extend thinking_tools with plan_optimizer, effort_estimator, inversion_thinking, premortem, and evidence_tracker with configurable effort modes. Register all new tools in server. Add comprehensive tests.

root added 3 commits March 9, 2026 03:57

chhot2u changed the title ~~feat: add fast filesystem tools (find_files, read_file, grep_code, directory_tree)~~ feat: add fast filesystem tools and advanced thinking tools Mar 9, 2026

chhot2u changed the title ~~feat: add fast filesystem tools and advanced thinking tools~~ feat: add filesystem, thinking, and structured reasoning tools Mar 10, 2026

root added 2 commits March 11, 2026 18:02

add session file

3405db5

chhot2u wants to merge 7 commits into cocoindex-io:main from chhot2u:feat/filesystem-tools
