feat: agent core loop — history, caching, compaction, trust#43
Merged
Conversation
…mpaction, mail trust Implements AGENT-CORE-LOOP spec (all 4 sections): 1. Conversation accumulation: full message history across tool turns. Previously only sent latest tool results — model lost context. Now the complete messages array carries through the entire loop. 2. Prompt caching: Anthropic cache_control breakpoints on system prompt and last tool definition. Cache metrics (read/write tokens) tracked in CompletionResponse for all providers. 3. Cache-safe compaction: compact() method reuses identical system prompt + tools for cache hits. Pre-compaction flush mandatory. Compaction summary injected into subsequent conversations. 4. Mail trust & capability scoping: - Three trust levels: user, internal, external - External mail wrapped in UNTRUSTED_CONTENT markers - exec tool removed for external trust - write/edit restricted to scratch/ for external trust - Untrusted content preamble added to system prompt - Tool specs sorted alphabetically for cache determinism Also: - PANIC_MAX_TURNS (20) as provider-level safety net - maxToolTurns config (default 12, was hardcoded 8) - rawAssistantMessage preserved for Anthropic tool_use compliance - TrustLevel type added E2E tested: 3-step tool chain (write → read → write) with qwen3:8b on Ollama. All steps complete with full context.
- Add four-layer isolation threat model - Add trust boundary documentation - Catalog all K&S findings (S33-A through S43-D) - Document security testing approach and periodic review cadence - Track resolved vs open findings with test references
S43-A: Internal mail drops exec — only user (human) gets full tools.
Prevents lateral movement if one agent is compromised.
S43-B: Compaction wraps history in <conversation_history> XML tags.
Instruction placed outside tags with explicit ignore directive.
S43-C: validateRawAssistant() checks structure before appending to
history (text/tool_use for Anthropic, string+tool_calls for OpenAI).
S43-D: Scratch path check uses resolve() + startsWith() instead of
path.includes('scratch/'). Blocks scratch/../../ traversal.
Adds 5 regression tests in packages/agent/test/security/mail-trust.test.ts.
Updates SECURITY.md findings catalog with fix status and test references.
Threat model and findings catalog moved to ops repo (shared/security/THREAT-MODEL.md). SECURITY.md in a public repo should only contain the vulnerability reporting policy.
Body could tell the model to ignore what comes after, bypassing the safety framing. Instruction now comes first — model reads the warning before encountering untrusted data.
heskew
approved these changes
Feb 27, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Implements the full AGENT-CORE-LOOP spec (4 sections):
1. Conversation Accumulation
Full message history preserved across tool turns. Before: model only saw latest tool results. Now: complete
messagesarray carries through the entire loop.E2E tested: 3-step tool chain (write → read → write) with qwen3:8b — all steps complete with full context.
2. Prompt Caching
cache_control: ephemeralon system prompt + last tool definitioncacheReadTokens,cacheWriteTokens) tracked for all 4 providers3. Cache-Safe Compaction
compact()reuses identical system prompt + tools for cache hits4. Mail Trust & Capability Scoping
user,internal,external<<<UNTRUSTED_CONTENT>>>markersexectool removed entirely for external trustwrite/editrestricted toscratch/for external trustPANIC_MAX_TURNS = 20as absolute safety netFiles Changed
packages/agent/src/runtime/types.ts— TrustLevel, cache fields, maxToolTurns, rawAssistantMessagepackages/agent/src/runtime/event-loop.ts— complete rewrite (conversation loop, trust, compaction)packages/agent/src/llm/provider.ts— cache breakpoints, raw messages, cache metrics408 tests pass (41 agent + 367 CLI). Sherlock review requested.