8000 feat: agent core loop — history, caching, compaction, trust by tps-flint · Pull Request #43 · tpsdev-ai/cli · GitHub
[go: up one dir, main page]

Skip to content

feat: agent core loop — history, caching, compaction, trust#43

Merged
heskew merged 5 commits intomainfrom
agent-core-loop
Feb 27, 2026
Merged

feat: agent core loop — history, caching, compaction, trust#43
heskew merged 5 commits intomainfrom
agent-core-loop

Conversation

@tps-flint
Copy link
Contributor

Implements the full AGENT-CORE-LOOP spec (4 sections):

1. Conversation Accumulation

Full message history preserved across tool turns. Before: model only saw latest tool results. Now: complete messages array carries through the entire loop.

E2E tested: 3-step tool chain (write → read → write) with qwen3:8b — all steps complete with full context.

2. Prompt Caching

  • Anthropic: cache_control: ephemeral on system prompt + last tool definition
  • Cache metrics (cacheReadTokens, cacheWriteTokens) tracked for all 4 providers
  • Tool specs sorted alphabetically for deterministic ordering (cache stability)

3. Cache-Safe Compaction

  • compact() reuses identical system prompt + tools for cache hits
  • Pre-compaction memory flush is mandatory
  • Compaction summary injected into subsequent conversations

4. Mail Trust & Capability Scoping

  • Three trust levels: user, internal, external
  • External mail wrapped in <<<UNTRUSTED_CONTENT>>> markers
  • exec tool removed entirely for external trust
  • write/edit restricted to scratch/ for external trust
  • System prompt includes untrusted content preamble
  • PANIC_MAX_TURNS = 20 as absolute safety net

Files Changed

  • packages/agent/src/runtime/types.ts — TrustLevel, cache fields, maxToolTurns, rawAssistantMessage
  • packages/agent/src/runtime/event-loop.ts — complete rewrite (conversation loop, trust, compaction)
  • packages/agent/src/llm/provider.ts — cache breakpoints, raw messages, cache metrics

408 tests pass (41 agent + 367 CLI). Sherlock review requested.

…mpaction, mail trust

Implements AGENT-CORE-LOOP spec (all 4 sections):

1. Conversation accumulation: full message history across tool turns.
   Previously only sent latest tool results — model lost context.
   Now the complete messages array carries through the entire loop.

2. Prompt caching: Anthropic cache_control breakpoints on system
   prompt and last tool definition. Cache metrics (read/write tokens)
   tracked in CompletionResponse for all providers.

3. Cache-safe compaction: compact() method reuses identical system
   prompt + tools for cache hits. Pre-compaction flush mandatory.
   Compaction summary injected into subsequent conversations.

4. Mail trust & capability scoping:
   - Three trust levels: user, internal, external
   - External mail wrapped in UNTRUSTED_CONTENT markers
   - exec tool removed for external trust
   - write/edit restricted to scratch/ for external trust
   - Untrusted content preamble added to system prompt
   - Tool specs sorted alphabetically for cache determinism

Also:
- PANIC_MAX_TURNS (20) as provider-level safety net
- maxToolTurns config (default 12, was hardcoded 8)
- rawAssistantMessage preserved for Anthropic tool_use compliance
- TrustLevel type added

E2E tested: 3-step tool chain (write → read → write) with qwen3:8b
on Ollama. All steps complete with full context.
- Add four-layer isolation threat model
- Add trust boundary documentation
- Catalog all K&S findings (S33-A through S43-D)
- Document security testing approach and periodic review cadence
- Track resolved vs open findings with test references
S43-A: Internal mail drops exec — only user (human) gets full tools.
       Prevents lateral movement if one agent is compromised.
S43-B: Compaction wraps history in <conversation_history> XML tags.
       Instruction placed outside tags with explicit ignore directive.
S43-C: validateRawAssistant() checks structure before appending to
       history (text/tool_use for Anthropic, string+tool_calls for OpenAI).
S43-D: Scratch path check uses resolve() + startsWith() instead of
       path.includes('scratch/'). Blocks scratch/../../ traversal.

Adds 5 regression tests in packages/agent/test/security/mail-trust.test.ts.
Updates SECURITY.md findings catalog with fix status and test references.
Threat model and findings catalog moved to ops repo (shared/security/THREAT-MODEL.md).
SECURITY.md in a public repo should only contain the vulnerability reporting policy.
Body could tell the model to ignore what comes after, bypassing
the safety framing. Instruction now comes first — model reads the
warning before encountering untrusted data.
@heskew heskew merged commit 9c361d9 into main Feb 27, 2026
10 checks passed
@heskew heskew deleted the agent-core-loop branch February 27, 2026 09:24
@tps-flint tps-flint mentioned this pull request Feb 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

0