feat: agent core loop — history, caching, compaction, trust by tps-flint · Pull Request #43 · tpsdev-ai/cli

tps-flint · 2026-02-27T08:33:38Z

Implements the full AGENT-CORE-LOOP spec (4 sections):

1. Conversation Accumulation

Full message history preserved across tool turns. Before: model only saw latest tool results. Now: complete messages array carries through the entire loop.

E2E tested: 3-step tool chain (write → read → write) with qwen3:8b — all steps complete with full context.

2. Prompt Caching

Anthropic: cache_control: ephemeral on system prompt + last tool definition
Cache metrics (cacheReadTokens, cacheWriteTokens) tracked for all 4 providers
Tool specs sorted alphabetically for deterministic ordering (cache stability)

3. Cache-Safe Compaction

compact() reuses identical system prompt + tools for cache hits
Pre-compaction memory flush is mandatory
Compaction summary injected into subsequent conversations

4. Mail Trust & Capability Scoping

Three trust levels: user, internal, external
External mail wrapped in <<<UNTRUSTED_CONTENT>>> markers
exec tool removed entirely for external trust
write/edit restricted to scratch/ for external trust
System prompt includes untrusted content preamble
PANIC_MAX_TURNS = 20 as absolute safety net

Files Changed

packages/agent/src/runtime/types.ts — TrustLevel, cache fields, maxToolTurns, rawAssistantMessage
packages/agent/src/runtime/event-loop.ts — complete rewrite (conversation loop, trust, compaction)
packages/agent/src/llm/provider.ts — cache breakpoints, raw messages, cache metrics

408 tests pass (41 agent + 367 CLI). Sherlock review requested.

…mpaction, mail trust Implements AGENT-CORE-LOOP spec (all 4 sections): 1. Conversation accumulation: full message history across tool turns. Previously only sent latest tool results — model lost context. Now the complete messages array carries through the entire loop. 2. Prompt caching: Anthropic cache_control breakpoints on system prompt and last tool definition. Cache metrics (read/write tokens) tracked in CompletionResponse for all providers. 3. Cache-safe compaction: compact() method reuses identical system prompt + tools for cache hits. Pre-compaction flush mandatory. Compaction summary injected into subsequent conversations. 4. Mail trust & capability scoping: - Three trust levels: user, internal, external - External mail wrapped in UNTRUSTED_CONTENT markers - exec tool removed for external trust - write/edit restricted to scratch/ for external trust - Untrusted content preamble added to system prompt - Tool specs sorted alphabetically for cache determinism Also: - PANIC_MAX_TURNS (20) as provider-level safety net - maxToolTurns config (default 12, was hardcoded 8) - rawAssistantMessage preserved for Anthropic tool_use compliance - TrustLevel type added E2E tested: 3-step tool chain (write → read → write) with qwen3:8b on Ollama. All steps complete with full context.

- Add four-layer isolation threat model - Add trust boundary documentation - Catalog all K&S findings (S33-A through S43-D) - Document security testing approach and periodic review cadence - Track resolved vs open findings with test references

S43-A: Internal mail drops exec — only user (human) gets full tools. Prevents lateral movement if one agent is compromised. S43-B: Compaction wraps history in <conversation_history> XML tags. Instruction placed outside tags with explicit ignore directive. S43-C: validateRawAssistant() checks structure before appending to history (text/tool_use for Anthropic, string+tool_calls for OpenAI). S43-D: Scratch path check uses resolve() + startsWith() instead of path.includes('scratch/'). Blocks scratch/../../ traversal. Adds 5 regression tests in packages/agent/test/security/mail-trust.test.ts. Updates SECURITY.md findings catalog with fix status and test references.

Threat model and findings catalog moved to ops repo (shared/security/THREAT-MODEL.md). SECURITY.md in a public repo should only contain the vulnerability reporting policy.

Body could tell the model to ignore what comes after, bypassing the safety framing. Instruction now comes first — model reads the warning before encountering untrusted data.

tps-flint added 5 commits February 27, 2026 00:32

revert SECURITY.md to standard vuln reporting policy

c55fe9a

Threat model and findings catalog moved to ops repo (shared/security/THREAT-MODEL.md). SECURITY.md in a public repo should only contain the vulnerability reporting policy.

security: move safety instruction before untrusted content

44c6972

Body could tell the model to ignore what comes after, bypassing the safety framing. Instruction now comes first — model reads the warning before encountering untrusted data.

heskew approved these changes Feb 27, 2026

View reviewed changes

heskew merged commit 9c361d9 into main Feb 27, 2026
10 checks passed

heskew deleted the agent-core-loop branch February 27, 2026 09:24

tps-flint mentioned this pull request Feb 27, 2026

v0.5.4 #45

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: agent core loop — history, caching, compaction, trust#43

feat: agent core loop — history, caching, compaction, trust#43
heskew merged 5 commits intomainfrom
agent-core-loop

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants