This project is in Alpha: it may introduce breaking changes and is not production ready.
CortX AI is a local-first intelligent automation platform designed to turn natural language into reliable, structured system behaviour.
Rather than being a single model, service, or workflow engine, CortX is an orchestration layer that connects language understanding, structured reasoning, and tool execution into a cohesive system. Its purpose is simple: allow humans to describe what they want, while CortX AI determines how to accomplish it safely and deterministically.
At its core, CortX AI is built around a clear principle: language should be an interface, not the system itself.
Large language models are powerful interpreters of intent, but they are not inherently reliable decision engines. CortX AI separates interpretation from execution, using structured routing, deterministic validation, and controlled tool interfaces to convert ambiguous human input into predictable outcomes.
The platform is designed to run locally and privately, allowing individuals, engineers, and organisations to build intelligent systems without depending on external APIs or opaque infrastructure. Every component, from the language models to the orchestration layer, is intended to be deployable within environments you control.
Over time, CortX AI aims to evolve into a foundation for building intelligent software systems where:
- Natural language becomes a first-class interface
- Automation remains transparent and debuggable
- AI behaviour is observable and auditable
- Tools and services can be safely composed and extended
- Systems remain local-first, modular, and developer-friendly
- **Understand intent** – classifies every request as `execution`, `planning`, `analysis`, or `ambiguous` using a local LLM, with deterministic prefix checks for common patterns.
- **Route deterministically** – maps intent to the correct execution path using a pure Python dict, not another LLM.
- **Generate structured responses** – selects an intent-aware prompt template, calls the worker LLM, and returns a response tailored to the type of request.
- **Execute tools** – agents return structured JSON specifying either a direct reply or a tool to run; the ToolExecutor carries out the action safely.
- **Read files** – the built-in `read_file` tool reads any local file by path and returns its contents.
- **Observe everything** – every request gets a unique ID; every step emits a structured `event=<name> key=value` log, including `event=pipeline_selected`.
- **Run configurable pipelines** – the `PipelineRegistry` holds named `PipelineDefinition` objects; the `PipelineRunner` dynamically resolves components from the definition at runtime.
- **Load components as modules** – classifiers, routers, workers, and tools are registered dynamically at startup via the module loader, with signature validation and lifecycle events.
- **Fail gracefully** – classifier failures, worker failures, tool lookup errors, and tool exceptions all produce safe fallback responses, never unhandled 500s.
COREtex v0.4 is structured as a runtime platform with three layers:
coretex/ – Runtime platform
  runtime/ – PipelineRunner, PipelineDefinition, PipelineStep, ToolExecutor, ModuleLoader, ExecutionContext, EventBus
  interfaces/ – ABCs: Classifier, Router, Worker, ModelProvider
  registry/ – ToolRegistry, ModuleRegistry, ModelProviderRegistry, PipelineRegistry
  config/ – Settings
modules/ – Components implementing interfaces, registered at startup
  classifier_basic/ – Intent classifier (prefix checks + LLM)
  router_simple/ – Deterministic dict-based router
  worker_llm/ – LLM response generator
  tools_filesystem/ – read_file tool
  model_provider_ollama/ – Ollama inference backend
distributions/
  cortx/ – FastAPI ingress + OpenWebUI integration
docs/ – Runtime, module development, and distributions guides
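Modules implement the runtime's ABCs and are discovered at startup. The real interfaces live in `coretex/interfaces/` and `docs/module_development.md`; the sketch below is purely illustrative (the `Classifier` ABC shown here, its `classify` signature, and the prefix table are all assumptions, not the project's actual API) and shows only the spirit of a deterministic fast path in front of an LLM:

```python
from abc import ABC, abstractmethod

# Hypothetical stand-in for the Classifier ABC in coretex/interfaces/.
# The real interface may use different names and signatures.
class Classifier(ABC):
    @abstractmethod
    def classify(self, text: str) -> str:
        """Return one of: execution, planning, analysis, ambiguous."""

class PrefixClassifier(Classifier):
    """Deterministic prefix checks, in the spirit of classifier_basic's
    fast path. The prefixes below are illustrative, not the real table."""

    PREFIXES = {
        "read ": "execution",
        "plan ": "planning",
        "compare ": "analysis",
    }

    def classify(self, text: str) -> str:
        lowered = text.lower()
        for prefix, intent in self.PREFIXES.items():
            if lowered.startswith(prefix):
                return intent
        # A real module would fall through to the LLM here.
        return "ambiguous"
```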
Pipelines are now first-class objects. The PipelineRegistry holds named PipelineDefinition instances, each describing an ordered sequence of PipelineStep objects. The PipelineRunner reads the definition at startup to determine which named components to use.
from coretex.runtime.pipeline import PipelineDefinition, PipelineStep
my_pipeline = PipelineDefinition(
name="my_pipeline",
steps=[
PipelineStep(component_type="classifier", name="classifier_basic"),
PipelineStep(component_type="router", name="router_simple"),
PipelineStep(component_type="worker", name="worker_llm"),
PipelineStep(component_type="tool_executor", name="tool_executor"),
],
)
The default pipeline ("default") preserves the pre-v0.4.0 behaviour exactly.
User (browser)
  └─▶ OpenWebUI (port 3000)
        └─▶ POST /v1/chat/completions (cortx, port 8000)
              └─▶ POST /ingest (internal orchestration via PipelineRunner)
                    │   pipeline_selected log (pipeline=default)
                    ├─▶ Classifier → LLM call 1/2 → ClassificationResult
                    ├─▶ Router → pure Python dict lookup → handler
                    └─▶ Worker → LLM call 2/2 → JSON action envelope
                              ↓
                        Action Parser
                              ↓
                        Tool Executor → Tool Result
- **Classifier** – calls Ollama, returns one of `execution | planning | analysis | ambiguous`. Deterministic prefix checks short-circuit common patterns before any LLM call.
- **Router** – a Python dict. Given the same intent, always returns the same handler. No LLM involved.
- **Worker** – selects an intent-aware prompt template, calls Ollama, and returns a JSON action envelope.
- **Action Parser** – parses the agent's JSON output into a typed `AgentAction`.
- **Tool Executor** – the only component that can run tools. Looks up the tool by name in the `ToolRegistry` and calls it deterministically. Agents never execute tools directly.
- **OpenWebUI** – UI only (`ENABLE_OLLAMA_API=false`). It cannot bypass the pipeline.
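Because the router is a plain mapping, its entire behaviour fits in a few lines. This sketch assumes the routing table exposed by `/debug/routes`; the function name and the unknown-intent fallback are illustrative:

```python
# Deterministic intent -> handler routing, modelled on router_simple.
# The table mirrors the documented /debug/routes output.
ROUTES = {
    "execution": "worker",
    "planning": "worker",
    "analysis": "worker",
    "ambiguous": "clarify",
}

def route(intent: str) -> str:
    # Same intent in, same handler out -- no LLM, no randomness.
    # Unknown intents fall back to clarification rather than failing.
    return ROUTES.get(intent, "clarify")
```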
Ollama runs on the host machine, not in Docker. The container reaches it via host.docker.internal:11434.
Agents (the worker LLM) must return strict JSON. Two formats are supported:
Direct reply:
{"action": "respond", "content": "Here is the answer."}
Tool call:
{"action": "tool", "tool": "read_file", "args": {"path": "notes.md"}}
If the LLM returns plain text instead of JSON, COREtex gracefully falls back to treating it as a direct response.
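That fallback behaviour can be sketched as follows. The two JSON formats match the ones documented above; the `AgentAction` dataclass and `parse_agent_output` function are illustrative stand-ins, not the project's actual types:

```python
import json
from dataclasses import dataclass, field

@dataclass
class AgentAction:
    """Illustrative stand-in for COREtex's typed AgentAction."""
    action: str                            # "respond" or "tool"
    content: str = ""                      # reply text when action == "respond"
    tool: str = ""                         # tool name when action == "tool"
    args: dict = field(default_factory=dict)

def parse_agent_output(raw: str) -> AgentAction:
    """Parse the agent's JSON envelope, falling back to a direct reply."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Plain text degrades gracefully to a direct response.
        return AgentAction(action="respond", content=raw)
    if isinstance(data, dict) and data.get("action") == "tool":
        return AgentAction(action="tool", tool=data.get("tool", ""),
                           args=data.get("args", {}))
    content = data.get("content", raw) if isinstance(data, dict) else raw
    return AgentAction(action="respond", content=content)
```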
Prerequisites: Ollama running on the host, Docker or Podman with Compose.
# 1. Pull a model
ollama pull llama3.2:3b
# 2. Start the stack
docker compose up --build
| Service | URL |
|---|---|
| OpenWebUI | http://localhost:3000 |
| Ingress API | http://localhost:8000 |
# 3. Send a request
curl -X POST http://localhost:8000/ingest \
-H "Content-Type: application/json" \
-d '{"input": "Compare Kubernetes and Nomad"}'
# → {"intent":"analysis","confidence":0.9,"response":"..."}
# 4. Request file reading via tool call
curl -X POST http://localhost:8000/ingest \
-H "Content-Type: application/json" \
-d '{"input": "Read the file /etc/hostname"}'
⚠️ If your input contains an apostrophe (`I'm`, `don't`), it will close the single-quoted shell string and curl will appear to freeze. Escape apostrophes with `'\''`, or write the payload to a file and use `-d @body.json`.
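The file-based approach sidesteps quoting entirely. A minimal sketch (the payload text is illustrative, and the curl line assumes the stack is already running):

```shell
# Write the JSON payload to a file; the quoted heredoc delimiter ('EOF')
# means the shell does not interpret apostrophes inside the body.
cat > body.json <<'EOF'
{"input": "I'm comparing Kubernetes and Nomad"}
EOF

# Then send it without any shell-quoting pitfalls:
# curl -X POST http://localhost:8000/ingest \
#   -H "Content-Type: application/json" \
#   -d @body.json
```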
Use a remote Ollama instance:
OLLAMA_BASE_URL=http://192.168.1.50:11434 docker compose up --build
Change models:
CLASSIFIER_MODEL=llama3.2:3b WORKER_MODEL=llama3.1:8b docker compose up --build
OpenWebUI: Browse to http://localhost:3000, create a local account, select the agentic model from the dropdown, and type any message.
Single-turn only: The `/v1/chat/completions` shim extracts only the most recent user message. Prior turns are visible in the OpenWebUI chat history but are not sent to the API; each request is processed independently. This is deliberate.
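The shim's behaviour amounts to: given an OpenAI-style messages array, keep only the latest user turn. A sketch of that logic (the function name is illustrative, not the shim's actual code):

```python
def extract_latest_user_message(messages: list[dict]) -> str:
    """Return the content of the most recent 'user' message, or '' if none.

    Mirrors the documented /v1/chat/completions behaviour: prior turns
    are dropped, so each request is processed independently.
    """
    for message in reversed(messages):
        if message.get("role") == "user":
            return message.get("content", "")
    return ""
```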
Run tests (no Docker required):
pip install -r requirements.txt
pytest tests/test_smoke.py -v
All settings are overridable via environment variables or a .env file.
| Variable | Default | Purpose |
|---|---|---|
| `OLLAMA_BASE_URL` | `http://host.docker.internal:11434` | Ollama endpoint |
| `CLASSIFIER_MODEL` | `llama3.2:3b` | Model used for intent classification |
| `WORKER_MODEL` | `llama3.2:3b` | Model used for response generation |
| `CLASSIFIER_TIMEOUT` | `60` | Seconds before classifier call times out |
| `WORKER_TIMEOUT` | `300` | Seconds before worker call times out |
| `MAX_TOKENS` | `256` | Max tokens generated by the worker |
| `LOG_LEVEL` | `INFO` | `DEBUG`, `INFO`, or `WARNING` |
| `DEBUG_ROUTER` | `false` | When `true`, logs `event=router_decision` at DEBUG level |
docker-compose.yml uses `${VAR:-default}` interpolation throughout, so shell variables always take precedence over the defaults without editing the file.
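As an illustration of that pattern (the service layout below is a sketch, not the actual docker-compose.yml):

```yaml
# ${VAR:-default}: use the shell environment's value if set,
# otherwise fall back to the bundled default.
services:
  ingress:
    environment:
      OLLAMA_BASE_URL: ${OLLAMA_BASE_URL:-http://host.docker.internal:11434}
      WORKER_MODEL: ${WORKER_MODEL:-llama3.2:3b}
```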
Every request gets a `request_id`. All log lines carry `event=<name>` and `request_id=<id>` in structured key=value format.
# Follow live
docker compose logs -f ingress
# Trace a single request
docker compose logs ingress | grep "request_id=<id>"
Typical log sequence (with tool execution):
event=pipeline_selected request_id=<id> pipeline=default
event=request_received request_id=<id>
event=classifier_start request_id=<id> classifier=classifier_basic
event=classifier_complete request_id=<id> intent=execution confidence=0.95 duration_ms=312
event=router_selected request_id=<id> intent=execution handler=worker
event=worker_start request_id=<id> worker=worker_llm intent=execution
event=worker_complete request_id=<id> duration_ms=1450
event=agent_output_received request_id=<id>
event=tool_execute tool=read_file request_id=<id>
event=tool_execute_complete tool=read_file request_id=<id>
event=request_complete request_id=<id> intent=execution confidence=0.95 handler=worker total_latency_ms=1765
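Lines in this shape are trivial to produce and to grep. A minimal formatter in the same spirit (not the project's actual logging code; the function name and example values are illustrative):

```python
def format_event(event: str, request_id: str, **fields: object) -> str:
    """Build one structured key=value log line in the documented shape."""
    pairs = " ".join(f"{key}={value}" for key, value in fields.items())
    return f"event={event} request_id={request_id} {pairs}".rstrip()

# Example (request_id is a placeholder):
line = format_event("classifier_complete", "abc123",
                    intent="execution", confidence=0.95, duration_ms=312)
```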
Enable debug router logging:
DEBUG_ROUTER=true LOG_LEVEL=DEBUG docker compose up --build
Inspect the routing table:
curl http://localhost:8000/debug/routes
# β {"routes":{"execution":"worker","planning":"worker","analysis":"worker","ambiguous":"clarify"}}
- `docs/runtime.md` – runtime architecture, pipeline, and failure behaviour
- `docs/module_development.md` – how to build new modules
- `docs/distributions.md` – how to build a distribution
- `DEVELOPMENT.md` – developer guide for extending the project
- `TESTING.md` – how to validate the system
- `IMPLEMENTATION.md` – full implementation description