POC For LLM Pipeline

Below is a “right-size cheat-sheet” you can hand your DevOps engineer so they can
provision the exact VM (or GPUs) for each piece of the stack, wire Oban on a second
lightweight box, and keep the bill from spiking.

________________

1 Model & service memory realities

Component | Params / binary | Good quant. | RAM / VRAM when loaded | Source
RAGFlow (service) | – | – | 16 GB RAM, 4 vCPU min | (ragflow.io)
Qwen 3 Embedding 0.6B | 0.6 B | Q4_K_M | ≈5 GB RAM / 0 GPU (CPU ok) | (huggingface.co)
Mistral-7B-Instruct | 7 B | Q4_K_M | ⇢ 11 GB RAM or ≈8 GB VRAM | (huggingface.co)
DeepSeek-R1-dense-6.7B | 6.7 B | GGUF Q4 | ⇢ 12 GB RAM or ≈10 GB VRAM | (reddit.com)
(optional) MagicoderS-CL-7B, a strong code fixer | 7 B | Q4_K_S | ≈11 GB RAM | sizing similar to Mistral
Oban worker service | Elixir app, no LLM | – | 512 MB RAM | –

The three LLMs never run simultaneously: the orchestrator spins up one container,
shuts it down, and starts the next.
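
In Ollama terms, that sequencing could look like the sketch below (the warm-up
prompt and loop body are illustrative assumptions; "ollama stop" needs a recent
Ollama release, older ones unload via OLLAMA_KEEP_ALIVE instead):

# Run the chat models strictly one at a time so peak memory stays at a single 7 B model.
for m in mistral:7b-instruct-q4_K_M deepseek-r1:6.7b-q4_0; do
  docker exec ollama ollama run "$m" "ping" >/dev/null   # load the model + smoke test
  # ... orchestrator work against $m goes here ...
  docker exec ollama ollama stop "$m"                    # unload before the next model
done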

________________

2 Recommended GCP layout

VM | Purpose | Machine | Est. $/hr†
ragbox (all inference) | RAGFlow + Ollama containers | e2-standard-8 (8 vCPU / 32 GB) | $0.30
burst GPU | only when running DeepSeek-R1 at full speed | n1-standard-4 + 1 × T4 (16 GB) | $0.43
jobsbox | Oban + Cron / Trigger.dev runner | e2-micro (0.25 vCPU / 1 GB) | $0.01

†US-central, on-demand pricing. You’ll shut the GPU VM after each batch.
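
Spinning up the burst GPU box could look like this (the instance name is an
assumption; a T4-attached VM requires --maintenance-policy=TERMINATE):

gcloud compute instances create ragbox-gpu \
  --zone=us-central1-a \
  --machine-type=n1-standard-4 \
  --accelerator=type=nvidia-tesla-t4,count=1 \
  --maintenance-policy=TERMINATE \
  --boot-disk-size=100GB \
  --image-family=ubuntu-2204-lts \
  --image-project=ubuntu-os-cloud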

________________

3 Auto-stop / auto-start pattern


# /usr/local/bin/idle_check.sh (runs every 5 min via cron)
if ! docker ps --format '{{.Names}}' | grep -qE 'mistral|qwen|deepseek'; then
  # crude idle proxy: VM has been up > 15 min with no model container running
  if [[ $(awk '{print int($1)}' /proc/uptime) -gt 900 ]]; then
    gcloud compute instances stop ragbox --zone=us-central1-a --quiet
  fi
fi
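
Install the check in cron (assuming the root crontab is used):

( crontab -l 2>/dev/null; echo "*/5 * * * * /usr/local/bin/idle_check.sh" ) | crontab -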

* Cloud Scheduler fires gcloud compute instances start ragbox every weekday at 08:00.
* The first LangGraph API hit waits ~45–60 s while Ollama reloads a 7 B model.

Internal traffic between VMs in the same VPC & zone is free, so the orchestrator →
Ollama calls cost $0 in egress. Lock firewall rules down to 10.0.0.0/24 to shrink
the attack surface.
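
A matching firewall rule might look like the following (rule name and port list
are assumptions; trim to the ports you actually expose):

gcloud compute firewall-rules create allow-vpc-internal \
  --network=default \
  --allow=tcp:8080,tcp:9090,tcp:11434,tcp:3001,tcp:3002 \
  --source-ranges=10.0.0.0/24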

________________

4 Installation steps for DevOps (revised)


1. Create the ragbox VM: e2-standard-8 + 100 GB disk, Ubuntu 22.04, private IP only,
SSH via IAP.

2. Docker Compose (same file as the previous answer), adding volumes /models/qwen,
/models/mistral, /models/deepseek to avoid re-downloads.

3. jobsbox VM: e2-micro; deploy the Oban Phoenix app and point it at Cloud SQL PostgreSQL.

4. Install the Ollama models:

ollama pull mistral:7b-instruct-q4_K_M
ollama pull qwen3-embedding:0.6b-q4_K_M
ollama pull deepseek-r1:6.7b-q4_0

5. Add the Cloud Scheduler job plus the startup/idle scripts shown above; a CLI
sketch follows this list.
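
One way to create that job from the CLI (job name and time zone are assumptions;
the Terraform hand-off later codifies the same thing):

gcloud scheduler jobs create http start-ragbox-weekdays \
  --schedule="0 8 * * 1-5" \
  --time-zone="America/Chicago" \
  --http-method=POST \
  --uri="https://compute.googleapis.com/compute/v1/projects/$PROJECT/zones/us-central1-a/instances/ragbox/start" \
  --oauth-service-account-email="ragbox-sa@$PROJECT.iam.gserviceaccount.com"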

________________

5 Why this sizing works


* RAGFlow 16 GB baseline + one 7 B model (≈11 GB) ⇒ 27 GB → fits the 32 GB VM.

* When you switch to DeepSeek-R1 you either:
  * keep CPU mode (12 GB RAM), or
  * spin up a short-lived T4 GPU box for ~3× throughput.

* Oban and Temporal/Trigger.dev issue lightweight HTTP tasks; a tiny vCPU suffices.

* Auto-stop after 15 min idle saves 65–75 % of runtime hours.

________________

Ready for hand-off


* Copy the table + scripts into a runbook → DevOps knows exactly which machine
types to click.
* You keep the LangGraph prompt work; Dev simply ensures the containers and keys
are live.

Need a Terraform module for the idle-check scheduler or a bash model_switch.sh
script that loads the right LLM container on demand? Tell me and I’ll generate it.

___________________________________________________________________________
0 Prerequisites
* GCP project with the $300 credit.
* gcloud ≥ 462, Docker ≥ 24 on the admin laptop.
* A service account ragbox-sa with Compute Admin + IAP-Secured Web App User.

gcloud iam service-accounts create ragbox-sa

gcloud projects add-iam-policy-binding $PROJECT \
  --member="serviceAccount:ragbox-sa@$PROJECT.iam.gserviceaccount.com" \
  --role="roles/compute.admin"
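
The IAP role named above maps to roles/iap.httpsResourceAccessor; presumably it
is granted the same way:

gcloud projects add-iam-policy-binding $PROJECT \
  --member="serviceAccount:ragbox-sa@$PROJECT.iam.gserviceaccount.com" \
  --role="roles/iap.httpsResourceAccessor"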

________________

1 Architecture diagram (1-box view)


 +---------+
| IAP LB | https://ragbox.example.com/*
+----+----+
|
443 (IAP) |
┌─────────────────┴───────────────────┐
│ VM 1: ragbox (e2-standard-8) │
│ │
│ Docker compose stack │
│ ┌─────────────┐ 8080 │
│ │ RAGFlow │<──/ │
│ └─────────────┘ │
│ ┌─────────────┐ 11434 │
│ │ Ollama │<──/ │
│ └─────────────┘ │
│ ┌─────────────┐ 9090 │
│ │ MCP │<──/ │
│ └─────────────┘ │
│ ┌─────────────┐ 3001 │
│ │ Flowise UI │<──/ │
│ └─────────────┘ │
│ ┌─────────────┐ 3002 │
│ │ OpenWebUI │<──/ │
│ └─────────────┘ │
│ idle-check.service (stops VM if │
│ no LLM container >15 min) │
└─────────────────────────────────────┘

┌─────────────────────────────────────┐
│ VM 2: jobsbox (e2-micro, 24/7) │
│ Oban worker → Temporal / LangGraph │
└─────────────────────────────────────┘

All traffic stays on internal VPC; external users must pass IAP OAuth.

________________

2 Provision the VMs


# VM 1: 8 vCPU / 32 GB RAM, 100 GB PD
gcloud compute instances create ragbox \
  --zone=us-central1-a \
  --machine-type=e2-standard-8 \
  --service-account=ragbox-sa@$PROJECT.iam.gserviceaccount.com \
  --scopes=https://www.googleapis.com/auth/cloud-platform \
  --boot-disk-size=100GB \
  --metadata-from-file startup-script=ragbox_startup.sh

# VM 2: tiny always-on box for Oban
gcloud compute instances create jobsbox \
  --zone=us-central1-a \
  --machine-type=e2-micro \
  --service-account=ragbox-sa@$PROJECT.iam.gserviceaccount.com \
  --tags=internal

________________

ragbox_startup.sh
#!/bin/bash
apt-get update -y && apt-get install -y docker.io docker-compose-plugin
usermod -aG docker ubuntu   # default login user; $USER is unset in a startup script
mkdir -p /opt/ai-stack && cd /opt/ai-stack

# Generate the Flowise secret up front: command substitution would not run
# inside the quoted heredoc below. Compose substitutes ${FLOWISE_SECRET} from
# this environment at `docker compose up` time.
export FLOWISE_SECRET=$(openssl rand -hex 16)

cat > docker-compose.yml <<'YAML'
version: "3.9"
services:
  ragflow:
    image: infiniflow/ragflow:latest
    ports: ["8080:8080"]
    volumes: ["./data/ragflow:/data"]

  ollama:
    image: ollama/ollama:latest
    ports: ["11434:11434"]
    volumes: ["./data/ollama:/root/.ollama"]
    environment:
      - OLLAMA_MODELS=/models
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434"]
      interval: 30s
      timeout: 5s
      retries: 5

  mcp:
    image: context7/mcp:latest
    ports: ["9090:9090"]

  flowise:
    image: flowiseai/flowise:latest
    ports: ["3001:3000"]
    environment:
      - PORT=3000
      - DATABASE_TYPE=sqlite
      - SECRETKEY=${FLOWISE_SECRET}

  openwebui:
    image: ghcr.io/open-webui/open-webui:latest
    ports: ["3002:3000"]
    volumes: ["./data/ollama:/root/.ollama"]
YAML

docker compose up -d

# preload models (CPU Q4)
docker exec ollama ollama pull mistral:7b-instruct-q4_K_M
docker exec ollama ollama pull qwen3-embedding:0.6b-q4_K_M
docker exec ollama ollama pull deepseek-r1:6.7b-q4_0

# idle-stop service
cat > /etc/systemd/system/idle-check.service <<'SERVICE'
[Unit]
Description=Stop VM if Ollama idle > 15 min
[Service]
Type=simple
ExecStart=/usr/local/bin/idle-check.sh
Restart=always
RestartSec=300
SERVICE

cat > /usr/local/bin/idle-check.sh <<'SCRIPT'
#!/bin/bash
# Match model containers only; the ollama service container itself is always
# running, so including it in the pattern would keep the VM alive forever.
idle=$(docker ps --format '{{.Names}}' | grep -E 'mistral|qwen|deepseek' || true)
if [ -z "$idle" ]; then
  up=$(awk '{print int($1)}' /proc/uptime)
  if [ "$up" -gt 900 ]; then
    gcloud compute instances stop ragbox --zone=us-central1-a --quiet
  fi
fi
SCRIPT
chmod +x /usr/local/bin/idle-check.sh
systemctl daemon-reload && systemctl enable --now idle-check.service

(Startup script ≈ 5 min on first boot.)

________________

3 Identity-Aware Proxy
1. Create an HTTPS external load balancer that fronts ragbox on ports 8080, 3001, 3002.
2. Enable Cloud IAP on the backend service and add your Google Workspace emails as
IAP-Secured Web App Users. (cloud.google.com)
3. The URLs

https://ragbox.example.com/ragflow # RAGFlow UI
https://ragbox.example.com/flowise # prompt playground
https://ragbox.example.com/openwebui # model-switch chat

are then gated by Google Sign-In.
________________

4 Oban worker box


Deploy your Elixir project (or a minimal app):

sudo apt-get update && sudo apt-get install -y git elixir erlang-dev
git clone https://github.com/myorg/orchestrator.git
cd orchestrator
mix deps.get
MIX_ENV=prod mix release
_build/prod/rel/orchestrator/bin/orchestrator start

Oban connects to Cloud SQL or Neon; Oban.LanggraphRunner hits the LangGraph Cloud
URL.
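
Environment wiring for that release might look like this (variable names and the
DB host are illustrative assumptions, not from the repo):

export DATABASE_URL="ecto://oban:CHANGE_ME@10.0.0.5:5432/orchestrator"
export LANGGRAPH_API_KEY="..."   # generated in the LangGraph console
_build/prod/rel/orchestrator/bin/orchestrator daemon   # detached alternative to start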
________________

5 Start / Stop buttons for power users


From Flowise (Admin > Tools > REST):

* POST /api/util/start – gcloud compute instances start ragbox ...
* POST /api/util/stop – gcloud compute instances stop ragbox ...

Add them as custom endpoints in Flowise; the UI then shows two buttons. Because
Flowise itself is on ragbox, you can also use the GCP mobile app for emergencies.
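
Behind those buttons, the raw Compute API call is just (assuming gcloud
credentials are available where the call runs):

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://compute.googleapis.com/compute/v1/projects/$PROJECT/zones/us-central1-a/instances/ragbox/start"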

________________

6 One login for every UI


* IAP handles OAuth; no local passwords.
* Flowise and OpenWebUI: set BASIC_AUTH_USER / BASIC_AUTH_PASS env variables even
though IAP is in front for defense-in-depth.
* RAGFlow has no auth → rely on IAP.

________________

7 Validation checklist (30 minutes)


Check | Command / URL
Docker stack healthy | docker compose ps
Ollama model list | curl http://localhost:11434/api/tags
Flowise opens (IAP) | https://ragbox.example.com/flowise
RAGFlow search | upload a PDF → /query returns hits
OpenWebUI chat | switch model → ask “Hello”
Idle-stop trigger | docker stop ollama; wait 15 min → VM stops automatically
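
A one-line generation smoke test to go with the checklist (non-streaming; the
model must already be pulled):

curl -s http://localhost:11434/api/generate \
  -d '{"model":"mistral:7b-instruct-q4_K_M","prompt":"Hello","stream":false}'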

Once these pass, you, the prompt engineer, can log in, upload docs, craft LangGraph
prompts, and never bother DevOps again.

(VM cost ≈ $12–$17/mo at 1¾ h/day runtime; idle-stop prevents surprises.)

(cloud.google.com, hub.docker.com)

Below are three self-contained artifacts you can hand to your DevOps teammate:

1. model_switch.sh – an on-VM utility that Flowise or OpenWebUI can trigger to
pull / load / tag a model on demand.
2. Terraform stack (main.tf + variables.tf) – provisions both VMs, the firewall,
startup-script injection, and a weekday Cloud Scheduler “auto-start” job (idle-stop
is already handled by the systemd unit in ragbox_startup.sh).
3. Flowise “REST Tool” snippet – so the Start/Stop buttons and the model-switch
call show up in its UI without extra coding.

Nothing else is required for the POC.

________________

1 model_switch.sh
/usr/local/bin/model_switch.sh (place on ragbox and chmod +x).

#!/bin/bash
#
# model_switch.sh <model-name>
# Pulls the model if missing, then sets it as the CURRENT tag for
# OpenWebUI & default LangChain calls.
# Usage: model_switch.sh mistral:7b-instruct-q4_K_M
set -e

MODEL=$1
if [[ -z "$MODEL" ]]; then
  echo "Usage: $0 <model-name-in-ollama>"
  exit 1
fi

echo "🔄 Pulling $MODEL (if needed)…"
docker exec ollama ollama pull "$MODEL" >/dev/null

echo "🔄 Tagging $MODEL as 'current'…"
# `ollama cp` copies the model under a new name; the CLI has no `tag` verb.
docker exec ollama ollama cp "$MODEL" current

echo "✅ Model '$MODEL' is now the default (tag = current)."

How to expose in Flowise
* In the Flowise UI → Tools → REST → create an endpoint:
  * Method: POST
  * URL: http://localhost:3001/api/run/model_switch
  * Body: { "model": "{{input}}" }

Add a “Switch Model” node in any Flow and you’ll get a button; the input box lets
you paste deepseek-r1:6.7b-q4_0 etc.

(OpenWebUI already detects new tags automatically.)


________________

2 Terraform (“one-click”) stack


variables.tf

variable "project" { type = string }

variable "region" {
  type    = string
  default = "us-central1"
}

variable "zone" {
  type    = string
  default = "us-central1-a"
}

variable "service_account_email" { type = string }

variable "ssh_key" { type = string } # ~/.ssh/id_rsa.pub
main.tf

terraform {
  required_providers {
    google = { source = "hashicorp/google", version = "~> 5.21" }
  }
}

provider "google" {
  project = var.project
  region  = var.region
  zone    = var.zone
}

# ------------ ragbox ---------------


resource "google_compute_instance" "ragbox" {
name = "ragbox"
machine_type = "e2-standard-8"
zone = var.zone
tags = ["iap-allowed"]

boot_disk {
initialize_params { image = "ubuntu-os-cloud/ubuntu-2204-lts" size = 100 }
}

metadata_startup_script = file("ragbox_startup.sh")
service_account { email = var.service_account_email scopes = ["cloud-platform"] }

network_interface { network = "default" access_config {} }

metadata = { ssh-keys = "ubuntu:${file(var.ssh_key)}" }


}

# ------------ jobsbox --------------


resource "google_compute_instance" "jobsbox" {
name = "jobsbox"
machine_type = "e2-micro"
zone = var.zone
tags = ["internal"]

boot_disk { initialize_params { image = "ubuntu-os-cloud/ubuntu-2204-lts" } }


network_interface { network = "default" access_config {} }
service_account { email = var.service_account_email scopes = ["cloud-platform"] }

metadata = { ssh-keys = "ubuntu:${file(var.ssh_key)}" }


}

# ---------- firewall for IAP LB ----


resource "google_compute_firewall" "iap_allow" {
name = "allow-iap"
network = "default"
allow { protocol = "tcp" ports = ["8080","9090","11434","3001","3002"] }
source_ranges = ["35.235.240.0/20"] # IAP fixed IP range
target_tags = ["iap-allowed"]
}

# ---------- Cloud Scheduler: auto-start weekdays ------


resource "google_cloud_scheduler_job" "start_ragbox" {
name = "start-ragbox-weekdays"
schedule = "50 13 * * 1-5" # 07:50 America/Chicago = 13:50 UTC
time_zone = "America/Chicago"
attempt_deadline = "60s"

http_target {
http_method = "POST"
uri = "https://compute.googleapis.com/compute/v1/projects/$
{var.project}/zones/${var.zone}/instances/ragbox/start"
oidc_token {
service_account_email = var.service_account_email
}
}
}

Run terraform init && terraform apply. Total deploy time ≈ 5 min.
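
With the variables above, the apply can be invoked like this (values are
placeholders):

terraform init
terraform apply \
  -var "project=$PROJECT" \
  -var "service_account_email=ragbox-sa@$PROJECT.iam.gserviceaccount.com" \
  -var "ssh_key=$HOME/.ssh/id_rsa.pub"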

________________

3 Flowise “Start / Stop ragbox” buttons


In Flowise → Tools → REST, add two endpoints:

Action | Method | URL | Body
Start ragbox | POST | https://compute.googleapis.com/compute/v1/projects/${PROJECT}/zones/${ZONE}/instances/ragbox/start | empty
Stop ragbox | POST | https://compute.googleapis.com/compute/v1/projects/${PROJECT}/zones/${ZONE}/instances/ragbox/stop | empty

Auth tab → “OAuth 2 (GCP)” → pick the same ragbox-sa service account.

Users who sign in via IAP will see the two buttons on the Flowise canvas.

________________

That’s all DevOps needs


* ragbox_startup.sh (from previous message) in repo root.
* the Terraform files above.
* model_switch.sh copied onto ragbox.
* Quick doc: how to log in via IAP and where the buttons live.

Everything else—prompt crafting, doc uploads, LangGraph runs—stays yours.

_____________________________________________________________

1 Core LLM & Retrieval Stack

Component | Purpose | Repo / Docs
RAGFlow | Chunk, embed, search documents (20 % overlap, Qwen 3 embed). | https://github.com/infiniflow/ragflow
Ollama | Local model runner (Mistral-7B, DeepSeek-R1, Qwen-3-Embed). | https://github.com/ollama/ollama
Context7 MCP | Code-snippet retriever via /search endpoint. | https://github.com/upstash/context7
Qwen 3 Embedding 0.6 B | Embedding model binary for RAGFlow. | https://github.com/QwenLM/Qwen3-Embedding

________________

2 Orchestration & Workflows

Component | Purpose | Repo / Docs
LangGraph Cloud | Multi-agent state-machine service (Draft → Test → Fix). | https://www.langchain.com/langgraph
LangSmith | Prompt playground & run tracing. | https://www.langchain.com/langsmith
Temporal Cloud | Long-running workflow engine (optional at POC). | https://github.com/temporalio
Trigger.dev | Event / cron glue; calls Temporal or LangGraph. | https://github.com/triggerdotdev/trigger.dev
Oban | Job queue inside the Elixir app that calls LangGraph. | https://github.com/sorentwo/oban

________________

3 “Ops” Services & Datastores

Component | Purpose | Repo / Docs
Docker + Docker Compose | Container runtime on VMs. | https://docs.docker.com/compose/
Terraform | Infra-as-code for VMs, firewall, scheduler. | https://github.com/hashicorp/terraform
GCP Compute Engine | VMs (e2-standard-8, e2-micro). | https://cloud.google.com/compute
GCP Cloud Scheduler | Auto-start ragbox weekdays 07:50. | https://cloud.google.com/scheduler
GCP Identity-Aware Proxy (IAP) | OAuth-gate every UI port. | https://cloud.google.com/iap

________________

4 Local Admin & Testing UIs

Component | Purpose | Repo / Docs
Flowise AI | Drag-drop doc upload, retriever & chat playground. | https://github.com/FlowiseAI/Flowise
Open WebUI | Ollama model switcher & side-by-side chat. | https://github.com/open-webui/open-webui

________________

5 Utility / Support Scripts (included in run-book)

Script | Path on ragbox | Notes
ragbox_startup.sh | metadata startup-script | Installs Docker, pulls containers, preloads models, sets up the idle-stop service.
idle-check.service + idle-check.sh | /etc/systemd/system/ & /usr/local/bin/ | Stops the VM after 15 min with no LLM containers.
model_switch.sh | /usr/local/bin/ | model_switch.sh mistral:7b-instruct-q4_K_M tags the model current.
Terraform files | infra/main.tf, infra/variables.tf | Provisions ragbox & jobsbox, firewall, Cloud Scheduler.
Flowise REST endpoints | created in the Flowise UI | POST /model_switch, /start, /stop hit the scripts & GCP API.

________________

6 Credentials / Keys to hand over

Key | For | Where to generate
LANGGRAPH_API_KEY | LangGraph Cloud | LangGraph console → “Settings → API Keys”.
ragbox-sa service-account key | Terraform + idle-stop | GCP IAM → JSON key (store in Secret Manager).
BASIC_AUTH_USER/PASS | Flowise, Open WebUI | Put in docker-compose.yml env.

Everything else (docs upload, prompt editing, model comparison) you can handle
through Flowise / OpenWebUI once DevOps deploys.

Hand this inventory plus the run-book from the previous message, and your DevOps
engineer has every repo, credential pointer, and script needed to spin up the full
POC in a single work-day.

________________________________________________________________________

Below is a copy-pastable run-book you can hand to your DevOps teammate. It wires
together RAGFlow → CTM (Claude-Task-Master) → Oban → LangGraph and exposes progress
on a Focalboard Kanban board. An optional snippet shows how to switch CTM to OpenAI
o3 instead of a local model.

https://github.com/eyaltoledano/claude-task-master/tree/main
https://github.com/mattermost-community/focalboard
________________

flowchart TD
  subgraph Ingestion
    A["RAGFlow (Qdrant + Qwen 3)"] --/query--> C
    B["Context7 (code MCP)"] --/search--> C
  end
  subgraph Planning
    C["Claude-Task-Master CLI"] --.task.yaml--> D
    D["Focalboard card 'Backlog'"] -. REST status:Backlog .-> Focal
  end
  subgraph Execution
    D --enqueue--> E["Oban TaskRunner"]
    E --spec_yaml--> G["LangGraph Pipeline"]
    G --green--> H("Git PR + Focal 'Done'")
    G --fail--> E
  end

________________

1 Prerequisites

Host | Purpose
ragbox VM | Qdrant + RAGFlow REST (http://ragbox:8080)
langgraph-runner (container or Fly app) | Executes the Draft → Test → Refine chain
Postgres | Oban jobs + Focalboard DB (a single instance is fine)

________________

2 Install Focalboard (UI)


docker run -d \
  --name focalboard \
  -p 8000:8000 \
  mattermost/focalboard # official image

Open http://SERVER:8000, create an admin user, then a board named “Phase-1
Backlog”. From “Profile → Security” copy the Personal Access Token; we’ll call it
FOCAL_TOKEN.
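
A quick token check (the /api/v1 base path mirrors what the bridge script below
assumes; adjust if your Focalboard version exposes a different API prefix):

curl -s -H "Authorization: Bearer $FOCAL_TOKEN" \
  http://SERVER:8000/api/v1/boards | head -c 300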
________________

3 Install & configure CTM


python3 -m venv ~/.ctm
source ~/.ctm/bin/activate
pip install "claude_task_master @ git+https://github.com/eyaltoledano/claude-task-master"

# ~/.ctmrc
contextProviders:
  - name: ragflow
    url: http://ragbox:8080/query
    weight: 3
  - name: context7
    url: http://context7:9090/search
    weight: 1

postTaskValidators:
  - complete_file_paths
  - compile_elixir # see docs/configuration.md

maxParallelTasks: 3

Optional – use OpenAI o3 for planning:

export OPENAI_API_KEY=sk-prod-*********
export OPENAI_BASE_URL=https://api.openai.com/v1
ctm plan specs/epic.md --model gpt-4o

(If those env vars are unset, CTM falls back to the --local --ollama-url flags you
already use.)

________________

4 Bridge CTM → Focalboard (scripts/ctm_to_focal.exs)


Mix.install([
  {:httpoison, "~> 2.2"},
  {:yaml_elixir, "~> 2.9"},
  {:jason, "~> 1.4"}   # needed for Jason.encode!/1 below
])

[focal_url, token, board_id] = System.argv()

for path <- Path.wildcard("tasks/*.task.yaml") do
  spec = YamlElixir.read_from_file!(path)

  # create card
  body = %{
    board_id: board_id,
    title: spec["title"],
    description: "```yaml\n#{File.read!(path)}\n```",
    fields: %{status: "Backlog"}
  }

  HTTPoison.post!(
    "#{focal_url}/api/v1/cards",
    Jason.encode!(body),
    [{"Authorization", "Bearer #{token}"}, {"Content-Type", "application/json"}]
  )
end

Run after ctm plan:

mix run scripts/ctm_to_focal.exs \
  http://focalboard:8000 \
  $FOCAL_TOKEN \
  $BOARD_ID

Cards land in the “Backlog” column. Devs drag them to “In Progress” to trigger the pipeline.

________________

5 Oban worker → LangGraph


# mix.exs
defp deps, do: [
  {:oban, "~> 2.16"},
  {:yaml_elixir, "~> 2.9"},
  {:jason, "~> 1.4"},
  {:httpoison, "~> 2.2"} # reused for the Focalboard update
]

# lib/task_runner.ex
defmodule TaskRunner do
  use Oban.Worker, queue: :tasks

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"yaml" => path, "card_id" => card_id}}) do
    spec = YamlElixir.read_from_file!(path)
    model = if spec["difficulty"] == 3, do: "gpt-4o", else: "mistral:7b"

    # Langgraph.run/1 and Retriever.merged/1 are app-local helpers.
    result = Langgraph.run(
      template: Map.get(spec, "promptTemplate", "draft.j2"),
      spec_yaml: spec,
      context: Retriever.merged(top_k: 8),
      model: model
    )

    status = if result.status == :ok, do: "Done", else: "UnitTest:Fail"
    patch_card(card_id, status)
    :ok
  end

  defp patch_card(card_id, status) do
    HTTPoison.put!(
      "#{System.fetch_env!("FOCAL_URL")}/api/v1/cards/#{card_id}",
      Jason.encode!(%{fields: %{status: status}}),
      [{"Authorization", "Bearer #{System.get_env("FOCAL_TOKEN")}"},
       {"Content-Type", "application/json"}]
    )
  end
end

Webhook (Focalboard → Oban):

post "/focalhook", FocalController, :move

# In :move, detect a column change to "In Progress", then enqueue an Oban job
# with card_id + yaml path
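
To exercise the controller locally, a hand-rolled webhook call might look like
this (payload shape and port are assumptions about your Phoenix app):

curl -X POST http://jobsbox:4000/focalhook \
  -H "Content-Type: application/json" \
  -d '{"card_id":"abc123","column":"In Progress","yaml":"tasks/001.task.yaml"}'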

________________

6 CI helper (GitHub Actions)


name: CTM Plan & Board
on:
  push:
    paths: ["specs/**", "docs/**"]

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Plan tasks
        run: |
          source ~/.ctm/bin/activate
          ctm plan specs/epic.md --local --ollama-url http://ragbox:11434
      - name: Populate Focalboard
        env:
          FOCAL_TOKEN: ${{ secrets.FOCAL_TOKEN }}
          BOARD_ID: ${{ secrets.BOARD_ID }}
        run: |
          mix run scripts/ctm_to_focal.exs http://focalboard:8000 $FOCAL_TOKEN $BOARD_ID

Now every doc change refreshes the tasks & backlog cards.

________________

7 Developer workflow

Action | Who | Result
Drag card → In Progress | Dev | Focal webhook enqueues an Oban job.
LangGraph pipeline finishes | – | TaskRunner updates the card status to Done or UnitTest:Fail.
Merge PR | – | GitHub Action moves the card to the Released lane.

Jira eliminated. All activity surfaces in Focalboard, and CTM + LangGraph run
headless in the background.
