Comprehensive Local AI LLM System
Architecture v3.0
Executive Summary
This document outlines the architecture for a robust, user-friendly local AI LLM system
designed specifically for mobile ad attribution and offer-wall exploitation. The system
integrates cutting-edge technologies including MCP (Model Context Protocol), A2A (Agent-
to-Agent) communication, and a comprehensive suite of tools to provide Manus AI-like
capabilities locally.
System Overview
The system is designed as a multi-layered architecture that provides:
• Local LLM Core: Ollama-based LLM management with multiple specialized models
• AI Agent Framework: Flowise for visual agent building and n8n for workflow
automation
• Data Layer: Supabase for structured data, Qdrant for vector storage, Neo4j for
knowledge graphs
• Interface Layer: Open WebUI for chat interaction, custom GUI for system management
• Tool Integration: MCP servers for pentesting tools (JADX, mitmproxy, ADB, Frida, etc.)
• Infrastructure: Caddy for HTTPS, SearXNG for web search, Langfuse for observability
Architecture Layers
1. Infrastructure Layer
1.1 Container Orchestration
• Docker Compose: Primary orchestration for all services
• Service Discovery: Internal DNS resolution between containers
• Volume Management: Persistent storage for databases and configurations
• Network Isolation: Secure internal communication between services
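The orchestration bullets above can be sketched as a docker-compose.yml fragment. This is illustrative only: service names, volume names, and the network split are assumptions, not a tested configuration, though the image names and the OLLAMA_BASE_URL variable match the upstream projects.

```yaml
services:
  ollama:
    image: ollama/ollama
    volumes:
      - ollama_models:/root/.ollama     # persistent model storage
    networks: [backend]
  qdrant:
    image: qdrant/qdrant
    volumes:
      - qdrant_data:/qdrant/storage
    networks: [backend]
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    depends_on: [ollama]                # startup ordering
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434   # internal DNS: service name resolves
    networks: [backend, frontend]

networks:
  backend:
    internal: true    # network isolation: no egress, service-to-service only
  frontend: {}

volumes:
  ollama_models:
  qdrant_data:
```

Service discovery here is Docker's built-in DNS: containers on the same network reach each other by service name (`http://ollama:11434`).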
1.2 Reverse Proxy & Security (Caddy)
• Automatic HTTPS: Let's Encrypt certificates for all services
• Service Routing: Subdomain-based routing to internal services
• webui.local.ai → Open WebUI
• n8n.local.ai → n8n Workflow Engine
• flowise.local.ai → Flowise Agent Builder
• qdrant.local.ai → Qdrant Dashboard
• neo4j.local.ai → Neo4j Browser
• search.local.ai → SearXNG
• langfuse.local.ai → Langfuse Dashboard
• Load Balancing: Distribution of requests across service instances
• SSL Termination: Centralized certificate management
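The routing table above maps naturally onto a Caddyfile. A sketch, assuming Docker service names as upstreams and each service's default container port; note that for hostnames that never resolve publicly, Caddy's internal CA (`tls internal`) would be needed in place of Let's Encrypt.

```text
# Sketch only: upstream host:port pairs assume Docker service names
# and each project's default container port.
webui.local.ai {
    reverse_proxy open-webui:8080
}
n8n.local.ai {
    reverse_proxy n8n:5678
}
flowise.local.ai {
    reverse_proxy flowise:3000
}
qdrant.local.ai {
    reverse_proxy qdrant:6333
}
search.local.ai {
    reverse_proxy searxng:8080
}
```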
2. Data Layer
2.1 Supabase (PostgreSQL + Extensions)
• Primary Database: Structured data storage for:
• Project configurations and settings
• User sessions and authentication
• Pentesting results and findings
• Tool execution logs and history
• Agent workflow definitions
• Real-time Subscriptions: Live updates for collaborative features
• Row Level Security: Fine-grained access control
• API Gateway: RESTful and GraphQL endpoints
• Extensions: pgvector for basic vector operations
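As a sketch of the pgvector extension mentioned above, the snippet below only assembles SQL one might run against the Supabase Postgres instance; the `findings` table, its columns, and the 768-dimension embedding are hypothetical and depend on the embedding model actually used.

```python
# Hypothetical schema: a pgvector-backed table for pentesting findings.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS findings (
    id bigserial PRIMARY KEY,
    summary text NOT NULL,
    embedding vector(768)
);
"""

def nearest_findings_sql(limit: int = 5) -> str:
    """Nearest-neighbour query using pgvector's cosine-distance operator (<=>).

    The query vector is bound as a parameter (%(query_vec)s), never interpolated.
    """
    return (
        "SELECT id, summary FROM findings "
        f"ORDER BY embedding <=> %(query_vec)s LIMIT {int(limit)};"
    )

print(nearest_findings_sql())
```

For heavier vector workloads the architecture defers to Qdrant; pgvector covers the "basic vector operations" case where the data already lives in Postgres.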
2.2 Qdrant (Vector Database)
• High-Performance Vector Storage: Optimized for RAG operations
• Collections:
• pentesting_knowledge: Vulnerability databases, exploit techniques
• code_patterns: Decompiled code snippets and analysis
• network_signatures: Traffic patterns and malicious indicators
• documentation: Tool documentation and usage examples
• Hybrid Search: Combining vector similarity with metadata filtering
• Clustering: Distributed deployment for scalability
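A hybrid query against one of the collections above can be expressed through Qdrant's REST search endpoint (POST /collections/&lt;name&gt;/points/search). The helper below only builds the request payload; the `severity` metadata field is an assumed payload key, not one defined by this document.

```python
import json

def build_hybrid_search(query_vector, severity: str, limit: int = 10) -> dict:
    """Vector similarity combined with a metadata filter, per Qdrant's search API."""
    return {
        "vector": list(query_vector),
        "limit": limit,
        "with_payload": True,
        "filter": {
            "must": [
                # Metadata filtering: only points whose payload matches.
                {"key": "severity", "match": {"value": severity}}
            ]
        },
    }

payload = build_hybrid_search([0.1, 0.2, 0.3], "high")
print(json.dumps(payload, indent=2))
```

The same shape is accepted by the official qdrant-client library; the raw dict is shown here to keep the sketch dependency-free.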
2.3 Neo4j (Knowledge Graph)
• Relationship Modeling: Complex connections between:
• Applications and their components
• Vulnerabilities and affected systems
• Exploit chains and attack vectors
• SDK relationships and dependencies
• GraphRAG Integration: Enhanced retrieval for LLM context
• Cypher Queries: Advanced graph traversal and analysis
• APOC Procedures: Extended functionality for data processing
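An exploit-chain traversal of the kind described above can be phrased as a parameterized Cypher query. The labels and relationship types here (App, USES, SDK, AFFECTS) are a hypothetical schema, purely to show the shape of such a query.

```python
# Parameterized Cypher: $app is bound at execution time by the Neo4j driver,
# which avoids string-interpolating untrusted input into the query.
CHAIN_QUERY = """
MATCH (a:App {name: $app})-[:USES]->(s:SDK)<-[:AFFECTS]-(v:Vulnerability)
RETURN a.name AS app, s.name AS sdk, v.id AS vuln
ORDER BY v.severity DESC
"""

def chain_params(app_name: str) -> dict:
    """Parameter map passed alongside the query to session.run()."""
    return {"app": app_name}

print(CHAIN_QUERY.strip())
```

For GraphRAG, results of such traversals are flattened into text and appended to the LLM context alongside the vector-search hits.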
3. LLM Core Layer
3.1 Ollama Engine
• Model Management: Download, update, and switch between models
• Specialized Models:
• llama3.1:70b - General reasoning and analysis
• codellama:34b - Code analysis and generation
• mistral:7b - Fast responses for simple queries
• deepseek-coder:33b - Advanced code understanding
• GPU Acceleration: CUDA support for RTX 5080
• Model Quantization: Optimized memory usage
• API Compatibility: OpenAI-compatible endpoints
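Ollama's OpenAI-compatible endpoint lives at /v1/chat/completions on its default port 11434. The sketch below separates building the request body from sending it, so the send step (which requires a running Ollama with the named model pulled) stays optional.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"  # Ollama default port

def build_chat_body(model: str, prompt: str) -> dict:
    """OpenAI-style chat payload accepted by Ollama's compatibility layer."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def send(body: dict) -> dict:
    """POST the payload; requires a running Ollama instance."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

body = build_chat_body("mistral:7b", "Summarize this APK manifest.")
```

Because the endpoint is OpenAI-compatible, any OpenAI client SDK pointed at `http://localhost:11434/v1` works the same way.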
3.2 Open WebUI
• Chat Interface: Primary user interaction point
• Multi-Model Support: Switch between Ollama models
• RAG Integration: Connected to Qdrant and Neo4j
• Tool Integration: Direct access to MCP servers
• Session Management: Persistent conversation history
• File Upload: Document analysis and processing
4. AI Agent Layer
4.1 Flowise (Visual Agent Builder)
• Drag-and-Drop Interface: Visual workflow creation
• Agent Templates: Pre-built agents for common pentesting tasks
• Tool Integration: Direct connection to MCP servers
• Multi-Agent Orchestration: Supervisor and worker agent patterns
• Custom Nodes: Specialized nodes for pentesting operations
• Flow Execution: Real-time agent workflow execution
4.2 n8n (Workflow Automation)
• Low-Code Automation: Visual workflow builder
• Extensive Integrations: 400+ pre-built connectors
• Custom Webhooks: API endpoints for external triggers
• Scheduled Execution: Cron-based automation
• Error Handling: Robust error recovery and retry logic
• Data Transformation: Built-in data processing capabilities
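An n8n workflow that starts with a Webhook trigger node is kicked off by a plain HTTP POST. The snippet below only constructs the request; the webhook path (`scan-complete`) is whatever the workflow's Webhook node defines, used here as a placeholder.

```python
import json
import urllib.request

# Path segment after /webhook/ is defined by the Webhook node in the workflow.
N8N_WEBHOOK = "https://n8n.local.ai/webhook/scan-complete"

def build_trigger(payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST that fires the workflow."""
    return urllib.request.Request(
        N8N_WEBHOOK,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_trigger({"project": "demo", "status": "done"})
# urllib.request.urlopen(req) would fire it against a live n8n instance.
```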
5. Tool Integration Layer (MCP Servers)
5.1 MCP Architecture
• Protocol Implementation: Standardized tool communication
• Server Registry: Central management of available tools
• Authentication: Secure tool access and permissions
• Message Routing: Efficient communication between agents and tools
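MCP messages are JSON-RPC 2.0 under the hood; a tool invocation from an agent to a server is a `tools/call` request. The helper below assembles one (the tool name and arguments are illustrative, not tools this document defines).

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC requires a unique id per request

def mcp_tool_call(name: str, arguments: dict) -> dict:
    """JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

msg = mcp_tool_call("decompile_apk", {"path": "/samples/app.apk"})
print(json.dumps(msg))
```

Tool discovery works the same way via the `tools/list` method; the server registry above is essentially an index over each server's `tools/list` response.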
5.2 Pentesting Tool Servers
• JADX MCP Server: APK decompilation and analysis
• mitmproxy MCP Server: Network traffic interception
• ADB MCP Server: Android device control and automation
• Frida MCP Server: Dynamic instrumentation and hooking
• MobSF MCP Server: Mobile security framework integration
• Apktool MCP Server: APK reverse engineering
• Burp Suite MCP Server: Web application security testing
5.3 Additional Tool Servers
• SearXNG MCP Server: Web search capabilities
• File System MCP Server: Local file operations
• Database MCP Server: Direct database access
• API Testing MCP Server: REST/GraphQL endpoint testing
6. Search and Intelligence Layer
6.1 SearXNG (Metasearch Engine)
• Privacy-Focused Search: No tracking or profiling
• Multi-Engine Aggregation: Combines results from dozens of upstream search engines
• Custom Instances: Self-hosted for complete control
• API Access: Programmatic search capabilities
• Result Filtering: Advanced search parameters and filters
6.2 Intelligence Processing
• Real-time Research: Automated information gathering
• Threat Intelligence: CVE and vulnerability data collection
• SDK Analysis: Automated documentation retrieval
• Exploit Research: Latest bypass techniques and methods
7. Observability Layer
7.1 Langfuse (LLM Observability)
• Trace Collection: Detailed LLM interaction logging
• Performance Metrics: Token usage, latency, and costs
• Evaluation Framework: Model performance assessment
• Prompt Management: Centralized prompt versioning
• Debug Interface: Interactive debugging tools
7.2 System Monitoring
• Container Health: Docker service monitoring
• Resource Usage: CPU, memory, and GPU utilization
• Error Tracking: Centralized error collection and analysis
• Performance Dashboards: Real-time system metrics
8. User Interface Layer
8.1 Primary GUI (PyQt6 Application)
• System Dashboard: Overview of all services and their status
• Project Management: Create, manage, and switch between projects
• Tool Configuration: Setup and configure pentesting tools
• Agent Builder: Visual interface for creating custom agents
• Results Viewer: Comprehensive analysis and reporting interface
8.2 Web Interfaces
• Open WebUI: Primary chat interface for LLM interaction
• n8n Editor: Workflow creation and management
• Flowise Canvas: Visual agent building environment
• Langfuse Dashboard: Observability and analytics
Data Flow Architecture
1. User Interaction Flow
User Input → Open WebUI → LLM Processing → Agent Activation → Tool Execution → Result Processing → User Output
2. Agent Workflow Flow
Trigger → n8n Workflow → Flowise Agent → MCP Tool Calls → Data Collection → Analysis → Action Execution
3. Knowledge Retrieval Flow
Query → Vector Search (Qdrant) → Graph Traversal (Neo4j) → Context Assembly → LLM Enhancement → Response Generation
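The retrieval flow above can be sketched end to end. The vector and graph stores here are trivial in-memory stand-ins for Qdrant and Neo4j, purely to show the shape of the pipeline; real deployments replace them with client calls.

```python
def vector_search(store: dict[str, list[float]], query: list[float], k: int = 2) -> list[str]:
    """Stand-in for the Qdrant step: rank documents by dot product with the query."""
    def score(v: list[float]) -> float:
        return sum(a * b for a, b in zip(query, v))
    return sorted(store, key=lambda doc: score(store[doc]), reverse=True)[:k]

def graph_expand(graph: dict[str, list[str]], seeds: list[str]) -> list[str]:
    """Stand-in for the Neo4j step: pull one hop of related nodes."""
    related = [n for s in seeds for n in graph.get(s, [])]
    return seeds + [n for n in related if n not in seeds]

def assemble_context(docs: list[str]) -> str:
    """Context block prepended to the prompt before LLM generation."""
    return "\n".join(f"- {d}" for d in docs)

store = {"docA": [1.0, 0.0], "docB": [0.0, 1.0], "docC": [0.5, 0.5]}
graph = {"docA": ["docC"]}
hits = vector_search(store, [1.0, 0.2], k=1)
context = assemble_context(graph_expand(graph, hits))
print(context)
```

The key design point is the ordering: vector search finds entry points, graph traversal widens them to structurally related items, and only the assembled text ever reaches the LLM.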
Security Architecture
1. Network Security
• Internal Network Isolation: Docker network segmentation
• TLS Encryption: End-to-end encryption for all communications
• Certificate Management: Automated certificate renewal
• Access Control: Role-based permissions and authentication
2. Data Security
• Encryption at Rest: Database and file system encryption
• Secure Storage: Sensitive data protection
• Audit Logging: Comprehensive activity tracking
• Backup Strategy: Automated backup and recovery
3. Tool Security
• Sandboxed Execution: Isolated tool execution environments
• Permission Management: Granular tool access controls
• Input Validation: Secure parameter handling
• Output Sanitization: Safe result processing
Deployment Architecture
1. Local Development
• Docker Compose: Single-machine deployment
• Resource Allocation: Optimized for RTX 5080 and 32GB RAM
• Port Management: Automated port assignment and routing
• Service Dependencies: Proper startup ordering and health checks
2. Production Deployment
• High Availability: Multi-instance service deployment
• Load Balancing: Request distribution and failover
• Monitoring: Comprehensive health and performance monitoring
• Scaling: Horizontal and vertical scaling capabilities
Integration Points
1. LLM Integration
• Ollama API: Direct model interaction
• OpenAI Compatibility: Standard API endpoints
• Custom Embeddings: Specialized embedding models
• Fine-tuning Pipeline: Model customization capabilities
2. Tool Integration
• MCP Protocol: Standardized tool communication
• REST APIs: HTTP-based tool interfaces
• WebSocket Connections: Real-time tool communication
• File System Integration: Direct file access and manipulation
3. Data Integration
• ETL Pipelines: Data extraction, transformation, and loading
• Real-time Sync: Live data synchronization
• Batch Processing: Scheduled data processing tasks
• API Gateways: Unified data access interfaces
Performance Optimization
1. Hardware Utilization
• GPU Acceleration: CUDA optimization for RTX 5080
• Memory Management: Efficient RAM usage for 32GB system
• Storage Optimization: NVMe SSD utilization for fast I/O
• CPU Optimization: Multi-core processing for parallel tasks
2. Software Optimization
• Caching Strategies: Multi-level caching for improved performance
• Connection Pooling: Efficient database connection management
• Async Processing: Non-blocking operations for better responsiveness
• Resource Scheduling: Intelligent resource allocation and prioritization
Scalability Considerations
1. Horizontal Scaling
• Service Replication: Multiple instances of critical services
• Load Distribution: Intelligent request routing
• Data Partitioning: Distributed data storage strategies
• Microservice Architecture: Independent service scaling
2. Vertical Scaling
• Resource Allocation: Dynamic resource adjustment
• Performance Tuning: Optimized configurations for different workloads
• Capacity Planning: Predictive scaling based on usage patterns
• Hardware Upgrades: Support for future hardware improvements
This architecture provides a robust, scalable, and user-friendly foundation for advanced
mobile pentesting and offer-wall exploitation, combining the power of local LLMs with
comprehensive tool integration and intelligent automation.