Deploy intelligence. Open-source infrastructure for AI agents in production.
Updated Mar 11, 2026
Mechanistic analysis of a GPT-2–like model exploring the compositionality gap in transformers. Using Logit Lens and Causal Tracing, the study identifies a deep-layer bottleneck and overcomes it via dataset enhancement, addressing the compositionality gap (NeurIPS 2024).
A mock Azure OpenAI API for seamless testing and development, supporting both streaming and non-streaming responses. Easily emulate OpenAI completions with token-based streaming in a local or Dockerized environment.
My Digital Twin Companion
DeepSeek-V3 Windows Deployment Fixer: Resolves CUDA_ERROR_OUT_OF_MEMORY, missing cublas64_12.dll, and Triton compiler errors. Optimized for RTX 30/40 series GPUs.
Complete guide to deploying private, on-premise AI and LLMs: hardware selection, model comparison (ollama vs vLLM vs llama.cpp), security hardening, and AI governance policy templates. By Petronella Technology Group.
Comprehensive guide to FastAPI, Pydantic, and SQLAlchemy for AI engineers. Learn API design, validation, and ORM workflows with practical examples and setup instructions 🐙
Production-oriented GPU inference stack with FastAPI, Docker, Redis, Prometheus, and Grafana for AI workload serving
A Streamlit-based spam classifier that predicts whether a message is spam or not spam using machine learning.
A minimal, high-performance starter kit for running AI model inference on NVIDIA GPUs using CUDA. Includes environment setup, sample kernels, and guidance for integrating ONNX/TensorRT pipelines for fast, optimized inference on modern GPU hardware.
Full-stack web application (React + Flask) for Multimodal Video Captioning. Deploys the MixCap model (BLIP-2 + Wav2Vec2) to generate video descriptions for end-users.
End-to-end pipeline for deploying deep learning models on edge devices: model conversion, quantization, hardware acceleration, and Android integration.
Skeleton Slack bot that lets developers trigger deployments, rollbacks, and status checks via simple chat commands.
Deployment of a self-hosted LLM infrastructure using Ollama and Open WebUI on Linux, including custom model creation, API integration, and system-level troubleshooting.
Multi-agent researcher using retrieval-augmented generation (RAG)
🧠 Deployment infrastructure for Zeroclaw, Dockerized and compatible with Portainer
🛠 Fix DeepSeek-V3 Windows setup issues by resolving CUDA memory errors, missing DLLs, and PyTorch-CUDA conflicts for smooth local deployment.
Compare PyTorch vs Triton inference latency with CLI tools, benchmarks, and performance plots.