A high-throughput and memory-efficient inference and serving engine for LLMs
Port of OpenAI's Whisper model in C/C++
Making large AI models cheaper, faster and more accessible
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Cross-platform, customizable ML solutions for live and streaming media.
ncnn is a high-performance neural network inference framework optimized for the mobile platform
SGLang is a fast serving framework for large language models and vision language models.
Faster Whisper transcription with CTranslate2
Machine Learning Engineering Open Book
🎨 The exhaustive Pattern Matching library for TypeScript, with smart type inference.
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Large Language Model Text Generation Inference
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-prem, or your laptop — all through one unified, production-ready inference API.
Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!
Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
💎 1MB lightweight face detection model