PhD Student at Stanford
- Palo Alto, CA
- simonguo.tech
- @simonguozirui
Stars
Memory Mosaics are networks of associative memories working in concert to achieve a prediction task.
Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.
Simple retrieval from LLMs at various context lengths to measure accuracy
Simple, flexible configuration in pure Python!
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Hydragen: High-Throughput LLM Inference with Shared Prefixes
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSy…
A framework for deploying on-demand distributed-trust.
Flash Attention in raw CUDA C beating PyTorch
A high-throughput and memory-efficient inference and serving engine for LLMs
A natural language interface for computers
Building blocks for foundation models.
A list of ICs and IPs for AI, Machine Learning and Deep Learning.
Fast and memory-efficient exact attention
Toy Gaussian Splatting visualization in Unity
A demo pipeline of using Redis as an online feature store with Feast for orchestration and Ray for training and model serving
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
JAX - A curated list of resources https://github.com/google/jax
A high-performance C++ library for randomized numerical linear algebra
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
Development repository for the Triton language and compiler
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.