ShelbyJenkins/llm_client

The Easiest Rust Interface for Local LLMs and an Interface for Deterministic Signals from Probabilistic LLM Vibes


llm_client will return (see "What happened to llm_client?" below)

lmcpp – llama.cpp's llama-server for Rust

Fully Managed

  • Automated Toolchain – Downloads, builds, and manages the llama.cpp toolchain with [LmcppToolChain].
  • Supported Platforms – Linux, macOS, and Windows with CPU, CUDA, and Metal support.
  • Multiple Versions – Each release tag and backend is cached separately, allowing you to install multiple versions of llama.cpp.

Blazing Fast UDS

  • UDS IPC – Talks to llama-server over a Unix domain socket on Linux, macOS, and Windows.
  • Fast! – Is it faster than HTTP? Yes. Is it measurably faster? Maybe.

Fully Typed / Fully Documented

  • Server Args – All llama-server arguments are implemented by [ServerArgs]; see the sketch after this list.
  • Endpoints – Each endpoint has request and response types defined.
  • Good Docs – Every parameter was researched to improve upon the original llama-server documentation.
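
For illustration, a minimal sketch of building typed server arguments: only hf_repo comes from the official example below, while ctx_size and n_gpu_layers are hypothetical field names mirroring llama-server's --ctx-size and --n-gpu-layers flags, so check the [ServerArgs] docs for the exact builder methods.

use lmcpp::*;

fn main() -> LmcppResult<()> {
    // `hf_repo` is taken from the official example; `ctx_size` and
    // `n_gpu_layers` are assumed names mirroring llama-server's
    // `--ctx-size` and `--n-gpu-layers` flags (verify against the docs).
    let args = ServerArgs::builder()
        .hf_repo("bartowski/google_gemma-3-1b-it-qat-GGUF")?
        .ctx_size(4096)
        .n_gpu_layers(99)
        .build();

    let _server = LmcppServerLauncher::builder()
        .server_args(args)
        .load()?;
    Ok(())
}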

CLI Tools & Web UI

  • lmcpp-toolchain-cli – Manage the llama.cpp toolchain: download, build, cache.
  • lmcpp-server-cli    – Start, stop, and list servers.
  • Easy Web UI – Use [LmcppServerLauncher::webui] to start with HTTP and the Web UI enabled (see the sketch after the CLI examples below).

use lmcpp::*;

fn main() -> LmcppResult<()> {
    // Download (or reuse) the llama.cpp toolchain, fetch the model from
    // Hugging Face, and launch llama-server over a Unix domain socket.
    let server = LmcppServerLauncher::builder()
        .server_args(
            ServerArgs::builder()
                .hf_repo("bartowski/google_gemma-3-1b-it-qat-GGUF")?
                .build(),
        )
        .load()?;

    // Send a typed completion request and cap generation at 64 tokens.
    let res = server.completion(
        CompletionRequest::builder()
            .prompt("Tell me a joke about Rust.")
            .n_predict(64),
    )?;

    println!("Completion response: {:#?}", res.content);
    Ok(())
}
Launch the Web UI from the command line:

# With default model
cargo run --bin lmcpp-server-cli -- --webui
# Or with a specific model from URL:
cargo run --bin lmcpp-server-cli -- --webui -u https://huggingface.co/bartowski/google_gemma-3-1b-it-qat-GGUF/blob/main/google_gemma-3-1b-it-qat-Q4_K_M.gguf
# Or with a specific local model:
cargo run --bin lmcpp-server-cli -- --webui -l /path/to/local/model.gguf
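
The Web UI can also be enabled programmatically through [LmcppServerLauncher::webui]. A minimal sketch, assuming webui() is a zero-argument entry point that behaves like load() but with HTTP and the Web UI turned on; the exact signature is not shown in this README, so check the docs.

use lmcpp::*;

fn main() -> LmcppResult<()> {
    // Assumption: `LmcppServerLauncher::webui()` launches llama-server with
    // HTTP and the Web UI enabled, returning a handle the way `load()` does.
    // Verify the real signature against the LmcppServerLauncher docs.
    let _server = LmcppServerLauncher::webui()?;
    Ok(())
}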

How It Works

Your Rust App
      │
      ├─→ LmcppToolChain        (downloads / builds / caches)
      │         ↓
      ├─→ LmcppServerLauncher   (spawns & monitors)
      │         ↓
      └─→ LmcppServer           (typed handle over UDS*)
                │
                ├─→ completion()       → text generation
                └─→ other endpoints    → embeddings, tokenization, props, …

Endpoints ⇄ Typed Helpers

HTTP Route          Helper on LmcppServer   Request type           Response type
POST /completion    completion()            [CompletionRequest]    [CompletionResponse]
POST /infill        infill()                [InfillRequest]        [CompletionResponse]
POST /embeddings    embeddings()            [EmbeddingsRequest]    [EmbeddingsResponse]
POST /tokenize      tokenize()              [TokenizeRequest]      [TokenizeResponse]
POST /detokenize    detokenize()            [DetokenizeRequest]    [DetokenizeResponse]
GET  /props         props()                 –                      [PropsResponse]
custom              status() ¹              –                      [ServerStatus]
Open AI             open_ai_v1_*()          [serde_json::Value]    [serde_json::Value]

¹ Internal helper for server health.
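
A minimal sketch of a few typed helpers, assuming the request types share the builder pattern of CompletionRequest and that each helper returns LmcppResult like completion(); the content field on TokenizeRequest is an assumption, so check the request-type docs for the exact builder methods.

use lmcpp::*;

fn main() -> LmcppResult<()> {
    let server = LmcppServerLauncher::builder()
        .server_args(
            ServerArgs::builder()
                .hf_repo("bartowski/google_gemma-3-1b-it-qat-GGUF")?
                .build(),
        )
        .load()?;

    // GET /props and the internal health helper take no request type.
    // Printing with {:#?} assumes the response types derive Debug.
    let props = server.props()?;
    println!("Props: {props:#?}");
    let status = server.status()?;
    println!("Status: {status:#?}");

    // POST /tokenize: the builder mirrors CompletionRequest, but the
    // `content` field name is an assumption; check TokenizeRequest's docs.
    let tokens = server.tokenize(
        TokenizeRequest::builder().content("Hello, llama.cpp!"),
    )?;
    println!("Tokens: {tokens:#?}");

    Ok(())
}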


Supported Platforms

Platform      CPU   CUDA   Metal   Binary Sources
Linux x64     ✓     ✓      –       Pre-built + Source
macOS ARM     ✓     –      ✓       Pre-built + Source
macOS x64     ✓     –      ✓       Pre-built + Source
Windows x64   ✓     ✓      –       Pre-built + Source

What happened to llm_client?

And llm_devices, llm_testing, llm_prompt, llm_models, and the other crates that used to be in this repo?

  • I moved cross country and took a long time off.
  • Supporting local and cloud models exploded complexity.
  • I realized that the goals of llm_client and the goals of most users did not overlap: most people just want an Open AI compatible endpoint, not a new DSL for building AI agents or low-level workflow builders.

So I decided to narrow my scope and start fresh. The new goal of this project is to be the best llama.cpp integration possible.

And so, this repo will stick to the bare-bones, low-level LLM implementation details. Shortly, I will rework llm_prompt and llm_models towards this goal.

Any further tooling built on top of that will be a separate project, which I will link to here once published.

Contact

Shelby Jenkins - Here or LinkedIn
