# lmcpp – llama.cpp's llama-server for Rust
- Automated Toolchain – Downloads, builds, and manages the `llama.cpp` toolchain with [LmcppToolChain].
- Supported Platforms – Linux, macOS, and Windows with CPU, CUDA, and Metal support.
- Multiple Versions – Each release tag and backend is cached separately, allowing you to install multiple versions of `llama.cpp`.
- UDS IPC – Integrates with `llama-server`'s Unix-domain-socket client on Linux, macOS, and Windows.
- Fast! – Is it faster than HTTP? Yes. Is it measurably faster? Maybe.
- Server Args – All `llama-server` arguments implemented by [ServerArgs].
- Endpoints – Each endpoint has request and response types defined.
- Good Docs – Every parameter was researched to improve upon the original `llama-server` documentation.
- CLI Tools – `lmcpp-toolchain-cli` manages the `llama.cpp` toolchain (download, build, cache); `lmcpp-server-cli` starts, stops, and lists servers.
- Easy Web UI – Use [LmcppServerLauncher::webui] to start with HTTP and the Web UI enabled (see the sketch below).
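
For the Rust side of the Web UI feature, here is a minimal sketch. [LmcppServerLauncher::webui] is referenced above, but its exact signature (zero arguments, returning a running server handle) is an assumption, not confirmed API:

```rust
use lmcpp::*;

fn main() -> LmcppResult<()> {
    // Assumed signature: webui() starts llama-server with HTTP and the
    // Web UI enabled; the zero-argument form is a guess, not confirmed API.
    let _server = LmcppServerLauncher::webui()?;

    // Keep the process alive while you use the Web UI in a browser.
    std::thread::park();
    Ok(())
}
```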
```rust
use lmcpp::*;

fn main() -> LmcppResult<()> {
    // Downloads/builds llama.cpp as needed, then spawns llama-server
    // with the given model pulled from Hugging Face.
    let server = LmcppServerLauncher::builder()
        .server_args(
            ServerArgs::builder()
                .hf_repo("bartowski/google_gemma-3-1b-it-qat-GGUF")?
                .build(),
        )
        .load()?;

    // Typed request sent over the server's Unix domain socket.
    let res = server.completion(
        CompletionRequest::builder()
            .prompt("Tell me a joke about Rust.")
            .n_predict(64),
    )?;

    println!("Completion response: {:#?}", res.content);
    Ok(())
}
```
```bash
# With the default model:
cargo run --bin lmcpp-server-cli -- --webui

# Or with a specific model from a URL:
cargo run --bin lmcpp-server-cli -- --webui -u https://huggingface.co/bartowski/google_gemma-3-1b-it-qat-GGUF/blob/main/google_gemma-3-1b-it-qat-Q4_K_M.gguf

# Or with a specific local model:
cargo run --bin lmcpp-server-cli -- --webui -l /path/to/local/model.gguf
```
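
The `-u`/`-l` flags presumably map to [ServerArgs] fields. By analogy with the `hf_repo` example above, here is a hypothetical sketch for a local GGUF file; the `model` setter name is an assumption, only `status()` comes from the endpoint table below:

```rust
use lmcpp::*;

fn main() -> LmcppResult<()> {
    let server = LmcppServerLauncher::builder()
        .server_args(
            ServerArgs::builder()
                // Hypothetical setter: the real field name for a local GGUF
                // path may differ; check the ServerArgs docs before use.
                .model("/path/to/local/model.gguf")?
                .build(),
        )
        .load()?;

    // status() is the health helper from the endpoint table below.
    println!("Server status: {:#?}", server.status()?);
    Ok(())
}
```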
```text
Your Rust App
│
├─→ LmcppToolChain (downloads / builds / caches)
│         ↓
├─→ LmcppServerLauncher (spawns & monitors)
│         ↓
└─→ LmcppServer (typed handle over UDS)
    │
    ├─→ completion() → text generation
    └─→ other endpoints → embeddings, tokenization, server props, …
```
| HTTP Route | Helper on `LmcppServer` | Request type | Response type |
|---|---|---|---|
| POST `/completion` | `completion()` | [CompletionRequest] | [CompletionResponse] |
| POST `/infill` | `infill()` | [InfillRequest] | [CompletionResponse] |
| POST `/embeddings` | `embeddings()` | [EmbeddingsRequest] | [EmbeddingsResponse] |
| POST `/tokenize` | `tokenize()` | [TokenizeRequest] | [TokenizeResponse] |
| POST `/detokenize` | `detokenize()` | [DetokenizeRequest] | [DetokenizeResponse] |
| GET `/props` | `props()` | – | [PropsResponse] |
| custom | `status()` ¹ | – | [ServerStatus] |
| OpenAI | `open_ai_v1_*()` | [serde_json::Value] | [serde_json::Value] |

¹ Internal helper for server health.
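
To show the typed endpoint pattern end to end, here is a tokenize/detokenize round-trip. The helpers and request/response types come from the table above; the builder fields (`content`, `tokens`) and the `tokens` field on [TokenizeResponse] are assumptions:

```rust
use lmcpp::*;

// Sketch under assumptions: the helpers and request/response types are from
// the endpoint table, but the builder fields (`content`, `tokens`) and the
// `tokens` field on TokenizeResponse are guesses.
fn round_trip(server: &LmcppServer) -> LmcppResult<()> {
    let tokenized = server.tokenize(
        TokenizeRequest::builder().content("Hello, llama!"),
    )?;

    let text = server.detokenize(
        DetokenizeRequest::builder().tokens(tokenized.tokens),
    )?;

    println!("Round-tripped text: {:#?}", text);
    Ok(())
}
```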
| Platform | CPU | CUDA | Metal | Binary Sources |
|---|---|---|---|---|
| Linux x64 | ✅ | ✅ | – | Pre-built + Source |
| macOS ARM | ✅ | – | ✅ | Pre-built + Source |
| macOS x64 | ✅ | – | ✅ | Pre-built + Source |
| Windows x64 | ✅ | ✅ | – | Pre-built + Source |
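
Since each release tag and backend is cached separately, you can in principle pin both when preparing the toolchain. The builder methods below (`backend`, `release_tag`, `run`) are hypothetical; only the [LmcppToolChain] type itself is named in this README:

```rust
use lmcpp::*;

// Hypothetical sketch: every builder method below is assumed, not taken from
// the real LmcppToolChain API; consult the crate docs for the actual names.
fn prepare_cuda_toolchain() -> LmcppResult<()> {
    let toolchain = LmcppToolChain::builder()
        .backend("cuda")      // assumed: select the CUDA backend
        .release_tag("b1234") // assumed: pin a llama.cpp release tag (placeholder)
        .build();
    toolchain.run()?;         // assumed: download or build, then cache
    Ok(())
}
```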
And `llm_devices`, `llm_testing`, `llm_prompt`, `llm_models`, and the other crates that used to be in this repo?

- I moved cross-country and took a long time off.
- Supporting both local and cloud models exploded complexity.
- I realized the goals of `llm_client` and the goals of most people did not overlap; most people just want an OpenAI-compatible endpoint, not a new DSL for building AI agents or low-level workflow builders.

So I decided to narrow my scope and start fresh. The new goal of this project is to be the best llama.cpp integration possible. This repo will stick to the barebones, low-level LLM implementation details. Shortly, I will rework `llm_prompt` and `llm_models` toward this goal. Any further tooling built on top of that will be a separate project, which I will link to here once published.
Shelby Jenkins - Here or LinkedIn