This crate provides a unified interface for loading and using Large Language Models (LLMs). See the models module for the model architectures that are currently supported.
At present, the only supported backend is GGML, but this is expected to change in the future.
Example
use std::io::Write;
use llm::Model;

// load a GGML model from disk
let llama = llm::load::<llm::models::Llama>(
    // path to GGML file
    std::path::Path::new("/path/to/model"),
    // llm::ModelParameters
    Default::default(),
    // load progress callback
    llm::load_progress_callback_stdout
)
.unwrap_or_else(|err| panic!("Failed to load model: {err}"));

// use the model to generate text from a prompt
let mut session = llama.start_session(Default::default());
let res = session.infer::<std::convert::Infallible>(
    // model to use for text generation
    &llama,
    // randomness provider
    &mut rand::thread_rng(),
    // the prompt to use for text generation, as well as other
    // inference parameters
    &llm::InferenceRequest {
        prompt: "Rust is a cool programming language because",
        ..Default::default()
    },
    // llm::OutputRequest
    &mut Default::default(),
    // output callback
    |t| {
        print!("{t}");
        std::io::stdout().flush().unwrap();

        Ok(())
    }
);

match res {
    Ok(result) => println!("\n\nInference stats:\n{result}"),
    Err(err) => println!("\n{err}"),
}
Modules
- Loading and saving of GGML files.
- All available models.
Structs
- The parameters for text generation.
- Settings specific to InferenceSession::infer.
- An inference session represents the state of the text generation. This holds the full context window, as well as several additional parameters used during sampling.
- Configuration for an inference session.
- A serializable snapshot of the inference process. Can be restored by calling InferenceSession::from_snapshot.
- A GGML format loader for LLMs.
- Parameters for tuning model instances
- Used in a call to Model::evaluate or InferenceSession::infer to request information from the model. If a value is set to Some, the Vec will be cleared, resized, and filled with the related data (see the sketch after this list).
- A list of tokens to bias during the process of inferencing.
- Used to buffer incoming tokens until they produce a valid string of UTF-8 text.
- An unsupported model architecture was specified.
- The vocabulary used by a model.
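For instance, continuing from the session and llama bindings in the example above, a caller can pass infer an output request that asks for extra data instead of the default empty one. This is only a sketch: the embeddings field name and its Option<Vec<f32>> shape are assumptions about OutputRequest, so check the struct's documentation before relying on them.

let mut output = llm::OutputRequest {
    // assumed field: setting it to Some asks the model to clear, resize,
    // and fill this Vec with the related data
    embeddings: Some(Vec::new()),
    ..Default::default()
};

let _ = session.infer::<std::convert::Infallible>(
    &llama,
    &mut rand::thread_rng(),
    &llm::InferenceRequest {
        prompt: "Rust is a cool programming language because",
        ..Default::default()
    },
    &mut output,
    // discard the generated text; only the requested data is of interest here
    |_t| Ok(()),
);

if let Some(embeddings) = &output.embeddings {
    println!("retrieved {} embedding values", embeddings.len());
}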
Enums
- The type of a value in ggml.
- How the tensors are stored in GGML LLM models.
- Errors encountered during the inference process.
- Errors encountered during the loading process.
- Each variant represents a step within the process of loading the model. These can be used to report progress to the user (see the sketch after this list).
- All available model architectures.
- Allowed types for the model memory K/V tensors.
- Errors encountered during the quantization process.
- Progress of quantization.
- Errors encountered during the snapshot process.
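As an illustration of the load-progress enum, the loader accepts any closure in place of load_progress_callback_stdout. A minimal sketch, assuming LoadProgress implements Debug; a real callback would match on the individual variants to drive a progress bar:

let llama = llm::load::<llm::models::Llama>(
    std::path::Path::new("/path/to/model"),
    Default::default(),
    // custom progress callback instead of llm::load_progress_callback_stdout
    |progress| println!("loading: {progress:?}"),
)
.unwrap_or_else(|err| panic!("Failed to load model: {err}"));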
Traits
- Interfaces for creating and interacting with a large language model with a known type of hyperparameters.
- A type-erased model to allow for interacting with a model without knowing its hyperparameters.
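For example, downstream code can be written against the type-erased trait so that it works with any architecture. This sketch assumes the type-erased trait exposes start_session and is what InferenceSession::infer accepts, mirroring the typed example above:

fn generate(model: &dyn llm::Model, prompt: &str) {
    // start a fresh session with the default configuration
    let mut session = model.start_session(Default::default());
    let res = session.infer::<std::convert::Infallible>(
        model,
        &mut rand::thread_rng(),
        &llm::InferenceRequest {
            prompt,
            ..Default::default()
        },
        &mut Default::default(),
        // stream generated tokens to stdout
        |t| {
            print!("{t}");
            Ok(())
        },
    );
    if let Err(err) = res {
        eprintln!("\ninference failed: {err}");
    }
}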
Functions
- Load a GGML model from the path and configure it per the params. The status of the loading process will be reported through load_progress_callback.
- A helper function that loads the specified model from disk using an architecture specified at runtime (see the sketch after this list).
- An implementation for load_progress_callback that outputs to stdout.
- Quantizes a model.
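As a sketch of the runtime-architecture loader: the argument order shown here (architecture, path, model parameters, progress callback) and the ModelArchitecture::Llama variant name are assumptions, but the intent is to obtain a boxed, type-erased model when the architecture is only known at runtime.

let model: Box<dyn llm::Model> = llm::load_dynamic(
    // architecture chosen at runtime (variant name assumed)
    llm::ModelArchitecture::Llama,
    std::path::Path::new("/path/to/model"),
    Default::default(),
    llm::load_progress_callback_stdout,
)
.unwrap_or_else(|err| panic!("Failed to load model: {err}"));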
Type Definitions
- The identifier of a token in a vocabulary.