This crate provides a unified interface for loading and using Large Language Models (LLMs). See the models module for the model architectures that are currently supported.
At present, the only supported backend is GGML, but this is expected to change in the future.
Example
use std::io::Write;
use llm::Model;

// load a GGML model from disk
let llama = llm::load::<llm::models::Llama>(
    // path to GGML file
    std::path::Path::new("/path/to/model"),
    // llm::ModelParameters
    Default::default(),
    // load progress callback
    llm::load_progress_callback_stdout
)
.unwrap_or_else(|err| panic!("Failed to load model: {err}"));

// use the model to generate text from a prompt
let mut session = llama.start_session(Default::default());
let res = session.infer::<std::convert::Infallible>(
    // model to use for text generation
    &llama,
    // randomness provider
    &mut rand::thread_rng(),
    // the prompt to use for text generation, as well as other
    // inference parameters
    &llm::InferenceRequest {
        prompt: "Rust is a cool programming language because",
        ..Default::default()
    },
    // llm::OutputRequest
    &mut Default::default(),
    // output callback
    |t| {
        print!("{t}");
        std::io::stdout().flush().unwrap();

        Ok(())
    }
);

match res {
    Ok(result) => println!("\n\nInference stats:\n{result}"),
    Err(err) => println!("\n{err}"),
}
Modules
- Loading and saving of GGML files.
- All available models.
Structs
- The parameters for text generation.
- Settings specific to InferenceSession::infer.
- An inference session represents the state of the text generation. This holds the full context window, as well as several additional parameters used during sampling.
- Configuration for an inference session.
- A serializable snapshot of the inference process. Can be restored by calling InferenceSession::from_snapshot.
- A GGML format loader for LLMs.
- Parameters for tuning model instances
- Used in a call to Model::evaluate or InferenceSession::infer to request information from the model. If a value is set to Some, the Vec will be cleared, resized, and filled with the related data (see the sketch after this list).
- A list of tokens to bias during the process of inferencing.
- Used to buffer incoming tokens until they produce a valid string of UTF-8 text.
- An unsupported model architecture was specified.
- The vocabulary used by a model.
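For instance, continuing from the session and llama bindings in the example above, a caller can pass infer an output request that asks for extra data instead of the default empty one. This is only a sketch: the embeddings field name and its Option<Vec<f32>> shape are assumptions about OutputRequest, so check the struct's documentation before relying on them.

let mut output = llm::OutputRequest {
    // assumed field: setting it to Some asks the model to clear, resize,
    // and fill this Vec with the related data
    embeddings: Some(Vec::new()),
    ..Default::default()
};

let _ = session.infer::<std::convert::Infallible>(
    &llama,
    &mut rand::thread_rng(),
    &llm::InferenceRequest {
        prompt: "Rust is a cool programming language because",
        ..Default::default()
    },
    &mut output,
    // discard the generated text; only the requested data is of interest here
    |_t| Ok(()),
);

if let Some(embeddings) = &output.embeddings {
    println!("retrieved {} embedding values", embeddings.len());
}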
Enums
- The type of a value in ggml.
- How the tensors are stored in GGML LLM models.
- Errors encountered during the inference process.
- Errors encountered during the loading process.
- Each variant represents a step within the process of loading the model. These can be used to report progress to the user (see the sketch after this list).
- All available model architectures.
- Allowed types for the model memory K/V tensors.
- Errors encountered during the quantization process.
- Progress of quantization.
- Errors encountered during the snapshot process.
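As an illustration of the load-progress enum, the loader accepts any closure in place of load_progress_callback_stdout. A minimal sketch, assuming LoadProgress implements Debug; a real callback would match on the individual variants to drive a progress bar:

let llama = llm::load::<llm::models::Llama>(
    std::path::Path::new("/path/to/model"),
    Default::default(),
    // custom progress callback instead of llm::load_progress_callback_stdout
    |progress| println!("loading: {progress:?}"),
)
.unwrap_or_else(|err| panic!("Failed to load model: {err}"));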
Traits
- Interfaces for creating and interacting with a large language model with a known type of hyperparameters.
- A type-erased model to allow for interacting with a model without knowing its hyperparameters.
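For example, downstream code can be written against the type-erased trait so that it works with any architecture. This sketch assumes the type-erased trait exposes start_session and is what InferenceSession::infer accepts, mirroring the typed example above:

fn generate(model: &dyn llm::Model, prompt: &str) {
    // start a fresh session with the default configuration
    let mut session = model.start_session(Default::default());
    let res = session.infer::<std::convert::Infallible>(
        model,
        &mut rand::thread_rng(),
        &llm::InferenceRequest {
            prompt,
            ..Default::default()
        },
        &mut Default::default(),
        // stream generated tokens to stdout
        |t| {
            print!("{t}");
            Ok(())
        },
    );
    if let Err(err) = res {
        eprintln!("\ninference failed: {err}");
    }
}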
Functions
- Load a GGML model from the path and configure it per the params. The status of the loading process will be reported through load_progress_callback.
- A helper function that loads the specified model from disk using an architecture specified at runtime (see the sketch after this list).
- An implementation for load_progress_callback that outputs to stdout.
- Quantizes a model.
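As a sketch of the runtime-architecture loader: the argument order shown here (architecture, path, model parameters, progress callback) and the ModelArchitecture::Llama variant name are assumptions, but the intent is to obtain a boxed, type-erased model when the architecture is only known at runtime.

let model: Box<dyn llm::Model> = llm::load_dynamic(
    // architecture chosen at runtime (variant name assumed)
    llm::ModelArchitecture::Llama,
    std::path::Path::new("/path/to/model"),
    Default::default(),
    llm::load_progress_callback_stdout,
)
.unwrap_or_else(|err| panic!("Failed to load model: {err}"));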
Type Definitions
- The identifier of a token in a vocabulary.