Description
Proposed API
from llama_cpp import Llama

llama = Llama.from_pretrained(
    "TheBloke/dolphin-2_6-phi-2-GGUF",
    ...
    n_gpu_layers=-1
)
This will likely be implemented via the huggingface_hub package. I intend to keep that dependency optional and simply raise an error if from_pretrained is called without it installed.
Questions
- Pull full repo or just single file?
- Which quant level to use?