Insights: ggerganov/llama.cpp
Overview
26 Releases published by 1 person
- b3829, published Sep 27, 2024
- b3831, published Sep 28, 2024
- b3832, published Sep 28, 2024
- b3834, published Sep 28, 2024
- b3835, published Sep 28, 2024
- b3837, published Sep 28, 2024
- b3841, published Sep 29, 2024
- b3848, published Sep 29, 2024
- b3847, published Sep 29, 2024
- b3849, published Sep 30, 2024
- b3853, published Sep 30, 2024
- b3855, published Oct 1, 2024
- b3856, published Oct 1, 2024
- b3861, published Oct 1, 2024
- b3863, published Oct 2, 2024
- b3864, published Oct 2, 2024
- b3865, published Oct 2, 2024
- b3866, published Oct 2, 2024
- b3867, published Oct 2, 2024
- b3868, published Oct 3, 2024
- b3869, published Oct 3, 2024
- b3870, published Oct 3, 2024
- b3872, published Oct 3, 2024
- b3873, published Oct 3, 2024
- b3874, published Oct 3, 2024
- b3878, published Oct 3, 2024
32 Pull requests merged by 20 people
- metal : fix compute pass descriptor autorelease crash (#9718, merged Oct 3, 2024)
- ggml-backend : add device description to CPU backend (#9720, merged Oct 3, 2024)
- ggml: unify backend logging mechanism (#9709, merged Oct 3, 2024)
- convert : handle tokenizer merges format from transformers 4.45 (#9696, merged Oct 3, 2024)
- rpc : enable vulkan (#9714, merged Oct 3, 2024)
- [SYCL] Fixed GET_ROWS failing unit-tests for type 1 quantizations (#9711, merged Oct 3, 2024)
- ggml-backend : add device and backend reg interfaces (#9707, merged Oct 2, 2024)
- llama : reduce compile time and binary size (#9712, merged Oct 2, 2024)
- sycl: initial cmake support of SYCL for AMD GPUs (#9658, merged Oct 2, 2024)
- vulkan : do not use tensor->extra (#9407, merged Oct 2, 2024)
- make sure params --split and --merge are not specified at same time in gguf-split (#9619, merged Oct 2, 2024)
- examples : remove benchmark (#9704, merged Oct 2, 2024)
- Added link to Bielik model (#9591, merged Oct 1, 2024)
- metal : reduce command encoding overhead (#9698, merged Oct 1, 2024)
- convert : refactor rope_freqs generation (#9396, merged Oct 1, 2024)
- Fix Docker ROCM builds, use AMDGPU_TARGETS instead of GPU_TARGETS (#9641, merged Sep 30, 2024)
- ci : reduce severity of unused Pyright ignore comments (#9697, merged Sep 30, 2024)
- update transfomers version. (#9694, merged Sep 30, 2024)
- nix: update flake.lock (#9680, merged Sep 30, 2024)
- utf-8 fix for windows stdin (#9690, merged Sep 30, 2024)
- ggml : define missing HWCAP flags (#9684, merged Sep 29, 2024)
- common: ensure token addition to batch does not exceed llama_batch size (#9668, merged Sep 29, 2024)
- Use new model class for chameleon conversion (#9683, merged Sep 29, 2024)
- contrib : add Resources section (#9675, merged Sep 29, 2024)
- llama : add reranking support (#9510, merged Sep 28, 2024)
- test-backend-ops : use flops for some performance tests (#9657, merged Sep 28, 2024)
- vocab: refactor tokenizer to reduce the overhead of creating multi times tokenizer (#9449, merged Sep 28, 2024)
- Add support for Chameleon (#8543, merged Sep 28, 2024)
- Docs: Add akx/ollama-dl (#9655, merged Sep 28, 2024)
- ggml: Add run-time detection of neon, i8mm and sve (#9331, merged Sep 28, 2024)
- Enable use to the rebar feature to upload buffers to the device. (#9251, merged Sep 28, 2024)
- cmake : add option for common library (#9661, merged Sep 27, 2024)
13 Pull requests opened by 12 people
- Update building for Android (#9672, opened Sep 27, 2024)
- `server`: cancel non-streamed requests w/ closed connection (#9679, opened Sep 29, 2024)
- musa: add docker image support (#9685, opened Sep 29, 2024)
- llama : first attempt to implement vision API (WIP) (#9687, opened Sep 29, 2024)
- added implementation of DRY sampler (post-refactor) (#9702, opened Oct 1, 2024)
- [SYCL] Implementing async model loading for non mapped memory (#9705, opened Oct 1, 2024)
- ci : fine-grant permission (#9710, opened Oct 1, 2024)
- ggml : add metal backend registry / device (#9713, opened Oct 2, 2024)
- vulkan : add backend registry / device interfaces (#9721, opened Oct 3, 2024)
- Fixed RNG seed docs (#9723, opened Oct 3, 2024)
- Don't use a specific version for the main-cmake-pkg (CMake throws and error) (#9730, opened Oct 3, 2024)
- ggml: Add POOL2D OP for GPU ACC to the Vulkan backend in the MobileVLM model. (#9733, opened Oct 4, 2024)
- vulkan : add GGML_VK_FORCE_HEAP_INDEX env var (#9734, opened Oct 4, 2024)
35 Issues closed by 13 people
- Encounter the "newline in constant" error while compiling with MSVC (#8334, closed Oct 4, 2024)
- Feature Request: Support Codestral Mamba (#8519, closed Oct 4, 2024)
- BF16 has no CUDA support (#8941, closed Oct 4, 2024)
- Bug: Unable to load phi3:3B(2.2GB) model on Apple M1 Pro (#9049, closed Oct 4, 2024)
- Bug: llama3.1 8B GGUF parallel inferring process leads to endless repeating results (#9104, closed Oct 4, 2024)
- Bug: crash with CUDA graphs on A100 (#9727, closed Oct 3, 2024)
- Feature Request: Unify GGML logging mechanism (#9706, closed Oct 3, 2024)
- Bug: RPC server doesn't load GPU if I use Vulkan (#8536, closed Oct 3, 2024)
- Bug: ggml_cuda_host_malloc: failed to allocate 1900,00 MiB of pinned memory: invalid argument (#9629, closed Oct 2, 2024)
- metal : increase GPU duty-cycle during inference (#9507, closed Oct 1, 2024)
- Bug: There is an issue to execute llama-baby-llama. (#9478, closed Oct 1, 2024)
- Bug: llama 3.2 error: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory) (#9701, closed Oct 1, 2024)
- Bug: Couldn't load GGUF file into Transformers (#9021, closed Oct 1, 2024)
- Vulkan adreno error (#9064, closed Oct 1, 2024)
- llama : refactor llama_vocab (#9369, closed Sep 30, 2024)
- Feature Request: T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge (#8485, closed Sep 30, 2024)
- Feature Request: Support vulkan when building on Android (#8933, closed Sep 30, 2024)
- Bug: Initializing KV Cache Spikes Memory, Crashing on Android (#9671, closed Sep 29, 2024)
- Bug: llama-parallel crashes when adding more tokens to llama_batch than context size (#9667, closed Sep 29, 2024)
- llama : support reranking API endpoint and models (#8555, closed Sep 29, 2024)
- Bug: Tokenizer not working on partial UTF-8 bytes (#8691, closed Sep 29, 2024)
- Bug: Speed regression from early this year (#8945, closed Sep 29, 2024)
- Feature Request: Support for fixie-ai/ultravox-v0_3 (#9038, closed Sep 29, 2024)
- Bug: Vulkan not compile (#9582, closed Sep 28, 2024)
- test-backend-ops performance numbers incorrect (#8898, closed Sep 28, 2024)
- vulkan backend failed to load models vk::Device::createComputePipeline: ErrorUnknown (#6843, closed Sep 28, 2024)
- Bug: Slow response times with llama.cpp llama-server (#9013, closed Sep 28, 2024)
- Bug: Failed to load llama3.1 405b model (#9613, closed Sep 27, 2024)
14 Issues opened by 14 people
- Potential GPU Usage During CPU Inference (ngl=0) (#9724, opened Oct 3, 2024)
- Feature Request: SYCL CI online (#9722, opened Oct 3, 2024)
- Feature Request: GELUTanh Activation Support (#9719, opened Oct 2, 2024)
- Bug: ggml_vulkan can only Found 1 Vulkan devices. (#9716, opened Oct 2, 2024)
- Bug: Failed to process regex error with long repeating sequences (#9715, opened Oct 2, 2024)
- Bug: win-vulkan-x64 crashed since b3831 (#9708, opened Oct 1, 2024)
- Feature Request: Support FlashAttention-3 (#9700, opened Sep 30, 2024)
- Bug: quality decreases in embeddings models (#9695, opened Sep 30, 2024)
- Bug: cannot find tokenizer merges in model file (#9692, opened Sep 30, 2024)
- Bug: `illegal hardware instruction` when running on M3 mac Sequoia installed with brew (#9676, opened Sep 28, 2024)
- Bug: baby-llama fails (#9674, opened Sep 28, 2024)
- Bug: Issue building hipBLAS error: call to undeclared function '_mm256_dpbusd_epi32' (#9666, opened Sep 27, 2024)
- Bug: Termux adreno 618 vulkan support (#9664, opened Sep 27, 2024)
63 Unresolved conversations
Conversations sometimes continue on older items that are not yet closed. Below is a list of all Issues and Pull Requests with unresolved conversations.
- Add Intel Advanced Matrix Extensions (AMX) support to ggml (#8998, commented on Sep 30, 2024 • 4 new comments)
- llama : initial Mamba-2 support (#9126, commented on Oct 3, 2024 • 2 new comments)
- Add PaliGemma Support (#7553, commented on Oct 2, 2024 • 1 new comment)
- Feature Request: UPX the growing binaries in packaging. (#9018, commented on Oct 3, 2024 • 0 new comments)
- Bug: runtime error in `llama_get_logits_ith` after `simplify Mamba with advanced batch splits` commit. (#9224, commented on Oct 3, 2024 • 0 new comments)
- Feature Request: Please use different name for function and enum type in llama.h (#9262, commented on Oct 3, 2024 • 0 new comments)
- Bug: (Server) Cannot properly cancel a non-stream completion request (#9273, commented on Oct 3, 2024 • 0 new comments)
- Feature Request: support embedding stella_en_400M and stella_en_400M.gguf conversion (#9202, commented on Oct 2, 2024 • 0 new comments)
- server: Bring back multimodal support (#8010, commented on Oct 2, 2024 • 0 new comments)
- Bug: Random inputs generated automatically in llama-cli (#9456, commented on Oct 2, 2024 • 0 new comments)
- Bug: Intel Arc - not working at all (#9106, commented on Oct 2, 2024 • 0 new comments)
- Support for InternVL (#6803, commented on Oct 2, 2024 • 0 new comments)
- Llama-3.2 11B Vision Support (#9643, commented on Oct 2, 2024 • 0 new comments)
- Suport for Jamba JambaForCausalLM (#6372, commented on Oct 2, 2024 • 0 new comments)
- Bug: Failed to convert minicpm-v2.5 (#9098, commented on Oct 2, 2024 • 0 new comments)
- Feature Request: Add Support for MllamaForConditionalGeneration to Convert Llama 3.2 Vision Models to GGUF Format (#9663, commented on Oct 1, 2024 • 0 new comments)
- llama : support Mamba-2 (#7727, commented on Oct 1, 2024 • 0 new comments)
- [CANN]Bug: Can't compile ggml/src/CMakeFiles/ggml.dir/ggml-cann/acl_tensor.cpp.o (#9560, commented on Sep 27, 2024 • 0 new comments)
- FR: Phi-3-vision-128k-instruct implementation (#7444, commented on Oct 3, 2024 • 0 new comments)
- Support for RecurrentGemma (Gemma with Griffin Architecture) (#6564, commented on Oct 3, 2024 • 0 new comments)
- CUDA non-determinism on identical requests (#2838, commented on Oct 3, 2024 • 0 new comments)
- Bug: [SYCL] crash since b-3805 (#9612, commented on Oct 3, 2024 • 0 new comments)
- Feature Request: Add support for Phi-3.5 MoE and Vision Instruct (#9119, commented on Oct 3, 2024 • 0 new comments)
- Feature Request: Standalone Clip Example (#9292, commented on Oct 4, 2024 • 0 new comments)
- Research: Are there any plans to support AIGC models such as flux1.dev? (#9110, commented on Oct 4, 2024 • 0 new comments)
- Support QuaRot quantization scheme (#6444, commented on Oct 4, 2024 • 0 new comments)
- Freshly converted PLaMo fails assertion: vocab.id_to_token.size() == vocab.token_to_id.size() (#5669, commented on Oct 4, 2024 • 0 new comments)
- added implementation of DRY sampler (#6839, commented on Oct 1, 2024 • 0 new comments)
- llama : support Jamba hybrid Transformer-Mamba models (#7531, commented on Oct 1, 2024 • 0 new comments)
- feat: add changes to handle jina v2 chinese code (#7795, commented on Sep 30, 2024 • 0 new comments)
- Support video understanding (#9165, commented on Sep 29, 2024 • 0 new comments)
- Tool call support (Llama 3.x, Functionary v3, Hermes 2 Pro) w/ lazy grammars & minimalist Jinja engine (#9639, commented on Oct 4, 2024 • 0 new comments)
- Bug: Failed to run qwen2-57b-a14b-instruct-fp16. (#9628, commented on Sep 27, 2024 • 0 new comments)
- Error: llama_model_load: error loading model: failed to open ggml-bagel-2.8b-v0.2-q8_0.gguf (#9656, commented on Sep 27, 2024 • 0 new comments)
- Feature Request: Paligemma Support (#9227, commented on Sep 28, 2024 • 0 new comments)
- Bug: ggml_compute_forward_soft_max_f32: Assertion `sum > 0.0' failed. (#9222, commented on Sep 28, 2024 • 0 new comments)
- Feature Request: Add support for chatglm3 in example server. (#9164, commented on Sep 28, 2024 • 0 new comments)
- Feature Request: introduce Tool Call API in server mode (#9031, commented on Sep 28, 2024 • 0 new comments)
- Bug: llamacpp for CPU/GPU (avx avx2) quants IQ1xx, IQ2xx, IQ3xx are overheating (CPU 90C) CPU ryzen 9 7950x3d but IQ4xx and other quants not (CPU 65C) (#8760, commented on Sep 28, 2024 • 0 new comments)
- Feature Request: Add split model support in gguf-py (#9023, commented on Sep 28, 2024 • 0 new comments)
- changelog : `libllama` API (#9289, commented on Sep 28, 2024 • 0 new comments)
- llama : tool for evaluating quantization results per layer (#2783, commented on Sep 28, 2024 • 0 new comments)
- ggml : unified CMake build (#6913, commented on Sep 28, 2024 • 0 new comments)
- metal : compile-time kernel args and params (#4085, commented on Sep 28, 2024 • 0 new comments)
- Support speculative decoding in `server` example (#5877, commented on Sep 28, 2024 • 0 new comments)
- Feature Request: Molmo 72B vision support (#9645, commented on Sep 28, 2024 • 0 new comments)
- Bug: Slow model loading with mmap (#9244, commented on Sep 29, 2024 • 0 new comments)
- Bug: Release build on Windows stuck (#9242, commented on Sep 29, 2024 • 0 new comments)
- Bug: Incorrect operation of the context shift mechanism in some models. (#9238, commented on Sep 29, 2024 • 0 new comments)
- [Bug] LLava 1.6 core dump happened in bicubic_resize. (#9234, commented on Sep 29, 2024 • 0 new comments)
- Bug: Failure when converting model with small hidden_size (64) to GGUF in llama.cpp (#9236, commented on Sep 29, 2024 • 0 new comments)
- Bug: LLaVA 1.6 hallucinates badly with default batch size (#9233, commented on Sep 29, 2024 • 0 new comments)
- Feature Request: Some way to handle KV cache allocation failure during individual slot restore (#9201, commented on Sep 29, 2024 • 0 new comments)
- Bug: igpu (#9153, commented on Sep 29, 2024 • 0 new comments)
- Dynatemp and min_p upgrade? (#9178, commented on Sep 30, 2024 • 0 new comments)
- Bug: Fatal signal 11 (SIGSEGV) on Google Pixel 8 (dart) (#7908, commented on Sep 30, 2024 • 0 new comments)
- UGM tokenizer cost a long time than others (#9180, commented on Oct 1, 2024 • 0 new comments)
- Bug: context extension over self extend exhausts KV cache (#9171, commented on Oct 1, 2024 • 0 new comments)
- Bug: OpenBLAS compile for Android doesn't work in Ubuntu 22.04 (#9039, commented on Oct 1, 2024 • 0 new comments)
- Feature Request: Support Zyphra/Zamba2-2.7B (#8795, commented on Oct 1, 2024 • 0 new comments)
- Issue: HuggingFace Documentation Refers to Outdated Binaries (#8659, commented on Oct 1, 2024 • 0 new comments)
- Newest apple model unsupported... (#8514, commented on Oct 1, 2024 • 0 new comments)
- Investigate gemma 2 generation quality (#8240, commented on Oct 1, 2024 • 0 new comments)