Insights: ggerganov/llama.cpp
Overview
26 Releases published by 1 person
- b3829, published Sep 27, 2024
- b3831, published Sep 28, 2024
- b3832, published Sep 28, 2024
- b3834, published Sep 28, 2024
- b3835, published Sep 28, 2024
- b3837, published Sep 28, 2024
- b3841, published Sep 29, 2024
- b3848, published Sep 29, 2024
- b3847, published Sep 29, 2024
- b3849, published Sep 30, 2024
- b3853, published Sep 30, 2024
- b3855, published Oct 1, 2024
- b3856, published Oct 1, 2024
- b3861, published Oct 1, 2024
- b3863, published Oct 2, 2024
- b3864, published Oct 2, 2024
- b3865, published Oct 2, 2024
- b3866, published Oct 2, 2024
- b3867, published Oct 2, 2024
- b3868, published Oct 3, 2024
- b3869, published Oct 3, 2024
- b3870, published Oct 3, 2024
- b3872, published Oct 3, 2024
- b3873, published Oct 3, 2024
- b3874, published Oct 3, 2024
- b3878, published Oct 3, 2024
32 Pull requests merged by 20 people
- metal : fix compute pass descriptor autorelease crash (#9718, merged Oct 3, 2024)
- ggml-backend : add device description to CPU backend (#9720, merged Oct 3, 2024)
- ggml: unify backend logging mechanism (#9709, merged Oct 3, 2024)
- convert : handle tokenizer merges format from transformers 4.45 (#9696, merged Oct 3, 2024)
- rpc : enable vulkan (#9714, merged Oct 3, 2024)
- [SYCL] Fixed GET_ROWS failing unit-tests for type 1 quantizations (#9711, merged Oct 3, 2024)
- ggml-backend : add device and backend reg interfaces (#9707, merged Oct 2, 2024)
- llama : reduce compile time and binary size (#9712, merged Oct 2, 2024)
- sycl: initial cmake support of SYCL for AMD GPUs (#9658, merged Oct 2, 2024)
- vulkan : do not use tensor->extra (#9407, merged Oct 2, 2024)
- make sure params --split and --merge are not specified at same time in gguf-split (#9619, merged Oct 2, 2024)
- examples : remove benchmark (#9704, merged Oct 2, 2024)
- Added link to Bielik model (#9591, merged Oct 1, 2024)
- metal : reduce command encoding overhead (#9698, merged Oct 1, 2024)
- convert : refactor rope_freqs generation (#9396, merged Oct 1, 2024)
- Fix Docker ROCM builds, use AMDGPU_TARGETS instead of GPU_TARGETS (#9641, merged Sep 30, 2024)
- ci : reduce severity of unused Pyright ignore comments (#9697, merged Sep 30, 2024)
- update transfomers version. (#9694, merged Sep 30, 2024)
- nix: update flake.lock (#9680, merged Sep 30, 2024)
- utf-8 fix for windows stdin (#9690, merged Sep 30, 2024)
- ggml : define missing HWCAP flags (#9684, merged Sep 29, 2024)
- common: ensure token addition to batch does not exceed llama_batch size (#9668, merged Sep 29, 2024)
- Use new model class for chameleon conversion (#9683, merged Sep 29, 2024)
- contrib : add Resources section (#9675, merged Sep 29, 2024)
- llama : add reranking support (#9510, merged Sep 28, 2024)
- test-backend-ops : use flops for some performance tests (#9657, merged Sep 28, 2024)
- vocab: refactor tokenizer to reduce the overhead of creating multi times tokenizer (#9449, merged Sep 28, 2024)
- Add support for Chameleon (#8543, merged Sep 28, 2024)
- Docs: Add akx/ollama-dl (#9655, merged Sep 28, 2024)
- ggml: Add run-time detection of neon, i8mm and sve (#9331, merged Sep 28, 2024)
- Enable use to the rebar feature to upload buffers to the device. (#9251, merged Sep 28, 2024)
- cmake : add option for common library (#9661, merged Sep 27, 2024)
13 Pull requests opened by 12 people
- Update building for Android (#9672, opened Sep 27, 2024)
- `server`: cancel non-streamed requests w/ closed connection (#9679, opened Sep 29, 2024)
- musa: add docker image support (#9685, opened Sep 29, 2024)
- llama : first attempt to implement vision API (WIP) (#9687, opened Sep 29, 2024)
- added implementation of DRY sampler (post-refactor) (#9702, opened Oct 1, 2024)
- [SYCL] Implementing async model loading for non mapped memory (#9705, opened Oct 1, 2024)
- ci : fine-grant permission (#9710, opened Oct 1, 2024)
- ggml : add metal backend registry / device (#9713, opened Oct 2, 2024)
- vulkan : add backend registry / device interfaces (#9721, opened Oct 3, 2024)
- Fixed RNG seed docs (#9723, opened Oct 3, 2024)
- Don't use a specific version for the main-cmake-pkg (CMake throws and error) (#9730, opened Oct 3, 2024)
- ggml: Add POOL2D OP for GPU ACC to the Vulkan backend in the MobileVLM model. (#9733, opened Oct 4, 2024)
- vulkan : add GGML_VK_FORCE_HEAP_INDEX env var (#9734, opened Oct 4, 2024)
35 Issues closed by 13 people
- Encounter the "newline in constant" error while compiling with MSVC (#8334, closed Oct 4, 2024)
- Feature Request: Support Codestral Mamba (#8519, closed Oct 4, 2024)
- BF16 has no CUDA support (#8941, closed Oct 4, 2024)
- Bug: Unable to load phi3:3B(2.2GB) model on Apple M1 Pro (#9049, closed Oct 4, 2024)
- Bug: llama3.1 8B GGUF parallel inferring process leads to endless repeating results (#9104, closed Oct 4, 2024)
- Bug: crash with CUDA graphs on A100 (#9727, closed Oct 3, 2024)
- Feature Request: Unify GGML logging mechanism (#9706, closed Oct 3, 2024)
- Bug: RPC server doesn't load GPU if I use Vulkan (#8536, closed Oct 3, 2024)
- Bug: ggml_cuda_host_malloc: failed to allocate 1900,00 MiB of pinned memory: invalid argument (#9629, closed Oct 2, 2024)
- metal : increase GPU duty-cycle during inference (#9507, closed Oct 1, 2024)
- Bug: There is an issue to execute llama-baby-llama. (#9478, closed Oct 1, 2024)
- Bug: llama 3.2 error: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory) (#9701, closed Oct 1, 2024)
- Bug: Couldn't load GGUF file into Transformers (#9021, closed Oct 1, 2024)
- Vulkan adreno error (#9064, closed Oct 1, 2024)
- llama : refactor llama_vocab (#9369, closed Sep 30, 2024)
- Feature Request: T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge (#8485, closed Sep 30, 2024)
- Feature Request: Support vulkan when building on Android (#8933, closed Sep 30, 2024)
- Bug: Initializing KV Cache Spikes Memory, Crashing on Android (#9671, closed Sep 29, 2024)
- Bug: llama-parallel crashes when adding more tokens to llama_batch than context size (#9667, closed Sep 29, 2024)
- llama : support reranking API endpoint and models (#8555, closed Sep 29, 2024)
- Bug: Tokenizer not working on partial UTF-8 bytes (#8691, closed Sep 29, 2024)
- Bug: Speed regression from early this year (#8945, closed Sep 29, 2024)
- Feature Request: Support for fixie-ai/ultravox-v0_3 (#9038, closed Sep 29, 2024)
- Bug: Vulkan not compile (#9582, closed Sep 28, 2024)
- test-backend-ops performance numbers incorrect (#8898, closed Sep 28, 2024)
- vulkan backend failed to load models vk::Device::createComputePipeline: ErrorUnknown (#6843, closed Sep 28, 2024)
- Bug: Slow response times with llama.cpp llama-server (#9013, closed Sep 28, 2024)
- Bug: Failed to load llama3.1 405b model (#9613, closed Sep 27, 2024)
14 Issues opened by 14 people
- Potential GPU Usage During CPU Inference (ngl=0) (#9724, opened Oct 3, 2024)
- Feature Request: SYCL CI online (#9722, opened Oct 3, 2024)
- Feature Request: GELUTanh Activation Support (#9719, opened Oct 2, 2024)
- Bug: ggml_vulkan can only Found 1 Vulkan devices. (#9716, opened Oct 2, 2024)
- Bug: Failed to process regex error with long repeating sequences (#9715, opened Oct 2, 2024)
- Bug: win-vulkan-x64 crashed since b3831 (#9708, opened Oct 1, 2024)
- Feature Request: Support FlashAttention-3 (#9700, opened Sep 30, 2024)
- Bug: quality decreases in embeddings models (#9695, opened Sep 30, 2024)
- Bug: cannot find tokenizer merges in model file (#9692, opened Sep 30, 2024)
- Bug: `illegal hardware instruction` when running on M3 mac Sequoia installed with brew (#9676, opened Sep 28, 2024)
- Bug: baby-llama fails (#9674, opened Sep 28, 2024)
- Bug: Issue building hipBLAS error: call to undeclared function '_mm256_dpbusd_epi32' (#9666, opened Sep 27, 2024)
- Bug: Termux adreno 618 vulkan support (#9664, opened Sep 27, 2024)
63 Unresolved conversations
Conversations sometimes continue on older items that are not yet closed. Below is a list of all Issues and Pull Requests with unresolved conversations.
- Add Intel Advanced Matrix Extensions (AMX) support to ggml (#8998, commented on Sep 30, 2024 • 4 new comments)
- llama : initial Mamba-2 support (#9126, commented on Oct 3, 2024 • 2 new comments)
- Add PaliGemma Support (#7553, commented on Oct 2, 2024 • 1 new comment)
- Feature Request: UPX the growing binaries in packaging. (#9018, commented on Oct 3, 2024 • 0 new comments)
- Bug: runtime error in `llama_get_logits_ith` after `simplify Mamba with advanced batch splits` commit. (#9224, commented on Oct 3, 2024 • 0 new comments)
- Feature Request: Please use different name for function and enum type in llama.h (#9262, commented on Oct 3, 2024 • 0 new comments)
- Bug: (Server) Cannot properly cancel a non-stream completion request (#9273, commented on Oct 3, 2024 • 0 new comments)
- Feature Request: support embedding stella_en_400M and stella_en_400M.gguf conversion (#9202, commented on Oct 2, 2024 • 0 new comments)
- server: Bring back multimodal support (#8010, commented on Oct 2, 2024 • 0 new comments)
- Bug: Random inputs generated automatically in llama-cli (#9456, commented on Oct 2, 2024 • 0 new comments)
- Bug: Intel Arc - not working at all (#9106, commented on Oct 2, 2024 • 0 new comments)
- Support for InternVL (#6803, commented on Oct 2, 2024 • 0 new comments)
- Llama-3.2 11B Vision Support (#9643, commented on Oct 2, 2024 • 0 new comments)
- Suport for Jamba JambaForCausalLM (#6372, commented on Oct 2, 2024 • 0 new comments)
- Bug: Failed to convert minicpm-v2.5 (#9098, commented on Oct 2, 2024 • 0 new comments)
- Feature Request: Add Support for MllamaForConditionalGeneration to Convert Llama 3.2 Vision Models to GGUF Format (#9663, commented on Oct 1, 2024 • 0 new comments)
- llama : support Mamba-2 (#7727, commented on Oct 1, 2024 • 0 new comments)
- [CANN]Bug: Can't compile ggml/src/CMakeFiles/ggml.dir/ggml-cann/acl_tensor.cpp.o (#9560, commented on Sep 27, 2024 • 0 new comments)
- FR: Phi-3-vision-128k-instruct implementation (#7444, commented on Oct 3, 2024 • 0 new comments)
- Support for RecurrentGemma (Gemma with Griffin Architecture) (#6564, commented on Oct 3, 2024 • 0 new comments)
- CUDA non-determinism on identical requests (#2838, commented on Oct 3, 2024 • 0 new comments)
- Bug: [SYCL] crash since b-3805 (#9612, commented on Oct 3, 2024 • 0 new comments)
- Feature Request: Add support for Phi-3.5 MoE and Vision Instruct (#9119, commented on Oct 3, 2024 • 0 new comments)
- Feature Request: Standalone Clip Example (#9292, commented on Oct 4, 2024 • 0 new comments)
- Research: Are there any plans to support AIGC models such as flux1.dev? (#9110, commented on Oct 4, 2024 • 0 new comments)
- Support QuaRot quantization scheme (#6444, commented on Oct 4, 2024 • 0 new comments)
- Freshly converted PLaMo fails assertion: vocab.id_to_token.size() == vocab.token_to_id.size() (#5669, commented on Oct 4, 2024 • 0 new comments)
- added implementation of DRY sampler (#6839, commented on Oct 1, 2024 • 0 new comments)
- llama : support Jamba hybrid Transformer-Mamba models (#7531, commented on Oct 1, 2024 • 0 new comments)
- feat: add changes to handle jina v2 chinese code (#7795, commented on Sep 30, 2024 • 0 new comments)
- Support video understanding (#9165, commented on Sep 29, 2024 • 0 new comments)
- Tool call support (Llama 3.x, Functionary v3, Hermes 2 Pro) w/ lazy grammars & minimalist Jinja engine (#9639, commented on Oct 4, 2024 • 0 new comments)
- Bug: Failed to run qwen2-57b-a14b-instruct-fp16. (#9628, commented on Sep 27, 2024 • 0 new comments)
- Error: llama_model_load: error loading model: failed to open ggml-bagel-2.8b-v0.2-q8_0.gguf (#9656, commented on Sep 27, 2024 • 0 new comments)
- Feature Request: Paligemma Support (#9227, commented on Sep 28, 2024 • 0 new comments)
- Bug: ggml_compute_forward_soft_max_f32: Assertion `sum > 0.0' failed. (#9222, commented on Sep 28, 2024 • 0 new comments)
- Feature Request: Add support for chatglm3 in example server. (#9164, commented on Sep 28, 2024 • 0 new comments)
- Feature Request: introduce Tool Call API in server mode (#9031, commented on Sep 28, 2024 • 0 new comments)
- Bug: llamacpp for CPU/GPU (avx avx2) quants IQ1xx, IQ2xx, IQ3xx are overheating (CPU 90C) CPU ryzen 9 7950x3d but IQ4xx and other quants not (CPU 65C) (#8760, commented on Sep 28, 2024 • 0 new comments)
- Feature Request: Add split model support in gguf-py (#9023, commented on Sep 28, 2024 • 0 new comments)
- changelog : `libllama` API (#9289, commented on Sep 28, 2024 • 0 new comments)
- llama : tool for evaluating quantization results per layer (#2783, commented on Sep 28, 2024 • 0 new comments)
- ggml : unified CMake build (#6913, commented on Sep 28, 2024 • 0 new comments)
- metal : compile-time kernel args and params (#4085, commented on Sep 28, 2024 • 0 new comments)
- Support speculative decoding in `server` example (#5877, commented on Sep 28, 2024 • 0 new comments)
- Feature Request: Molmo 72B vision support (#9645, commented on Sep 28, 2024 • 0 new comments)
- Bug: Slow model loading with mmap (#9244, commented on Sep 29, 2024 • 0 new comments)
- Bug: Release build on Windows stuck (#9242, commented on Sep 29, 2024 • 0 new comments)
- Bug: Incorrect operation of the context shift mechanism in some models. (#9238, commented on Sep 29, 2024 • 0 new comments)
- [Bug] LLava 1.6 core dump happened in bicubic_resize. (#9234, commented on Sep 29, 2024 • 0 new comments)
- Bug: Failure when converting model with small hidden_size (64) to GGUF in llama.cpp (#9236, commented on Sep 29, 2024 • 0 new comments)
- Bug: LLaVA 1.6 hallucinates badly with default batch size (#9233, commented on Sep 29, 2024 • 0 new comments)
- Feature Request: Some way to handle KV cache allocation failure during individual slot restore (#9201, commented on Sep 29, 2024 • 0 new comments)
- Bug: igpu (#9153, commented on Sep 29, 2024 • 0 new comments)
- Dynatemp and min_p upgrade? (#9178, commented on Sep 30, 2024 • 0 new comments)
- Bug: Fatal signal 11 (SIGSEGV) on Google Pixel 8 (dart) (#7908, commented on Sep 30, 2024 • 0 new comments)
- UGM tokenizer cost a long time than others (#9180, commented on Oct 1, 2024 • 0 new comments)
- Bug: context extension over self extend exhausts KV cache (#9171, commented on Oct 1, 2024 • 0 new comments)
- Bug: OpenBLAS compile for Android doesn't work in Ubuntu 22.04 (#9039, commented on Oct 1, 2024 • 0 new comments)
- Feature Request: Support Zyphra/Zamba2-2.7B (#8795, commented on Oct 1, 2024 • 0 new comments)
- Issue: HuggingFace Documentation Refers to Outdated Binaries (#8659, commented on Oct 1, 2024 • 0 new comments)
- Newest apple model unsupported... (#8514, commented on Oct 1, 2024 • 0 new comments)
- Investigate gemma 2 generation quality (#8240, commented on Oct 1, 2024 • 0 new comments)