Fix: Propagate flash attn to model loader by dthuerck · Pull Request #1424 · abetlen/llama-cpp-python · GitHub
Fix: Propagate flash attn to model loader #1424


Merged
merged 1 commit on May 3, 2024

Conversation

dthuerck (Contributor) commented May 3, 2024

I noticed that even after setting flash_attn to true in my model config file, llama.cpp kept reporting llama_new_context_with_model: flash_attn = 0. This super-small PR fixes that: it turns out the setting wasn't being passed on to the model loader.
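Conceptually, the change is just forwarding the setting when the model is constructed. A minimal sketch of the idea (illustrative only, not the actual diff; the settings dict and load_model helper are made up for this example):

from llama_cpp import Llama

def load_model(settings: dict) -> Llama:
    # Hypothetical loader: builds a Llama instance from a model config.
    return Llama(
        model_path=settings["model_path"],
        n_ctx=settings.get("n_ctx", 2048),
        n_gpu_layers=settings.get("n_gpu_layers", 0),
        # Before this fix, the flash_attn value from the config never reached
        # this constructor call, so llama.cpp always logged flash_attn = 0.
        flash_attn=settings.get("flash_attn", False),
    )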

abetlen (Owner) commented May 3, 2024

@dthuerck thank you!

abetlen merged commit 2138561 into abetlen:main on May 3, 2024
BadisG commented May 8, 2024

I installed the latest version of llama_cpp_python (0.2.70) with this command:

pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121

But after using it in oobabooga's software (the llama_cpp_hf loader), I still get the flash_attn = 0 issue:

llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: n_batch    = 2048
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 8000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      CUDA0 KV buffer size =  1728.00 MiB
llama_kv_cache_init:      CUDA1 KV buffer size =   832.00 MiB
llama_new_context_with_model: KV self size  = 2560.00 MiB, K (f16): 1280.00 MiB, V (f16): 1280.00 MiB
llama_new_context_with_model:  CUDA_Host  output buffer size =     0.98 MiB
llama_new_context_with_model: pipeline parallelism enabled (n_copies=4)
llama_new_context_with_model:      CUDA0 compute buffer size =   400.01 MiB
llama_new_context_with_model:      CUDA1 compute buffer size =   596.02 MiB
llama_new_context_with_model:  CUDA_Host compute buffer size =    32.02 MiB
llama_new_context_with_model: graph nodes  = 1208
llama_new_context_with_model: graph splits = 3
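One way to isolate whether the flag is being dropped in llama-cpp-python itself or in the llama_cpp_hf wrapper is to load the model with llama-cpp-python directly and watch for the same log line. A small sketch (the model path is a placeholder):

from llama_cpp import Llama

llm = Llama(
    model_path="/path/to/model.gguf",  # placeholder path
    n_ctx=2048,
    n_gpu_layers=-1,
    flash_attn=True,
    verbose=True,  # llama.cpp should log flash_attn = 1 if the flag took effect
)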
