8000 "flash_attn = 0" is still still present on the newest llama-cpp-python versions · Issue #1479 · abetlen/llama-cpp-python · GitHub
[go: up one dir, main page]

Skip to content
"flash_attn = 0" is still still present on the newest llama-cpp-python versions #1479
Closed
@BadisG

Description


Hello,

PR #1424 was supposed to fix the flash_attn = 0 issue, but I'm still seeing it when loading a model through llamacpp_HF with the latest version (0.2.75):

llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: n_batch    = 2048
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 8000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      CUDA0 KV buffer size =  1728.00 MiB
llama_kv_cache_init:      CUDA1 KV buffer size =   832.00 MiB
llama_new_context_with_model: KV self size  = 2560.00 MiB, K (f16): 1280.00 MiB, V (f16): 1280.00 MiB
llama_new_context_with_model:  CUDA_Host  output buffer size =     0.98 MiB
llama_new_context_with_model: pipeline parallelism enabled (n_copies=4)
llama_new_context_with_model:      CUDA0 compute buffer size =   400.01 MiB
llama_new_context_with_model:      CUDA1 compute buffer size =   596.02 MiB
llama_new_context_with_model:  CUDA_Host compute buffer size =    32.02 MiB
llama_new_context_with_model: graph nodes  = 1208
llama_new_context_with_model: graph splits = 3
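
For context, a minimal sketch of how I would expect flash attention to be requested through the Python API, assuming the flash_attn keyword on llama_cpp.Llama is the intended switch (the model path below is a placeholder):

```python
from llama_cpp import Llama

# Placeholder GGUF path -- substitute a real model file.
llm = Llama(
    model_path="./models/model.Q4_K_M.gguf",
    n_ctx=2048,
    n_gpu_layers=-1,   # offload all layers to the GPU(s)
    flash_attn=True,   # request flash attention for the llama.cpp context
)

# If the flag is honoured, the llama.cpp startup log should report
# "flash_attn = 1" instead of "flash_attn = 0".
out = llm("Hello", max_tokens=8)
print(out["choices"][0]["text"])
```

Even when the loader is configured with flash attention enabled, the log above still reports flash_attn = 0.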
