8000 [falcon] Fix Falcon for rw-1b model by akawrykow · Pull Request #2887 · ggml-org/llama.cpp · GitHub

Status: Closed · akawrykow wants to merge 9 commits

Changes from 1 commit

Skip qkv reshaping for non-parallel attention
akawrykow committed Aug 29, 2023
commit de64f091c8b0ab1df364f93a2a0396d112f55692
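
For context: `hparams` in the script below comes from the model's Hugging Face config.json, and falcon-rw-1b sets "parallel_attn" to false, which is why the new guard is needed. A minimal sketch of how the flag is obtained (the directory path is a hypothetical local checkout, not from this PR):

import json
from pathlib import Path

# Hypothetical local checkout of the falcon-rw-1b model.
dir_model = Path("falcon-rw-1b")

# The conversion script loads the model hyperparameters from config.json.
with open(dir_model / "config.json", "r", encoding="utf-8") as f:
    hparams = json.load(f)

# falcon-rw-1b uses the non-parallel (sequential) attention layout,
# so this prints False.
print(hparams["parallel_attn"])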
convert-falcon-hf-to-gguf.py (3 changes: 2 additions & 1 deletion)
@@ -206,6 +206,7 @@ def count_model_parts(dir_model: str) -> int:
 
 # params for qkv transform
 head_dim = hparams["hidden_size"] // n_head
+parallel_attn = hparams["parallel_attn"]
 
 # tensor info
 print("gguf: get tensor metadata")

@@ -240,7 +241,7 @@ def count_model_parts(dir_model: str) -> int:
         # in contiguous fashion.
         # ref: https://github.com/jploski/ggml/blob/falcon40b/examples/falcon/convert-hf-to-ggml.py
 
-        if "query_key_value" in name:
+        if "query_key_value" in name and parallel_attn:
             qkv = data.view(n_head_kv, n_head // n_head_kv + 2, head_dim, head_dim * n_head)
             q = qkv[:, :-2 ].reshape(n_head * head_dim, head_dim * n_head)
             k = qkv[:, [-2]].reshape(n_head_kv * head_dim, head_dim * n_head)
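
To make the effect of the new guard concrete, below is a small, self-contained sketch (not part of the PR; the model dimensions are made up) of the qkv reorder that now runs only when parallel_attn is true. Parallel-attention Falcon checkpoints store the fused query_key_value weight interleaved per kv group, and the converter reorders it into contiguous q, k, v blocks; with parallel_attn false (as in falcon-rw-1b) the tensor is written out unchanged:

import torch

# Made-up dimensions for illustration only.
n_head, n_head_kv = 8, 2
hidden_size = 64
head_dim = hidden_size // n_head
parallel_attn = False  # falcon-rw-1b sets "parallel_attn": false

# Fused QKV weight as stored in the HF checkpoint:
# (n_head + 2 * n_head_kv) * head_dim rows, hidden_size columns.
data = torch.randn((n_head + 2 * n_head_kv) * head_dim, hidden_size)

if parallel_attn:
    # Group the rows by kv head: each group holds n_head // n_head_kv
    # query heads followed by one key head and one value head.
    qkv = data.view(n_head_kv, n_head // n_head_kv + 2, head_dim, head_dim * n_head)
    # Pull out all queries, then all keys, then all values ...
    q = qkv[:, :-2].reshape(n_head * head_dim, head_dim * n_head)
    k = qkv[:, [-2]].reshape(n_head_kv * head_dim, head_dim * n_head)
    v = qkv[:, [-1]].reshape(n_head_kv * head_dim, head_dim * n_head)
    # ... and write them back contiguously.
    data = torch.cat((q, k, v)).reshape_as(data)

# With parallel_attn == False the tensor passes through unchanged.
print(data.shape)  # torch.Size([96, 64])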