Add Llama-3 chat format #1371
Conversation
Force-pushed from a1bfeb3 to f114963
I made the system message be just like any other role, since from the reference code there doesn't seem to be a distinction between those.
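For context, a minimal sketch of that idea: every role, including system, gets the same header/<|eot_id|> wrapping described in the reference tokenizer. The helper names below are illustrative only, not the actual functions in this PR.

```python
# Illustrative sketch only -- not the PR's actual helpers. It shows the uniform
# per-role wrapping from the Llama-3 reference tokenizer: no special casing for
# the system message.
def render_message(role: str, content: str) -> str:
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"

def render_prompt(messages: list) -> str:
    prompt = "<|begin_of_text|>"
    for message in messages:
        prompt += render_message(message["role"], message["content"])
    # Open an assistant header so the model generates the reply next.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt
```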
Includes proper Llama-3 <|eot_id|> token handling.
Force-pushed from c7a0548 to 93833a1
@abetlen also added a chat template for format auto-detection and bumped llama.cpp to the latest version to properly support the eot token. N.B. (unrelated): I noticed the LLAMA_CUBLAS cmake arg has been deprecated in favor of LLAMA_CUDA, so that should eventually be changed in this codebase too.
Just some suggestions based on similar changes I saw in the llama.cpp project.
_messages = _map_roles(messages, _roles)
_messages.append((_roles["assistant"], None))
_prompt = _format_no_colon_single(_begin_token, _messages, _sep)
return ChatFormatterResponse(prompt=_prompt, stop=_sep)
We could consider adding "<|im_end|>" and "<end_of_turn>" as additional stop tokens. I don't know if it's completely necessary, but ChatFormatterResponse looks like it accepts a list of stop tokens, and the llama.cpp project uses all three as stop tokens for Llama 3: ggml-org/llama.cpp@8960fe8
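If the formatter went that route, the return might look roughly like the sketch below. It assumes ChatFormatterResponse's stop field accepts a list of strings (as noted above) and that _sep is "<|eot_id|>"; the _finish wrapper and the import path are illustrative assumptions, not code from this PR.

```python
# Illustrative only: return multiple stop tokens instead of a single one.
# Assumes ChatFormatterResponse.stop accepts a list, as suggested above.
from llama_cpp.llama_chat_format import ChatFormatterResponse

def _finish(_prompt: str, _sep: str) -> ChatFormatterResponse:
    # _sep would be "<|eot_id|>" for Llama-3; the extra tokens mirror the
    # linked llama.cpp commit.
    return ChatFormatterResponse(
        prompt=_prompt,
        stop=[_sep, "<|im_end|>", "<end_of_turn>"],
    )
```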
I think it makes sense to follow what's done in the llama.cpp library. One thing I'm not sure about is whether that code from the linked commit is specific to Llama-3 or generic to any Llama models. If it's the latter, maybe it's better to just stick to the <|eot_id|> token that's explicitly defined in the released Llama-3 code?
Force-pushed from 93833a1 to 71bc488
Thanks for this PR! I have been using these changes with success locally to test Llama 3.
@andreabak thank you for this! I'll go ahead and merge this shortly!
* feat: Add Llama-3 chat format
* feat: Auto-detect Llama-3 chat format from gguf template
* feat: Update llama.cpp to b2715 (includes proper Llama-3 <|eot_id|> token handling)

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
This PR adds support for the recently-released Llama-3 models by Meta. Specifically, this chat format is used for their Instruct pretrained models.

Model card: https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md
See the reference implementation for the chat format: https://github.com/meta-llama/llama3/blob/main/llama/tokenizer.py#L202-L229
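For anyone picking this up after the merge, a rough usage sketch. The GGUF filename is a placeholder, and the chat_format name is assumed to be the one this PR registers ("llama-3"); verify against the merged code if in doubt.

```python
from llama_cpp import Llama

# Placeholder model path; chat_format="llama-3" is assumed to match the name
# registered by this PR.
llm = Llama(
    model_path="./Meta-Llama-3-8B-Instruct.Q4_K_M.gguf",
    chat_format="llama-3",
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what the <|eot_id|> token is used for."},
    ],
)
print(response["choices"][0]["message"]["content"])
```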