Add Llama-3 chat format by andreabak · Pull Request #1371 · abetlen/llama-cpp-python

Merged: abetlen merged 4 commits into abetlen:main from dev-llama3_chat_format-abk16 on Apr 23, 2024

Conversation

andreabak (Contributor)

This PR adds support for the recently released Llama-3 models by Meta. Specifically, this chat format is used for their Instruct pretrained models.

Model card: https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md
See reference implementation for the chat format: https://github.com/meta-llama/llama3/blob/main/llama/tokenizer.py#L202-L229
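For illustration, each turn is wrapped in header tokens and terminated with <|eot_id|>, and the prompt ends with an open assistant header so the model generates the next reply. The helper below is a rough sketch based on the reference tokenizer linked above, not the exact code added in this PR:

```python
# Rough sketch of the Llama-3 chat template, based on the reference tokenizer
# linked above; this is not the exact implementation added by this PR.
def format_llama3_prompt(messages: list[dict]) -> str:
    prompt = "<|begin_of_text|>"
    for message in messages:
        # Every role (system, user, assistant) is wrapped the same way.
        prompt += f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
        prompt += f"{message['content']}<|eot_id|>"
    # End with an open assistant header so the model produces the reply.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt
```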

@andreabak force-pushed the dev-llama3_chat_format-abk16 branch from a1bfeb3 to f114963 on April 21, 2024 at 16:57
@andreabak (Contributor, Author)

I made the system message behave just like any other role, since the reference code doesn't seem to make a distinction between them.
Conventionally the first message is the system one, and usually no extra system messages are allowed within the dialog turns, but there doesn't seem to be such a restriction in the Llama-3 format (at least from looking at the code).
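For example, a system message is passed exactly like the other roles. A minimal usage sketch (the GGUF path is a placeholder, and "llama-3" is the chat format name this PR registers):

```python
from llama_cpp import Llama

# Placeholder model path; "llama-3" is the chat format name registered by this PR.
llm = Llama(
    model_path="./Meta-Llama-3-8B-Instruct.Q4_K_M.gguf",
    chat_format="llama-3",
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(response["choices"][0]["message"]["content"])
```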

@andreabak force-pushed the dev-llama3_chat_format-abk16 branch 2 times, most recently from c7a0548 to 93833a1 on April 22, 2024 at 20:56
@andreabak (Contributor, Author)

@abetlen I've also added the chat template for format auto-detection and bumped llama.cpp to the latest version to properly support the eot token.
Rebased up to date with main.

N.B. Unrelated issue: I noticed the LLAMA_CUBLAS cmake arg has been deprecated in favor of LLAMA_CUDA, so that should eventually be changed in this codebase too.

@jakekarnes42 left a comment


Just some suggestions based on similar changes I saw in the llama.cpp project.

_messages = _map_roles(messages, _roles)
_messages.append((_roles["assistant"], None))
_prompt = _format_no_colon_single(_begin_token, _messages, _sep)
return ChatFormatterResponse(prompt=_prompt, stop=_sep)


We could consider adding "<|im_end|>" and "<end_of_turn>" as additional stop tokens. I don't know if it's completely necessary, but ChatFormatterResponse looks like it accepts a list of stop tokens, and the llama.cpp project uses all three as stop tokens for Llama 3: ggml-org/llama.cpp@8960fe8
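If it does accept a list, the change would look roughly like this (an untested sketch of the suggestion, reusing the existing _sep/eot stop string from the snippet above):

```python
# Sketch of the suggestion above: return several stop strings instead of one,
# assuming ChatFormatterResponse.stop accepts a list as well as a single string.
return ChatFormatterResponse(
    prompt=_prompt,
    stop=[_sep, "<|im_end|>", "<end_of_turn>"],
)
```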

@andreabak (Contributor, Author)


I think it makes sense to follow what's done in the llama.cpp library. One thing I'm not sure about is whether the code from the linked commit is only for Llama-3 or is more generic for any Llama models. If it's the latter, maybe it's better to just stick to the <|eot_id|> that's explicitly defined in the released llama-3 code?

@andreabak force-pushed the dev-llama3_chat_format-abk16 branch from 93833a1 to 71bc488 on April 22, 2024 at 21:37
@ramipellumbi

Thanks for this PR! I have been using these changes with success locally to test Llama 3.

@abetlen (Owner) commented Apr 23, 2024

@andreabak thank you for this! I'll go ahead and merge this shortly!

@abetlen merged commit 8559e8c into abetlen:main on Apr 23, 2024
@andreabak deleted the dev-llama3_chat_format-abk16 branch on April 23, 2024 at 07:26
xhedit pushed a commit to xhedit/llama-cpp-conv that referenced this pull request Apr 30, 2024
* feat: Add Llama-3 chat format

* feat: Auto-detect Llama-3 chat format from gguf template

* feat: Update llama.cpp to b2715

Includes proper Llama-3 <|eot_id|> token handling.

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>