Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
Run create_chat_completion on Llama 3 (in GGUF format) without tokenization warnings:
from llama_cpp import Llama

# Download the Llama 3 8B Instruct GGUF model from the Hugging Face Hub.
llm = Llama.from_pretrained(
    repo_id="bartowski/Meta-Llama-3-8B-Instruct-GGUF",
    filename="*Q6_K.gguf",
    verbose=True,
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {
        "role": "user",
        "content": "Q: Name the planets in the solar system? A: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune and Pluto.",
    },
]

# Generate a response using the model's built-in chat template.
llm.create_chat_completion(
    messages=messages,
    max_tokens=2048,
)
Current Behavior
The following warning is emitted:
llama_tokenize_internal: Added a BOS token to the prompt as specified by the model but the prompt also starts with a BOS token. So now the final prompt starts with 2 BOS tokens. Are you sure this is what you want?
Environment and Context
Mac M1
Python 3.11.6
Failure Information (for bugs)
The warning goes away when the following chat template is applied manually and create_completion is used to get the response:
template = "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}"
The difference from the default Llama 3 template is that "set content = bos_token + content" is changed to "set content = content".
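For reference, a minimal sketch of that manual path, reusing llm, messages, and the modified template from above. Rendering with jinja2 and the stop list are my assumptions, not part of the original report, and it presumes a llama-cpp-python version where create_completion tokenizes special tokens in the prompt:

from jinja2 import Template

# Render the modified template (no bos_token prepended) into a plain prompt string.
prompt = Template(template).render(
    messages=messages,
    add_generation_prompt=True,
)

# create_completion tokenizes the prompt itself and adds the BOS token,
# so the rendered prompt should not start with <|begin_of_text|>.
response = llm.create_completion(
    prompt=prompt,
    max_tokens=2048,
    stop=["<|eot_id|>"],  # assumed stop token for Llama 3 turn ends
)
print(response["choices"][0]["text"])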
Based on that, it seems the double BOS token arises because the chat template adds a BOS token, while create_completion (probably when calling tokenize) adds another one on top.
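As a quick way to confirm this, one can tokenize a prompt that already starts with the BOS text and inspect the first two token ids. This is only a sketch, assuming the tokenize(text, add_bos=..., special=...) signature and the token_bos() helper of llama-cpp-python; the prompt string is illustrative:

# A prompt that already begins with <|begin_of_text|> written out as text,
# as produced by the default Llama 3 chat template.
prompt_with_bos = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHello<|eot_id|>"
)

# Tokenize the way create_completion does: add_bos=True prepends the BOS id,
# special=True lets <|begin_of_text|> in the text map to the BOS id as well.
tokens = llm.tokenize(prompt_with_bos.encode("utf-8"), add_bos=True, special=True)

bos_id = llm.token_bos()
print(tokens[:3])
# If the first two ids both equal bos_id, the final prompt starts with 2 BOS
# tokens, matching the llama_tokenize_internal warning.
print(tokens[0] == bos_id and tokens[1] == bos_id)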