Llama 3 Double BOS #1501
Closed
@ellieyhcheng

Description


Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

Running create_chat_completion on Llama 3 (in GGUF format) should produce a prompt containing a single BOS token:

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="bartowski/Meta-Llama-3-8B-Instruct-GGUF",
    filename="*Q6_K.gguf",
    verbose=True,
)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {
        "role": "user",
        "content": "Q: Name the planets in the solar system? A: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune and Pluto."
    }
]
llm.create_chat_completion(
    messages=messages,
    max_tokens=2048,
)

Current Behavior

The call produces the following warning:

llama_tokenize_internal: Added a BOS token to the prompt as specified by the model but the prompt also starts with a BOS token. So now the final prompt starts with 2 BOS tokens. Are you sure this is what you want?
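The duplication can also be checked directly with the model's tokenizer. Below is a minimal sketch (not part of the original report) that assumes Llama 3's BOS token id is 128000 (<|begin_of_text|>) and that Llama.tokenize accepts the add_bos and special keyword arguments:

# Sketch: tokenize a prompt that already starts with the BOS special token,
# with add_bos=True (the default), and check whether BOS appears twice.
prompt = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHi<|eot_id|>"
tokens = llm.tokenize(prompt.encode("utf-8"), add_bos=True, special=True)
print(tokens[:2])  # expected: [128000, 128000], i.e. a doubled BOS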

Environment and Context

Mac M1
Python 3.11.6

Failure Information (for bugs)

The warning goes away when the following chat template is applied manually and create_completion is used to get the response:

template = "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}"

The difference from the default Llama 3 template is that set content = bos_token + content is changed to set content = content.

Based on that, the double BOS appears to come from the chat template prepending the BOS token, while create_completion (probably in its call to tokenize) adds another one.
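
For completeness, here is a sketch of that manual workaround (not part of the original report). It assumes jinja2 is installed and that "<|eot_id|>" is a suitable stop string for Llama 3 Instruct:

from jinja2 import Template

# Render the modified template (no bos_token); create_completion's own
# tokenization then adds the single BOS.
prompt = Template(template).render(messages=messages, add_generation_prompt=True)
output = llm.create_completion(
    prompt,
    max_tokens=2048,
    stop=["<|eot_id|>"],
)
print(output["choices"][0]["text"])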
