Llama 3 Double BOS #1501
Closed
@ellieyhcheng

Description


Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

Running create_chat_completion on Llama 3 (in GGUF format) should produce a prompt containing a single BOS token:

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="bartowski/Meta-Llama-3-8B-Instruct-GGUF",
    filename="*Q6_K.gguf",
    verbose=True,
)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {
        "role": "user",
        "content": "Q: Name the planets in the solar system? A: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune and Pluto."
    }
]
llm.create_chat_completion(
    messages=messages,
    max_tokens=2048,
)

Current Behavior

The call produces the following warning:

llama_tokenize_internal: Added a BOS token to the prompt as specified by the model but the prompt also starts with a BOS token. So now the final prompt starts with 2 BOS tokens. Are you sure this is what you want?
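The duplication can also be checked directly with the model's tokenizer. Below is a minimal sketch (not part of the original report) that assumes Llama 3's BOS token id is 128000 (<|begin_of_text|>) and that Llama.tokenize accepts the add_bos and special keyword arguments:

# Sketch: tokenize a prompt that already starts with the BOS special token,
# with add_bos=True (the default), and check whether BOS appears twice.
prompt = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHi<|eot_id|>"
tokens = llm.tokenize(prompt.encode("utf-8"), add_bos=True, special=True)
print(tokens[:2])  # expected: [128000, 128000], i.e. a doubled BOS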

Environment and Context

Mac M1
Python 3.11.6

Failure Information (for bugs)

The warning goes away when the following chat template is applied manually and create_completion is used to get the response:

template = "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}"

The difference from the default Llama 3 template is that set content = bos_token + content is changed to set content = content.

Based on that, the double BOS appears to come from the chat template prepending the BOS token, while create_completion (probably in its call to tokenize) adds another one.
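
For completeness, here is a sketch of that manual workaround (not part of the original report). It assumes jinja2 is installed and that "<|eot_id|>" is a suitable stop string for Llama 3 Instruct:

from jinja2 import Template

# Render the modified template (no bos_token); create_completion's own
# tokenization then adds the single BOS.
prompt = Template(template).render(messages=messages, add_generation_prompt=True)
output = llm.create_completion(
    prompt,
    max_tokens=2048,
    stop=["<|eot_id|>"],
)
print(output["choices"][0]["text"])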
