Add Google's Gemma formatting via `chat_format="gemma"` by alvarobartt · Pull Request #1210 · abetlen/llama-cpp-python


Merged
4 commits merged into abetlen:main on Feb 23, 2024

Conversation

alvarobartt (Contributor)

Description

This PR adds support for Google's recently released Gemma chat format, so that it can be used via the `chat_format` arg of the `Llama` class, i.e. `chat_format="gemma"`.

More information about the models is available in the HuggingFace Collection at https://huggingface.co/collections/google/gemma-release-65d5efbccdbb8c4202ec078b

Note that some of the GGUF versions currently uploaded to the HuggingFace Hub do not work correctly with llama.cpp, but I got it working when using e.g. https://huggingface.co/rahuldshetty/gemma-7b-it-gguf-quantized/blob/main/gemma-7b-it-Q4_K_M.gguf.

Example

from llama_cpp import Llama

# Load the Gemma GGUF model with the new chat format; n_gpu_layers=-1
# offloads all layers to the GPU
llm = Llama(
    model_path="./models/gemma-7b-it-Q4_K_M.gguf",
    chat_format="gemma",
    n_gpu_layers=-1,
)
# Multi-turn request; max_tokens=4 keeps the reply short for the demo
print(
    llm.create_chat_completion(
        messages=[
            {"role": "user", "content": "What's the capital of Spain"},
            {"role": "assistant", "content": "Barcelona"},
            {"role": "user", "content": "No, it's not, try again."},
        ],
        max_tokens=4,
    )
)
# {'id': 'chatcmpl-6037698e-2f05-496d-b49f-f7d61dde32f3', 'object': 'chat.completion', 'created': 1708599453, 'model': './models/gemma-7b-it-Q4_K_M.gguf', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': 'The answer is Madrid'}, 'finish_reason': 'length'}], 'usage': {'prompt_tokens': 37, 'completion_tokens': 4, 'total_tokens': 41}}
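
For reference, Gemma's instruction-tuned models delimit turns with <start_of_turn> and <end_of_turn>, so the messages above should render to roughly the following prompt (a sketch of the expected layout, not the formatter's verbatim output):

<start_of_turn>user
What's the capital of Spain<end_of_turn>
<start_of_turn>model
Barcelona<end_of_turn>
<start_of_turn>user
No, it's not, try again.<end_of_turn>
<start_of_turn>model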

alvarobartt (Contributor, Author)

P.S. I'm still unsure whether format_gemma should raise a ValueError when a system_prompt is provided; maybe it's safer to instead just print a warning stating that the system_prompt cannot be used, i.e. that it will be ignored. WDYT @abetlen?
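
For concreteness, the warn-and-ignore option could look something like this (a minimal sketch assuming OpenAI-style message dicts; drop_system_messages is a hypothetical helper, not a function from llama-cpp-python):

import warnings

# Hypothetical helper: drop system messages and warn, rather than raising
def drop_system_messages(messages):
    kept = [m for m in messages if m["role"] != "system"]
    if len(kept) != len(messages):
        warnings.warn("Gemma defines no system role; system messages are ignored.")
    return kept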

felipelo (Contributor) commented Feb 22, 2024

P.S. I'm still unsure whether format_gemma should raise a ValueError when a system_prompt is provided; maybe it's safer to instead just print a warning stating that the system_prompt cannot be used, i.e. that it will be ignored. WDYT @abetlen?

Agree ^^^

There are cases where a prompt system/repository is in use and happens to hold a 'system' value. I would even suggest appending the system_prompt to the first user message, as in the sketch below.
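
A minimal sketch of that suggestion, again assuming OpenAI-style message dicts (the helper name is hypothetical, not part of llama-cpp-python):

# Hypothetical helper: fold any system prompt(s) into the first user turn
def merge_system_into_first_user(messages):
    system = [m["content"] for m in messages if m["role"] == "system"]
    rest = [dict(m) for m in messages if m["role"] != "system"]
    if system and rest and rest[0]["role"] == "user":
        # Prepend the system prompt(s) to the first user message
        rest[0]["content"] = "\n\n".join(system + [rest[0]["content"]])
    return rest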

abetlen (Owner) commented Feb 23, 2024

@alvarobartt thank you for the contribution.

As for the system message, I'm a little conflicted on this, since there are at least three entirely valid ways to handle it: raise an error, concatenate it to the user message, or ignore it entirely. For now I think ignoring a system message is the safest option (potentially raising a warning in verbose mode). In the future I'd like to migrate all of the chat templates to Jinja2, so that people can easily pass custom templates to override these model-defined presets.
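
For illustration, a Jinja2 preset along those lines might look like the following (a sketch assuming the stock jinja2 package; this is not the template the library ships, and it silently drops system messages as discussed above):

from jinja2 import Template

# Illustrative Gemma-style template: map 'assistant' to Gemma's 'model'
# role, skip 'system' messages, and end with an open model turn
GEMMA_TEMPLATE = Template(
    "{% for m in messages %}"
    "{% if m.role != 'system' %}"
    "<start_of_turn>{{ 'model' if m.role == 'assistant' else 'user' }}\n"
    "{{ m.content }}<end_of_turn>\n"
    "{% endif %}"
    "{% endfor %}"
    "<start_of_turn>model\n"
)

prompt = GEMMA_TEMPLATE.render(
    messages=[
        {"role": "user", "content": "What's the capital of Spain"},
        {"role": "assistant", "content": "Barcelona"},
        {"role": "user", "content": "No, it's not, try again."},
    ]
)
print(prompt)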

alvarobartt (Contributor, Author)

As for the system message, I'm a little conflicted on this, since there are at least three entirely valid ways to handle it: raise an error, concatenate it to the user message, or ignore it entirely. For now I think ignoring a system message is the safest option (potentially raising a warning in verbose mode). In the future I'd like to migrate all of the chat templates to Jinja2, so that people can easily pass custom templates to override these model-defined presets.

Fair, then I'll do that for the moment, and if you happen to open a draft PR with the Jinja2 migration I'm happy to help out! 🤝🏻

abetlen merged commit 251a8a2 into abetlen:main on Feb 23, 2024