Chat completions crashes when asked for JSON response #1655
Closed
@shakedzy

Description


Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

create_chat_completion should return a JSON-formatted response when called with response_format={ "type": "json_object" }, instead of crashing.

Current Behavior

Calling create_chat_completion with response_format={ "type": "json_object" } crashes the process.

Environment and Context

I'm using:

from llama_cpp import Llama
llm = Llama("/Users/shakedz/local_models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf", n_ctx=4096, verbose=False)

The model was downloaded from here:
https://huggingface.co/lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF/tree/main

This fails:

print(llm.create_chat_completion([{'role': 'user', 'content': 'What is the capital of France? Replay using a JSON: {"answer": "YOUR_ANSWER"}!'}], response_format={ "type": "json_object" }))

With the following error:

libc++abi: terminating due to uncaught exception of type std::out_of_range: vector
[1]    25897 abort      /Users/shakedz/bitbucket/achilles/.venv/bin/python 

But without the response_format:

print(llm.create_chat_completion([{'role': 'user', 'content': 'What is the capital of France? Replay using a JSON: {"answer": "YOUR_ANSWER"}!'}]))

It works:

{'id': 'chatcmpl-987cdac0-5398-44a7-b6ca-36c1121392dc', 'object': 'chat.completion', 'created': 1722861547, 'model': '/Users/shakedz/local_models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': '{"answer": "Paris"}'}, 'logprobs': None, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 56, 'completion_tokens': 6, 'total_tokens': 62}}
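For reference, a minimal sketch of extracting the JSON answer from the successful response above (this assumes the llm object constructed earlier in this report; it is not part of the original reproduction):

import json

# Re-run the working call (no response_format) and parse the JSON answer out of it.
response = llm.create_chat_completion(
    [{'role': 'user', 'content': 'What is the capital of France? Replay using a JSON: {"answer": "YOUR_ANSWER"}!'}]
)
content = response['choices'][0]['message']['content']
answer = json.loads(content)['answer']
print(answer)  # prints: Paris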

System

  • Apple M2 Max, 32 GB, Sonoma 14.5
  • Python 3.12.1
  • llama_cpp_python==0.2.85
  • installed using CMAKE_ARGS="-DGGML_METAL=on" pip install llama-cpp-python

Metadata

Labels: bug (Something isn't working)