Grammar File not Working for Japanese · Issue #1349 · abetlen/llama-cpp-python · GitHub
Grammar File not Working for Japanese #1349
Closed
@rgmerck

Description

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [x] I carefully followed the README.md.
  • [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

A llama.cpp grammar file should be accepted if it follows the standard grammar syntax, even when it contains Japanese characters.

Current Behavior

Currently, llama-cpp-python fails to load the grammar file (parse failure) if it contains the following rules:

(Earlier definitions omitted; only the problematic lines are shown.)
jp-char ::= hiragana | katakana | punctuation | cjk
hiragana ::= [ぁ-ゟ]
katakana ::= [ァ-ヿ]
punctuation ::= [、-〾]
cjk ::= [一-鿿]

The above is the recommended set of definitions from the llama.cpp project's grammars. It works when used directly with llama.cpp, but fails when used with the llama-cpp-python bindings.
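For reference, the four character classes above can also be written with ASCII-only code point escapes. The sketch below is a hypothetical helper (`escape_non_ascii` is not part of llama-cpp-python) that rewrites every non-ASCII character as a `\uXXXX` or `\UXXXXXXXX` escape, which is the escape form llama.cpp's grammar parser understands. Whether the Python bindings accept the escaped form in the affected version is an assumption that has not been verified here.

```python
def escape_non_ascii(grammar_text: str) -> str:
    """Rewrite non-ASCII characters as \\uXXXX / \\UXXXXXXXX escapes.

    Hypothetical helper for experimentation; it does not call llama-cpp-python.
    """
    out = []
    for ch in grammar_text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                  # plain ASCII stays as-is
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")      # 4-digit escape for the BMP
        else:
            out.append(f"\\U{cp:08X}")      # 8-digit escape beyond the BMP
    return "".join(out)


rules = """jp-char ::= hiragana | katakana | punctuation | cjk
hiragana ::= [ぁ-ゟ]
katakana ::= [ァ-ヿ]
punctuation ::= [、-〾]
cjk ::= [一-鿿]
"""

print(escape_non_ascii(rules))
```

The printed rules contain only ASCII, so they can be pasted into a grammar file to check whether the failure depends on the raw Japanese characters or on the ranges themselves.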

Environment and Context

  • Physical (or virtual) hardware you are using, e.g. for Linux:
    Ubuntu 22.04
    Virtualenv with Python 3.10, and latest llama-cpp-python bindings rebuilt from source with pip based on April 16th's version.

Failure Information (for bugs)

parse: error parsing grammar: unknown escape at \ぁ-\ゟ]

Traceback (most recent call last):
  File "/home//CODING/llm-grammar/main-bvc.py", line 13, in <module>
    grammar = LlamaGrammar.from_string(grammar_text)
  File "/home//.virtualenvs/rag-test/lib/python3.10/site-packages/llama_cpp/llama_grammar.py", line 70, in from_string
    raise ValueError(
ValueError: from_string: error parsing grammar file: parsed_grammar.rules is empty

Steps to Reproduce

Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.

from llama_cpp.llama import LlamaGrammar

grammar_text_path = "./grammar-test.gbnf"

# Read the grammar explicitly as UTF-8 so the Japanese ranges are decoded correctly.
with open(grammar_text_path, "r", encoding="utf-8") as file:
    grammar_text = file.read()

# Parsing fails here with the ValueError shown in the traceback.
grammar = LlamaGrammar.from_string(grammar_text)

It fails at the last line above because it cannot parse the lines containing Japanese characters. The grammar file is accepted if I comment out all of those lines. The exact same grammar file works fine with llama.cpp directly.
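To narrow down which rules are involved without commenting lines out by hand, a small diagnostic like the following can list every line of the grammar text that contains non-ASCII characters, together with their code points. This is a minimal sketch; `non_ascii_lines` is a hypothetical helper name and it only inspects the text, it does not invoke the grammar parser.

```python
def non_ascii_lines(grammar_text: str) -> list[tuple[int, list[str]]]:
    """Return (line number, code points) for each line with non-ASCII characters."""
    hits = []
    for lineno, line in enumerate(grammar_text.splitlines(), start=1):
        # Collect the distinct non-ASCII characters on this line, in code point order.
        chars = sorted({ch for ch in line if ord(ch) > 0x7F})
        if chars:
            hits.append((lineno, [f"U+{ord(ch):04X}" for ch in chars]))
    return hits


sample = "root ::= jp-char+\nhiragana ::= [ぁ-ゟ]\n"
for lineno, code_points in non_ascii_lines(sample):
    print(lineno, code_points)
```

Running it on the full grammar file gives the exact set of lines to disable or escape when testing against the bindings.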
