Description
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [x] I carefully followed the README.md.
- [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
A grammar file that is valid for llama.cpp should also be accepted by llama-cpp-python, including when its character ranges contain Japanese characters.
Current Behavior
Currently, llama-cpp-python fails to parse the grammar file if it contains the following rules (the definitions above them are omitted; only the problematic lines are shown):
jp-char ::= hiragana | katakana | punctuation | cjk
hiragana ::= [ぁ-ゟ]
katakana ::= [ァ-ヿ]
punctuation ::= [、-〾]
cjk ::= [一-鿿]
The above is the recommended set of definitions from the llama.cpp grammar examples. It works when used directly with llama.cpp, but fails when used with the llama-cpp-python bindings.
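As a possible workaround (untested against the bindings, so treat it as an assumption), the literal Japanese characters could be rewritten as escapes before the grammar is loaded. To my knowledge the llama.cpp grammar parser accepts `\uXXXX` and `\UXXXXXXXX` escapes inside character ranges; whether the Python-side parser handles them correctly is exactly what would need checking. The helper name `escape_non_ascii` below is hypothetical and not part of llama-cpp-python.

```python
def escape_non_ascii(grammar_text: str) -> str:
    """Replace every non-ASCII character with a \\uXXXX or \\UXXXXXXXX escape.

    Hypothetical helper: assumes the grammar parser understands these
    escapes wherever the literal character was accepted before.
    """
    out = []
    for ch in grammar_text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                # plain ASCII is kept unchanged
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")    # Basic Multilingual Plane
        else:
            out.append(f"\\U{cp:08X}")    # supplementary planes
    return "".join(out)


rules = "hiragana ::= [ぁ-ゟ]\nkatakana ::= [ァ-ヿ]"
print(escape_non_ascii(rules))
```

The escaped text could then be passed to `LlamaGrammar.from_string` in place of the original file content to see whether the parse error persists.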
Environment and Context
- Physical (or virtual) hardware you are using, e.g. for Linux:
Ubuntu 22.04
Virtualenv with Python 3.10 and the latest llama-cpp-python bindings, rebuilt from source with pip from the April 16th version.
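Since "latest" is ambiguous for a source build, the exact installed version can be captured with the standard library. This is a minimal sketch; the function name `describe_environment` is made up for illustration, and `llama-cpp-python` is assumed to be the distribution name registered with pip.

```python
import platform
from importlib import metadata


def describe_environment(package: str = "llama-cpp-python") -> dict:
    """Collect the Python version, platform string and installed package version."""
    try:
        pkg_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the active environment.
        pkg_version = "not installed"
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        package: pkg_version,
    }


print(describe_environment())
```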
Failure Information (for bugs)
`parse: error parsing grammar: unknown escape at \ぁ-\ゟ]
Traceback (most recent call last):
File "/home//CODING/llm-grammar/main-bvc.py", line 13, in <module>
grammar = LlamaGrammar.from_string(grammar_text)
File "/home//.virtualenvs/rag-test/lib/python3.10/site-packages/llama_cpp/llama_grammar.py", line 70, in from_string
raise ValueError(
ValueError: from_string: error parsing grammar file: parsed_grammar.rules is empty`
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
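from llama_cpp.llama import Llama, LlamaGrammar
import requests
import json
grammar_text_path = "./grammar-test.gbnf"
# Read the grammar explicitly as UTF-8 so the Japanese ranges are decoded correctly.
with open(grammar_text_path, "r", encoding="utf-8") as file:
    grammar_text = file.read()
# Parsing fails here when the grammar contains the Japanese character ranges.
grammar = LlamaGrammar.from_string(grammar_text)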
It fails at the last line above because the lines containing Japanese characters cannot be parsed. Note that the grammar file is accepted if I comment out all of the lines containing Japanese characters. The exact same grammar file works fine with llama.cpp directly.
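To narrow down which rules trigger the failure without commenting lines out by hand, a small diagnostic can list every grammar line that contains non-ASCII characters together with their code points. This is only a sketch in plain Python; `find_non_ascii_lines` is a hypothetical name and does not call into llama-cpp-python at all.

```python
def find_non_ascii_lines(grammar_text):
    """Return (line_number, line, code_points) for each line with non-ASCII characters."""
    hits = []
    for number, line in enumerate(grammar_text.splitlines(), start=1):
        # Collect the code point of every character outside the ASCII range.
        code_points = [f"U+{ord(ch):04X}" for ch in line if ord(ch) > 0x7F]
        if code_points:
            hits.append((number, line, code_points))
    return hits


sample = "jp-char ::= hiragana | katakana\nhiragana ::= [ぁ-ゟ]\nkatakana ::= [ァ-ヿ]"
for number, line, code_points in find_non_ascii_lines(sample):
    print(number, line, code_points)
```

Run against the real `grammar-test.gbnf` content, the reported lines are the candidates to test one at a time with `LlamaGrammar.from_string`.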