-
Notifications
You must be signed in to change notification settings - Fork 12.5k
grammars: x{min,max} repetition operator #6640
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
0160469
f2030e3
de0fd3f
9d9b5a3
6b5518c
0ceb69a
8938a05
0d7347f
2e2df72
ffe321d
a9351b8
9d8efa5
2d98ebf
ec91342
22faba6
1fb7787
15585e0
93b754e
2ecc2ae
a9a2983
d47f537
724f879
a61281f
d03c98e
21bac1e
0c74ad3
218f41f
2813835
46fe648
eb7ccd8
3c02508
476c97d
990bf57
d070aee
8266b7c
2b79d47
File filter
Filter by extension
Conversations
Diff view
Diff view
- Loading branch i 8000 nformation
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -174,11 +174,11 @@ namespace grammar_parser { | |
// Sstar ::= Scopy Sstar | | ||
|
||
uint32_t content_rule_id = 0; | ||
if (last_sym_start == out_elements.size() - 1) { | ||
if (last_sym_start >= 0 && last_sym_start == out_elements.size() - 1 && out_elements[last_sym_start].type == LLAMA_GRETYPE_RULE_REF) { | ||
// The repeated content is already a rule ref, no need to copy it | ||
content_rule_id = out_elements[last_sym_start].value; | ||
} else { | ||
content_rule_id = generate_symbol_id(state, rule_name); | ||
content_rule_id = generate_symbol_id(state, rule_name + "_copy"); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1 -- One nice thing about using "_copy" for the name here is that underscores are not valid for rule names, so you're guaranteed that you won't get collisions with this. I like it! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Note that the generate symbol id helper takes care of avoiding collisions anyway |
||
// add preceding symbol to generated copy rule | ||
std::vector<llama_grammar_element> copy_rule(out_elements.begin() + last_sym_start, out_elements.end()); | ||
copy_rule.push_back({LLAMA_GRETYPE_END, 0}); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to wrap my head around this change, and its potential impact -- best I can tell, this is the change that you refer to in the PR description re:
One thing I haven't fully wrapped my head around is if the new rewrite rules are more efficient than they were before, less efficient, or about the same. I might need to expand the gbnf-validator and compare the parsed grammars between old-and-new before I really wrap my mind around what's going on here.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah only some benchmarking will tell (and we’d need more real-world grammars for that). My hunch is it’s probably the same, the extra indirection might be negligible, and the saving of avoiding duplication too.