8000 server: fix streaming crashes by ochafik · Pull Request #13786 · ggml-org/llama.cpp · GitHub
[go: up one dir, main page]

Skip to content

server: fix streaming crashes #13786

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
May 26, 2025
Merged

Conversation

ochafik
Copy link
Collaborator
@ochafik ochafik commented May 25, 2025

Follow up to #12379

  • allow all parsers to parse non-tool-call content (only activate tool parsing logic if there were tools in the request)
  • add preludes to content when finding parsing partial regex matches: prevents many (hopefully most!) diff crashes

(also see #13785 which fixes non-chat streaming + stop regression)

This (+ other PR) hopefully fixes crashes such as reported here (cc/ @teleprint-me ) and there (cc/ @pwilkin )

@github-actions github-actions bot added testing Everything test related examples server labels May 25, 2025
@ochafik ochafik marked this pull request as ready for review May 26, 2025 00:44
@ochafik ochafik requested a review from ngxson as a code owner May 26, 2025 00:44
@teleprint-me
Copy link
Contributor

Yeah, it works! No issues so far after some quick testing. The server is no longer crashing. I'll poke around a little more later on when I have some time.

@PrideIsLife
Copy link

Will your PR also tackle the issue described in #13774 ?

@ochafik
Copy link
Collaborator Author
ochafik commented May 26, 2025

Will your PR also tackle the issue described in #13774 ?

@PrideIsLife Taking care of this in #13800, thanks for bringing my attention to it!

@ochafik ochafik force-pushed the fix-chat-parsers branch from cfd2f2b to e064360 Compare May 26, 2025 12:11
@ochafik
Copy link
Collaborator Author
ochafik commented May 26, 2025

Apologies to anyone tracking this branch, had to amend my email - google.com no more! - and force push!

@pwilkin
Copy link
Contributor
pwilkin commented May 26, 2025

Can confirm as well that it fixes my case.

Copy link
Collaborator
@CISC CISC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't test right now, but this looks reasonable.

common/chat.cpp Outdated
Comment on lines 1017 to 1024
// If we didn't extract thoughts, prelude includes them.
builder.add_content(res->prelude);
if (auto res = builder.try_find_regex(end_response_regex)) {
builder.add_content(res->prelude);
// If we didn't extract thoughts, prelude includes them.
} else {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're not using res any more and this block is a no-op, perhaps change it to

        if (!builder.try_find_regex(end_response_regex)) {
            ...
        }

@ochafik ochafik changed the title server: streaming fixes server: fix streaming crashes May 26, 2025
@ochafik ochafik added the bugfix fixes an issue or bug label May 26, 2025
ochafik and others added 3 commits May 26, 2025 06:49
Co-Authored-By: ochafik <ochafik@google.com>
…3.1 format. still not ideal but hopefully less prone to crash

Co-Authored-By: ochafik <ochafik@google.com>
@ochafik ochafik merged commit 03f582a into ggml-org:master May 26, 2025
46 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix fixes an issue or bug examples server testing Everything test related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0