chore: bump llama.cpp to support tool streaming #1438
Conversation
Signed-off-by: Robert Sturla <robertsturla@outlook.com>
Reviewer's Guide

This PR updates the llama.cpp clone target to a newer commit that supports tool streaming (including PR #12379) and reapplies consistent formatting across the build_llama_and_whisper.sh script to standardize indentation, remove extraneous semicolons, and align multiline arrays.

Sequence diagram for conceptual tool streaming with updated llama.cpp

```mermaid
sequenceDiagram
    actor User
    participant OllamaService as "Ollama Service\n(with updated llama.cpp)"
    participant LlamaCppInternal as "llama.cpp (b5499)"
    participant ExternalTool as "External Tool\n(e.g., Codex)"
    User->>OllamaService: Prompt requiring tool use
    OllamaService->>LlamaCppInternal: Process prompt
    LlamaCppInternal-->>ExternalTool: Call Tool API (e.g., code interpreter)
    ExternalTool-->>LlamaCppInternal: Tool Response
    LlamaCppInternal->>OllamaService: Formatted response incorporating tool output
    OllamaService->>User: Final Response
```
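Not part of the PR itself, but for context: a minimal way to exercise tool-call streaming against a llama.cpp `llama-server` build is an OpenAI-compatible chat completions request with `stream` enabled and a `tools` array. The port, model name, and `get_weather` tool below are illustrative placeholders, not values from this repository.

```bash
# Sketch: stream a chat completion that may emit tool-call deltas.
# Assumes llama-server is listening on localhost:8080; the model name
# and the get_weather tool definition are invented for the example.
curl -N http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default",
    "stream": true,
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'
```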
Hey @p5 - I've reviewed your changes and they look great!
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
.github/workflows/ci-images.yml
Outdated
```diff
@@ -52,7 +52,7 @@ jobs:
           sudo rm -rf \
             /usr/share/dotnet /usr/local/lib/android /opt/ghc \
             /usr/local/share/powershell /usr/share/swift /usr/local/.ghcup \
-            /usr/lib/jvm || true
+            /usr/lib/jvm /opt/hostedtoolcache/CodeQL || true
```
Added a potential fix to the storage issues in the runner.
Removing CodeQL (which is only used when you explicitly call it) frees up an additional 5GB.
Thanks for figuring this out @p5!
Unfortunately it didn't work. I'm unsure whether it helped at all, or whether it's freeing up space on the wrong disk.
The newest commit frees up an additional 8GB of storage, which I'm hoping is sufficient.
If not, the next option is probably to matrix these builds into separate jobs.
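One way to narrow down the "wrong disk" question would be to print per-mount usage around the cleanup step. This is only a diagnostic sketch, not something the PR adds; the paths are the ones removed in the diff above.

```bash
# Show per-mount free space before and after the cleanup so it is
# clear which filesystem actually gains the reclaimed space.
df -h
sudo rm -rf \
  /usr/share/dotnet /usr/local/lib/android /opt/ghc \
  /usr/local/share/powershell /usr/share/swift /usr/local/.ghcup \
  /usr/lib/jvm /opt/hostedtoolcache/CodeQL || true
df -h
```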
Signed-off-by: Robert Sturla <robertsturla@outlook.com>
LGTM
Awesome, thank you!
Closes #1431
Bumps llama.cpp to the commit SHA referenced in https://github.com/ggml-org/llama.cpp/releases/tag/b5499
These commits include ggml-org/llama.cpp#12379, plus all related follow-up fixes, in order to support running AI code assistants like Codex.
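For anyone wanting to confirm which commit the b5499 tag points at before pinning it, a quick check with standard git (nothing project-specific) is:

```bash
# Resolve the b5499 release tag to the commit it points at;
# a trailing ^{} line, if present, shows the peeled commit of an annotated tag.
git ls-remote --tags https://github.com/ggml-org/llama.cpp.git b5499
```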
While I was able to call Codex and get semi-sane responses from it after building the CUDA container, I'm not familiar enough with it to demonstrate an AI assistant doing its magic.

Apologies for the unrelated changes; my IDE decided it wanted to format the code too. I checked through these and they don't appear to be functionally different; they just switch the script to a consistent number of spaces for indentation.
Summary by Sourcery
Bump llama.cpp to the latest commit to enable tool streaming support and update the build script indentation
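For readers unfamiliar with the build script, the bump amounts to changing a pinned commit in the llama.cpp clone step. The sketch below is illustrative only; the variable name, placeholder value, and structure are invented rather than copied from build_llama_and_whisper.sh.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical pin: the real variable name and commit value live in
# build_llama_and_whisper.sh and are not reproduced here.
LLAMA_CPP_SHA="<commit-sha-from-the-b5499-release>"

git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
git checkout "$LLAMA_CPP_SHA"
```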