#1 Updating to latest repo version with LLM Monitoring metrics by juanroesel · Pull Request #1 · ZenHubHQ/llama-cpp-python · GitHub
@juanroesel juanroesel commented May 7, 2024

Closes ZenHubHQ/devops#2205

It also includes the following:

  • A new metric kv_cache_usage_ratio, which measures how much KV cache is being used.
  • Synced commits with the parent repo (not relevant for the PR review).
  • A Llama 3 8B model baked into the image.
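For reviewers unfamiliar with the new metric, here is a minimal sketch of how a ratio like kv_cache_usage_ratio could be computed. The PR text only says the metric measures how much KV cache is being used; the function signature, the `used_cells` and `total_cells` parameters, and the "cells in use divided by total cells" definition are assumptions for illustration, not the code in this PR.

```python
def kv_cache_usage_ratio(used_cells: int, total_cells: int) -> float:
    """Return the fraction of KV cache cells currently in use (0.0 to 1.0).

    Hypothetical helper: both arguments are assumed to be counts that the
    serving layer already knows (cells occupied vs. cells allocated).
    """
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    if not 0 <= used_cells <= total_cells:
        raise ValueError("used_cells must be between 0 and total_cells")
    return used_cells / total_cells


if __name__ == "__main__":
    # Example: 512 of 2048 cells occupied.
    print(kv_cache_usage_ratio(512, 2048))
```

A value near 1.0 would suggest the cache is close to full, which is the kind of signal a monitoring dashboard or alert could key on.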

New image us.gcr.io/zenhub-ops/llama_cpp_python-llama3_8b_f16:v0.3.1 was successfully deployed into staging.

abetlen and others added 15 commits May 2, 2024 11:32
* set up streaming for v2

* assert v2 streaming, fix tool_call vs function_call

* fix streaming with tool_choice/function_call

* make functions return 1 function call only when 'auto'

* fix

---------

Co-authored-by: Andrei <abetlen@gmail.com>
…ing space (abetlen#1375)

* Fix tokenization edge case where llama output does not start with a space

See this notebook:
https://colab.research.google.com/drive/1Ooz11nFPk19zyJdMDx42CeesU8aWZMdI#scrollTo=oKpHw5PZ30uC

* Update _internals.py

Fixing to compare to b' ' instead of (str)' '

---------

Co-authored-by: Andrei <abetlen@gmail.com>
)

* Update dependabot.yml

Add github-actions update

* Update dependabot.yml

* Update dependabot.yml
@juanroesel juanroesel requested review from cwarje and m62534 May 7, 2024 02:32
@juanroesel (Author) commented

NOTE: GH Actions need to be updated in this repo. I will create a ticket for this soon.

@juanroesel juanroesel requested a review from blacklander May 7, 2024 17:17
@cwarje cwarje left a comment

LGTM

@juanroesel (Author) commented May 9, 2024

@m62534 @cwarje Just FYI, given today's events with Llama3, I built a new image us.gcr.io/zenhub-ops/llama_cpp_python_zh-mistral7b_f16:v0.2.1 containing these code changes plus the Mistral model and redeployed it in staging.

@juanroesel juanroesel merged commit 8cd638c into main May 9, 2024
9 participants

0