Model Repeats Nonsensical Output · Issue #13066 · ggml-org/llama.cpp
Open
18marie05 opened this issue Apr 22, 2025 · 1 comment

@18marie05 commented Apr 22, 2025

Hello,

I need to run a GGUF model on an embedded device with limited resources.
I use the qwen2.5-0.5b-instruct-q4_0.gguf model.

After testing a bunch of combinations, this is the llama-cli command I use:
./llama.cpp/build/bin/llama-cli -m "$MODEL" -sys "$SYS_PROMPT" -p "$PROMPT" -co -c 700

And this is the full .sh script I run:

MODEL="gguf_qwen/qwen2.5-0.5b-instruct-q4_0.gguf"
CONTEXT="$(cat ../data/input_data.txt)"

# Only now build the full prompt
SYS_PROMPT="You are a helpful assistant. Be polite with the user. Use the following context to answer the question. If you can't answer based on the context, say 'Sorry, I am not able to provide this information.'

Context:
$CONTEXT"

./llama.cpp/build/bin/llama-cli -m "$MODEL" -sys "$SYS_PROMPT" -p "Greet the user." -co -c 700

while true; do
  printf "> "
  read QUESTION
  [ "$QUESTION" = "exit" ] && break

  PROMPT="Question: $QUESTION"
  ./llama.cpp/build/bin/llama-cli \
    -m "$MODEL" \
    -sys "$SYS_PROMPT" \
    -p "$PROMPT" \
    -co \
    -c 700 \
    -n 100 \
    --repeat-penalty 1.3 \
    --repeat-last-n 256 \
    --temp 0.6 \
    --top-k 40 \
    --top-p 0.85



done
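(Side note: the loop above starts a fresh llama-cli process, and reloads the model, for every question. A single interactive session is also possible; a minimal sketch, assuming the -cnv conversation-mode flag present in current llama-cli builds:

./llama.cpp/build/bin/llama-cli -m "$MODEL" -sys "$SYS_PROMPT" -cnv -co -c 700 -n 100
)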

Setting the context limit to -c 700 is what works on my device.

This is the issue I have: as soon as that limit is exceeded, the model loops on the same token or set of tokens.
Here is an example:

Certainly, here are the details I have on your network:
1. Wi-Fi password: xxxxx
2. Gateway type: xxxx
3. Average RSSI:xxxx
4. Most used band: 2.4.4444444444444444444444444444444444444444 [the "4" repeats until the token limit is reached]
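From what I understand, when the -c window fills up, llama-cli shifts the context (discarding older tokens) and keeps generating, which is where the looping seems to start. A minimal way to test whether the overflow is the trigger, assuming the --no-context-shift flag available in recent llama.cpp builds (it makes generation stop once the context is full instead of shifting it):

./llama.cpp/build/bin/llama-cli -m "$MODEL" -sys "$SYS_PROMPT" -p "$PROMPT" -co -c 700 -n 100 --no-context-shift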

I tried a few sampling settings to cut off this kind of output:

    -m "$MODEL" \
    -sys "$SYS_PROMPT" \
    -p "$PROMPT" \
    -co \
    -c 700 \
    -n 100 \
    --repeat-penalty 1.3 \
    --repeat-last-n 256 \
    --temp 0.6 \
    --top-k 40 \
    --top-p 0.85

But none of them manage to control the model's behavior.

Is there something I could do that I haven't thought of?

Thank you in advance.

@Manamama commented

Your -p "Greet the user." prompt is probably too generic; it is not a question.

Also check the hints at https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GGUF for the prompt tags, or just change the model.
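For reference, a minimal sketch of the ChatML-style tags the Qwen2.5 instruct models expect (the tag names come from the model card linked above; the system and user text are just placeholders):

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Question: ...<|im_end|>
<|im_start|>assistant

llama-cli can usually apply this template itself from the GGUF metadata in conversation mode, so the raw tags mainly matter if the prompt is assembled by hand.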


@github-actions github-actions bot added the stale label May 24, 2025