Name and Version
$ ./build_cpu/bin/llama-cli
build: 5374 (72df31d) with cc (Ubuntu 13.1.0-8ubuntu1~22.04) 13.1.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
CPU
Hardware
Intel 6248R
Models
Llama2-7b-q8_0.gguf
Problem description & steps to reproduce
Hi, when I run the command
./build_cpu/bin/llama-batched-bench -m ~/LLM/GGUF/Llama-2-7B-GGUF/llama-2-7b.Q8_0.gguf -npp 512 -ntg 512 -npl 128
there is no output in the table:

main: n_kv_max = 4096, n_batch = 2048, n_ubatch = 512, flash_attn = 0, is_pp_shared = 0, n_gpu_layers = -1, n_threads = 48, n_threads_batch = 48

| PP | TG | B | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | T s | S t/s |
|-------|--------|------|--------|----------|----------|----------|----------|----------|----------|

llama_perf_context_print: load time = 1418.38 ms
llama_perf_context_print: prompt eval time = 195.92 ms / 16 tokens (12.24 ms per token, 81.67 tokens per second)
llama_perf_context_print: eval time = 0.00 ms / 1 runs (0.00 ms per token, inf tokens per second)
llama_perf_context_print: total time = 1418.39 ms / 17 tokens

I also evaluated the same command on ARM CPUs and got the same output.
Thanks.
First Bad Commit
No response
Relevant log output
main: n_kv_max = 4096, n_batch = 2048, n_ubatch = 512, flash_attn = 0, is_pp_shared = 0, n_gpu_layers = -1, n_threads = 48, n_threads_batch = 48

| PP | TG | B | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | T s | S t/s |
|-------|--------|------|--------|----------|----------|----------|----------|----------|----------|

llama_perf_context_print: load time = 1418.38 ms
llama_perf_context_print: prompt eval time = 195.92 ms / 16 tokens (12.24 ms per token, 81.67 tokens per second)
llama_perf_context_print: eval time = 0.00 ms / 1 runs (0.00 ms per token, inf tokens per second)
llama_perf_context_print: total time = 1418.39 ms / 17 tokens
The -npl 128 means that the benchmark will try to allocate 128 parallel sequences, each with a prompt of 512 tokens. This does not fit in the context size that you specified (n_kv_max = 4096). Try reducing it, e.g. -npl 1,2,3,4,....
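As a rough sizing check (a sketch, assuming the is_pp_shared = 0 and -ntg 512 settings from the log above, so each sequence needs KV-cache room for both its prompt and its generated tokens):

n_kv_required = n_pl * (n_pp + n_tg) = 128 * (512 + 512) = 131072 tokens, far above n_kv_max = 4096

so the configuration is skipped and only the table header is printed. A smaller -npl list fits within the 4096-token context, for example (values illustrative):

./build_cpu/bin/llama-batched-bench -m ~/LLM/GGUF/Llama-2-7B-GGUF/llama-2-7b.Q8_0.gguf -npp 512 -ntg 512 -npl 1,2,4

Here even the largest case needs only 4 * (512 + 512) = 4096 tokens, which exactly fits. Alternatively, raising the context size with the standard -c flag so that it covers n_pl * (n_pp + n_tg) should also populate the table.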