Description
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
It is supposed to compute the perplexity over the input file, as in the original PR: #270
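For context, here is a minimal standalone sketch of the calculation I expect, assuming the usual definition used in #270: perplexity is the exponential of the mean negative log-likelihood of each token given its preceding context (the helper and names below are hypothetical, not llama.cpp code):

```cpp
// Hypothetical sketch only: perplexity = exp(mean negative log-likelihood).
#include <cmath>
#include <cstdio>
#include <vector>

double perplexity(const std::vector<double> & token_probs) {
    // token_probs[i] is p(token_i | preceding context), assumed > 0.
    double nll = 0.0;
    for (double p : token_probs) {
        nll += -std::log(p);
    }
    return std::exp(nll / token_probs.size());
}

int main() {
    // Toy example: a model that assigns probability 0.25 to every token
    // has perplexity 4.
    std::vector<double> probs(128, 0.25);
    std::printf("perplexity = %.2f\n", perplexity(probs));
    return 0;
}
```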
Current Behavior
However, it fails with the following exception:
llama_tokenize: too many tokens
libc++abi: terminating with uncaught exception of type std::length_error: vector
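For what it's worth, the std::length_error suggests a std::vector is being asked to grow past its max_size(), which is what happens when, for example, a negative or otherwise bogus element count is converted to a huge size_t. A minimal standalone illustration of that failure mode (my assumption about the cause, not llama.cpp code):

```cpp
// Standalone illustration only: resizing a vector to an impossibly large
// size (e.g. a negative count converted to size_t) throws std::length_error,
// which libc++ reports with the message "vector".
#include <cstdio>
#include <stdexcept>
#include <vector>

int main() {
    int n_tokens = -1;  // bogus count, e.g. from an error path
    std::vector<int> tokens;
    try {
        tokens.resize(static_cast<size_t>(n_tokens));  // ~1.8e19 elements
    } catch (const std::length_error & e) {
        std::printf("caught std::length_error: %s\n", e.what());
    }
    return 0;
}
```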
Environment and Context
- macOS (M2 Max)
- python3 --version: 3.8.16
- make --version: i386-apple-darwin11.3.0
- g++ --version: arm64-apple-darwin22.3.0
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
- git pull
- make
- python3 convert-pth-to-ggml.py models/7B/ 1
- python3 quantize.py 7B
- ./main -m ./models/7B/ggml-model-q4_0.bin -t 4 -n 128 --perplexity -f ~/wikitext-2-raw/wiki.test.raw
Failure Logs
llama.cpp % ./main -m ./models/7B/ggml-model-q4_0.bin -t 4 -n 128 --perplexity -f ~/wikitext-2-raw/wiki.test.raw
main: seed = 1679472306
llama_model_load: loading model from './models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size = 512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from './models/7B/ggml-model-q4_0.bin'
llama_model_load: .................................... done
llama_model_load: model size = 4017.27 MB / num tensors = 291
system_info: n_threads = 4 / 12 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
llama_tokenize: too many tokens
libc++abi: terminating with uncaught exception of type std::length_error: vector
zsh: abort ./main -m ./models/7B/ggml-model-q4_0.bin -t 4 -n 128 --perplexity -f