Compute perplexity fails with too many tokens exception #385 · ggml-org/llama.cpp

Closed
@maziyarpanahi

Description

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

It is supposed to compute perplexity like the original PR: #270
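
For reference, the perplexity computation introduced there walks the tokenized file in n_ctx-sized chunks and exponentiates the mean negative log-likelihood over all predicted tokens. Below is a minimal sketch of that idea; the helper `logprob_next` and its dummy body are illustrative assumptions, not the PR's actual code:

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// ppl = exp(-(1/N) * sum_i log p(x_i | x_<i)), accumulated chunk by chunk.
// `logprob_next` stands in for running the model on `i` tokens of context
// and reading the log-probability of the target token (hypothetical helper).
static double logprob_next(const int * ctx_tokens, int n_ctx_tokens, int target) {
    (void) ctx_tokens; (void) n_ctx_tokens; (void) target;
    return std::log(1.0 / 32000.0);   // dummy: uniform over a 32000-token vocab
}

static double perplexity(const std::vector<int> & tokens, int n_ctx) {
    double nll = 0.0;   // accumulated negative log-likelihood
    int    cnt = 0;     // number of predicted tokens
    for (size_t start = 0; start + n_ctx <= tokens.size(); start += n_ctx) {
        for (int i = 1; i < n_ctx; ++i) {   // each position predicts the next token
            nll -= logprob_next(tokens.data() + start, i, tokens[start + i]);
            ++cnt;
        }
    }
    return std::exp(nll / cnt);
}

int main() {
    std::vector<int> tokens(2048, 1);                // stand-in for tokenized wiki.test.raw
    printf("ppl = %f\n", perplexity(tokens, 512));   // prints 32000 for the dummy model
}
```

With the dummy uniform model the result is exactly the vocabulary size, 32000, which is a quick sanity check on the formula.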

Current Behavior

However, it fails with the following exception:

llama_tokenize: too many tokens
libc++abi: terminating with uncaught exception of type std::length_error: vector
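
The abort pattern (a tokenizer diagnostic immediately followed by std::length_error from std::vector) is consistent with a sign bug in a tokenizer wrapper: if the underlying C call reports overflow by returning a negative token count and the wrapper passes that value straight to resize(), the negative int converts to an enormous size_t. A minimal sketch that reproduces the same two-line failure, assuming that contract (`fake_llama_tokenize` is a stand-in, not the real API):

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for the C-API tokenizer: returns the token count,
// or a negative value when `n_max_tokens` is too small (assumed contract).
static int fake_llama_tokenize(const std::string & text, int * tokens, int n_max_tokens) {
    int n = (int) text.size();        // pretend every byte is one token
    if (n > n_max_tokens) {
        fprintf(stderr, "llama_tokenize: too many tokens\n");
        return -n;                    // negative count signals overflow
    }
    for (int i = 0; i < n; ++i) tokens[i] = 0;
    return n;
}

int main() {
    std::string text(1024, 'x');      // "file" larger than the buffer
    std::vector<int> res(512);        // buffer sized too small

    int n = fake_llama_tokenize(text, res.data(), (int) res.size());
    // Bug: resizing with a negative int converts to a huge size_t,
    // which is what makes std::vector throw std::length_error.
    res.resize(n);                    // terminates like the report above
}
```

Compiled with libc++, this prints the same "too many tokens" line and then terminates with an uncaught std::length_error.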

Environment and Context

  • macOS (M2 Max)
$ python3 --version
3.8.16
$ make --version
i386-apple-darwin11.3.0
$ g++ --version
arm64-apple-darwin22.3.0

Steps to Reproduce

Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.

  1. git pull
  2. make
  3. python3 convert-pth-to-ggml.py models/7B/ 1
  4. python3 quantize.py 7B
  5. ./main -m ./models/7B/ggml-model-q4_0.bin -t 4 -n 128 --perplexity -f ~/wikitext-2-raw/wiki.test.raw

Failure Logs

llama.cpp % ./main -m ./models/7B/ggml-model-q4_0.bin -t 4 -n 128 --perplexity -f ~/wikitext-2-raw/wiki.test.raw

main: seed = 1679472306
llama_model_load: loading model from './models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size =   512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from './models/7B/ggml-model-q4_0.bin'
llama_model_load: .................................... done
llama_model_load: model size =  4017.27 MB / num tensors = 291

system_info: n_threads = 4 / 12 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
llama_tokenize: too many tokens
libc++abi: terminating with uncaught exception of type std::length_error: vector
zsh: abort      ./main -m ./models/7B/ggml-model-q4_0.bin -t 4 -n 128 --perplexity -f
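
The crash happens at tokenization time, before any evaluation starts, so the model load itself is fine. A defensive wrapper that grows its buffer and retries, instead of resizing with a negative count, avoids the abort. This is a sketch of that pattern under the same assumed contract as above, not the project's actual patch:

```cpp
#include <cstring>
#include <string>
#include <vector>

// Stub with the assumed C-API contract: returns the token count, or the
// negated required count when the buffer is too small (assumption, as above).
static int fake_llama_tokenize(const char * text, int * tokens, int n_max) {
    int n = (int) std::strlen(text);          // pretend one token per byte
    if (n > n_max) return -n;
    for (int i = 0; i < n; ++i) tokens[i] = 0;
    return n;
}

// Wrapper that grows the buffer and retries rather than resizing with a
// negative count, which is what std::vector rejects with length_error.
static std::vector<int> tokenize_safe(const std::string & text) {
    std::vector<int> res(512);                // deliberately small first guess
    int n = fake_llama_tokenize(text.c_str(), res.data(), (int) res.size());
    if (n < 0) {
        res.resize((size_t) -n);              // grow to the reported size
        n = fake_llama_tokenize(text.c_str(), res.data(), (int) res.size());
    }
    res.resize((size_t) (n > 0 ? n : 0));     // never resize with a negative value
    return res;
}

int main() {
    std::string text(100000, 'x');            // large "file", like wiki.test.raw
    return tokenize_safe(text).size() == text.size() ? 0 : 1;
}
```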
