Name and Version
lco@rtx:~$ llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
version: 810 (658987c)
built with cc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
CUDA
Hardware
CPU: 13th Gen Intel(R) Core(TM) i7-13700T (24) @ 4.90 GHz
GPU: NVIDIA GeForce RTX 3090 [Discrete]
Models
Moonlight-16B-A3B-Instruct
Problem description & steps to reproduce
THREADS=24 python $HOME/Programming/git/llama.cpp/convert_hf_to_gguf.py moonshotai/Moonlight-16B-A3B-Instruct --outtype f16 --outfile moonshotai/quantized/Moonlight-16B-A3B-Instruct.gguf
Running the command above fails with `AttributeError: TikTokenTokenizer has no attribute vocab`; the full log is under "Relevant log output" below.
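For a quicker reproduction without running the whole conversion, the failing attribute access can be triggered directly. This is a minimal sketch based on the traceback, assuming `transformers` is installed and the model repo is reachable:

```python
from transformers import AutoTokenizer

# Load the model's custom tokenizer, as convert_hf_to_gguf.py does
# (this is what triggers the trust_remote_code prompt in the log).
tokenizer = AutoTokenizer.from_pretrained(
    "moonshotai/Moonlight-16B-A3B-Instruct",
    trust_remote_code=True,
)

# The conversion script reads len(tokenizer.vocab) in get_vocab_base();
# the custom TikTokenTokenizer defines no .vocab attribute, so
# transformers' __getattr__ raises the AttributeError from the traceback.
print(len(tokenizer.vocab))
```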
First Bad Commit
No response
Relevant log output
INFO:hf-to-gguf:gguf: context length = 8192
INFO:hf-to-gguf:gguf: embedding length = 2048
INFO:hf-to-gguf:gguf: feed forward length = 11264
INFO:hf-to-gguf:gguf: head count = 16
INFO:hf-to-gguf:gguf: key-value head count = 1
INFO:hf-to-gguf:gguf: rope theta = 50000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: experts used count = 6
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
The repository for moonshotai/Moonlight-16B-A3B-Instruct contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/moonshotai/Moonlight-16B-A3B-Instruct.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.
Do you wish to run the custom code? [y/N] y
INFO:transformers_modules.Moonlight-16B-A3B-Instruct.tokenization_moonshot:Reloaded tiktoken model from moonshotai/Moonlight-16B-A3B-Instruct/tiktoken.model
INFO:transformers_modules.Moonlight-16B-A3B-Instruct.tokenization_moonshot:#words: 163842 - BOS ID: 163584 - EOS ID: 163585
Traceback (most recent call last):
  File "/home/data1/protected/Programming/git/llama.cpp/convert_hf_to_gguf.py", line 5820, in <module>
    main()
  File "/home/data1/protected/Programming/git/llama.cpp/convert_hf_to_gguf.py", line 5814, in main
    model_instance.write()
  File "/home/data1/protected/Programming/git/llama.cpp/convert_hf_to_gguf.py", line 401, in write
    self.prepare_metadata(vocab_only=False)
  File "/home/data1/protected/Programming/git/llama.cpp/convert_hf_to_gguf.py", line 493, in prepare_metadata
    self.set_vocab()
  File "/home/data1/protected/Programming/git/llama.cpp/convert_hf_to_gguf.py", line 4567, in set_vocab
    self._set_vocab_gpt2()
  File "/home/data1/protected/Programming/git/llama.cpp/convert_hf_to_gguf.py", line 805, in _set_vocab_gpt2
    tokens, toktypes, tokpre = self.get_vocab_base()
                               ^^^^^^^^^^^^^^^^^^^^^
  File "/home/data1/protected/Programming/git/llama.cpp/convert_hf_to_gguf.py", line 580, in get_vocab_base
    vocab_size = self.hparams.get("vocab_size", len(tokenizer.vocab))
                                                    ^^^^^^^^^^^^^^^
  File "/home/data1/protected/venv/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1108, in __getattr__
    raise AttributeError(f"{self.__class__.__name__} has no attribute {key}")
AttributeError: TikTokenTokenizer has no attribute vocab
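Note on the failure mode: Python evaluates `dict.get`'s default argument eagerly, so the lookup at convert_hf_to_gguf.py line 580 touches `tokenizer.vocab` even when the config supplies `vocab_size`, and the custom `TikTokenTokenizer` defines no `.vocab` attribute. A possible workaround is sketched below; it is only a sketch, with a hypothetical function name, and it assumes (unverified for this model) that the custom tokenizer implements the standard `get_vocab()` accessor from the `transformers` tokenizer API.

```python
def resolve_vocab_size(hparams: dict, tokenizer) -> int:
    """Hypothetical replacement for the lookup at convert_hf_to_gguf.py
    line 580; the function name and shape are illustrative only."""
    # dict.get evaluates its default eagerly, so the original
    #     hparams.get("vocab_size", len(tokenizer.vocab))
    # touches tokenizer.vocab even when config.json provides vocab_size.
    vocab_size = hparams.get("vocab_size")
    if vocab_size is None:
        # Fall back to get_vocab(), the accessor the transformers tokenizer
        # API expects subclasses to implement; whether this model's
        # TikTokenTokenizer does so is not verified here.
        vocab_size = len(tokenizer.get_vocab())
    return vocab_size
```

If Moonlight's config.json already carries `vocab_size`, deferring the fallback is enough to get past this point; if it does not, the `get_vocab()` path has to work for the conversion to proceed.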