Feature Request: Add "trust_remote_code" support to convert_hf_to_gguf.py for compatibility with modern HF models #12610
Closed
@joeyama

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Summary

The current convert_hf_to_gguf.py script fails to convert HuggingFace models that require trust_remote_code=True, especially those using custom tokenizers or architectures (e.g., TikTokenTokenizer). This results in runtime errors or interactive confirmation prompts that break automation and scripting workflows.

Adding support for trust_remote_code (via CLI flag or built-in toggle) would enable full compatibility with a growing set of modern models.


🔥 Encountered Issues

During conversion of huihui-ai/Moonlight-16B-A3B-Instruct-abliterated, the following blockers were hit:

  1. Broken tokenizer_config.json

    • Trailing comma → JSONDecodeError
    • Manual fix required
  2. Script lacks support for trust_remote_code=True

    • Models using TikTokenTokenizer trigger:
      ValueError: Please pass trust_remote_code=True
      
    • The confirmation prompt appears even when the script is run non-interactively
    • No --trust-remote-code CLI flag exists
  3. Script attempts .vocab on TikTokenTokenizer

    • Raises:
      AttributeError: 'TikTokenTokenizer' object has no attribute 'vocab'
      
    • Workaround: use tokenizer.model.n_vocab or a hardcoded value (see the fallback sketch after this list)
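
A minimal fallback sketch for the vocab lookup. The probes beyond .vocab reflect common tokenizer layouts and are assumptions, not guaranteed API:

```python
# Sketch only: a defensive vocab-size lookup for tokenizers that lack `.vocab`.
def get_vocab_size(tokenizer) -> int:
    if hasattr(tokenizer, "vocab"):
        return len(tokenizer.vocab)  # standard HF tokenizers
    if getattr(tokenizer, "vocab_size", None) is not None:
        return tokenizer.vocab_size  # most PreTrainedTokenizer subclasses
    # TikToken-backed tokenizers expose the underlying encoding as `.model`,
    # which reports its size as `n_vocab` (per the workaround below)
    return tokenizer.model.n_vocab
```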

🧠 Suggested Fixes

(Flexible depending on preference; rough sketches follow the lists below.)

  • ✅ Add CLI flag: --trust-remote-code
  • ✅ Pass the flag to AutoTokenizer.from_pretrained() and AutoModelForCausalLM.from_pretrained()
  • ✅ Gracefully handle vocab fallback for tokenizers missing .vocab

Optional:

  • ❓ Auto-enable trust_remote_code if tokenizer_class is custom?
  • ❓ Document this behavior, with an example, in the README or docs/gguf.md
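
A minimal sketch of how the flag could be wired in. The argparse setup and variable names are illustrative, not the script's actual code:

```python
from argparse import ArgumentParser

from transformers import AutoTokenizer

parser = ArgumentParser()
parser.add_argument("model", help="path to the local HF model directory")
parser.add_argument(
    "--trust-remote-code", action="store_true",
    help="allow execution of custom model/tokenizer code shipped with the repo",
)
args = parser.parse_args()

# The default stays False, so existing behavior is unchanged unless the user opts in.
tokenizer = AutoTokenizer.from_pretrained(
    args.model, trust_remote_code=args.trust_remote_code
)
```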
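
For the optional auto-enable idea: repos that ship custom tokenizer code usually declare an auto_map entry in tokenizer_config.json, so a heuristic could look roughly like this (hedged: not every remote-code model follows that convention):

```python
import json
from pathlib import Path

def looks_like_custom_tokenizer(model_dir: str) -> bool:
    # Remote-code tokenizers typically declare an "auto_map" entry pointing
    # at the custom class; treat its presence as a hint to enable the flag.
    cfg_path = Path(model_dir) / "tokenizer_config.json"
    if not cfg_path.is_file():
        return False
    try:
        cfg = json.loads(cfg_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return False  # e.g. the trailing-comma breakage described above
    return "auto_map" in cfg
```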

🧪 Repro Steps

```bash
git clone https://huggingface.co/huihui-ai/Moonlight-16B-A3B-Instruct-abliterated
./convert_hf_to_gguf.py --verbose --outfile moonlight.gguf --outtype bf16 Moonlight-16B-A3B-Instruct-abliterated
# fails with a trust_remote_code prompt and crashes if not patched
```

✅ Workaround (What I Did)

  • Manually patched:
    AutoTokenizer.from_pretrained(..., trust_remote_code=True)
  • Replaced the .vocab call with tokenizer.model.n_vocab
  • Conversion then succeeded (the patch is sketched below)
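
Roughly what the patch amounted to (paraphrased, not a literal diff; the local path is the clone from the repro steps):

```python
from transformers import AutoTokenizer

# Local clone from the repro steps above
model_dir = "Moonlight-16B-A3B-Instruct-abliterated"

# Opt in to the repo's custom tokenizer code
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)

# TikTokenTokenizer has no `.vocab`; its underlying tiktoken encoding
# (exposed as `.model`) reports the size as `n_vocab` instead
vocab_size = tokenizer.model.n_vocab
```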

🙏 Why This Matters

Many newer models are starting to:

  • Use custom tokenizers (TikToken, JinjaTokenizer, etc.)
  • Require remote code execution
  • Break current conversion pipelines

Fixing this once will unlock dozens of models for GGUF and llama.cpp.

Happy to PR this if helpful. Thanks again for all the work — your tooling is incredible.

Motivation

Support for trust_remote_code=True is increasingly important as more HuggingFace models rely on custom tokenizers or architectures. Without it, convert_hf_to_gguf.py cannot load or convert these models, blocking compatibility with llama.cpp.

Possible Implementation

No response
