llama : custom attention mask + parallel decoding + no context swaps#3228

Merged

ggerganov merged 57 commits into master from custom-attention-mask on Sep 28, 2023
Commits

Commits on Sep 21, 2023

Commits on Sep 27, 2023
