Slowdown with tokens · Issue #6 · cmp-nct/ggllm.cpp · GitHub

Slowdown with tokens #6

Open

Opened on Jun 18, 2023

With each token processed, inference slows down a little. The slowdown becomes noticeable at around 50 tokens on 40B Q3_K and keeps adding up from there.
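To put a number on the slowdown, one option is to log the time spent on each generated token and fit a straight line through the values. The following is a minimal sketch, not part of ggllm.cpp: `latency_slope` is a hypothetical helper, and the timings are made-up illustration values, not measurements from this issue. A slope clearly above zero means each new token costs more than the one before it.

```python
def latency_slope(per_token_ms):
    """Least-squares slope (ms per token) of latency against token index."""
    n = len(per_token_ms)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_x = (n - 1) / 2                      # mean of indices 0..n-1
    mean_y = sum(per_token_ms) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(per_token_ms))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


if __name__ == "__main__":
    # Hypothetical per-token timings in milliseconds (illustration only).
    timings = [100.0 + 0.5 * i for i in range(60)]
    print(f"{latency_slope(timings):.3f} ms of extra latency per token")
```

Comparing the slope across quantization types or builds would show whether the growth is specific to one configuration.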
