With each token processed the inference speed slows down a little bit, starts to become noticeable at around 50 tokens on 40B Q3_K and adds up.