perplexity: update README FP16 results [no ci] by JohannesGaessler · Pull Request #7413 · ggml-org/llama.cpp

Merged: 1 commit merged into ggml-org:master on May 20, 2024

Conversation

JohannesGaessler
Collaborator

The logits used for comparative perplexity runs are stored as uint16_t (FP16) rather than 32-bit float. The error introduced by this downcasting can be non-negligible for high-precision quants such as q8_0 or q6_K. This PR adds a disclaimer to the README along with results that estimate the impact of the downcasting.
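
For illustration, here is a minimal sketch (not the code from this PR) of the FP32 -> FP16 -> FP32 round-trip that storing logits as uint16_t implies. It assumes ggml's public conversion helpers ggml_fp32_to_fp16 and ggml_fp16_to_fp32 declared in ggml.h:

```cpp
// Minimal sketch: the precision lost when a logit is stored as FP16 bits
// (ggml_fp16_t is a uint16_t) and read back for a comparative run.
#include <cstdio>
#include "ggml.h"

int main() {
    const float logit    = 12.3456789f;               // typical logit magnitude
    const ggml_fp16_t h  = ggml_fp32_to_fp16(logit);  // the 16 bits that get stored
    const float restored = ggml_fp16_to_fp32(h);      // the value a later run compares against
    std::printf("original %.7f restored %.7f abs err %.3g\n",
                logit, restored, logit - restored);
    return 0;
}
```

Around a magnitude of 12 the FP16 step size is 2^-7 ≈ 0.008, so a round-tripped logit can shift by a few thousandths, which can be on the same order as the differences measured between high-precision quants.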

@mofosyne added the documentation label May 20, 2024
@JohannesGaessler JohannesGaessler merged commit 20385ce into ggml-org:master May 20, 2024
1 check passed
@fedric95
fedric95 commented May 22, 2024

@JohannesGaessler great work! Any plans to also add a scoreboard for llama3-70b? It would be very useful for comparing the trend in perplexity loss between llama2-70b and llama3-70b.

@JohannesGaessler
Collaborator Author

I'm hesitant to publish anything with LLaMA 3 70b because the machine I built with 6x RTX 4090 turned out to have stability issues, which means I have to be very careful that the data isn't affected by random bit flips.

@fedric95

> I'm hesitant to publish anything with LLaMA 3 70b because the machine I built with 6x RTX 4090 turned out to have stability issues, which means I have to be very careful that the data isn't affected by random bit flips.

:-( Roughly how much time does it take to run all the experiments for llama3 70b? Just to understand how much it would cost.

@JohannesGaessler
Collaborator Author

At standard settings a single LLaMA 3 70b run takes ~6 minutes on 6x RTX 4090.
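
For rough scale: with on the order of 20 quantization types to sweep, that is roughly 20 × 6 ≈ 120 minutes of compute per model, before any reruns to rule out data corrupted by the stability issues mentioned above.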

ddh0 added a commit to ddh0/llama.cpp that referenced this pull request Jun 21, 2024
Galunid pushed a commit that referenced this pull request Jun 22, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jun 30, 2024
Labels: documentation, examples