perplexity: update README FP16 results [no ci] by JohannesGaessler · Pull Request #7413 · ggml-org/llama.cpp

Merged: 1 commit merged into ggml-org:master on May 20, 2024

Conversation

JohannesGaessler
Collaborator

The logits used for comparative perplexity runs are stored as uint16_t (FP16) rather than 32-bit float. The error introduced by this downcasting can be non-negligible for high-precision quants such as q8_0 or q6_K. This PR adds a disclaimer to the README along with results that estimate the impact of the downcasting.
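
For illustration, here is a minimal sketch (not the code from this PR) of the FP32 -> FP16 -> FP32 round-trip that storing logits as uint16_t implies. It assumes ggml's public conversion helpers ggml_fp32_to_fp16 and ggml_fp16_to_fp32 declared in ggml.h:

```cpp
// Minimal sketch: the precision lost when a logit is stored as FP16 bits
// (ggml_fp16_t is a uint16_t) and read back for a comparative run.
#include <cstdio>
#include "ggml.h"

int main() {
    const float logit    = 12.3456789f;               // typical logit magnitude
    const ggml_fp16_t h  = ggml_fp32_to_fp16(logit);  // the 16 bits that get stored
    const float restored = ggml_fp16_to_fp32(h);      // the value a later run compares against
    std::printf("original %.7f restored %.7f abs err %.3g\n",
                logit, restored, logit - restored);
    return 0;
}
```

Around a magnitude of 12 the FP16 step size is 2^-7 ≈ 0.008, so a round-tripped logit can shift by a few thousandths, which can be on the same order as the differences measured between high-precision quants.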

@mofosyne added the documentation label May 20, 2024
@JohannesGaessler JohannesGaessler merged commit 20385ce into ggml-org:master May 20, 2024
1 check passed
@fedric95
fedric95 commented May 22, 2024

@JohannesGaessler great work! Any plans to also add a scoreboard for llama3-70b? It would be very useful for comparing the trend in perplexity loss between llama2-70b and llama3-70b.

@JohannesGaessler
Collaborator Author

I'm hesitant to publish anything with LLaMA 3 70b because the machine I built with 6x RTX 4090 turned out to have stability issues, which means I have to be very careful that the data isn't affected by random bit flips.

@fedric95

> I'm hesitant to publish anything with LLaMA 3 70b because the machine I built with 6x RTX 4090 turned out to have stability issues, which means I have to be very careful that the data isn't affected by random bit flips.

:-( Roughly how much time does it take to run all the experiments for llama3 70b? Just to understand how much it would cost.

@JohannesGaessler
Collaborator Author

At standard settings a single LLaMA 3 70b run takes ~6 minutes on 6x RTX 4090.
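
For rough scale: with on the order of 20 quantization types to sweep, that is roughly 20 × 6 ≈ 120 minutes of compute per model, before any reruns to rule out data corrupted by the stability issues mentioned above.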

ddh0 added a commit to ddh0/llama.cpp that referenced this pull request Jun 21, 2024
Galunid pushed a commit that referenced this pull request Jun 22, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jun 30, 2024
Labels: documentation, examples