Comparing changes

* Call CUDA `empty_cache` to prevent OOM when quantizing a model
* Empty the cache during export and after the forward pass

* Fix the Ascend get_started.md link
* Fix the English Ascend get_started.md

* Support `min_tokens` for api_server
* fix
* Use `min_new_tokens`
* Add `min_p`

* Fix index error when computing PPL on a long-text prompt
* Update user guide

* Fix missing read of `moe_ffn` weights
* Fix linting

* Fix the llama3.2 version in zh_cn supported_models.md

* Update

Commits on Nov 7, 2024

bump version to 0.6.2.post1 (#2717)

lvhan028 authored Nov 7, 2024


Commit 4fc9479



Commits on Nov 5, 2024
