🐛 Describe the bug
We are measuring performance on Windows and observed that performance regressed in the rls/2.4 pre-release wheels. In my local environment, a dev20240410 nightly wheel downloaded a few months ago still shows improved Windows performance, which may be related to an optimization PR by @xuhancn. We are doing our best to find which nightly wheel introduced the regression, but the oldest nightly wheel still available is 0513, which already shows the regression.
| Model | BS | PyTorch 2.1 THP | PyTorch 0314 THP | PyTorch 0410 THP | PyTorch 2.4 THP |
|---|---|---|---|---|---|
| RN50 | 4 | 40.1 | 40.0962 | 41.401 | 13.7635 |
| Mobilenetv3 Large | 8 | 147.3972 | 139.6188 | 219.277 | 116.182 |
| distilbert-base | 8 | 6.641 | 5.603333 | 8.7825 | 3.249667 |
| roberta-base | 8 | 3.48425 | 2.572 | 4.201 | 1.6335 |

(THP = throughput; higher is better. "0314" and "0410" refer to dated nightly wheels.)
Hardware: 13th Gen Intel Core i7-13700H, 2.4 GHz
OS: Windows 11 23H2 22631.3593
Versions
How to reproduce:
https://github.com/WeizhuoZhang-intel/win_benchmarks/blob/main/torchvision_models.py
```
# torchvision
python torchvision_models.py
```
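For reference, here is a minimal sketch of the kind of eager-mode throughput measurement involved (the actual logic lives in torchvision_models.py linked above; the warm-up and iteration counts below are illustrative assumptions, not that script's values):

```python
import time
import torch
import torchvision.models as models

# Build RN50 in eval mode with a BS=4 input, matching the table above.
model = models.resnet50().eval()
batch = torch.randn(4, 3, 224, 224)

with torch.no_grad():
    for _ in range(10):          # warm-up iterations (count is an assumption)
        model(batch)
    iters = 100                  # measured iterations (count is an assumption)
    start = time.perf_counter()
    for _ in range(iters):
        model(batch)
    elapsed = time.perf_counter() - start

# Throughput in batches per second; higher is better.
print(f"throughput: {iters / elapsed:.2f} it/s")
```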
```
# transformers
pip install datasets evaluate accelerate transformers==4.34.1 scipy scikit-learn
git clone -b v4.34.1 --depth 1 https://github.com/huggingface/transformers.git
cd .\transformers\examples\pytorch\text-classification\
python run_glue.py --model_name_or_path distilbert-base-uncased-finetuned-sst-2-english --task_name sst2 --do_eval --max_seq_length 384 --output_dir ./tmp --per_device_eval_batch_size 8 --dataloader_drop_last
python run_glue.py --model_name_or_path "deepset/roberta-base-squad2" --task_name sst2 --do_eval --max_seq_length 384 --output_dir ./tmp --per_device_eval_batch_size 8 --dataloader_drop_last
```
cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @seemethere @malfet @osalpekar @atalman @peterjc123 @mszhanyi @skyline75489 @nbcsm @vladimir-aubrecht @iremyux @Blackhex @cristianPanaite