USIMD: Optimize the performance of np.einsum for all platforms #16641
Closed
256 commits
9d5812f
optimize sum_of_products_contig_stride0_outcontig_two using neon.
Qiyu8 0b6e5b6
optimize sum_of_products_stride0_contig_outstride0_two using neon.
Qiyu8 5ec4da5
optimize sum_of_products_contig_stride0_outstride0_two using neon.
Qiyu8 4454cb0
add dtype parameter
Qiyu8 7e40b1b
rebase
Qiyu8 fecd458
modified accoriding to new NPY_HAVE_NEON flag.
Qiyu8 c90ac6c
MAINT: Explicitly disallow object user dtypes
seberg 1cdc9a8
BUG: fix mgrid output for lower precision float inputs
cjblocker 5d1fbf4
TST: fixed dtype check error from code review
cjblocker 2ab7954
rebase
Qiyu8 2b790d2
Merge branch 'einsum-neon' of github.com:Qiyu8/numpy; branch 'master'…
Qiyu8 f82c7d7
TST: update mgrid test from code review
cjblocker 7a3962d
MAINT: reference issue in comments for added index_tricks tests
cjblocker 4dcbcc2
recontructing einsum using usimd
Qiyu8 05cb5b7
using usimd based on current framework
Qiyu8 689b3ab
add prefetch in memory
Qiyu8 5aa6515
add reverse usimd
Qiyu8 d4286b9
initialize the cpu dispatching of einsum
seiko2plus f93f567
Merge pull request #1 from seiko2plus/einsum-neon-dispatch
Qiyu8 d3414f5
Merge branch 'einsum-neon' of github.com:Qiyu8/numpy into einsum-neon
Qiyu8 a03e729
Merge branch 'master' of github.com:numpy/numpy into einsum-neon
Qiyu8 7b028f4
rewrite using simd api
Qiyu8 f1329db
Update numpy/core/src/common/simd/avx2/reorder.h
Qiyu8 dee5064
Update numpy/core/src/common/simd/avx512/reorder.h
Qiyu8 1ba69b2
Update numpy/core/src/common/simd/vsx/reorder.h
Qiyu8 ad4fc5b
Merge branch 'master' of github.com:numpy/numpy into improve-usimd
Qiyu8 e1265b4
add shuffle api
Qiyu8 6173d1a
remove tabs and offset
Qiyu8 4c3d283
DOC: Remove links for C codes
takanori-pskq 6bb947c
Fix exception causes in __init__.py
Ashutosh619-sudo 5dd1fe6
Fix exception causes in __init__.py
Ashutosh619-sudo 889a043
Update numpy/core/src/common/simd/neon/reorder.h
Qiyu8 c4f35ff
Update numpy/core/src/common/simd/neon/reorder.h
Qiyu8 08954bd
Update numpy/core/src/common/simd/neon/reorder.h
Qiyu8 ebe08ed
Merge branch 'improve-usimd' of github.com:Qiyu8/numpy into einsum-neon
Qiyu8 a58cd31
add shuffle api
Qiyu8 e4c9005
Merge branch 'master' of github.com:numpy/numpy into einsum-neon
Qiyu8 7050666
remove redundant func
Qiyu8 eb33fe2
Transform to usimd, SSE/SSE2/AVX2 passed
Qiyu8 038de24
update
Qiyu8 3d74fab
Configure hypothesis for np.test()
Zac-HD 668547a
Merge branch 'master' of https://github.com/numpy/numpy into einsum-neon
Qiyu8 59ba38c
modify neon shuffle
Qiyu8 8b9e8b2
fix neon shuffle api
Qiyu8 71b3618
Merge branch 'master' of https://github.com/numpy/numpy into einsum-neon
Qiyu8 8a4f3e8
Merge pull request #16900 from Ashutosh619-sudo/master
mattip cfb7a9c
BLD: update OpenBLAS build
mattip d45b16d
BUG: Allow array-like types to be coerced as object array elements
seberg b743bcc
Merge pull request #16940 from mattip/issue-16913
charris ba09393
DEP: Deprecate size-one ragged array coercion
seberg 5920407
Update numpy/core/tests/test_array_coercion.py
seberg 1e031f1
Update numpy/core/src/multiarray/array_coercion.c
seberg fe70857
DOC: add release note for #16815
cjblocker c7931f5
DOC: Fix the role of references (var -> macro)
takanori-pskq 1377418
changed the name of the folder icons to logo
18673c5
DOC: Fixup
takanori-pskq e627135
add sum api to usimd
Qiyu8 8c8c3b7
remove print
Qiyu8 6c8be6a
Merge pull request #16944 from InessaPawson/master
rgommers f457a1a
Merge pull request #16815 from cjblocker/mgrid-float
mattip 1ce5457
Merge pull request #16943 from seberg/deprecate-single-element-arrayl…
charris ce77458
ENH: enable colors for `runtests.py --ipython`
person142 c8df720
fix neon sum api, use normal for loop
Qiyu8 cba6d44
remove log
Qiyu8 745b1af
open maxop option
Qiyu8 7d04e22
Merge pull request #16949 from person142/runtests-ipython-colors
rgommers 7b8bda5
MAINT: Bump hypothesis from 5.20.2 to 5.23.2
dependabot-preview[bot] e8d32d8
Merge pull request #16952 from numpy/dependabot/pip/hypothesis-5.23.2
charris 9495f36
MAINT: Use arm64 instead of aarch64 on travis.
charris b26ef67
Merge pull request #16957 from charris/fix-arm64-warning
charris 7c0c83e
use more efficient instrument.
Qiyu8 4690248
update numpy/lib/arraypad.py with appropriate chain exception (#16953)
nomanarshad94 ae008b4
add AVX512DQ compatibility
Qiyu8 622514a
Merge branch 'einsum-neon' of https://github.com/Qiyu8/numpy into ein…
Qiyu8 9743409
fix avx512 segment fault problem
Qiyu8 fb79b9b
ENH: Use f90 compiler specified in command line args for pgi compiler…
62ca9df
Merge pull request #16941 from seberg/types-are-not-arraylikes
charris d28ac9a
BLD: add win32 pypy build
mattip f99c01a
DOC: Fixed typo in lib/recfunctions.py (#16973)
jesseli2002 6f67399
DOC: Clarify input to irfft/irfft2/irfftn (#16950)
bharatr21 cf5e766
TST: fix tests for windows + PyPy
mattip b46e5d3
Merge pull request #16974 from mattip/pypy-win32
charris 5311300
Update numpy/core/src/multiarray/einsum_p.h
Qiyu8 1493142
re-implment SIMD kernels of einsum
seiko2plus 5b020c7
re-implment SIMD kernels of einsum
seiko2plus 8879319
move to NPYV
Qiyu8 f4e5816
fix type error
Qiyu8 aca0ce6
MAINT: Added the `order` parameter to `np.array()` (#16966)
BvB93 e7c1d01
Merge pull request #16730 from danbeibei/fcompiler
charris b66f02b
MAINT: Implemented two dtype-related TODO's (#16622)
BvB93 6f0436d
ENH: Add Neon SIMD implementations for add, sub, mul, and div (#16969)
DumbMice d67326d
DOC: update val to be scalar or array like optional closes #16901 (#1…
leeyspaul 27cf59d
DOC: Fix the declarations of C fuctions (#16897)
takanori-pskq 210e542
Merge pull request #16896 from takanori-pskq/i13114-5
mattip 2d39e7f
DOC: Remove the links for ``True`` and ``False`` (#16887)
takanori-pskq 186c765
add vsx sum reduce
Qiyu8 4b83f05
DOC: Fix wrong markups in `arrays.dtypes`
takanori-pskq 800c43b
Merge pull request #16894 from takanori-pskq/fix-doc-dtypes-quote
mattip 7bda953
DOC: Add the new NumPy logo to Sphinx pages
bjnath f154484
DOC: Styling update for PR #16988
bjnath c3f7d3e
DOC: Delete old logo; updates PR #16988
bjnath 122330d
Merge pull request #16988 from bjnath/update_logo_on_sphinx_pages
rgommers 0f12338
Merge pull request #16879 from Zac-HD/isolate-hypothesis-config
mattip 77bd10c
Update numpy/core/src/common/simd/neon/arithmetic.h
Qiyu8 320fa52
Update numpy/core/src/common/simd/avx2/arithmetic.h
Qiyu8 8c81038
Update numpy/core/src/common/simd/avx512/arithmetic.h
Qiyu8 778a40e
Update numpy/core/src/common/simd/sse/arithmetic.h
Qiyu8 8c45114
Update numpy/core/src/multiarray/einsum.dispatch.c.src
Qiyu8 9076d48
Update numpy/core/src/multiarray/einsum.dispatch.c.src
Qiyu8 4cf23d4
Update numpy/core/src/multiarray/einsum.dispatch.c.src
Qiyu8 4ae3394
Update numpy/core/src/multiarray/einsum.dispatch.c.src
Qiyu8 b473e9a
Update numpy/core/src/multiarray/einsum.dispatch.c.src
Qiyu8 723f103
Update numpy/core/src/multiarray/einsum.dispatch.c.src
Qiyu8 cc99242
Update numpy/core/src/multiarray/einsum.dispatch.c.src
Qiyu8 bed1032
Update numpy/core/src/multiarray/einsum.dispatch.c.src
Qiyu8 2e1df6d
BLD: pin setuptools<49.2.0
mattip 60a21a3
Merge pull request #16993 from mattip/setuptools
charris 113ef15
MAINT: Bump hypothesis from 5.23.2 to 5.23.9
dependabot-preview[bot] 5b5a740
DOC: Add correctness vs strictness consideration for np.dtype (#16917)
anirudh2290 333e08e
BUG: Set readonly flag in array interface instead of warning (gh-16350)
abalkin 5d09976
Merge pull request #16991 from numpy/dependabot/pip/hypothesis-5.23.9
charris dd3d935
MAINT: Bump pytest from 5.4.3 to 6.0.1
dependabot-preview[bot] 2283e26
Merge pull request #16802 from seberg/user-dtypes-no-objects
mattip 8f60522
Merge pull request #16992 from numpy/dependabot/pip/pytest-6.0.1
charris 593ef5f
ENH: Speed up trim_zeros (#16911)
BvB93 e3c5213
MAINT: Chain exception in ``distutils/fcompiler/environment.py``. (#1…
nomanarshad94 e242859
DOC: Add note that allclose and isclose do not accept non-numeric typ…
iamsoto e1211b8
ENH: Add NumPy declarations to be used by Cython 3.0+ (#16986)
scoder 6bed9a9
DOC: Improve intersect1d docstring (#16420)
dkogan 40e8400
MAINT: Improve error handling in umathmodule setup (#17014)
eric-wieser 3023d06
DOC: Fix non-matching pronoun in format.py documentation. (gh-17022)
phoenix-meadowlark 29e2293
BUG: Raise correct errors in boolean indexing fast path (gh-17010)
asmeurer eec0aa2
NEP: Updated NEP-35 with keyword-only instruction (#17009)
pentschev cbd0897
BUG: fix a compile and a test warning
mattip 8a92eb4
DOC: Disclaimer for FFT library
bjnath dbf3744
Merge pull request #17028 from bjnath/fft-disclaimer
rgommers 961b56f
Merge pull request #17033 from mattip/random-pool_size
charris 00a45b4
DOC: Use a less ambiguous example for array_split (#17039)
yogeshr59 b1d88e0
optimize sum_of_products_stride0_contig_outcontig_two by using neon i…
Qiyu8 e27e051
optimize sum_of_products_contig_contig_outstride0_two by using neon i…
Qiyu8 74c31f1
optimize sum_of_products_contig_outstride0_one by using neon intrinsics
Qiyu8 5abc3a3
add benchmarks
Qiyu8 8f14897
optimize sum_of_products_contig_two using neon.
Qiyu8 5e201dd
optimize sum_of_products_contig_stride0_outcontig_two using neon.
Qiyu8 cc69acc
optimize sum_of_products_stride0_contig_outstride0_two using neon.
Qiyu8 86abb98
optimize sum_of_products_contig_stride0_outstride0_two using neon.
Qiyu8 b82f6c9
add dtype parameter
Qiyu8 cc81a94
modified accoriding to new NPY_HAVE_NEON flag.
Qiyu8 d0dee4d
recontructing einsum using usimd
Qiyu8 23b11a6
using usimd based on current framework
Qiyu8 8950bbe
initialize the cpu dispatching of einsum
seiko2plus cdf2c63
rewrite using simd api
Qiyu8 35ac5bb
add prefetch in memory
Qiyu8 3982569
add reverse usimd
Qiyu8 3602bfa
Update numpy/core/src/common/simd/avx2/reorder.h
Qiyu8 f4f7823
Update numpy/core/src/common/simd/avx512/reorder.h
Qiyu8 72603fe
Update numpy/core/src/common/simd/vsx/reorder.h
Qiyu8 078adbf
add shuffle api
Qiyu8 51345f0
remove tabs and offset
Qiyu8 0804a32
Update numpy/core/src/common/simd/neon/reorder.h
Qiyu8 c8029db
Update numpy/core/src/common/simd/neon/reorder.h
Qiyu8 1ae2518
Update numpy/core/src/common/simd/neon/reorder.h
Qiyu8 5b4c786
add shuffle api
Qiyu8 a0c8ac0
remove redundant func
Qiyu8 e0a951e
Transform to usimd, SSE/SSE2/AVX2 passed
Qiyu8 1260e95
modify neon shuffle
Qiyu8 9b38fba
fix neon shuffle api
Qiyu8 007a82e
add sum api to usimd
Qiyu8 3522690
remove print
Qiyu8 865173d
fix neon sum api, use normal for loop
Qiyu8 3bf0674
remove log
Qiyu8 d493ed4
open maxop option
Qiyu8 f846ae1
use more efficient instrument.
Qiyu8 de25c6c
add AVX512DQ compatibility
Qiyu8 d837ad3
fix avx512 segment fault problem
Qiyu8 48f6d51
Update numpy/core/src/multiarray/einsum_p.h
Qiyu8 ca849ce
re-implment SIMD kernels of einsum
seiko2plus 9fa5b4f
re-implment SIMD kernels of einsum
seiko2plus 82844b7
move to NPYV
Qiyu8 9905f08
fix type error
Qiyu8 91e4bba
add vsx sum reduce
Qiyu8 ebf4933
Update numpy/core/src/common/simd/neon/arithmetic.h
Qiyu8 753b38d
Update numpy/core/src/common/simd/avx2/arithmetic.h
Qiyu8 1b9d283
Update numpy/core/src/common/simd/avx512/arithmetic.h
Qiyu8 ae70f32
Update numpy/core/src/common/simd/sse/arithmetic.h
Qiyu8 1180231
Update numpy/core/src/multiarray/einsum.dispatch.c.src
Qiyu8 fcdfada
Update numpy/core/src/multiarray/einsum.dispatch.c.src
Qiyu8 43ef288
Update numpy/core/src/multiarray/einsum.dispatch.c.src
Qiyu8 85d10d5
Update numpy/core/src/multiarray/einsum.dispatch.c.src
Qiyu8 17fb2f0
Update numpy/core/src/multiarray/einsum.dispatch.c.src
Qiyu8 a588d04
Update numpy/core/src/multiarray/einsum.dispatch.c.src
Qiyu8 f4025f2
Update numpy/core/src/multiarray/einsum.dispatch.c.src
Qiyu8 cebee98
Update numpy/core/src/multiarray/einsum.dispatch.c.src
Qiyu8 5b0dbd7
Merge branch 'einsum-neon' of github.com:Qiyu8/numpy into einsum-neon
Qiyu8 b0f7b8c
optimize sum_of_products_stride0_contig_outcontig_two by using neon i…
Qiyu8 e7a158f
optimize sum_of_products_contig_contig_outstride0_two by using neon i…
Qiyu8 3224f56
optimize sum_of_products_contig_outstride0_one by using neon intrinsics
Qiyu8 1796d61
add benchmarks
Qiyu8 161a645
optimize sum_of_products_contig_two using neon.
Qiyu8 967476f
optimize sum_of_products_contig_stride0_outcontig_two using neon.
Qiyu8 7b33da3
optimize sum_of_products_stride0_contig_outstride0_two using neon.
Qiyu8 8dba9d6
optimize sum_of_products_contig_stride0_outstride0_two using neon.
Qiyu8 88e8ddb
add dtype parameter
Qiyu8 65d3260
modified accoriding to new NPY_HAVE_NEON flag.
Qiyu8 76db405
recontructing einsum using usimd
Qiyu8 eec1857
using usimd based on current framework
Qiyu8 21137f8
initialize the cpu dispatching of einsum
seiko2plus 731879c
rewrite using simd api
Qiyu8 22180b2
add prefetch in memory
Qiyu8 1de1692
add reverse usimd
Qiyu8 edab5e0
Update numpy/core/src/common/simd/avx2/reorder.h
Qiyu8 4abce24
Update numpy/core/src/common/simd/avx512/reorder.h
Qiyu8 f818fab
Update numpy/core/src/common/simd/vsx/reorder.h
Qiyu8 0f534e2
add shuffle api
Qiyu8 09a3ebc
remove tabs and offset
Qiyu8 1ec8126
Update numpy/core/src/common/simd/neon/reorder.h
Qiyu8 42aa799
Update numpy/core/src/common/simd/neon/reorder.h
Qiyu8 f568bb0
Update numpy/core/src/common/simd/neon/reorder.h
Qiyu8 dbd79cd
add shuffle api
Qiyu8 a10c3ce
remove redundant func
Qiyu8 20e5fa8
Transform to usimd, SSE/SSE2/AVX2 passed
Qiyu8 88d5838
modify neon shuffle
Qiyu8 b0c526e
fix neon shuffle api
Qiyu8 6751d48
add sum api to usimd
Qiyu8 ea9fad0
remove print
Qiyu8 01e5145
fix neon sum api, use normal for loop
Qiyu8 ff8ab27
remove log
Qiyu8 e9d0d61
open maxop option
Qiyu8 be46956
use more efficient instrument.
Qiyu8 b85e32a
add AVX512DQ compatibility
Qiyu8 e66ff74
fix avx512 segment fault problem
Qiyu8 e208542
Update numpy/core/src/multiarray/einsum_p.h
Qiyu8 2298ea9
re-implment SIMD kernels of einsum
seiko2plus adb094c
re-implment SIMD kernels of einsum
seiko2plus 96eb54f
move to NPYV
Qiyu8 dfaeb14
fix type error
Qiyu8 aed4e7b
add vsx sum reduce
Qiyu8 96aa316
Update numpy/core/src/common/simd/neon/arithmetic.h
Qiyu8 89f7861
Update numpy/core/src/common/simd/avx2/arithmetic.h
Qiyu8 787cc67
Update numpy/core/src/common/simd/avx512/arithmetic.h
Qiyu8 a561d94
Update numpy/core/src/common/simd/sse/arithmetic.h
Qiyu8 458c4ba
Update numpy/core/src/multiarray/einsum.dispatch.c.src
Qiyu8 4e26633
Update numpy/core/src/multiarray/einsum.dispatch.c.src
Qiyu8 f9b9250
Update numpy/core/src/multiarray/einsum.dispatch.c.src
Qiyu8 31aac3a
Update numpy/core/src/multiarray/einsum.dispatch.c.src
Qiyu8 1aee497
Update numpy/core/src/multiarray/einsum.dispatch.c.src
Qiyu8 f46c61f
Update numpy/core/src/multiarray/einsum.dispatch.c.src
Qiyu8 a172e68
Update numpy/core/src/multiarray/einsum.dispatch.c.src
Qiyu8 25f4897
Update numpy/core/src/multiarray/einsum.dispatch.c.src
Qiyu8 ea7638c
Merge branch 'einsum-neon' of github.com:Qiyu8/numpy into einsum-neon
Qiyu8
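Many of the commit messages above name NumPy's einsum inner-loop kernels (`sum_of_products_*`) that this PR vectorizes. As a rough guide to what each one computes, here is a pure-Python scalar sketch of their arithmetic. The naming interpretation (`contig` = contiguous operand, `stride0` = a single broadcast scalar, `outcontig` = contiguous output, `outstride0` = reduction into one output value, `_one`/`_two` = number of inputs) is an assumption based on NumPy's einsum sources, not something stated on this page. The function names drop the `sum_of_products_` prefix; the real kernels are C loops templated per dtype and vectorized with NPYV.

```python
# Scalar reference sketches of einsum "sum of products" kernels.
# Assumed naming: contig = contiguous operand, stride0 = broadcast scalar,
# outcontig = contiguous output, outstride0 = single accumulated output.
# Arithmetic only; the actual NumPy kernels are SIMD C code.

def contig_two(a, b, out):
    # out[i] += a[i] * b[i]
    for i in range(len(out)):
        out[i] += a[i] * b[i]
    return out


def stride0_contig_outcontig_two(a0, b, out):
    # out[i] += a0 * b[i], where a0 is one broadcast value
    for i in range(len(out)):
        out[i] += a0 * b[i]
    return out


def contig_stride0_outcontig_two(a, b0, out):
    # out[i] += a[i] * b0, where b0 is one broadcast value
    for i in range(len(out)):
        out[i] += a[i] * b0
    return out


def contig_contig_outstride0_two(a, b, out0):
    # Dot product added to a single output value
    acc = 0
    for x, y in zip(a, b):
        acc += x * y
    return out0 + acc


def stride0_contig_outstride0_two(a0, b, out0):
    # The scalar is factored out of the reduction: one multiply in total
    return out0 + a0 * sum(b)


def contig_stride0_outstride0_two(a, b0, out0):
    # Mirror image of the kernel above
    return out0 + sum(a) * b0


def contig_outstride0_one(a, out0):
    # Single-operand reduction: plain sum added to the output
    return out0 + sum(a)
```

For instance, `np.einsum('i,i->', a, b)` computes the same arithmetic as `contig_contig_outstride0_two`; which C kernel NumPy actually selects depends on the operands' strides at runtime.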
open maxop option
commit 745b1afc1b3c77dd362f3f540b51aad03ff5bc2f