Added SSE support for Quaternion operations by AntonDan · Pull Request #97 · HandmadeMath/HandmadeMath · GitHub

@AntonDan

Added SSE support for almost all operations involving quaternions (Inverse, NLerp, Normalize, Dot, DivF, MulF, Mul, Sub, Add). Slerp now mostly benefits from SSE as well, since it is largely built from the other quaternion operations.
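For readers unfamiliar with how these operations map onto SSE, here is a minimal sketch. It is not the code from this PR: it assumes a hypothetical `quat` union that stores X, Y, Z in lanes 0 to 2 and W in lane 3, and the helper names (`quat_make`, `quat_add`, `quat_dot`, and so on) are illustrative only. Component-wise operations become a single packed instruction each; Dot needs a horizontal sum; Normalize, Inverse, and NLerp are then built on top of Dot.

```c
#include <xmmintrin.h> /* SSE1 intrinsics: _mm_add_ps, _mm_shuffle_ps, _mm_sqrt_ss, ... */

/* Hypothetical layout for this sketch: X, Y, Z in lanes 0-2, W in lane 3. */
typedef union {
    struct { float X, Y, Z, W; };
    __m128 SSE;
} quat;

static quat quat_make(float x, float y, float z, float w) {
    quat r;
    r.SSE = _mm_setr_ps(x, y, z, w); /* setr: first argument goes to lane 0 */
    return r;
}

/* Component-wise operations: one packed instruction each. */
static quat quat_add(quat a, quat b) { quat r; r.SSE = _mm_add_ps(a.SSE, b.SSE); return r; }
static quat quat_sub(quat a, quat b) { quat r; r.SSE = _mm_sub_ps(a.SSE, b.SSE); return r; }
static quat quat_mulf(quat q, float s) { quat r; r.SSE = _mm_mul_ps(q.SSE, _mm_set1_ps(s)); return r; }
static quat quat_divf(quat q, float s) { quat r; r.SSE = _mm_div_ps(q.SSE, _mm_set1_ps(s)); return r; }

/* Dot: one packed multiply, then a horizontal sum from two shuffle+add steps,
   because SSE1 has no horizontal-add instruction. */
static float quat_dot(quat a, quat b) {
    __m128 m = _mm_mul_ps(a.SSE, b.SSE);                                     /* lane-wise products */
    __m128 s = _mm_add_ps(m, _mm_shuffle_ps(m, m, _MM_SHUFFLE(2, 3, 0, 1))); /* (x+y, x+y, z+w, z+w) */
    s = _mm_add_ps(s, _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 0, 3, 2)));        /* every lane = full sum */
    return _mm_cvtss_f32(s);
}

/* Normalize: divide by the length, with the square root taken by sqrtss. */
static quat quat_normalize(quat q) {
    float len = _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(quat_dot(q, q))));
    return quat_divf(q, len);
}

/* Inverse: conjugate (flip the sign bits of X, Y, Z) divided by the squared norm. */
static quat quat_inverse(quat q) {
    const __m128 sign_xyz = _mm_setr_ps(-0.0f, -0.0f, -0.0f, 0.0f);
    quat conj;
    conj.SSE = _mm_xor_ps(q.SSE, sign_xyz);
    return quat_divf(conj, quat_dot(q, q));
}

/* NLerp: linear blend of the two quaternions, then renormalize. */
static quat quat_nlerp(quat a, float t, quat b) {
    return quat_normalize(quat_add(quat_mulf(a, 1.0f - t), quat_mulf(b, t)));
}
```

Because Normalize, Inverse, and NLerp all route through `quat_dot` in this sketch, speeding up Dot speeds them up too, which fits the PR's note that Slerp inherits SSE support from the operations it calls.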

Benchmarks:


Optimization: O2

| Function  | SSE         | No SSE      |
|-----------|-------------|-------------|
| Inverse   | 163 (0.89s) | 165 (1.89s) |
| NLerp     | 330 (1.70s) | 330 (1.75s) |
| Normalize | 169 (1.03s) | 169 (1.06s) |
| Dot       | 22 (1.15s)  | 23 (1.14s)  |
| DivF      | 23 (0.72s)  | 23 (0.82s)  |
| MulF      | 22 (0.75s)  | 22 (0.79s)  |
| Mul       | 24 (1.14s)  | 23 (1.24s)  |
| Sub       | 23 (1.17s)  | 37 (1.20s)  |
| Add       | 23 (1.20s)  | 24 (1.19s)  |

Values are the average cycles spent per operation, with the average total execution time in seconds in parentheses.


Optimization: O0

| Function  | SSE         | No SSE       |
|-----------|-------------|--------------|
| Inverse   | 394 (1.62s) | 430 (3.05s)  |
| NLerp     | 694 (2.71s) | 1035 (4.81s) |
| Normalize | 374 (1.58s) | 412 (2.95s)  |
| Dot       | 81 (1.83s)  | 23 (2.50s)   |
| DivF      | 61 (1.12s)  | 25 (2.37s)   |
| MulF      | 58 (1.09s)  | 23 (2.31s)   |
| Mul       | 94 (1.97s)  | 42 (2.88s)   |
| Sub       | 75 (1.83s)  | 23 (2.82s)   |
| Add       | 75 (1.81s)  | 23 (2.81s)   |

Values are the average cycles spent per operation, with the average total execution time in seconds in parentheses.

Each benchmark consisted of 10 executions of a test performing 10,000,000 operations.
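A harness in the spirit of this methodology might look like the following. It is a hypothetical sketch, not the benchmark used for the PR: it assumes x86 with GCC or Clang (`__rdtsc` from `<x86intrin.h>`; MSVC declares it in `<intrin.h>` instead), and `bench_add_ticks_per_op`, `RUNS`, and `OPS_PER_RUN` are illustrative names. It averages time-stamp-counter ticks per packed add over 10 runs of 10,000,000 operations.

```c
#include <x86intrin.h> /* __rdtsc on GCC/Clang */
#include <xmmintrin.h> /* _mm_add_ps, _mm_set1_ps, _mm_cvtss_f32 */

#define RUNS 10
#define OPS_PER_RUN 10000000

/* Hypothetical benchmark: average TSC ticks per _mm_add_ps across RUNS runs. */
static double bench_add_ticks_per_op(void) {
    __m128 acc = _mm_setzero_ps();
    const __m128 step = _mm_set1_ps(1e-7f);
    unsigned long long total = 0;
    for (int run = 0; run < RUNS; ++run) {
        unsigned long long start = __rdtsc();
        for (int i = 0; i < OPS_PER_RUN; ++i) {
            acc = _mm_add_ps(acc, step); /* dependent chain: each add waits for the previous one */
        }
        total += __rdtsc() - start;
    }
    /* Consume the result so the compiler cannot discard the loop. */
    volatile float sink = _mm_cvtss_f32(acc);
    (void)sink;
    return (double)total / ((double)RUNS * OPS_PER_RUN);
}
```

On modern CPUs the time-stamp counter usually ticks at a constant rate rather than at the current core clock, and loop overhead is included in the count, so ticks per operation are only a rough proxy for cycles when the measured operation is this cheap.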

Tests:

ScalarMath:
    Trigonometry: [PASS] (15/15 passed)
    ToRadians: [PASS] (3/3 passed)
    SquareRoot: [PASS] (1/1 passed)
    RSquareRootF: [PASS] (1/1 passed)
    Power: [PASS] (3/3 passed)
    PowerF: [PASS] (3/3 passed)
    Lerp: [PASS] (3/3 passed)
    Clamp: [PASS] (3/3 passed)
8/8 tests passed, 0 failures

Initialization:
    Vectors: [PASS] (148/148 passed)
    MatrixEmpty: [PASS] (32/32 passed)
    MatrixDiagonal: [PASS] (16/16 passed)
    Quaternion: [PASS] (12/12 passed)
4/4 tests passed, 0 failures

VectorOps:
    LengthSquared: [PASS] (6/6 passed)
    Length: [PASS] (6/6 passed)
    Normalize: [PASS] (24/24 passed)
    NormalizeZero: [PASS] (18/18 passed)
    FastNormalize: [PASS] (24/24 passed)
    FastNormalizeZero: [PASS] (18/18 passed)
    Cross: [PASS] (3/3 passed)
    DotVec2: [PASS] (2/2 passed)
    DotVec3: [PASS] (2/2 passed)
    DotVec4: [PASS] (2/2 passed)
10/10 tests passed, 0 failures

MatrixOps:
    Transpose: [PASS] (16/16 passed)
1/1 tests passed, 0 failures

QuaternionOps:
    Inverse: [PASS] (4/4 passed)
    Dot: [PASS] (2/2 passed)
    Normalize: [PASS] (8/8 passed)
    NLerp: [PASS] (4/4 passed)
    Slerp: [PASS] (4/4 passed)
    ToMat4: [PASS] (16/16 passed)
    FromAxisAngle: [PASS] (4/4 passed)
7/7 tests passed, 0 failures

Addition:
    Vec2: [PASS] (8/8 passed)
    Vec3: [PASS] (12/12 passed)
    Vec4: [PASS] (16/16 passed)
    Mat4: [PASS] (64/64 passed)
    Quaternion: [PASS] (16/16 passed)
5/5 tests passed, 0 failures

Subtraction:
    Vec2: [PASS] (8/8 passed)
    Vec3: [PASS] (12/12 passed)
    Vec4: [PASS] (16/16 passed)
    Mat4: [PASS] (64/64 passed)
    Quaternion: [PASS] (16/16 passed)
5/5 tests passed, 0 failures

Multiplication:
    Vec2Vec2: [PASS] (8/8 passed)
    Vec2Scalar: [PASS] (10/10 passed)
    Vec3Vec3: [PASS] (12/12 passed)
    Vec3Scalar: [PASS] (15/15 passed)
    Vec4Vec4: [PASS] (16/16 passed)
    Vec4Scalar: [PASS] (19/19 passed)
    Mat4Mat4: [PASS] (48/48 passed)
    Mat4Scalar: [PASS] (80/80 passed)
    Mat4Vec4: [PASS] (12/12 passed)
    QuaternionQuaternion: [PASS] (12/12 passed)
    QuaternionScalar: [PASS] (20/20 passed)
11/11 tests passed, 0 failures

Division:
    Vec2Vec2: [PASS] (8/8 passed)
    Vec2Scalar: [PASS] (8/8 passed)
    Vec3Vec3: [PASS] (12/12 passed)
    Vec3Scalar: [PASS] (12/12 passed)
    Vec4Vec4: [PASS] (16/16 passed)
    Vec4Scalar: [PASS] (16/16 passed)
    Mat4Scalar: [PASS] (64/64 passed)
    QuaternionScalar: [PASS] (16/16 passed)
8/8 tests passed, 0 failures

Equality:
    Vec2: [PASS] (6/6 passed)
    Vec3: [PASS] (6/6 passed)
    Vec4: [PASS] (6/6 passed)
3/3 tests passed, 0 failures

Projection:
    Orthographic: [PASS] (4/4 passed)
    Perspective: [PASS] (8/8 passed)
2/2 tests passed, 0 failures

Transformations:
    Translate: [PASS] (4/4 passed)
    Rotate: [PASS] (12/12 passed)
    Scale: [PASS] (4/4 passed)
    LookAt: [PASS] (16/16 passed)
4/4 tests passed, 0 failures

SSE:
    LinearCombine: [PASS] (16/16 passed)
1/1 tests passed, 0 failures

69/69 tests passed overall, 0 failures

Fixed quaternion multiplication: the old quaternion multiplication had a bug, so this takes a different approach.
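The thread does not describe the new multiplication approach. As an illustration only, here is one common way to express the Hamilton product with SSE shuffles, checked against a scalar reference. It assumes a hypothetical `quat` union with W (the scalar part) in lane 3; `quat_mul_sse`, `quat_mul_scalar`, and the `SPLAT` macro are illustrative names, not HandmadeMath API.

```c
#include <xmmintrin.h> /* SSE1 intrinsics */

/* Hypothetical layout: X, Y, Z in lanes 0-2, W (scalar part) in lane 3. */
typedef union {
    struct { float X, Y, Z, W; };
    __m128 SSE;
} quat;

static quat quat_make(float x, float y, float z, float w) {
    quat r;
    r.SSE = _mm_setr_ps(x, y, z, w);
    return r;
}

/* Broadcast lane i of v into all four lanes (i must be a constant 0-3). */
#define SPLAT(v, i) _mm_shuffle_ps((v), (v), _MM_SHUFFLE((i), (i), (i), (i)))

/* Hamilton product a*b as four column terms:
     a.W * ( b.X,  b.Y,  b.Z,  b.W)
   + a.X * ( b.W, -b.Z,  b.Y, -b.X)
   + a.Y * ( b.Z,  b.W, -b.X, -b.Y)
   + a.Z * (-b.Y,  b.X,  b.W, -b.Z)
   Each term is a shuffle of b, a sign flip (XOR with -0.0f), and a multiply. */
static quat quat_mul_sse(quat a, quat b) {
    __m128 tx = _mm_xor_ps(_mm_shuffle_ps(b.SSE, b.SSE, _MM_SHUFFLE(0, 1, 2, 3)),
                           _mm_setr_ps(0.0f, -0.0f, 0.0f, -0.0f)); /* ( w, -z,  y, -x) */
    __m128 ty = _mm_xor_ps(_mm_shuffle_ps(b.SSE, b.SSE, _MM_SHUFFLE(1, 0, 3, 2)),
                           _mm_setr_ps(0.0f, 0.0f, -0.0f, -0.0f)); /* ( z,  w, -x, -y) */
    __m128 tz = _mm_xor_ps(_mm_shuffle_ps(b.SSE, b.SSE, _MM_SHUFFLE(2, 3, 0, 1)),
                           _mm_setr_ps(-0.0f, 0.0f, 0.0f, -0.0f)); /* (-y,  x,  w, -z) */
    __m128 r = _mm_mul_ps(SPLAT(a.SSE, 3), b.SSE);
    r = _mm_add_ps(r, _mm_mul_ps(SPLAT(a.SSE, 0), tx));
    r = _mm_add_ps(r, _mm_mul_ps(SPLAT(a.SSE, 1), ty));
    r = _mm_add_ps(r, _mm_mul_ps(SPLAT(a.SSE, 2), tz));
    quat out;
    out.SSE = r;
    return out;
}

/* Scalar reference used to check the SSE version. */
static quat quat_mul_scalar(quat a, quat b) {
    return quat_make(
        a.W * b.X + a.X * b.W + a.Y * b.Z - a.Z * b.Y,
        a.W * b.Y - a.X * b.Z + a.Y * b.W + a.Z * b.X,
        a.W * b.Z + a.X * b.Y - a.Y * b.X + a.Z * b.W,
        a.W * b.W - a.X * b.X - a.Y * b.Y - a.Z * b.Z);
}
```

Comparing shuffle-based code against a straightforward scalar version, as `quat_mul_scalar` allows here, is a cheap way to catch the sign and lane-ordering mistakes that are easy to make in quaternion multiplication.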
@RandyGaul

Cool!

@strangezakary
Member

Talk about a well-done PR! I’ll check this out tonight.

@strangezakary
Member

I like this a lot, and I couldn't find any issues after doing a pass over it. @bvisness, are you okay with me merging this in?

@bvisness
Member

Travis isn't showing the results here for some reason, but you can see here that it's working fine. I'm gonna merge this into another branch so we can write release notes, and then I'll merge and make a release.

@AntonDan, thanks so much for your contribution!

@AntonDan
Author

No problem :)
I'm willing to do further optimizations if you ever add SSE3 support.
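As one example of what SSE3 could offer here: its horizontal add (`_mm_hadd_ps`, from `<pmmintrin.h>`) shortens the horizontal sum at the end of a dot product. This is a hypothetical sketch; `dot4_sse3` is an illustrative name, and the `target("sse3")` function attribute assumes GCC or Clang (MSVC would need a different approach). Whether it actually beats a shuffle-based sum depends on the CPU and would need benchmarking.

```c
#include <pmmintrin.h> /* SSE3: _mm_hadd_ps (also brings in the SSE/SSE2 headers) */

/* Assumption: GCC/Clang target attribute lets this compile without -msse3;
   only call it on CPUs that support SSE3. */
static __attribute__((target("sse3"))) float dot4_sse3(__m128 a, __m128 b) {
    __m128 m = _mm_mul_ps(a, b); /* (ax*bx, ay*by, az*bz, aw*bw) */
    m = _mm_hadd_ps(m, m);       /* (x+y, z+w, x+y, z+w) */
    m = _mm_hadd_ps(m, m);       /* every lane = x+y+z+w */
    return _mm_cvtss_f32(m);
}
```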

@bvisness bvisness changed the base branch from master to quaternion-sse March 11, 2019 18:04
@bvisness bvisness merged commit e9b8f6d into HandmadeMath:quaternion-sse Mar 11, 2019
bvisness added a commit that referenced this pull request Mar 11, 2019
* Added SSE support for Quaternion operations (#97)

* Added SSE support for Quaternion operations

* Fixed quaternion multiplication

Old quaternion multiplication had a bug; this is a different approach.

* Added release notes and version for 1.9.0
@bvisness bvisness mentioned this pull request Mar 11, 2019