Faster Vector128/64 compare on arm64 #75864
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Issue Details
Apply @TamarChristinaArm's suggestions for faster vector comparison in #75849
bool Test1(Vector128<int> a, Vector128<int> b) => a == b;
Now emits:
; Method Test1
G_M3164_IG01:
A9BF7BFD stp fp, lr, [sp, #-0x10]!
910003FD mov fp, sp
G_M3164_IG02:
6EA18C10 cmeq v16.4s, v0.4s, v1.4s
6EB0AE10 uminp v16.4s, v16.4s, v16.4s
4E083E00 umov x0, v16.d[0]
B100041F cmn x0, #1
9A9F17E0 cset x0, eq
G_M3164_IG03:
A8C17BFD ldp fp, lr, [sp], #0x10
D65F03C0 ret lr
; Total bytes of code: 36
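For readers following the listing, here is a rough scalar model of what the five instructions in G_M3164_IG02 compute. It is only a sketch in portable C, not the JIT's implementation: the function names (`cmeq_4s`, `uminp_4s_self`, `vector128_equals_4s`) are made up for illustration, and a vector is modelled as an array of four 32-bit lanes with lane 0 in the low bits.

```c
#include <stdint.h>

/* Model of `cmeq v16.4s, v0.4s, v1.4s`: each 32-bit lane becomes
   all ones when the inputs match and zero otherwise. */
static void cmeq_4s(uint32_t out[4], const uint32_t a[4], const uint32_t b[4]) {
    for (int i = 0; i < 4; i++) {
        out[i] = (a[i] == b[i]) ? UINT32_MAX : 0u;
    }
}

static uint32_t min_u32(uint32_t x, uint32_t y) {
    return x < y ? x : y;
}

/* Model of `uminp v16.4s, v16.4s, v16.4s` with both sources equal:
   adjacent lanes are reduced with an unsigned minimum, so the low
   64 bits end up holding min(l0, l1) and min(l2, l3). */
static void uminp_4s_self(uint32_t out[4], const uint32_t v[4]) {
    uint32_t lo = min_u32(v[0], v[1]);
    uint32_t hi = min_u32(v[2], v[3]);
    out[0] = lo;
    out[1] = hi;
    out[2] = lo;
    out[3] = hi;
}

/* Hypothetical helper mirroring the whole sequence; returns 1 when all
   four lanes are equal and 0 otherwise. */
int vector128_equals_4s(const uint32_t a[4], const uint32_t b[4]) {
    uint32_t mask[4];
    uint32_t folded[4];

    cmeq_4s(mask, a, b);
    uminp_4s_self(folded, mask);

    /* `umov x0, v16.d[0]`: move the low 64 bits to a general register. */
    uint64_t x0 = (uint64_t)folded[0] | ((uint64_t)folded[1] << 32);

    /* `cmn x0, #1` + `cset x0, eq`: x0 + 1 wraps to zero only when
       x0 is all ones, i.e. every lane compared equal. */
    return (uint64_t)(x0 + 1u) == 0u;
}
```

Because every lane of the `cmeq` result is either all ones or zero, the pairwise minimum is all ones only when both lanes of a pair matched, which is why a single 64-bit compare against -1 is enough after the fold.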
That doesn't look right. The 64-bit case should be transferring the entire register, so I'm expecting the same
Ah, so I thought. Let me fix it 🙂
I assume the diff for the 128-bit one is reversed? But otherwise that looks good to me now.
Yes, should be correct now, thanks!
@kunalspathak PTAL this one too please
LGTM
@TamarChristinaArm it's too early to estimate the whole impact, but I already see nice improvements from this, e.g.: And even more here:
Naise!
Improvements: dotnet/perf-autofiling-issues#8660