Closed
Labels
area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)
enhancement (Product code improvement that does NOT require public API changes/additions)
optimization
tenet-performance (Performance related issue)
Description
Conditional jumps, especially those that are hard to predict, are fairly expensive, so they should be avoided where possible. One way to avoid them is to use conditional moves and similar instructions (like sete). As far as I can tell, RyuJit never does this, and I think it should.
For example, take these two methods:
[MethodImpl(MethodImplOptions.NoInlining)]
static long sete_or_mov(bool cond) {
    return cond ? 4 : 0;
}

[MethodImpl(MethodImplOptions.NoInlining)]
static long cmov(long longValue) {
    long tmp1 = longValue & 0x00000000ffffffff;
    return tmp1 == 0 ? longValue : tmp1;
}
For both of them, RyuJit generates a conditional jump:
; Assembly listing for method Program:sete_or_mov(bool):long
; Emitting BLENDED_CODE for X64 CPU with SSE2
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 3 ) bool -> rcx
; V01 tmp0 [V01,T01] ( 3, 2 ) int -> rax
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [rsp+0x00]
;
; Lcl frame size = 0
G_M60330_IG01:
G_M60330_IG02:
84C9 test cl, cl
7504 jne SHORT G_M60330_IG03
33C0 xor eax, eax
EB05 jmp SHORT G_M60330_IG04
G_M60330_IG03:
B804000000 mov eax, 4
G_M60330_IG04:
4863C0 movsxd rax, eax
G_M60330_IG05:
C3 ret
; Total bytes of code 17, prolog size 0 for method Program:sete_or_mov(bool):long
; ============================================================
; Assembly listing for method Program:cmov(long):long
; Emitting BLENDED_CODE for X64 CPU with SSE2
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 4, 3.5) long -> rcx
; V01 loc0 [V01,T01] ( 3, 2.5) long -> rax
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [rsp+0x00]
;
; Lcl frame size = 0
G_M53075_IG01:
G_M53075_IG02:
B8FFFFFFFF mov eax, 0xFFFFFFFF
4823C1 and rax, rcx
4885C0 test rax, rax
7401 je SHORT G_M53075_IG04
G_M53075_IG03:
C3 ret
G_M53075_IG04:
488BC1 mov rax, rcx
G_M53075_IG05:
C3 ret
; Total bytes of code 18, prolog size 0 for method Program:cmov(long):long
; ============================================================
For comparison, here are the same methods compiled using Clang and GCC with -O1 (via Compiler Explorer):
GCC 6.2:
sete_or_mov(bool):
test dil, dil
setne al
movzx eax, al
sal rax, 2
ret
cmov(unsigned long):
mov eax, edi
test rax, rax
cmove rax, rdi
ret
Clang 3.9.0:
sete_or_mov(bool): # @sete_or_mov(bool)
movzx eax, dil
shl rax, 2
ret
cmov(unsigned long): # @cmov(unsigned long)
mov eax, edi
mov ecx, 4294967295
and rcx, rdi
cmove rax, rdi
ret
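The C++ source fed to Compiler Explorer is not shown above. A minimal sketch of what it plausibly looked like follows, as an assumption reconstructed from the C# methods and the `cmov(unsigned long)` signature in the listings. The return types and the use of `std::uint64_t` are guesses, and `sete_or_mov_branchless` is a hypothetical helper added only to show by hand the kind of shift-based form that Clang's output corresponds to.

```cpp
#include <cstdint>

// Assumed C++ counterpart of the C# sete_or_mov method.
// The return type is a guess; the listings only show the parameter type.
std::int64_t sete_or_mov(bool cond) {
    return cond ? 4 : 0;
}

// Assumed C++ counterpart of the C# cmov method.
// std::uint64_t stands in for the `unsigned long` seen in the listings
// (the two are the same type on x86-64 Linux).
std::uint64_t cmov(std::uint64_t longValue) {
    std::uint64_t tmp1 = longValue & 0x00000000ffffffffULL;
    return tmp1 == 0 ? longValue : tmp1;
}

// Hypothetical helper, not part of the original issue: the same result as
// sete_or_mov written without a conditional. A bool converts to 0 or 1,
// so shifting left by 2 yields 0 or 4.
std::int64_t sete_or_mov_branchless(bool cond) {
    return static_cast<std::int64_t>(cond) << 2;
}
```

The first two functions are the shape a JIT would ideally recognize on its own; the third only illustrates that the ternary in sete_or_mov has an equivalent form with no control flow at all.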
category:cq
theme:basic-cq
skill-level:expert
cost:large
impact:small