RyuJit: avoid conditional jumps using cmov and similar instructions #6749

@svick

Conditional jumps, especially hard-to-predict ones, are fairly expensive, so they should be avoided where possible. One way to avoid them is to use conditional moves and similar instructions (such as sete). As far as I can tell, RyuJit never does this, and I think it should.

For example, take these two methods:

[MethodImpl(MethodImplOptions.NoInlining)]
static long sete_or_mov(bool cond) {
    return cond ? 4 : 0;
}

[MethodImpl(MethodImplOptions.NoInlining)]
static long cmov(long longValue) {
    long tmp1 = longValue & 0x00000000ffffffff;
    return tmp1 == 0 ? longValue : tmp1;
}

For both of them, RyuJit generates a conditional jump:

; Assembly listing for method Program:sete_or_mov(bool):long
; Emitting BLENDED_CODE for X64 CPU with SSE2
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  3,   3  )    bool  ->  rcx
;  V01 tmp0         [V01,T01] (  3,   2  )     int  ->  rax
;# V02 OutArgs      [V02    ] (  1,   1  )  lclBlk ( 0) [rsp+0x00]
;
; Lcl frame size = 0

G_M60330_IG01:

G_M60330_IG02:
       84C9                 test     cl, cl
       7504                 jne      SHORT G_M60330_IG03
       33C0                 xor      eax, eax
       EB05                 jmp      SHORT G_M60330_IG04

G_M60330_IG03:
       B804000000           mov      eax, 4

G_M60330_IG04:
       4863C0               movsxd   rax, eax

G_M60330_IG05:
       C3                   ret

; Total bytes of code 17, prolog size 0 for method Program:sete_or_mov(bool):long
; ============================================================
; Assembly listing for method Program:cmov(long):long
; Emitting BLENDED_CODE for X64 CPU with SSE2
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  4,   3.5)    long  ->  rcx
;  V01 loc0         [V01,T01] (  3,   2.5)    long  ->  rax
;# V02 OutArgs      [V02    ] (  1,   1  )  lclBlk ( 0) [rsp+0x00]
;
; Lcl frame size = 0

G_M53075_IG01:

G_M53075_IG02:
       B8FFFFFFFF           mov      eax, 0xFFFFFFFF
       4823C1               and      rax, rcx
       4885C0               test     rax, rax
       7401                 je       SHORT G_M53075_IG04

G_M53075_IG03:
       C3                   ret

G_M53075_IG04:
       488BC1               mov      rax, rcx

G_M53075_IG05:
       C3                   ret

; Total bytes of code 18, prolog size 0 for method Program:cmov(long):long
; ============================================================

For comparison, here are the same methods compiled with Clang and GCC at -O1 (via Compiler Explorer):

GCC 6.2:

sete_or_mov(bool):
        test    dil, dil
        setne   al
        movzx   eax, al
        sal     rax, 2
        ret
cmov(unsigned long):
        mov     eax, edi
        test    rax, rax
        cmove   rax, rdi
        ret

Clang 3.9.0:

sete_or_mov(bool):                       # @sete_or_mov(bool)
        movzx   eax, dil
        shl     rax, 2
        ret

cmov(unsigned long):                               # @cmov(unsigned long)
        mov     eax, edi
        mov     ecx, 4294967295
        and     rcx, rdi
        cmove   rax, rdi
        ret
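
The C++ source given to Compiler Explorer is not shown here. A plausible reconstruction, assuming a direct port of the two C# methods, might look like the sketch below. The fixed-width types are an assumption: the unsigned parameter is inferred from the `cmov(unsigned long)` label in the listings, and `std::uint64_t` is `unsigned long` on x86-64 Linux, which matches the `rdi`-based calling convention in the output.

```cpp
#include <cstdint>

// Hypothetical port of the C# sete_or_mov method.
// A compiler can lower this to setne/movzx/shift with no jump.
std::int64_t sete_or_mov(bool cond) {
    return cond ? 4 : 0;
}

// Hypothetical port of the C# cmov method.
// Keeps the low 32 bits if they are nonzero; otherwise returns the input unchanged.
// A compiler can lower the selection to a single cmove.
std::uint64_t cmov(std::uint64_t longValue) {
    std::uint64_t tmp1 = longValue & 0x00000000ffffffffULL;
    return tmp1 == 0 ? longValue : tmp1;
}
```

With this sketch, `cmov(0x12345678000000ff)` returns `0xff`, while `cmov(0x1234567800000000)` returns its argument unchanged, the same behavior as the C# version for these inputs.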

category:cq
theme:basic-cq
skill-level:expert
cost:large
impact:small

Labels

area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
enhancement: Product code improvement that does NOT require public API changes/additions
optimization
tenet-performance: Performance related issue


Status

Done
