.NET 8 Ultimate Performance Optimization: CHRL

Preface

.NET 8 builds on the optimizations introduced in .NET 7. One example is the optimization around CHRL (short for CORINFO_HELP_RNGCHKFAIL). CORINFO_HELP_RNGCHKFAIL is the JIT helper that is called when a bounds check fails. .NET 7 already removed some of these checks, and .NET 8 goes further: the JIT recognizes more cases where the check is unnecessary and eliminates it automatically. Let's take a look.

Original address: .NET8 Ultimate Performance Optimization CHRL

Overview

The JIT checks range boundaries whenever arrays and strings are indexed, for example that an array index falls within the array's length. To enforce this, the JIT emits bounds-checking instructions. Consider the following code:

using System.Runtime.CompilerServices;

public class Tests
{
    private byte[] _array = new byte[8];
    private int _index = 4;

    public void Get() => Get(_array, _index);

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static byte Get(byte[] array, int index) => array[index];
}

The assembly that .NET 7 generates for the Get function is as follows:

; Tests.Get(Byte[], Int32)
       sub       rsp,28
       cmp       edx,[rcx+8]
       jae       short M01_L00
       mov       eax,edx
       movzx     eax,byte ptr [rcx+rax+10]
       add       rsp,28
       ret
M01_L00:
       call      CORINFO_HELP_RNGCHKFAIL
       int       3

The cmp instruction compares the index (edx) with the array length, which is stored 8 bytes past the start of the array object (the object begins with its MT, or method table, pointer). If the index is greater than or equal to the length (jae, an unsigned comparison), execution jumps to CORINFO_HELP_RNGCHKFAIL, which throws an IndexOutOfRangeException. But the access itself needs only two movs: one copies the index, and the other loads the value at (array address + 0x10 + index) and returns it. So there is clear room for optimization here.
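As a rough sketch, the check the JIT emits is equivalent to writing the comparison by hand. The names BoundsCheckSketch and GetChecked are hypothetical and exist only for illustration. The single unsigned comparison mirrors the cmp/jae pair: a negative index becomes a very large value when viewed as unsigned, so one comparison rejects both cases.

```csharp
using System;

public static class BoundsCheckSketch
{
    // Hand-written equivalent of the JIT-emitted check (illustrative only).
    public static byte GetChecked(byte[] array, int index)
    {
        // One unsigned comparison covers both index < 0 and index >= Length,
        // like the cmp/jae pair in the assembly above.
        if ((uint)index >= (uint)array.Length)
        {
            // In JIT-generated code, CORINFO_HELP_RNGCHKFAIL throws this exception.
            throw new IndexOutOfRangeException();
        }

        return array[index];
    }
}
```

Calling GetChecked with an 8-element array and index 4 returns the element, while index -1 or 8 throws IndexOutOfRangeException.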
.NET 8 has learned to reason more intelligently about range boundaries. Where the JIT can prove that a bounds check is unnecessary, it removes the check, which improves performance. Here is an example:

public class Tests
{
    private readonly int[] _array = new int[7];

    public int GetBucket() => GetBucket(_array, 42);
    private static int GetBucket(int[] buckets, int hashcode) =>
        buckets[(uint)hashcode % buckets.Length];
}

The .NET 7 assembly is as follows:

; Tests.GetBucket()
       sub       rsp,28
       mov       rcx,[rcx+8]
       mov       eax,2A
       mov       edx,[rcx+8]
       mov       r8d,edx
       xor       edx,edx
       idiv      r8
       cmp       rdx,r8
       jae       short M00_L00
       mov       eax,[rcx+rdx*4+10]
       add       rsp,28
       ret
M00_L00:
       call      CORINFO_HELP_RNGCHKFAIL
       int       3

The bounds check is still there. The .NET 8 JIT, however, recognizes that the result of (uint)hashcode % buckets.Length can never reach buckets.Length, the length of the array, so the check can be omitted. The .NET 8 assembly is as follows:

; Tests.GetBucket()
       mov       rcx,[rcx+8]
       mov       eax,2A
       mov       r8d,[rcx+8]
       xor       edx,edx
       div       r8
       mov       eax,[rcx+rdx*4+10]
       ret
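The reasoning behind this elimination can be checked with a small sketch. It assumes a hypothetical helper named BucketIndex that is not part of the original code. An unsigned remainder always lies in the range [0, divisor), so it is a valid index for any non-empty array. For an empty array, the division throws DivideByZeroException before any element is accessed.

```csharp
using System;

public static class ModuloSketch
{
    // Hypothetical helper mirroring the index expression in GetBucket.
    // In C#, uint % int is evaluated as long % long, so the result is a long.
    public static long BucketIndex(int hashcode, int length) =>
        (uint)hashcode % length;

    // Returns true if every sampled hash code maps to a valid index.
    public static bool AlwaysInRange(int length)
    {
        int[] samples = { int.MinValue, -1, 0, 1, 42, int.MaxValue };
        foreach (int h in samples)
        {
            long i = BucketIndex(h, length);
            if (i < 0 || i >= length) return false;
        }
        return true;
    }
}
```

The sample set is only a spot check, not a proof. The JIT relies on the arithmetic property itself.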

Let’s look at another example:

public class Tests
{
    private readonly string _s = "\"Hello, World!\"";

    public bool IsQuoted() => IsQuoted(_s);

    private static bool IsQuoted(string s) =>
        s.Length >= 2 && s[0] == '"' && s[^1] == '"';
}

IsQuoted checks whether the string has at least two characters and whether it both begins and ends with a quotation mark. s[^1] is shorthand for s[s.Length - 1], the last character of the string. The .NET 7 assembly is as follows:

; Tests.IsQuoted(System.String)
       sub       rsp,28
       mov       eax,[rcx+8]
       cmp       eax,2
       jl        short M01_L00
       cmp       word ptr [rcx+0C],22
       jne       short M01_L00
       lea       edx,[rax-1]
       cmp       edx,eax
       jae       short M01_L01
       mov       eax,edx
       cmp       word ptr [rcx+rax*2+0C],22
       sete      al
       movzx     eax,al
       add       rsp,28
       ret
M01_L00:
       xor       eax,eax
       add       rsp,28
       ret
M01_L01:
       call      CORINFO_HELP_RNGCHKFAIL
       int       3

Notice that .NET 7 still performs a bounds check, but only one: there is a single jae instruction. Why? The JIT already knows that s[0] needs no bounds check, because s.Length >= 2 has been verified, so any constant index less than 2 is safe (the index is treated as unsigned, so it cannot be negative). However, the check on s[s.Length - 1] remains, so although .NET 7 is clever, it is not thorough enough.

Now look at .NET 8, which removes the check entirely:

; Tests.IsQuoted(System.String)
       mov       eax,[rcx+8]
       cmp       eax,2
       jl        short M01_L00
       cmp       word ptr [rcx+0C],22
       jne       short M01_L00
       dec       eax
       cmp       word ptr [rcx+rax*2+0C],22
       sete      al
       movzx     eax,al
       ret
M01_L00:
       xor       eax,eax
       ret

There is no bounds checking at all. The JIT not only recognizes that s[0] is safe because s.Length >= 2 has been checked; from that same check it also deduces that s.Length > s.Length - 1 >= 1, so s[s.Length - 1] is in range too. No bounds check is needed, and all of them are optimized away.
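The range reasoning can be spelled out as a sketch. The class name IsQuotedSketch and the local variable last are assumptions made for illustration. Once s.Length >= 2 holds, s.Length - 1 is at least 1 and strictly less than s.Length, so both accesses are in range.

```csharp
using System;

public static class IsQuotedSketch
{
    // Same logic as IsQuoted in the article, with the last index made explicit.
    public static bool IsQuoted(string s)
    {
        if (s.Length < 2) return false;

        // s.Length >= 2 implies 1 <= last < s.Length, so s[last] cannot fail.
        int last = s.Length - 1;
        return s[0] == '"' && s[last] == '"';
    }
}
```

A string consisting of a single quotation mark returns false here, because the length test fails before any character is read.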

You can see how powerful the performance optimizations in .NET 8 are. They push the JIT to optimize as intelligently as it can.



Origin my.oschina.net/u/5407571/blog/10314437