.NET performance optimization - fast traversal of List collection

Introduction

System.Collections.Generic.List<T> is a generic collection class in .NET that can store elements of any type. Thanks to its convenience and rich API, it is widely used in everyday development and is arguably the most used collection class.

When writing code, we often need to traverse a List<T> to get its elements for some business processing. Usually the collection holds few elements and traversal is very fast. But for big data processing, statistics, real-time computing and similar workloads, where a List<T> holds tens or hundreds of thousands of elements, how can we traverse it quickly? That is what I want to share with you today.

Traversal methods

Let's compare the performance of different traversal methods. We construct the following benchmark and traverse collections of different orders of magnitude to see how each method performs. The code snippet looks like this:

using System.Collections.Generic;
using BenchmarkDotNet.Attributes;

public class ListBenchmark
{
    private List<int> _list = default!;

    // Test with 10, 1,000, 10,000, 100,000 and 1,000,000 elements
    [Params(10, 1_000, 10_000, 100_000, 1_000_000)]
    public int Size { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        // Create and fill the list in advance
        _list = new List<int>(Size);
        for (var i = 0; i < Size; i++)
        {
            _list.Add(i);
        }
    }
}

Using the foreach statement

foreach is the most common way to traverse a collection. It is syntactic sugar over the iterator pattern, and it serves as our baseline here.

[Benchmark(Baseline = true)]  
public void Foreach()  
{  
    foreach (var item in _list)  
    {  
    }
}

Because the foreach statement is syntactic sugar, the compiler ultimately turns it into a while loop that calls GetEnumerator() and MoveNext().
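
The lowering described above can be approximated by hand. The following is only a sketch of the shape of the generated code, not the compiler's exact output; the class and method names (ForeachLowering, SumLowered) are made up for illustration, and a sum is added so the loop has an observable result.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachLowering
{
    // Hand-written approximation of what the compiler emits for:
    //     foreach (var item in list) sum += item;
    public static int SumLowered(List<int> list)
    {
        int sum = 0;

        // List<T>.GetEnumerator() returns a struct enumerator, so nothing is allocated here.
        List<int>.Enumerator enumerator = list.GetEnumerator();
        try
        {
            while (enumerator.MoveNext())
            {
                int item = enumerator.Current;
                sum += item;
            }
        }
        finally
        {
            // foreach always disposes the enumerator, even if the body throws.
            enumerator.Dispose();
        }

        return sum;
    }
}
```

Calling ForeachLowering.SumLowered(new List<int> { 1, 2, 3 }) returns 6, the same as the equivalent foreach loop.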


The MoveNext() implementation checks that the collection has not been modified during the iteration; if a modification is detected, it throws an InvalidOperationException. It also performs a bounds check to make sure the current index is valid, and it has to copy the corresponding element into the enumerator's Current property, so its performance is in fact not the best.
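
To make those three steps concrete, here is a minimal model of a versioned list and its enumerator. It is a simplified sketch, not the real List<T> source: TinyList<T> and its fields are hypothetical names, and the real implementation differs in detail.

```csharp
using System;

// Hypothetical, simplified model of a versioned list (not the real List<T> source).
public sealed class TinyList<T>
{
    private T[] _items = new T[4];
    private int _size;
    private int _version; // incremented on every mutation

    public void Add(T item)
    {
        if (_size == _items.Length) Array.Resize(ref _items, _items.Length * 2);
        _items[_size++] = item;
        _version++;
    }

    public Enumerator GetEnumerator() => new Enumerator(this);

    public struct Enumerator
    {
        private readonly TinyList<T> _list;
        private readonly int _version; // snapshot taken when enumeration starts
        private int _index;
        private T _current;

        internal Enumerator(TinyList<T> list)
        {
            _list = list;
            _version = list._version;
            _index = 0;
            _current = default!;
        }

        public T Current => _current;

        public bool MoveNext()
        {
            // 1. Version check: detect modification since enumeration started.
            if (_version != _list._version)
                throw new InvalidOperationException("Collection was modified during enumeration.");

            // 2. Bounds check: stop once the index is past the last element.
            if ((uint)_index >= (uint)_list._size)
                return false;

            // 3. Copy the element so that Current can return it.
            _current = _list._items[_index++];
            return true;
        }
    }
}
```

Because foreach is pattern-based, TinyList<T> can be used in a foreach loop directly, and adding an element inside that loop makes the next MoveNext() call throw.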

Let's see how it performs at different collection sizes. Across the different Size values, the elapsed time grows linearly with the number of elements. Even with no processing logic in the loop body, traversing 1,000,000 elements takes at least 1 s.

Using the ForEach method of List

Another common way is the List<T>.ForEach() method. It accepts an Action<T> delegate and invokes that delegate for each element during traversal.

[Benchmark]  
public void List_Foreach()  
{  
    _list.ForEach(_ => { });  
}

It is implemented inside List<T>, so it can access the private array directly and avoid bounds checks. In theory it should be very fast, but in our scenario the delegate is just an empty method, and it may not perform as well as the foreach approach, which can be fully inlined.
In the source code of the ForEach method, there is no bounds check, but the version number check that detects concurrent modification is still there.
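
The shape of such a ForEach method can be sketched as follows. This is a simplified model written for illustration, not a copy of the real List<T> source; VersionedList<T> and its fields are hypothetical names.

```csharp
using System;

// Hypothetical, simplified model (not the real List<T> source).
public sealed class VersionedList<T>
{
    private T[] _items = new T[4];
    private int _size;
    private int _version; // incremented on every mutation

    public void Add(T item)
    {
        if (_size == _items.Length) Array.Resize(ref _items, _items.Length * 2);
        _items[_size++] = item;
        _version++;
    }

    public void ForEach(Action<T> action)
    {
        if (action is null) throw new ArgumentNullException(nameof(action));

        int version = _version; // snapshot before the loop

        // The loop condition already keeps i inside [0, _size),
        // so no separate per-element range check is written here.
        for (int i = 0; i < _size; i++)
        {
            // The version check is still performed on every iteration.
            if (version != _version) break;
            action(_items[i]);
        }

        if (version != _version)
            throw new InvalidOperationException("Collection was modified during enumeration.");
    }
}
```

Note that every element still costs one delegate invocation, which is the part the JIT usually cannot inline.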


In addition, because a delegate has to be passed to the ForEach method, the calling code checks on every call whether the delegate object cached in the compiler-generated closure class is null, and creates it with new Action<T>() if it is. Let's see how this compares with the foreach keyword in performance.
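
The delegate caching mentioned above can be imitated by hand. This is a sketch of the pattern only: the real compiler-generated class and field have unspeakable names, and DelegateCaching, s_cached, EmptyBody and GetAction are hypothetical stand-ins.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class DelegateCaching
{
    // Stands in for the compiler-generated static cache field.
    private static Action<int>? s_cached;

    // Stands in for the compiler-generated method holding the lambda body `_ => { }`.
    private static void EmptyBody(int item) { }

    public static Action<int> GetAction()
    {
        // The null check runs on every call; the allocation happens only the first time.
        if (s_cached == null)
        {
            s_cached = new Action<int>(EmptyBody);
        }
        return s_cached;
    }

    // Roughly what `list.ForEach(_ => { })` does at the call site.
    public static void Traverse(List<int> list) => list.ForEach(GetAction());
}
```

Repeated calls to GetAction() return the same delegate instance, so the per-call cost is the null check rather than an allocation.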

According to the test results, it is 40% slower than using the foreach keyword directly. So unless it is really needed, using foreach directly is the better choice. Is there any faster way?

Using a for loop

Let's go back to the oldest way of iterating over a collection: the for keyword. It should be the best traversal method so far, because it does not need the redundant code of the previous methods (although the indexer still has a bounds check), and it obviously does not check the version number, so if the collection is changed in a multi-threaded environment, no exception is thrown when using for. The test code looks like this:

[Benchmark]
public void For()
{
    for (var i = 0; i < _list.Count; i++)
    {
        // An empty loop would be optimized away by the compiler,
        // so we add one line of code to prevent that
        _ = _list[i];
    }
}
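
The missing version check is easy to observe. The sketch below modifies the list from inside the loop on a single thread (rather than from another thread) so that the outcome is deterministic; ModificationDemo and its method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ModificationDemo
{
    // foreach: the enumerator's version check notices the Add and throws.
    public static bool ForeachThrows()
    {
        var list = new List<int> { 1, 2, 3 };
        try
        {
            foreach (var item in list)
            {
                if (item == 1) list.Add(4);
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // for: there is no version check, so the loop simply keeps going
    // and also visits the element that was added along the way.
    public static int ForVisitCount()
    {
        var list = new List<int> { 1, 2, 3 };
        int visited = 0;
        for (var i = 0; i < list.Count; i++)
        {
            if (list[i] == 1) list.Add(4);
            visited++;
        }
        return visited;
    }
}
```

ForeachThrows() returns true, while ForVisitCount() returns 4 without any exception. Whether silently continuing is acceptable depends on the calling code.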

Let's see how it turns out. The result is what we expected: using a for loop directly is 60% faster than foreach. Traversing the collection that originally took 1 second now takes only 400 milliseconds. So is there a faster way?

Using CollectionsMarshal

Starting with .NET 5, the dotnet community provides the CollectionsMarshal class to improve the performance of collection operations. This class exposes access to the underlying array of collection types (if you have read my article [.NET performance optimization - you should set the initial size of collection types], you know that many data structures are backed by an array). It can therefore skip the various checks and access the underlying array directly, which should be the fastest. The code looks like this:

// To check whether the compiler optimizes foreach over a span,
// we also test a for loop over the span
[Benchmark]
public void Foreach_Span()
{
    foreach (var item in CollectionsMarshal.AsSpan(_list))
    {
    }
}

[Benchmark]
public void For_Span()
{
    var span = CollectionsMarshal.AsSpan(_list);
    for (int i = 0; i < span.Length; i++)
    {
        _ = span[i];
    }
}

The code the compiler generates for these loops is very efficient.

Accessing the underlying array directly is very dangerous. You must know what each line of code is doing and have enough tests.
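
One concrete pitfall, shown here as a small sketch: a span obtained from CollectionsMarshal.AsSpan keeps pointing at the array the list had at that moment. If the list later grows past its capacity, it switches to a new array and the span goes stale. StaleSpanDemo and its method name are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

public static class StaleSpanDemo
{
    // Returns list[0] after writing through a span taken before the list grew.
    public static int WriteThroughStaleSpan()
    {
        var list = new List<int>(2) { 1, 2 };            // capacity is exactly 2
        Span<int> span = CollectionsMarshal.AsSpan(list);

        list.Add(3);     // exceeds the capacity, so the list allocates a new backing array
        span[0] = 100;   // writes into the old array, which the list no longer uses

        return list[0];  // still 1: the write never reached the list
    }
}
```

No exception is thrown anywhere in this code; the update is silently lost. This is why elements should not be added to or removed from the list while the span is in use.
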
The benchmark results show that using CollectionsMarshal is 79% faster than using foreach. Probably thanks to JIT optimization, there is no big difference between looping over the Span with foreach and with for.

Summary

Today I talked with you about how to traverse a List quickly. In most cases, I recommend the foreach keyword: it has both bounds checks and the version number check against concurrent modification, which makes it easier to write correct code.

If you need high performance with large data volumes, I recommend using for and CollectionsMarshal.AsSpan to traverse the collection; of course, with CollectionsMarshal.AsSpan you must pay attention to how you use it.

Appendix

Source code for this article: BlogCodes/Fast-Enumerate-List (main branch) in InCerryGit/BlogCodes on GitHub

Origin blog.csdn.net/weixin_45499836/article/details/126442185