The yield return mechanism in C#: generating data on demand

When writing C# code we often have to deal with large data sets. The traditional approach loads the entire data set into memory before working on it, but if the data set is very large, this leads to high memory usage and can even crash the program.

C#'s yield return mechanism helps solve this problem. With yield return, a data set is generated on demand instead of all at once, which can greatly reduce memory usage and often improves performance, because work is done only for the elements that are actually consumed.

In this article we take a close look at how yield return works in C# and how to use it, so that you can understand this powerful feature and apply it flexibly in real development.

How to use

We mentioned above that yield return generates a data set on demand rather than all at once. Let's look at a simple example of how it works to deepen our understanding.

foreach (var num in GetInts())
{
    Console.WriteLine("Outer iteration: {0}", num);
}

IEnumerable<int> GetInts()
{
    for (int i = 0; i < 5; i++)
    {
        Console.WriteLine("Inner iteration: {0}", i);
        yield return i;
    }
}

First, the GetInts method uses the yield return keyword to define an iterator that produces a sequence of integers on demand: on each pass through the loop, yield return hands back the current integer. We then use foreach to traverse the sequence returned by GetInts. The body of GetInts runs as the iteration proceeds, and the sequence is never held in memory as a whole; each element is generated only when it is requested. Each iteration prints one line inside the iterator and one line in the consuming loop, so the output is

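Inner iteration: 0
Outer iteration: 0
Inner iteration: 1
Outer iteration: 1
Inner iteration: 2
Outer iteration: 2
Inner iteration: 3
Outer iteration: 3
Inner iteration: 4
Outer iteration: 4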

This shows that the integers are generated on demand: each line is printed as soon as its element is produced. This approach can greatly reduce memory usage. Asynchronous iteration is also supported, starting with C# 8:

await foreach (var num in GetIntsAsync())
{
    Console.WriteLine("Outer iteration: {0}", num);
}

async IAsyncEnumerable<int> GetIntsAsync()
{
    for (int i = 0; i < 5; i++)
    {
        await Task.Yield();
        Console.WriteLine("Inner iteration: {0}", i);
        yield return i;
    }
}

The difference from the synchronous version is that an asynchronous iterator must return IAsyncEnumerable<T>. Its output is identical to that of the synchronous method, so it is not repeated here. The examples so far all yield from inside a loop, but yield return does not require one: a method can yield values one by one at any point, which allows more flexible iteration, as in the example below

foreach (var num in GetInts())
{
    Console.WriteLine("Outer iteration: {0}", num);
}

IEnumerable<int> GetInts()
{
    Console.WriteLine("Inner iteration: 0");
    yield return 0;

    Console.WriteLine("Inner iteration: 1");
    yield return 1;

    Console.WriteLine("Inner iteration: 2");
    yield return 2;
}

foreach calls GetInts() once to obtain the sequence. Each subsequent step of the loop resumes the method body and runs it up to the next yield return, which hands back one result. So the output of the above code is

Inner iteration: 0
Outer iteration: 0
Inner iteration: 1
Outer iteration: 1
Inner iteration: 2
Outer iteration: 2

Exploring the essence

Above we showed how to use yield return. It is a lazy mechanism that lets us process data one element at a time instead of reading everything into memory at once. Next, let's explore how this seemingly magical behavior is implemented, to get a clearer picture of how iteration works in C#.
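One consequence of this laziness is easy to verify: calling an iterator method runs none of its body, and a sequence that never ends on its own is still usable as long as the consumer stops pulling. A minimal sketch, assuming a hypothetical Squares generator and using LINQ's Take to stop after three elements:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var log = new List<string>();

// Calling the iterator method only builds the enumerable; none of its body runs yet.
IEnumerable<int> squares = Squares(log);
log.Add("after calling Squares()");

// Take(3) pulls exactly three values from a sequence that never ends on its own.
List<int> firstThree = squares.Take(3).ToList();

Console.WriteLine(string.Join(", ", firstThree)); // 1, 4, 9
Console.WriteLine(string.Join(Environment.NewLine, log));

// Hypothetical infinite generator: usable only because values are produced on demand.
static IEnumerable<int> Squares(List<string> log)
{
    log.Add("body started");
    for (int i = 1; ; i++)
    {
        yield return i * i;
    }
}
```

The log shows "after calling Squares()" before "body started", confirming that the body only starts running on the first MoveNext().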

The essence of foreach

First, let's look at why foreach can traverse an object, that is, what conditions an object must satisfy to be traversable with foreach. A decompiler lets us see what the compiled code looks like. The collection most of us know best is List<T>, so let's use it for the demonstration

List<int> ints = new List<int>();
foreach(int item in ints)
{
    Console.WriteLine(item);
}

The code above is deliberately simple and has no initial data, which removes noise and makes the decompiled result easier to read. Its decompiled code looks like this

List<int> list = new List<int>();
List<int>.Enumerator enumerator = list.GetEnumerator();
try
{
    while (enumerator.MoveNext())
    {
        int current = enumerator.Current;
        Console.WriteLine(current);
    }
}
finally
{
    ((IDisposable)enumerator).Dispose();
}

Many tools can decompile code. I usually use ILSpy, dnSpy, dotPeek and the online C# decompiler sharplab.io[1]; dnSpy can also debug decompiled code.

The decompiled code shows that foreach is compiled into a fixed structure, which is the well-known Iterator design pattern

List<int>.Enumerator enumerator = list.GetEnumerator();
while (enumerator.MoveNext())
{
    var current = enumerator.Current;
}

From this fixed structure we can summarize how foreach works:

  • An object traversable with foreach must provide a GetEnumerator() method.

  • The enumerator it returns must have a MoveNext() method and a Current property.

  • MoveNext() returns a bool indicating whether iteration can continue; Current returns the element at the current position.
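These conditions can be exercised directly. This sketch expands a foreach over a List<int> by hand into the GetEnumerator()/MoveNext()/Current structure shown above, and then shows why the compiler wraps the loop in try/finally: leaving a foreach early with break still disposes the enumerator, which runs an iterator's pending finally block. The iterator Numbers and the log list are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

var list = new List<int> { 10, 20, 30 };

// Hand-expanded equivalent of: foreach (int item in list) viaPattern.Add(item);
var viaPattern = new List<int>();
List<int>.Enumerator enumerator = list.GetEnumerator(); // obtain the enumerator
try
{
    while (enumerator.MoveNext())           // false once there are no more elements
    {
        viaPattern.Add(enumerator.Current); // Current returns the element
    }
}
finally
{
    enumerator.Dispose();
}

// Leaving a foreach early with break still disposes the enumerator,
// and disposing an iterator runs its pending finally block.
var log = new List<string>();
foreach (var n in Numbers(log))
{
    log.Add($"got {n}");
    break;
}

Console.WriteLine(string.Join(",", viaPattern)); // 10,20,30
Console.WriteLine(string.Join(" | ", log));      // got 1 | iterator cleaned up

// Hypothetical iterator with cleanup logic
static IEnumerable<int> Numbers(List<string> log)
{
    try
    {
        yield return 1;
        yield return 2;
    }
    finally
    {
        log.Add("iterator cleaned up");
    }
}
```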

Let's look at how the List<T> source code implements this structure (an excerpt)

public class List<T> : IList<T>, IList, IReadOnlyList<T>
{
    public Enumerator GetEnumerator() => new Enumerator(this);
 
    IEnumerator<T> IEnumerable<T>.GetEnumerator() => Count == 0 ? SZGenericArrayEnumerator<T>.Empty : GetEnumerator();
 
    IEnumerator IEnumerable.GetEnumerator() => ((IEnumerable<T>)this).GetEnumerator();

    public struct Enumerator : IEnumerator<T>, IEnumerator
    {
        public T Current => _current!;
        public bool MoveNext()
        {
            // advances _index and updates _current (body omitted in this excerpt)
        }
    }
}

Two core interfaces are involved here, IEnumerable and IEnumerator. Together they define the abstraction that makes iteration possible. They are defined as follows

public interface IEnumerable
{
    IEnumerator GetEnumerator();
}

public interface IEnumerator
{
    bool MoveNext();
    object Current{ get; }
    void Reset();
}

If a class implements the IEnumerable interface and its GetEnumerator() method, it can be used with foreach. GetEnumerator() returns an IEnumerator, which contains the MoveNext() method and the Current property. These are the original non-generic interfaces, which work with elements typed as object. Most real-world code uses generic collections, and there are corresponding generic interfaces, as shown below

public interface IEnumerable<out T> : IEnumerable
{
    new IEnumerator<T> GetEnumerator();
}

public interface IEnumerator<out T> : IDisposable, IEnumerator
{
    new T Current{ get; }
}
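The practical difference between the two interface families shows up in Current. A small sketch, assuming nothing beyond List<int> and the interfaces above: the generic enumerator hands back an int directly, while the non-generic one returns object, so each value-type element is boxed and has to be cast back.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

var numbers = new List<int> { 1, 2, 3 };

// Generic interface: Current is typed as int, no cast or boxing needed.
int genericSum = 0;
using (IEnumerator<int> typed = numbers.GetEnumerator())
{
    while (typed.MoveNext())
        genericSum += typed.Current;
}

// Non-generic interface: Current is object, so each int is boxed and cast back.
int objectSum = 0;
IEnumerator untyped = ((IEnumerable)numbers).GetEnumerator();
while (untyped.MoveNext())
{
    object boxed = untyped.Current;
    objectSum += (int)boxed;
}

Console.WriteLine($"{genericSum} {objectSum}"); // 6 6
```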

Being traversable with foreach does not require implementing IEnumerable; the interface merely provides an abstraction of that capability. It is enough for a type to have a GetEnumerator() method that returns an enumerator, and for that enumerator to have a MoveNext() method returning bool and a Current property that gets the current element.
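A real framework type makes this concrete: Span<T> does not implement IEnumerable<T>, yet it works with foreach because it exposes a public GetEnumerator() whose Span<T>.Enumerator has MoveNext() and Current. The sketch below (the sample values are arbitrary) computes the same sum both ways.

```csharp
using System;

// Span<T> implements no interfaces, yet foreach accepts it
// because it follows the GetEnumerator/MoveNext/Current pattern.
Span<int> span = new int[] { 4, 5, 6 };

int total = 0;
foreach (int x in span)
{
    total += x;
}

// The same loop written against the pattern by hand
int manualTotal = 0;
Span<int>.Enumerator e = span.GetEnumerator();
while (e.MoveNext())
{
    manualTotal += e.Current;
}

Console.WriteLine($"{total} {manualTotal}"); // 15 15
```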

The essence of yield return

Having seen what makes an object iterable with foreach, note that the result of a yield return method can be received as an IEnumerable<T>, so something interesting must be going on behind the scenes. Let's decompile our earlier example. To make the comparison easier, here it is again

foreach (var num in GetInts())
{
    Console.WriteLine("Outer iteration: {0}", num);
}

IEnumerable<int> GetInts()
{
    for (int i = 0; i < 5; i++)
    {
        Console.WriteLine("Inner iteration: {0}", i);
        yield return i;
    }
}

We won't show the full decompilation here, only the core logic

// What the foreach compiles to
IEnumerator<int> enumerator = GetInts().GetEnumerator();
try
{
    while (enumerator.MoveNext())
    {
        int current = enumerator.Current;
        Console.WriteLine("Outer iteration: {0}", current);
    }
}
finally
{
    if (enumerator != null)
    {
        enumerator.Dispose();
    }
}

// What the GetInts method compiles to
private IEnumerable<int> GetInts()
{
    <GetInts>d__1 <GetInts>d__ = new <GetInts>d__1(-2);
    <GetInts>d__.<>4__this = this;
    return <GetInts>d__;
}

Here we can see that the original code in GetInts() is gone, replaced by a new type, <GetInts>d__1. In other words, yield return is essentially syntactic sugar. Let's look at the implementation of the <GetInts>d__1 class

// The generated class implements both IEnumerable and IEnumerator,
// so it has a GetEnumerator() method as well as MoveNext() and the Current property
private sealed class <GetInts>d__1 : IEnumerable<int>, IEnumerable, IEnumerator<int>, IEnumerator, IDisposable
{
    private int <>1__state;
    // The current iteration result
    private int <>2__current;
    private int <>l__initialThreadId;
    public C <>4__this;
    private int <i>5__1;

    // The current element (generic Current)
    int IEnumerator<int>.Current
    {
        get{ return <>2__current; }
    }

    // The current element (non-generic Current)
    object IEnumerator.Current
    {
        get{ return <>2__current; }
    }

    // The constructor takes the state field, a hint that a state machine drives the core flow
    public <GetInts>d__1(int <>1__state)
    {
        this.<>1__state = <>1__state;
        <>l__initialThreadId = Environment.CurrentManagedThreadId;
    }

    // Core method: MoveNext
    private bool MoveNext()
    {
        int num = <>1__state;
        if (num != 0)
        {
            if (num != 1)
            {
                return false;
            }
            // Update the state
            <>1__state = -1;
            // Increment, i.e. the i++ of the original loop
            <i>5__1++;
        }
        else
        {
            <>1__state = -1;
            <i>5__1 = 0;
        }
        // Loop condition: the i < 5 of the original loop
        if (<i>5__1 < 5)
        {
            Console.WriteLine("Inner iteration: {0}", <i>5__1);
            // Store the current result for the Current property
            <>2__current = <i>5__1;
            <>1__state = 1;
            // Signal that iteration can continue
            return true;
        }
        // Iteration finished
        return false;
    }

    // IEnumerator.MoveNext
    bool IEnumerator.MoveNext()
    {
        return this.MoveNext();
    }

    // IEnumerable<int>.GetEnumerator
    IEnumerator<int> IEnumerable<int>.GetEnumerator()
    {
        // Choose the <GetInts>d__1 instance to return
        <GetInts>d__1 <GetInts>d__;
        if (<>1__state == -2 && <>l__initialThreadId == Environment.CurrentManagedThreadId)
        {
            <>1__state = 0;
            <GetInts>d__ = this;
        }
        else
        {
            // Initialize a fresh state machine
            <GetInts>d__ = new <GetInts>d__1(0);
            <GetInts>d__.<>4__this = <>4__this;
        }
        // <GetInts>d__1 implements IEnumerator<int>, so it can be returned directly
        return <GetInts>d__;
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        // Delegate to the generic GetEnumerator; the result is also an IEnumerator
        return ((IEnumerable<int>)this).GetEnumerator();
    }

    void IEnumerator.Reset()
    {
        // Iterators generated from yield return do not support Reset
        throw new NotSupportedException();
    }

    void IDisposable.Dispose()
    {
    }
}

The generated class implements both the IEnumerable and IEnumerator interfaces, so it has a GetEnumerator() method as well as MoveNext() and Current, which is exactly the structure foreach needs. The for loop we wrote by hand has moved into MoveNext(), which uses a state field to decide where to resume and advances to the next element accordingly. Roughly, our for loop is translated into MoveNext() as follows

  • Calling GetEnumerator() sets <>1__state to 0, meaning "not started". The first MoveNext() call sees state 0, sets the state to -1 (running) and initializes the loop variable <i>5__1 to 0.

  • It then checks the loop condition <i>5__1 < 5. If it holds, it runs the loop body, stores the value in <>2__current, sets <>1__state to 1 (suspended at the yield return) and returns true.

  • The next MoveNext() call sees state 1, sets the state back to -1, increments <i>5__1 (the i++) and checks the condition again.

  • Once the condition fails, MoveNext() returns false with the state left at -1, which ends the while (enumerator.MoveNext()) loop; any further call also returns false.
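To make these state transitions tangible, here is a hand-written sketch of the same state machine built from a C# local function over captured variables instead of a generated class. It is an approximation that mirrors the decompiled MoveNext() (state 0 = not started, 1 = suspended after yield return, -1 = running or finished), not the compiler's actual output, and it is checked against a real yield return iterator.

```csharp
using System;
using System.Collections.Generic;

int state = 0;    // plays the role of <>1__state
int i = 0;        // plays the role of <i>5__1
int current = 0;  // plays the role of <>2__current

bool MoveNext()
{
    switch (state)
    {
        case 0:             // first call: run the loop initializer
            state = -1;
            i = 0;
            break;
        case 1:             // resume after yield return: run i++
            state = -1;
            i++;
            break;
        default:            // already finished
            return false;
    }

    if (i < 5)              // the loop condition i < 5
    {
        current = i;        // yield return i;
        state = 1;
        return true;
    }
    return false;           // loop done, state stays -1
}

var handWritten = new List<int>();
while (MoveNext())
{
    handWritten.Add(current);
}

// The compiler-generated version for comparison
var generated = new List<int>(GetInts());

Console.WriteLine(string.Join(",", handWritten)); // 0,1,2,3,4
Console.WriteLine(string.Join(",", generated));   // 0,1,2,3,4

static IEnumerable<int> GetInts()
{
    for (int k = 0; k < 5; k++)
        yield return k;
}
```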

We also showed another form of yield return above, where a single method contains several yield return statements

IEnumerable<int> GetInts()
{
    Console.WriteLine("Inner iteration: 0");
    yield return 0;

    Console.WriteLine("Inner iteration: 1");
    yield return 1;

    Console.WriteLine("Inner iteration: 2");
    yield return 2;
}

Its decompiled form is shown below; only the core MoveNext() method is included

private bool MoveNext()
{
    switch (<>1__state)
    {
        default:
            return false;
        case 0:
            <>1__state = -1;
            Console.WriteLine("Inner iteration: 0");
            <>2__current = 0;
            <>1__state = 1;
            return true;
        case 1:
            <>1__state = -1;
            Console.WriteLine("Inner iteration: 1");
            <>2__current = 1;
            <>1__state = 2;
            return true;
        case 2:
            <>1__state = -1;
            Console.WriteLine("Inner iteration: 2");
            <>2__current = 2;
            <>1__state = 3;
            return true;
        case 3:
            <>1__state = -1;
            return false;
    }
}

The compiled code shows that multiple yield return statements are compiled into a switch...case. A method with n yield return statements gets n+1 cases: case 0 starts the method, and each later case resumes after one yield return. The extra case (case 3 here) resumes after the last yield return and represents the termination of MoveNext(), returning false; the other cases return true to indicate that iteration can continue.
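The n+1 cases can be observed by driving the enumerator by hand. In this sketch, a variant of the three-yield GetInts (with a hypothetical log parameter and an extra statement after the last yield return) is stepped with five MoveNext() calls: the first three return true, the fourth runs the code after the last yield return and returns false, and every later call keeps returning false.

```csharp
using System;
using System.Collections.Generic;

var log = new List<string>();
using IEnumerator<int> e = GetInts(log).GetEnumerator();

var results = new List<bool>();
for (int call = 0; call < 5; call++)
{
    bool more = e.MoveNext();   // each call runs up to the next yield return
    results.Add(more);
    log.Add(more ? $"Current = {e.Current}" : "MoveNext() returned false");
}

Console.WriteLine(string.Join(",", results)); // True,True,True,False,False
Console.WriteLine(string.Join(Environment.NewLine, log));

static IEnumerable<int> GetInts(List<string> log)
{
    log.Add("segment 0");
    yield return 0;
    log.Add("segment 1");
    yield return 1;
    log.Add("segment 2");
    yield return 2;
    log.Add("after last yield"); // runs on the 4th MoveNext(), the extra case
}
```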

IAsyncEnumerable interface

The examples above were synchronous. C# 8 added the IAsyncEnumerable<T> interface for asynchronous iteration, i.e. scenarios where the iterator logic itself contains asynchronous operations. IAsyncEnumerable<T> and its enumerator are defined as follows

public interface IAsyncEnumerable<out T>
{
    IAsyncEnumerator<T> GetAsyncEnumerator(CancellationToken cancellationToken = default);
}

public interface IAsyncEnumerator<out T> : IAsyncDisposable
{
    ValueTask<bool> MoveNextAsync();
    T Current { get; }
}

The main difference is that the synchronous IEnumerator has a MoveNext() method returning bool, while IAsyncEnumerator<T> has an asynchronous MoveNextAsync() method returning ValueTask<bool>. So for the sample code above

await foreach (var num in GetIntsAsync())
{
    Console.WriteLine("Outer iteration: {0}", num);
}

Although await is written in front of foreach, what is actually awaited is the MoveNextAsync() call made on each iteration (and DisposeAsync() at the end). It roughly works like this

IAsyncEnumerator<int> enumerator = GetIntsAsync().GetAsyncEnumerator();
try
{
    while (await enumerator.MoveNextAsync())
        Console.WriteLine("Outer iteration: {0}", enumerator.Current);
}
finally { await enumerator.DisposeAsync(); }

Of course, the real compiled code is not like this. As explained in my earlier article, Summary of research on the async/await state machine in C#[2], async/await is compiled into an asynchronous state machine. An async yield return method is therefore compiled into a class that combines the async state machine (IAsyncStateMachine) with an implementation of IAsyncEnumerator<T>, which makes it considerably more complex than the synchronous version and produces much more code. The principle is analogous to the synchronous version, but it also requires understanding the async state machine, so we won't show the compiled output here; interested readers can explore async yield return on their own.
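One practical aspect of asynchronous iteration is cancellation. The sketch below assumes C# 9 or later (for attributes on local function parameters) and uses the WithCancellation extension together with the [EnumeratorCancellation] attribute, which is how a token supplied by the consumer reaches the iterator body. CountAsync is a hypothetical producer, with Task.Yield standing in for real asynchronous work.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

var received = new List<int>();
bool cancelled = false;
using var cts = new CancellationTokenSource();

try
{
    // WithCancellation hands the token to the [EnumeratorCancellation] parameter
    await foreach (var n in CountAsync().WithCancellation(cts.Token))
    {
        received.Add(n);
        if (n == 2)
            cts.Cancel(); // ask the producer to stop after the third value
    }
}
catch (OperationCanceledException)
{
    cancelled = true;
}

Console.WriteLine($"received: {string.Join(",", received)}, cancelled: {cancelled}");

// Hypothetical producer, capped at 100 values so it always terminates
static async IAsyncEnumerable<int> CountAsync(
    [EnumeratorCancellation] CancellationToken token = default)
{
    for (int i = 0; i < 100; i++)
    {
        token.ThrowIfCancellationRequested();
        await Task.Yield(); // stands in for real asynchronous work
        yield return i;
    }
}
```

After the consumer cancels, the next MoveNextAsync() resumes the producer, which observes the token and throws, so only 0, 1 and 2 are received.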

Enhanced foreach

C# 9 enhanced foreach: a GetEnumerator() method can now be supplied as an extension method, so an ordinary class that has no GetEnumerator() of its own can still be traversed with foreach. It is used as follows

Foo foo = new Foo();
foreach (int item in foo)
{
    Console.WriteLine(item);
}

public class Foo
{
    public List<int> Ints { get; set; } = new List<int>();
}

public static class Bar
{
    // Define a GetEnumerator extension method for Foo
    public static IEnumerator<int> GetEnumerator(this Foo foo)
    {
        foreach (int item in foo.Ints)
        {
            yield return item;
        }
    }
}

This feature is quite powerful and follows the open-closed principle: we can extend what a type can do without modifying its original code, which is very practical. Let's look at the result of its compilation

Foo foo = new Foo();
IEnumerator<int> enumerator = Bar.GetEnumerator(foo);
try
{
    while (enumerator.MoveNext())
    {
        int current = enumerator.Current;
        Console.WriteLine(current);
    }
}
finally
{
    if (enumerator != null)
    {
        enumerator.Dispose();
    }
}

Here we see that the GetEnumerator() extension method is also syntactic sugar: the call is compiled into ExtensionClass.GetEnumerator(extendedInstance). We write code the usual way, and the compiler generates the static call for us. Next, let's see what the GetEnumerator() extension method itself compiles into

public static IEnumerator<int> GetEnumerator(Foo foo)
{
    <GetEnumerator>d__0 <GetEnumerator>d__ = new <GetEnumerator>d__0(0);
    <GetEnumerator>d__.foo = foo;
    return <GetEnumerator>d__;
}

Does this code look familiar? Yes, it is the same syntactic-sugar mechanism behind yield return described in the previous section: a corresponding class is generated at compile time, here named <GetEnumerator>d__0. Because the extension method returns IEnumerator<int> rather than IEnumerable<int>, this class implements only the enumerator interfaces. Let's look at its structure

private sealed class <GetEnumerator>d__0 : IEnumerator<int>, IEnumerator, IDisposable
{
    private int <>1__state;
    private int <>2__current;
    public Foo foo;
    private List<int>.Enumerator <>s__1;
    private int <item>5__2;

    int IEnumerator<int>.Current
    {
        get{ return <>2__current; }
    }

    object IEnumerator.Current
    {
        get{ return <>2__current; }
    }

    public <GetEnumerator>d__0(int <>1__state)
    {
        this.<>1__state = <>1__state;
    }

    private bool MoveNext()
    {
        try
        {
            int num = <>1__state;
            if (num != 0)
            {
                if (num != 1)
                {
                    return false;
                }
                <>1__state = -3;
            }
            else
            {
                <>1__state = -1;
                // Ints in the example is a List<T>, so its struct enumerator is stored here
                <>s__1 = foo.Ints.GetEnumerator();
                <>1__state = -3;
            }
            // The foreach in the extension method has been compiled into
            // explicit MoveNext()/Current calls on that enumerator
            if (<>s__1.MoveNext())
            {
                <item>5__2 = <>s__1.Current;
                <>2__current = <item>5__2;
                <>1__state = 1;
                return true;
            }
            <>m__Finally1();
            <>s__1 = default(List<int>.Enumerator);
            return false;
        }
        catch
        {
            ((IDisposable)this).Dispose();
            throw;
        }
    }

    bool IEnumerator.MoveNext()
    {
        return this.MoveNext();
    }

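    void IDisposable.Dispose()
    {
        // Body omitted in this excerpt: if the iterator is suspended inside
        // the foreach (state -3 or 1), it calls <>m__Finally1()
    }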

    void IEnumerator.Reset()
    {
        // Iterators generated from yield return do not support Reset
        throw new NotSupportedException();
    }

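    private void <>m__Finally1()
    {
        // Body omitted in this excerpt: sets <>1__state to -1 and
        // disposes the List<int> enumerator stored in <>s__1
    }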
}

The compiler-generated code has the same structure as for yield return; only the logic inside MoveNext() differs, because it depends on the code we write. We won't walk through it in detail, since it follows the same logic explained above.
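The -3 state and the <>m__Finally1() method exist because the extension method contains a foreach, and a foreach inside an iterator implies a try/finally that must run even if the consumer stops early. This sketch shows the effect with two hypothetical iterators: breaking out after the first element disposes the outer iterator, which in turn disposes the inner one, running the innermost finally block first.

```csharp
using System;
using System.Collections.Generic;

var log = new List<string>();

foreach (var item in Wrap(Source(log), log))
{
    log.Add($"consumer got {item}");
    break; // stop early, as any caller of the wrapper might
}

Console.WriteLine(string.Join(Environment.NewLine, log));

// Inner sequence with cleanup logic
static IEnumerable<int> Source(List<string> log)
{
    try
    {
        for (int i = 0; i < 3; i++)
            yield return i;
    }
    finally
    {
        log.Add("source disposed");
    }
}

// Outer iterator that re-yields with foreach, like the GetEnumerator extension above;
// the compiler wraps that foreach in its own try/finally
static IEnumerable<int> Wrap(IEnumerable<int> inner, List<string> log)
{
    try
    {
        foreach (var x in inner)
            yield return x;
    }
    finally
    {
        log.Add("wrapper disposed");
    }
}
```

The log reads: consumer got 0, source disposed, wrapper disposed.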

Summary

In this article we introduced the yield return syntax in C# and discussed some of the ideas behind it. Through simple examples we showed how to use yield return iterators and how they process large amounts of data on demand. By analyzing what foreach iteration and the yield return syntax compile to, we explained their implementation principles and underlying mechanisms. The knowledge involved is fairly simple overall; if you read the implementation code carefully, the principles behind it should become clear, so I won't repeat them here.

When you encounter challenges and difficulties, don't give up easily. Whatever you are facing, if you are willing to work hard, explore and persevere, you will be able to overcome the difficulties and move toward success. Remember, success does not come overnight; it takes continuous effort and persistence. Believe in yourself, in your abilities and in your potential, and you will become a better version of yourself.

References

[1] sharplab.io: https://sharplab.io/
[2] Summary of research on the async/await state machine in C#: https://www.cnblogs.com/wucy/p/17137128.html

https://www.cnblogs.com/wucy/p/17443749.html


Origin blog.csdn.net/qq_42672770/article/details/131674450