.NET performance optimization - using ValueStringBuilder to concatenate strings

Foreword

The tip I want to share this time applies to string concatenation. We often need to join many short strings, and in that situation String.Concat (that is, the += operator) is not recommended.
At present, the officially recommended solution is to build such strings with StringBuilder. Is there a faster way that uses less memory? That is what I want to introduce today: ValueStringBuilder.

ValueStringBuilder

ValueStringBuilder is not a public API, but it is widely used in the .NET base class library. Because it is a value type, the builder itself is not allocated on the heap and adds no GC pressure.
Microsoft's ValueStringBuilder can be used in two ways. The first is to hand it a block of memory you already have for building the string. This means you can use stack space, heap space, or even unmanaged heap memory, which is very GC-friendly and can greatly reduce GC pressure under high concurrency.

// Constructor: takes a Span<char> to use as the buffer
public ValueStringBuilder(Span<char> initialBuffer);

// Usage:
// Stack space
var stackBuilder = new ValueStringBuilder(stackalloc char[512]);
// Ordinary array
var arrayBuilder = new ValueStringBuilder(new char[512]);
// Unmanaged heap (requires an unsafe context)
var length = 512;
var ptr = NativeMemory.Alloc((nuint)(length * Unsafe.SizeOf<char>()));
var span = new Span<char>(ptr, length);
var nativeBuilder = new ValueStringBuilder(span);
.....
NativeMemory.Free(ptr); // Unmanaged memory must always be freed after use

The other way is to specify a capacity, in which case the buffer is rented from the default ArrayPool<char> pool. Because a pool is used, this is also GC-friendly. Be careful: anything rented from the pool must be returned.

// Pass in the expected capacity
public ValueStringBuilder(int initialCapacity)  
{  
    // Rent the buffer from the pool
    _arrayToReturnToPool = ArrayPool<char>.Shared.Rent(initialCapacity);  
    ......
}
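
The rent-and-return discipline mentioned above can be sketched with the public ArrayPool<char> API alone. This is a minimal illustration, not code from the .NET source; the class and method names (PooledBufferDemo, JoinWithPool) are made up for the example.

```csharp
using System;
using System.Buffers;

public static class PooledBufferDemo
{
    // Joins two strings using a buffer rented from the shared pool.
    // Hypothetical helper, for illustration only.
    public static string JoinWithPool(string left, string right)
    {
        int needed = left.Length + right.Length;
        // Rent may hand back an array larger than requested.
        char[] rented = ArrayPool<char>.Shared.Rent(needed);
        try
        {
            Span<char> buffer = rented;
            left.AsSpan().CopyTo(buffer);
            right.AsSpan().CopyTo(buffer.Slice(left.Length));
            // Only the first `needed` chars hold our data.
            return new string(buffer.Slice(0, needed));
        }
        finally
        {
            // Return the array even if an exception was thrown.
            ArrayPool<char>.Shared.Return(rented);
        }
    }

    public static void Main()
    {
        Console.WriteLine(JoinWithPool("Value1:", "100"));
    }
}
```

The try/finally block is the important part: a rented array that is never returned is not a leak in the strict sense, but it defeats the purpose of pooling.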

So let's compare the performance of +=, StringBuilder, and ValueStringBuilder.

// A simple class
public class SomeClass  
{  
    public int Value1; public int Value2; public float Value3;  
    public double Value4; public string? Value5; public decimal Value6;  
    public DateTime Value7; public TimeOnly Value8; public DateOnly Value9;  
    public int[]? Value10;  
}
// Benchmark class
[MemoryDiagnoser]  
[HtmlExporter]  
[Orderer(SummaryOrderPolicy.FastestToSlowest)]  
public class StringBuilderBenchmark  
{  
    private static readonly SomeClass Data;  
    static StringBuilderBenchmark()  
    {  
        var baseTime = DateTime.Now;  
        Data = new SomeClass  
        {  
            Value1 = 100, Value2 = 200, Value3 = 333,  
            Value4 = 400, Value5 = string.Join('-', Enumerable.Range(0, 10000).Select(i => i.ToString())),  
            Value6 = 655, Value7 = baseTime.AddHours(12),  
            Value8 = TimeOnly.MinValue, Value9 = DateOnly.MaxValue,  
            Value10 = Enumerable.Range(0, 5).ToArray()  
        };  
    }

    // The familiar StringBuilder
    [Benchmark(Baseline = true)]  
    public string StringBuilder()  
    {  
        var data = Data;  
        var sb = new StringBuilder();  
        sb.Append("Value1:"); sb.Append(data.Value1);  
        if (data.Value2 > 10)  
        {  
            sb.Append(" ,Value2:"); sb.Append(data.Value2);  
        }  
        sb.Append(" ,Value3:"); sb.Append(data.Value3);  
        sb.Append(" ,Value4:"); sb.Append(data.Value4);  
        sb.Append(" ,Value5:"); sb.Append(data.Value5);  
        if (data.Value6 > 20)  
        {  
            sb.Append(" ,Value6:"); sb.AppendFormat("{0:F2}", data.Value6);  
        }  
        sb.Append(" ,Value7:"); sb.AppendFormat("{0:yyyy-MM-dd HH:mm:ss}", data.Value7);  
        sb.Append(" ,Value8:"); sb.AppendFormat("{0:HH:mm:ss}", data.Value8);  
        sb.Append(" ,Value9:"); sb.AppendFormat("{0:yyyy-MM-dd}", data.Value9);  
        sb.Append(" ,Value10:");  
        if (data.Value10 is null or {Length: 0}) return sb.ToString();  
        for (int i = 0; i < data.Value10.Length; i++)  
        {  
            sb.Append(data.Value10[i]);  
        }  
  
        return sb.ToString();  
    }

    // StringBuilder with an initial capacity
    [Benchmark]  
    public string StringBuilderCapacity()  
    {  
        var data = Data;  
        var sb = new StringBuilder(20480);  
        sb.Append("Value1:"); sb.Append(data.Value1);  
        if (data.Value2 > 10)  
        {  
            sb.Append(" ,Value2:"); sb.Append(data.Value2);  
        }  
        sb.Append(" ,Value3:"); sb.Append(data.Value3);  
        sb.Append(" ,Value4:"); sb.Append(data.Value4);  
        sb.Append(" ,Value5:"); sb.Append(data.Value5);  
        if (data.Value6 > 20)  
        {  
            sb.Append(" ,Value6:"); sb.AppendFormat("{0:F2}", data.Value6);  
        }  
        sb.Append(" ,Value7:"); sb.AppendFormat("{0:yyyy-MM-dd HH:mm:ss}", data.Value7);  
        sb.Append(" ,Value8:"); sb.AppendFormat("{0:HH:mm:ss}", data.Value8);  
        sb.Append(" ,Value9:"); sb.AppendFormat("{0:yyyy-MM-dd}", data.Value9);  
        sb.Append(" ,Value10:");  
        if (data.Value10 is null or {Length: 0}) return sb.ToString();  
        for (int i = 0; i < data.Value10.Length; i++)  
        {  
            sb.Append(data.Value10[i]);  
        }  
  
        return sb.ToString();  
    }  

    // Concatenate strings directly with +=
    [Benchmark]  
    public string StringConcat()  
    {  
        var str = "";  
        var data = Data;  
        str += ("Value1:"); str += (data.Value1);  
        if (data.Value2 > 10)  
        {  
            str += " ,Value2:"; str += data.Value2;  
        }  
        str += " ,Value3:"; str += (data.Value3);  
        str += " ,Value4:"; str += (data.Value4);  
        str += " ,Value5:"; str += (data.Value5);  
        if (data.Value6 > 20)  
        {  
            str += " ,Value6:"; str += data.Value6.ToString("F2");  
        }  
        str += " ,Value7:"; str += data.Value7.ToString("yyyy-MM-dd HH:mm:ss");  
        str += " ,Value8:"; str += data.Value8.ToString("HH:mm:ss");  
        str += " ,Value9:"; str += data.Value9.ToString("yyyy-MM-dd");  
        str += " ,Value10:";  
        if (data.Value10 is not null && data.Value10.Length > 0)  
        {  
            for (int i = 0; i < data.Value10.Length; i++)  
            {  
                str += (data.Value10[i]);  
            }     
        }  
  
        return str;  
    }  
  
    // ValueStringBuilder with a stack-allocated buffer
    [Benchmark]  
    public string ValueStringBuilderOnStack()  
    {  
        var data = Data;  
        Span<char> buffer = stackalloc char[20480];  
        var sb = new ValueStringBuilder(buffer);  
        sb.Append("Value1:"); sb.AppendSpanFormattable(data.Value1);  
        if (data.Value2 > 10)  
        {  
            sb.Append(" ,Value2:"); sb.AppendSpanFormattable(data.Value2);  
        }  
        sb.Append(" ,Value3:"); sb.AppendSpanFormattable(data.Value3);  
        sb.Append(" ,Value4:"); sb.AppendSpanFormattable(data.Value4);  
        sb.Append(" ,Value5:"); sb.Append(data.Value5);  
        if (data.Value6 > 20)  
        {  
            sb.Append(" ,Value6:"); sb.AppendSpanFormattable(data.Value6, "F2");  
        }  
        sb.Append(" ,Value7:"); sb.AppendSpanFormattable(data.Value7, "yyyy-MM-dd HH:mm:ss");  
        sb.Append(" ,Value8:"); sb.AppendSpanFormattable(data.Value8, "HH:mm:ss");  
        sb.Append(" ,Value9:"); sb.AppendSpanFormattable(data.Value9, "yyyy-MM-dd");  
        sb.Append(" ,Value10:");  
        if (data.Value10 is not null && data.Value10.Length > 0)  
        {  
            for (int i = 0; i < data.Value10.Length; i++)  
            {  
                sb.AppendSpanFormattable(data.Value10[i]);  
            }     
        }  
  
        return sb.ToString();  
    }
    // ValueStringBuilder with a buffer rented from ArrayPool (heap)
    [Benchmark]  
    public string ValueStringBuilderOnHeap()  
    {  
        var data = Data;  
        var sb = new ValueStringBuilder(20480);  
        sb.Append("Value1:"); sb.AppendSpanFormattable(data.Value1);  
        if (data.Value2 > 10)  
        {  
            sb.Append(" ,Value2:"); sb.AppendSpanFormattable(data.Value2);  
        }  
        sb.Append(" ,Value3:"); sb.AppendSpanFormattable(data.Value3);  
        sb.Append(" ,Value4:"); sb.AppendSpanFormattable(data.Value4);  
        sb.Append(" ,Value5:"); sb.Append(data.Value5);  
        if (data.Value6 > 20)  
        {  
            sb.Append(" ,Value6:"); sb.AppendSpanFormattable(data.Value6, "F2");  
        }  
        sb.Append(" ,Value7:"); sb.AppendSpanFormattable(data.Value7, "yyyy-MM-dd HH:mm:ss");  
        sb.Append(" ,Value8:"); sb.AppendSpanFormattable(data.Value8, "HH:mm:ss");  
        sb.Append(" ,Value9:"); sb.AppendSpanFormattable(data.Value9, "yyyy-MM-dd");  
        sb.Append(" ,Value10:");  
        if (data.Value10 is not null && data.Value10.Length > 0)  
        {  
            for (int i = 0; i < data.Value10.Length; i++)  
            {  
                sb.AppendSpanFormattable(data.Value10[i]);  
            }     
        }
  
        return sb.ToString();  
    }
      
}

The result is shown below.


From the above results, we can draw the following conclusions.

  • StringConcat is the slowest and is not recommended.
  • StringBuilder is 6.5 times faster than StringConcat and is the recommended method.
  • StringBuilder with an initial capacity is 25% faster than plain StringBuilder. Just as I said about setting an initial size for collection types, setting the initial capacity of a StringBuilder is definitely recommended.
  • The stack-allocated ValueStringBuilder is 50% faster than StringBuilder and 25% faster than StringBuilder with an initial capacity. It also triggers the fewest GCs.
  • The heap-allocated ValueStringBuilder is 55% faster than StringBuilder, and it triggers slightly more GCs than the stack-allocated version.

From these conclusions we can see that ValueStringBuilder performs very well: even with the buffer allocated on the stack, it is 25% faster than StringBuilder.

Source code analysis

The source code of ValueStringBuilder is not long. Let's pick out a few important methods; part of the source code is shown below.

// A ref struct: instances can only live on the stack
public ref struct ValueStringBuilder
{
    // If the buffer is rented from ArrayPool, keep a reference to it
    // so it can be returned in Dispose
    private char[]? _arrayToReturnToPool;
    // Holds the buffer passed in from outside
    private Span<char> _chars;
    // Current length of the string
    private int _pos;

    // Buffer supplied by the caller
    public ValueStringBuilder(Span<char> initialBuffer)
    {
        // A caller-supplied buffer is used, so nothing is rented from the pool
        _arrayToReturnToPool = null;
        _chars = initialBuffer;
        _pos = 0;
    }

    public ValueStringBuilder(int initialCapacity)
    {
        // When a capacity is given, rent the buffer from ArrayPool
        _arrayToReturnToPool = ArrayPool<char>.Shared.Rent(initialCapacity);
        _chars = _arrayToReturnToPool;
        _pos = 0;
    }

    // Returns the length of the string. Length is readable and writable,
    // so to reuse a ValueStringBuilder you only need to set Length to 0
    public int Length
    {
        get => _pos;
        set
        {
            Debug.Assert(value >= 0);
            Debug.Assert(value <= _chars.Length);
            _pos = value;
        }
    }

    ......

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public void Append(char c)
    {
        // Appending a char is very efficient: just write it to the matching position in the Span
        int pos = _pos;
        if ((uint) pos < (uint) _chars.Length)
        {
            _chars[pos] = c;
            _pos = pos + 1;
        }
        else
        {
            // If the buffer is full, take the grow path
            GrowAndAppend(c);
        }
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public void Append(string? s)
    {
        if (s == null)
        {
            return;
        }

        // Appending a string is just as efficient
        int pos = _pos;
        // A one-character string can be written directly, just like appending a char
        if (s.Length == 1 && (uint) pos < (uint) _chars.Length)
        {
            _chars[pos] = s[0];
            _pos = pos + 1;
        }
        else
        {
            // Otherwise use the slower path (multi-character strings, or no room left)
            AppendSlow(s);
        }
    }

    private void AppendSlow(string s)
    {
        // Append a string: grow first if there is not enough room,
        // then copy with Span, which is quite efficient
        int pos = _pos;
        if (pos > _chars.Length - s.Length)
        {
            Grow(s.Length);
        }

        s
#if !NETCOREAPP
                .AsSpan()
#endif
            .CopyTo(_chars.Slice(pos));
        _pos += s.Length;
    }

    // Special handling for values that need formatting
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public void AppendSpanFormattable<T>(T value, string? format = null, IFormatProvider? provider = null)
        where T : ISpanFormattable
    {
        // ISpanFormattable is very efficient: it formats directly into the buffer
        if (value.TryFormat(_chars.Slice(_pos), out int charsWritten, format, provider))
        {
            _pos += charsWritten;
        }
        else
        {
            Append(value.ToString(format, provider));
        }
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private void GrowAndAppend(char c)
    {
        // Grow by one char, then append
        Grow(1);
        Append(c);
    }

    // Grow method
    [MethodImpl(MethodImplOptions.NoInlining)]
    private void Grow(int additionalCapacityBeyondPos)
    {
        Debug.Assert(additionalCapacityBeyondPos > 0);
        Debug.Assert(_pos > _chars.Length - additionalCapacityBeyondPos,
            "Grow called incorrectly, no resize is needed.");

        // Grows to at least double the current size; the new buffer is rented from the pool
        char[] poolArray = ArrayPool<char>.Shared.Rent((int) Math.Max((uint) (_pos + additionalCapacityBeyondPos),
            (uint) _chars.Length * 2));

        _chars.Slice(0, _pos).CopyTo(poolArray);

        char[]? toReturn = _arrayToReturnToPool;
        _chars = _arrayToReturnToPool = poolArray;
        if (toReturn != null)
        {
            // If the old buffer came from the pool, it must be returned
            ArrayPool<char>.Shared.Return(toReturn);
        }
    }

    // Returns any rented buffer to the pool
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public void Dispose()
    {
        char[]? toReturn = _arrayToReturnToPool;
        this = default; // For safety, reset this instance when disposing
        if (toReturn != null)
        {
            // Always remember to return the buffer to the pool
            ArrayPool<char>.Shared.Return(toReturn);
        }
    }
}

From the source code above, we can summarize several features of ValueStringBuilder:

  • Compared with StringBuilder, the implementation is very simple.
  • Everything is geared toward high performance: extensive use of Span, inlining attributes, object pooling, and so on.
  • The memory footprint is very low: it is a struct, and a ref struct at that, which means it is never boxed and never allocated on the heap.
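
The formatting trick used by AppendSpanFormattable can be tried with public APIs only. The sketch below assumes .NET 6 or later (where ISpanFormattable is public); SpanFormatDemo and FormatInto are hypothetical names, and the invariant culture is chosen just to make the output predictable.

```csharp
using System;
using System.Globalization;

public static class SpanFormatDemo
{
    // Formats `value` into `destination` and returns the resulting text.
    // Hypothetical helper mirroring the TryFormat-then-fallback pattern.
    public static string FormatInto<T>(T value, Span<char> destination, string? format = null)
        where T : ISpanFormattable
    {
        // Fast path: write straight into the caller's buffer.
        if (value.TryFormat(destination, out int charsWritten, format, CultureInfo.InvariantCulture))
        {
            return new string(destination.Slice(0, charsWritten));
        }

        // Fallback when the buffer is too small: format through a temporary string.
        return value.ToString(format, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        Span<char> buffer = stackalloc char[32];
        Console.WriteLine(FormatInto(655m, buffer, "F2"));          // 655.00
        Console.WriteLine(FormatInto(1234567, buffer.Slice(0, 2))); // buffer too small, falls back: 1234567
    }
}
```

In the real builder the fast path avoids the intermediate string entirely, because the formatted characters already sit in the builder's own buffer; the helper here creates a string only so the result can be printed.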

Applicable scenarios

ValueStringBuilder is a high-performance way to build strings, and it can be used in different ways for different scenarios.
1. Very high-frequency string concatenation where the strings are short: use a stack-allocated ValueStringBuilder.
Everyone knows that ASP.NET Core performs very well these days. The internal UrlBuilder library it depends on uses stack allocation: memory allocated on the stack is reclaimed when the current method returns, so it causes no GC pressure.


2. Very high-frequency string concatenation where the string length is unpredictable: use a ValueStringBuilder with a specified capacity, backed by ArrayPool. There are many such cases in the .NET BCL, for example the ToString implementation of dynamic methods. Renting from the pool is not as efficient as allocating on the stack, but it still reduces memory usage and GC pressure.

3. Very high-frequency string concatenation where the string length is predictable: stack allocation and ArrayPool can be combined. For example, the regular expression parsing class uses stack space when the string is short and ArrayPool when it is larger.
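
The combined approach in scenario 3 might look roughly like the following. This is a sketch using a plain Span<char> rather than ValueStringBuilder itself (which is internal); the 256-char threshold, the HybridBufferDemo class, and the Repeat method are assumptions made for illustration.

```csharp
using System;
using System.Buffers;

public static class HybridBufferDemo
{
    // Hypothetical threshold, in chars: 256 chars = 512 bytes of stack.
    private const int StackLimit = 256;

    // Builds `text` repeated `count` times, using the stack for small
    // results and a pooled array for large ones.
    public static string Repeat(string text, int count)
    {
        int needed = text.Length * count;
        char[]? rented = null;
        Span<char> buffer = needed <= StackLimit
            ? stackalloc char[StackLimit]
            : (rented = ArrayPool<char>.Shared.Rent(needed));
        try
        {
            int pos = 0;
            for (int i = 0; i < count; i++)
            {
                text.AsSpan().CopyTo(buffer.Slice(pos));
                pos += text.Length;
            }
            return new string(buffer.Slice(0, pos));
        }
        finally
        {
            // Only the pooled path has something to give back.
            if (rented != null)
            {
                ArrayPool<char>.Shared.Return(rented);
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine(Repeat("ab", 3));           // small: stack buffer
        Console.WriteLine(Repeat("x", 1000).Length);  // large: pooled buffer
    }
}
```

The stack path costs nothing to clean up, while the pooled path keeps large or unpredictable sizes off the stack.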

Scenarios to watch out for

1. ValueStringBuilder cannot be used across async/await. The reason is well known: ValueStringBuilder is a ref struct and can only live on the stack, while an async method is compiled into a state machine that splits the method into the parts before and after each await, so a ValueStringBuilder cannot easily be carried across an await. The compiler will also report this.


2. A ValueStringBuilder backed by a stack buffer cannot be returned from a method. The buffer lives on the current stack frame and is released when the method ends, so the returned builder would point to an unknown address. The compiler will also report this.

3. If you want to pass a ValueStringBuilder to another method, you must pass it by ref; otherwise the value is copied and you end up with multiple instances. The compiler does not warn about this, so you have to be very careful.

4. If you allocate on the stack, it is safer to keep the buffer size within 5 KB. As for why, I will talk about it later.
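
The copy problem described in point 3 is easy to reproduce with any mutable struct. The sketch below uses a made-up stand-in (CharCounter) instead of ValueStringBuilder, since the latter is not public; the behavior it shows is ordinary C# value-type semantics.

```csharp
using System;

// Hypothetical stand-in for a mutable struct builder: it only tracks a length.
public struct CharCounter
{
    public int Length;

    public void Append(char c) => Length++;
}

public static class RefPassingDemo
{
    // Receives a copy: the caller's instance is not modified.
    public static void AppendByValue(CharCounter counter) => counter.Append('x');

    // Receives a reference: the caller's instance is modified.
    public static void AppendByRef(ref CharCounter counter) => counter.Append('x');

    public static void Main()
    {
        var counter = new CharCounter();

        AppendByValue(counter);
        Console.WriteLine(counter.Length); // 0, the append happened on a copy

        AppendByRef(ref counter);
        Console.WriteLine(counter.Length); // 1
    }
}
```

With a real builder the consequences are worse than a lost append: a copy that grows rents its own pooled array, and the original never learns about it.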

Summary

Today I shared ValueStringBuilder, a high-performance string-building struct with almost no memory overhead. It is recommended in most scenarios, but pay close attention to the caveats mentioned above. If those conditions are not met, you can still use StringBuilder, which is also efficient.

Source code for this article: https://github.com/InCerryGit/BlogCode-Use-ValueStringBuilder

Origin blog.csdn.net/weixin_45499836/article/details/126443522