Finally understand: SharedFlow's caching system, explained with diagrams

Foreword

Kotlin provides two tools for creating "hot flows": StateFlow and SharedFlow. StateFlow is often used as an architecture component to replace LiveData, so most developers are relatively familiar with it. In fact, StateFlow is just a specialized form of SharedFlow. SharedFlow is more powerful and suits more scenarios, thanks to its built-in caching system. This article uses diagrams to help you understand SharedFlow's caching system more vividly.

A SharedFlow is created with the MutableSharedFlow() factory function, and the cache is configured through its three parameters:

fun <T> MutableSharedFlow(
    replay: Int = 0, 
    extraBufferCapacity: Int = 0, 
    onBufferOverflow: BufferOverflow = BufferOverflow.SUSPEND
): MutableSharedFlow<T>

Next, we will use sequence diagrams to show how these three key parameters affect the cache. Before we begin, let's agree on the terminology:

  • Emitter : the producer of the Flow's data, which emits data from upstream
  • Subscriber : the consumer of the Flow's data, which receives data downstream
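To ground these two roles, here is a minimal sketch of an Emitter and a Subscriber sharing one flow (the values and the yield-based scheduling are illustrative):

import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

fun main() = runBlocking {
    val sharedFlow = MutableSharedFlow<Int>() // default parameters: no cache

    // Subscriber: receives data downstream; collect never completes on its own
    val subscriber = launch {
        sharedFlow.collect { println("Subscriber received $it") }
    }
    yield() // let the Subscriber register before we start emitting

    // Emitter: produces data upstream
    repeat(3) { sharedFlow.emit(it) }

    yield() // let the Subscriber finish processing the last value
    subscriber.cancel()
}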

replay

When a Subscriber subscribes to a SharedFlow, it may receive data that was emitted before the subscription, and replay specifies how many of those pre-subscription values it can receive. replay cannot be negative and defaults to 0, meaning the Subscriber only receives data emitted after it subscribes:

The figure above shows the replay = 0 case: the Subscriber cannot receive ❶, which was emitted before it subscribed, and only receives ❷ and ❸.

When replay = n (n > 0), SharedFlow enables caching. BufferSize is then n, meaning the latest n emitted values are cached and delivered to newly added Subscribers (see the sketch after the walkthrough below).

The above figure takes n = 1 as an example:

  1. Emitter emits ❶, which is cached in the Buffer
  2. Subscriber receives the cached ❶ after subscribing to the SharedFlow
  3. Emitter emits ❷ and then ❸, and the cached value in the Buffer is updated each time
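A minimal sketch of this replay behavior, assuming a single Subscriber (the yields only serve to order the single-threaded execution):

import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

fun main() = runBlocking {
    val sharedFlow = MutableSharedFlow<Int>(replay = 1)

    sharedFlow.emit(1) // ❶ enters the replay cache; no subscriber yet, so emit returns at once

    val subscriber = launch {
        sharedFlow.collect { println("received $it") }
    }
    yield() // the Subscriber registers and receives the replayed ❶

    sharedFlow.emit(2) // ❷ replaces ❶ in the replay cache
    sharedFlow.emit(3) // ❸ replaces ❷
    yield() // eventually prints: received 1, received 2, received 3

    subscriber.cancel()
}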

In the producer-consumer model, consumption sometimes cannot keep up with production. When that happens, flow control is needed: either production is paused or data is discarded. The same is true for SharedFlow. When a Subscriber processes data too slowly, the values cached in the Buffer are not consumed in time; once the Buffer is full, emit suspends by default (onBufferOverflow = SUSPEND).

The figure above shows emit suspending when replay = 1:

  1. Emitter emits ❶, which is cached
  2. Subscriber subscribes to the SharedFlow, receives the replayed ❶ and starts processing it
  3. Emitter emits ❷, and the cached value is updated to ❷. Since the Subscriber has not finished processing ❶ yet, ❷ is not consumed from the cache in time
  4. Emitter emits ❸; because the cached ❷ has not been consumed by the Subscriber, this emit suspends
  5. Subscriber starts consuming ❷, the Buffer caches ❸, and the Emitter can continue emitting new data

Note that a SharedFlow is a multicast and can have multiple Subscribers, so in the example above, when ❷ counts as consumed is determined by the slowest Subscriber, as the sketch below demonstrates.
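A hypothetical sketch with one fast and one slow Subscriber; the timing printout shows emit being paced by the slower of the two (delays and values are illustrative):

import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

fun main() = runBlocking {
    val sharedFlow = MutableSharedFlow<Int>(replay = 1)

    val fast = launch { sharedFlow.collect { println("fast got $it") } }
    val slow = launch {
        sharedFlow.collect {
            delay(100) // simulate slow processing
            println("slow got $it")
        }
    }
    yield() // let both Subscribers register

    repeat(4) { i ->
        val start = System.currentTimeMillis()
        sharedFlow.emit(i)
        // the first emits return quickly; soon every emit waits ~100 ms
        // for the slow Subscriber, no matter how fast the fast one is
        println("emit($i) took ${System.currentTimeMillis() - start} ms")
    }
    fast.cancel(); slow.cancel()
}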

extraBufferCapacity

The extra in extraBufferCapacity means that, in addition to the replay cache, the Buffer can hold extra capacity.

If replay = n and extraBufferCapacity = m, then BufferSize = m + n

extraBufferCapacity defaults to 0. Setting it helps improve the Emitter's throughput (a sketch after the walkthrough below demonstrates this).

Building on the previous figure, we now set extraBufferCapacity = 1; the effect is as follows:

In the figure above, BufferSize = 1 + 1 = 2:

  1. Emitter emits ❶, which Subscriber1 starts processing; ❶ is cached as replay data
  2. Emitter emits ❷, and the replay data in the Buffer is updated to ❷
  3. Emitter emits ❸; the Buffer keeps ❷ in the extra slot and caches ❸ as the new replay data
  4. Emitter emits ❹; the Buffer has no free space now, so this emit suspends
  5. Subscriber2 subscribes to the SharedFlow. Although the Buffer holds two values, ❷ and ❸, since replay = 1, Subscriber2 only receives the latest value ❸
  6. After Subscriber1 finishes processing ❶, it moves on to the next value in the Buffer and starts consuming ❷
  7. Once no Subscriber is left that has not consumed ❷, ❷ is removed from the cache; the suspended emit of ❹ resumes and ❹ enters the cache. The Buffer now holds two values, ❸ and ❹
  8. Subscriber1 finishes processing ❷ and starts consuming ❸
  9. Once no Subscriber is left that has not consumed ❸, ❸ is removed from the cache
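The following sketch reuses the slow-Subscriber setup from before, but with one slot of extra capacity; compared with the previous sketch, more emits can return immediately before the Buffer fills (all numbers are illustrative):

import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

fun main() = runBlocking {
    val sharedFlow = MutableSharedFlow<Int>(replay = 1, extraBufferCapacity = 1) // BufferSize = 2

    val subscriber = launch {
        sharedFlow.collect {
            delay(100) // slow Subscriber
            println("consumed $it")
        }
    }
    yield()

    repeat(4) { i ->
        val start = System.currentTimeMillis()
        sharedFlow.emit(i)
        // with the extra slot, emit keeps returning immediately for longer
        // before the full Buffer starts pacing it at ~100 ms
        println("emit($i) took ${System.currentTimeMillis() - start} ms")
    }
    subscriber.cancel()
}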

onBufferOverflow

In the previous examples, emit suspends when the Buffer is full, which assumes that onBufferOverflow is SUSPEND. onBufferOverflow specifies the strategy applied when the Buffer overflows. Besides the default SUSPEND, there are two data-discarding strategies:

  • DROP_LATEST : discard the latest data
  • DROP_OLDEST : discard the oldest data

Note that when BufferSize = 0, onBufferOverflow only supports SUSPEND, and the discarding strategies are invalid. This is easy to understand: since the Buffer holds no data, there is nothing to discard. The prerequisite for a discarding strategy is therefore that the Buffer has at least one slot and that it is full.
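Based on my reading of kotlinx.coroutines, this constraint is enforced when the flow is constructed; a sketch (the exact exception message may vary across versions):

import kotlinx.coroutines.channels.BufferOverflow
import kotlinx.coroutines.flow.MutableSharedFlow

fun main() {
    // Valid: BufferSize = 0 + 1 = 1, so DROP_OLDEST has a slot to work with
    val valid = MutableSharedFlow<Int>(
        extraBufferCapacity = 1,
        onBufferOverflow = BufferOverflow.DROP_OLDEST
    )
    println(valid.replayCache) // [] — nothing emitted yet

    // Invalid: BufferSize = 0 combined with a drop strategy is rejected
    try {
        MutableSharedFlow<Int>(onBufferOverflow = BufferOverflow.DROP_OLDEST)
    } catch (e: IllegalArgumentException) {
        println(e.message) // replay or extraBufferCapacity must be positive ...
    }
}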

The figure above shows the effect of DROP_LATEST, assuming replay = 2 and extraBufferCapacity = 0:

  1. When the Emitter emits ❸, since ❶ has already been consumed, the Buffer content changes from ❶❷ to ❷❸
  2. When the Emitter emits ❹, ❷ has not been consumed yet, so the Buffer is full and ❹ is discarded outright
  3. When the Emitter emits ❺, ❷ has been consumed and can be removed from the cache, so the Buffer content becomes ❸❺

The figure above shows the effect of DROP_OLDEST. Compared with DROP_LATEST, the difference is obvious: the latest two values are always kept in the cache, while older values may be removed from the Buffer whether or not they have been consumed. A Subscriber can therefore always consume the latest data, but it may miss intermediate values; for example, ❷ is missed in the figure.
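A sketch of this DROP_OLDEST behavior with a deliberately slow Subscriber; exactly which intermediate values get dropped depends on the timing, so the delays below are purely illustrative:

import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*
import kotlinx.coroutines.channels.BufferOverflow

fun main() = runBlocking {
    val sharedFlow = MutableSharedFlow<Int>(
        replay = 2,
        onBufferOverflow = BufferOverflow.DROP_OLDEST
    )

    val subscriber = launch {
        sharedFlow.collect {
            delay(100) // slow Subscriber
            println("consumed $it")
        }
    }
    yield()

    repeat(5) { i ->
        sharedFlow.emit(i + 1) // never suspends: on overflow the oldest value is dropped
        delay(30)
    }
    delay(500) // some intermediate values are never printed: they were dropped
    subscriber.cancel()
}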

Note: when onBufferOverflow is set to SUSPEND, every Subscriber is guaranteed to consume all the data without gaps, but the Emitter's speed is affected; when it is set to DROP_XXX, the emit call is guaranteed to return immediately, but a Subscriber may miss some data.

If we don't want emit to suspend, besides setting DROP_XXX there is another option: calling tryEmit, the non-suspending version of emit:

// from MutableSharedFlow<T>
override suspend fun emit(value: T)

fun tryEmit(value: T): Boolean

tryEmit returns a Boolean, and the return value can be read this way: in situations where emit would suspend, tryEmit returns false instead; otherwise it returns true. This means tryEmit can only return false when onBufferOverflow is set to SUSPEND and the Buffer has no free slot; in that case the effect of tryEmit is equivalent to DROP_LATEST.
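A sketch of reading tryEmit's return value, reusing the slow-Subscriber setup (illustrative values):

import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

fun main() = runBlocking {
    val sharedFlow = MutableSharedFlow<Int>(replay = 1) // onBufferOverflow = SUSPEND

    val subscriber = launch {
        sharedFlow.collect { delay(100) } // slow Subscriber keeps the Buffer occupied
    }
    yield()

    println(sharedFlow.tryEmit(1)) // true: the Buffer has a free slot
    println(sharedFlow.tryEmit(2)) // false: emit would suspend here, so ❷ is dropped, like DROP_LATEST
    subscriber.cancel()
}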

SharedFlow Buffer

The three MutableSharedFlow parameters introduced above all revolve around SharedFlow's Buffer. So what exactly does this Buffer look like?

The figure above is a comment about the Buffer from the SharedFlow source code. It vividly shows that the Buffer is a linear data structure (an ordinary array, Array<Any?>), but it does not intuitively convey how the Buffer operates at runtime. Let's walk through how the Buffer is updated with a concrete example:

import kotlinx.coroutines.*
import kotlinx.coroutines.channels.BufferOverflow
import kotlinx.coroutines.flow.*

val sharedFlow = MutableSharedFlow<Int>(
    replay = 2,
    extraBufferCapacity = 2,
    onBufferOverflow = BufferOverflow.SUSPEND
)
var emitValue = 1

fun main() {
    runBlocking {
        // Subscriber: consumes each value slowly
        launch {
            sharedFlow.onEach {
                delay(200) // simulate the consumption of data
            }.collect()
        }

        // Emitter: produces faster than the Subscriber consumes
        repeat(12) {
            sharedFlow.emit(emitValue)
            emitValue++
            delay(50)
        }
    }
}

The code above is simple: the SharedFlow's BufferSize = 2 + 2 = 4, and the Emitter produces faster than the Subscriber consumes, so the Buffer fills up and is updated along the way. The Buffer's changes are shown in the diagrams below.

First, look at the sequence diagram corresponding to the code:

With the earlier introduction, this sequence diagram should be easy to understand, so it won't be repeated here. The focus below is on illustrating the memory changes of the Buffer. SharedFlow's Buffer is essentially a queue implemented on top of an array: elements are added to and removed from the queue by moving pointers, which avoids shifting elements inside the actual array. There are three key pointers:

  • head : the head of the queue, pointing to the Buffer's first valid element, i.e. the earliest value that entered the cache. A value is not removed from the cache until all Subscribers have consumed it, so head also represents the progress of the slowest Subscriber
  • replay : the position where the Buffer's replay cache starts. When a new Subscriber subscribes, it starts processing data from this position
  • end : the position where new data enters the cache; end also represents the progress of the fastest Subscriber

If bufferSize denotes the number of values currently stored in the Buffer, the three pointer indices satisfy the following relations (a simplified model follows the list):

  • replay <= head + bufferSize
  • end = head + bufferSize
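To make the pointer arithmetic concrete, here is a highly simplified, hypothetical model of such a buffer; the real implementation in kotlinx.coroutines is more elaborate (it grows the array, guards state with a lock, and parks suspended emitters):

// Absolute indices only ever increase; the physical slot is index % capacity,
// so array slots are recycled as the pointers move past them.
class SimpleBuffer<T : Any>(private val capacity: Int, private val replay: Int) {
    private val buffer = arrayOfNulls<Any?>(capacity)

    var head = 0L      // progress of the slowest Subscriber; first valid element
        private set
    var bufferSize = 0 // number of values currently stored
        private set

    val end: Long get() = head + bufferSize // where the next value is stored
    val replayIndex: Long get() = end - minOf(replay, bufferSize) // where a new Subscriber starts

    // Store a value if there is room; a full buffer is where SUSPEND / DROP_XXX kicks in
    fun tryAdd(value: T): Boolean {
        if (bufferSize == capacity) return false
        buffer[(end % capacity).toInt()] = value
        bufferSize++
        return true
    }

    // Drop the head once the slowest Subscriber has consumed it
    // and it is no longer needed for the replay cache
    fun dropHead() {
        check(bufferSize > 0)
        buffer[(head % capacity).toInt()] = null
        head++
        bufferSize--
    }
}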

After understanding the meaning of the three pointers, let's look at how the Buffer in the above figure works:

Finally, a summary of the Buffer's characteristics:

  • It is implemented on an array; when the array runs out of space, its capacity is doubled (2n expansion)
  • An element's position stays fixed once it enters the array; the starting point of consumption is determined by moving pointers
  • After a pointer moves past the end of the array, it wraps around to the head again, so array space is recycled

Origin: blog.csdn.net/vitaviva/article/details/127175839