Sequences in Kotlin

Sequences are Kotlin's counterpart to Java 8's Stream. As the previous article showed, Kotlin defines many functional APIs for operating on collections, and these functions are also available on sequences. The source code of those collection APIs shows that they create many intermediate collections: each operator allocates memory to store its intermediate result. Sequences do not need these intermediate collections, so for large data sets and long operation chains they can perform better than collection operations.

1. Why Sequences are needed

In Kotlin we usually work with data sets as collections and apply functional operators to them directly. We rarely convert a data set into a sequence first, because the data we usually handle is small, and at that scale there is no noticeable difference between collections and sequences. The following example shows where sequences make a difference.

// Without sequences: ordinary collection operations
fun computeRunTime(action: (() -> Unit)?) {
    val startTime = System.currentTimeMillis()
    action?.invoke()
    println("the code run time is ${System.currentTimeMillis() - startTime}")
}

fun main(args: Array<String>) = computeRunTime {
    (0..10000000)
        .map { it + 1 }
        .filter { it % 2 == 0 }
        .count { it < 10 }
        .run {
            println("by using list way, result is : $this")
        }
}

Output:

by using list way, result is : 4
the code run time is 3173

// Converted to a sequence: sequence operations
fun computeRunTime(action: (() -> Unit)?) {
    val startTime = System.currentTimeMillis()
    action?.invoke()
    println("the code run time is ${System.currentTimeMillis() - startTime}")
}

fun main(args: Array<String>) = computeRunTime {
    (0..10000000)
        .asSequence()
        .map { it + 1 }
        .filter { it % 2 == 0 }
        .count { it < 10 }
        .run {
            println("by using sequences way, result is : $this")
        }
}

Output:

by using sequences way, result is : 4
the code run time is 47

The two programs do the same work, yet the collection version took 3173 ms and the sequence version 47 ms. For large data sets the performance gap between the two approaches can be very large, which is why sequences are worth knowing about.

2. What is Sequences

Sequence operations are also called lazy collection operations, and the power of the Sequence interface lies in how its operations are implemented. Elements of a sequence are evaluated lazily, so chained operations (mapping, filtering, transformation, etc.) can be applied to the elements more efficiently: unlike an ordinary collection, a sequence does not allocate new memory to store the intermediate result of every operation. In practice, most data processing only cares about the final result, not the intermediate steps.

Sequences are another option for manipulating data sets in Kotlin, very similar to the Stream introduced in Java 8. In Java 8 we can convert a data set into a Stream and then operate on it (mapping, filtering, transformation, etc.). Sequences can be seen as a tool for optimizing collection processing in certain scenarios. They are not meant to replace collections; they complement them.

Sequence operations are divided into two categories:

1. Intermediate operations

Intermediate operations on a sequence are always lazy. An intermediate operation returns another sequence, and that new sequence knows internally how to transform the elements of the original one. What does it mean for intermediate operations to be lazy? Consider this example:

fun main(args: Array<String>) {
    (0..6)
        .asSequence()
        .map { // map returns a Sequence<T>, so it is an intermediate operation
            println("map: $it")
            return@map it + 1
        }
        .filter { // filter returns a Sequence<T>, so it is an intermediate operation
            println("filter: $it")
            return@filter it % 2 == 0
        }
}

Output (nothing from map or filter is printed):

Process finished with exit code 0

This example contains only intermediate operations and no terminal operation. The output shows that neither map nor filter printed anything, which means both operations were deferred: their lambdas run only when a result is requested, that is, when a terminal operation is called.
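To observe the laziness directly, here is a minimal sketch that records every lambda call in a list instead of printing. The function name lazinessDemo and the log list are illustrative, not from the original. The log should still be empty after the pipeline is declared, and fill up only once the terminal operation toList() runs:

```kotlin
// Hypothetical helper: returns the log before and after the terminal operation.
fun lazinessDemo(): Pair<List<String>, List<String>> {
    val log = mutableListOf<String>()
    val pipeline = (0..2).asSequence()
        .map { log += "map: $it"; it + 1 }            // intermediate: deferred
        .filter { log += "filter: $it"; it % 2 == 0 } // intermediate: deferred

    // Only intermediate operations have been declared, so nothing has run yet.
    val beforeTerminal = log.toList()

    // toList() is a terminal operation: it drives every element through map and filter.
    pipeline.toList()
    return beforeTerminal to log.toList()
}
```

Calling lazinessDemo() returns an empty first list, and a second list in which the map and filter entries alternate element by element.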

2. Terminal operations

A terminal operation on a sequence triggers all of the deferred computations of the preceding intermediate operations. A terminal operation returns a result, which can be a collection, a number, or any other object derived from the elements. Here is the previous example with a terminal operation added:

fun main(args: Array<String>) {
    (0..6)
        .asSequence()
        .map { // map returns a Sequence<T>, so it is an intermediate operation
            println("map: $it")
            return@map it + 1
        }
        .filter { // filter returns a Sequence<T>, so it is an intermediate operation
            println("filter: $it")
            return@filter it % 2 == 0
        }
        .count { // count returns an Int (a result), so it is a terminal operation
            it < 6
        }
        .run {
            println("result is $this")
        }
}

Output:

map: 0
filter: 1
map: 1
filter: 2
map: 2
filter: 3
map: 3
filter: 4
map: 4
filter: 5
map: 5
filter: 6
map: 6
filter: 7
result is 2

Process finished with exit code 0

Note: it is easy to tell whether an operator is intermediate or terminal: look at the return type of the operator function. If it returns a Sequence, it is an intermediate operation; if it returns a concrete result such as an Int, a Boolean, or any other object, it is a terminal operation.
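A small sketch using only the standard library that spells out these return types with explicit annotations (terminalResults is a hypothetical name). Every call that returns Sequence<Int> is intermediate; count, any, and toList return other types and are terminal:

```kotlin
// Hypothetical helper: the explicit types show which calls are intermediate.
fun terminalResults(): Triple<Int, Boolean, List<Int>> {
    val big: Sequence<Int> = listOf(1, 2, 3, 4, 5).asSequence()
        .map { it * 2 }      // returns Sequence<Int>: intermediate
        .filter { it > 4 }   // returns Sequence<Int>: intermediate

    val howMany: Int = big.count()              // returns Int: terminal
    val hasTen: Boolean = big.any { it == 10 }  // returns Boolean: terminal
    val asList: List<Int> = big.toList()        // returns List<Int>: terminal
    return Triple(howMany, hasTen, asList)
}
```

Here big is traversed three times; that works because a sequence created from a list can be iterated repeatedly, and each terminal operation reruns map and filter.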

3. How to create Sequences

The main ways to create a sequence are:

1. Use the Iterable extension function asSequence

// Declaration
public fun <T> Iterable<T>.asSequence(): Sequence<T> {
    return Sequence { this.iterator() }
}
// Usage
list.asSequence()
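Because asSequence only wraps a call to the collection's iterator(), it copies nothing, and each terminal operation asks the collection for a fresh iterator. A minimal sketch (asSequenceIsAView is a hypothetical name) makes this visible by modifying the list between two traversals:

```kotlin
// Hypothetical helper: shows that asSequence() wraps the list instead of copying it.
fun asSequenceIsAView(): Pair<List<Int>, List<Int>> {
    val list = mutableListOf(1, 2, 3)
    val seq = list.asSequence().map { it * 10 } // no copy, nothing evaluated yet

    val first = seq.toList()  // calls list.iterator() and runs map
    list += 4                 // modify the list between traversals
    val second = seq.toList() // calls list.iterator() again, so it sees the new element
    return first to second
}
```

The modification happens between traversals on purpose; changing an ArrayList while one of its iterators is in use would typically fail with ConcurrentModificationException instead.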

2. Use the generateSequence function to generate a sequence

// Declaration
@kotlin.internal.LowPriorityInOverloadResolution
public fun <T : Any> generateSequence(seed: T?, nextFunction: (T) -> T?): Sequence<T> =
    if (seed == null)
        EmptySequence
    else
        GeneratorSequence({ seed }, nextFunction)

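// Usage: seed is the first element of the sequence; nextFunction computes each next element from the previous one
val naturalNumbers = generateSequence(0) { it + 1 } // an infinite sequence of natural numbers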
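The naturalNumbers sequence above is infinite, so it must be bounded before any terminal operation that consumes every element. The sketch below (generateExamples is a hypothetical name) shows this, plus a finite sequence that ends when nextFunction returns null, as the declaration's (T) -> T? type allows:

```kotlin
// Hypothetical helper: two common ways of using generateSequence.
fun generateExamples(): Pair<List<Int>, List<Int>> {
    // Infinite sequence: safe only because take(5) bounds it before toList().
    val naturalNumbers = generateSequence(0) { it + 1 }
    val firstFive = naturalNumbers.take(5).toList()

    // Finite sequence: returning null from nextFunction ends it.
    val powersOfTwo = generateSequence(1) { if (it < 64) it * 2 else null }.toList()
    return firstFive to powersOfTwo
}
```

Calling naturalNumbers.toList() without take would never finish, because the generator never returns null.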

3. Use the Sequence extension function constrainOnce to create a sequence that can be used only once

// Declaration
public fun <T> Sequence<T>.constrainOnce(): Sequence<T> {
    // as? does not work in js
    //return this as? ConstrainedOnceSequence<T> ?: ConstrainedOnceSequence(this)
    return if (this is ConstrainedOnceSequence<T>) this else ConstrainedOnceSequence(this)
}
// Usage
val naturalNumbers = generateSequence(0) { it + 1 }
val naturalNumbersOnce = naturalNumbers.constrainOnce()

Note: such a sequence can be iterated only once; a second iteration throws IllegalStateException("This sequence can be consumed only once.").
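A minimal sketch of that behavior (constrainOnceDemo is a hypothetical name): the first traversal succeeds, and the second is caught as an IllegalStateException. Note that take(3) does not iterate by itself; each toList() call is what requests the underlying iterator.

```kotlin
// Hypothetical helper: iterate a constrained sequence twice and capture the failure.
fun constrainOnceDemo(): Pair<List<Int>, String?> {
    val once = generateSequence(0) { it + 1 }.constrainOnce()
    val firstPass = once.take(3).toList() // first iteration: fine

    val errorMessage = try {
        once.take(3).toList() // second iteration: throws
        null
    } catch (e: IllegalStateException) {
        e.message
    }
    return firstPass to errorMessage
}
```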

4. Performance comparison: sequence operations vs. collection operations

Comparing the performance of the two approaches across scenarios tells you when to use ordinary collection operations and when to use sequence operations.

With large data sets, sequences generally perform better than ordinary collections. With small data sets, sequences can be slower: their operators are not inlined and each one allocates a wrapper object, and that fixed overhead outweighs the savings from skipping intermediate collections.
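To try the comparison yourself, here is a rough sketch (compareOnce is a hypothetical name) that times the article's opening pipeline both ways with kotlin.system.measureNanoTime. A single run like this is noisy because of JIT warm-up and garbage collection, so treat the numbers only as an indication; a benchmark harness such as JMH gives more reliable results.

```kotlin
import kotlin.system.measureNanoTime

// Hypothetical helper: runs the same pipeline eagerly (List) and lazily (Sequence).
// Returns both results plus the elapsed nanoseconds for each version.
fun compareOnce(n: Int): Triple<Int, Int, Pair<Long, Long>> {
    var listResult = 0
    var seqResult = 0
    val listNanos = measureNanoTime {
        listResult = (0..n).map { it + 1 }.filter { it % 2 == 0 }.count { it < 10 }
    }
    val seqNanos = measureNanoTime {
        seqResult = (0..n).asSequence().map { it + 1 }.filter { it % 2 == 0 }.count { it < 10 }
    }
    return Triple(listResult, seqResult, listNanos to seqNanos)
}
```

Calling it with a small n (say 100) and a large n (say 10_000_000) lets you compare the two regimes described above; both versions always compute the same result.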

5. Principles of Sequences Performance Optimization

Having seen the performance comparison, let's look at how sequences achieve this internally, starting with an example:

fun main(args: Array<String>){
    (0..10)
        .asSequence()
        .map { it + 1 }
        .filter { it % 2 == 0 }
        .count { it < 6 }
        .run {
            println("by using sequence result is $this")
        }
}

1. Basic principle description

Sequence operations: the basic principle is lazy evaluation. Intermediate operations produce no intermediate results; evaluation happens only when the terminal operation runs. In the example above, each element of 0..10 goes through map and then immediately through filter, then the next element goes through map and filter, and so on. An ordinary collection instead applies map to all elements and stores the results, then applies filter to all of those stored elements and stores the results again.

Ordinary collection operations: every operation produces a new intermediate result. In the example above, map loops over the original data set once and stores the mapped elements in a new collection; filter then traverses that new collection and stores the elements that pass in yet another new collection.
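The difference in evaluation order can be made visible with a short sketch (evaluationOrder, eagerLog, and lazyLog are hypothetical names) that logs each lambda call for the same map/filter chain, once on a range directly and once through asSequence():

```kotlin
// Hypothetical helper: logs the order in which lambdas run in each style.
fun evaluationOrder(): Pair<List<String>, List<String>> {
    val eagerLog = mutableListOf<String>()
    (1..3)
        .map { eagerLog += "map $it"; it * 2 }       // runs for all elements first
        .filter { eagerLog += "filter $it"; it > 2 } // then runs over the new list

    val lazyLog = mutableListOf<String>()
    (1..3).asSequence()
        .map { lazyLog += "map $it"; it * 2 }
        .filter { lazyLog += "filter $it"; it > 2 }
        .toList() // terminal operation: each element goes through map, then filter

    return eagerLog to lazyLog
}
```

The eager log reads map 1, map 2, map 3, filter 2, filter 4, filter 6, while the lazy log interleaves them: map 1, filter 2, map 2, filter 4, map 3, filter 6.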

2. Comparing the two approaches

// Using a sequence
fun main(args: Array<String>){
    (0..100)
        .asSequence()
        .map { it + 1 }
        .filter { it % 2 == 0 }
        .find { it > 3 }
}
// Using an ordinary collection
fun main(args: Array<String>){
    (0..100)
        .map { it + 1 }
        .filter { it % 2 == 0 }
        .find { it > 3 }
}

Comparing the two versions, you will find that the sequence processes elements one at a time: elements that fail an earlier step are dropped before they reach the terminal operation find, and once find locates a matching element, the remaining elements are never processed at all. This is where the optimization comes from. With ordinary collection operations, every element passes through every operation, even though many of those operations are unnecessary for obtaining the result. A sequence can tell, element by element, whether the conditions are met and discard elements early, avoiding the cost of unnecessary work.
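A quick way to quantify the early exit is to count how often the map lambda runs in each version of the example above. The sketch below (mapCallCounts is a hypothetical name) does that for the same (0..100) pipeline:

```kotlin
// Hypothetical helper: counts how often the map lambda runs before find() succeeds.
fun mapCallCounts(): Pair<Int, Int> {
    var sequenceCalls = 0
    val fromSequence = (0..100).asSequence()
        .map { sequenceCalls++; it + 1 }
        .filter { it % 2 == 0 }
        .find { it > 3 }

    var listCalls = 0
    val fromList = (0..100)
        .map { listCalls++; it + 1 }
        .filter { it % 2 == 0 }
        .find { it > 3 }

    check(fromSequence == fromList) // both pipelines find the same element
    return sequenceCalls to listCalls
}
```

The sequence version maps only the elements 0 to 3 (4 calls) before find sees 4, whereas the collection version maps all 101 elements of the range before filter and find even start.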

6. Complete analysis of Sequences principle source code

Decompiling the example above shows that ordinary collection operations generate a separate while loop for each operator, and each loop creates a new collection to hold its intermediate result. Sequences do not: no matter how many intermediate operations are chained, they all draw their data from the same underlying iterator. How is this sharing achieved? Let's look at the decompiled code and the library source.

6.1 Decompiled code for ordinary collection operations

public static final void main(@NotNull String[] args) {
    Intrinsics.checkParameterIsNotNull(args, "args");
    byte var1 = 0;
    Iterable $receiver$iv = (Iterable)(new IntRange(var1, 100));
    // new collection to hold the intermediate result of map
    Collection destination$iv$iv = (Collection)(new ArrayList(CollectionsKt.collectionSizeOrDefault($receiver$iv, 10)));
    Iterator var4 = $receiver$iv.iterator();

    int it;
    // while loop generated for the map operator
    while(var4.hasNext()) {
        it = ((IntIterator)var4).nextInt();
        Integer var11 = it + 1;
        // add each mapped element to the new collection
        destination$iv$iv.add(var11);
    }

    $receiver$iv = (Iterable)((List)destination$iv$iv);
    // new collection to hold the intermediate result of filter
    destination$iv$iv = (Collection)(new ArrayList());
    var4 = $receiver$iv.iterator(); // iterator over the collection produced by map
    // while loop generated for the filter operator
    while(var4.hasNext()) {
        Object element$iv$iv = var4.next();
        it = ((Number)element$iv$iv).intValue();
        if (it % 2 == 0) {
            // add each element that passes the filter to the new collection
            destination$iv$iv.add(element$iv$iv);
        }
    }

    $receiver$iv = (Iterable)((List)destination$iv$iv);
    Iterator var13 = $receiver$iv.iterator(); // iterator over the collection produced by filter

    // while loop generated for the find operator: the terminal operation walks the
    // filtered collection and stops at the first matching element
    while(var13.hasNext()) {
        Object var14 = var13.next();
        it = ((Number)var14).intValue();
        if (it > 3) {
            break;
        }
    }
}

6.2 Decompiled code for lazy sequence operations

1. Decompiled code of the whole sequence version

 public static final void main(@NotNull String[] args) {
      Intrinsics.checkParameterIsNotNull(args, "args");
      byte var1 = 0;
      // map and filter are implemented by Sequence extension functions; the result is another Sequence object
      Sequence var7 = SequencesKt.filter(SequencesKt.map(CollectionsKt.asSequence((Iterable)(new IntRange(var1, 100))), (Function1)null.INSTANCE), (Function1)null.INSTANCE);
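      // Get the iterator of the sequence produced by the intermediate operations. map and filter each
      // create a new iterator object, but both pull their data from the original iterator, so the
      // terminal operation only needs to walk this one iterator and pick out matching elements.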
      Iterator var3 = var7.iterator();
      // while loop generated for the find operator: it walks this iterator and stops at the first match
      while(var3.hasNext()) {
         Object var4 = var3.next();
         int it = ((Number)var4).intValue();
         if (it > 3) {
            break;
         }
      }

   }

2. Extract the key line and dig deeper:

SequencesKt.filter(SequencesKt.map(CollectionsKt.asSequence((Iterable)(new IntRange(var1, 100))), (Function1)null.INSTANCE), (Function1)null.INSTANCE);

3. Split this line into three parts:

// Part 1
val collectionSequence = CollectionsKt.asSequence((Iterable)(new IntRange(var1, 100)))
// Part 2
val mapSequence = SequencesKt.map(collectionSequence, (Function1)null.INSTANCE)
// Part 3
val filterSequence = SequencesKt.filter(mapSequence, (Function1)null.INSTANCE)

4. Explain the first part of the code:

The first part is simple: it calls the Iterable extension function asSequence to wrap the original data set in a Sequence object:

public fun <T> Iterable<T>.asSequence(): Sequence<T> {
    return Sequence { this.iterator() } // pass in the iterator of the outer Iterable<T>
}

Going one level deeper, into the Sequence function:

@kotlin.internal.InlineOnly
public inline fun <T> Sequence(crossinline iterator: () -> Iterator<T>): Sequence<T> = object : Sequence<T> {
    override fun iterator(): Iterator<T> = iterator()
}

The lambda returns the iterator of the collection passed in from outside, and an object expression instantiates a Sequence. Sequence is an interface with a single abstract function, iterator(), which returns an iterator; here it simply returns that external iterator. In effect, this puts a Sequence shell around the iterator: the core is still the iterator of the original collection.
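The same Sequence { ... } function can be called directly, which is a convenient way to see the "shell around an iterator" idea in action. In this sketch (wrapIterator is a hypothetical name), the lambda is invoked on every traversal, so each traversal gets a fresh iterator from the list:

```kotlin
// Hypothetical helper: wraps a list's iterator in a Sequence by hand.
fun wrapIterator(): Pair<List<String>, List<String>> {
    val source = listOf("a", "b", "c")
    // Sequence { ... } creates an object whose iterator() just calls this lambda,
    // which asks the list for a new iterator each time.
    val wrapped: Sequence<String> = Sequence { source.iterator() }

    val plain = wrapped.toList()                     // first traversal
    val decorated = wrapped.map { it + "!" }.toList() // second traversal, fresh iterator
    return plain to decorated
}
```

If the lambda instead returned one iterator created outside of it, the second traversal would find that iterator already exhausted.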

5. Explain the second part of the code

With the first part, the ordinary collection has been converted into a sequence. The map operation is then performed by calling the Sequence extension function map:

val mapSequence = SequencesKt.map(collectionSequence, (Function1)null.INSTANCE)

Inside the map extension function:

public fun <T, R> Sequence<T>.map(transform: (T) -> R): Sequence<R> {
    return TransformingSequence(this, transform)
}

Internally it returns a TransformingSequence object. Its constructor takes the upstream Sequence and the transform lambda, and the object itself is again a Sequence. We will come back to this class later.

6. Explain the third part of the code:

After the second part, map has returned a sequence object, and finally filter is applied to it. filter is also an extension function of Sequence and likewise returns a Sequence object.

val filterSequence = SequencesKt.filter(mapSequence, (Function1)null.INSTANCE)

Inside the filter extension function:

public fun <T> Sequence<T>.filter(predicate: (T) -> Boolean): Sequence<T> {
    return FilteringSequence(this, true, predicate)
}

Internally it returns a FilteringSequence object. Its constructor takes the upstream Sequence, a boolean flag (true here, meaning elements that match the predicate are kept), and the predicate lambda; the object itself is again a Sequence. We will come back to this later.
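The article shows the source of TransformingSequence but not of FilteringSequence, which is internal to the standard library. The following is a hypothetical stand-in written in the same spirit, not the library's actual code: SimpleFilteringSequence and simpleFilter are made-up names. A filtering iterator is trickier than a mapping one because hasNext() must look ahead in the upstream iterator until it finds an element that passes the predicate:

```kotlin
// Hypothetical stand-in for the stdlib's internal FilteringSequence.
class SimpleFilteringSequence<T>(
    private val source: Sequence<T>,
    private val predicate: (T) -> Boolean
) : Sequence<T> {
    override fun iterator(): Iterator<T> = object : Iterator<T> {
        private val upstream = source.iterator() // data still comes from the upstream iterator
        private var nextItem: T? = null
        private var state = -1 // -1: not computed yet, 0: exhausted, 1: nextItem is ready

        // Pulls from upstream until an element passes the predicate (or upstream runs out).
        private fun advance() {
            while (upstream.hasNext()) {
                val candidate = upstream.next()
                if (predicate(candidate)) {
                    nextItem = candidate
                    state = 1
                    return
                }
            }
            state = 0
        }

        override fun hasNext(): Boolean {
            if (state == -1) advance()
            return state == 1
        }

        override fun next(): T {
            if (state == -1) advance()
            if (state == 0) throw NoSuchElementException()
            @Suppress("UNCHECKED_CAST")
            val result = nextItem as T
            nextItem = null
            state = -1 // force a new look-ahead on the next call
            return result
        }
    }
}

// Hypothetical extension mirroring how filter() returns a new Sequence subclass.
fun <T> Sequence<T>.simpleFilter(predicate: (T) -> Boolean): Sequence<T> =
    SimpleFilteringSequence(this, predicate)
```

For example, (1..10).asSequence().simpleFilter { it % 3 == 0 }.toList() produces [3, 6, 9].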

7. Introduction to the overall structure of Sequences source code

Code structure: each operator has its own class, and all of these classes implement the Sequence interface.

First, Sequence is an interface with a single abstract function that returns an iterator, so it can be treated as a shell around an iterator:

public interface Sequence<out T> {
    /**
     * Returns an [Iterator] that returns the values from the sequence.
     *
     * Throws an exception if the sequence is constrained to be iterated once and `iterator` is invoked the second time.
     */
    public operator fun iterator(): Iterator<T>
}
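Since the interface has only that one function, any class that supplies an iterator can act as a sequence and immediately gains all the Sequence operators. A minimal hypothetical example (Countdown is a made-up class) that counts down from start to 1:

```kotlin
// Hypothetical example: a countdown written directly against the Sequence interface.
class Countdown(private val start: Int) : Sequence<Int> {
    override fun iterator(): Iterator<Int> = object : Iterator<Int> {
        private var current = start
        override fun hasNext(): Boolean = current > 0
        override fun next(): Int {
            if (!hasNext()) throw NoSuchElementException()
            return current-- // return the current value, then decrement
        }
    }
}
```

Countdown(3).toList() gives [3, 2, 1], and Countdown(5).filter { it % 2 == 0 }.toList() gives [4, 2]. Because iterator() is an operator function, a for loop such as for (n in Countdown(3)) also works.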

[Figure: UML class diagram of the core Sequence classes; only a few commonly used operators are shown]

Note: sharing the data of one iterator is achieved through object-oriented polymorphism, with each operator wrapping the previous sequence in a decorator-like chain. First, the iterator returned by Iterable's iterator() is used to instantiate a Sequence. Each operator called afterwards corresponds to an extension function, which returns an instance of a Sequence subclass specific to that operation. Each subclass overrides the abstract iterator() function to return a new iterator object that implements its operation, but the elements it iterates over still come from the original iterator.

8. Back to TransformingSequence and FilteringSequence

With the overall structure in mind, TransformingSequence and FilteringSequence are easy to analyze. Take TransformingSequence as an example:

// Implements Sequence<R> and overrides iterator() with its own iterator implementation
internal class TransformingSequence<T, R>
constructor(private val sequence: Sequence<T>, private val transformer: (T) -> R) : Sequence<R> {
    override fun iterator(): Iterator<R> = object : Iterator<R> { // a new iterator that transforms the upstream iterator's data
        val iterator = sequence.iterator() // iterator of the upstream Sequence
        override fun next(): R {
            return transformer(iterator.next()) // transform the upstream element; the data still comes from the same upstream iterator
        }

        override fun hasNext(): Boolean {
            return iterator.hasNext()
        }
    }

    internal fun <E> flatten(iterator: (R) -> Iterator<E>): Sequence<E> {
        return FlatteningSequence<T, R, E>(sequence, transformer, iterator)
    }
}

9. Source code analysis summary

Internally, sequences work by wrapping: each operator's extension function instantiates a corresponding Sequence subclass, and each subclass overrides the iterator() method of the Sequence interface. Its iterator transforms, filters, or merges the elements of the upstream iterator and exposes the result as a new iterator. This explains why a sequence applies all operations element by element, rather than applying operation A to every element of the collection and then operation B to every element. When the terminal operation pulls an element from the last iterator in the chain (say C), C pulls from B and B pulls from A, so each element flows through A, B, and C in turn. Iteration continues until the original data is exhausted, or stops early once the terminal operation has what it needs (as find does).
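The pull chain can be observed by instrumenting the source iterator. In this sketch (pullsFromSource is a hypothetical name), a hand-built Sequence counts every element taken from the original data while the article's map/filter/find pipeline runs on top of it:

```kotlin
// Hypothetical helper: counts how many elements are pulled from the source iterator.
fun pullsFromSource(): Pair<Int?, Int> {
    val data = (0..100).toList()
    var pulled = 0
    val source: Sequence<Int> = Sequence {
        val inner = data.iterator()
        object : Iterator<Int> {
            override fun hasNext(): Boolean = inner.hasNext()
            override fun next(): Int {
                pulled++ // one element taken from the original data
                return inner.next()
            }
        }
    }
    val found = source
        .map { it + 1 }         // operation A
        .filter { it % 2 == 0 } // operation B
        .find { it > 3 }        // terminal operation: stops at the first match
    return found to pulled
}
```

Only 4 of the 101 source elements are ever pulled: find stops as soon as the element 3 has been mapped to 4 and has passed the filter, so the rest of the list is never touched.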

7. Summary

This concludes our look at Kotlin sequences. We have examined them from several angles: why they are needed, how to use them, and how they work internally. Hopefully this gives you a solid understanding of Kotlin sequences.

 

Source: blog.csdn.net/PrisonJoker/article/details/114055543