In-depth analysis of Gradle - Task principle (Graph)

Are you curious about how Gradle handles the dependencies between tasks? There are many ways to create tasks and many ways to establish dependencies between them. How does Gradle determine the final execution order of the tasks? Let's explore it below.

Author: Near-Earth Asteroid
Link: https://juejin.cn/post/7241492186919239717

First, a picture to illustrate the concept of a task

51d3d17689809932b0f9d38619e0f01b.jpeg

Creation

Task creation

First, a picture to help understanding

2b81ff5b1988bd11518056dde0176249.jpeg

The creation of tasks can be mainly divided into two ways

  1. create

  2. register

create creates the task immediately.
register only registers a task provider (this concept is explained later); the task instance is not created at that point. This is the task creation method currently recommended by Gradle, and task creation in the official plugins has been changed to register.

In a build script or plugin, a task is added to the task container by calling tasks.create/register

tasks.create('hello') {
    doLast {
        println 'greeting'
    }
}


tasks.register('hello') {
    doLast {
        println 'greeting'
    }
}

TaskContainer

Gradle creates a Project object for each project. The task-related methods used in a build script are dispatched to that Project object, and Project delegates the handling of tasks to its TaskContainer, which can be understood simply as a container that stores tasks

3631b64c4fbac62e840a961fcce93c48.jpeg

From the two signatures you can see that the configureClosure parameter of create is a groovy.lang.Closure, while register takes an Action. The two coexist because of the heavy use of Groovy in the early days. The former converts the closure into an action through ConfigureUtil.configureUsing(configureClosure)

TaskContainer can be simply divided into two parts: a map and a pendingMap. A task created by create is added to the map. A task provider registered by register is placed in pendingMap. When the task of a provider in pendingMap is created, the provider adds the task to the map and removes itself from pendingMap

The final task instance is created by reflection. If no task type is specified, a DefaultTask is generated by default. Constructor parameters can be passed in at create/register time, or values can be passed through the configuration action
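The interplay between map and pendingMap can be sketched in plain Java. This is only an illustration under stated assumptions: SketchTaskContainer, SketchTask and SketchTaskProvider are hypothetical names, not Gradle classes, and the real container handles far more (types, reflection, rules).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical names throughout; this is not Gradle's TaskContainer.
class SketchTaskContainer {
    // Stand-in for a task: a name plus one configurable field.
    static class SketchTask {
        final String name;
        String description = "";
        SketchTask(String name) { this.name = name; }
    }

    // Lazy handle returned by register(); creates and configures on first get().
    class SketchTaskProvider implements Supplier<SketchTask> {
        private final String name;
        private final Consumer<SketchTask> configure;

        SketchTaskProvider(String name, Consumer<SketchTask> configure) {
            this.name = name;
            this.configure = configure;
        }

        @Override
        public SketchTask get() {
            SketchTask existing = map.get(name);
            if (existing != null) {
                return existing;          // already realized
            }
            SketchTask task = new SketchTask(name);
            configure.accept(task);       // configuration is deferred until now
            map.put(name, task);          // the provider adds the task to the map
            pendingMap.remove(name);      // and removes itself from pendingMap
            return task;
        }
    }

    final Map<String, SketchTask> map = new LinkedHashMap<>();
    final Map<String, SketchTaskProvider> pendingMap = new LinkedHashMap<>();

    // Eager: create and configure immediately.
    SketchTask create(String name, Consumer<SketchTask> configure) {
        SketchTask task = new SketchTask(name);
        configure.accept(task);
        map.put(name, task);
        return task;
    }

    // Lazy: only record a provider.
    SketchTaskProvider register(String name, Consumer<SketchTask> configure) {
        SketchTaskProvider provider = new SketchTaskProvider(name, configure);
        pendingMap.put(name, provider);
        return provider;
    }

    public static void main(String[] args) {
        SketchTaskContainer tasks = new SketchTaskContainer();
        tasks.create("eager", t -> t.description = "configured at once");
        SketchTaskProvider lazy = tasks.register("lazy", t -> t.description = "configured on demand");
        System.out.println(tasks.map.keySet() + " " + tasks.pendingMap.keySet());
        lazy.get();
        System.out.println(tasks.map.keySet() + " " + tasks.pendingMap.keySet());
    }
}
```

Before lazy.get() is called, the "lazy" entry lives only in pendingMap and its configuration lambda has not run; afterwards it has moved into map.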

lazy loading

The difference between create and register

Put simply, create creates tasks eagerly, while register creates them lazily.
A Gradle build has 3 stages: initialization, configuration, and execution. No matter which task is run, the configuration stage always takes place, and the build script is executed during this stage.


With create, the task is created immediately, which hides a problem: a created task may never be run. For example, when we only want to run the compileJava task, evaluating the build script still creates all the other tasks, such as test. register avoids this problem: the task is not created immediately, but only when it is needed.

You may still have doubts at this point. create creates the Task and register creates a Task Provider, but most tasks do nothing extra in their constructors, so what is the advantage of register?

In fact, compared with create, register delays not only the creation of the task itself but also the execution of its configuration action. With create, the task is configured immediately after it is created. A task registered with register is created only when it is needed, and it is configured at that time.

The official name for this is task configuration avoidance: avoiding unnecessary task creation and configuration, for example by using register instead of create and named instead of getByName.

The ideal time to create a task is during task graph calculation. A build scan provides visual data that helps locate tasks that are created too early

You can refer to the official document task_configuration_avoidance

Lazy Properties

In addition to lazy creation of the task itself, the properties of a task can also be lazy. Lazy properties mainly solve the problem that, when a task is configured, the values of some properties may not be available yet: they may require expensive computation or depend on the results of other tasks. As a build grows more complex, maintaining these dependencies by hand becomes complicated. Once these properties are lazy, they are not evaluated immediately; their values are computed only when needed, which reduces the maintenance cost of build scripts

Lazy properties are represented by 2 types

  • Provider

  • Property

The difference is that the value of a Property is mutable, while the value of a Provider is immutable. Property is in fact a subtype of Provider.
The Task Provider returned by the register method is also a subtype of Provider.

Property has get/set methods to get and set its value.
Provider only has get to read the value
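The relationship between the two types can be sketched in plain Java. This is a minimal illustration, not Gradle's implementation: SketchProvider and SketchProperty are hypothetical names, and the real Property API has many more methods.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical, heavily simplified stand-ins for Gradle's Provider and Property.
interface SketchProvider<T> {
    T get();  // read-only: the value is computed when this is called

    // Derive a new lazy value without evaluating this one yet.
    default <R> SketchProvider<R> map(Function<? super T, ? extends R> transform) {
        return () -> transform.apply(get());
    }
}

class SketchProperty<T> implements SketchProvider<T> {
    private Supplier<? extends T> source;  // null until a value is assigned

    void set(T value) {
        this.source = () -> value;
    }

    // Wire this property to another lazy value; nothing is evaluated here.
    void set(SketchProvider<? extends T> provider) {
        this.source = provider::get;
    }

    @Override
    public T get() {
        if (source == null) {
            throw new IllegalStateException("property has no value");
        }
        return source.get();
    }
}

class LazyPropertyDemo {
    public static void main(String[] args) {
        SketchProperty<String> extensionClasspath = new SketchProperty<>();
        SketchProperty<String> taskClasspath = new SketchProperty<>();

        taskClasspath.set(extensionClasspath);    // wired before a value exists
        extensionClasspath.set("src/main/java");  // assigned later

        System.out.println(taskClasspath.get());
        System.out.println(taskClasspath.map(String::length).get());
    }
}
```

Because the wiring stores a reference to the other property rather than its current value, the order of assignment does not matter as long as a value is present by the time get() is called.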

Properties can also be set via an Extension

interface CompileExtension {
    Property<String> getClasspath()
}


abstract class Compile extends DefaultTask {  
    @Input  
    abstract Property<String> getJdkVersion()
    @Input  
    abstract Property<String> getClasspath()
}  


project.extensions.create('compile', CompileExtension)
def a = tasks.register('a', Compile) {  
    classpath = compile.classpath
    jdkVersion = '11'
    doLast {  
        println classpath.get()
        println jdkVersion.get()
    }
}


compile {  
    classpath = 'src/main/java'  
}

./gradlew a
output
src/main/java
11

The generic Property is not used for every type; files and collections are special and have their own dedicated Property types.
File properties also distinguish between files and directories

RegularFileProperty
DirectoryProperty

ListProperty
SetProperty
MapProperty

If a property is used incorrectly, Gradle reports an error. For example, if a RegularFileProperty is set to a directory, or the file does not exist, a corresponding error message is shown

A Property must be marked with an input/output annotation (such as @Input in the code above), otherwise an error is reported. The annotation is also related to task dependencies and task up-to-date checks. The role of inputs/outputs properties in dependency processing is introduced below

A Property does not need to be initialized manually. As the abstract getters in the example above show, Gradle creates it when it creates the task instance, so we only need to assign a value when using it. A value must be assigned during configuration, otherwise an error is reported, unless the property is marked with @Optional to indicate that it is not required

For more information, please refer to the official document lazy_configuration

NamedDomainObjectCollection

TaskContainer implements the NamedDomainObjectCollection interface. This concept is worth mentioning because it appears in many places in Gradle; tasks, for example, is a NamedDomainObjectCollection.
It can be understood intuitively from the name:
Named: the elements are named
Domain: used for a certain domain
ObjectCollection: a collection of objects

NamedDomainObjectCollection implements the Java Collection interface.
Because its elements are named, it can be regarded as a simple Map, and the actual logic is indeed handed over to a map in the end.
It also has a namer method that must be overridden. This function is used to name the added elements
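The idea of "a Collection that is backed by a map and asks a namer for each element's key" can be sketched as follows. SketchNamedCollection and findByName are hypothetical names chosen for this illustration; the sketch does not reproduce Gradle's actual class.

```java
import java.util.AbstractCollection;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not Gradle's NamedDomainObjectCollection.
class SketchNamedCollection<T> extends AbstractCollection<T> {
    private final Map<String, T> byName = new LinkedHashMap<>();
    private final Function<? super T, String> namer;

    // The namer decides under which name an added element is stored.
    SketchNamedCollection(Function<? super T, String> namer) {
        this.namer = namer;
    }

    @Override
    public boolean add(T element) {
        // putIfAbsent returns null when the name was still free
        return byName.putIfAbsent(namer.apply(element), element) == null;
    }

    T findByName(String name) {
        return byName.get(name);
    }

    @Override
    public Iterator<T> iterator() {
        return byName.values().iterator();
    }

    @Override
    public int size() {
        return byName.size();
    }

    public static void main(String[] args) {
        SketchNamedCollection<String> tasks = new SketchNamedCollection<>(s -> s.toLowerCase());
        tasks.add("Compile");
        tasks.add("Test");
        System.out.println(tasks.findByName("compile"));
        System.out.println(tasks.add("COMPILE"));  // name already taken
    }
}
```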

Task Graph

overall process

After the build script has been executed, the creation and registration of tasks is complete and all tasks have been added to the Project's TaskContainer. After that, the directed acyclic graph of all tasks to be executed is constructed. This graph is built from the input given when we run the gradle command. For example, in ./gradlew build, build is the entry task. There can be multiple entry tasks

ExecutionPlan is a container that stores tasks, and all tasks to be executed are added to it. Adding the entry tasks triggers the discovery of task dependencies, which runs in a loop until all task dependencies are resolved

The topological sort computed from the entry tasks after that determines the final execution plan

This involves 2 main jobs

  1. Resolving task dependencies

  2. Determining the task execution order

The following figure is an example of running ./gradlew D

536b26d7453d773780d0bf5fcdd0024f.jpeg

Take D as the entry task
D depends on C
C depends on B and A
B depends on A

The entire execution happens in the order A -> B -> C -> D

Task Relationship

Before discussing dependency processing in detail, we need to know the ways in which a dependency can be established

There are several ways to establish associations between tasks

task inputs dependencies
dependsOn
finalizedBy
mustRunAfter
shouldRunAfter

dependsOn is the most common, so it is not covered here. The other methods are briefly introduced below

Task inputs

  • property method

abstract class A extends DefaultTask {
    @OutputFile
    abstract RegularFileProperty getOutputFile()
}


def a = tasks.register('a', A) {
  // the path is relative to the build directory, so this resolves to build/a
  outputFile = layout.buildDirectory.file('a')
}


tasks.register('b') {
  inputs.property('a.outputFile', a.flatMap { it.outputFile })
  doLast {
    println inputs.properties['a.outputFile']
  }
}

task b establishes a dependency on task a through the property

  • files method

def a = tasks.register('a') {
  outputs.files('build/a')
}


tasks.register('b') {
  inputs.files(a)
}

The inputs of task b establish a dependency on the outputs of task a

finalizedBy

As the name implies, a task referenced by finalizedBy runs after the task it finalizes, for example

def c = tasks.register('c')


tasks.register('d') {
  finalizedBy c
}

Running ./gradlew d executes d first, and then executes c

mustRunAfter/shouldRunAfter

Compared with the other types, mustRunAfter and shouldRunAfter are weaker. They are not really dependencies; they only set the execution order. A task referenced through these two methods is not executed if it is not already in the task graph.

def c = tasks.register('c')


tasks.register('d') {
  mustRunAfter c
}

For example, running ./gradlew d only executes task d and does not execute c. Running ./gradlew d c executes c first, and then executes d

mustRunAfter/shouldRunAfter only set the ordering of task execution and do not add a strong dependency to the task. shouldRunAfter is weaker than mustRunAfter: its ordering may not be honored, for example in parallel mode or when honoring it would create a cycle between tasks

Each relationship has its own TaskDependency, which is essentially a container for storing dependencies. Calling one of the methods above adds elements to the corresponding container, and dependencies in the same container are ordered by name

The type of a dependency is not restricted. For example, dependsOn accepts a string (a task name), a Task instance returned by create, or a Task Provider returned by register. In other words, the elements stored in a TaskDependency container are heterogeneous. Next, let's see how Gradle resolves these dependencies

Task Dependency Resolve

ExecutionPlan

ExecutionPlan is the entry point for processing the whole Task Graph. Both resolving task dependencies and determining the topological execution order are handled by it

First use an overall flow chart to help understand

3c7d0e43790d2d462bff5fe5c406c1d2.jpeg

After the entry tasks are added to the ExecutionPlan, the discovery of task dependencies is triggered. This corresponds to DefaultExecutionPlan.discoverNodeRelationships

DefaultExecutionPlan
The following code has been trimmed and modified; only the general logic is kept

public void addEntryTasks(Collection<? extends Task> tasks) {
  // Abridged: in the real code each Task is first wrapped in a Node
  LinkedList<Node> queue = new LinkedList<>(tasks);
  discoverNodeRelationships(queue);
}


private void discoverNodeRelationships(LinkedList<Node> queue) {
  Set<Node> visiting = new HashSet<>();
  while (!queue.isEmpty()) {
    Node node = queue.getFirst();
    if (visiting.add(node)) {
      // First visit: resolve the node's dependencies and queue them ahead of it
      node.resolveDependencies(dependencyResolver);
      for (Node successor : node.getDependencySuccessors()) {
          if (!visiting.contains(successor)) {
              queue.addFirst(successor);
          }
      }
    } else {
      // Second visit: the dependencies are done, now handle the finalizers
      queue.removeFirst();
      visiting.remove(node);
      for (Node finalizer : node.getFinalizers()) {
          // finalizers is a field of the plan (declaration omitted)
          finalizers.add(finalizer);
          if (!visiting.contains(finalizer)) {
              queue.addFirst(finalizer);
          }
      }
    }
  }
}

Overall this is a DFS. A node's DependencySuccessors are the dependencies established through inputs and dependsOn, as introduced in Task Relationship above. After all the dependencies of a node have been processed, its own finalizer tasks are added

The dependencies of a task are stored in multiple TaskDependency containers. Resolving a task's dependencies means traversing these TaskDependency containers. The entry point of this logic is in LocalTaskNode: starting from the entry task, the entire dependency graph is processed, as shown in the figure below (abridged)

LocalTaskNode is a Node that wraps a task. There are many types of Node, and the algorithm here works for all of them

1a5e0364704d5aa65cbd2d5da0c8714a.jpeg

Task dependencies are resolved by TaskDependencyResolver, which finally hands the processing of dependencies over to CachingDirectedGraphWalker

CachingDirectedGraphWalker

It uses a variant of Tarjan's strongly connected components algorithm and offers 2 functions

  • findValues: find the nodes reachable from the start node

  • findCycles: find the cycles that exist in a graph

Readers familiar with Tarjan's strongly connected components algorithm (see Wikipedia) will know that it can be used to find cycles in a graph. Strong connectivity means that two nodes can each reach the other, which cannot happen in a directed acyclic graph. The algorithm has therefore been modified so that it can also find the nodes a node depends on.

Currently findValues is used to find the dependency nodes. In fact, this step does not determine all of a task's dependencies including the indirect ones; it only determines the direct dependencies of the start node, that is, of the task.
Taking the example above again: starting from D, only C is found; then starting from C, only B and A are found. This is not because the class is unable to search all dependencies in one pass; it is a consequence of the graph implementation passed in here. It is unclear whether this is intentional, but it produces a large number of intermediate nodes, which wastes space in the cache

In addition, as the Caching in the name suggests, it has a caching function: when a node that has already been explored is explored again, the cached result is reused directly
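For readers who have not seen it, the textbook form of Tarjan's algorithm, used here only to find cycles, looks roughly like this. It is a plain recursive version on string-named nodes, not the variant inside CachingDirectedGraphWalker, and TarjanCycleFinder is a hypothetical name. An edge x -> y means "x depends on y".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Textbook Tarjan SCC; hypothetical class, not Gradle's graph walker.
class TarjanCycleFinder {
    private final Map<String, List<String>> graph;
    private final Map<String, Integer> index = new HashMap<>();
    private final Map<String, Integer> lowLink = new HashMap<>();
    private final Deque<String> stack = new ArrayDeque<>();
    private final Set<String> onStack = new HashSet<>();
    private final List<List<String>> cycles = new ArrayList<>();
    private int counter = 0;

    TarjanCycleFinder(Map<String, List<String>> graph) {
        this.graph = graph;
    }

    // Returns every strongly connected component that contains a cycle.
    List<List<String>> findCycles() {
        for (String node : graph.keySet()) {
            if (!index.containsKey(node)) {
                visit(node);
            }
        }
        return cycles;
    }

    private void visit(String node) {
        index.put(node, counter);
        lowLink.put(node, counter);
        counter++;
        stack.push(node);
        onStack.add(node);

        boolean selfLoop = false;
        for (String next : graph.getOrDefault(node, Collections.emptyList())) {
            if (next.equals(node)) {
                selfLoop = true;
            }
            if (!index.containsKey(next)) {
                visit(next);
                lowLink.put(node, Math.min(lowLink.get(node), lowLink.get(next)));
            } else if (onStack.contains(next)) {
                lowLink.put(node, Math.min(lowLink.get(node), index.get(next)));
            }
        }

        // node is the root of a component: pop the whole component off the stack.
        if (lowLink.get(node).equals(index.get(node))) {
            List<String> component = new ArrayList<>();
            String member;
            do {
                member = stack.pop();
                onStack.remove(member);
                component.add(member);
            } while (!member.equals(node));
            // A single node without a self-loop is not a cycle.
            if (component.size() > 1 || selfLoop) {
                cycles.add(component);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("D", Arrays.asList("C"));
        graph.put("C", Arrays.asList("B", "A"));
        graph.put("B", Arrays.asList("A"));
        System.out.println(new TarjanCycleFinder(graph).findCycles());
        graph.put("A", Arrays.asList("D"));  // introduce a cycle
        System.out.println(new TarjanCycleFinder(graph).findCycles());
    }
}
```

On the acyclic A/B/C/D example every component has size one, so nothing is reported; once A is made to depend on D, all four tasks end up in one component.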

During the search, CachingDirectedGraphWalker calls graph.getNodeValues to get the nodes

7eb5ee1118fbfdcbfb651d94dac08099.jpeg

getNodeValues has 3 parameters: node is the current node, values holds the values corresponding to the node, and connectedNodes holds the associated nodes. For example, if task d depends on task c, then task c is one of task d's connectedNodes

TaskGraphImpl implements the DirectedGraph interface. It is mainly responsible for 2 things

  1. Calling DefaultTaskDependency.visitDependencies to resolve task dependencies

  2. Calling WorkDependencyResolver to convert each Task into a LocalTaskNode

The purpose of this step is to establish the dependency graph of the tasks; it does not determine their execution order

Dependency resolve

visitDependencies

The Visitor design pattern is used here. Many objects implement the TaskDependencyContainer interface, and most of them act as containers. The advantage of the Visitor pattern is that functionality can be added without modifying the implementation of these classes: the visitor traverses them and handles the logic internally.

There are many types of task dependencies; here are the main cases

  • Task

A task created with create, used as a dependency

def a = tasks.create('a')


tasks.register('b') {
  dependsOn a
}

  • Provider

A task created with register, used as a dependency; register returns a Task Provider object

def a = tasks.register('a')


tasks.register('b') {
  dependsOn a
}

  • TaskDependencyContainer

Dependencies introduced by inputs

To follow this part, you first need to understand the concept of inputs

input analysis

concept

Generally speaking, a task has inputs and outputs. Inputs can be files or properties, while outputs are files

Gradle divides the input and output properties of a task into four categories

  • Simple values
    Basic types, strings, and other types that implement Serializable

  • Filesystem types
    File, or objects produced by Gradle file operations such as Project.file()

  • Dependency resolution results
    The results of dependency resolution, which are essentially files

  • Nested values
    Nested combinations of the types above

Take the compileJava task as an example. When compiling Java code there can be many inputs, for example the source files and the target JVM version; you can also specify the maximum memory available during compilation. The outputs are the class files

The properties of a custom task must be marked with annotations; if they are not, an error is reported at runtime. The properties here are JavaBeans properties, that is, public fields with getter/setter methods, which is different from the Property type used for lazy configuration above

Property analysis of a task also analyzes its parent classes, but some methods, such as those inherited from DefaultTask or Object, are not analyzed

Effect

The annotations have two main purposes

  1. Dependency analysis related to inputs/outputs

  2. The up-to-date check of incremental builds

How to annotate attributes

Gradle provides many annotations

Input marks an ordinary input value
InputFiles marks a file-related input type
Nested marks a nested type
OutputFiles marks a file-related output type
Internal marks a property that is only used internally
...


And so on; see task_input_output_annotations for details.
The @Internal annotation deserves a further mention. Take the maximum memory available at compile time mentioned above: changes to the source files or the target JVM version affect the compiled class files, but the maximum memory available at compile time has no effect on the compilation result. A property like this, which is unrelated to inputs and outputs and does not affect the incremental build cache result, can be marked with @Internal.
This also shows that properties marked with annotations such as @Input and @InputFiles do affect the cache result.

For example

class SimpleTask extends DefaultTask {  
    @Input String inputString 
    @InputFiles File inputFiles  
    @OutputFiles Set<File> outputFiles   
    @Internal Object internal  
}

inputs/outputs come from 2 sources

  1. Annotated properties

  2. Values added by calling the inputs API

For example

abstract class Compile extends DefaultTask {  
    @Input  
    abstract Property<String> getClasspath()  
}


tasks.register('compile', Compile) {
  classpath = 'src/main' // 1. annotated property
  inputs.property('name', 'compile') // 2. property added through inputs
  inputs.files(project.files('libs')) // 3. files added through inputs
}

The difference between the two is that the annotation approach is more capable. The inputs API is a subset of it: it provides some of the capabilities of the annotations, such as @Input and @InputFiles, but has no counterpart for other annotations such as @Internal.

The annotation approach corresponds to AnnotatedProperties, and the inputs API corresponds to RegisteredProperties

How does Gradle analyze the dependencies created by inputs?

The concrete logic is handled by PropertyWalker, which uses the Visitor pattern to process each property.

7c9d577b7de9a3f70b362e13aed80cd7.jpeg

There are 2 sources, so each source has to be analyzed separately

AnnotatedProperties

To analyze annotated properties, the properties carrying annotations must first be parsed out. Gradle encapsulates the parsed data into metadata, which stores the property name, the type of the annotation, and the corresponding Method; the properties are validated at the same time.
Each annotation has a corresponding annotation handler. All handlers are stored in a map and looked up by annotation type. For example, the handler for @InputFiles verifies that the value returned by the property is a file-related type, and reports an error otherwise.
After the annotations are parsed, each property is traversed and visited. Each annotation is processed differently, so this is also delegated to its handler. For inputs there are mainly two kinds, ordinary properties and file properties, corresponding to the two methods of PropertyVisitor shown above

RegisteredProperties

Properties added through the inputs API are put into one of two containers depending on their kind: one stores file-related types and the other stores all other types. During visitor analysis, the two containers are analyzed separately.

How do different tasks establish associations through these properties? Let us start with a specific example

def e = tasks.register('e', CustomTask) {  
    inputs.property('prop1', a.flatMap { it.outputFile })  
    inputs.files(b)  
    prop2 = c.flatMap { it.outputFile }  
    prop3 = d.files  
}

Only part of the code is shown above. There are 5 tasks in total, and task e depends on each of tasks a, b, c and d. a, b and c are tasks created with register; d is a task created with create

  • prop1 depends on task a through inputs.property. What a.flatMap returns is a Provider that keeps the information about task a, and task a is itself a Provider. Gradle can obtain task a by calling the getter of the task property through reflection, and uses it as a dependency

  • inputs.files depends directly on task b. inputs.files(b) is actually a dependency on the output files of task b, which is handled in the same way as a FileCollection

  • prop2 depends on task c, and is handled in the same way as prop1

  • prop3 depends on task d. d.files returns a FileCollection, which also kept the information about task d when it was created

Because many kinds of objects can be added as dependencies and they differ a lot, Gradle uses the Visitor pattern: each object handles its own dependencies in its visit method, and the visitor finally collects all dependencies
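The division of labor described here can be sketched in plain Java. All names below (DependencyVisitor, DependencyContainer, TaskRef, ProviderRef, FilesRef, CollectingVisitor) are hypothetical and only mirror the idea; Gradle's own interfaces have a different shape.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// All names are hypothetical; Gradle's own interfaces have a different shape.
interface DependencyVisitor {
    void visitTask(String taskName);
}

interface DependencyContainer {
    void visitDependencies(DependencyVisitor visitor);
}

// A task instance (as returned by create) contributes itself.
class TaskRef implements DependencyContainer {
    private final String name;
    TaskRef(String name) { this.name = name; }

    @Override
    public void visitDependencies(DependencyVisitor visitor) {
        visitor.visitTask(name);
    }
}

// A provider (as returned by register) resolves its task only when visited.
class ProviderRef implements DependencyContainer {
    private final Supplier<TaskRef> provider;
    ProviderRef(Supplier<TaskRef> provider) { this.provider = provider; }

    @Override
    public void visitDependencies(DependencyVisitor visitor) {
        provider.get().visitDependencies(visitor);
    }
}

// A file collection reports the tasks that build its files.
class FilesRef implements DependencyContainer {
    private final List<DependencyContainer> builtBy;
    FilesRef(DependencyContainer... builtBy) { this.builtBy = Arrays.asList(builtBy); }

    @Override
    public void visitDependencies(DependencyVisitor visitor) {
        for (DependencyContainer producer : builtBy) {
            producer.visitDependencies(visitor);
        }
    }
}

// The visitor only collects; it does not need to know the container types.
class CollectingVisitor implements DependencyVisitor {
    final Set<String> tasks = new LinkedHashSet<>();

    @Override
    public void visitTask(String taskName) {
        tasks.add(taskName);
    }
}

class VisitorDemo {
    public static void main(String[] args) {
        List<DependencyContainer> declared = Arrays.asList(
            new TaskRef("a"),
            new ProviderRef(() -> new TaskRef("b")),
            new FilesRef(new TaskRef("c"), new TaskRef("a")));
        CollectingVisitor visitor = new CollectingVisitor();
        for (DependencyContainer dependency : declared) {
            dependency.visitDependencies(visitor);
        }
        System.out.println(visitor.tasks);
    }
}
```

Adding a new kind of dependency source in this sketch means adding one more DependencyContainer implementation; the collecting visitor stays unchanged.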

The property analysis logic finally converges in PropertyVisitor, and TaskInputs adds these dependencies to connectedNodes so that the graph search can continue

Only inputs are explained here. The actual property processing and the logic related to incremental builds will be explained in a later article on caching.

After a task's dependencies are resolved, they are stored in several containers: dependencyNodes and dependentNodes hold the tasks that this task depends on and the tasks that depend on this task, respectively; mustRunAfter, shouldRunAfter and so on are also stored in separate containers

Task dependency caused by Project dependency

There is also a special case of inputs dependencies: the dependency relationship between projects. Suppose there are 2 projects, libA and libB, and libB depends on libA

libA/build.gradle

plugins {  
    id 'java'  
}

libB/build.gradle

plugins {  
    id 'java'  
}


dependencies {
  implementation(project(':libA'))
}

Through the dependencies block, the 2 projects establish a dependency relationship. When ./gradlew libB:compileJava is run, the libA:jar task is executed first. How is this done?

In other words, because of implementation(project(':libA')), libB:compileJava has a dependency on libA:jar

libA applies the java plugin, and the java plugin associates a PublishArtifact with the Jar task. The PublishArtifact becomes part of a libA Configuration; put simply, libA's output product, the PublishArtifact, is generated by the Jar task. (Configuration is a Gradle dependency concept that will be explained in detail in a later article on dependency processing; here it can be understood simply as a bunch of files.)

The CompileJava task has a classpath property. When libB declares project(':libA'), the classpath of libB's compileJava picks up a dependency on libA. The classpath is part of the task inputs and corresponds to a bunch of files, some of which come from libA's output product. When the task's dependencies are processed, libA's PublishArtifact is found through the Configuration, and a dependency on libA's Jar task then follows naturally. In essence, the dependency is established through the processing of TaskInput.

Order of execution

The dependency graph of tasks is the Task Graph, which under normal circumstances is a directed acyclic graph (DAG). Once it has been resolved, the topological order of the Task Graph can be computed, which gives the final execution order

A topological order essentially arranges the vertices of a DAG into a linear sequence that respects the direction of its edges

If there are cycles in the graph, computing the topological order fails. In that case CachingDirectedGraphWalker is called, that is, Tarjan's strongly connected components algorithm is used to find the cycle. The purpose is to report an error message so that users can see directly which tasks depend on each other and fix them. Incidentally, using a strongly connected components algorithm to search for cycles and improve error messages has many other uses in compilers. For example, if a normal inheritance relationship is regarded as a directed acyclic graph, this algorithm can be used to find out which classes inherit from each other cyclically.

There are many ways to find a topological order, and the order is not unique; there may be multiple solutions. Gradle uses a DFS approach, in which the entry nodes are added to a queue as the starting points of the search. Several data structures are involved: a queue for traversing the nodes, a collection for saving the final result, and visitingNodes for recording whether a node is being visited. The simplest version is used here to explain the overall steps

  1. Check whether the queue is empty

    1. If it is empty, finish; the order of the collection that stores the result is the sorted result

    2. If it is not empty, take the first node in the queue

  2. Check whether the node already exists in the result set. If it does, remove the node from the queue and go back to step 1

  3. Check whether the node's state is "searching". If it is, the search of the node is complete: save it in the result set, remove it from the queue, and go back to step 1. Otherwise mark the current node as "searching"

  4. Handle the node's direct dependencies, its successors

    1. If any successor of the node is in the "searching" state, there is a cycle in the graph, and an error message is reported

    2. Add all successors of the node to the head of the queue, then go back to step 1 to check whether the queue is empty

The successors here are all tasks associated with the current node through the ways of establishing dependencies introduced above

The flow chart is as follows

[Flow chart of the steps above. Nodes: start; Whether the queue is empty; Whether the first node is already in the result; Whether the first node has been visited; Whether any of the node's successors has been visited; Add the first node to the result; report error; Mark node status as visiting; Add the node's successors to the queue; remove the first node; Finish. Each decision has a yes branch and a no branch.]

The general code is as follows

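void processNodeQueue() {
    // Simplified: visitingNodes is treated as a plain Set here
    while (!queue.isEmpty()) {
        final Node node = queue.peekFirst();

        // Already ordered: drop the duplicate queue entry
        if (result.contains(node)) {
            queue.removeFirst();
            visitingNodes.remove(node);
            continue;
        }

        // add returns true on the first visit of this node
        if (visitingNodes.add(node)) {
            // Insert the successors in front of the node so they are handled first
            ListIterator<Node> insertPoint = queue.listIterator();
            for (Node successor : node.getAllSuccessors()) {
                // A successor that is still being visited means a cycle
                if (visitingNodes.contains(successor)) {
                    onOrderingCycle(successor, node);
                }
                insertPoint.add(successor);
            }
        } else {
            // Second visit: every successor is done, so the node can be emitted
            queue.removeFirst();
            visitingNodes.remove(node);
            result.add(node);
        }
    }
}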
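To try the loop outside of Gradle, the same idea can be written as a self-contained sketch on string-named tasks. TopologicalOrderSketch and its order method are hypothetical names, the visiting marker is a plain Set, and finalizers, ordering-only relationships and segments are left out. The map lists, for each task, the tasks it depends on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;
import java.util.Map;
import java.util.Set;

// Simplified stand-alone version of the queue-based DFS described above.
class TopologicalOrderSketch {
    // successors.get(x) lists the tasks that x depends on.
    static List<String> order(Map<String, List<String>> successors, List<String> entryNodes) {
        LinkedList<String> queue = new LinkedList<>(entryNodes);
        Set<String> visiting = new HashSet<>();
        Set<String> result = new LinkedHashSet<>();

        while (!queue.isEmpty()) {
            String node = queue.peekFirst();
            if (result.contains(node)) {
                queue.removeFirst();  // duplicate entry of a finished node
                continue;
            }
            if (visiting.add(node)) {
                // First visit: put the dependencies in front of the node.
                ListIterator<String> insertPoint = queue.listIterator();
                for (String successor : successors.getOrDefault(node, Collections.emptyList())) {
                    if (visiting.contains(successor)) {
                        throw new IllegalStateException("cycle: " + node + " -> " + successor);
                    }
                    insertPoint.add(successor);
                }
            } else {
                // Second visit: all dependencies are already in the result.
                queue.removeFirst();
                visiting.remove(node);
                result.add(node);
            }
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("D", Arrays.asList("C"));
        graph.put("C", Arrays.asList("B", "A"));
        graph.put("B", Arrays.asList("A"));
        System.out.println(order(graph, Arrays.asList("D")));
    }
}
```

With D as the only entry node and the A/B/C/D dependencies from the earlier example, the sketch yields the order A, B, C, D.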

Taking the dependency graph shown above as an example, let's walk through the overall process

ca3c738198701da41154bf898bb8b74c.jpeg

A lot of details are omitted here. The more important points are as follows

  1. Dependencies introduced by finalizedBy are placed immediately after the corresponding task. For example, with a.finalizedBy(b) and c.dependsOn(a), b is placed between a and c in the queue, which guarantees the order of execution

  2. A task added through mustRunAfter/shouldRunAfter that is not referenced by any strong dependency is not added to the result

  3. When checking for cycles, a cycle caused by shouldRunAfter is ignored

  4. There can be multiple entry nodes. When handling multiple entry nodes, each entry node corresponds to a segment, which is used to distinguish different nodes


Origin blog.csdn.net/c6E5UlI1N/article/details/131136664