Are you curious about how Gradle handles the dependencies between tasks? There are many ways to create tasks and many ways to establish dependencies between them. How does Gradle determine the final execution order of the tasks? Let's explore it below.
Author: Near-Earth Asteroid
Link: https://juejin.cn/post/7241492186919239717
[Figure: the concept of a task]
Task creation
[Figure: overview of task creation]
Tasks can be created in two main ways:
create
register
create creates the task immediately, whereas register only registers a task provider (this concept is explained later); the task instance is not created at that point. register is the task creation method that Gradle currently recommends, and task creation in the official plugins has been changed to register.
A task is added to the task container by calling tasks.create/tasks.register in a build script or in a plugin:
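// Eager: the task instance is created and configured immediately
tasks.create('hello') {
    doLast {
        println 'greeting'
    }
}
// Lazy alternative: only a task provider is registered
// (use one form or the other; two tasks cannot share the name 'hello')
tasks.register('hello') {
    doLast {
        println 'greeting'
    }
}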
TaskContainer
Gradle creates a Project object for each project, and the task-related methods used in a build script are directed to that Project object. The Project object in turn delegates the handling of tasks to its TaskContainer, which can be understood simply as a container for storing tasks.
From the signatures of the two methods, it can be seen that the configureClosure parameter of create is a groovy.lang.Closure, whereas register takes an Action. The two coexist because of the heavy use of Groovy in the early days. The former converts the closure into an action through ConfigureUtil.configureUsing(configureClosure).
TaskContainer can be simply divided into two parts: a map and a pendingMap. A task created with create is added to the map, and a task provider registered with register is placed in the pendingMap. When the task of a provider in the pendingMap is created, the provider adds the task to the map and removes itself from the pendingMap.
The final task instance is created by reflection. If no task type is specified, a DefaultTask is generated by default. Constructor arguments can be passed in at create/register time, or parameters can be passed through the configure action.
lazy loading
The difference between create and register
Put simply, create creates tasks eagerly and register creates them lazily.
Gradle execution has three phases: initialization, configuration, and execution. No matter which task is executed, the configuration phase always runs, and the build script is executed in this phase.
With the create method, the task is created immediately, which implies a problem: the created task may never be run. For example, when we only want to run the compileJava task, all tasks, including unrelated ones such as test, are still created while the build script is evaluated. register avoids this problem: the task is not created immediately, but only when it is needed.
Here you may still have doubts. create creates the task, but register still creates a task provider, and most tasks do no extra work in their constructors. So what is the advantage of register?
In fact, compared with create, register delays not only the creation of the task itself but also the execution of its configuration action. With create, the task is configured immediately after it is created. A task registered with register is created only when it is needed, and it is configured at that time.
The official name for this is task configuration avoidance. It is used to avoid unnecessary task creation and configuration, for example by using register instead of create, named instead of getByName, and so on.
The ideal time to create a task is during task graph calculation. A build scan provides visual data that helps locate tasks that are created too early.
See the official documentation: task_configuration_avoidance.
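The difference between eager and lazy creation, together with the map/pendingMap split described earlier, can be sketched in plain Java. This is only a toy model under stated assumptions: MiniTaskContainer, MiniTask, and MiniTaskProvider are hypothetical names invented for illustration and are not Gradle's real classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical miniature of a task container; not Gradle's implementation.
final class MiniTaskContainer {
    // Stand-in for a task: it only records what its configuration action did.
    static final class MiniTask {
        final String name;
        final List<String> log = new ArrayList<>();

        MiniTask(String name) {
            this.name = name;
        }
    }

    // Stand-in for a task provider: creates and configures its task on first get().
    final class MiniTaskProvider implements Supplier<MiniTask> {
        private final String name;
        private final Consumer<MiniTask> configureAction;

        MiniTaskProvider(String name, Consumer<MiniTask> configureAction) {
            this.name = name;
            this.configureAction = configureAction;
        }

        @Override
        public MiniTask get() {
            MiniTask existing = map.get(name);
            if (existing != null) {
                return existing; // already realized
            }
            MiniTask task = new MiniTask(name);
            configureAction.accept(task); // configuration is deferred until now
            map.put(name, task);          // the provider adds the task to the map...
            pendingMap.remove(name);      // ...and removes itself from the pending map
            return task;
        }
    }

    final Map<String, MiniTask> map = new LinkedHashMap<>();
    final Map<String, MiniTaskProvider> pendingMap = new LinkedHashMap<>();

    // Eager: create and configure immediately.
    MiniTask create(String name, Consumer<MiniTask> configureAction) {
        MiniTask task = new MiniTask(name);
        configureAction.accept(task);
        map.put(name, task);
        return task;
    }

    // Lazy: only remember how to create and configure the task.
    MiniTaskProvider register(String name, Consumer<MiniTask> configureAction) {
        MiniTaskProvider provider = new MiniTaskProvider(name, configureAction);
        pendingMap.put(name, provider);
        return provider;
    }

    public static void main(String[] args) {
        MiniTaskContainer tasks = new MiniTaskContainer();
        tasks.create("eager", task -> task.log.add("configured"));
        MiniTaskProvider lazy = tasks.register("lazy", task -> task.log.add("configured"));
        System.out.println(tasks.map.keySet() + " " + tasks.pendingMap.keySet()); // [eager] [lazy]
        lazy.get(); // the task is realized only when somebody asks for it
        System.out.println(tasks.map.keySet() + " " + tasks.pendingMap.keySet()); // [eager, lazy] []
    }
}
```

In this model, a task that is registered but never requested costs only one provider object, and its configuration action never runs.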
Lazy Properties
In addition to the lazy creation of the task itself, the properties of a task can also be lazy. Lazy task properties mainly solve the problem that, when a task is configured, the values of some properties may not be available immediately: they may require complex calculations or depend on the results of other tasks. As the build grows more complex, maintaining these dependencies by hand becomes complicated. Once these properties are lazy, they are not evaluated immediately; their values are computed only when needed, which reduces the maintenance cost of build scripts.
Lazy properties come in two types:
Provider
Property
The difference is that a Property is mutable, whereas the value of a Provider is immutable. Property is in fact a subtype of Provider, and the TaskProvider returned by the register method is also a subtype of Provider.
Property has get/set methods to set and get values; Provider only has get to read the value.
Properties can also be set via an Extension:
interface CompileExtension {
    Property<String> getClasspath()
}
abstract class Compile extends DefaultTask {
    @Input
    abstract Property<String> getJdkVersion()
    @Input
    abstract Property<String> getClasspath()
}
project.extensions.create('compile', CompileExtension)
def a = tasks.register('a', Compile) {
    classpath = compile.classpath
    jdkVersion = '11'
    doLast {
        println classpath.get()
        println jdkVersion.get()
    }
}
compile {
    classpath = 'src/main/java'
}
Running ./gradlew a outputs:
src/main/java
11
Property cannot be parameterized with every type. Files and collections are special cases with dedicated property types, and files are distinguished from directories:
RegularFileProperty
DirectoryProperty
ListProperty
SetProperty
MapProperty
If a property is used incorrectly, Gradle reports an error. For example, if a RegularFileProperty is set to a directory, or the file does not exist, a corresponding error message is shown.
A Property must be marked with an input/output annotation (such as @Input in the code above), otherwise an error is reported. The annotation is also related to task dependencies and the task up-to-date check. The role of inputs/outputs and Property in dependency processing is introduced below.
A Property does not need to be initialized manually. As the abstract getters in the example above show, Gradle creates it when it creates the task instance, so we only need to assign it. A value must be assigned during configuration, otherwise an error is reported, unless the Property is marked with @Optional to indicate that it is not required.
For more information, see the official documentation: lazy_configuration.
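The idea behind lazy properties can be illustrated with a small Java sketch. It is a simplified model, not Gradle's Provider API: MiniProvider and MiniProperty are hypothetical names, and the only behaviour modelled is that a property can be wired to another one before the source has a value, with evaluation deferred until get() is called.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical stand-ins for the Provider/Property idea; not Gradle's types.
final class LazyValuesSketch {
    // Read-only lazy value: it can only be queried.
    interface MiniProvider<T> {
        T get();

        // Derives a new lazy value; nothing is computed until get() is called.
        default <R> MiniProvider<R> map(Function<? super T, ? extends R> transform) {
            return () -> transform.apply(get());
        }
    }

    // Mutable lazy value: it can be set to a fixed value or wired to another provider.
    static final class MiniProperty<T> implements MiniProvider<T> {
        private Supplier<T> source;

        void set(T value) {
            this.source = () -> value;
        }

        void set(MiniProvider<? extends T> provider) {
            this.source = provider::get;
        }

        @Override
        public T get() {
            if (source == null) {
                throw new IllegalStateException("no value has been set");
            }
            return source.get();
        }
    }

    public static void main(String[] args) {
        MiniProperty<String> extensionClasspath = new MiniProperty<>();
        MiniProperty<String> taskClasspath = new MiniProperty<>();

        // Wire the task property to the extension before the extension has a value.
        taskClasspath.set(extensionClasspath);
        extensionClasspath.set("src/main/java");

        // The value is resolved only now, at "execution time".
        System.out.println(taskClasspath.get()); // src/main/java
    }
}
```

The order of the two set calls mirrors the build script above, where the task is wired to the extension before the extension block assigns a value.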
NamedDomainObjectCollection
TaskContainer implements the NamedDomainObjectCollection interface. This concept is worth mentioning because it is used in many places in Gradle, for example tasks and extensions.
The name can be understood intuitively: a Named DomainObject Collection is a collection of named objects from a certain domain.
NamedDomainObjectCollection implements the Java Collection interface. Because its elements are named, it can be regarded simply as a Map, and the final logic is indeed handed over to a map.
It also has a namer that implementations must provide. This function is used to name the added elements.
Task Graph
Overall process
After the build script has been executed, the creation and registration of tasks is complete, and all tasks have been added to the project's TaskContainer. After that, the directed acyclic graph of all the tasks to be executed is constructed. This graph is built from the input given when we run the gradle command. For example, in ./gradlew build, build is the entry task, and there can be multiple entry tasks.
ExecutionPlan is a container for storing tasks, and all tasks are added to it. Adding the entry tasks triggers the discovery of task dependencies, which runs in a loop until all task dependencies are known.
The topological sort then computed from the entry tasks determines the final execution plan.
This involves two main jobs:
Resolving task dependencies
Determining the task execution order
As an example, when ./gradlew D is executed:
D is the entry task
D depends on C
C depends on B and A
B depends on A
The whole execution proceeds in the order A -> B -> C -> D.
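The order for this example can be reproduced with a depth-first post-order walk. The following is a minimal sketch, assuming an acyclic graph and using plain task names instead of Gradle's node types; ExecutionOrderSketch and executionOrder are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: task names stand in for task nodes; cycles are not handled.
final class ExecutionOrderSketch {
    // Depth-first post-order: a task is emitted only after everything it depends on.
    static List<String> executionOrder(Map<String, List<String>> dependsOn, String entryTask) {
        Set<String> ordered = new LinkedHashSet<>();
        visit(entryTask, dependsOn, ordered);
        return new ArrayList<>(ordered);
    }

    private static void visit(String task, Map<String, List<String>> dependsOn, Set<String> ordered) {
        if (ordered.contains(task)) {
            return; // already scheduled through another path
        }
        for (String dependency : dependsOn.getOrDefault(task, List.of())) {
            visit(dependency, dependsOn, ordered);
        }
        ordered.add(task);
    }

    public static void main(String[] args) {
        Map<String, List<String>> dependsOn = Map.of(
            "D", List.of("C"),
            "C", List.of("B", "A"),
            "B", List.of("A"));
        System.out.println(executionOrder(dependsOn, "D")); // [A, B, C, D]
    }
}
```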
Task Relationship
Before talking about how dependencies are processed, we need to know the ways in which a dependency relationship can be established.
Relationships between tasks can be established in several ways:
task inputs dependency
dependsOn
finalizedBy
mustRunAfter
shouldRunAfter
dependsOn is the most common, so it is not covered here. The other ways are briefly introduced below.
Task inputs
The property method
abstract class A extends DefaultTask {
    @OutputFile
    abstract RegularFileProperty getOutputFile()
}
def a = tasks.register('a', A) {
    // Resolved relative to the build directory, i.e. build/a
    outputFile = layout.buildDirectory.file('a')
}
tasks.register('b') {
    inputs.property('a.outputFile', a.flatMap { it.outputFile })
    doLast {
        println inputs.properties['a.outputFile']
    }
}
Task b establishes a dependency on task a through a property.
The files method
def a = tasks.register('a') {
    outputs.files('build/a')
}
tasks.register('b') {
    inputs.files(a)
}
The inputs of task b and the outputs of task a establish the dependency.
finalizedBy
As the name implies, the task named by finalizedBy is executed after the entry task, for example:
def c = tasks.register('c')
tasks.register('d') {
    finalizedBy c
}
Running ./gradlew d executes d first and then c.
mustRunAfter/shouldRunAfter
Compared with the other types, mustRunAfter and shouldRunAfter are weaker. They do not actually create a dependency; they only set the execution order. A task referenced through these two methods is not executed if it is not in the task graph.
def c = tasks.register('c')
tasks.register('d') {
    mustRunAfter c
}
For example, running ./gradlew d executes only task d, not task c. Running ./gradlew d c executes c first and then d.
mustRunAfter/shouldRunAfter only set the priority of task execution and do not add a strong dependency to the task. shouldRunAfter is weaker than mustRunAfter: its ordering is not fully guaranteed, for example in parallel mode or when it would cause a cycle between tasks.
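The difference between a strong dependency and an ordering-only rule can be sketched as follows. This is an illustrative model, not Gradle's scheduler: OrderingOnlySketch and plan are hypothetical names, cycles are not handled, and only the two behaviours described above are modelled (dependsOn pulls a task into the graph, mustRunAfter only orders tasks that are already in it).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of "strong" versus "ordering-only" relationships.
final class OrderingOnlySketch {
    static List<String> plan(Map<String, List<String>> dependsOn,
                             Map<String, List<String>> mustRunAfter,
                             List<String> requested) {
        // 1. Only dependsOn edges pull additional tasks into the graph.
        Set<String> inGraph = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>(requested);
        while (!todo.isEmpty()) {
            String task = todo.removeFirst();
            if (inGraph.add(task)) {
                todo.addAll(dependsOn.getOrDefault(task, List.of()));
            }
        }
        // 2. Both kinds of edges constrain the order, but only among tasks in the graph.
        List<String> result = new ArrayList<>();
        Set<String> done = new HashSet<>();
        for (String task : inGraph) {
            visit(task, dependsOn, mustRunAfter, inGraph, done, result);
        }
        return result;
    }

    private static void visit(String task,
                              Map<String, List<String>> dependsOn,
                              Map<String, List<String>> mustRunAfter,
                              Set<String> inGraph,
                              Set<String> done,
                              List<String> result) {
        if (!done.add(task)) {
            return;
        }
        for (String dependency : dependsOn.getOrDefault(task, List.of())) {
            visit(dependency, dependsOn, mustRunAfter, inGraph, done, result);
        }
        for (String earlier : mustRunAfter.getOrDefault(task, List.of())) {
            if (inGraph.contains(earlier)) { // ignored when the task was not requested
                visit(earlier, dependsOn, mustRunAfter, inGraph, done, result);
            }
        }
        result.add(task);
    }

    public static void main(String[] args) {
        Map<String, List<String>> none = Map.of();
        Map<String, List<String>> dAfterC = Map.of("d", List.of("c"));
        System.out.println(plan(none, dAfterC, List.of("d")));      // [d]
        System.out.println(plan(none, dAfterC, List.of("d", "c"))); // [c, d]
    }
}
```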
Each relationship has its own TaskDependency, which is essentially a container for storing dependencies. Calling one of the methods above adds elements to the corresponding container, and dependencies in the same container are kept in the order of their names.
The type of a dependency is not restricted. For dependsOn, for example, a string (a task name), a task instance returned by create, and a TaskProvider returned by register are all accepted. In other words, the elements stored in a TaskDependency container are very heterogeneous. Next, let's see how Gradle resolves these dependencies.
Task Dependency Resolve
ExecutionPlan
ExecutionPlan is the entry point for processing the entire task graph. Both resolving task dependencies and determining the topological execution order are handled by it.
[Figure: overall flow chart]
After the entry tasks are added to the ExecutionPlan, the discovery of task dependencies is triggered, which corresponds to discoverNodeRelationships in DefaultExecutionPlan.
DefaultExecutionPlan
The following code has been trimmed and modified; only the general logic is retained.
public void addEntryTasks(Collection<? extends Task> tasks) {
    LinkedList<Node> queue = new LinkedList<>(tasks);
    discoverNodeRelationships(queue);
}
private void discoverNodeRelationships(LinkedList<Node> queue) {
    Set<Node> visiting = new HashSet<>();
    while (!queue.isEmpty()) {
        Node node = queue.getFirst();
        if (visiting.add(node)) {
            // First visit: resolve the node's dependencies and
            // push the successors that are not being visited to the front
            node.resolveDependencies(dependencyResolver);
            for (Node successor : node.getDependencySuccessors()) {
                if (!visiting.contains(successor)) {
                    queue.addFirst(successor);
                }
            }
        } else {
            // Second visit: the dependencies have been processed,
            // so pop the node and queue its finalizers
            queue.removeFirst();
            visiting.remove(node);
            for (Node finalizer : node.getFinalizers()) {
                finalizers.add(finalizer);
                if (!visiting.contains(finalizer)) {
                    queue.addFirst(finalizer);
                }
            }
        }
    }
}
Generally speaking, this is a DFS. A node's DependencySuccessors are the dependencies established through inputs and dependsOn, as introduced in Task Relationship above. After all the dependencies of a node have been processed, its own finalizer tasks are added.
The dependencies of a task are stored in multiple TaskDependency containers, and resolving the dependencies of a task means traversing those TaskDependency containers. The entry point of this logic is in LocalTaskNode: starting from the entry task, the whole set of dependencies is processed (figure omitted).
LocalTaskNode is a wrapper that represents a task as a Node. There are many types of Node, and the algorithm here applies to all of them.
Task dependencies are resolved through TaskDependencyResolver, which finally hands the processing of dependencies over to CachingDirectedGraphWalker.
CachingDirectedGraphWalker
It uses a variant of Tarjan's strongly connected components algorithm and has two functions:
findValues: find the nodes reachable from the start node
findCycles: find the cycles that exist in a graph
Readers familiar with the algorithm (Tarjan's strongly connected components algorithm - Wikipedia) will know that it can be used to find cycles in a graph. Strong connectivity means that two nodes can reach each other, which cannot happen in a directed acyclic graph. The algorithm has therefore been modified so that dependent nodes can be found as well.
Currently findValues is used to find the dependent nodes. In fact, this step does not determine all of a task's direct and indirect dependencies; it only determines the direct dependencies of the start node.
Take the example above again. Starting from D, only C is found at first; then, starting from C, only B and A are found. This is not because the class is unable to find all the dependencies in one pass, but because of the graph and node passed in here. It is not clear whether this is intentional, but a large number of intermediate nodes are generated, which wastes space in the cache.
In addition, as the Caching in the name suggests, it has a cache: for a node that has already been explored, the cached result is reused directly the next time it is explored.
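For readers who have not seen it, the cycle-finding half can be sketched with the textbook version of Tarjan's algorithm. This is the standard algorithm, not Gradle's modified variant, and it has no caching; TarjanCyclesSketch and its method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Textbook Tarjan strongly connected components, reporting only real cycles.
final class TarjanCyclesSketch {
    private final Map<String, List<String>> graph;
    private final Map<String, Integer> index = new HashMap<>();
    private final Map<String, Integer> lowLink = new HashMap<>();
    private final Deque<String> stack = new ArrayDeque<>();
    private final Set<String> onStack = new HashSet<>();
    private final List<Set<String>> cycles = new ArrayList<>();
    private int counter = 0;

    private TarjanCyclesSketch(Map<String, List<String>> graph) {
        this.graph = graph;
    }

    // Returns every group of nodes that can all reach each other.
    static List<Set<String>> findCycles(Map<String, List<String>> graph) {
        TarjanCyclesSketch walker = new TarjanCyclesSketch(graph);
        for (String node : graph.keySet()) {
            if (!walker.index.containsKey(node)) {
                walker.strongConnect(node);
            }
        }
        return walker.cycles;
    }

    private void strongConnect(String node) {
        index.put(node, counter);
        lowLink.put(node, counter);
        counter++;
        stack.push(node);
        onStack.add(node);

        for (String next : graph.getOrDefault(node, List.of())) {
            if (!index.containsKey(next)) {
                strongConnect(next);
                lowLink.put(node, Math.min(lowLink.get(node), lowLink.get(next)));
            } else if (onStack.contains(next)) {
                lowLink.put(node, Math.min(lowLink.get(node), index.get(next)));
            }
        }

        // The node is the root of a component: pop the whole component off the stack.
        if (lowLink.get(node).equals(index.get(node))) {
            Set<String> component = new TreeSet<>();
            String member;
            do {
                member = stack.pop();
                onStack.remove(member);
                component.add(member);
            } while (!member.equals(node));
            boolean selfLoop = graph.getOrDefault(node, List.of()).contains(node);
            if (component.size() > 1 || selfLoop) {
                cycles.add(component); // single nodes without a self loop are not cycles
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> cyclic = Map.of(
            "a", List.of("b"),
            "b", List.of("c"),
            "c", List.of("a"),
            "d", List.of("a"));
        System.out.println(findCycles(cyclic)); // [[a, b, c]]
    }
}
```

In a directed acyclic graph every component has exactly one node, so the result is empty, which matches the remark that strong connectivity cannot occur in a valid task graph.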
During the search, CachingDirectedGraphWalker calls graph.getNodeValues to get the nodes.
getNodeValues has three parameters: node is the current node, values holds the values corresponding to the node, and connectedNodes holds the associated nodes. For example, if task d depends on task c, then task c is one of the connectedNodes of task d.
TaskGraphImpl implements the DirectedGraph interface and is mainly responsible for two things:
Calling DefaultTaskDependency.visitDependencies to resolve task dependencies
Calling WorkDependencyResolver to convert each Task into a LocalTaskNode
The purpose of this step is to establish the dependency graph of the tasks; it does not determine their execution order.
Dependency resolve
visitDependencies
The Visitor design pattern is used here. Many objects implement the TaskDependencyContainer interface, and most of them are used as containers. The advantage of the Visitor pattern is that functions can be added without modifying the implementation of these classes: the Visitor traverses these classes and handles the logic internally.
There are many types of task dependencies; here are the main cases.
Task
Depending on a task created with create:
def a = tasks.create('a')
tasks.register('b') {
    dependsOn a
}
Provider
Depending on a task created with register, which returns a TaskProvider object:
def a = tasks.register('a')
tasks.register('b') {
    dependsOn a
}
TaskDependencyContainer
Dependencies introduced through inputs.
To understand these, the concept of inputs needs to be explained first.
Input analysis
Concept
Generally speaking, a task has inputs and outputs. Inputs can be files or properties, whereas outputs are files.
Gradle divides input and output properties into four categories:
Simple values: basic types, strings, and other types that implement Serializable
Filesystem types: File, or objects produced by Gradle file operations such as Project.file()
Dependency resolution results: the results of dependency resolution, which are essentially files
Nested values: nested combinations of the types above
Take the compileJava task as an example. When compiling Java code it can have many inputs, such as the source files and the target JVM version; the maximum memory available during compilation can also be specified. Its outputs are the class files.
The properties of a custom task must be marked with annotations; if they are not, an error is reported at runtime. The properties here are JavaBeans properties, that is, public fields with getter/setter methods, which differ from the lazy-configuration properties mentioned above.
Property analysis also covers the parent classes of the task, but some methods, such as those inherited from DefaultTask or Object, are not analyzed.
Purpose
Annotating properties has two main purposes:
Dependency analysis related to inputs/outputs
The up-to-date check in Incremental Build
How to annotate properties
Gradle provides many annotations:
@Input marks an ordinary value type
@InputFiles marks a file-related input type
@Nested marks a nested type
@OutputFiles marks a file-related output type
@Internal marks a property that is used internally
...
For the full list, see task_input_output_annotations. @Internal deserves a closer look. Take the maximum memory available at compile time mentioned above: changes to the source files or the target JVM version affect the compiled class files, but the maximum memory available has no effect on the compilation result. A property like this, which has nothing to do with inputs and outputs, does not affect the Incremental Build cache result and can be marked with @Internal.
This also shows that properties marked with annotations such as @Input and @InputFiles do affect the cache result.
For example:
class SimpleTask extends DefaultTask {
    @Input String inputString
    @InputFiles File inputFiles
    @OutputFiles Set<File> outputFiles
    @Internal Object internal
}
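Why only annotated inputs matter for the up-to-date check can be shown with a toy model. It rests on a strong simplifying assumption: the "fingerprint" here is just the hash code of a map of declared inputs, whereas a real build tool snapshots file contents and much more. UpToDateSketch and CompileTask are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of an up-to-date check; the hashing scheme is an assumption for illustration.
final class UpToDateSketch {
    static final class CompileTask {
        // Declared inputs: the role played by @Input / @InputFiles properties.
        final Map<String, String> inputs = new TreeMap<>();
        // A tuning value that does not change the result: the role played by @Internal.
        int maxMemoryMb = 512;
        private Integer lastFingerprint; // null until the first execution
        int executions = 0;

        // Returns true if the task action ran, false if it was skipped as up-to-date.
        boolean run() {
            int fingerprint = inputs.hashCode(); // maxMemoryMb is deliberately left out
            if (lastFingerprint != null && lastFingerprint == fingerprint) {
                return false;
            }
            executions++;
            lastFingerprint = fingerprint;
            return true;
        }
    }

    public static void main(String[] args) {
        CompileTask compile = new CompileTask();
        compile.inputs.put("sources", "src/main/java");
        compile.inputs.put("targetJvm", "11");

        System.out.println(compile.run()); // true: first execution
        compile.maxMemoryMb = 2048;        // internal value changes
        System.out.println(compile.run()); // false: declared inputs are unchanged
        compile.inputs.put("targetJvm", "17");
        System.out.println(compile.run()); // true: a declared input changed
    }
}
```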
inputs/outputs have two sources:
Annotated properties
Values added by calling the inputs API
For example:
abstract class Compile extends DefaultTask {
    @Input
    abstract Property<String> getClasspath()
}
tasks.register('compile', Compile) {
    classpath = 'src/main' // 1. Via an annotated property
    inputs.property('name', 'compile') // 2. Add a property through inputs
    inputs.files(project.files('libs')) // 3. Add files through inputs
}
The difference between the two is that the annotation approach is more capable. The inputs API is a subset of the annotation approach: it provides the capabilities of @Input and @InputFiles, but has no counterpart for other annotations such as @Internal.
[Figure: AnnotatedProperties, and RegisteredProperties added through inputs]
How does Gradle analyze the dependencies created by inputs?
The actual logic is carried out by PropertyWalker, which uses the Visitor pattern to process each property.
Because there are two sources, each source must be analyzed separately.
AnnotatedProperties
To analyze annotated properties, the annotated properties must first be parsed out. Gradle encapsulates the parsed data into metadata, which stores the names of the properties, the types of the annotations on them, and the validation of the properties.
Each annotation has a corresponding annotation handler. All handlers are stored in a map and are looked up by annotation type. For example, the handler for @InputFiles verifies that the value returned by the property is a file-related type and reports an error for any other type.
After the properties have been parsed, each one is traversed and visited. Each annotation is processed differently, so this is also delegated to its handler. For inputs there are mainly two kinds, ordinary properties and file properties, which correspond to two methods of PropertyVisitor.
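The "map of handlers keyed by annotation type" idea can be sketched with plain Java reflection. The annotations @MiniInput and @MiniInputFile, the Handler interface, and the validation messages are all hypothetical and exist only for this illustration; Gradle's own metadata and handler classes are considerably richer.

```java
import java.io.File;
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of annotation-driven property validation.
final class AnnotationHandlerSketch {
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface MiniInput {}
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface MiniInputFile {}

    // One handler per annotation type validates the annotated getter.
    interface Handler {
        void validate(String property, Class<?> type, List<String> problems);
    }

    static final Map<Class<? extends Annotation>, Handler> HANDLERS = new LinkedHashMap<>();
    static {
        HANDLERS.put(MiniInput.class, (property, type, problems) -> { /* any type is accepted */ });
        HANDLERS.put(MiniInputFile.class, (property, type, problems) -> {
            if (!File.class.isAssignableFrom(type)) {
                problems.add(property + ": @MiniInputFile requires a File, found " + type.getSimpleName());
            }
        });
    }

    // Walks the getters declared on the type and dispatches to the matching handler.
    static List<String> validate(Class<?> taskType) {
        List<String> problems = new ArrayList<>();
        Method[] methods = taskType.getDeclaredMethods();
        Arrays.sort(methods, Comparator.comparing(Method::getName)); // stable report order
        for (Method method : methods) {
            if (!method.getName().startsWith("get") || method.getParameterCount() != 0) {
                continue; // not a property getter
            }
            boolean annotated = false;
            for (Map.Entry<Class<? extends Annotation>, Handler> entry : HANDLERS.entrySet()) {
                if (method.isAnnotationPresent(entry.getKey())) {
                    annotated = true;
                    entry.getValue().validate(method.getName(), method.getReturnType(), problems);
                }
            }
            if (!annotated) {
                problems.add(method.getName() + ": missing an input/output annotation");
            }
        }
        return problems;
    }

    static class GoodTask {
        @MiniInput public String getVersion() { return "11"; }
        @MiniInputFile public File getSource() { return new File("src"); }
    }

    static class BadTask {
        @MiniInputFile public String getSource() { return "src"; }
        public String getUnmarked() { return ""; }
    }

    public static void main(String[] args) {
        System.out.println(validate(GoodTask.class)); // []
        validate(BadTask.class).forEach(System.out::println);
    }
}
```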
RegisteredProperties
Properties added through the inputs API are put into one of two containers depending on their kind: one stores file-related types and the other stores all other types. During visitor analysis, the two containers are analyzed separately.
How do different tasks establish relationships through these properties? Let us start with a concrete example.
def e = tasks.register('e', CustomTask) {
    inputs.property('prop1', a.flatMap { it.outputFile })
    inputs.files(b)
    prop2 = c.flatMap { it.outputFile }
    prop3 = d.files
}
Only part of the code is shown above. There are five tasks in total, and task e depends on tasks a, b, c, and d. a, b, and c are tasks created with register, and d is a task created with create.
prop1 depends on task a through inputs.property. a.flatMap returns a Provider that records task a, and task a is itself registered as a Provider. Gradle calls the getter of the task property through reflection, obtains task a, and uses it as a dependency.
inputs.files depends directly on task b. inputs.files(b) is actually a dependency on the output files of task b, which is handled in the same way as a FileCollection.
prop2 depends on task c and is handled in the same way as prop1.
prop3 depends on task d. d.files returns a FileCollection, which likewise recorded task d when it was created.
Because many kinds of objects can be added as dependencies and they differ widely, Gradle uses the Visitor pattern: each object handles its own dependencies in its visit method, and the visitor finally collects all the dependencies.
The property analysis logic finally converges in PropertyVisitor, and for TaskInputs these dependencies are added to connectedNodes so that the search of the graph can continue.
Only inputs are explained here. The actual property processing and the logic related to incremental builds will be explained in a later article on caching.
After the dependencies of a task have been resolved, they are stored in several containers: dependencyNodes and dependentNodes hold the tasks that this task depends on and the tasks that depend on this task, respectively. mustRunAfter, shouldRunAfter, and so on are also stored in separate containers.
Task dependency caused by Project dependency
There is also a special case of the inputs dependency mode: the dependency relationship between projects. Suppose there are two projects, libA and libB, and libB depends on libA.
libA/build.gradle
plugins {
    id 'java'
}
libB/build.gradle
plugins {
    id 'java'
}
dependencies {
    implementation(project(':libA'))
}
The dependency between the two projects is established through dependencies. When ./gradlew libB:compileJava is executed, the libA:jar task is executed first. How is this done?
In other words, because of implementation(project(':libA')), libB:compileJava depends on libA:jar.
libA applies the java plugin, and the java plugin associates a PublishArtifact with the Jar task. A PublishArtifact can be understood simply as a part of libA's Configuration: it records that the output artifact of libA is generated by the Jar task. (Configuration is a Gradle dependency concept that will be explained in detail in a later article on dependency processing; here it can be understood simply as a set of files.)
The CompileJava task has a classpath property. When libB runs compileJava, its classpath depends on libA. classpath is part of the task inputs and corresponds to a set of files, some of which come from the output of project(':libA'). When the dependencies of the CompileJava task are processed, libA's PublishArtifact is found through the Configuration, and a dependency on libA's Jar task then follows naturally. In essence, this dependency is still established through TaskInput processing.
Order of execution
The dependency graph of the tasks is the task graph, which under normal circumstances is a directed acyclic graph (DAG). Once dependency resolution has built it, the topological order of the task graph can be computed to obtain the final execution order.
A topological order arranges the vertices of a DAG into a linear sequence that respects the direction of its edges.
If the graph contains a cycle, the topological order calculation fails. In that case CachingDirectedGraphWalker is called, that is, Tarjan's strongly connected components algorithm is used to find the cycle. The purpose is to report an error message so that users can see directly which tasks depend on each other and fix them. Incidentally, using a strong connectivity algorithm to find cycles and improve error messages also has many uses in compilers. For example, if a normal inheritance relationship is regarded as a directed acyclic graph, this algorithm can find the classes involved in cyclic inheritance.
There are many ways to compute a topological order, and the order is not unique; there may be several solutions. Gradle uses DFS, with the entry nodes added to the queue as the starting points of the search. There may be multiple entry nodes. Several data structures are involved:
A queue used for traversing the tasks
A set that saves the final result
visitingNodes, which records whether a node is currently being visited
The simplest variant is used here to explain the overall steps.
1. Check whether the queue is empty.
2. If it is empty, finish; the order of the set that holds the result is the sorted result.
3. If it is not empty, take the first node in the queue.
4. If the node already exists in the result set, remove it from the queue and go back to step 1.
5. Check whether the node's status is "searching". If the node is already being searched, save it in the result set, remove it from the queue, and go back to step 1; otherwise mark the current node as "searching".
6. Get the node's direct dependencies, the node successors.
7. If any successor of the node has the status "searching", the graph contains a cycle, and an error is reported.
8. Add all successors of the node to the queue and go back to step 1 to check whether the queue is empty.
The successors here are all the tasks associated with the current node through the ways of establishing dependencies introduced above.
[Figure: flow chart of the steps above]
The general code is as follows:
void processNodeQueue() {
    while (!queue.isEmpty()) {
        final Node node = queue.peekFirst();
        if (result.contains(node)) {
            // Already scheduled: drop it from the queue
            queue.removeFirst();
            visitingNodes.remove(node);
            continue;
        }
        if (visitingNodes.put(node)) {
            // First visit: queue the successors in front of the node
            ListIterator<Node> insertPoint = queue.listIterator();
            for (Node successor : node.getAllSuccessors()) {
                if (visitingNodes.containsEntry(successor)) {
                    onOrderingCycle(successor, node);
                }
                insertPoint.add(successor);
            }
        } else {
            // Second visit: all successors are done, so schedule the node
            queue.removeFirst();
            visitingNodes.remove(node);
            result.add(node);
        }
    }
}
Taking the dependency relationship shown above as an example, let’s go through the overall process
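The walk-through can be reproduced with a small, runnable version of the queue-based steps. This is a simplified sketch, not Gradle's code: task names replace nodes, only one kind of successor is modelled, a cycle is reported with an exception, and QueueTopologicalSortSketch and order are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical version of the queue-based DFS ordering.
final class QueueTopologicalSortSketch {
    static List<String> order(Map<String, List<String>> successors, List<String> entryTasks) {
        Deque<String> queue = new ArrayDeque<>(entryTasks);
        Set<String> visiting = new HashSet<>();        // nodes with status "searching"
        Set<String> result = new LinkedHashSet<>();    // keeps the final order

        while (!queue.isEmpty()) {
            String node = queue.peekFirst();
            if (result.contains(node)) {
                queue.removeFirst(); // already scheduled through another path
                continue;
            }
            if (visiting.add(node)) {
                // First visit: put the successors in front of the node, keeping their order.
                List<String> next = successors.getOrDefault(node, List.of());
                for (int i = next.size() - 1; i >= 0; i--) {
                    String successor = next.get(i);
                    if (visiting.contains(successor)) {
                        throw new IllegalStateException("cycle between " + node + " and " + successor);
                    }
                    queue.addFirst(successor);
                }
            } else {
                // Second visit: every successor is in the result, so schedule the node.
                queue.removeFirst();
                visiting.remove(node);
                result.add(node);
            }
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        Map<String, List<String>> successors = Map.of(
            "D", List.of("C"),
            "C", List.of("B", "A"),
            "B", List.of("A"));
        System.out.println(order(successors, List.of("D"))); // [A, B, C, D]
    }
}
```

For the example graph the queue evolves as [D], [C, D], [B, A, C, D], [A, B, A, C, D]; A is then scheduled, followed by B, the duplicate A is dropped, and finally C and D are scheduled.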
Many details are omitted here. The more important points are:
Dependencies introduced by finalizedBy are added immediately after the corresponding task. For example, with a.finalizedBy(b) and c.dependsOn(a), b is placed between a and c in the queue, which guarantees the execution order.
If a task is added through mustRunAfter/shouldRunAfter and is not referenced by any other strong dependency, it is not added to the result.
When checking for cycles, a cycle caused by shouldRunAfter is ignored.
There can be multiple entry nodes. When multiple entry nodes are processed, each entry node corresponds to a segment, which is used to distinguish different nodes.