[Data Structure and Algorithm] -> Algorithm -> Topological Sort -> How to determine the compilation order of source files?

Topological Sorting

Ⅰ Preface

A complete project usually contains many source files. When the compiler builds the whole project, it has to compile the files in an order that respects their dependencies. For example, if a.java depends on b.java, the compiler must compile b.java before a.java.

Back when I wrote C programs, I often had to compile several files together. We compiled and ran everything by hand on the command line, so whenever files depended on each other, for example a.c calling functions defined in b.c, we had to compile b.c first to produce an obj file and then compile b.obj together with a.c. If you have read my earliest articles, you will have seen this often. Java feels much more comfortable now: the IDE automatically compiles the files that the current file depends on, so I no longer have to compile them one by one in dependency order.

The compiler learns these local dependencies by analyzing the source files or a build configuration file (such as a Makefile) that the programmer wrote in advance. So how does it derive one global compilation order from the local dependencies between source files?

This is where the algorithm discussed in this article comes in.

Ⅱ Analysis of Topological Sorting Algorithm

The answer to the question above is the topological sorting algorithm, a classic algorithm on the graph data structure. So what is topological sorting? The concept is easy to grasp, so let's start with an example from everyday life.

We put on clothes in a certain order, and we can think of that order as a set of dependencies between garments. For example, underwear has to go on before trousers, not the other way around; not everyone can pull off the Superman look.

Suppose we have eight pieces of clothing to put on and we already know the pairwise dependencies between them. How do we arrange a dressing sequence that satisfies every pairwise dependency?

This is a topological sorting problem. The example also shows that in many cases the topological order is not unique. Look at the picture below: both orders satisfy all of the local ordering constraints.
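
To make "not unique" concrete, here is a minimal sketch of a checker that tests whether a candidate order respects every pairwise dependency. The class name `OrderChecker`, the method `isValidOrder`, and the four-garment numbering are hypothetical illustrations, not taken from the article's pictures.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OrderChecker {
    /**
     * Returns true if, for every pair {before, after} in deps,
     * 'before' appears earlier than 'after' in the candidate order.
     */
    static boolean isValidOrder(List<Integer> order, int[][] deps) {
        Map<Integer, Integer> position = new HashMap<>();
        for (int i = 0; i < order.size(); i++) {
            position.put(order.get(i), i);
        }
        for (int[] dep : deps) {
            Integer before = position.get(dep[0]);
            Integer after = position.get(dep[1]);
            // A missing item or a reversed pair makes the order invalid
            if (before == null || after == null || before >= after) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumed numbering: 0 = underwear, 1 = trousers, 2 = socks, 3 = shoes
        int[][] deps = { {0, 1}, {1, 3}, {2, 3} };
        System.out.println(isValidOrder(List.of(0, 1, 2, 3), deps)); // true
        System.out.println(isValidOrder(List.of(2, 0, 1, 3), deps)); // true
        System.out.println(isValidOrder(List.of(1, 0, 2, 3), deps)); // false
    }
}
```

Two different sequences pass the check, while putting trousers on before underwear fails it.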

[Figure: two dressing sequences that both satisfy the pairwise dependencies]
With this everyday example in mind, you should have an idea of how to handle the compilation dependencies mentioned at the beginning: that problem can be abstracted into a topological sorting problem in the same way.

The principle of topological sorting is simple, so our focus is on how to implement it.

Algorithms are built on data structures. For this problem, let's first see how to abstract the problem setting into a concrete data structure.

If a must be executed before b, that is, b depends on a, we add an edge from vertex a to vertex b. The resulting graph must be not only directed but also acyclic: there can be no circular dependency such as a->b->c->d->a, because once the graph contains a cycle, no valid order exists and topological sorting cannot work. Topological sorting is, by definition, an algorithm on directed acyclic graphs.

package com.tyz.about_topo.core;

import java.util.LinkedList;

/**
 * Builds a directed acyclic graph
 * @author Tong
 */
public class Graph {
	private int vertex; // number of vertices
	private LinkedList<Integer> adj[]; // adjacency list

	public Graph() {
	}

	@SuppressWarnings("unchecked")
	public Graph(int vertex) {
		this.vertex = vertex;
		this.adj = new LinkedList[this.vertex];

		for (int i = 0; i < vertex; i++) {
			this.adj[i] = new LinkedList<Integer>();
		}
	}

	public void addEdge(int start, int end) { // add the directed edge start -> end
		this.adj[start].add(end);
	}

	public LinkedList<Integer>[] getAdj() {
		return adj;
	}

	public int getVertex() {
		return vertex;
	}

	public void setVertex(int vertex) {
		this.vertex = vertex;
	}

}
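
The graph above identifies vertices by integer index, while real dependencies are stated between file names. As a rough sketch of how the two could be bridged, the class below assigns each file name a vertex number on first use and stores the edge from the dependency to the dependent file. `DependencyGraph` and all of its methods are hypothetical names introduced for illustration; the class is independent of the `Graph` class so that it can be run on its own.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DependencyGraph {
    private final Map<String, Integer> indexOf = new HashMap<>(); // file name -> vertex number
    private final List<String> names = new ArrayList<>();         // vertex number -> file name
    private final List<List<Integer>> adj = new ArrayList<>();    // adjacency list

    /** Returns the vertex number of a file, creating the vertex on first use. */
    int vertexFor(String file) {
        Integer id = indexOf.get(file);
        if (id == null) {
            id = names.size();
            indexOf.put(file, id);
            names.add(file);
            adj.add(new ArrayList<>());
        }
        return id;
    }

    /** Records that 'file' depends on 'dependency': edge dependency -> file. */
    void addDependency(String file, String dependency) {
        int from = vertexFor(dependency);
        int to = vertexFor(file);
        adj.get(from).add(to);
    }

    /** Vertices that must be compiled after the given file. */
    List<Integer> successors(String file) {
        return adj.get(vertexFor(file));
    }

    String nameOf(int vertex) {
        return names.get(vertex);
    }

    int size() {
        return names.size();
    }

    public static void main(String[] args) {
        DependencyGraph g = new DependencyGraph();
        g.addDependency("a.java", "b.java"); // a.java depends on b.java
        g.addDependency("b.java", "c.java"); // b.java depends on c.java
        for (int v = 0; v < g.size(); v++) {
            System.out.println(g.nameOf(v) + " -> " + g.successors(g.nameOf(v)));
        }
    }
}
```

The edge direction follows the convention used in this article: an edge s->t means s comes before t.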

The data structure is defined. Now let's look at how to implement topological sorting on this directed acyclic graph.

Topological sorting can be implemented in two ways, neither of which is hard to understand: Kahn's algorithm and depth-first search (DFS). Let's look at them in turn.

A. Kahn's Algorithm

Kahn's algorithm is essentially greedy, and its idea is easy to follow.

When we defined the data structure, we added an edge from s to t whenever s has to be executed before t. So if a vertex has an in-degree of 0, no other vertex has to be executed before it, and it can be executed right away.

We first find a vertex whose in-degree is 0, append it to the result sequence of the topological sort (in the code, we simply print it), and delete it from the graph, which means decreasing by 1 the in-degree of every vertex it points to. We repeat this process until all vertices have been output. The resulting sequence is a topological order that satisfies all local dependencies.

package com.tyz.about_topo.core;

import java.util.LinkedList;

/**
 * Topological sort
 * @author Tong
 */
public class TopoSort {
	private Graph graph; // directed acyclic graph
	private int vertex; // number of vertices in the graph

	public TopoSort(int vertex) {
		this.vertex = vertex;
		this.graph = new Graph(vertex);
	}

	/**
	 * Adds the edge start -> end (start must come before end);
	 * added so that callers can populate the private graph
	 */
	public void addEdge(int start, int end) {
		this.graph.addEdge(start, end);
	}

	/**
	 * Topological sort using Kahn's algorithm
	 */
	public void topoSortByKahn() {
		int[] inDegree = new int[this.vertex]; // in-degree of every vertex
		for (int i = 0; i < this.vertex; ++i) {
			// for-each avoids the linear cost of LinkedList.get(j)
			for (int w : this.graph.getAdj()[i]) {
				inDegree[w]++;
			}
		}
		LinkedList<Integer> queue = new LinkedList<Integer>();
		for (int i = 0; i < this.vertex; ++i) {
			if (inDegree[i] == 0) { // nothing has to run before vertex i
				queue.add(i);
			}
		}
		while (!queue.isEmpty()) {
			int i = queue.remove();
			System.out.println("->" + i);
			for (int k : this.graph.getAdj()[i]) {
				inDegree[k]--; // "delete" vertex i from the graph
				if (inDegree[k] == 0) {
					queue.add(k);
				}
			}
		}
	}

}

The code follows the idea described above step by step and is fairly simple, so you can use it as a reference.
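
As a complement, here is a self-contained sketch of the same idea that returns the order as a list instead of printing it, and reports a cycle when some vertices never reach in-degree 0. The class name `KahnSketch`, the edge-array input format, and the choice to throw `IllegalStateException` are assumptions made for this illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class KahnSketch {
    /**
     * Returns a topological order of vertices 0..n-1, where each edge {s, t}
     * means s must come before t. Throws if the graph contains a cycle.
     */
    static List<Integer> topoSort(int n, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            adj.add(new ArrayList<>());
        }
        int[] inDegree = new int[n];
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            inDegree[e[1]]++;
        }

        // Start with every vertex that has no prerequisite
        Deque<Integer> queue = new ArrayDeque<>();
        for (int v = 0; v < n; v++) {
            if (inDegree[v] == 0) {
                queue.add(v);
            }
        }

        List<Integer> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            int v = queue.remove();
            order.add(v);
            for (int w : adj.get(v)) {
                // Removing v lowers the in-degree of each successor
                if (--inDegree[w] == 0) {
                    queue.add(w);
                }
            }
        }

        // Vertices on a cycle never reach in-degree 0, so they are never output
        if (order.size() < n) {
            throw new IllegalStateException("graph contains a cycle");
        }
        return order;
    }
}
```

For example, `KahnSketch.topoSort(4, new int[][] {{0, 1}, {0, 2}, {1, 3}, {2, 3}})` returns `[0, 1, 2, 3]`, while the edges `{{0, 1}, {1, 0}}` on two vertices trigger the exception.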

B. DFS algorithm

I covered depth-first search on graphs in an earlier article; if you are interested or have questions, you can read it there.

[Data structure and algorithm] -> algorithm -> depth first search & breadth first search

Topological sorting can also be implemented with depth-first search. Strictly speaking, what we use here is not a depth-first search but a depth-first traversal: it visits every vertex in the graph instead of just searching for a path from one vertex to another.

package com.tyz.about_topo.core;

import java.util.LinkedList;

/**
 * Topological sort
 * @author Tong
 */
public class TopoSort {
	private Graph graph; // directed acyclic graph
	private int vertex; // number of vertices in the graph

	public TopoSort(int vertex) {
		this.vertex = vertex;
		this.graph = new Graph(vertex);
	}

	/**
	 * Adds the edge start -> end (start must come before end);
	 * added so that callers can populate the private graph
	 */
	public void addEdge(int start, int end) {
		this.graph.addEdge(start, end);
	}

	/**
	 * Topological sort using DFS
	 */
	public void topoSortByDFS() {
		@SuppressWarnings("unchecked")
		LinkedList<Integer> inverseAdj[] = new LinkedList[this.vertex]; // inverse adjacency list
		for (int i = 0; i < this.vertex; ++i) { // allocate every list before use
			inverseAdj[i] = new LinkedList<Integer>();
		}
		for (int i = 0; i < this.vertex; ++i) {
			for (int w : this.graph.getAdj()[i]) { // edge i -> w becomes w -> i
				inverseAdj[w].add(i);
			}
		}
		boolean[] visited = new boolean[this.vertex];
		for (int i = 0; i < this.vertex; ++i) { // depth-first traversal of the whole graph
			if (!visited[i]) {
				visited[i] = true;
				dfs(i, inverseAdj, visited);
			}
		}
	}

	/**
	 * Recursive depth-first traversal
	 * @param vertex the current vertex
	 * @param inverseAdj the inverse adjacency list
	 * @param visited the visited state of every vertex
	 */
	private void dfs(int vertex, LinkedList<Integer> inverseAdj[], boolean[] visited) {
		for (int w : inverseAdj[vertex]) { // the parameter, not this.vertex (the vertex count)
			if (visited[w]) {
				continue;
			}
			visited[w] = true;
			dfs(w, inverseAdj, visited);
		}
		System.out.println("->" + vertex); // print this vertex after everything it depends on
	}

}
This algorithm contains two key parts.

The first part builds the inverse adjacency list from the adjacency list. In the adjacency list, the edge s->t means that s is executed before t, that is, t depends on s. In the inverse adjacency list, the edge s->t means that s depends on t, so s is executed after t.

The second part is the core of the algorithm: processing each vertex recursively. For each vertex, we first output all the vertices it can reach, that is, all the vertices it depends on, and only then output the vertex itself.

Think this logic through. An ordinary depth-first traversal of the adjacency list visits a vertex and then follows its outgoing edges to the vertices that come after it, backtracking whenever it hits a dead end. Printing vertices on the way down would not print a vertex's dependencies before the vertex itself. In the inverse adjacency list, however, the edges leaving a vertex lead exactly to the vertices it depends on, so the recursion can output all of them first and the vertex itself last. This guarantees that the output is a valid topological order.
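
For comparison, a common variant avoids building the inverse adjacency list: it runs the depth-first traversal on the original adjacency list, records each vertex only after all of its successors are finished, and then reads the records in reverse. The sketch below shows that variant, not the inverse-adjacency version from the listing above. The class name `DfsPostOrderSketch` and the edge-array input are assumptions, and the input is assumed to be acyclic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class DfsPostOrderSketch {
    /**
     * Returns a topological order of vertices 0..n-1, where each edge {s, t}
     * means s must come before t. Assumes the graph has no cycle.
     */
    static List<Integer> topoSort(int n, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            adj.add(new ArrayList<>());
        }
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
        }

        boolean[] visited = new boolean[n];
        Deque<Integer> finished = new ArrayDeque<>();
        for (int v = 0; v < n; v++) { // cover every component of the graph
            if (!visited[v]) {
                dfs(v, adj, visited, finished);
            }
        }
        // Iterating from the top of the stack yields the most recently finished vertex first
        return new ArrayList<>(finished);
    }

    private static void dfs(int v, List<List<Integer>> adj,
                            boolean[] visited, Deque<Integer> finished) {
        visited[v] = true;
        for (int w : adj.get(v)) {
            if (!visited[w]) {
                dfs(w, adj, visited, finished);
            }
        }
        // v is pushed only after every vertex that must follow it
        finished.push(v);
    }
}
```

With the edges `{{0, 1}, {0, 2}, {1, 3}, {2, 3}}` on four vertices, `DfsPostOrderSketch.topoSort` returns `[0, 2, 1, 3]`, which differs from the order Kahn's algorithm produces for the same graph but is equally valid.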

C. Algorithm time complexity

Now that we have understood the principle of Kahn algorithm and DFS algorithm for topological sorting, let's take a look at their time complexity.

In the code for Kahn's algorithm, every vertex is visited once and every edge is visited once. The time complexity of Kahn's algorithm is therefore O(V+E), where V is the number of vertices and E is the number of edges.

The time complexity of DFS was analyzed in detail in the article on graph search algorithms. Here, building the inverse adjacency list touches every vertex and every edge once, and the traversal then visits every vertex and every edge once more, so the time complexity is also O(V+E).

Note that the graph is not necessarily connected; it may consist of several disconnected subgraphs. E is therefore not necessarily greater than V (a graph with no edges at all has E = 0), and in general neither term dominates the other. This is why both V and E must appear in the time complexity.
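
To see the two terms separately, the sketch below runs the main loop of Kahn's algorithm and counts how many times a vertex is dequeued and how many times an edge is examined. The class `KahnCostSketch` and its method `countWork` are hypothetical names for this illustration, and the input is assumed to be acyclic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class KahnCostSketch {
    /**
     * Runs Kahn's algorithm on vertices 0..n-1 and returns
     * {number of vertex visits, number of edge visits} in the main loop.
     */
    static int[] countWork(int n, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            adj.add(new ArrayList<>());
        }
        int[] inDegree = new int[n];
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            inDegree[e[1]]++;
        }
        Deque<Integer> queue = new ArrayDeque<>();
        for (int v = 0; v < n; v++) {
            if (inDegree[v] == 0) {
                queue.add(v);
            }
        }
        int vertexVisits = 0;
        int edgeVisits = 0;
        while (!queue.isEmpty()) {
            int v = queue.remove();
            vertexVisits++; // each vertex is dequeued exactly once
            for (int w : adj.get(v)) {
                edgeVisits++; // each edge is examined exactly once
                if (--inDegree[w] == 0) {
                    queue.add(w);
                }
            }
        }
        return new int[] {vertexVisits, edgeVisits};
    }

    public static void main(String[] args) {
        // Six vertices, two edges: a disconnected graph with E < V
        int[] work = countWork(6, new int[][] {{0, 1}, {2, 3}});
        System.out.println(work[0] + " vertex visits, " + work[1] + " edge visits");
        // prints "6 vertex visits, 2 edge visits"
    }
}
```

In this disconnected graph the vertex term is the larger one, whereas a dense graph would be dominated by the edge term.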