Tarjan's strongly connected components algorithm

I. Definition of strongly connected components

In a directed graph G, two vertices vi and vj are said to be strongly connected if there is a directed path from vi to vj and also a directed path from vj to vi. A directed graph G is strongly connected if every pair of its vertices is strongly connected. The maximal strongly connected subgraphs of a directed graph are called its strongly connected components (SCCs).

The above is the formal definition of the strongly connected components of a directed graph, taken from Wikipedia. It is not hard to understand in practice. For example:
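The definition can be checked directly with two reachability searches. The following is a minimal sketch, not part of the original article: the class name `MutualReachability`, the adjacency-list representation, and the sample graph are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

final class MutualReachability {
    // Returns true if there is a directed path from src to dst
    // (every node trivially reaches itself).
    static boolean reaches(List<List<Integer>> adj, int src, int dst) {
        boolean[] seen = new boolean[adj.size()];
        Deque<Integer> work = new ArrayDeque<>();
        work.push(src);
        seen[src] = true;
        while (!work.isEmpty()) {
            int u = work.pop();
            if (u == dst) {
                return true;
            }
            for (int v : adj.get(u)) {
                if (!seen[v]) {
                    seen[v] = true;
                    work.push(v);
                }
            }
        }
        return false;
    }

    // Two vertices are strongly connected when each one reaches the other.
    static boolean stronglyConnected(List<List<Integer>> adj, int u, int v) {
        return reaches(adj, u, v) && reaches(adj, v, u);
    }

    public static void main(String[] args) {
        // Hypothetical graph: 0 -> 1 -> 2 -> 0 is a cycle, and 2 -> 3 leaves it.
        List<List<Integer>> adj =
                List.of(List.of(1), List.of(2), List.of(0, 3), List.of());
        System.out.println(stronglyConnected(adj, 0, 2)); // true
        System.out.println(stronglyConnected(adj, 0, 3)); // false
    }
}
```

Checking every pair this way is far slower than the linear-time algorithms discussed in this article; the sketch only makes the definition concrete.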

As shown in the figure above, {a,b,c,d} is a strongly connected component, {e} is a strongly connected component, and {f} is a strongly connected component.

II. Algorithms for finding SCCs

Given a directed graph, how do we find its strongly connected components?

There are generally two algorithms:

1. Kosaraju's algorithm: this algorithm runs DFS on the graph G and on its transpose G^T. The first DFS, on G, records finish times, which determine a topological order among the strongly connected components. The second DFS, on G^T, starts from nodes in decreasing finish time, so that each search tree is exactly one strongly connected component. The complexity is O(V+E).

2. Tarjan's algorithm: Tarjan's algorithm also uses DFS to find SCCs. Unlike Kosaraju's approach, which uses a topological order so that the searches for different SCCs do not interfere with each other, Tarjan's algorithm runs DFS on the original graph only once. The SCCs are "intertwined" during that single DFS; the algorithm records some information about each node along the way and uses it to identify each SCC. The complexity is O(V+E).

In contrast, Tarjan's algorithm does not need to transpose the graph and runs DFS only once. Both algorithms are O(V+E), so the difference is a constant factor, which usually favors Tarjan's algorithm in practice.
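For comparison, the two-pass idea behind Kosaraju's algorithm can be sketched as follows. This is an illustrative sketch under assumptions: the class name `KosarajuSketch`, the helper names `fill` and `label`, and the sample graph do not come from the article.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class KosarajuSketch {
    // Pass 1: DFS on G, pushing each node when its visit finishes.
    private static void fill(List<List<Integer>> adj, int u, boolean[] seen, Deque<Integer> order) {
        seen[u] = true;
        for (int v : adj.get(u)) {
            if (!seen[v]) {
                fill(adj, v, seen, order);
            }
        }
        order.push(u);
    }

    // Pass 2: DFS on the transpose, labeling everything reached with one SCC id.
    private static void label(List<List<Integer>> rev, int u, int[] comp, int id) {
        comp[u] = id;
        for (int v : rev.get(u)) {
            if (comp[v] == -1) {
                label(rev, v, comp, id);
            }
        }
    }

    // Returns comp, where comp[v] is the index of the SCC containing v.
    static int[] scc(List<List<Integer>> adj) {
        int n = adj.size();
        List<List<Integer>> rev = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            rev.add(new ArrayList<>());
        }
        for (int u = 0; u < n; u++) {
            for (int v : adj.get(u)) {
                rev.get(v).add(u); // edge u -> v becomes v -> u
            }
        }
        boolean[] seen = new boolean[n];
        Deque<Integer> order = new ArrayDeque<>(); // last finished node on top
        for (int s = 0; s < n; s++) {
            if (!seen[s]) {
                fill(adj, s, seen, order);
            }
        }
        int[] comp = new int[n];
        Arrays.fill(comp, -1);
        int count = 0;
        while (!order.isEmpty()) {
            int u = order.pop(); // decreasing finish time
            if (comp[u] == -1) {
                label(rev, u, comp, count++);
            }
        }
        return comp;
    }

    public static void main(String[] args) {
        // Hypothetical graph: cycle 0 -> 1 -> 2 -> 0, edge 2 -> 3, cycle 3 -> 4 -> 3.
        List<List<Integer>> adj =
                List.of(List.of(1), List.of(2), List.of(0, 3), List.of(4), List.of(3));
        System.out.println(Arrays.toString(scc(adj))); // [0, 0, 0, 1, 1]
    }
}
```

The sketch builds the transpose explicitly and traverses twice, which is exactly the extra work that Tarjan's algorithm avoids.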

This article focuses on Tarjan's strongly connected components algorithm.

III. DFS-related concepts

Tarjan's algorithm is itself a DFS algorithm and relies on some properties of DFS, so before going into Tarjan's SCC algorithm, let's review the relevant basics.

1. Search tree (forest): running DFS on a graph produces a search tree (or a forest, when several DFS starts are needed).

2. Node visit order d[i]: during DFS, each node i records a value d[i] giving the order in which it was first visited. The order f[i] in which the visit of each node completes may also be recorded.

3. Classification of edges by DFS:

  Tree edge: an edge that the search itself follows. If, while DFS scans the neighbors of node i, it finds that neighbor j has not been visited yet and recurses into j, then the edge <i, j> becomes an edge of the search tree and is called a tree edge.

  Reverse edge (usually called a back edge): an edge from a descendant to one of its ancestors. If, while DFS scans the neighbors of node i, it finds that j is an ancestor of i in the search tree (j has been visited but its recursion has not returned yet), then the edge <i, j> is a reverse edge.

  Forward edge: an edge from an ancestor to a descendant that is not a tree edge.

  Cross edge: an edge between nodes in different subtrees of the search, where neither endpoint is an ancestor of the other.

  One property: during DFS, when neighbor j is reached from node i, if j has already been visited (its recursion has not necessarily finished) and d[i] > d[j], then the edge <i, j> is a reverse edge or a cross edge. Tarjan's algorithm uses this property.
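The classification above can be made concrete with a DFS that records d[i] and f[i] and labels every edge it examines. This is a minimal sketch under assumptions: the class name `EdgeClassifier`, the single shared timer for d and f, and the sample graph are illustrative choices and do not come from the article.

```java
import java.util.ArrayList;
import java.util.List;

final class EdgeClassifier {
    private final List<List<Integer>> adj;
    final int[] d; // time of first visit, 0 = not visited yet
    final int[] f; // time the visit completed, 0 = not finished yet
    final List<String> labels = new ArrayList<>();
    private int timer = 0;

    EdgeClassifier(List<List<Integer>> adj) {
        this.adj = adj;
        this.d = new int[adj.size()];
        this.f = new int[adj.size()];
        for (int s = 0; s < adj.size(); s++) {
            if (d[s] == 0) {
                dfs(s);
            }
        }
    }

    private void dfs(int i) {
        d[i] = ++timer;
        for (int j : adj.get(i)) {
            if (d[j] == 0) {
                labels.add(i + "->" + j + " tree");
                dfs(j);
            } else if (f[j] == 0) {
                // j is visited but not finished, so it is an ancestor of i.
                labels.add(i + "->" + j + " reverse");
            } else if (d[i] < d[j]) {
                // j finished and was discovered after i: j is a descendant of i.
                labels.add(i + "->" + j + " forward");
            } else {
                // j finished and was discovered before i: a different subtree.
                labels.add(i + "->" + j + " cross");
            }
        }
        f[i] = ++timer;
    }

    public static void main(String[] args) {
        // Hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, 3 -> 2.
        List<List<Integer>> adj =
                List.of(List.of(1, 2), List.of(2), List.of(0), List.of(2));
        EdgeClassifier c = new EdgeClassifier(adj);
        System.out.println(c.labels);
        // [0->1 tree, 1->2 tree, 2->0 reverse, 0->2 forward, 3->2 cross]
    }
}
```

In this sample, the reverse edge 2->0 and the cross edge 3->2 both satisfy d[i] > d[j], while the forward edge 0->2 does not, which matches the property stated above.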

Note: DFS and the DFS classification of edges are described in detail in "Introduction to Algorithms"; only the parts relevant to Tarjan's algorithm are listed here.

The following figure illustrates the above concepts:

The left side of the figure is the original graph, and the right side is a DFS search tree annotated with the d[i] value of each node and the classification of every edge. The search tree, the d[i] values, and the edge classification depend on the order in which DFS visits the nodes, but the properties above hold regardless.

IV. Tarjan's SCC algorithm

With the basic concepts in place, let's look at Tarjan's algorithm for finding strongly connected components. A formal proof is beyond the scope of this article; only the idea and the process of the algorithm are explained.

Basic idea of the algorithm

Tarjan's algorithm is based on the following properties:

a) If node i and node j are in the same strongly connected component, then they have a common ancestor in the search tree that belongs to the same component.

b) Each strongly connected component forms a subtree of the search tree.

First, let’s understand it intuitively:

For property a) we can see that {a,b,c,d} is an SCC, and in the search tree, any two of them have a common ancestor.

Property b) is a natural consequence of property a). The three SCCs {a,b,c,d}, {e}, and {f} each correspond to a subtree of the search tree, framed by a dotted line.

The properties are not hard to justify. Think about how DFS proceeds: once DFS visits one node of an SCC, it goes on to visit every other node of that SCC before the recursion on that first node returns, because they are all mutually reachable. So any two nodes of an SCC must have a common ancestor: in the worst case, it is the first node of the SCC to be visited. In the figure above, c and b have the common ancestor b, and c and d have the common ancestor a. Node a is the first node of the SCC visited by DFS, so it is a common ancestor of any two nodes of the SCC. Hence property a) holds. Property b) follows from property a): the nodes of an SCC all descend from one common ancestor (the first node visited), so together they form a subtree rooted at that node.

The red node in the figure above marks the root of each such subtree, called the SCC root: the first node of the SCC visited by DFS.

With these two properties, the basic idea of Tarjan's algorithm becomes clear: since each SCC is a subtree of the search tree, it is enough to find the root of the subtree and "pick" the subtree off the search tree to obtain an SCC. The question is how to find the subtree root.

Solution

Define the variables:

d[i]: the visit order of node i, as described above;

low[i]: the smallest visit order d[k] among the nodes k in the same SCC that node i can reach by following zero or more tree edges and then at most one reverse edge or cross edge.

According to the definition of low[i], it is computed as follows:

low[i] = min{low[i], low[j]}: if <i,j> is a tree edge, then whatever node j can reach under the definition, i can reach as well by first following the tree edge <i,j>.

low[i] = min{low[i], d[j]}: if <i,j> is a reverse edge or a cross edge and j is in the same SCC, then i can only take d[j] itself, because by definition the path may contain at most one reverse or cross edge and must end with it.

Note: forward edges do not affect SCC connectivity, so Tarjan's algorithm ignores them.

A bit of a tongue twister, right? Let's look at an example:

 

As shown in the figure, each node is labeled with d[i]/low[i], and {a,b,c,d} is an SCC. Node b can reach a, the earliest visited node of the same SCC, through one tree edge and one reverse edge; d[a] = 1, so low[b] = 1. Node d, through zero tree edges and one cross edge, can reach c; d[c] = 3, so low[d] = 3.

With the above definitions, Tarjan's algorithm rests on a key theorem:

Theorem: d[i] = low[i] <=> node i is an SCC root

A strict formal proof of this theorem can be found in the original paper listed in the appendix. The intuition is that, within one SCC, only the SCC root satisfies d[i] = low[i]. Why?

Consider the nodes of one SCC. There are two cases:

1. For the SCC root, d[i] = low[i] holds: no node of the SCC was visited earlier, so low[i] cannot drop below d[i].

2. For every other node, low[i] < d[i] must hold. Consider two sub-cases:

  2.1. Node i reaches its own ancestor k through a reverse edge (either directly or via its descendants). Then low[i] <= d[k], and since k is an ancestor of i, d[k] < d[i], so low[i] < d[i]. For example, c or d in the figure above can reach the ancestor a through tree edges followed by a reverse edge;

  2.2. Node i reaches a node k of the same SCC through a cross edge. Then low[i] <= d[k]. In the best case k is the SCC root; otherwise k is some other node of the SCC. Either way, k was visited before i, so low[i] <= d[k] < d[i]. For example, node d in the figure above reaches c through a cross edge.

These two points show that, within an SCC, only the SCC root satisfies d[i] = low[i].

As mentioned above, Tarjan's algorithm reduces the problem of finding SCCs to the problem of finding SCC roots, and this theorem tells us how to recognize an SCC root. The whole process is now complete: run DFS over the original graph, computing low[i] recursively; when the recursive traversal of node i finishes, if d[i] = low[i], an SCC has been found. There remains the question of how to collect the SCC once its root is known. During the DFS, Tarjan's algorithm records the visited nodes on a stack. When an SCC root is found, nodes are popped from the top of the stack until the SCC root itself is popped, and the popped nodes form one SCC. The implementation below shows this process.

Implementation

1. First, let's look at the pseudo-code from Tarjan's paper, with my own comments added:

NUMBER in the original text corresponds to d[i] in this article, and LOWLINK corresponds to low[i].

2. Java implementation:

// Node status during DFS
public enum Status {
    NOT_VISIT, VISITING, VISITED;
}

 

// SCC data structure
import java.util.ArrayList;
import java.util.List;

public class SccMeta {
    private int sccCount;
    private List<List<Node>> sccList;

    public SccMeta() {
        this.sccCount = 0;
        this.sccList = new ArrayList<>();
    }

    public int getSccCount() {
        return sccCount;
    }

    public List<List<Node>> getSccList() {
        return sccList;
    }

    // Records one more strongly connected component.
    public void addScc(List<Node> scc) {
        this.sccList.add(scc);
        this.sccCount++;
    }

    @Override
    public String toString() {
        StringBuilder s = new StringBuilder();
        s.append("Strongly Connected Components:").append("\n");
        for (int i = 0; i < sccCount; i++) {
            s.append("scc ").append(i).append(":");
            s.append(sccList.get(i).toString());
            s.append("\n");
        }
        return s.toString();
    }
}

 

// Tarjan SCC entry point: starts the DFS from every node that has not been visited yet
public SccMeta scc() {
    initScc();
    SccMeta sccMeta = new SccMeta();
    int nodeNum = size();
    for (int i = 0; i < nodeNum; i++) {
        if (status[i] == Status.NOT_VISIT) {
            tarjan(i, sccMeta);
        }
    }
    return sccMeta;
}

 

// Tarjan recursive step
private void tarjan(int i, SccMeta sccMeta) {
    low[i] = d[i] = timer++;
    stack.push(i);
    status[i] = Status.VISITING;
    List<Integer> neighbors = adjacencyList.get(i);
    for (int j : neighbors) {
        if (status[j] == Status.NOT_VISIT) { // node j not visited yet: <i,j> is a tree edge
            tarjan(j, sccMeta);
            low[i] = Math.min(low[i], low[j]);
        } else if (d[i] > d[j]) { // node j already visited: <i,j> is a reverse edge or a cross edge
            // Only nodes still on the stack can belong to the same SCC as i.
            // Note: contains() is typically a linear scan; a boolean on-stack array would make this check O(1).
            if (stack.contains(j)) {
                low[i] = Math.min(low[i], d[j]);
            }
        } else {
            // node j already visited and <i,j> is a forward edge: ignore
        }
    }
    status[i] = Status.VISITED;

    if (d[i] == low[i]) { // node i is an SCC root
        List<Node> scc = new LinkedList<>();
        int k;
        do {
            k = stack.pop();
            scc.add(getNode(k));
        } while (k != i);
        sccMeta.addScc(scc);
    }
}
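The methods above depend on fields of the surrounding graph class that are not shown here (status, d, low, timer, stack, adjacencyList). As a self-contained variant that can be run directly, here is a minimal sketch under assumptions: the class name `TarjanSketch` and the boolean `onStack` array (used in place of `stack.contains`) are illustrative choices, and the sample graph is a hypothetical one chosen to be consistent with the a..f example described in this article (a=0, b=1, c=2, d=3, e=4, f=5).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class TarjanSketch {
    private final List<List<Integer>> adj;
    private final int[] d;           // visit order, 0 = not visited yet
    private final int[] low;
    private final boolean[] onStack; // O(1) replacement for stack.contains(j)
    private final Deque<Integer> stack = new ArrayDeque<>();
    private final List<List<Integer>> sccs = new ArrayList<>();
    private int timer = 0;

    private TarjanSketch(List<List<Integer>> adj) {
        this.adj = adj;
        this.d = new int[adj.size()];
        this.low = new int[adj.size()];
        this.onStack = new boolean[adj.size()];
    }

    // Returns the SCCs in the order in which their roots finish.
    static List<List<Integer>> scc(List<List<Integer>> adj) {
        TarjanSketch t = new TarjanSketch(adj);
        for (int i = 0; i < adj.size(); i++) {
            if (t.d[i] == 0) {
                t.visit(i);
            }
        }
        return t.sccs;
    }

    private void visit(int i) {
        d[i] = low[i] = ++timer;
        stack.push(i);
        onStack[i] = true;
        for (int j : adj.get(i)) {
            if (d[j] == 0) {
                // Tree edge: i inherits whatever j can reach.
                visit(j);
                low[i] = Math.min(low[i], low[j]);
            } else if (onStack[j]) {
                // Reverse or cross edge into the SCC being built.
                low[i] = Math.min(low[i], d[j]);
            }
            // Otherwise j already belongs to a finished SCC: ignore the edge.
        }
        if (d[i] == low[i]) {
            // i is an SCC root: pop until i itself comes off the stack.
            List<Integer> scc = new ArrayList<>();
            int k;
            do {
                k = stack.pop();
                onStack[k] = false;
                scc.add(k);
            } while (k != i);
            sccs.add(scc);
        }
    }

    public static void main(String[] args) {
        // Hypothetical edges: a->b, a->d, b->c, b->e, c->a, c->f, d->c, e->f.
        List<List<Integer>> adj = List.of(
                List.of(1, 3), List.of(2, 4), List.of(0, 5),
                List.of(2), List.of(5), List.of());
        System.out.println(scc(adj)); // [[5], [4], [3, 2, 1, 0]]
    }
}
```

With the `onStack` array, the update `low[i] = min(low[i], d[j])` for an edge into a node that is still on the stack also covers forward edges harmlessly, since d[j] > d[i] in that case, so no separate d[i] > d[j] test is needed in this variant.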

Demonstration of the algorithm execution process

In the figures, colors represent the state of each node: green is the node currently being visited, gray is a node whose visit is in progress (its recursion has not returned yet), and black is a node whose recursive visit has completed.

1. Original graph

2. DFS visits c and encounters a reverse edge

4. The visit of node f completes, and the first SCC {f} is obtained

5. The recursive visit of node c completes, and the recursion falls back to b

6. DFS visits e and encounters a cross edge; after the visit of e completes, the second SCC {e} is obtained

7. The recursive visit of node b completes

8. DFS visits d and encounters a cross edge; the visit of d completes

9. The recursive visit of a completes, and the third SCC {a,b,c,d} is obtained

V. Summary

This article introduces Tarjan's algorithm for finding strongly connected components, covering the definition, the idea of the algorithm, an implementation, and a demonstration. The overall idea and process are described fairly clearly, but the strict formal proof is not covered. If you are interested, refer to Tarjan's original paper.

Appendix:

1. Tarjan's paper: R. Tarjan, "Depth-First Search and Linear Graph Algorithms", SIAM Journal on Computing, 1972

2. The Java implementation of the algorithm in this article is on GitHub: https://github.com/Tswaf/algorithm/tree/master/src/graph

