Detailed explanation and application of DFS and BFS algorithms

DFS and BFS

foreword

DFS and BFS are the two basic search methods. Search is a concrete form of brute force: enumerate all possible situations, then check them one by one to find the answer.

Taking a maze as the running example, DFS and BFS have this in common: both can find the exit, and both do so by exhaustively searching all intersections and roads.

The differences: BFS finds the shortest path easily, while DFS does so only with more effort; DFS can enumerate all paths from the entrance to the exit, which BFS does not do naturally; and DFS is usually simpler to code than BFS.

DFS

basic idea

DFS is generally implemented with recursion. When the same state can be reached more than once, recursive code often needs memoization (memoized search) as an optimization.

The idea of a recursive algorithm is to keep reducing a big problem until it becomes the smallest problem of the same kind; once that smallest problem is answered, the larger problems are answered one by one on the way back until the original problem is solved. Recursion therefore has two phases: the forward (descending) phase and the return phase (backtracking).
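As a minimal sketch of the two phases and of memoization, the function below computes Fibonacci numbers recursively and caches each answer the first time it is produced. The names `fib` and `memo` are illustrative choices and do not come from the original article.

```cpp
#include <vector>

// Illustrative example: Fibonacci numbers with memoization.
// memo[k] == -1 means "fib(k) has not been computed yet".
long long fib(int k, std::vector<long long>& memo) {
    if (k <= 1) return k;               // smallest subproblem: stop descending
    if (memo[k] != -1) return memo[k];  // already solved: reuse the stored answer
    // Forward phase: descend into the two smaller subproblems.
    // Return phase: combine their answers on the way back and store the result.
    memo[k] = fib(k - 1, memo) + fib(k - 2, memo);
    return memo[k];
}

// Convenience wrapper that allocates the cache.
long long fib(int k) {
    std::vector<long long> memo(k > 0 ? k + 1 : 1, -1);
    return fib(k, memo);
}
```

With the cache each value is computed once, so `fib(50)` returns 12586269025 after about 50 additions instead of billions of repeated calls.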

code template

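ans;  // the answer, kept in a global variable
void dfs(depth, other parameters){
   if(exit condition){    // reached the deepest layer, or an exit condition holds
       update the answer;    // the answer is usually a global variable
       return;    // return to the previous layer
   }
   (pruning)         // prune before going any deeper
   for(each possible choice i for the next layer)   // continue DFS on every choice
      if(used[i]==0){       // state i has not been used, so the next layer may use it
          used[i]=1;        // mark state i as used; deeper layers cannot use it again
          dfs(depth+1, other parameters); // next layer
          used[i]=0;         // restore the state so that backtracking does not affect the layer above
      }
    return;      // return to the previous layer
}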

When performing DFS, "saving the state" and "restoring the state" are very important: they determine whether the paths found are correct and free of repetition.

  • The role of "saving the state" is to prohibit reuse. While searching for a path from the start point to the end point, the points already on this path must not be passed again, otherwise the search goes around in circles. So the points on the current path are marked to prohibit passing them again. Points that have not been passed yet, and points that were left again after hitting a wall, are not marked; they may still join the current path later.

  • The role of "restoring the state" is to allow reuse. To search for a new path, the search steps back along the current path from the end point (or from the point where it hit a wall). Every time it retreats from a point, that point's mark is cleared so that a new path may pass through it again.
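The two bullets above can be sketched with a small path-counting example. It counts the simple paths from the top-left to the bottom-right cell of an n×n grid, moving up, down, left or right. The function name `countPaths` and the grid setting are assumptions made for illustration only.

```cpp
#include <vector>

// Illustrative helper: number of simple paths from (r, c) to the bottom-right
// cell of an n x n grid, moving one cell up, down, left or right per step.
int countPaths(int r, int c, int n, std::vector<std::vector<bool>>& used) {
    if (r == n - 1 && c == n - 1) return 1;  // reached the end: one complete path
    static const int dr[4] = {-1, 1, 0, 0};
    static const int dc[4] = {0, 0, -1, 1};
    int total = 0;
    used[r][c] = true;                       // save the state: (r, c) is on the current path
    for (int d = 0; d < 4; ++d) {
        int nr = r + dr[d], nc = c + dc[d];
        if (nr < 0 || nr >= n || nc < 0 || nc >= n) continue;  // outside the grid
        if (used[nr][nc]) continue;                            // already on the current path
        total += countPaths(nr, nc, n, used);
    }
    used[r][c] = false;                      // restore the state: other paths may use (r, c)
    return total;
}

// Convenience wrapper starting from the top-left cell.
int countPaths(int n) {
    std::vector<std::vector<bool>> used(n, std::vector<bool>(n, false));
    return countPaths(0, 0, n, used);
}
```

For a 3×3 grid `countPaths(3)` returns 12. Without the restore step, cells visited by the first path would stay blocked and most of the other paths would be missed.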

DFS application

generate permutations

  • arrangement

In some scenarios the library permutation function next_permutation() is not applicable, and the permutation algorithm has to be written by hand.

The following code outputs the permutations in ascending (lexicographic) order, provided that the numbers in a[20] are already sorted in ascending order; otherwise they need to be sorted first.

The code uses b[] to record the permutation being built. On the first entry to dfs(), b[0] takes one of the n numbers; on the second entry, b[1] takes one of the remaining n-1 numbers; and so on. vis[] records whether a number has already been chosen, so that a chosen number cannot be chosen again later.

#include<bits/stdc++.h>
using namespace std;
int a[20]={1,2,3,4,5,6,7,8,9,10,11,12,13};
bool vis[20];  // vis[i]: whether the i-th number has been used
int b[20];  // the permutation being generated
void dfs(int s,int t){
    if(s==t){   // recursion ends: one full permutation is complete
        for(int i=0;i<t;++i)cout<<b[i]<<" ";  // output the permutation
        cout<<"; ";
        return;
    }
    for(int i=0;i<t;i++)
       if(!vis[i]){
           vis[i]=true;
           b[s]=a[i];
           dfs(s+1,t);
           vis[i]=false;
       }
}
int main(){
    int n=3;
    dfs(0,n);  // all permutations of the first n numbers
    return 0;
}

Output after running: 1 2 3; 1 3 2; 2 1 3; 2 3 1; 3 1 2; 3 2 1;

To output arrangements of m numbers chosen from n, for example any 3 of 4 numbers, set n=4 in main() (line 21 of the code above), and in dfs() replace t with 3 in the termination test if(s==t) and in the output loop (lines 7 and 8).
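The same idea can be written with m as an explicit parameter instead of editing constants. The sketch below collects the arrangements in a vector rather than printing them; the names `arrange` and `arrangements` are hypothetical and not part of the original code.

```cpp
#include <cstddef>
#include <vector>

// Illustrative sketch: collect every ordered selection of m numbers from a.
// s is the position being filled, cur is the partial arrangement.
void arrange(int s, int m, const std::vector<int>& a, std::vector<bool>& vis,
             std::vector<int>& cur, std::vector<std::vector<int>>& out) {
    if (s == m) {            // m positions filled: one arrangement is complete
        out.push_back(cur);
        return;
    }
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (vis[i]) continue;    // a[i] is already used in this arrangement
        vis[i] = true;           // save the state
        cur[s] = a[i];
        arrange(s + 1, m, a, vis, cur, out);
        vis[i] = false;          // restore the state
    }
}

// Convenience wrapper that sets up the working arrays.
std::vector<std::vector<int>> arrangements(const std::vector<int>& a, int m) {
    std::vector<bool> vis(a.size(), false);
    std::vector<int> cur(m);
    std::vector<std::vector<int>> out;
    arrange(0, m, a, vis, cur, out);
    return out;
}
```

For a = {1,2,3,4} and m = 3 this yields 4×3×2 = 24 arrangements, starting with 1 2 3 and ending with 4 3 2 when a is sorted in ascending order.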

  • combination

Combinations can also be output with DFS. During the DFS, every combination is produced by either selecting or not selecting the k-th number.

Take the three numbers {1,2,3} as an example:

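#include<bits/stdc++.h>
using namespace std;
int a[]={1,2,3,4,5,6,7,8,9,10};
int vis[10];  // vis[i]=1 means the i-th number is selected
void dfs(int k){
    if(k==3){   // all three numbers have been decided: output this combination
        for(int i=0;i<3;i++)
            if(vis[i])cout<<a[i];
        cout<<"-";
    }
    else {
       vis[k]=0; // do not select the k-th number
       dfs(k+1); // go on to the next number
       vis[k]=1; // select the k-th number
       dfs(k+1); // go on to the next number
    }
}
int main(){
    dfs(0); // start from the first number
    return 0;
}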

Output result: -3-2-23-1-13-12-123- (the empty combination comes first, which is why the output starts with "-")
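When only combinations of exactly m numbers are wanted, the same select / do-not-select recursion can stop early. The following sketch is one possible variant; the names `choose` and `combinations` are illustrative assumptions.

```cpp
#include <vector>

// Illustrative sketch: collect all combinations of exactly m numbers from a.
// k is the index of the number currently being decided.
void choose(int k, int m, const std::vector<int>& a, std::vector<int>& cur,
            std::vector<std::vector<int>>& out) {
    int n = static_cast<int>(a.size());
    int picked = static_cast<int>(cur.size());
    if (picked == m) {                 // enough numbers selected
        out.push_back(cur);
        return;
    }
    if (picked + (n - k) < m) return;  // too few numbers left to reach m: stop here
    cur.push_back(a[k]);               // select the k-th number
    choose(k + 1, m, a, cur, out);
    cur.pop_back();                    // undo the selection
    choose(k + 1, m, a, cur, out);     // do not select the k-th number
}

// Convenience wrapper.
std::vector<std::vector<int>> combinations(const std::vector<int>& a, int m) {
    std::vector<int> cur;
    std::vector<std::vector<int>> out;
    choose(0, m, a, cur, out);
    return out;
}
```

For a = {1,2,3,4,5} and m = 2 this produces the 10 pairs from 1 2 through 4 5. Selecting before skipping makes the output come out in ascending order.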

maze problem

Let's look at an example first (the figure with the problem statement is not reproduced here). Judging from the code below, the maze is an n×n grid in which every cell carries one of the signs L, R, U or D, one person stands on every cell, and the question is how many of them can walk out of the maze.

A person at any cell simply follows the signs, and either eventually walks out or never does. This "walk until the end" pattern is typical DFS.

If a separate DFS is run from every cell, the complexity is O(n^4): there are n² starting cells, and each walk can be up to n² steps long. When n is large, the code will badly exceed the time limit.

In fact there is no need to run a full DFS from every cell. If a walk from some cell ends outside the maze, then every cell on that path can also get out; if the walk ends up going in circles, then no cell on that path can get out. So the key is to mark the entire path, which greatly reduces the amount of computation. The optimized code does this with solve[][].

...
char mp[n+2][n+2];
bool vis[n+2][n+2];
int solve[n+2][n+2]; // solve[i][j]=1: this cell can get out; solve[i][j]=2: this cell cannot get out
int ans=0;
int cnt=0;
bool dfs(int i,int j){
    if(i<0||i>n-1||j<0||j>n-1)return true;  // stepped outside the maze
    if(solve[i][j]==1) return true; // cell (i,j) is already known to get out
    if(solve[i][j]==2) return false;// cell (i,j) is already known not to get out
    if(vis[i][j])return false;      // back on the current path: going in circles
    cnt++; // count how many DFS calls do real work
    vis[i][j]=true;
    if(mp[i][j]=='L'){
        if(dfs(i,j-1)){solve[i][j]=1;return true;}  // on the way back, record that the whole path gets out
        else    {solve[i][j]=2;return false;}  // on the way back, record that the whole path is trapped
    }
    if(mp[i][j]=='R'){
        if(dfs(i,j+1)){solve[i][j]=1;return true;}
        else    {solve[i][j]=2;return false;}
    }
    if(mp[i][j]=='U'){
        if(dfs(i-1,j)){solve[i][j]=1;return true;}
        else    {solve[i][j]=2;return false;}
    }
    if(mp[i][j]=='D'){
        if(dfs(i+1,j)){solve[i][j]=1;return true;}
        else    {solve[i][j]=2;return false;}
    }
    return false;  // not reached for valid input; avoids falling off the end of a non-void function
}
...

Since the answer is obtained by assigning solve[][] once for each cell of the maze, the complexity is O(n²).
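Because the listing above is only a fragment, here is a self-contained sketch of the same memoized DFS. It assumes the maze is given as n strings of length n over the letters L, R, U, D; the names `canExit` and `countEscapes` and the single `state` array (replacing vis[][] and solve[][]) are choices made for this sketch.

```cpp
#include <string>
#include <vector>

// state: 0 = not visited, 1 = on the current path, 2 = can get out, 3 = trapped.
bool canExit(int i, int j, const std::vector<std::string>& mp,
             std::vector<std::vector<int>>& state) {
    int n = static_cast<int>(mp.size());
    if (i < 0 || i >= n || j < 0 || j >= n) return true;  // stepped outside the maze
    if (state[i][j] == 2) return true;    // already known to get out
    if (state[i][j] == 3) return false;   // already known to be trapped
    if (state[i][j] == 1) return false;   // back on the current path: a loop
    state[i][j] = 1;
    int ni = i, nj = j;
    switch (mp[i][j]) {
        case 'L': --nj; break;
        case 'R': ++nj; break;
        case 'U': --ni; break;
        case 'D': ++ni; break;
        default:                          // assumption: unknown signs count as trapped
            state[i][j] = 3;
            return false;
    }
    bool ok = canExit(ni, nj, mp, state);
    state[i][j] = ok ? 2 : 3;             // record the result for the whole path on the way back
    return ok;
}

// Number of cells from which a person following the signs leaves the maze.
int countEscapes(const std::vector<std::string>& mp) {
    int n = static_cast<int>(mp.size());
    std::vector<std::vector<int>> state(n, std::vector<int>(n, 0));
    int ans = 0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            if (canExit(i, j, mp, state)) ++ans;
    return ans;
}
```

For the maze {"RDL", "ULL", "DRU"} the four cells in the top-left corner form a loop and every other cell except the bottom-left one leads into it, so `countEscapes` returns 1. For large n the recursion depth can reach n², so an iterative walk may be needed in practice.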

regular expression problem

Regular expressions are usually used to search for and replace text that matches a certain pattern (rule).

Consider a simple kind of regular expression:

A regular expression consisting only of x ( ) |.

Xiao Ming wants to find the length of the longest string that such a regular expression can accept.

For example, the longest string that ((xx|xxx)x|(x|xx))xx can accept is xxxxxx, which has length 6.

In this problem, the parentheses "()" have the highest priority and the or operation "|" comes second. Whatever is inside a pair of parentheses is treated as a whole, and of the two sides of an or operation the longer one is kept.

At its core this is bracket matching, a classic application of a stack, but it can also be written with DFS (recursion), which makes the solution simpler.

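#include<bits/stdc++.h>
using namespace std;
string s;
int pos=0;  // current position in s
int dfs(){
    int tmp=0,ans=0;  // tmp: length of the current branch; ans: longest branch so far
    int len=s.size();
    while(pos<len){
       if(s[pos]=='(') {pos++;tmp+=dfs();}  // left parenthesis: recurse, like pushing onto a stack
       else if(s[pos]==')') {pos++;break;}   // right parenthesis: return, like popping the stack
       else if(s[pos]=='|') {pos++;ans=max(ans,tmp);tmp=0;}  // or: keep the longer side and start a new branch
       else if(s[pos]=='x') {pos++;tmp++;}    // an x: count it
       else pos++;    // skip any other character so the loop cannot stall
    }
    ans=max(ans,tmp);
    return ans;
}
int main(){
    cin>>s;
    cout<<dfs();
    return 0;
}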

BFS

basic idea

The principle of BFS is "layer-by-layer diffusion", which starts from the starting point and searches layer by layer. When programming, BFS is implemented with queues. Since the characteristic of BFS is to search layer by layer, the first searched layer is closer to the starting point, so BFS is generally used to solve the shortest path problem.

  • Starting from the initial state S, generate the states of the next layer using the rules.
  • Check all the states of that layer in order to see whether the goal state G is among them. If not, apply the rules to every state node of the layer to generate the state nodes of the next layer.
  • Keep generating the state nodes of the next layer in the same way, expanding layer by layer until the goal state appears.

The search tree is traversed in hierarchical order.

code template

Usually implemented with a queue (first in, first out).

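initialize queue Q;
Q={start s}; mark s as visited;
while(Q is not empty){
   take the front element u of Q;
   dequeue u;
   if(u==target state){...}
   for(each unvisited point v adjacent to u){
      mark v as visited;   // mark on enqueue, so that no point enters the queue twice
      enqueue v;
   }
}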
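As one concrete instance of the template, the sketch below finds the minimum number of steps between two cells of a grid maze. The grid encoding ('.' for an open cell, '#' for a wall), the four-direction movement and the name `gridBfs` are assumptions made for this example.

```cpp
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Minimum number of up/down/left/right steps from (sr, sc) to (tr, tc),
// or -1 if the target cannot be reached. Assumes the start cell is open.
int gridBfs(const std::vector<std::string>& g, int sr, int sc, int tr, int tc) {
    int rows = static_cast<int>(g.size());
    int cols = rows ? static_cast<int>(g[0].size()) : 0;
    // dist doubles as the visited mark: -1 means "not visited yet".
    std::vector<std::vector<int>> dist(rows, std::vector<int>(cols, -1));
    std::queue<std::pair<int, int>> q;
    dist[sr][sc] = 0;
    q.push({sr, sc});
    const int dr[4] = {-1, 1, 0, 0};
    const int dc[4] = {0, 0, -1, 1};
    while (!q.empty()) {
        std::pair<int, int> cur = q.front();
        q.pop();
        int r = cur.first, c = cur.second;
        if (r == tr && c == tc) return dist[r][c];  // first arrival is the shortest
        for (int d = 0; d < 4; ++d) {
            int nr = r + dr[d], nc = c + dc[d];
            if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
            if (g[nr][nc] == '#' || dist[nr][nc] != -1) continue;
            dist[nr][nc] = dist[r][c] + 1;          // mark on enqueue
            q.push({nr, nc});
        }
    }
    return -1;
}
```

On the grid {"..#", ".#.", "..."} the call `gridBfs(g, 0, 0, 2, 2)` returns 4, because the only route runs down the left column and along the bottom row.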

BFS application

find the shortest path

Below is an example of the process of finding the shortest path using BFS.

[Figure: example of a BFS shortest-path search]

The change process of the queue is as follows:

[Figure: contents of the queue at each step]

BFS is a good algorithm for finding the shortest path, but only in one situation: the distance between any two adjacent points is the same, usually taken to be 1. In that case BFS finds the shortest distance from a start point to an end point in time linear in the size of the graph. If the edge lengths differ, a path with more edges may be shorter than a path with fewer edges; plain BFS no longer works, and algorithms such as Dijkstra, SPFA or Floyd are needed.

The following code uses BFS to compute the shortest path in an unweighted graph:

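#include <bits/stdc++.h>
using namespace std;

const int N = 100005; // assumed upper bound on the number of nodes; adjust to the problem

// graph data structure
vector<int> adj[N]; // adj[u] stores the nodes adjacent to u

// visited marks
bool visited[N];

// source s and target t
int s, t;

// BFS for the shortest path from s to t
int bfs(int s, int t) {
    memset(visited, false, sizeof(visited)); // no node has been visited yet
    queue<int> q; // local queue, so nothing is left over from an earlier call
    q.push(s); // enqueue the source s
    visited[s] = true; // mark the source s as visited

    int step = 0; // current layer, i.e. the number of steps taken so far
    while (!q.empty()) {
        int size = q.size(); // number of nodes in the current layer
        for (int i = 0; i < size; i++) {
            int cur = q.front(); // take the node at the front of the queue
            q.pop();

            if (cur == t) return step; // target found: step is the shortest path length

            for (int neighbor : adj[cur]) { // each node adjacent to cur
                if (!visited[neighbor]) { // not visited yet
                    visited[neighbor] = true; // mark as visited
                    q.push(neighbor); // enqueue it
                }
            }
        }
        step++; // one more step
    }

    return -1; // t cannot be reached from s
}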

Among them, N represents the total number of nodes in the graph, adj represents the data structure that stores the adjacent nodes of each node in the graph (usually using an adjacency list), s represents the source point, and t represents the target node. In the algorithm, we use a queue to store all the nodes of the current layer. After traversing the current layer, the number of steps step is increased by 1, and the next layer is traversed until the target node is found or all nodes are traversed.

Two questions arise when computing a shortest path:

  • How long the shortest path is. Note that the length of the shortest path is unique.
  • Which points the shortest path passes through. Since there may be more than one shortest path, problem statements generally do not ask for the path itself. When they do, they usually ask for the lexicographically smallest one.
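When the path itself is required, a common approach is to remember for every node the node it was first reached from, then walk these links back from the target. The sketch below returns one shortest path and makes no claim about lexicographic order; the name `shortestPath` and the `parent` array are illustrative, not from the original article.

```cpp
#include <algorithm>
#include <queue>
#include <vector>

// Returns one shortest path from s to t in an unweighted graph as a list of
// nodes, or an empty vector if t cannot be reached. adj is an adjacency list.
std::vector<int> shortestPath(const std::vector<std::vector<int>>& adj, int s, int t) {
    int n = static_cast<int>(adj.size());
    std::vector<int> parent(n, -1);       // parent[v]: node from which v was first reached
    std::vector<bool> visited(n, false);
    std::queue<int> q;
    visited[s] = true;
    q.push(s);
    while (!q.empty()) {
        int u = q.front();
        q.pop();
        if (u == t) break;                // target dequeued: its parent chain is final
        for (int v : adj[u]) {
            if (visited[v]) continue;
            visited[v] = true;
            parent[v] = u;
            q.push(v);
        }
    }
    if (!visited[t]) return {};           // t is unreachable
    std::vector<int> path;
    for (int v = t; v != -1; v = parent[v]) path.push_back(v);  // walk back to s
    std::reverse(path.begin(), path.end());
    return path;
}
```

For the graph with edges 0-1, 0-2, 1-3, 2-3 and 3-4, `shortestPath(adj, 0, 4)` returns 0 1 3 4 when each adjacency list is sorted in ascending order.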

Connectivity judgment

Connectivity judgment is a simple problem in graph theory. Given a graph consisting of points and edges connecting points, it is required to find the interconnected parts in the graph.

The connectivity problem can be solved with both BFS and DFS. However, if N is large, using DFS may cause errors due to too large recursion depth, and BFS should be used at this time.
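Besides switching to BFS, the recursion depth problem can also be avoided by keeping the DFS stack explicitly. The sketch below counts connected blocks this way; the function name `countComponents` and the adjacency-list input are assumptions for illustration.

```cpp
#include <stack>
#include <vector>

// Counts the connected blocks of an undirected graph given as an adjacency
// list, using an explicit stack instead of recursion.
int countComponents(const std::vector<std::vector<int>>& adj) {
    int n = static_cast<int>(adj.size());
    std::vector<bool> visited(n, false);
    int components = 0;
    for (int start = 0; start < n; ++start) {
        if (visited[start]) continue;     // already part of an earlier block
        ++components;                     // a new block starts here
        std::stack<int> st;
        visited[start] = true;
        st.push(start);
        while (!st.empty()) {
            int u = st.top();
            st.pop();
            for (int v : adj[u]) {
                if (visited[v]) continue;
                visited[v] = true;        // mark on push so each node is pushed once
                st.push(v);
            }
        }
    }
    return components;
}
```

Since the stack lives on the heap, a chain of several hundred thousand nodes is handled without risking a call-stack overflow. Marking nodes when they are pushed changes the visiting order slightly compared with recursive DFS, which does not matter for connectivity.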

BFS connectivity judgment steps:

  1. Start from any point u on the graph and put it into the queue.
  2. Pop the head of the queue, mark it as searched, then find its neighbors (the points connected to it) that have not been searched yet and put them into the queue.
  3. Repeat step 2 for each new head of the queue.

Continue the above steps until the queue is empty, at this time a connected block has been found. Other points that have not been visited belong to another connected block. Continue to process these points according to the above steps. Finally, all points have been searched and all connected blocks have been found.

For example, suppose an N*M rectangular area is given together with the status of each region (with or without oil). If two regions with oil are adjacent (horizontally, vertically or diagonally), they are considered to belong to the same oil pocket.

Find how many oil pockets there are in this rectangular area.

Ideas:

For each oily area, find out all the oily areas that belong to the same oil pocket, and finally calculate how many oil pockets there are in total.

How to find all the oily areas that belong to the same oil pocket?

BFS: find a starting point; start from this point, enumerate the surrounding areas to find oily areas; start from the new areas found in order, and repeat the above process until no new areas are added.

How to mark the oily areas belonging to the same oil pocket?

Set an access flag to indicate whether this area has been included, so the number of calls to BFS = the number of oil pockets.

The code framework is as follows:

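void BFS(int i,int j)
{
     initialize queue Q; enqueue (i,j); mark (i,j) as visited;
     while(Q is not empty)
     {
         take the front element u and dequeue it;
         for each region adjacent to u: if (it has oil && it has not been visited)
         {
             enqueue it; mark it as visited;
         }
     }
}
int main()
{
   …
   for every region: if (it has oil && it has not been visited)
       BFS(…,…);   // each call covers exactly one oil pocket
}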
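The framework above can be filled in as follows. The input encoding is not given in the text, so this sketch assumes '@' marks a region with oil and any other character a region without; the name `countOilPockets` is also an assumption.

```cpp
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Counts oil pockets: groups of '@' cells connected horizontally, vertically
// or diagonally. The '@' encoding is an assumption of this sketch.
int countOilPockets(const std::vector<std::string>& g) {
    int rows = static_cast<int>(g.size());
    int cols = rows ? static_cast<int>(g[0].size()) : 0;
    std::vector<std::vector<bool>> seen(rows, std::vector<bool>(cols, false));
    int pockets = 0;
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            if (g[i][j] != '@' || seen[i][j]) continue;
            ++pockets;                         // one BFS per oil pocket
            std::queue<std::pair<int, int>> q;
            seen[i][j] = true;
            q.push({i, j});
            while (!q.empty()) {
                std::pair<int, int> cur = q.front();
                q.pop();
                for (int dr = -1; dr <= 1; ++dr) {
                    for (int dc = -1; dc <= 1; ++dc) {   // the 8 neighbors (and the cell itself)
                        int nr = cur.first + dr, nc = cur.second + dc;
                        if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
                        if (g[nr][nc] != '@' || seen[nr][nc]) continue;
                        seen[nr][nc] = true;             // mark on enqueue
                        q.push({nr, nc});
                    }
                }
            }
        }
    }
    return pockets;
}
```

For the grid {"*@*@", "@**@", "****", "@@**"} the function returns 3: the two cells touching diagonally in the top-left corner form one pocket, the right column forms a second, and the bottom row a third.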

pruning

BFS and DFS are direct implementations of brute force: they can search all possible states, but a lot of time may be wasted on unnecessary computation.

Pruning is a metaphor: cutting off branches that don't yield answers or are unnecessary. The key is to judge: what branch to cut and where to cut. Pruning is a commonly used optimization method for searching, and it can often optimize exponential complexity to approximate polynomial complexity.

The main pruning technique for BFS is duplicate detection: if a state that has already appeared is generated again, it is pruned.

There are many DFS pruning techniques; the general idea is to reduce the number of states searched.

For example, take the earlier maze problem (there is one person at each point in the maze; how many of them can get out?).

Pruning: use memoized search. Once a point has been searched, there is no need to search it again.

Another example is the grid division problem (divide a 6×6 grid into two identical parts; how many ways are there?).

Pruning: start cutting from the center point, and note that two points symmetric about the center cannot both be cut. This is feasibility pruning.

Duplicate detection is likewise a form of pruning. In short, the goal is to minimize the number of states searched.
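As a small sketch of feasibility pruning in DFS, consider counting the subsets of a list of positive integers that add up to a given target. This problem is not one of the article's examples and the name `countSubsets` is hypothetical; it is used here only because the pruning rule fits in a few lines.

```cpp
#include <cstddef>
#include <vector>

// Counts the subsets of a[k..] whose sum is exactly `remaining`.
// Assumes every element of a is positive, which is what makes the pruning valid.
int countSubsets(const std::vector<int>& a, std::size_t k, int remaining) {
    if (remaining == 0) return 1;   // target reached; adding more would overshoot
    if (remaining < 0) return 0;    // prune: the sum already exceeds the target
    if (k == a.size()) return 0;    // no numbers left
    return countSubsets(a, k + 1, remaining - a[k])   // select a[k]
         + countSubsets(a, k + 1, remaining);         // do not select a[k]
}
```

For a = {1,2,3,4,5} and a target of 5, `countSubsets(a, 0, 5)` returns 3, for the subsets {5}, {1,4} and {2,3}. Without the two early returns the search would always visit all 2^n leaves.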

Summary

DFS is usually implemented with a stack or with recursion. Since DFS only needs to record the current path during the search, its space usage is small; however, if visited states are not marked, or if the state space is unbounded, it can loop forever and never finish. In addition, plain DFS does not guarantee that the solution it finds is optimal, because the first solution it reaches is not necessarily the shortest.

BFS is usually implemented with a queue. Because BFS has to keep a whole layer of states in the queue, it generally needs more memory than DFS; when both traverse the entire graph, their time complexity is the same. In return, when every step has the same length, BFS guarantees that the first solution found is the shortest, which DFS cannot guarantee.

Therefore, when we choose an algorithm, we need to weigh the relationship between time complexity and space complexity according to the characteristics and requirements of the specific problem, and choose an appropriate algorithm. At the same time, when implementing the algorithm, we also need to pay attention to issues such as boundary conditions and algorithm optimization to obtain better results.

Today, DFS and BFS are among the most important algorithms in computer science and are widely used. For example, the backpropagation algorithm in deep learning can be loosely viewed as a DFS-style traversal of the computation graph, and routing algorithms in computer networks also build on graph searches such as DFS and BFS.

In addition, because the DFS and BFS algorithms have the characteristics of low computational time complexity and simple implementation, they have also become the basis of many algorithms, such as connectivity analysis algorithms in image processing and search algorithms in artificial intelligence. A proficient understanding of DFS and BFS can help us better learn other fields.


Origin blog.csdn.net/m0_61443432/article/details/130015075