A* Algorithm Proof

The target audience of this article is those who have not yet understood the A* algorithm or who want to learn more about it.

To find the shortest path between two nodes in a directed graph, the A* algorithm can be used. The algorithm is described below.

For a node N in the graph, the starting node src, and the ending node dst:
During execution, the algorithm traverses the nodes of the graph starting from src (some nodes may never be traversed). When it traverses to a node, it records a path from src to that node, and the length of that path is recorded as g(N). At initialization, the g value of src is naturally 0, and the g value of every other node is ∞, indicating that the node has not been traversed yet. In addition, a node may be traversed multiple times, for example:
        A
      ↗   ↘
src           D → ...
      ↘   ↗
        B


After traversing D through the path src->A->D, the algorithm may go back to visit B and then visit D again.
Keep in mind that the g value of a node is generated dynamically: it is set the first time the node is traversed and may be updated on subsequent traversals (more on this later). Don't be intimidated by it; if you like, you can simply treat it as the length of some path from the starting point to this node.
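
To make the updating of g concrete, here is a minimal Python sketch for the diagram above. The edge lengths are hypothetical (the diagram gives none), and relax is just an illustrative helper name.

```python
import math

# Hypothetical edge lengths for the diagram above (not given in the article).
edges = {("src", "A"): 1, ("A", "D"): 5, ("src", "B"): 2, ("B", "D"): 1}

# g starts at 0 for src and at infinity for nodes not traversed yet.
g = {"src": 0, "A": math.inf, "B": math.inf, "D": math.inf}

def relax(u, v):
    """Record the path through u as g(v) if it is shorter than the current one."""
    candidate = g[u] + edges[(u, v)]
    if candidate < g[v]:
        g[v] = candidate

relax("src", "A")  # g(A) = 1
relax("A", "D")    # g(D) = 6, via src->A->D
relax("src", "B")  # g(B) = 2
relax("B", "D")    # g(D) = 3, via src->B->D (the earlier value is replaced)
print(g["D"])      # 3
```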

h(N) represents the "estimated" length from node N to the target node dst. The h value of every node is determined before the algorithm starts (how it is determined does not matter for now). One of the ideas behind the A* algorithm is that the path is discovered gradually: when you reach a node in the middle, you do not know how far it is from the target, so you can only estimate it. Don't worry, this will be explained later.

The sum of g(N) and h(N) is denoted f(N). Keep in mind that f(N) determines the order in which the algorithm traverses nodes: g(N) is the distance from the starting point to this node (along the path taken during this traversal), and h(N) is the estimated distance from this node to the end point.




With the above concepts, let's take a look at the formal steps of the algorithm:
1. Create two containers named open and closed;
2. Add the src node to open, set the g value of the src node to 0, and set the parent of the src node to null;
3. As long as the open container is not empty, perform the following steps:
   a. Take the node with the smallest f value in the open container as cur, move cur into closed, and delete it from open.
   b. If cur is the dst node, the algorithm ends, and the path obtained by following the parent links back from cur is the shortest path.
   c. Otherwise, for each adjacent node n of cur:
      If n is in closed, skip it and go on to the next adjacent node.
      If n is in open: if the g value of cur plus the length of the edge from cur to n is less than the current g value of n, update the g value of n to the former and set its parent to cur; otherwise do nothing.
      Otherwise, add n to open, set its g value to the g value of cur plus the length of the edge from cur to n, and set its parent to cur.
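
The steps above can be sketched in Python roughly as follows. This is a minimal sketch, not a definitive implementation: the graph representation (a dict of dicts), the function name a_star, and the sample graph and h values are all assumptions made for illustration. One deliberate substitution: instead of updating an entry of open in place, the sketch pushes a new heap entry and skips stale ones when they are popped.

```python
import heapq
import math

def a_star(graph, h, src, dst):
    """graph: {node: {neighbor: edge_length}}, h: {node: estimate to dst}.

    Returns (path, length), or (None, math.inf) if dst is unreachable.
    """
    g = {src: 0}
    parent = {src: None}
    closed = set()
    open_heap = [(h[src], src)]  # entries are (f, node)
    while open_heap:
        _, cur = heapq.heappop(open_heap)  # node with the smallest f
        if cur in closed:
            continue  # stale entry left behind by an earlier g update
        closed.add(cur)
        if cur == dst:
            path = []
            while cur is not None:  # follow the parent links back to src
                path.append(cur)
                cur = parent[cur]
            return path[::-1], g[dst]
        for n, length in graph.get(cur, {}).items():
            if n in closed:
                continue
            new_g = g[cur] + length
            if new_g < g.get(n, math.inf):  # n is new, or this path is shorter
                g[n] = new_g
                parent[n] = cur
                heapq.heappush(open_heap, (new_g + h[n], n))
    return None, math.inf

# Hypothetical sample graph and h values, chosen only for this illustration.
graph = {"s": {"a": 1, "b": 4}, "a": {"b": 2, "t": 6}, "b": {"t": 3}, "t": {}}
h = {"s": 5, "a": 4, "b": 3, "t": 0}
print(a_star(graph, h, "s", "t"))  # (['s', 'a', 'b', 't'], 6)
```

In this run, b first enters open with g=4 through s->b and is later improved to g=3 through s->a->b, which is the "update" case of step 3.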

Before proving the algorithm, we must restrict the choice of the h function.
h*(N) represents the actual (shortest) distance from node N to dst. The h function must be chosen so that h(N) <= h*(N).

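The proof below also relies on a second property of the h function. For two nodes i and j on a shortest path to dst, with i before j and d(i,j) the shortest distance from i to j:
d(i,j) = h*(i) - h*(j)
The property needed is:
h(i) - h(j) <= d(i,j)
Note that this does not follow from h(N) <= h*(N) alone (the inequalities h*(i) >= h(i) and h*(j) >= h(j) cannot simply be subtracted from each other), so h must be chosen to satisfy it as well.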
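
The proof of conclusion 2 uses the condition in the form h(N) - h(j) <= d(N,j). A minimal Python sketch for checking it on a concrete graph is given below. It checks the inequality on every edge, which is enough because summing the edge inequalities along any path gives the same inequality for the whole path. The function name is_consistent and the sample data are assumptions for illustration.

```python
def is_consistent(graph, h):
    """True if h(u) - h(v) <= length(u, v) holds for every edge u -> v."""
    return all(
        h[u] - h[v] <= length
        for u, neighbors in graph.items()
        for v, length in neighbors.items()
    )

# Hypothetical graph; the actual distances to t are s=6, a=5, b=3, t=0.
graph = {"s": {"a": 1, "b": 4}, "a": {"b": 2, "t": 6}, "b": {"t": 3}, "t": {}}

print(is_consistent(graph, {"s": 5, "a": 4, "b": 3, "t": 0}))  # True
# The next h never exceeds the actual distance, yet it fails on the edge s -> a.
print(is_consistent(graph, {"s": 6, "a": 1, "b": 3, "t": 0}))  # False
```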

Let the points on the shortest path from the starting point to the target point be (v0, v1, v2, ..., vk, ..., vt), where k = 0...t, v0 is the starting point, and vt is the target point.


Conclusion 0:
If i reaches j through the shortest path p = i->...->k->...->j, where k is an intermediate point on this path, then i->...->k is also the shortest path from i to k.
Proof:
Suppose i->...->k is not the shortest path from i to k, and let p1 be the shortest path from i to k. Then p1+(k->...->j) is a path from i to j that is shorter than p, which contradicts p being the shortest path.

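Conclusion 1:
For a node N on the shortest path, once the node in front of it on that path has entered closed with its own g value equal to its shortest distance from the starting point, the g value of N is the shortest distance from the starting point to N. This is because when the preceding node enters closed, the g value of N is either assigned or updated, and by conclusion 0 the resulting value is the smallest possible.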

Conclusion 2:
For the points in front of a point vj on the shortest path: whenever vj is in open, either all of these points have already entered closed, or at least one of them is in open. The following is the proof:
Because v0 is the starting point, v0 must enter closed first and expand v1 into open; at this time g(v1) is the length of the shortest path from the starting point to v1.
After that, let N be one of v1, v2, ..., vj-1, and suppose that N and vj (written j below) are both in open, where g(N) is already the shortest distance from the starting point to N but g(j) is not yet the shortest distance to j. Then:
h(N)-h(j) <= d(N,j)   // the condition on the h function
g(j)-g(N) > d(N,j)    // g(N) is the shortest path length from the starting point to N (see conclusion 0),
                      // g(j) is the length of the current path from the starting point to j,
                      // and (the path corresponding to g(N)) + (N->...->j) is the shortest path to j,
                      // so g(j) > g(N)+d(N,j)
so (g(j)-g(N))-(h(N)-h(j)) > 0
that is, f(j)-f(N) > 0
that is, f(N) < f(j)
It follows that f(v1)<f(vj), so if v1 and vj are in open at the same time (vj may have been reached through another path), v1 must enter closed before vj, and it either expands v2 into open (if v2 is not there yet) or updates g(v2); at this time g(v2) is the length of the shortest path from the starting point to v2.
In the same way f(v2)<f(vj), so if v2 and vj are in open at the same time (vj may have been reached through another path), v2 must enter closed before vj, and it either expands v3 into open (if v3 is not there yet) or updates g(v3); at this time g(v3) is the length of the shortest path from the starting point to v3.
...
Finally f(vj-1)<f(vj), so if vj-1 and vj are in open at the same time, vj-1 enters closed before vj, and it either expands vj into open (if vj is not there yet) or updates g(vj); at this time g(vj) is the length of the shortest path from the starting point to vj.


In other words, conclusion 2 shows that the points on the shortest path enter closed in sequence.
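
This ordering can be observed on a small case. The sketch below is only an illustration under assumptions: the graph, the h values, and the helper name closed_order are hypothetical, and the open container is a heap with stale entries skipped rather than updated in place. It records the order in which nodes enter closed and checks that the nodes of the shortest path s->a->t appear in that order.

```python
import heapq
import math

def closed_order(graph, h, src, dst):
    """Run A* and return the nodes in the order they enter closed."""
    g = {src: 0}
    closed = []
    open_heap = [(h[src], src)]  # entries are (f, node)
    while open_heap:
        _, cur = heapq.heappop(open_heap)
        if cur in closed:
            continue  # stale heap entry for a node that is already closed
        closed.append(cur)
        if cur == dst:
            break
        for n, length in graph[cur].items():
            new_g = g[cur] + length
            if n not in closed and new_g < g.get(n, math.inf):
                g[n] = new_g
                heapq.heappush(open_heap, (new_g + h[n], n))
    return closed

# Hypothetical graph: the shortest path is s->a->t (length 4); s->x->t has length 6.
graph = {"s": {"a": 2, "x": 1}, "a": {"t": 2}, "x": {"t": 5}, "t": {}}
h = {"s": 3, "a": 2, "x": 2, "t": 0}

order = closed_order(graph, h, "s", "t")
print(order)  # ['s', 'x', 'a', 't']
positions = [order.index(v) for v in ["s", "a", "t"]]
print(positions == sorted(positions))  # True
```

The node x is not on the shortest path, yet it enters closed between s and a.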


Note that conclusion 2 only indicates that the points on the shortest path enter closed in sequence. It does not indicate that these points enter open in sequence, nor whether other points enter closed in between them.
Example:
        B
   10 ↗   ↘ 20
A             D
   14 ↘   ↗ 6
        C
   
   
The starting point is A and the end point is D. Now use the A* algorithm to find the shortest path

In order to find the shortest path, let
h(A)=10 (less than the actual shortest distance from A to D)
h(B)=5 (less than the actual shortest distance from B to D)
h(C)=4 (less than the shortest distance from C to D)
h(D)=0
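
These h values can be checked against the actual distances to D. The following sketch computes h* with Dijkstra's algorithm on the reversed edges and compares; the function name true_distances and the dict-based graph encoding are assumptions for illustration.

```python
import heapq

def true_distances(graph, dst):
    """Shortest distance h*(N) from every node N to dst (Dijkstra on reversed edges)."""
    reverse = {n: {} for n in graph}
    for u, neighbors in graph.items():
        for v, length in neighbors.items():
            reverse[v][u] = length
    dist = {dst: 0}
    heap = [(0, dst)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # outdated entry, a shorter distance was found already
        for v, length in reverse[u].items():
            if d + length < dist.get(v, float("inf")):
                dist[v] = d + length
                heapq.heappush(heap, (d + length, v))
    return dist

# The example graph and h values from the text.
graph = {"A": {"B": 10, "C": 14}, "B": {"D": 20}, "C": {"D": 6}, "D": {}}
h = {"A": 10, "B": 5, "C": 4, "D": 0}

h_star = true_distances(graph, "D")
print(h_star)                                  # {'D': 0, 'B': 20, 'C': 6, 'A': 20}
print(all(h[n] <= h_star[n] for n in graph))   # True
```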


When starting:
open = {A(g=0 h=10)}
closed = {}

1. A is the smallest in open, so A is added to closed, and A's successors B and C are added to open:
open = {B(g=10 h=5), C(g=14 h=4)}
closed = {A}

2. B is the smallest in open (f=15, versus f=18 for C), so B is added to closed, and B's successor D is added to open:
open = {C(g=14 h=4), D(g=30 h=0)}
closed = {A, B}

3. C is the smallest in open, so C is added to closed. C's successor D already exists in open, but the g value through C->D is smaller, so the g value of D is updated to the g value of C plus the length of the edge from C to D:
open = {D(g=20 h=0)}
closed = {A, B, C}

4. D is taken from open, and the algorithm ends.

It can be seen that although the end point D enters open in step 2, it cannot enter closed as long as C is in open, because the f value of C is smaller; only after C enters closed can D enter closed.
In addition, even if the end point enters open early and gets a g value early, its g value will have been updated to the shortest distance by the time it is taken out of open.
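
The four steps of this example can be reproduced with a short Python sketch. It keeps open as a dict from node to g value and picks the smallest f by a linear scan, which is simple rather than efficient; the function name a_star_trace and the dict-based graph encoding are assumptions for illustration.

```python
def a_star_trace(graph, h, src, dst):
    """A* with open kept as {node: g}; prints both containers after each step."""
    open_ = {src: 0}
    closed = []
    parent = {src: None}
    while open_:
        cur = min(open_, key=lambda n: open_[n] + h[n])  # smallest f = g + h
        g_cur = open_.pop(cur)
        closed.append(cur)
        if cur == dst:
            path = []
            while cur is not None:  # follow the parent links back to src
                path.append(cur)
                cur = parent[cur]
            return path[::-1], g_cur
        for n, length in graph[cur].items():
            if n in closed:
                continue
            if n not in open_ or g_cur + length < open_[n]:
                open_[n] = g_cur + length
                parent[n] = cur
        print("open =", open_, "closed =", closed)
    return None, float("inf")

# The example graph and h values from the text.
graph = {"A": {"B": 10, "C": 14}, "B": {"D": 20}, "C": {"D": 6}, "D": {}}
h = {"A": 10, "B": 5, "C": 4, "D": 0}

result = a_star_trace(graph, h, "A", "D")
# open = {'B': 10, 'C': 14} closed = ['A']
# open = {'C': 14, 'D': 30} closed = ['A', 'B']
# open = {'D': 20} closed = ['A', 'B', 'C']
print(result)  # (['A', 'C', 'D'], 20)
```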

