[Algorithm Analysis and Design] Greedy Algorithm (Part 1)


1. Learning points

  Understand the concept of the greedy algorithm.
  Master the basic elements of greedy algorithms:
  (1) the optimal-substructure property;
  (2) the greedy-choice property.
  Understand the differences between greedy algorithms and dynamic programming.
  Understand the general theory of greedy algorithms
  and learn greedy design strategies through application examples:
  (1) the activity scheduling problem;
  (2) the optimal loading problem;
  (3) Huffman coding;
  (4) single-source shortest paths;
  (5) minimum spanning trees;
  (6) the multi-machine scheduling problem.


2. The change-making problem

  There are four kinds of coins: quarters, dimes, nickels, and cents. Find the way to make change for sixty-three cents that uses the fewest coins.
  Naturally, you think of 2 quarters, 1 dime, and 3 cents. At each step, take the largest coin that still fits the remaining amount; for this coin system the result is optimal.
  Now suppose there are only three kinds of coins, dimes, nickels, and cents; again find the way to make change that uses the fewest coins.
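
  As a warm-up, here is a minimal C++ sketch of the greedy rule (the function name is illustrative, and denominations are assumed to be passed sorted from largest to smallest). Greedy is optimal for the coin system {25, 10, 5, 1}, but for arbitrary denominations it can fail, which is the point of the overview below.

#include <iostream>
#include <vector>

// Greedy change-making: always take the largest coin that still fits.
// Assumes coins are sorted from largest to smallest denomination.
int greedyChange(int amount, const std::vector<int>& coins)
{
    int count = 0;
    for (int c : coins) {
        count  += amount / c;   // take as many of this coin as possible
        amount %= c;
    }
    return count;
}

int main()
{
    // 63 cents with quarters, dimes, nickels, cents: 2 + 1 + 0 + 3 = 6 coins.
    std::cout << greedyChange(63, {25, 10, 5, 1}) << "\n";
}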


2.1 Overview

  As the name suggests, a greedy algorithm always makes the choice that looks best at the moment. That is, a greedy algorithm does not consider the overall optimum; each choice it makes is only locally optimal in some sense. Of course, we hope that the final result produced by the greedy algorithm is also globally optimal. Although greedy algorithms do not obtain the global optimum for every problem, they do for many problems, such as the single-source shortest path problem and the minimum spanning tree problem. Moreover, in some cases, even when a greedy algorithm cannot obtain the globally optimal solution, its final result is a good approximation of it.


3. The activity arrangement problem

  The activity arrangement problem is to select a largest subset of mutually compatible activities from a given set of activities. It is a good example of a problem that can be solved effectively by a greedy algorithm. The problem requires efficiently scheduling a sequence of activities that compete for a common resource, and the greedy algorithm provides a simple, elegant way to make as many activities as possible compatible in their use of that resource.

  Let E = {1, 2, …, n} be a set of n activities, each of which requires the same resource (such as a lecture hall), and only one activity can use the resource at a time. Each activity i has a start time si and a finish time fi for its use of the resource, with si < fi. If activity i is selected, it occupies the resource during the half-open interval [si, fi). If the intervals [si, fi) and [sj, fj) do not intersect, activities i and j are said to be compatible; that is, activity i is compatible with activity j when sj ≥ fi.

  Example: the start and finish times of 11 activities to be scheduled, arranged in non-decreasing order of finish time, are given in the following table:
[Figure: start and finish times of the 11 activities]


3.1 Strategy selection

  Strategy 1: give priority to activities with the earliest start time. This strategy can be shown not to yield an optimal solution.
  Strategy 2: give priority to activities that occupy the least time. A counterexample overturns this strategy as well.
  Strategy 3: give priority to activities with the earliest finish time. This greedy rule yields an optimal solution!

  Because the activities are input in non-decreasing order of finish time, the algorithm greedySelector always selects the compatible activity with the earliest finish time to add to set A. Intuitively, selecting compatible activities this way leaves as much time as possible for the activities not yet scheduled; in other words, the point of the greedy choice is to maximize the remaining schedulable time so that as many compatible activities as possible can be arranged.
  The algorithm greedySelector is extremely efficient: when the input activities are already arranged in non-decreasing order of finish time, it needs only O(n) time to schedule n activities so that a maximum number of mutually compatible activities share the common resource. If the given activities are not so ordered, they can be rearranged in O(n log n) time.
[Figure: trace of the greedySelector iterations]
  The computation of greedySelector is shown in the figure. Each row corresponds to one iteration of the algorithm: the shaded bars are the activities already selected into set A, and the blank bars are the activities currently being checked for compatibility.


3.2 Program code for the activity arrangement problem

template<class Type>
void GreedySelector(int n, Type s[], Type f[], bool A[])
{
    // Activities 1..n are assumed sorted in non-decreasing order of finish time f.
    A[1] = true;            // the first activity is always selected
    int j = 1;              // index of the most recently selected activity
    for (int i = 2; i <= n; i++) {
        if (s[i] >= f[j]) { // activity i starts after activity j finishes: compatible
            A[i] = true;
            j = i;
        }
        else A[i] = false;
    }
}

  The start and finish times of the activities are stored in the arrays s and f, arranged in non-decreasing order of finish time.

  If the start time si of the activity i under examination is earlier than the finish time fj of the most recently selected activity j, activity i is not selected; otherwise activity i is selected and added to set A.
  Greedy algorithms do not always find the overall optimal solution of a problem. For the activity arrangement problem, however, the greedy algorithm always finds the overall optimal solution; that is, it ultimately determines a maximum set A of mutually compatible activities.
  Whether a greedy algorithm is correct, i.e., whether it obtains an optimal solution, must be proved!
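
  A hypothetical driver for GreedySelector is sketched below; the sample data is illustrative, not necessarily the data of the figure. Arrays are 1-indexed as in the algorithm, so index 0 is unused.

#include <iostream>

// Assumes the GreedySelector template defined above is in scope.
int main()
{
    // Illustrative sample data, already sorted by finish time; index 0 unused.
    int  s[] = {0, 1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12};
    int  f[] = {0, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14};
    bool A[12];
    GreedySelector(11, s, f, A);
    for (int i = 1; i <= 11; i++)
        if (A[i]) std::cout << i << " ";   // prints: 1 4 8 11
    std::cout << "\n";
}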


3.3 Correctness is generally proved by mathematical induction

  Example: prove that for every natural number n,
  1 + 2 + … + n = n(n+1)/2.
  Proof: for n = 1, the left side is 1 and the right side is 1·(1+1)/2 = 1.
  Assume the equation holds for some natural number n; then
  1 + 2 + … + n + (n+1) = n(n+1)/2 + (n+1)
  = (n+1)(n/2 + 1)
  = (n+1)·2·(n/2 + 1)/2
  = (n+1)(n+2)/2.


3.4 Proposition for the activity selection algorithm

  Proposition: if the algorithm has executed k steps and selected the activities i1 = 1, i2, …, ik, then there is an optimal solution A containing the activities i1 = 1, i2, …, ik.
  By this proposition, for every k the choices made in the first k steps of the algorithm lead to an optimal solution, and after at most n steps the algorithm obtains an optimal solution of the problem instance.


3.4.1 First verify the case k = 1

  Prove that there is an optimal solution that contains activity 1.
  Take any optimal solution A, with the activities in A sorted in increasing order of finish time. If the first activity in A is some j ≠ 1, replace activity j by activity 1 to obtain A' = (A - {j}) ∪ {1}.
  Since f1 ≤ fj, A' is also an optimal solution, and it contains activity 1.


3.4.2 Induction step: k → k+1

  Suppose the algorithm has executed k steps and selected the activities i1 = 1, i2, …, ik.
  By the induction hypothesis, there is an optimal solution A containing i1 = 1, i2, …, ik.
  The remaining activities of A must be chosen from the set
  S' = { i | i ∈ S, si ≥ fik }, where fik is the finish time of activity ik, so that
  A = {i1, i2, …, ik} ∪ B for some B ⊆ S'.


3.4.3 Induction step (continued)

  B is an optimal solution of S'. (Otherwise, let B* be an optimal solution of S' containing more activities than B; then B* ∪ {i1, i2, …, ik} would be a solution of S with more activities than A, contradicting the optimality of A.)
  Treat S' as a subproblem. By the induction basis applied to S', there is an optimal solution B' of S' that contains the first activity ik+1 of S', with |B'| = |B|. Then {i1, i2, …, ik} ∪ B' = {i1, i2, …, ik, ik+1} ∪ (B' - {ik+1}) is also an optimal solution of the original problem.


4. Basic elements of greedy algorithm

  This section discusses the general characteristics of problems that can be solved by greedy algorithms.
  For a specific problem, how do we know whether a greedy algorithm applies and whether it yields an optimal solution to the problem? It is difficult to give a definite answer.
  However, the many problems that greedy algorithms do solve show that such problems generally have two important properties: the greedy-choice property and the optimal-substructure property.


4.1 The greedy-choice property

  The greedy-choice property means that a globally optimal solution to the problem can be reached through a sequence of locally optimal choices, i.e., greedy choices. This is the first basic ingredient that makes a greedy algorithm feasible, and it is also the main difference between greedy algorithms and dynamic programming.
  Dynamic programming usually solves the subproblems bottom-up, whereas a greedy algorithm usually proceeds top-down: it makes successive greedy choices iteratively, and each greedy choice reduces the problem to a smaller subproblem.
  To determine whether a specific problem has the greedy-choice property, one must prove that the greedy choice made at each step ultimately leads to a globally optimal solution of the problem.


4.2 The optimal-substructure property

  A problem is said to have the optimal-substructure property when an optimal solution of the problem contains within it optimal solutions of its subproblems. This property is the key feature of problems that can be solved by dynamic programming or by greedy algorithms.


4.3 Differences between greedy algorithm and dynamic programming algorithm

  Both greedy algorithms and dynamic programming require the problem to have the optimal-substructure property; this is a feature the two classes of algorithms share. But given a problem with optimal substructure, should it be solved greedily or by dynamic programming? Can every problem solvable by dynamic programming also be solved greedily? We study two classic combinatorial optimization problems that illustrate the main differences between greedy algorithms and dynamic programming.


4.4 0-1 knapsack problem (dynamic programming)

  Given n items and a knapsack, where item i has weight wi and value vi and the knapsack has capacity C, how should the items be chosen so that the total value of the items in the knapsack is maximized?
  When choosing the items, there are only two options for each item i: put it into the knapsack, or leave it out. Item i cannot be loaded more than once, and it is not allowed to load only part of item i.


4.5 Knapsack problem (greedy choice)

  This problem is similar to the 0-1 knapsack problem, except that when item i is selected, a fraction of it may be put into the knapsack rather than the whole item, 1 ≤ i ≤ n.
  Both problems have the optimal-substructure property and look very similar; yet the (fractional) knapsack problem can be solved by a greedy algorithm, while the 0-1 knapsack problem cannot.


4.6 Basic steps for solving the knapsack problem using greedy algorithm

  First compute the value per unit weight vi/wi of each item. Following the greedy strategy, put as much as possible of the item with the highest value per unit weight into the knapsack. If, after all of that item has been loaded, the total weight in the knapsack still does not exceed C, take as much as possible of the item with the next highest value per unit weight. Continue in this way until the knapsack is full.

void Knapsack(int n, float M, float v[], float w[], float x[])
{
    Sort(n, v, w);                  // sort items 1..n by v[i]/w[i], non-increasing
    int i;
    for (i = 1; i <= n; i++) x[i] = 0;   // x[i] is the fraction of item i loaded
    float c = M;                    // remaining capacity
    for (i = 1; i <= n; i++) {
        if (w[i] > c) break;        // item i no longer fits as a whole
        x[i] = 1;                   // load all of item i
        c -= w[i];
    }
    if (i <= n) x[i] = c / w[i];    // fill the rest with a fraction of item i
}
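
  The code above assumes a helper Sort(n, v, w) that reorders v[1..n] and w[1..n] by value per unit weight in non-increasing order; the text leaves it unspecified, so the following is only one possible sketch (it assumes positive weights, which allows comparison by cross-multiplication instead of division):

#include <algorithm>
#include <numeric>
#include <vector>

// One possible Sort helper for Knapsack: permute v[1..n] and w[1..n] so that
// v[1]/w[1] >= v[2]/w[2] >= ... >= v[n]/w[n]. Assumes all weights are positive.
void Sort(int n, float v[], float w[])
{
    std::vector<int> idx(n);
    std::iota(idx.begin(), idx.end(), 1);              // indices 1..n
    std::sort(idx.begin(), idx.end(),                  // v[a]/w[a] > v[b]/w[b]
              [&](int a, int b) { return v[a] * w[b] > v[b] * w[a]; });
    std::vector<float> v2(n), w2(n);
    for (int k = 0; k < n; k++) { v2[k] = v[idx[k]]; w2[k] = w[idx[k]]; }
    for (int k = 0; k < n; k++) { v[k + 1] = v2[k]; w[k + 1] = w2[k]; }
}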

  The main computational work of the algorithm Knapsack lies in sorting the items by value per unit weight in non-increasing order, so the running time of the algorithm is bounded by O(n log n).
  To prove the algorithm correct, we must also show that the knapsack problem has the greedy-choice property.

  For the 0-1 knapsack problem, greedy selection fails to yield an optimal solution because it cannot guarantee that the knapsack is eventually filled, and unused knapsack capacity lowers the value obtained per unit of capacity. In fact, to solve the 0-1 knapsack problem one must compare the best solution that includes an item with the best solution that excludes it, and then take the better of the two. This gives rise to many overlapping subproblems, which is another key feature that makes the problem suitable for dynamic programming.
  And indeed, dynamic programming does solve the 0-1 knapsack problem effectively.
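
  For contrast, here is a minimal dynamic-programming sketch of the 0-1 knapsack problem, assuming integer weights and capacity (names are illustrative). On the instance below, greedy-by-density would take the items of weight 10 and 20 for a value of 160, while the DP optimum of 220 takes the items of weight 20 and 30.

#include <algorithm>
#include <iostream>
#include <vector>

// m[j] = maximum value achievable with knapsack capacity j.
int knapsack01(const std::vector<int>& w, const std::vector<int>& v, int C)
{
    std::vector<int> m(C + 1, 0);
    for (std::size_t i = 0; i < w.size(); i++)
        for (int j = C; j >= w[i]; j--)   // j descends so item i is used at most once
            m[j] = std::max(m[j], m[j - w[i]] + v[i]);
    return m[C];
}

int main()
{
    std::vector<int> w = {10, 20, 30}, v = {60, 100, 120};
    std::cout << knapsack01(w, v, 50) << "\n";   // prints 220
}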


5. Optimal loading problem

  A batch of containers is to be loaded onto a ship with load capacity c; container i has weight wi. The optimal loading problem asks how to load as many containers as possible onto the ship, with no limit on loading volume.


5.1 Algorithm description

  The optimal loading problem can be solved by a greedy algorithm: the greedy strategy of loading the lightest containers first yields an optimal solution.
  Mathematical model (omitted).

template<class Type>
void Loading(int x[], Type w[], Type c, int n)
{
    int *t = new int [n+1];
    Sort(w, t, n);                  // t[1..n]: container indices by weight, lightest first
    for (int i = 1; i <= n; i++) x[i] = 0;   // x[i] = 1 iff container i is loaded
    for (int i = 1; i <= n && w[t[i]] <= c; i++) {
        x[t[i]] = 1;                // load the lightest remaining container
        c -= w[t[i]];               // reduce the remaining capacity
    }
    delete [] t;
}
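
  Loading assumes a helper Sort(w, t, n) that fills the index array t[1..n] with 1..n so that w[t[1]] ≤ w[t[2]] ≤ … ≤ w[t[n]]; one possible sketch:

#include <algorithm>
#include <numeric>

// Fill t[1..n] with the indices 1..n sorted by non-decreasing weight,
// so that w[t[1]] <= w[t[2]] <= ... <= w[t[n]].
template<class Type>
void Sort(Type w[], int t[], int n)
{
    std::iota(t + 1, t + n + 1, 1);   // t = 1, 2, ..., n
    std::sort(t + 1, t + n + 1, [&](int a, int b) { return w[a] < w[b]; });
}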

  This problem is a special case of the 0-1 knapsack problem: a container corresponds to an item of weight wi and value vi = 1, and the ship's load capacity c corresponds to the knapsack capacity.
  No polynomial-time algorithm is known for the general 0-1 knapsack problem, but this special case has one!


5.2 Greedy selection properties

  It can be proved that the optimal loading problem has the greedy-choice property.


5.3 Optimal substructure properties

  The optimal loading problem also has the optimal-substructure property.
  From the greedy-choice property and the optimal-substructure property of the optimal loading problem, the correctness of the algorithm follows easily.
  The main work of the algorithm is sorting the containers by weight in non-decreasing order, so the algorithm requires O(n log n) time.


5.4 Ideas for proving the correctness of the optimal loading problem

  Proposition: for every input instance of size n of the loading problem, the algorithm obtains an optimal solution. Assume the containers, from light to heavy, are numbered 1, 2, …, n.
  Induction basis: for any instance containing a single container, the greedy method obviously obtains an optimal solution.
  Induction step: assuming the greedy method obtains an optimal solution for every instance with n containers, show that it also obtains an optimal solution for every instance with n+1 containers!


5.5 Proof of correctness

  Assume the greedy method obtains an optimal solution for every input of n containers. Let N = {1, 2, …, n, n+1} with w1 ≤ w2 ≤ … ≤ wn ≤ wn+1. By the induction hypothesis, for N' = {2, 3, …, n, n+1} with capacity C' = C - w1, the greedy method obtains an optimal solution I'. Let I = I' ∪ {1}; we must show that I is an optimal solution of the original instance N = {1, 2, …, n, n+1}.
  Suppose it is not. Then there is an optimal solution I* of N containing container 1 (if 1 is not in I*, replacing the lightest element of I* by container 1 yields another optimal solution) with |I*| > |I|. Then I* - {1} is a feasible solution of N' with capacity C', and |I* - {1}| > |I - {1}| = |I'|, which contradicts I' being an optimal solution for N' and C'. Hence no such I* exists, and I is an optimal solution of N.


6. Huffman coding

  Huffman coding is a very effective coding method widely used for data file compression; its compression rate is usually between 20% and 90%. The Huffman coding algorithm uses a table of the frequencies with which the characters appear in the file to build an optimal representation of each character as a string of 0s and 1s.
  Giving high-frequency characters short codewords and low-frequency characters long codewords greatly shortens the total code length.


6.1 Prefix code

  Assign each character a string of 0s and 1s as its codeword, and require that no character's codeword be a prefix of another character's codeword. Such an encoding is called a prefix code.
  Example of a code that is not a prefix code: a: 001, b: 00, c: 010, d: 01. The bit string 0100001 is then ambiguous:
  decoding 1: 01 | 00 | 001 → d, b, a;
  decoding 2: 010 | 00 | 01 → c, b, d.

  Binary tree representation of a prefix code:
  Prefix code: {00000, 00001, 0001, 001, 01, 100, 101, 11}
  Frequencies: {5%, 5%, 10%, 15%, 25%, 10%, 10%, 20%}
  Constructing the tree: a 0 means "go to the left subtree" and a 1 means "go to the right subtree"; each codeword corresponds to a leaf. The maximum number of bits in a codeword equals the depth of the tree.

  The prefix property of the code makes decoding very simple.
  A binary tree representing an optimal prefix code is always a full binary tree, that is, every internal node in the tree has exactly two children.
  The average code length is defined as B(T) = Σ f(c)·dT(c), summed over all characters c ∈ C, where f(c) is the frequency of c and dT(c) is the depth of c's leaf in the tree T, i.e., the length of c's codeword.
  A prefix code scheme that minimizes the average code length is called an optimal prefix code for the given character set C.
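
  To see how simple prefix-code decoding is, here is a minimal C++ sketch (the code table, the bit string, and the function name are all illustrative): scan the bits left to right and emit a character as soon as the accumulated bits form a codeword. No lookahead or backtracking is needed, precisely because no codeword is a prefix of another.

#include <iostream>
#include <map>
#include <string>

// Decode a bit string against a prefix-code table by greedy left-to-right
// matching; this works only because no codeword is a prefix of another.
std::string decode(const std::string& bits, const std::map<std::string, char>& code)
{
    std::string out, cur;
    for (char b : bits) {
        cur += b;
        auto it = code.find(cur);
        if (it != code.end()) {   // cur is a complete codeword
            out += it->second;
            cur.clear();
        }
    }
    return out;
}

int main()
{
    // Illustrative prefix code: {a:0, b:10, c:110, d:111}.
    std::map<std::string, char> code = {{"0",'a'}, {"10",'b'}, {"110",'c'}, {"111",'d'}};
    std::cout << decode("010110111", code) << "\n";   // prints "abcd"
}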


6.2 Construct Huffman code

  Huffman proposed a greedy algorithm for constructing optimal prefix codes; the resulting coding scheme is called Huffman coding.
  Huffman's algorithm builds the binary tree T representing the optimal prefix code bottom-up.
  The algorithm starts with |C| leaf nodes and performs |C|-1 "merge" operations to produce the final tree T.

  Example:
  Input: a:45, b:13, c:12, d:16, e:9, f:5.
  Construct a Huffman tree from these frequencies and read off the codeword of each character:
  a: 1
  b: 011
  c: 010
  d: 001
  e: 0001
  f: 0000

  In the algorithm huffmanTree given in the textbook, the frequency of each character c in the coded character set is f(c). A priority queue Q keyed on f is used in the greedy selection to efficiently find the two trees of minimum frequency that the algorithm currently wants to merge. When two minimum-frequency trees are merged, the resulting new tree has frequency equal to the sum of the frequencies of the two merged trees, and the new tree is inserted into the priority queue Q. After n-1 merges, only one tree remains in the priority queue: the required tree T.
  The algorithm huffmanTree implements the priority queue Q with a min-heap. Initializing the priority queue takes O(n) time; since each removeMin and put operation on the min-heap takes O(log n) time, the n-1 merges take O(n log n) time in total. Hence Huffman's algorithm runs in O(n log n) time for n characters.
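
  The textbook's huffmanTree code is not reproduced above, so the following is only a minimal C++ sketch of the same bottom-up construction, using std::priority_queue as the min-heap Q (node layout and helper names are illustrative). The exact 0/1 codewords it prints depend on tie-breaking in the heap, but the codeword lengths agree with an optimal code; nodes are deliberately leaked to keep the sketch short.

#include <iostream>
#include <queue>
#include <string>
#include <vector>

struct Node {
    int freq;              // frequency of this subtree
    char ch;               // character; meaningful only at leaves
    Node *left, *right;
};

struct ByFreq {            // min-heap ordering on frequency
    bool operator()(const Node* a, const Node* b) const { return a->freq > b->freq; }
};

// Walk the tree: 0 = left edge, 1 = right edge; leaves carry the characters.
void printCodes(const Node* t, const std::string& code)
{
    if (!t->left) { std::cout << t->ch << ": " << code << "\n"; return; }
    printCodes(t->left,  code + "0");
    printCodes(t->right, code + "1");
}

int main()
{
    // Frequencies from the example above: a:45, b:13, c:12, d:16, e:9, f:5.
    std::vector<std::pair<char,int>> C = {{'a',45},{'b',13},{'c',12},{'d',16},{'e',9},{'f',5}};
    std::priority_queue<Node*, std::vector<Node*>, ByFreq> Q;
    for (auto& p : C) Q.push(new Node{p.second, p.first, nullptr, nullptr});
    for (std::size_t i = 1; i < C.size(); i++) {   // |C|-1 merge operations
        Node* x = Q.top(); Q.pop();                // two minimum-frequency trees
        Node* y = Q.top(); Q.pop();
        Q.push(new Node{x->freq + y->freq, '\0', x, y});
    }
    printCodes(Q.top(), "");
}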


6.3 Correctness of Huffman’s algorithm

  To prove the correctness of Huffman's algorithm, we need only show that the optimal prefix code problem has:
  (1) the greedy-choice property;
  (2) the optimal-substructure property.


6.4 Properties of optimal prefix codes (Lemma 1)

  Lemma 1: Let C be a character set with frequency f(c) for each c ∈ C, and let x, y ∈ C be two characters of lowest frequency. Then there exists an optimal prefix code for C in which the codewords of x and y have the same length and differ only in the last bit.


6.5 Properties of optimal prefix codes (Lemma 2)

  Lemma 2: Let T be a binary tree representing a prefix code, let x and y be sibling leaves in T, and let z be their parent. Regarding z as a character with frequency f(z) = f(x) + f(y), the tree T' = T - {x, y} represents a prefix code for the character set C' = (C - {x, y}) ∪ {z}, and B(T) = B(T') + f(x) + f(y).


6.6 Ideas for proving algorithm correctness

  Theorem: for any character set C of size n ≥ 2, Huffman's algorithm produces a binary tree of an optimal prefix code for C.
  Induction basis: prove that for a character set with n = 2, Huffman's algorithm obtains an optimal prefix code.
  Induction step: prove that if Huffman's algorithm obtains an optimal prefix code for every character set of size k, then it also obtains an optimal prefix code for every character set of size k+1.


6.7 Basics of induction

  For n = 2 and character set C = {x1, x2}, every character needs a codeword of at least one bit. Huffman's algorithm assigns the codewords 1 and 0, which form an optimal prefix code.
  Assume Huffman's algorithm obtains an optimal prefix code for every character set of size k. Consider a character set C = {x1, x2, …, xk+1} of size k+1, where x1, x2 ∈ C are the two characters of lowest frequency. Let C' = (C - {x1, x2}) ∪ {z} with f(z) = f(x1) + f(x2).
  By the induction hypothesis, the algorithm obtains a binary tree T' of an optimal prefix code for the character set C' with frequencies f(z) and f(xi), i = 3, 4, …, k+1.
  Attach x1 and x2 to T' as the children of z to obtain the tree T. Then T is a binary tree of an optimal prefix code for C = (C' - {z}) ∪ {x1, x2}.
  If it were not, there would exist a better tree T* with B(T*) < B(T), and by Lemma 1 we may assume that x1 and x2 are sibling leaves in T*.
  Remove x1 and x2 from T* to obtain T*'. By Lemma 2, B(T*') = B(T*) - (f(x1) + f(x2)) < B(T) - (f(x1) + f(x2)) = B(T').
  This contradicts T' being a binary tree of an optimal prefix code for C'.


6.8 Application: File Merger

  Problem: given a set S = {f1, f2, …, fn} of sorted files of different lengths, where fi is the number of records in the i-th file, merge these files into a single sorted file using two-way merges.
  The merging process corresponds to a binary tree: the files are the leaves, and the file obtained by merging fi and fj is their parent node.


6.9 Pairwise sequential merge

  Example: S = {21, 10, 32, 41, 18, 70}.
  Merge cost: worst-case number of comparisons; merging two sorted files of m and n records takes at most m + n - 1 comparisons.
  (1) Pairwise sequential merging: summing the costs of the five merges gives 483.
  (2) Equivalently, (21+10+32+41)*3 + (18+70)*2 - 5 = 483: each file length times its depth in the merge tree, minus one comparison saved per merge.
  (3) Merging in Huffman order costs only 456.
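
  A minimal C++ sketch of computing the Huffman merge cost with a min-priority queue (the function name and the cost model are spelled out as assumptions): repeatedly merge the two shortest files, charge m + n - 1 comparisons per merge, and put the merged file back into the pool.

#include <functional>
#include <iostream>
#include <queue>
#include <vector>

// Total worst-case comparison count when files are merged in Huffman order.
long long huffmanMergeCost(const std::vector<long long>& sizes)
{
    std::priority_queue<long long, std::vector<long long>,
                        std::greater<long long>> q(sizes.begin(), sizes.end());
    long long total = 0;
    while (q.size() > 1) {
        long long a = q.top(); q.pop();   // two shortest files
        long long b = q.top(); q.pop();
        total += a + b - 1;               // worst-case comparisons for this merge
        q.push(a + b);                    // merged file re-enters the pool
    }
    return total;
}

int main()
{
    std::cout << huffmanMergeCost({21, 10, 32, 41, 18, 70}) << "\n";   // prints 456
}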


Origin blog.csdn.net/m0_65748531/article/details/133420143