Data Structure - Greedy Algorithm

The basic concept of the greedy algorithm:

 A greedy algorithm always makes the choice that looks best at the current step when solving a problem. In other words, it does not consider overall optimality; each choice is only a locally optimal solution in some sense.
 A greedy algorithm does not yield the globally optimal solution for every problem. The key is the choice of greedy strategy.

General steps of greedy algorithm
 Establish a mathematical model to describe the problem and clarify what is the optimal solution.
 Divide the problem to be solved into several sub-problems.
 Solve each sub-problem and obtain the local optimal solution of the sub-problem.
 Combine the locally optimal solutions of the sub-problems into a solution of the original problem.

Example:

 Suppose you have the following class schedule. Because some class times conflict, not all of these classes can be taught in the same classroom.
 You want to schedule as many classes as possible in this classroom.
 How do you choose as many classes as possible with no time conflicts?

 This problem seems difficult.
 In fact, the algorithm may surprise you with its simplicity. The specific method is as follows:
(1) Select the class that ends the earliest. This is the first class to be taught in this classroom.
(2) Next, select a class that starts no earlier than the end of the first class. Again, choose the one that ends the earliest; this is the second class to be taught in this classroom.
(3) Repeat step (2) until no remaining class can be scheduled.
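
The steps above can be sketched in a few lines of Python. The class names and times below are hypothetical (the original schedule table is not reproduced here), and `schedule_classes` is a name chosen only for this illustration.

```python
# Hypothetical class schedule: (name, start, end) in decimal hours.
classes = [
    ("art", 9.0, 10.0),
    ("english", 9.5, 10.5),
    ("math", 10.0, 11.0),
    ("cs", 10.5, 11.5),
    ("music", 11.0, 12.0),
]


def schedule_classes(classes):
    """Greedily pick classes in order of earliest end time."""
    chosen = []
    last_end = float("-inf")
    for name, start, end in sorted(classes, key=lambda c: c[2]):
        # A class is compatible if it starts no earlier than the last chosen class ends.
        if start >= last_end:
            chosen.append(name)
            last_end = end
    return chosen


selected_classes = schedule_classes(classes)
print(selected_classes)  # ['art', 'math', 'music']
```

Sorting by end time once and then scanning is enough, because each pick only needs the end time of the previously chosen class.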

 This algorithm is simple and obvious, which is the advantage of greedy algorithms: they are easy to understand and implement.
 The greedy approach is simple: take the best option at each step. In this example, you choose the class that ends the earliest every time; that is, you choose the locally optimal solution at each step and end up with the globally optimal solution.
 For this scheduling problem, the simple algorithm above finds the optimal solution.
 Of course, greedy algorithms do not work well in every situation, but they are easy to implement.

 The knapsack problem.
 There is a backpack that can carry a maximum weight of 150 kg. There are 7 items
whose weights are [35, 30, 60, 50, 40, 10, 25] and whose
values are [10, 40, 30, 50, 35, 40, 30].
 How should you choose items so that the backpack carries
the greatest total value?

Each item is either taken or left behind. This is the famous 0-1 knapsack problem.

Example of a problem: the knapsack problem
 Follow the steps mentioned before:
(1) Determine what the optimal solution is.
 Within the weight limit, the selection with the greatest total value is the optimal solution.
(2) Divide the problem to be solved into several sub-problems.
 There are many ways to define the sub-problems. Here we choose the most
direct one: at each step, pick the item with the highest value among the
remaining items that still fit. This is the locally optimal choice.
(3) Solve the sub-problems one by one and combine the results into a solution of the whole problem.
 Applying this rule (value), the items are chosen in the order: 4 2 6 5.
 The final total weight is: 130.
 The final total value is: 165.

 The above is one method, but there are other
ways to define the locally optimal choice.
 Just now we used the highest value as the criterion. We can also take
the lightest of the remaining items as the best choice at each step.
 Applying this rule (weight), the items are chosen in the order: 6 7 2 1 5.
 The final total weight is: 140.
 The final total value is: 155.
 The weight-first strategy does worse here than the value-first strategy (155 versus 165).

 The choice can also be defined as follows: pick
the item with the highest "value density" every time.
 Value density is the value per unit weight.
 Applying this rule (value density), the items are chosen in the order: 6 2 7 4 1
(item 5 no longer fits after item 4, so it is skipped and item 1 is taken instead).
 The final total weight is: 150.
 The final total value is: 170.
 The value density strategy does better here than both the value strategy and the weight strategy.
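
The three strategies can be compared with one small Python sketch. It uses the weights and values from the text; the function name `greedy_knapsack` and its `key` parameter are assumptions made for this illustration. It also assumes that an item which does not fit is skipped and the scan continues, which is what the value density sequence above implies.

```python
weights = [35, 30, 60, 50, 40, 10, 25]
values = [10, 40, 30, 50, 35, 40, 30]
CAPACITY = 150


def greedy_knapsack(weights, values, capacity, key):
    """Take items in the order given by `key`, skipping any that no longer fit.

    `key` maps an item index to a sort key; smaller keys are tried first.
    Returns (1-based item numbers in pick order, total weight, total value).
    """
    order = sorted(range(len(weights)), key=key)  # stable sort: ties keep list order
    picked, total_w, total_v = [], 0, 0
    for i in order:
        if total_w + weights[i] <= capacity:
            picked.append(i + 1)
            total_w += weights[i]
            total_v += values[i]
    return picked, total_w, total_v


by_value = greedy_knapsack(weights, values, CAPACITY, key=lambda i: -values[i])
by_weight = greedy_knapsack(weights, values, CAPACITY, key=lambda i: weights[i])
by_density = greedy_knapsack(weights, values, CAPACITY, key=lambda i: -values[i] / weights[i])

print(by_value)    # ([4, 2, 6, 5], 130, 165)
print(by_weight)   # ([6, 7, 2, 1, 5], 140, 155)
print(by_density)  # ([6, 2, 7, 4, 1], 150, 170)
```

None of the three rules is guaranteed to be optimal for the 0-1 knapsack problem in general; they only differ in how well they happen to do on a given instance.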

Example Questions
 Exercise 1
You work for a furniture company and need to ship furniture all over the country. To do this you load boxes onto trucks. The boxes come in different sizes, and you want to make the most of the space in each truck. How do you choose which boxes to load? Design a greedy algorithm. Does it yield an optimal solution?

 Exercise 2
You are going to travel to Europe for a total of 7 days. For each tourist attraction, you assign a value (how much you want to see it) and estimate how long the visit will take. How do you maximize the total value of the trip? Design a greedy algorithm. Does it yield an optimal solution?

Answer to Exercise 1: A greedy strategy is to choose the largest box that fits in the remaining space of the truck, and to repeat this until no more boxes fit. This algorithm is not guaranteed to find an optimal solution.
Answer to Exercise 2: Keep picking the most valuable attraction that can be visited in the remaining time, until there is not enough time left for any attraction. This algorithm is not guaranteed to find an optimal solution.
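
A tiny counterexample shows why the largest-box-first strategy from Exercise 1 can miss the optimum. The truck capacity and box sizes are hypothetical, boxes are reduced to a single size number for simplicity, and `greedy_load` is a name invented for this sketch.

```python
def greedy_load(box_sizes, capacity):
    """Repeatedly load the largest box that still fits in the remaining space."""
    loaded, remaining = [], capacity
    for size in sorted(box_sizes, reverse=True):
        if size <= remaining:
            loaded.append(size)
            remaining -= size
    return loaded


# Hypothetical instance: truck capacity 10, boxes of size 6, 5 and 5.
loaded = greedy_load([6, 5, 5], 10)
print(loaded, sum(loaded))  # [6] 6, although 5 + 5 would fill the truck exactly
```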

Example of a problem: the set-covering problem
 Suppose you host a radio program and want it to be heard by listeners in all 50 states of the United States. To do this, you need to decide which radio stations to broadcast on. Each station charges a fee, so you want to use as few stations as possible. The list of existing radio stations is as follows

  Each station covers a specific area, and the coverage areas of different stations may overlap.
 How do you find the smallest set of stations that covers all 50 states?

 The exact method is as follows.
a. List every possible subset of the stations; the collection of all subsets is called the power set. With n stations there are 2^n possible subsets.

b. Of these subsets, choose the smallest one that covers all 50 US states.

The problem is that checking every possible subset of stations takes a long time. Since there are 2^n possible subsets, the running time is O(2^n).
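
As a rough sketch of the exact method, the code below tries subsets of stations in order of increasing size and returns the first one that covers everything. It uses the small eight-state data set from this article's own code rather than all 50 states; `smallest_cover` is a name chosen for this illustration.

```python
from itertools import combinations


def smallest_cover(states_needed, stations):
    """Try subsets of stations by increasing size; return the first full cover."""
    names = sorted(stations)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            covered = set().union(*(stations[name] for name in subset))
            if states_needed <= covered:  # every needed state is covered
                return set(subset)
    return None  # no combination of stations covers all states


states_needed = {"mt", "wa", "or", "id", "nv", "ut", "ca", "az"}
stations = {
    "kone": {"id", "nv", "ut"},
    "ktwo": {"wa", "id", "mt"},
    "kthree": {"or", "nv", "ca"},
    "kfour": {"nv", "ut"},
    "kfive": {"ca", "az"},
}

best = smallest_cover(states_needed, stations)
print(len(best))  # 4
```

With five stations this checks at most 2^5 - 1 = 31 subsets, which is why it is only practical for very small inputs.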

If there are only a few stations, say 5 to 10, this is feasible. But what happens if there are many stations?
 As the number of stations grows, the time required increases dramatically. Assuming you can check 10 subsets per second, the time required doubles with every station added.
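
The growth can be made concrete with a short calculation. The rate of 10 subsets per second comes from the text; the station counts in the loop are chosen only for illustration, and `brute_force_seconds` is a hypothetical helper name.

```python
SUBSETS_PER_SECOND = 10  # rate assumed in the text
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def brute_force_seconds(num_stations):
    """Seconds needed to check all 2^n subsets at the assumed rate."""
    return 2 ** num_stations / SUBSETS_PER_SECOND


for n in (5, 10, 32, 100):  # station counts chosen for illustration
    seconds = brute_force_seconds(n)
    years = seconds / SECONDS_PER_YEAR
    print(f"{n:>3} stations: {seconds:.3g} s (about {years:.3g} years)")
```

Five stations take 3.2 seconds and ten take 102.4 seconds, but the figure for 32 stations is already measured in years.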


 No known algorithm solves this problem exactly and quickly enough. What can be done?
 A greedy algorithm comes to the rescue. The following greedy algorithm gives a solution very close to the optimum.
(1) Pick the station that covers the most states not yet covered. It does not matter if the station also covers some states that are already covered.
(2) Repeat step (1) until all states are covered.

 This is an approximation algorithm.
 Approximation algorithms can be used when computing an exact solution would take too long. An approximation algorithm is judged by two criteria:
 how fast it is;
 how close its solution is to the optimal solution.
 Greedy algorithms are a good choice: they are simple and usually run very fast. In this example, the running time of the greedy algorithm is O(n^2), where n is the number of stations.


 A set is similar to a list, except that each element can appear only once; a set cannot contain duplicate elements.
 For example, suppose you have a list in which some elements are repeated.
 Convert it to a set. In the resulting set, 1, 2, and 3 each appear only once.
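
A short Python illustration of this, together with the set operations that the set-covering code relies on. The list and the two example sets are hypothetical, since the original example list is not shown here.

```python
arr = [1, 2, 2, 3, 3, 3]  # hypothetical list with repeated elements
unique = set(arr)
print(unique)  # {1, 2, 3}

fruits = {"avocado", "tomato", "banana"}
vegetables = {"beets", "carrots", "tomato"}
print(fruits | vegetables)  # union: elements in either set
print(fruits & vegetables)  # intersection: {'tomato'}
print(fruits - vegetables)  # difference: elements of fruits not in vegetables
```

The intersection operator `&` and the in-place difference `-=` are the two operations the greedy set-covering code uses.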


 You also need the list of stations to choose from. A hash table is used to represent it.

 The key is the name of the station, and the value is the set of states that the station covers. In this example, station kone covers Idaho, Nevada, and Utah. All values are sets.
 Finally, a set is needed to store the stations finally selected.

# States that still need coverage
states_needed = set(["mt", "wa", "or", "id", "nv", "ut", "ca", "az"])
# States covered by each available station
stations = {}
stations["kone"] = set(["id", "nv", "ut"])
stations["ktwo"] = set(["wa", "id", "mt"])
stations["kthree"] = set(["or", "nv", "ca"])
stations["kfour"] = set(["nv", "ut"])
stations["kfive"] = set(["ca", "az"])
# Stations finally selected
final_stations = set()

while states_needed:  # some states in states_needed are still uncovered
    # Scan all stations and keep in best_station the one that covers
    # the most states not yet covered
    best_station = None
    # Uncovered states that best_station would cover
    states_covered = set()
    for station, states_for_station in stations.items():
        # States in both states_needed and states_for_station:
        # the still-uncovered states that this station covers
        covered = states_needed & states_for_station
        if len(covered) > len(states_covered):
            best_station = station
            states_covered = covered

    if best_station is None:
        break  # no station covers any remaining state; avoid looping forever

    states_needed -= states_covered
    final_stations.add(best_station)

print(final_stations)

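 The selected stations may be ktwo, kthree, kfour and kfive rather than kone, ktwo, kthree and kfive as expected. Both are valid covers of the same size; which one is found depends on the order in which the stations are examined. For comparison, the exact algorithm runs in O(2^n) time, whereas the greedy algorithm runs in O(n^2) time.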

  Problem example: the activity selection problem
 Suppose there are n activities that all need the same venue, and the venue can host only one activity at a time.
 Each activity has a start time Si and an end time Fi (times in this problem are integers), meaning that the activity occupies the venue during the interval [Si, Fi).
(Note: the interval is closed on the left and open on the right.)
 Question: which activities should be scheduled to maximize the number of activities held in this venue?

 Greedy conclusion: the activity that ends first must be part of some optimal solution.
 Proof: let a be the activity that ends first among all activities, and let b be the activity that ends first in an optimal solution.
 If a = b, the conclusion holds.
 If a != b, then b ends no earlier than a. Replace b in the optimal solution with a. Every other activity in the optimal solution starts no earlier than the end of b, hence no earlier than the end of a, so a does not overlap any of them. The new solution contains the same number of activities, so it is also optimal.
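
The conclusion can be sanity-checked on a small instance by comparing the greedy count with an exhaustive search. This is only an illustrative check, not a proof; the activity list is hypothetical and all function names are invented for this sketch.

```python
from itertools import combinations


def greedy_count(activities):
    """Number of activities chosen by the earliest-finish-first rule."""
    count, last_end = 0, float("-inf")
    for start, end in sorted(activities, key=lambda a: a[1]):
        if start >= last_end:  # half-open [start, end) intervals may touch
            count += 1
            last_end = end
    return count


def compatible(subset):
    """True if no two half-open intervals in `subset` overlap."""
    ordered = sorted(subset)  # sort by start time
    return all(a[1] <= b[0] for a, b in zip(ordered, ordered[1:]))


def brute_force_count(activities):
    """Size of the largest compatible subset, found by trying every subset."""
    for size in range(len(activities), 0, -1):
        if any(compatible(c) for c in combinations(activities, size)):
            return size
    return 0


# Hypothetical activities as (start, end) pairs.
activities = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9), (6, 10), (8, 11)]
print(greedy_count(activities), brute_force_count(activities))  # 3 3
```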

NP-complete problem

The Traveling Salesman Problem Explained

 Can the traveling salesman problem be solved by a greedy algorithm?
 We use a simple example to show the steps of a greedy algorithm for the traveling salesman problem. A Cartesian coordinate system represents the location of each city, and the starting
point is (0, 0).
 Greedy algorithm: starting from (0, 0), go to the nearest unvisited point; from that point, again go to the nearest unvisited point; repeat until no unvisited point remains, then return to the starting point.
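
A minimal sketch of this nearest-neighbor rule follows. The city coordinates are hypothetical, `nearest_neighbor_tour` is a name chosen for this illustration, and `math.dist` requires Python 3.8 or later.

```python
import math


def nearest_neighbor_tour(start, cities):
    """Visit the nearest unvisited city each time, then return to `start`."""
    route = [start]
    unvisited = list(cities)
    total = 0.0
    current = start
    while unvisited:
        nearest = min(unvisited, key=lambda city: math.dist(current, city))
        total += math.dist(current, nearest)
        unvisited.remove(nearest)
        route.append(nearest)
        current = nearest
    total += math.dist(current, start)  # close the tour
    route.append(start)
    return route, total


# Hypothetical city coordinates; the tour starts and ends at (0, 0).
route, total = nearest_neighbor_tour((0, 0), [(2, 0), (2, 1), (0, 1)])
print(route)  # [(0, 0), (0, 1), (2, 1), (2, 0), (0, 0)]
print(total)  # 6.0
```

The tour found this way is a valid round trip, but in general it is not guaranteed to be the shortest one.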

 Choose the starting city at random, and each time you pick the next city to visit, choose the nearest city you have not visited yet. Suppose the traveling salesman departs from Marin.
 The total journey is 71 miles. This path may not be the shortest, but it is fairly short.

 NP-complete problems (NP-C problems) are at the heart of the P versus NP question, which is one of the seven Millennium Prize Problems in mathematics.
 NP stands for Non-deterministic Polynomial time: the class of decision problems that a non-deterministic machine can solve in polynomial time, or equivalently whose proposed solutions can be verified in polynomial time.
 If every problem in NP can be transformed into a given NP problem by a polynomial-time algorithm, then that problem is called an NP-complete problem
(Non-deterministic Polynomial complete problem).

Origin blog.csdn.net/jcandzero/article/details/126866142