Greedy algorithms: how do you implement Huffman compression coding with a greedy algorithm?

Greedy algorithms have many classic applications: Huffman coding, the Prim and Kruskal minimum spanning tree algorithms, and Dijkstra's single-source shortest path algorithm. How can a greedy algorithm be used to implement an efficient compression encoding that saves storage space?

How to understand the "greedy algorithm"?

Suppose we have a backpack that can hold 100 kg of goods, and the five kinds of beans below. Each kind has a different total amount and total value. To maximize the total value of the goods in the backpack, which beans should we choose, and how much of each should we load?

Item           Total amount (kg)   Total value (yuan)
soybeans       100                 100
mung beans     30                  90
red beans      60                  120
black beans    20                  80
peas           50                  75

Just compute the unit price of each item and load the beans in descending order of unit price: black beans (4 yuan/kg), mung beans (3 yuan/kg), red beans (2 yuan/kg), peas (1.5 yuan/kg), soybeans (1 yuan/kg). So we load 20 kg of black beans, 30 kg of mung beans, and 50 kg of red beans. This idea is a greedy algorithm.
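The bean example can be sketched in a few lines of Python. This is only an illustration of the "highest unit price first" idea; the function name `fill_backpack` and the tuple layout are assumptions made for this sketch, not something from the original article.

```python
def fill_backpack(beans, capacity):
    """Greedy sketch: load the beans with the highest unit price first.

    beans: list of (name, total_kg, total_value) tuples (hypothetical layout).
    Returns (chosen, total_value), where chosen is a list of (name, kg_taken).
    """
    # Sort by unit price (value per kg), highest first.
    ordered = sorted(beans, key=lambda b: b[2] / b[1], reverse=True)
    chosen, total_value = [], 0.0
    for name, kg, value in ordered:
        if capacity <= 0:
            break
        take = min(kg, capacity)  # take as much of this kind as still fits
        chosen.append((name, take))
        total_value += take * value / kg
        capacity -= take
    return chosen, total_value


beans = [
    ("soybeans", 100, 100),
    ("mung beans", 30, 90),
    ("red beans", 60, 120),
    ("black beans", 20, 80),
    ("peas", 50, 75),
]
chosen, total = fill_backpack(beans, 100)
print(chosen)  # [('black beans', 20), ('mung beans', 30), ('red beans', 50)]
print(total)   # 270.0
```

Because beans can be split into arbitrary amounts, taking the best unit price first never wastes capacity, which is why the greedy choice works for this problem.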

  • When to think of a greedy algorithm: we are given a set of data with a limit and an expected value, and we want to choose some of the data so that, while the limit is respected, the expected value is as large as possible. In the example, the limit is that the weight must not exceed 100 kg, and the expected value is the total value, which should be as large as possible.
  • Try to solve it greedily: at each step, among the choices that consume the same amount of the limit, pick the one that contributes the most to the expected value. In the example, we always pick, from the remaining beans, the kind with the highest unit price, i.e. the one that contributes the most value for the same weight.
  • Work through a few examples to check whether the result produced by the greedy algorithm is optimal.

Consider a weighted graph in which we want to find the shortest path from vertex S to vertex T (the path whose edge weights have the smallest sum). The greedy approach is: at each step, follow the minimum-weight edge leaving the current vertex, until vertex T is reached. But the path found this way is not necessarily the shortest. The greedy algorithm fails here because an earlier choice affects the choices available later, so even if the first step picks the best move (the shortest edge), it may cause us to miss the global optimum.
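The original figure is not reproduced here, so the sketch below uses a small made-up graph (the vertices A and B and all edge weights are assumptions for illustration only). It compares the "always follow the lightest edge" walk with a standard Dijkstra search to show that the greedy walk can miss the shortest path.

```python
import heapq

# Hypothetical directed graph: vertex -> {neighbour: edge weight}.
GRAPH = {
    "S": {"A": 1, "B": 2},
    "A": {"T": 10},
    "B": {"T": 2},
    "T": {},
}


def greedy_path_cost(graph, source, target):
    """Always follow the lightest edge to an unvisited vertex."""
    node, cost, visited = source, 0, {source}
    while node != target:
        options = [(w, v) for v, w in graph.get(node, {}).items()
                   if v not in visited]
        if not options:
            return None  # stuck: the greedy walk reached a dead end
        weight, node = min(options)
        visited.add(node)
        cost += weight
    return cost


def dijkstra_cost(graph, source, target):
    """Standard Dijkstra: cost of the true shortest path, or None."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return None


print(greedy_path_cost(GRAPH, "S", "T"))  # 11, via S -> A -> T
print(dijkstra_cost(GRAPH, "S", "T"))     # 4, via S -> B -> T
```

The first greedy step (S to A, weight 1) looks best locally, but it forces the expensive edge A to T afterwards.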

Greedy algorithms in practice

1. Distributing candy

We have m candies and n children. If m < n, only some of the children can be given a candy. The candies come in different sizes: the m candies have sizes s1, s2, s3, ..., sm. Each child also demands a different candy size, and a child is satisfied only by a candy whose size meets that child's demand. Suppose the n children's demands are g1, g2, ..., gn.

How should we distribute the candies to satisfy as many children as possible?

Abstractly: from the n children, pick a subset to receive candy so that the number of satisfied children (the expected value) is as large as possible. The limit is the number of candies, m.

Each time, pick from the remaining children the one with the smallest demand, then give that child the smallest remaining candy that satisfies them. The resulting distribution satisfies the largest number of children.
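A minimal sketch of this strategy, assuming a candy satisfies a child when its size is at least the child's demand (the function name `max_satisfied_children` is made up for this example):

```python
def max_satisfied_children(demands, candies):
    """Greedy sketch: smallest demand first, smallest sufficient candy first.

    Assumption: a candy of size s satisfies a child with demand g when s >= g.
    """
    demands = sorted(demands)
    candies = sorted(candies)
    satisfied = 0
    child = 0  # index of the unsatisfied child with the smallest demand
    for size in candies:
        if child == len(demands):
            break  # every child is already satisfied
        if size >= demands[child]:
            # The smallest remaining candy that fits goes to this child.
            satisfied += 1
            child += 1
        # Otherwise the candy is too small for every remaining child: skip it.
    return satisfied


print(max_satisfied_children([2, 3, 5], [1, 3, 4]))  # 2
```

Sorting both lists lets one pass over the candies do the matching, so the whole procedure is dominated by the two sorts.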

2. Making change with banknotes

Suppose we have banknotes of ¥1, ¥2, ¥5, ¥10, ¥20, ¥50 and ¥100, with c1, c2, c5, c10, c20, c50 and c100 notes of each respectively. We now need to pay ¥k with this money. What is the minimum number of notes needed?

Each note contributes the same amount to the expected value (the number of notes used), so we want each note to cover as much of the amount as possible: pay with the largest denominations first, so that the number of notes used is as small as possible.
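The "largest denomination first" rule might look like the sketch below. The function name `fewest_notes` and the dictionary layout are assumptions for illustration. Note that with a limited supply of notes the greedy pass can fail to reach the exact amount even though some other combination would work, so the sketch returns `None` in that case rather than claiming no solution exists.

```python
def fewest_notes(amount, counts):
    """Greedy sketch: pay with the largest denominations first.

    counts: dict mapping denomination -> number of notes available.
    Returns a dict denomination -> notes used, or None if the greedy
    pass cannot reach the exact amount.
    """
    used = {}
    remaining = amount
    for denom in sorted(counts, reverse=True):
        take = min(counts[denom], remaining // denom)
        if take:
            used[denom] = take
            remaining -= take * denom
    return used if remaining == 0 else None


wallet = {1: 5, 2: 5, 5: 2, 10: 3, 20: 2, 50: 1, 100: 1}
print(fewest_notes(188, wallet))
# {100: 1, 50: 1, 20: 1, 10: 1, 5: 1, 2: 1, 1: 1}

# Greedy limitation: takes the 5 first and gets stuck, although 2 + 2 + 2 = 6.
print(fewest_notes(6, {1: 0, 2: 3, 5: 1}))  # None
```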

3. Interval coverage

Suppose we have n intervals whose start and end points are [l1, r1], [l2, r2], [l3, r3], ..., [ln, rn]. We want to select some of these n intervals so that the selected intervals are pairwise disjoint (intervals that only touch at an endpoint do not count as intersecting). What is the maximum number of intervals that can be selected?

Intervals:           [6,8]  [2,4]  [3,5]  [1,5]  [5,9]  [8,10]
Disjoint intervals:  [2,4]  [6,8]  [8,10]

The idea for solving it: let lmin be the leftmost endpoint of the n intervals and rmax the rightmost endpoint. The problem is then equivalent to selecting several disjoint intervals to cover [lmin, rmax] from left to right. Sort the n intervals by start point in ascending order. Each time, select an interval whose left endpoint does not overlap the part already covered and whose right endpoint is as small as possible. This leaves the remaining uncovered range as large as possible, so more intervals can be placed in it.
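One common way to code this rule is to sort by right endpoint instead and scan once. That is a swapped-in formulation rather than the exact procedure described above, but it makes the same choice at each step: among the intervals that start at or after the covered part, it takes the one with the smallest right endpoint. The function name `select_disjoint_intervals` is an assumption for this sketch.

```python
def select_disjoint_intervals(intervals):
    """Greedy sketch: repeatedly take the compatible interval that ends first.

    intervals: iterable of (left, right) pairs. Intervals that only touch
    at an endpoint are treated as disjoint.
    """
    chosen = []
    covered_end = float("-inf")  # right edge of the part covered so far
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        if left >= covered_end:
            chosen.append((left, right))
            covered_end = right
    return chosen


intervals = [(6, 8), (2, 4), (3, 5), (1, 5), (5, 9), (8, 10)]
print(select_disjoint_intervals(intervals))  # [(2, 4), (6, 8), (8, 10)]
```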

How to implement Huffman coding with a greedy algorithm?

Suppose we have a file containing 1,000 characters, each taking 1 byte (1 byte = 8 bits), so storing the 1,000 characters takes 8,000 bits. Is there a more space-efficient way to store it?

Suppose these 1,000 characters contain only six distinct characters: a, b, c, d, e and f. Three bits can represent eight different characters, so to reduce storage we can use 3 bits per character. Storing the 1,000 characters then takes only 3,000 bits.

a(000), b(001), c(010), d(011), e(100), f(101)

Huffman coding looks not only at how many distinct characters the text contains but also at how often each character occurs, and chooses codes of different lengths according to frequency. How are the lengths assigned? Characters that occur more often get slightly shorter codes, and characters that occur less often get slightly longer codes.

With fixed-length codes, decompression is simple: read 3 bits at a time from the text and translate them into a character. Huffman codes, however, have unequal lengths, so should we read one, two, or three bits at a time? To avoid this ambiguity, Huffman coding requires that no character's code is a prefix of another character's code.

011  010  100  011  101  001
 d    c    e    d    f    b

Suppose the six characters, in descending order of frequency, are a, b, c, d, e, f, and we encode them as in the table below. No character's code is a prefix of another's, so when decompressing, each time we read the longest bit string that can be decoded and there is no ambiguity. With this encoding, the 1,000 characters take only 1,910 bits (450 + 700 + 270 + 240 + 150 + 100).

Character   Frequency   Code    Total bits
a           450         1       450
b           350         01      700
c           90          001     270
d           60          0001    240
e           30          00001   150
f           20          00000   100
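To see why the prefix rule makes decoding unambiguous, here is a small sketch that uses the codes and frequencies from the table. The helper names `encode` and `decode` are assumptions for this example. It also sums frequency times code length to get the total size.

```python
CODES = {"a": "1", "b": "01", "c": "001", "d": "0001", "e": "00001", "f": "00000"}
FREQ = {"a": 450, "b": 350, "c": 90, "d": 60, "e": 30, "f": 20}


def encode(text, codes):
    """Concatenate the code of every character."""
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    """Read bit by bit; emit a character as soon as the bits form a code."""
    lookup = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            # No code is a prefix of another, so this match is the only one.
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)


bits = encode("abcdef", CODES)
print(bits)                 # 10100100010000100000
print(decode(bits, CODES))  # abcdef

total_bits = sum(FREQ[ch] * len(code) for ch, code in CODES.items())
print(total_bits)           # 1910
```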

How do we assign codes of different lengths to the characters according to their frequencies?

Treat each character as a node and put the nodes, together with their frequencies, into a priority queue. Take the two nodes A and B with the lowest frequencies out of the queue, create a new node C whose frequency is the sum of the frequencies of A and B, make C the parent of A and B, and put C back into the priority queue. Repeat this until only one node remains in the queue; that node is the root of the resulting tree. Then label every edge: each edge pointing to a left child is labeled 0 and each edge pointing to a right child is labeled 1. The path from the root to a leaf then spells out the Huffman code of the character at that leaf.

Huffman tree (left edge = 0, right edge = 1; p, k, z and x are internal nodes):

p
  0: k
    0: z
      0: (internal node)
        0: x
          0: f (00000)
          1: e (00001)
        1: d (0001)
      1: c (001)
    1: b (01)
  1: a (1)
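The construction can be sketched with Python's `heapq` module as the priority queue. The function name `build_huffman_codes` and the tuple-based tree representation are assumptions for this sketch. Which child ends up on the left or right is arbitrary, so the exact 0/1 patterns may differ from the diagram, but the code lengths come out the same.

```python
import heapq
from itertools import count


def build_huffman_codes(freq):
    """Greedy sketch: repeatedly merge the two least frequent nodes.

    freq: dict mapping character -> frequency. Returns dict character -> code.
    """
    if not freq:
        return {}
    tie = count()  # tie-breaker so the heap never has to compare tree nodes
    # Heap entries are (frequency, tie, node); a node is either a
    # character (leaf) or a (left, right) tuple (internal node).
    heap = [(f, next(tie), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}  # a single character still needs one bit
    while len(heap) > 1:
        f1, _, n1 = heapq.heappop(heap)
        f2, _, n2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (n1, n2)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")  # left edge is labeled 0
            walk(node[1], prefix + "1")  # right edge is labeled 1
        else:
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


freq = {"a": 450, "b": 350, "c": 90, "d": 60, "e": 30, "f": 20}
codes = build_huffman_codes(freq)
print({ch: len(code) for ch, code in codes.items()})
# {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'f': 5, 'e': 5}
```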

1. Given a non-negative integer a, remove k digits from it so that the remaining number is as small as possible. Which k digits should be removed?

The integer a is made up of digits. Scan from the highest digit and remove the first digit that is larger than the digit immediately below it; do this k times. In other words, starting from the highest position, compare each digit with the next lower one: if the higher digit is larger, remove it; if it is smaller, move one position to the right and keep comparing. If no digit is larger than its lower neighbour, remove the last digit. Repeat the whole scan k times.

For example, removing 5 digits from 4556847594546: 4556847594546 -> first 455647594546 -> second 45547594546 -> third 4547594546 -> fourth 447594546 -> fifth 44594546
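The same rule can be sketched with a stack, which avoids rescanning from the start after every removal. This is one possible implementation, not necessarily the author's. The function name `remove_k_digits` is an assumption, and the number is passed as a string of digits.

```python
def remove_k_digits(number, k):
    """Greedy sketch: drop a digit whenever it is larger than the next one.

    number: the non-negative integer as a string of digits.
    Returns the smallest remaining number as a string.
    """
    stack = []
    for digit in number:
        # A kept digit that is larger than the incoming one should go.
        while k > 0 and stack and stack[-1] > digit:
            stack.pop()
            k -= 1
        stack.append(digit)
    if k > 0:
        # Digits are now non-decreasing: remove the rest from the end.
        stack = stack[:-k]
    result = "".join(stack).lstrip("0")
    return result or "0"


print(remove_k_digits("4556847594546", 5))  # 44594546
```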

2. n people are waiting to be served at a single window, and each person needs a different amount of service time. In what order should they be served so that the total waiting time of the n people is as short as possible?

Serve the person with the shortest service time first.
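A minimal sketch of "shortest service time first", assuming a person's waiting time is the time that passes before their own service starts (the function name `min_total_waiting_time` is made up for this example):

```python
def min_total_waiting_time(service_times):
    """Greedy sketch: serve people in ascending order of service time.

    Returns (order, total_wait), where total_wait sums, for each person,
    the time spent waiting before their own service begins.
    """
    order = sorted(service_times)
    total_wait = 0
    elapsed = 0  # time at which the next person starts being served
    for t in order:
        total_wait += elapsed
        elapsed += t
    return order, total_wait


print(min_total_waiting_time([3, 1, 2]))  # ([1, 2, 3], 4)
```

Each person's service time is added to the wait of everyone behind them, so putting short jobs first keeps the largest times from being counted many times over.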

Origin blog.csdn.net/ywangjiyl/article/details/104537624