Python Algorithm Design - Huffman Coding

1. Huffman tree

[Figure: Huffman tree]

The figure above shows a Huffman tree built from the letter frequencies of the string "this is an example of a huffman tree".
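As a small illustration of where those letter frequencies come from, the sketch below counts the characters of the same string with Counter. The variable names are chosen only for this example.

```python
from collections import Counter

text = "this is an example of a huffman tree"

# Count how often each character (including the space) occurs.
freq = Counter(text)

# List the symbols from most to least frequent.
for symbol, count in freq.most_common():
    print(repr(symbol), count)
```

The space is the most frequent symbol (7 occurrences), followed by 'a' and 'e' (4 each), so these end up closest to the root of the tree and receive the shortest codes.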

2. Huffman coding

For years, Huffman coding was the state of the art in statistical data compression. One likely reason is that arithmetic coding, the main alternative, was covered by patents.

Huffman coding is very similar to Samuel Morse's code in that both create a compact representation of the data by giving frequent symbols shorter codes.

Unlike Morse code, however, a Huffman code has the prefix property: no codeword is a prefix of any other. This removes the need for a delimiter between codewords, and an encoded message can be decoded in only one way. Its disadvantage is that a single bit error can easily corrupt the rest of the message.
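To make the prefix property concrete, here is a minimal sketch. The three-symbol table CODES and the helpers is_prefix_free and decode are hypothetical, written only for this illustration; they are not part of the implementation in section 3.

```python
# Hypothetical three-symbol prefix code, chosen only for illustration.
CODES = {'a': '0', 'b': '10', 'c': '11'}


def is_prefix_free(codes):
    """Return True if no codeword is a prefix of a different codeword."""
    words = list(codes.values())
    return not any(
        u != v and v.startswith(u) for u in words for v in words
    )


def decode(bits, codes):
    """Decode a bit string by matching codewords from left to right."""
    lookup = {code: symbol for symbol, code in codes.items()}
    result, current = [], ''
    for bit in bits:
        current += bit
        if current in lookup:  # a complete codeword has been read
            result.append(lookup[current])
            current = ''
    return ''.join(result)


encoded = ''.join(CODES[ch] for ch in 'abc')  # '01011', no delimiters
print(decode(encoded, CODES))                 # abc

corrupted = '1' + encoded[1:]                 # flip the first bit
print(decode(corrupted, CODES))               # cac
```

Because no codeword is a prefix of another, the decoder can emit a symbol as soon as the bits read so far match a codeword. The second call shows the drawback: one flipped bit shifts the codeword boundaries, so symbols after the error can decode incorrectly as well.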

3. Python algorithm implementation


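from collections import Counter


def find_min(freq):
    """Remove and return the (count, tree) pair with the smallest count."""
    item = min(freq, key=lambda i: i[0])
    freq.remove(item)
    return item


def print_codes(tree, prefix=''):
    """Print every symbol in the tree with its binary code."""
    if isinstance(tree, tuple):
        # Internal node: 0 for the left branch, 1 for the right branch.
        print_codes(tree[0], prefix + '0')
        print_codes(tree[1], prefix + '1')
    else:
        # Leaf: tree is a single character.
        print(tree, prefix)


def huffman_codes(text):
    """Build a Huffman tree for text and print the code of each character."""
    # Start with one (count, symbol) pair per distinct character.
    freq = [(i, x) for x, i in Counter(text).items()]
    while len(freq) > 1:
        # Merge the two least frequent subtrees into one node.
        li, lx = find_min(freq)
        ri, rx = find_min(freq)
        freq.append((li + ri, (lx, rx)))
    print_codes(freq.pop()[1])


huffman_codes('The reason why Mr. chen can succeed is that  he is not only handsome but also wise')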
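The function above prints the codes. If you want to use them afterwards, one possible variant returns the code table as a dict instead. The sketch below is an assumption-laden alternative, not the author's method: it swaps the list scan in find_min for the standard heapq module, and the names build_codes, walk, and tiebreak are made up for this example.

```python
import heapq
from collections import Counter
from itertools import count


def build_codes(text):
    """Return a dict mapping each character of text to its Huffman codeword."""
    tiebreak = count()  # unique counter so the heap never compares two trees
    heap = [(n, next(tiebreak), ch) for ch, n in Counter(text).items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    while len(heap) > 1:
        # Merge the two least frequent subtrees into one node.
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (n1 + n2, next(tiebreak), (left, right)))

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + '0')
            walk(tree[1], prefix + '1')
        else:
            codes[tree] = prefix or '0'  # a lone symbol still needs one bit

    walk(heap[0][2], '')
    return codes


text = 'this is an example of a huffman tree'
codes = build_codes(text)
encoded = ''.join(codes[ch] for ch in text)
print(len(encoded), 'bits instead of', 8 * len(text))  # 135 bits instead of 288
```

The individual codewords depend on how ties between equal counts are broken, so they may differ from those in the figure, but the total encoded length of an optimal Huffman code for a given text is the same either way.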

Notes:

  • min() returns the smallest item of an iterable (or the smallest of two or more arguments). The optional key argument is a function that computes the value to compare by.

  • A lambda expression creates an anonymous function. It avoids a separate def statement for a short function and keeps the code compact.

  • isinstance(object, classinfo) checks whether an object is an instance of a given type, similar to type() but also taking inheritance into account. object is the instance to test, and classinfo is a class (direct or indirect), a built-in type, or a tuple of these. Here tuple is the built-in tuple type. The return value is a bool.

  • Counter is used for counting. Calling it on an iterable returns a dict-like object whose keys are the elements and whose values are their counts. items() returns an iterable view of (key, value) pairs.
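The built-ins in the notes above can be tried in isolation. The following is a short sketch with made-up sample data (pairs, counts), kept separate from the Huffman code itself.

```python
from collections import Counter

pairs = [(3, 'c'), (1, 'a'), (2, 'b')]

# min() with a key function: compare tuples by their first element only.
smallest = min(pairs, key=lambda pair: pair[0])
print(smallest)                       # (1, 'a')

# isinstance() tells an internal node (a tuple) from a leaf (a string).
print(isinstance(('a', 'b'), tuple))  # True
print(isinstance('a', tuple))         # False

# Counter maps each element to its count; items() yields (element, count) pairs.
counts = Counter('hello')
print(counts['l'])                    # 2
print(sorted(counts.items()))         # [('e', 1), ('h', 1), ('l', 2), ('o', 1)]
```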

The output is shown in the figure:
[Figure: program output]

Huffman coding replaces every character that appears in the text with a binary codeword, so the text can be encoded as a compact bit string.

4. Author Info

Author: Xiaohong's fishing routine. Goal: make programming more interesting!

Focus areas: algorithms, web crawlers, game development, data analysis, natural language processing, AI, and more. Follow along, and let us grow and code together!

Copyright note: Plagiarism and reprinting of this article are prohibited, and infringement will be pursued.

Origin blog.csdn.net/qq_44000141/article/details/130309289