How to traverse a Huffman tree?

DVM :

I'm new to programming and I'm currently trying to learn Python. I'm doing am following a tutorial that consist in building a Huffman coding app through 4 steps; I've got the first 3 but I'm stuck in the fourth. First step is getting the frequency of a unique character in a string like this:

{'a': 8, 'b': 7, 'c': 6, 'd': 5, 'e': 4, 'f': 3, 'g': 2, 'h': 1}

Second step is putting the characters and frequency in tuples:

[(8, 'a'), (7, 'b'), (6, 'c'), (5, 'd'), (4, 'e'), (3, 'f'), (2, 'g'), (1, 'h')]

Third step is actually building the tree, which looks something like this:

[(36, (15, (7, 'b'), (8, 'a')), (21, (9, (4, 'e'), (5, 'd')), (12, (6, 'c'), 
 (6, (3, 'f'), (3, (1, 'h'), (2, 'g'))))))]

The fourth step is to "traverse" the tree while assigning each branch a code, 1 if it's on the left and 0 if it's on the right. I've thought of a few ways to do this, but I haven't had much success.

I see that each tier of the tree list has two elements (tree[1] and tree[2]) that represent the left and right branches, but I don't know what's the most efficient way to reiterate over the tree. I thought of using a loop that would keep going deeper into each branch if it detected the character belonged to it, but it didn't work and I'm not sure why.

def in_list(my_list, item):
    try:
        return any(item in sublist for sublist in my_list)
    except:
        return False


def get_codes(tree, tuples):

    tree = tree[0]

    for i in tuples:
        check = True
        while check is True:
            if in_list(tree[1], i) is True:
                print(i, 'is in 1')
                node1 = True
            else:
                node1 = False
            if in_list(tree[2], i) is True:
                print(i, 'is in 2')
                node2 = True
            else:
                node2 = False

            tree = tree[2]

            if node1 is False and node2 is False:
                check = False

    return 'test'

I'm not even sure this is the best way to approach this.

Here's my full code in case it's necessary:

def get_frequency(string):
    freq = dict.fromkeys(string, 0)   # fromKeys function takes every unique element of the given parameter.
    for i in string:                  # If given a string, it take each unique character.
        freq[i] += 1

    print('Step 1: ', freq)
    return freq


def get_nodes(dictionary):
    node_list = []

    list_keys = list(dictionary.values())
    list_values = list(dictionary.keys())

    for i in range(len(list_keys)):
        node_list.append((list_keys[i], list_values[i]))

    print('Step 2: ', node_list)
    return node_list


def get_tree(node_set):

    tree_list = node_set

    for i in range(len(tree_list)-1):

        # Sort nodes in ascending order. Lowest frequency first
        tree_list.sort(key=lambda tup: tup[0])

        # Defining the next node (f,l,r) based on the two tuples with the lowest frequency
        l_freq = tree_list[0][0]
        r_freq = tree_list[1][0]
        f = l_freq + r_freq

        l_tuple = tree_list[0]
        r_tuple = tree_list[1]

        # Append new node, delete old node
        node = (f, l_tuple, r_tuple)
        tree_list.remove(l_tuple)
        tree_list.remove(r_tuple)
        tree_list.append(node)

    print('Step 3: ', tree_list)
    return tree_list


def in_list(my_list, item):
    try:
        return any(item in sublist for sublist in my_list)
    except:
        return False


def get_codes(tree, tuples):

    tree = tree[0]

    for i in tuples:
        check = True
        while check is True:
            if in_list(tree[1], i) is True:
                print(i, 'is in 1')
                node1 = True
            else:
                node1 = False
            if in_list(tree[2], i) is True:
                print(i, 'is in 2')
                node2 = True
            else:
                node2 = False

            tree = tree[2]

            if node1 is False and node2 is False:
                check = False

    return 'test'


text = 'aaaaaaaabbbbbbbccccccdddddeeeefffggh'

frequency = get_frequency(text)
nodes = get_nodes(frequency)
huff_tree = get_tree(nodes)
codes = get_codes(huff_tree, get_nodes(frequency))
kaya3 :

Usually, the most convenient way to traverse a tree is recursively. This tree has two kinds of nodes:

  • Tuples with 2 elements (_, letter) are always leaves.
  • Tuples with 3 elements (_, left, right) are always internal nodes.

The algorithm is like this: if you're at an internal node, recurse on the left by extending the current codeword with 0, and then recurse on the right by extending the current codeword with 1. If you're at a leaf node, just pair the letter with the current codeword. The "initial" codeword is the empty string.

Here's an implementation using a recursive generator function.

def make_codewords(tree):
    def helper(node, codeword):
        if len(node) == 2:
            _, letter = node
            yield (codeword, letter)
        else:
            _, left, right = node
            yield from helper(left, codeword + '0')
            yield from helper(right, codeword + '1')

    root = tree[0]
    # convert (codeword, letter) pairs to dictionary
    return dict(helper(root, ''))

Example:

>>> tree = [(36, (15, (7, 'b'), (8, 'a')), (21, (9, (4, 'e'), (5, 'd')), (12, (6, 'c'), (6, (3, 'f'), (3, (1, 'h'), (2, 'g'))))))]
>>> make_codewords(tree)
{'00': 'b',
 '01': 'a',
 '100': 'e',
 '101': 'd',
 '110': 'c',
 '1110': 'f',
 '11110': 'h',
 '11111': 'g'}

Guess you like

Origin http://43.154.161.224:23101/article/api/json?id=23770&siteId=1