Hash algorithm based on Python

A hash algorithm uses a one-way function to create a digital "fingerprint" of arbitrary data, and it is an important cornerstone of cryptographic security. The algorithm compresses a message or data into a digest, which reduces the amount of data and fixes its format: whatever the length of the plaintext, the digest produced by a given hash algorithm always has the same length.
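The fixed output length can be checked with a minimal sketch like the one below. The helper name digest_hex and the sample messages are assumptions made for illustration; MD5 is used only because it is the algorithm this article works with.

```python
from hashlib import md5

def digest_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hexadecimal string."""
    return md5(data).hexdigest()

# Inputs of very different lengths (illustrative values only)
messages = [b"", b"a", b"hello world", b"x" * 100_000]
lengths = [len(digest_hex(m)) for m in messages]

for message, length in zip(messages, lengths):
    print(len(message), "bytes in ->", length, "hex characters out")
```

MD5 produces a 128-bit digest, so every line reports 32 hexadecimal characters regardless of the input size.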

Characteristics of a hash algorithm:

(1) Fast forward computation: Given the plaintext and the hash algorithm, the hash value of plaintext of any length can be computed quickly, in limited time and with limited resources.

(2) Avalanche effect: Any change to the original input, however small, produces a very different hash value.

(3) Hard to reverse: Given a hash value, it is practically impossible with existing computing power to recover the corresponding plaintext within a limited time.

(4) Collision resistance: In general, different plaintexts do not produce the same hash value. Collisions must exist in principle, because the digest has a fixed length, but they should be infeasible to find.
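The avalanche effect in (2) can be observed directly. The sketch below is an illustration, not part of the original program: the helper differing_bits and the two sample strings are assumptions. It counts how many of the 128 bits of an MD5 digest change when one character of the input changes.

```python
from hashlib import md5

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which the MD5 digests of a and b differ."""
    digest_a = int.from_bytes(md5(a).digest(), "big")
    digest_b = int.from_bytes(md5(b).digest(), "big")
    # XOR leaves a 1 in every position where the two digests disagree
    return bin(digest_a ^ digest_b).count("1")

same = differing_bits(b"hello world", b"hello world")
changed = differing_bits(b"hello world", b"hello worle")

print("identical input:", same, "of 128 bits differ")
print("one character changed:", changed, "of 128 bits differ")
```

Identical inputs give identical digests, so the first count is zero. For a well-behaved hash function the second count is typically somewhere near half of the bits.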

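Generating a hash value in Python is simple: use the built-in hash() function or, for a digest of arbitrary data, the MD5 algorithm in the hashlib module. The program below uses MD5 to hash every file in a directory tree and derives the hash of each directory from the hashes of its children.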
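The two approaches behave quite differently, as the following sketch suggests. The sample string is an assumption, and SHA-256 is included only for comparison; it is not used elsewhere in this article.

```python
from hashlib import md5, sha256

text = "hello world"

# Built-in hash(): returns an int meant for dict and set lookups.
# For str and bytes the value is salted per interpreter process
# (unless PYTHONHASHSEED is set), so it can differ between runs.
builtin_value = hash(text)

# hashlib works on bytes and returns the same digest on every run.
md5_hex = md5(text.encode("utf-8")).hexdigest()
sha256_hex = sha256(text.encode("utf-8")).hexdigest()

print("hash():  ", builtin_value)
print("MD5:     ", md5_hex)
print("SHA-256: ", sha256_hex)
```

Because of the per-process salt, hash() is unsuitable as a stored fingerprint, while a hashlib digest can be saved and compared later. Practical collisions are known for MD5, so SHA-256 is the more common choice when the digest must resist deliberate tampering.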

from os import listdir
from os.path import isdir, join
from hashlib import md5
import sys

indent = 0 #indent initial value

class Node: #Define the node class
    def add_child(self, child):
        assert isinstance(child, Node)
        self.is_leaf = False #A node with children is no longer a leaf
        if child in self.children:
            return
        self.children.append(child)
        hashes = []
        for node in self.children:
            hashes.append(node.get_hash())
        prehash = ''.join(hashes)
        self.node_hash = md5(prehash.encode('utf-8')).hexdigest()
      
    def get_hash(self): #Return the hash value of the node
        return self.node_hash

    def generate_file_hash(self, path): #Generate the message digest of a file
        # print('{}Generating hash for {}'.format(' ' * indent * 2, path))
        file_hash = md5() #MD5 digest object, updated chunk by chunk

        if isdir(path):
            file_hash.update(''.encode('utf-8'))
        else:
            with open(path, 'rb') as f:
                for chunk in iter(lambda: f.read(4096), b''):
                    file_hash.update(chunk)
        return file_hash.hexdigest()

    def __str__(self): #Define the output format
        output = self.path + ' (' + self.get_hash() + ')' #Same format for files and directories
        child_count = 0
        for child in self.children:
            toadd = str(child)
            line_count = 0
            for line in toadd.split('\n'):
                output += '\n'
                if line_count == 0 and child_count == len(self.children) - 1:
                    output += '`-- ' + line
                elif line_count == 0 and child_count != len(self.children) - 1:
                    output += '|-- ' + line
                elif child_count != len(self.children) -1:
                    output += '|   ' + line
                else:
                    output += '    ' + line
                line_count += 1
            child_count += 1
        return output

    def __init__(self, path): #initialization
        global indent
        self.path = path
        self.children = []
        self.node_hash = self.generate_file_hash(path)
        self.is_leaf = True
        
        if not isdir(path):
            # print("{}Exiting init".format(' ' * indent * 2))
            return
        for obj in sorted(listdir(path)):
            # print("{}Adding child called {}".format(' ' * indent * 2, obj))
            indent += 1
            new_child = Node(join(path, obj))
            indent -= 1
            self.add_child(new_child)
      
if __name__ == '__main__': #main program
    tree = None
    if len(sys.argv) < 2: #If no directory is given, default to the current working directory
        tree = Node('./')
    else:
        tree = Node(sys.argv[1]) #parameter 1 is the specified directory
    print(tree) #output directory tree structure with hash value
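One use of such a tree is change detection: because each directory hash is derived from the hashes of its children, editing any file changes the hash of the root. The sketch below illustrates this with a compact function-based variant of the same scheme. The helper dir_digest and the file names are hypothetical, and the files are created in a temporary directory.

```python
import os
import tempfile
from hashlib import md5

def dir_digest(path: str) -> str:
    """Hypothetical helper: hash a file by its content and a
    directory by the concatenated hashes of its sorted children."""
    if not os.path.isdir(path):
        file_hash = md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(4096), b""):
                file_hash.update(chunk)
        return file_hash.hexdigest()
    child_hashes = [dir_digest(os.path.join(path, name))
                    for name in sorted(os.listdir(path))]
    return md5("".join(child_hashes).encode("utf-8")).hexdigest()

with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    target = os.path.join(root, "sub", "a.txt")
    with open(target, "w") as f:
        f.write("first version")
    with open(os.path.join(root, "b.txt"), "w") as f:
        f.write("unchanged")

    before = dir_digest(root)
    repeated = dir_digest(root)   # nothing changed in between

    with open(target, "w") as f:  # edit one deeply nested file
        f.write("second version")
    after = dir_digest(root)

print("root hash before edit:", before)
print("root hash after edit: ", after)
```

As in the program above, only file contents and the sorted order of entries feed into the hashes, so a rename that keeps the sort order would go unnoticed. Mixing each entry name into the parent's hash would be one way to catch that.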

The output of a run is shown below:

[Screenshot: program output, the directory tree with a hash value after each entry]
