A hash algorithm, also called a hash function, creates a digital "fingerprint" of arbitrary data through a one-way function and is an important cornerstone of cryptographic security. The algorithm compresses a message or other data into a digest, which reduces the amount of data and fixes its format: whatever the length of the plaintext, the digest produced by a given hash algorithm always has the same length.
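The fixed digest length can be checked with a minimal sketch. It assumes MD5 from Python's hashlib module (the algorithm used later in this article); the sample messages are made up for illustration.

```python
from hashlib import md5

# Sample inputs of very different lengths (illustrative values only)
messages = [b'', b'a', b'hello world', b'x' * 1_000_000]

# MD5 produces a 128-bit digest, i.e. 32 hexadecimal characters,
# regardless of how long the input is
digests = [md5(m).hexdigest() for m in messages]

for m, d in zip(messages, digests):
    print(len(m), len(d), d)
```

Every line printed shows a digest length of 32, whether the input is empty or a million bytes long.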
A hash algorithm has the following characteristics:
(1) Fast forward computation: Given the plaintext and the hash algorithm, the hash value of a plaintext of any length can be computed quickly, within limited time and with limited resources.
(2) Avalanche effect: Any change to the original input, however small, produces a very different hash value.
(3) Hard to reverse: Given a hash value, it is practically impossible with existing computing power to recover the corresponding plaintext within a limited time.
(4) Collision resistance: Because the digest has a fixed length, different plaintexts can in principle share a hash value, but for a good hash algorithm it is extremely hard to find two plaintexts that do.
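The avalanche effect in (2) can be observed with a short sketch. It assumes MD5 from hashlib; the helper name differing_bits and the two sample strings are hypothetical, introduced only for this illustration.

```python
from hashlib import md5


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which the MD5 digests of a and b differ."""
    x = int.from_bytes(md5(a).digest(), 'big')
    y = int.from_bytes(md5(b).digest(), 'big')
    return bin(x ^ y).count('1')


original = b'hash algorithm'
modified = b'hash algorithn'  # only the last character is changed

print(md5(original).hexdigest())
print(md5(modified).hexdigest())
print(differing_bits(original, modified), 'of 128 bits differ')
```

For a well-behaved hash algorithm, roughly half of the 128 bits are expected to flip even though the inputs differ in a single character.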
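Generating a hash value in Python is very simple: use either the built-in hash() function or the MD5 algorithm in the hashlib module. Note that hash() is intended for dictionary and set lookups, and for strings its result changes between interpreter runs by default, so hashlib is the appropriate choice when a stable digest is needed. The program below uses MD5 to compute a digest for every file in a directory tree and derives the hash of each directory from the hashes of its children.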
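As a minimal sketch of the two approaches, the following compares the built-in hash() with hashlib's MD5 on the same string. The sample text is an arbitrary illustration.

```python
from hashlib import md5

text = 'hello'

# Built-in hash(): returns an int. For str it is salted per process
# (unless PYTHONHASHSEED is set), so the value differs between runs.
builtin_value = hash(text)

# hashlib MD5: the input must be bytes; the digest is the same on every run
md5_value = md5(text.encode('utf-8')).hexdigest()

print(builtin_value)
print(md5_value)
```

Running the script twice prints the same MD5 digest both times, while the hash() value normally changes from one run to the next.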
from os import listdir
from os.path import isdir, join
from hashlib import md5
import sys

indent = 0  # nesting depth, used only by the commented-out debug prints


class Node:  # a node of the directory tree (a file or a directory)
    def add_child(self, child):
        assert isinstance(child, Node)
        self.is_leaf = False  # a node with children is no longer a leaf
        if child in self.children:
            return
        self.children.append(child)
        # Recompute this node's hash from the hashes of all its children
        hashes = []
        for node in self.children:
            hashes.append(node.get_hash())
        prehash = ''.join(hashes)
        self.node_hash = md5(prehash.encode('utf-8')).hexdigest()

    def get_hash(self):  # return the hash value of the node
        return self.node_hash

    def generate_file_hash(self, path):  # compute the digest of a file
        # print('{}Generating hash for {}'.format(' ' * indent * 2, path))
        file_hash = md5()
        if isdir(path):
            # A directory starts with the digest of the empty string
            file_hash.update(''.encode('utf-8'))
        else:
            # Read the file in 4096-byte chunks so large files fit in memory
            with open(path, 'rb') as f:
                for chunk in iter(lambda: f.read(4096), b''):
                    file_hash.update(chunk)
        return file_hash.hexdigest()

    def __str__(self):  # render the subtree, one "path (hash)" per line
        output = self.path + ' (' + self.get_hash() + ')'
        child_count = 0
        for child in self.children:
            toadd = str(child)
            line_count = 0
            for line in toadd.split('\n'):
                output += '\n'
                if line_count == 0 and child_count == len(self.children) - 1:
                    output += '`-- ' + line
                elif line_count == 0 and child_count != len(self.children) - 1:
                    output += '|-- ' + line
                elif child_count != len(self.children) - 1:
                    output += '|   ' + line
                else:
                    output += '    ' + line
                line_count += 1
            child_count += 1
        return output

    def __init__(self, path):  # build the node and, recursively, its children
        global indent
        self.path = path
        self.children = []
        self.node_hash = self.generate_file_hash(path)
        self.is_leaf = True
        if not isdir(path):
            # print("{}Exiting init".format(' ' * indent * 2))
            return
        for obj in sorted(listdir(path)):
            # print("{}Adding child called {}".format(' ' * indent * 2, obj))
            indent += 1
            new_child = Node(join(path, obj))
            indent -= 1
            self.add_child(new_child)


if __name__ == '__main__':  # main program
    tree = None
    if len(sys.argv) < 2:
        # If no directory is given, default to the current directory
        tree = Node('./')
    else:
        tree = Node(sys.argv[1])  # argument 1 is the directory to hash
    print(tree)  # print the directory tree with hash values
A screenshot of the program running is shown below: