Table of Contents
The coolest application of hash functions: blockchain
Data item chaining: a compromise between hashing and lists, between O(1) and O(n)
The map abstract data type and its implementation
Abstract data type "map": ADT Map
Background
Hashing
- As mentioned earlier, if the data items are sorted, binary search reduces the search complexity from O(n) to O(log n)
- Hashing constructs a new data structure that reduces the complexity of the search algorithm to O(1)
- This requires more prior knowledge of where each data item is stored
Hash table
- Each storage location is called a slot, and each slot has a unique name
- Remainder method: divide the data item by the size of the hash table and use the remainder as its slot number
- Load factor: the proportion of slots occupied by data items
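The remainder method and the load factor can be sketched in a few lines. This is a minimal illustration, assuming integer data items and an 11-slot table; the function name `remainder_hash` and the sample items are made up for the example.

```python
def remainder_hash(item, table_size):
    """Map an integer item to a slot number using the remainder method."""
    return item % table_size

table_size = 11                      # assumed table size
items = [54, 26, 93, 17, 77, 31]     # hypothetical sample items
slots = [None] * table_size

for item in items:
    slots[remainder_hash(item, table_size)] = item

# Load factor = occupied slots / total slots
load_factor = sum(s is not None for s in slots) / table_size

print(slots)        # [77, None, None, None, 26, 93, 17, None, None, 31, 54]
print(load_factor)  # 6 occupied slots out of 11
```

These particular items happen to land in different slots; an item such as 44 would collide with 77 in slot 0.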
Perfect hash function
If a hash function can map each data item to a different slot, then this hash function is called a "perfect hash function"
Data items change frequently, so how can a perfect hash function be designed?
Characteristics of a good hash function
- Fewest collisions (close to perfect)
- Low computational cost (little extra overhead)
- Spreads data items evenly across the slots (saves space)
Applications
"Fingerprint function"
- Compression: the "fingerprint" computed from data of any length has a fixed length
- Ease of computation: it is easy to compute the "fingerprint" from the original data, and practically impossible to recover the original data from the fingerprint
- Sensitivity to modification: a minor change to the original data causes a large change in the "fingerprint"
- Collision resistance: given the original data and its "fingerprint", it is very difficult to find other data with the same fingerprint (a forgery)
Example
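import hashlib
# In Python 3, hashlib operates on bytes, so the inputs are bytes literals
hashlib.md5(b"hello world!").hexdigest()
hashlib.sha1(b"hello world!").hexdigest()
# You can also feed the data in pieces with the update method
m = hashlib.md5()
m.update(b"hello world!")
m.update(b"this is part #2")
m.hexdigest()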
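Two of the fingerprint properties listed above, fixed length and sensitivity to small changes, can be checked directly. This is a small sketch that uses `hashlib.sha256` as an assumed choice of function; the input strings are arbitrary.

```python
import hashlib

# Fixed length: a 1-byte input and a 10,000-byte input give digests of equal length
short_digest = hashlib.sha256(b"a").hexdigest()
long_digest = hashlib.sha256(b"a" * 10_000).hexdigest()
print(len(short_digest), len(long_digest))

# Sensitivity to modification: change one character and compare the digests
original = hashlib.sha256(b"hello world!").hexdigest()
modified = hashlib.sha256(b"hello world?").hexdigest()
differing = sum(a != b for a, b in zip(original, modified))
print(differing, "of", len(original), "hex digits differ")
```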
- Storing passwords in hashed form instead of plain text
- Detecting file tampering
- Lottery betting applications
The coolest application of hash functions: blockchain
Definition
A blockchain is a distributed database
It consists of nodes connected through a network. Each node stores all the data of the entire database, and data stored at any node is synchronized to the others
Essential characteristics
Decentralization: there is no controlling or coordinating central node. All nodes are equal, and none of them can be controlled
Proof of work: whoever contributes the most computational work gains control over modifications to the entire network
Isn't a hash easy to compute? Why is so much computation required?
Because the proof-of-work task is deliberately made hard to compute, which limits the rate at which new blocks are generated and makes synchronization across the distributed network easier
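The idea that a single hash is cheap but a hash satisfying a condition is expensive can be illustrated with a toy example. This is only a sketch under assumed rules (find a number, here called `nonce`, such that the SHA-256 digest starts with a given number of zero hex digits); it is not the protocol of any real blockchain, and `mine` is a hypothetical name.

```python
import hashlib

def mine(block_data: bytes, difficulty: int):
    """Search for a nonce whose digest starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + str(nonce).encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1  # each failed attempt costs one hash computation

nonce, digest = mine(b"toy block", difficulty=3)
print(nonce, digest)
```

Each extra zero digit multiplies the expected number of attempts by 16, while checking a proposed nonce still takes a single hash computation.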
Designing hash functions
Folding method
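As commonly described, the folding method splits the data item into pieces of equal length, adds the pieces together, and takes the remainder by the table size. A minimal sketch, assuming two-digit pieces and an 11-slot table; `folding_hash` and the sample number are made up for illustration.

```python
def folding_hash(item: str, table_size: int, piece_len: int = 2) -> int:
    """Split the digits of `item` into pieces, sum them, and take the remainder."""
    digits = [ch for ch in item if ch.isdigit()]
    pieces = ["".join(digits[i:i + piece_len])
              for i in range(0, len(digits), piece_len)]
    return sum(int(p) for p in pieces) % table_size

# 43 + 65 + 55 + 46 + 01 = 210, and 210 % 11 = 1
print(folding_hash("436-555-4601", 11))  # 1
```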
Mid-square method
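In its usual form, this method squares the data item, takes some digits from the middle of the result, and then takes the remainder by the table size. A minimal sketch, assuming two middle digits and an 11-slot table; `mid_square_hash` is a hypothetical name.

```python
def mid_square_hash(item: int, table_size: int, n_digits: int = 2) -> int:
    """Square the item, extract the middle digits, and take the remainder."""
    squared = str(item * item)
    start = max(0, (len(squared) - n_digits) // 2)
    middle = squared[start:start + n_digits]
    return int(middle) % table_size

# 44 * 44 = 1936, the middle two digits are 93, and 93 % 11 = 5
print(mid_square_hash(44, 11))  # 5
```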
Handling non-numeric items
Weighting each character by its position is a good way to tell anagrams apart, but it increases the amount of computation
Therefore, the hash function must not become a computational burden on the storage and search processes; otherwise sequential search or binary search would be the better choice.
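The anagram problem and the position-weighting fix can be seen with a small string hash. This sketch assumes that characters are converted with `ord` and that the weight is the 1-based position; both function names are made up for illustration.

```python
def simple_string_hash(text: str, table_size: int) -> int:
    """Sum of character codes: anagrams always collide."""
    return sum(ord(ch) for ch in text) % table_size

def weighted_string_hash(text: str, table_size: int) -> int:
    """Character codes weighted by position: costs one extra multiplication per character."""
    return sum(pos * ord(ch) for pos, ch in enumerate(text, start=1)) % table_size

print(simple_string_hash("cat", 11), simple_string_hash("act", 11))      # 4 4
print(weighted_string_hash("cat", 11), weighted_string_hash("act", 11))  # 3 5
```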
Hash collision resolution
Skip probing
Rehashing
Set the size of the hash table to a prime number so that the probe sequence can reach every slot and items are distributed evenly
Quadratic probing
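The probing strategies named above differ only in the order in which slots are tried after a collision. This is a minimal sketch, assuming the remainder method for the home slot, integer keys, and an 11-slot table; `probe_sequence` and `put` are hypothetical helpers.

```python
def probe_sequence(key, table_size, skip=1, quadratic=False):
    """Return the order in which slots are tried for `key`."""
    home = key % table_size
    if quadratic:
        # Quadratic probing: offsets 0, 1, 4, 9, ...
        return [(home + i * i) % table_size for i in range(table_size)]
    # Skip probing: offsets 0, skip, 2*skip, ... (skip=1 is plain linear probing)
    return [(home + i * skip) % table_size for i in range(table_size)]

def put(slots, key, **probe_opts):
    """Store `key` in the first free slot along its probe sequence."""
    for pos in probe_sequence(key, len(slots), **probe_opts):
        if slots[pos] is None or slots[pos] == key:
            slots[pos] = key
            return pos
    raise RuntimeError("no free slot found along the probe sequence")

slots = [None] * 11
for key in (54, 26, 93, 17, 77, 31, 44, 55, 20):
    put(slots, key, skip=3)
print(slots)  # [77, 55, None, 44, 26, 93, 17, 20, None, 31, 54]

print(probe_sequence(44, 11, quadratic=True)[:4])  # [0, 1, 4, 9]

# Prime size 11: a skip of 3 visits every slot. Size 12: only 4 slots are reachable.
print(len(set(probe_sequence(0, 11, skip=3))), len(set(probe_sequence(0, 12, skip=3))))  # 11 4
```

The last line shows why a prime table size helps: with size 12 and a skip of 3, the probe sequence cycles through slots 0, 3, 6, 9 and never reaches the rest.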
Data item chaining: a compromise between hashing and lists, between O(1) and O(n)
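With chaining, each slot holds a collection of items instead of a single item, so colliding items share a slot and a lookup only scans that slot's chain. This is a minimal sketch, assuming integer keys, the remainder method, and Python lists as chains; `ChainedHashTable` and the sample values are made up for illustration.

```python
class ChainedHashTable:
    def __init__(self, size=11):
        self.size = size
        self.slots = [[] for _ in range(size)]  # one chain (list) per slot

    def put(self, key, value):
        chain = self.slots[key % self.size]
        for i, (k, _) in enumerate(chain):
            if k == key:                 # key already present: replace its value
                chain[i] = (key, value)
                return
        chain.append((key, value))       # otherwise extend the chain

    def get(self, key):
        # Only the chain of the home slot is scanned
        for k, v in self.slots[key % self.size]:
            if k == key:
                return v
        return None

t = ChainedHashTable()
t.put(77, "bird")
t.put(44, "goat")
t.put(55, "pig")         # 77, 44 and 55 all hash to slot 0
print(t.slots[0])        # [(77, 'bird'), (44, 'goat'), (55, 'pig')]
print(t.get(44), t.get(20))  # goat None
```

Lookups stay close to O(1) while chains are short and degrade toward O(n) as more items pile up in the same slot.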
The map abstract data type and its implementation
Abstract data type "map": ADT Map
Code example
H = HashTable()
H[54] = "cat"
H[24] = "dog"
print(H.slots)
print(H.data)
print(H[24])  # dog
print(H[20])  # None
class HashTable:
    def __init__(self):
        self.size = 11
        self.slots = [None] * self.size
        self.data = [None] * self.size