Data Structure: Hash Table with Open Addressing, Principle and C++ Implementation

Foreword:

        I recently wrote the Rabin-Karp algorithm, which relies on a hashing method, so I took the chance to review how a hash table is implemented before I forget it~~

Principle:

Hash table:
        A hash table is a key-value data structure: every piece of data is stored under a corresponding key. We can picture it this way: when we go to the cinema we buy a ticket, the ticket carries a seat number, and we sit in that seat to watch the film. If each person is regarded as the data (value), then the seat in the cinema is the key. Each person has one specific seat that corresponds to them, and that is the idea behind hashing.
        In a hash table implemented with open addressing, all data is stored in an array of structures, TheCells. Each element of this array is a HashEntry structure that holds the stored element, that is, our data (value), together with the state of the cell: empty, in use, or deleted. The state is needed because deletion is implemented lazily.
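The layout described above might be declared roughly as follows. This is only a sketch: the names HashEntry, Info, Element, TheCells, TableSize, CurSize, MinTableSize and Position come from the code in this article, while the enum name KindOfEntry, the concrete types and the value of MinTableSize are assumptions made for illustration.

```cpp
// Sketch only: member names follow the article, types are assumed.
typedef int Position;                              // index into TheCells (assumed type)

enum KindOfEntry { Legitimate, Empty, Deleted };   // cell states (enum name assumed)

struct HashEntry {
	int Element = 0;             // the stored data (value)
	KindOfEntry Info = Empty;    // state of this cell
};

const int MinTableSize = 5;      // assumed lower bound for the table size

class HashMap {
public:
	void Init(int tablesize);    // allocate TheCells
	Position Find(int key);      // locate key, or the cell where it would go
	void Insert(int key);
	void Remove(int key);        // lazy deletion: only marks the cell
private:
	Position Hash(int key);      // key -> initial position
	void ReHash();               // grow the table and reinsert live elements
	HashEntry *TheCells = nullptr;
	int TableSize = 0;           // number of cells in TheCells
	int CurSize = 0;             // number of cells in use
};
```

Only the declarations are shown here; the member functions are discussed one by one in the rest of the article.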


Find & Insert:
        As with any data structure, the most important operations are insertion and search. For a hash table the two are closely linked, because both first process the data to find the location it corresponds to. We know the size of the hash table, TableSize, which is the size of its structure array. We then compute a value (Element) for the data we want to store:
1. If the stored data is of type int, its value Element is the number itself;
2. If it is a string, the value is the sum of the ASCII codes of its characters;
3. For other types, you can define the value according to your own needs.
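Rules 1 and 2 could be sketched like this, already folding in the reduction to a table index by taking the remainder modulo the table size. The function names hashInt and hashString are hypothetical, and the handling of negative integers is an assumption that the article does not discuss.

```cpp
#include <string>

// Rule 1: an int is its own value; reduce it into [0, tableSize).
int hashInt(int element, int tableSize) {
	int key = element % tableSize;
	return key < 0 ? key + tableSize : key;   // assumption: fold negatives into range
}

// Rule 2: a string's value is the sum of its character codes.
int hashString(const std::string& s, int tableSize) {
	unsigned long sum = 0;
	for (unsigned char c : s)
		sum += c;
	return static_cast<int>(sum % tableSize);
}
```

With a table size of 11, hashInt maps both 3 and 14 to position 3. Note that summing character codes gives every anagram the same value, so "ab" and "ba" land in the same cell.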

        Once we have the Element of the data, where should it be stored? A simple choice is the position key = Element % TableSize, which is always within the bounds of the array. This runs into a problem, though: when TableSize is not a prime number, different elements map to the same key more easily (for example, with TableSize = 10 every multiple of 10 lands on key 0). One remedy is to make TableSize a prime number when the table is allocated. Here is the code for that part:
/* Prime test: determine whether the input value is a prime number
 * Return value: bool: true for prime numbers, false otherwise
 * Parameters: num: the int to be tested
 */
bool isPrime(int num) {
	if (num < 2) // 0, 1 and negative numbers are not prime
		return false;
	if (num == 2 || num == 3) // 2 and 3 are prime
		return true;
	if (num % 6 != 1 && num % 6 != 5) // any other remainder mod 6 means divisible by 2 or 3
		return false;
	for (int i = 5; i * i <= num; i += 6) // try divisors of the form 6k - 1 and 6k + 1 up to sqrt(num)
		if (num % i == 0 || num % (i + 2) == 0)
			return false;
	return true;
}

/* Next prime function: get the first prime number larger than the input value
 * Return value: int: the next prime number
 * Parameters: num: the value the prime must exceed
 */
int NextPrime(int num) {
	int n = num + 1;
	while (!isPrime(n)) // advance until a prime is found
	{
		n++;
	}
	return n;
}
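To see why a prime table size helps, the following sketch counts how many different cells a regular run of keys occupies under key = Element % TableSize. The helper name distinctSlots is hypothetical and only serves this illustration.

```cpp
#include <set>

// Count how many distinct slots the keys 0, step, 2*step, ... occupy
// under key = Element % tableSize.
int distinctSlots(int step, int count, int tableSize) {
	std::set<int> slots;
	for (int i = 0; i < count; i++)
		slots.insert((i * step) % tableSize);
	return static_cast<int>(slots.size());
}
```

For the ten keys 0, 10, 20, ..., 90, distinctSlots(10, 10, 10) is 1 (every key collides in cell 0), while distinctSlots(10, 10, 11) is 10 (every key gets its own cell).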

The two functions isPrime and NextPrime deal with prime numbers: the first tests whether a number is prime, the second finds the first prime larger than its input. The primality test is worth a look: it only tries divisors of the form 6k - 1 and 6k + 1, which makes it fairly efficient. Besides this, we also need to initialize the structure array TheCells. The code is as follows:
/* Build function: build a TheCells to store data
 * Return value: none
 * Parameters: tablesize: the requested size of TheCells
 */
void HashMap::Init(int tablesize) {
	if (tablesize < MinTableSize) // Check that the requested size is legal
	{
		cout << "Hash table size is too small!" << endl;
		return; // return directly
	}

	TableSize = NextPrime(tablesize); // size is the first prime larger than the input
	// plain new throws on failure; std::nothrow (from <new>) returns NULL instead
	TheCells = new (std::nothrow) HashEntry[TableSize];
	if (TheCells == NULL) // Check if the allocation succeeded
	{
		cout << "Failed to request memory!" << endl;
		return; // do not touch the array if the allocation failed
	}

	for (int i = 0; i < TableSize; i++) // initialize storage cell state
		TheCells[i].Info = Empty;
}

        Once the TheCells array has been allocated we can start searching and inserting, but there is still one problem to solve. A prime TableSize removes a large share of key collisions, but not all of them. For example, when we insert 3 and 14 into a hash table with a TableSize of 11, both get the key 3, and one cell obviously cannot store two elements at the same time. So what should we do? A simple idea is that when TheCells[key] is already occupied we try TheCells[key + 1], and so on until an empty cell is found, and store the element there. This is not a very good method, though: if 3, 4, 5 and 6 are already stored in the table, storing 14 has to step through keys 3, 4, 5 and 6 before it reaches the free position key = 7, which is clearly inefficient.
        So we use another method. Let H_i(x) be the key tried for x on the i-th attempt, with H_0(x) = Element % TableSize, and let H_i(x) = (H_0(x) + i^2) % TableSize, where i is the number of collisions so far. In the previous case, inserting 14 first checks key = 3, then key = 4 (3 + 1), and finally key = 7 (3 + 4), where the element is inserted. That takes three probes instead of five.
        The same applies to search. If the element stored at the current key differs from the one we are looking for, we move on to the next key, until we either find the element or reach an empty cell. The code is as follows:
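The example with 3, 4, 5 and 6 already stored can be checked with a small sketch that counts how many cells are examined before a free one is found. The names probesUntilEmpty and exampleTable are hypothetical, and a plain vector of flags stands in for TheCells.

```cpp
#include <vector>

// Number of cells examined before an empty one is found, starting at `home`.
// quadratic == false: offsets 0, 1, 2, 3, ...   (try the next cell each time)
// quadratic == true : offsets 0, 1, 4, 9, ...   (offset i^2 on the i-th retry)
int probesUntilEmpty(const std::vector<bool>& occupied, int home, bool quadratic) {
	int size = static_cast<int>(occupied.size());
	for (int i = 0; i < size; i++) {
		int offset = quadratic ? i * i : i;
		if (!occupied[(home + offset) % size])
			return i + 1;
	}
	return -1; // no empty cell reached within `size` probes
}

// Table of size 11 with cells 3, 4, 5 and 6 occupied.
std::vector<bool> exampleTable() {
	std::vector<bool> t(11, false);
	for (int i = 3; i <= 6; i++)
		t[i] = true;
	return t;
}
```

Starting from 14 % 11 = 3, the step-by-one strategy examines cells 3, 4, 5, 6 and 7 (five probes), while the squared offsets examine cells 3, 4 and 7 (three probes).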
/* Find function: find the given element and return its position; if the element does not exist, return the position where it can be inserted (quadratic probing)
 * Return value: Position: index of the element, or of the Empty cell that ended the search
 * Parameters: key: the element you want to find
 */
Position HashMap::Find(int key) {
	Position CurPos; // current probe position
	int CollisionNum = 0; // number of collisions so far, the i in F(i)

	CurPos = Hash(key); // Get the initial position of the element
	// Probe until the element or an Empty cell is reached. Deleted cells are passed
	// over, otherwise lazy deletion would hide elements stored further along the chain.
	// This assumes an Empty cell is reachable, i.e. the table is at most half occupied.
	while (TheCells[CurPos].Info != Empty && TheCells[CurPos].Element != key)
	{
		CurPos += 2 * ++CollisionNum - 1; // next position: H(i) = H(0) + F(i), F(i) = i^2, F(i) - F(i-1) = 2i - 1
		if (CurPos >= TableSize)
			CurPos -= TableSize;
	}

	return CurPos; // return the corresponding position
}
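The loop in Find never computes i^2 directly; it adds 2 * CollisionNum - 1 to the previous position. A short derivation of why the two agree, writing H_i for the i-th probe position and M for TableSize (M is shorthand introduced here):

```latex
\begin{aligned}
H_i &= (H_0 + i^2) \bmod M \\
H_i - H_{i-1} &\equiv i^2 - (i-1)^2 = 2i - 1 \pmod{M}
\end{aligned}
```

As long as i stays at or below M/2, the step 2i - 1 is smaller than M, so a single subtraction of TableSize is enough to wrap the position back into range.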

        Insertion then becomes very simple: we find the position where the element belongs, store it there and update the cell state:
/* Insert function: insert the given element (quadratic probing)
 * Return value: none
 * Parameters: key: the element you want to insert
 */
void HashMap::Insert(int key) {
	Position Pos; // store the insertion position

	Pos = Find(key); // Get the insertion position
	if (TheCells[Pos].Info != Legitimate) // Insert only if the key is not already stored
	{
		TheCells[Pos].Info = Legitimate; // Change storage cell state
		TheCells[Pos].Element = key;
		CurSize++; // one more cell in use
	}
	if (CurSize > TableSize / 2) // Rehash once more than half of the cells are in use
		ReHash();
}


Delete operation:

        Deletion is just as simple. We only need to find the cell holding the element and change its state to Deleted. The stored value itself does not have to be touched. This is lazy deletion, and it saves us a lot of work:
/* Delete function: delete the given element
 * Return value: none
 * Parameters: key: the element you want to delete
 */
void HashMap::Remove(int key) {
	Position Pos; // position of the element

	Pos = Find(key); // find the element position
	if (TheCells[Pos].Info == Legitimate) // Check that the element exists
	{
		TheCells[Pos].Info = Deleted; // mark only; the stored value is left in place
		CurSize--; // one fewer cell in use
	}
}
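Why mark a cell instead of clearing it? The self-contained sketch below, with hypothetical names TinyTable, Slot and SlotState, shows the effect on a collision chain: a search walks past Deleted cells and stops only at a cell that was never used. It is a fixed-size illustration without rehashing, and it assumes non-stop use keeps the table less than half full.

```cpp
#include <vector>

enum class SlotState { Empty, Legitimate, Deleted };

struct Slot {
	int element = 0;
	SlotState info = SlotState::Empty;
};

// Fixed-size table with quadratic probing, for illustration only.
// Assumption: callers keep it less than half full, so an Empty slot is reachable.
class TinyTable {
public:
	explicit TinyTable(int size) : slots(size) {}

	// Probe until the key or a never-used (Empty) slot is reached.
	// Deleted slots do not stop the search.
	int find(int key) const {
		int size = static_cast<int>(slots.size());
		int pos = ((key % size) + size) % size;
		int i = 0;
		while (slots[pos].info != SlotState::Empty && slots[pos].element != key)
			pos = (pos + 2 * ++i - 1) % size;
		return pos;
	}

	void insert(int key) {
		int pos = find(key);
		slots[pos].element = key;
		slots[pos].info = SlotState::Legitimate;
	}

	void remove(int key) {
		int pos = find(key);
		if (slots[pos].info == SlotState::Legitimate)
			slots[pos].info = SlotState::Deleted;   // mark only; the value stays
	}

	bool contains(int key) const {
		return slots[find(key)].info == SlotState::Legitimate;
	}

private:
	std::vector<Slot> slots;
};
```

In a table of size 11, inserting 3 and then 14 puts 14 in cell 4, because cell 3 is taken. After remove(3), contains(14) is still true: the Deleted mark in cell 3 keeps the chain intact. Had cell 3 been reset to Empty, the search for 14 would have stopped there and missed it.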


Rehashing:

        Careful readers may have noticed the call to ReHash() at the end of the insertion operation. This is an important function of our hash table, called rehashing: once more than half of the cells in the table are in use, it enlarges the table's storage. We do this because insertion and search become very inefficient once the used space exceeds half of the table (with quadratic probing and a prime table size, a free cell is only guaranteed to be found while the table is at least half empty). So we allocate a new, larger TheCells, insert the original elements into it, and delete the old one. The new TableSize is the first prime larger than twice the original.
        Here is the implementation code:
/* Re-hash function: re-hash the HashMap
 * Return value: none
 * Parameters: none
 */
void HashMap::ReHash() {
	int oldSize = TableSize; // store the old HashMap size
	HashEntry *oldCells = TheCells; // store old TheCells

	Init(2 * TableSize); // Build new TheCells
	CurSize = 0;
	for (int i = 0; i < oldSize; i++) // reinsert old table elements
		if (oldCells[i].Info == Legitimate)
			Insert(oldCells[i].Element);

	delete[] oldCells; // memory from new[] must be released with delete[]
	oldCells = NULL;
}
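The same idea can be sketched without manual memory management by building the larger table in a std::vector. This swaps in a different technique from the article's new[] and delete[], and the names Cell, CellState, place, rehash, isPrimeSketch and nextPrimeSketch are all hypothetical. Non-negative keys are assumed.

```cpp
#include <vector>

enum class CellState { Empty, Legitimate, Deleted };

struct Cell {
	int element = 0;
	CellState info = CellState::Empty;
};

// Simple trial division; enough for a sketch.
bool isPrimeSketch(int n) {
	if (n < 2) return false;
	for (int i = 2; i * i <= n; i++)
		if (n % i == 0) return false;
	return true;
}

int nextPrimeSketch(int n) {
	int p = n + 1;
	while (!isPrimeSketch(p)) p++;
	return p;
}

// Place key by quadratic probing.
// Assumptions: key >= 0 and the table is less than half full.
void place(std::vector<Cell>& cells, int key) {
	int size = static_cast<int>(cells.size());
	int pos = key % size;
	int i = 0;
	while (cells[pos].info != CellState::Empty && cells[pos].element != key)
		pos = (pos + 2 * ++i - 1) % size;
	cells[pos].element = key;
	cells[pos].info = CellState::Legitimate;
}

// Build a table of the first prime size above twice the old size
// and reinsert only the live elements; Deleted cells are dropped.
std::vector<Cell> rehash(const std::vector<Cell>& old) {
	std::vector<Cell> bigger(nextPrimeSketch(2 * static_cast<int>(old.size())));
	for (const Cell& c : old)
		if (c.info == CellState::Legitimate)
			place(bigger, c.element);
	return bigger;
}
```

Rehashing a table of size 11 gives one of size 23. Elements that collided before, such as 14 and 25 (both 3 modulo 11), now sit in their own cells 14 and 2, and cells marked Deleted do not carry over.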


That concludes this review of the hash table. Discussion is welcome~~

Reference: "Data Structures and Algorithm Analysis in C"

~~Please credit the source when reposting
