Data Structure Learning: Hash Tables, Closed Hashing and Open Hashing

Preface

       In sequential structures and in balanced trees, there is no direct relationship between an element's key and its storage position, so searching for an element requires many key comparisons. Sequential search has time complexity O(N); in a balanced tree the cost is the height of the tree, i.e. O(log₂N). In both cases the search efficiency depends on the number of key comparisons performed during the search.
       Ideal search: retrieve the element directly from the table on the first access, without any comparisons. If we can construct a storage structure in which some function (hashFunc) establishes a one-to-one mapping between an element's key and its storage position, then the element can be located quickly through that function.

When using such a structure:
Inserting an element
Compute the element's storage position from its key with the hash function, and store the element at that position.
Searching for an element
Apply the same computation to the element's key, take the resulting function value as the storage position, and compare the element at that position in the structure; if the keys are equal, the search succeeds.
         This approach is the hash (hashing) method, the conversion function used in it is called the hash function, and the structure built this way is called a hash table (Hash Table).
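As a quick illustration (a sketch with assumed values, not from the original post), a common hash function is the division method, hashFunc(key) = key % capacity, which turns a key directly into an index:

#include <iostream>
#include <vector>
using namespace std;

int main()
{
	// a tiny direct-mapped table using hashFunc(key) = key % capacity
	const size_t capacity = 10;
	vector<int> table(capacity, -1);          // -1 marks an empty slot

	int keys[] = { 7, 23, 48 };
	for (int key : keys)
		table[key % capacity] = key;          // insert: one computation, no key comparisons

	cout << table[23 % capacity] << endl;     // search: 23 is found at index 3 directly
	return 0;
}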

Hash collision

For two data elements with keys ki and kj (i != j), we have ki != kj but Hash(ki) == Hash(kj). In other words, different keys produce the same hash address through the same hash function; this phenomenon is called a hash collision (or hash conflict). For example, with hashFunc(key) = key % 10, the keys 5 and 15 both map to address 5. Data elements that have different keys but the same hash address are called "synonyms". How should hash collisions be handled?

Hash conflict resolution

Two common ways to resolve hash collisions are closed hashing and open hashing.

Implementing closed hashing

Closed hashing, also called open addressing: when a hash collision occurs, as long as the hash table is not full there must still be empty positions in it, so the key can be stored in the "next" empty position after the conflicting one. How do we find that next empty position?
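The implementation below answers this with linear probing: starting from the original hash address, it checks the following slots one at a time, wrapping around at the end of the table, until it reaches an empty (or deleted) slot. A minimal sketch of that probe step (mirroring the ++_hashaddr / wrap-around logic in the insert and find code below):

#include <cstddef>

// advance to the next slot, wrapping around at the end of the table
size_t next_probe(size_t addr, size_t table_size)
{
	return (addr + 1) % table_size;
}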

Class member variables:

private:
	vector<elem> _ht;    // the table itself
	int _size;           // number of valid (exist) elements
/////////////////////////////////////////

// state of a slot in the table
enum state{
	empty,     // slot has never held an element
	exist,     // slot holds a valid element
	deleted    // slot held an element that was erased (tombstone)
};

// one slot: a key/value pair plus the slot's state
typedef struct elem{
	pair<K, V> val;
	state sta;
}elem;

//1. Closed hashing (open addressing)
#include<map>
#include<vector>
#include<utility>
#include<iostream>
using namespace std;



const int PRIMECOUNT = 28;
const size_t primeList[PRIMECOUNT] =
{
	53ul, 97ul, 193ul, 389ul, 769ul,
	1543ul, 3079ul, 6151ul, 12289ul, 24593ul,
	49157ul, 98317ul, 196613ul, 393241ul, 786433ul,
	1572869ul, 3145739ul, 6291469ul, 12582917ul, 25165843ul,
	50331653ul, 100663319ul, 201326611ul, 402653189ul, 805306457ul,
	1610612741ul, 3221225473ul, 4294967291ul
};

enum  state{
	empty,
	exist,
	deleted

};


template<class K,class V>

class hash_1{

typedef struct elem{

		pair<K, V> val;
		state sta;

	}elem;


public:

	hash_1(int n = 3)
		:_size(0)
		, _ht(n)
	{
		for (int i = 0; i < _ht.capacity(); ++i){
		
			_ht[i].sta = empty;
		}
	
	}
	
	bool insert(const K& key){
	
		check_capacity();

		size_t _hashaddr = hash_func(key);
		pair<K, V> _val = { key, _hashaddr };     //build the pair to insert (the value stored here is just the hash address)
		size_t _start = _hashaddr;
		while (_ht[_hashaddr].sta == exist)
		{
			if (_ht[_hashaddr].sta == exist && _ht[_hashaddr].val.first == key)
				return false;

			_hashaddr++;
			if (_hashaddr == _ht.capacity())
				_hashaddr = 0;              //wrap around to the start of the table
		
			if (_hashaddr == _start)        //probed every slot: give up
				return false;
		    	
		
		}
	

	//slot is empty (or marked deleted): insert here
		_ht[_hashaddr].val = _val;
		_ht[_hashaddr].sta = exist;
		++_size;
		return true;
	}

	void check_capacity(){
	 
		//grow the table when the load factor α >= 0.7
		if (_size * 10 / _ht.capacity() >= 7){
			hash_1<K, V>newht(getnextprime(_ht.capacity()));
             
			for (int i = 0; i < _ht.capacity(); ++i){
				if (_ht[i].sta == exist){
				
					newht.insert(_ht[i].val.first);
				}
				
			
			}

			Swap(newht);
		
		}
	
	
	}

	size_t getnextprime(size_t n){
	   
		for (int i = 0; i < PRIMECOUNT; ++i){
			if (primeList[i] > n)
				return primeList[i];
		}
		
		return primeList[PRIMECOUNT - 1];   //fall back to the largest prime in the table
	
	}

	



	int find(const K& key){
	//if the key is found, return its index; otherwise print "not found" and return -1
		int _hashaddr = hash_func(key);
		size_t start = _hashaddr;
	
		while ( _ht[_hashaddr].sta != empty ){
		
			if (_ht[_hashaddr].sta == exist && _ht[_hashaddr].val.first == key)
				return _hashaddr;
		
			++_hashaddr;

			if (_hashaddr == _ht.capacity()){
				_hashaddr = 0;
			}
			if (_hashaddr == start)
			{
				cout << "不存在" << endl;
				return -1;
			}
		
		
		}
		cout << "不存在" << endl;
		return -1;
	}

	void erase(const K& key){
		
		int index = find(key);
		if (index != -1){

			_ht[index].sta = deleted;   //lazy deletion: mark the slot as a tombstone
			--_size;

		}
        
		return;
	
	}
	void Swap(hash_1<K, V>& ht){
		swap(_ht, ht._ht);
		swap(_size, ht._size);
		

	}


private:	
 size_t hash_func(const K& key){
		//division method: assumes an integral key type
		return key % _ht.capacity();
	}


private:
	vector<elem> _ht;
	int _size;
	
};
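A minimal usage sketch for the hash_1 class above (assuming it is compiled together with the listing, with integer keys). It also shows why erase marks a slot deleted rather than empty: a key that originally probed past the erased slot can still be found.

int main()
{
	hash_1<int, int> ht;            // starts with 3 slots, grows once the load factor reaches 0.7

	ht.insert(5);                   // 5 % 3 == 2: stored in slot 2
	ht.insert(8);                   // 8 % 3 == 2: collides, probes on and lands in slot 0

	ht.erase(5);                    // slot 2 becomes 'deleted', not 'empty'

	cout << ht.find(8) << endl;     // prints 0: probing walks past the tombstone and finds 8
	return 0;
}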

///////////////////////////////////////// gorgeous dividing line /////////////////////////////////////////

Implementing open hashing

Open hashing is also called the chaining method (open chaining): first, hash addresses are computed for the set of keys with the hash function; keys with the same address belong to the same subset, and each subset is called a bucket. The elements of each bucket are linked together in a singly linked list, and the head node of each list is stored in the hash table. For example, with 53 buckets, the keys 1 and 54 both land in bucket 1 and are chained there.
Class member variables:

private:
		vector<Node *> _ht;        // one list head per bucket
		size_t _size;              // number of elements stored
///////////////////////////////////////////////

template<class V>

struct node{

	node(const V& data)
		: _val(data)
		, _pnext(nullptr)
	{}

	V _val;               // the stored data
	node<V>* _pnext;      // next node in the same bucket

};

    typedef node<V> Node;
	typedef Node* PNode;


//Hashing -- open hashing -- hash buckets

//Implementation of open hashing

template<class V>

struct node{

	node(const V& data)
		: _val(data)
		, _pnext(nullptr)
	{}

	V _val;               // the stored data
	node<V>* _pnext;      // next node in the same bucket

};
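The HashBucket class below relies on two helpers that never appear in the post: the default hash functor DefHashF and GetNextPrime (presumably the same prime lookup as getnextprime in the closed-hash listing). A minimal sketch of both, reusing the primeList table defined above (these are assumptions, not the author's code):

// hypothetical default hash functor: converts the key to size_t;
// HashFunc() then takes the result modulo the number of buckets
template<class V>
struct DefHashF
{
	size_t operator()(const V& val) const
	{
		return static_cast<size_t>(val);   // adequate for integral keys
	}
};

// assumed to behave like getnextprime in the closed-hash class:
// return the first prime in primeList greater than n
size_t GetNextPrime(size_t n)
{
	for (int i = 0; i < PRIMECOUNT; ++i){
		if (primeList[i] > n)
			return primeList[i];
	}
	return primeList[PRIMECOUNT - 1];
}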



template<class V, class HF = DefHashF<V> >
class HashBucket
{
	typedef node<V> Node;
	typedef Node* PNode;

public:
	//constructor
	HashBucket(size_t capacity = 3)
		: _size(0)
	{
		_ht.resize(GetNextPrime(capacity), nullptr);
	}

	//insert an element into its hash bucket ----- elements in the hash buckets must be unique
	PNode Insert(const V& data)
	{
		// check whether the table needs to grow
		// _CheckCapacity();

		// 1. compute the bucket number of this element
		size_t bucketNo = HashFunc(data);

		// 2. check whether the element is already in that bucket
		PNode pCur = _ht[bucketNo];
		while (pCur)
		{
			if (pCur->_val == data)
				return pCur;

			pCur = pCur->_pnext;
		}

		// 3. insert the new element
		pCur = new Node(data);

		// insert at the head of the list, which is most efficient
		pCur->_pnext = _ht[bucketNo];
		_ht[bucketNo] = pCur;
		_size++;

		return pCur;
	}


	// remove the element equal to data from its bucket (data does not repeat);
	// return the node that follows the removed one
	PNode Erase(const V& data)
	{
		size_t bucketNo = HashFunc(data);
		PNode pCur = _ht[bucketNo];
		PNode pPrev = nullptr;

		while (pCur != nullptr && pCur->_val != data)
		{
			pPrev = pCur;
			pCur = pCur->_pnext;
		}

		//the node to delete does not exist
		if (pCur == nullptr)
			return nullptr;

		PNode pNext = pCur->_pnext;
		//deleting the head node of the bucket
		if (pPrev == nullptr)
			_ht[bucketNo] = pNext;
		//the data exists and is not the first element
		else
			pPrev->_pnext = pNext;

		delete pCur;    //free the removed node
		--_size;
		return pNext;
	}

	// check whether data is in the hash buckets
	PNode Find(const V& data)
	{
		size_t bucketNo = HashFunc(data);
		PNode pCur = _ht[bucketNo];

		while (pCur)
		{
			if (pCur->_val == data)
				return pCur;

			pCur = pCur->_pnext;
		}

		return nullptr;
	}

	size_t Size()const
	{
		return _size;
	}

	bool Empty()const
	{
		return 0 == _size;
	}

	void Clear()
	{
		for (size_t bucketNo = 0; bucketNo < _ht.size(); ++bucketNo)
		{
			PNode pCur = _ht[bucketNo];
			while (pCur)
			{
				PNode pNext = pCur->_pnext;
				delete pCur;
				pCur = pNext;
			}
			_ht[bucketNo] = nullptr;   //reset the bucket head
		}

		_size = 0;
	}

	size_t BucketCount()const
	{
		return _ht.size();
	}


	
	void Swap(HashBucket<V, HF>& ht)
	{
		swap(_ht, ht._ht);
		swap(_size, ht._size);
	}

	~HashBucket()
	{
		Clear();
	}


	/*The number of buckets is fixed, while elements keep being inserted, so the number of
		elements in each bucket keeps growing. In the extreme case one bucket's list can become
		very long, which hurts the performance of the hash table, so under certain conditions the
		table needs to grow. How should that condition be chosen? The best case for open hashing
		is exactly one node per bucket; once the number of elements equals the number of buckets,
		every further insertion is guaranteed to cause a hash collision, so that is a good point
		to grow the table.*/

	void _CheckCapacity(){

		size_t bucketCount = BucketCount();

		if (_size == bucketCount){

			//grow: build a new table with the next prime number of buckets
			HashBucket<V, HF> newHt(GetNextPrime(bucketCount));

			for (size_t j = 0; j < _ht.size(); ++j){

				PNode cur = _ht[j];
				while (cur){

					//take the first node of old bucket j
					_ht[j] = cur->_pnext;                  //remove it from the head
					//compute this node's bucket number in the new table
					size_t hashNo = newHt.HashFunc(cur->_val);
					//insert the node at the head of the new bucket
					cur->_pnext = newHt._ht[hashNo];
					newHt._ht[hashNo] = cur;
					//continue with the next node of old bucket j
					cur = _ht[j];
				}
			}

			newHt._size = _size;
			this->Swap(newHt);
		}
	}

	


	private:
		size_t HashFunc(const V& data)
		{
			return HF()(data) % _ht.size();
		}

	private:
		vector<Node *> _ht;                    
		size_t _size;                             // number of valid elements in the hash table

	};
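And a minimal usage sketch for the bucket table (again assuming the listing above is compiled together with the DefHashF/GetNextPrime sketch, with integer data):

int main()
{
	HashBucket<int> hb;                 // starts with GetNextPrime(3) = 53 buckets

	hb.Insert(1);
	hb.Insert(54);                      // 54 % 53 == 1: same bucket as 1, chained at the head
	hb.Insert(30);

	cout << hb.Size() << endl;                  // 3
	cout << (hb.Find(54) != nullptr) << endl;   // 1: found in its bucket

	hb.Erase(1);
	cout << (hb.Find(1) != nullptr) << endl;    // 0: removed (and its node freed)
	return 0;
}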

Open hashing vs. closed hashing:
Handling collisions with the chaining method requires an extra link pointer per node, which seems to add storage overhead. In fact, because open addressing must keep a large amount of free space to guarantee search efficiency (quadratic probing, for example, requires a load factor α <= 0.7), and a table entry usually takes far more space than a pointer, the chaining method actually uses less storage than open addressing.
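As a rough back-of-the-envelope illustration (assumed sizes, not figures from the post): with 64-byte table entries, 8-byte pointers and a load factor of 0.7, open addressing keeps about 64 / 0.7 ≈ 91 bytes of slot space per stored element, while chaining needs about 64 + 8 (next pointer) + 8 (bucket-head pointer) = 80 bytes per element.

#include <cstdio>

int main()
{
	const double entrySize = 64.0, ptrSize = 8.0, alpha = 0.7;

	// open addressing: 1/alpha slots per stored element, each holding a full entry
	double openAddressing = entrySize / alpha;

	// chaining: one entry plus one next-pointer per element,
	// plus roughly one bucket-head pointer per element
	double chaining = entrySize + ptrSize + ptrSize;

	printf("open addressing: %.1f bytes per element\n", openAddressing);   // ~91.4
	printf("chaining:        %.1f bytes per element\n", chaining);         // 80.0
	return 0;
}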


Origin blog.csdn.net/tonglin12138/article/details/92777272