Data Structures Study Notes (5): Hashing and Separate Chaining

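Hashing is a technique for performing insertions, deletions, and lookups in a table in constant average time. Operations that depend on ordering information are not supported efficiently, so operations such as findMax and findMin are not something a hash table provides.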

The core of hashing is a reliable hash function.

1. When the key is an integer:

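A generally reasonable approach is to return the key modulo the table size. To avoid the problems this can cause (for example, many keys sharing a common factor with the table size), the table size is usually chosen to be a prime number. When the keys are random integers, this hash function is simple to compute and also distributes the keys fairly evenly.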
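The effect of a prime table size can be seen in a minimal sketch. The class name IntegerHashSketch and the sample keys are assumptions made for illustration and do not come from the original notes.

```java
final class IntegerHashSketch {
    // Maps an int key to a bucket index in [0, tableSize).
    static int hash(int key, int tableSize) {
        int h = key % tableSize;   // in Java, % can be negative for negative keys
        if (h < 0) {
            h += tableSize;
        }
        return h;
    }

    public static void main(String[] args) {
        int[] keys = {10, 20, 30, 40, 50};   // hypothetical keys, all multiples of 10
        for (int key : keys) {
            // With table size 10 every key lands in bucket 0;
            // with the prime 11 each key gets its own bucket.
            System.out.println(key + " -> " + hash(key, 10)
                    + " (size 10), " + hash(key, 11) + " (size 11)");
        }
    }
}
```

The keys here were picked to share a factor with the non-prime size; random keys would behave less dramatically.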

2. When the key is a string:

One option is to add up the ASCII (or Unicode) values of the characters in the string, as in the following code:

public static int hash(String key,int tableSize){
    int hashVal = 0;
    for(int i=0;i<key.length();i++){
        hashVal += key.charAt(i);
    }
    return hashVal % tableSize;
}

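However, this method does not scale with the table size: when the table is large, the keys are not distributed well. Suppose the table size is 10007 and the keys are at most 8 characters long. Since an ASCII code is at most 127, the hash value can only fall between 0 and 1016 (127 × 8), which is clearly an uneven distribution.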
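The bound can be checked with a small sketch. It assumes keys of at most eight ASCII characters (the assumption under which 127 × 8 = 1016 is the largest possible sum); the class name AdditiveHashRange and the helper maxAsciiKey are hypothetical names introduced for illustration.

```java
import java.util.Arrays;

final class AdditiveHashRange {
    // The additive string hash from the text above.
    static int hash(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++) {
            hashVal += key.charAt(i);
        }
        return hashVal % tableSize;
    }

    // Builds the key with the largest character sum for a given length:
    // every character is ASCII code 127.
    static String maxAsciiKey(int length) {
        char[] chars = new char[length];
        Arrays.fill(chars, (char) 127);
        return new String(chars);
    }

    public static void main(String[] args) {
        int tableSize = 10007;
        // Largest index an 8-character ASCII key can reach: 127 * 8 = 1016.
        System.out.println(hash(maxAsciiKey(8), tableSize));
        // Anagrams have the same character sum, so they always collide.
        System.out.println(hash("listen", tableSize) == hash("silent", tableSize));
    }
}
```

Under this assumption, buckets 1017 through 10006 are never used, which is roughly 90% of the table.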

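Once the hash function is understood, the remaining problem is how to resolve hash collisions. There are two main families of methods: open addressing and separate chaining.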

1. Separate chaining:

import java.util.LinkedList;
import java.util.List;

/**
 * Hash table exercise.
 * Separate chaining implementation of a hash table.
 * Note that all "matching" is based on the equals method.
 * @author ************
 *
 */
public class SeparateChainingHashTableExce <AnyType>{
	
	private static final int DEFAULT_TABLE_SIZE = 101;
	private List<AnyType> [] theLists;
	private int currentSize;
	/**
	 * Construct the hash table with the default size
	 */
	public SeparateChainingHashTableExce(){
		this(DEFAULT_TABLE_SIZE);
	}
	
	/**
	 * Construct the hash table
	 * @param size approximate table size
	 */
	public SeparateChainingHashTableExce(int size){
		// One linked list (chain) per table slot
		theLists = new LinkedList[nextPrime(size)];
		for(int i = 0; i < theLists.length;i++){
			theLists[i] = new LinkedList<>();
		}
	}
	
	/**
	 * Insert into the hash table. If the item is already
	 * present, do nothing.
	 * @param x the item to insert
	 */
	public void insert(AnyType x){
		List<AnyType> whichList = theLists[myhash(x)];
		if(!whichList.contains(x)){
			whichList.add(x);
			
			// Rehash once there are more elements than lists (load factor > 1)
			if(++ currentSize > theLists.length){
				rehash();
			}
		}
	}
	
	/**
	 *Remove from the hash table
	 * @param x the item to remove 
	 */
	public void remove(AnyType x){
		List<AnyType> whichList = theLists[myhash(x)];
		if(whichList.contains(x)){
			whichList.remove(x);
			currentSize --;
		}
	}
	
	/**
	 * Find an item in the hash table
	 * @param x the item to search for
	 * @return true if x is found
	 */
	public boolean contains(AnyType x ){
		List<AnyType> whichList = theLists[myhash(x)];
		return whichList.contains(x);
	}

	/**
	 * Make the hash table logically empty
	 */
	public void makeEmpty(){
		for (int i =0 ; i < theLists.length; i++){
			theLists[i].clear();
		}
		currentSize = 0;
	}
	
	/**
	 * A hash routine for String objects
	 * @param key the String to hash
	 * @param tableSize the size of the hash table
	 * @return the hash value
	 */
	public static int hash(String key, int tableSize){
		int hashVal = 0;
		for (int i =0; i<key.length(); i++){
			hashVal = 37 * hashVal + key.charAt(i);
		}
		hashVal %= tableSize;
		if(hashVal < 0){
			hashVal += tableSize;
		}
		return hashVal;
	}
	
	private void rehash(){
		List<AnyType> [] oldLists = theLists;
		
		//Create new double-sized ,empty table
		theLists = new List[nextPrime(2 * theLists.length)];
		for(int j = 0; j< theLists.length; j++){
			theLists[j] = new LinkedList<>();
		}
		
		//Copy the table
		currentSize = 0;
		for (List<AnyType> list : oldLists) {
			for(AnyType item : list){
				insert(item);
			}
			
		}
	}
	
	/**
	 * Compute the hash value that selects the chain for an element
	 * @param x the input element
	 * @return the index of the list that corresponds to the element
	 */
	private int myhash(AnyType x){
		// Start from the element's own hash code, not from the table length
		int hashVal = x.hashCode();
		hashVal %= theLists.length;
		if(hashVal < 0){
			hashVal += theLists.length;
		}
		return hashVal;
	}
	
	/**
	 * Internal method to find a prime number at least as large as n
	 * @param n the starting number(must be positive)
	 * @return a prime number larger than or equal to n
	 */
	private static int nextPrime(int n ){
		if(n % 2 == 0){
			n++;
		}
		for(;!isPrime(n);n+=2)
			;
		return n;
	}
	
	/**
	 * Internal method to test if a number is prime
	 * Not an efficient algorithm
	 * @param n the number to test 
	 * @return the result of the test
	 */
	private static boolean isPrime(int n ){
		if(n == 2 || n == 3){
			return true;
		}
		if(n ==1 || n % 2 == 0){
			return false;
		}
		
		for (int i = 3; i * i <= n; i+=2){
			if(n % i == 0)
				return false;
		}
		return true;
	}
}

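This relies on an important property: when the load factor (the number of stored elements divided by the table size) is below 0.5, an insertion will very rarely fail. That bound matters most for open addressing; for separate chaining the usual target is a load factor of about 1. We therefore set a threshold on the load factor, and when it is reached we enlarge the table and rehash every element into the new, larger table. The code here simply compares the number of stored elements with the table size, so it rehashes once the table is "full", that is, once the load factor exceeds 1.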

	public void insert(AnyType x){
		List<AnyType> whichList = theLists[myhash(x)];
		if(!whichList.contains(x)){
			whichList.add(x);
			
			// Rehash once there are more elements than lists (load factor > 1)
			if(++ currentSize > theLists.length){
				rehash();
			}
		}
	}
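
The threshold check can be isolated in a small sketch. The class LoadFactorSketch and its two helper methods are hypothetical names introduced for illustration; only the comparison itself mirrors the check in insert above.

```java
final class LoadFactorSketch {
    // Load factor: stored elements divided by the number of lists (table size).
    static double loadFactor(int currentSize, int tableSize) {
        return (double) currentSize / tableSize;
    }

    // Same comparison as in insert(): rehash only when the element count
    // exceeds the table size, i.e. when the load factor goes above 1.
    static boolean shouldRehash(int currentSize, int tableSize) {
        return currentSize > tableSize;
    }

    public static void main(String[] args) {
        int tableSize = 101;   // DEFAULT_TABLE_SIZE in the class above
        for (int size : new int[] {50, 101, 102}) {
            System.out.println(size + " elements: load factor "
                    + loadFactor(size, tableSize)
                    + ", rehash? " + shouldRehash(size, tableSize));
        }
    }
}
```

With 101 lists, the 102nd distinct element is the first one that triggers a rehash.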

Rehashing:

	private void rehash(){
		List<AnyType> [] oldLists = theLists;
		
		//Create new double-sized ,empty table
		theLists = new List[nextPrime(2 * theLists.length)];
		for(int j = 0; j< theLists.length; j++){
			theLists[j] = new LinkedList<>();
		}
		
		//Copy the table
		currentSize = 0;
		for (List<AnyType> list : oldLists) {
			for(AnyType item : list){
				insert(item);
			}
			
		}
	}
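
Why every element has to be re-inserted, rather than copying the old lists over, can be seen in a minimal sketch: the bucket index depends on the table size, so it generally changes when the table grows. The class RehashIndexSketch and the sample hash codes are assumptions for illustration. The new size 211 is the first prime at or above 2 × 101 = 202, which is what nextPrime would return for the default table.

```java
final class RehashIndexSketch {
    // Same index computation as myhash, applied to a raw hash code.
    static int bucket(int hashCode, int tableSize) {
        int h = hashCode % tableSize;
        return h < 0 ? h + tableSize : h;
    }

    public static void main(String[] args) {
        int oldSize = 101;
        int newSize = 211;                 // first prime >= 2 * 101
        int[] hashCodes = {5, 106, 207};   // hypothetical hash codes

        for (int code : hashCodes) {
            // All three share bucket 5 in the old table
            // but land in buckets 5, 106 and 207 in the new one.
            System.out.println(code + ": old bucket " + bucket(code, oldSize)
                    + ", new bucket " + bucket(code, newSize));
        }
    }
}
```

Elements that collided in the old table can end up in different chains after rehashing, which is what shortens the chains again.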
