Let's assume we have the following code:
import java.util.HashMap;

class WrongHashCode {
    // Mutable field used as the hash code; equals() is not overridden
    public int code = 0;

    @Override
    public int hashCode() {
        return code;
    }
}

public class Rehashing {
    public static void main(String[] args) {
        // Initial capacity is 2 and load factor is 75%
        HashMap<WrongHashCode, String> hashMap = new HashMap<>(2, 0.75f);
        WrongHashCode wrongHashCode = new WrongHashCode();
        // Put the object that is expected to be lost
        hashMap.put(wrongHashCode, "Test1");
        // Change the hash code of the same key object
        wrongHashCode.code++;
        // Resizing of hashMap should be involved here because of the load factor barrier
        hashMap.put(wrongHashCode, "Test2");
        // Always prints 2
        System.out.println("Keys count " + hashMap.keySet().size());
    }
}
So, my question is: why, after resizing hashMap (which, as far as I understand, involves rehashing keys), do we still have 2 keys in keySet instead of 1 (since the key object is the same for both existing KV pairs)?
"So, my question is why after resizing hashMap (that, as far as I understand, involves rehashing keys)"
It actually does not involve rehashing keys, at least not in the HashMap code, except in certain circumstances (see below). It involves repositioning them in the map buckets. Inside of HashMap is an Entry class which has the following fields:
final K key;
V value;
Entry<K,V> next;
int hash;
The hash field is the stored hash value for the key, calculated from its hashCode() when the put(...) call is made. This means that if you change the hash code in your object, it will not affect the entry in the HashMap unless you re-put it into the map. Of course, if you change the hash code for a key, you won't even be able to find it in the HashMap, because the key now produces a different hash than the one stored in the entry.
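To make the lookup failure concrete, here is a minimal sketch. The class and method names (MutatedKeyLookup, MutableKey, lookups) are made up for illustration, and it assumes the default HashMap settings. It records what get(...) returns before the key's hash code changes, after it changes, and after the change is undone.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class MutatedKeyLookup {
    // Hypothetical key type, mirroring WrongHashCode from the question
    static class MutableKey {
        int code = 0;

        @Override
        public int hashCode() {
            return code;
        }
        // equals() is inherited from Object, so it compares identity
    }

    // Returns what get() sees before the mutation, after it, and after undoing it
    static String[] lookups() {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey();
        map.put(key, "Test1");

        String before = map.get(key);   // hash matches the stored one
        key.code++;
        String after = map.get(key);    // hash no longer matches the stored one
        key.code--;
        String restored = map.get(key); // hash matches again

        return new String[] { before, after, restored };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(lookups())); // [Test1, null, Test1]
    }
}
```

The entry never leaves the map; it is just unreachable through get(...) while the key's current hash code differs from the hash recorded at put(...) time.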
"we still have 2 keys in keySet instead of 1 (since key object is same for both existing KV pairs)?"
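So even though you've changed the hash code of the single key object, the map holds 2 entries for it, each with a different stored hash field. The second put(...) does not match the first entry's stored hash, so it is added as a new entry instead of overwriting the old one.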
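One rough way to check that resizing reuses the stored hash is to count hashCode() invocations while the map grows. The sketch below uses made-up names (ResizeHashCalls, CountingKey, callsForPuts). It assumes an OpenJDK 8 or later HashMap; a JDK 7 VM with alternative hashing enabled could behave differently.

```java
import java.util.HashMap;
import java.util.Map;

public class ResizeHashCalls {
    // Counts every hashCode() invocation on CountingKey instances
    static int hashCodeCalls = 0;

    // Hypothetical key type with a well-behaved equals()/hashCode() pair
    static class CountingKey {
        final int id;

        CountingKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            hashCodeCalls++;
            return id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof CountingKey && ((CountingKey) o).id == id;
        }
    }

    // Inserts n distinct keys into a tiny map so that it has to resize several times
    static int callsForPuts(int n) {
        hashCodeCalls = 0;
        Map<CountingKey, Integer> map = new HashMap<>(2, 0.75f);
        for (int i = 0; i < n; i++) {
            map.put(new CountingKey(i), i);
        }
        return hashCodeCalls;
    }

    public static void main(String[] args) {
        System.out.println("hashCode() calls for 100 puts: " + callsForPuts(100));
    }
}
```

On the implementations assumed here, I would expect the count to equal the number of put(...) calls: one hashCode() call per insertion and none during the resizes in between.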
All that said, there is code inside of HashMap which may rehash the keys when a HashMap is resized; see the package-private HashMap.transfer(...) method in JDK 7 (at least). This is why the hash field above is not final. That rehashing path is only taken, however, when initHashSeedAsNeeded(...) returns true to use "alternative hashing". The following sets the threshold for the number of entries at which alt-hashing is enabled:
-Djdk.map.althashing.threshold=1
With this set on the VM, I'm actually able to get hashCode() to be called again when the resizing happens, but I'm not able to get the 2nd put(...) to be seen as an overwrite. Part of the problem is that the HashMap.hash(...) method does an XOR with the internal hashSeed, which is changed when the resizing happens, but only after the put(...) has recorded the new hash code for the incoming entry.