Why do HashSets not have a stable serialization?

tamperingbeluga :

Take a HashSet in Java. Put a string in it. Serialize it. You end up with some bytes - bytesA.

Take bytesA, deserialize it back as an Object - fromBytes.

Now reserialize fromBytes and you've got yourself another array of bytes - bytesB.

Strangely enough, these two byte arrays are not equal. One byte is different! Why? Interestingly, this does not affect TreeSet or HashMap. It does however affect LinkedHashSet.

import java.io.*;
import java.util.*;

Set<String> stringSet = new HashSet<>();
stringSet.add("aaaaaaaaaa");

// Serialize the original set
byte[] bytesA;
try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
     ObjectOutputStream out = new ObjectOutputStream(bos)) {
  out.writeObject(stringSet);
  out.flush();
  bytesA = bos.toByteArray();
}

// Deserialize it
Object fromBytes;
try (ByteArrayInputStream is = new ByteArrayInputStream(bytesA);
     ObjectInputStream ois = new ObjectInputStream(is)) {
  fromBytes = ois.readObject();
}

// Serialize the deserialized copy
byte[] bytesB;
try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
     ObjectOutputStream out = new ObjectOutputStream(bos)) {
  out.writeObject(fromBytes);
  out.flush();
  bytesB = bos.toByteArray();
}

assert Arrays.equals(bytesA, bytesB);

// Fails: array contents differ at index [43], expected: <16> but was: <2>

In case these help: xxd hex dump of bytesA

00000000: aced 0005 7372 0011 6a61 7661 2e75 7469  ....sr..java.uti
00000010: 6c2e 4861 7368 5365 74ba 4485 9596 b8b7  l.HashSet.D.....
00000020: 3403 0000 7870 770c 0000 0010 3f40 0000  4...xpw.....?@..
00000030: 0000 0001 7400 0a61 6161 6161 6161 6161  ....t..aaaaaaaaa
00000040: 6178                                     ax

xxd hex dump of bytesB

00000000: aced 0005 7372 0011 6a61 7661 2e75 7469  ....sr..java.uti
00000010: 6c2e 4861 7368 5365 74ba 4485 9596 b8b7  l.HashSet.D.....
00000020: 3403 0000 7870 770c 0000 0002 3f40 0000  4...xpw.....?@..
00000030: 0000 0001 7400 0a61 6161 6161 6161 6161  ....t..aaaaaaaaa
00000040: 6178                                     ax

The only difference is in the 3rd line, 6th column: 0010 in bytesA versus 0002 in bytesB.

I'm on Java 11.0.3.


(RESOLVED)

As per Alex R's response - what happens is that HashSet's writeObject stores the capacity, loadFactor, and size of the backing HashMap, but its readObject recalculates the capacity as:

capacity = (int)Math.min((float)size * Math.min(1.0F / loadFactor, 4.0F), 1.07374182E9F);

Other than a sanity check, it actually ignores the capacity value that was originally stored!
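To see what that formula gives for the set in the question, here is a minimal sketch that evaluates it in isolation. The class and method names are made up for illustration and are not part of the JDK; the constant 1.07374182E9F in the decompiled line is assumed to be HashMap.MAXIMUM_CAPACITY (1 << 30), which is duplicated here because it is not public.

```java
class HashSetCapacitySketch {
    // Assumed to match HashMap.MAXIMUM_CAPACITY (1 << 30), copied because that field is not public.
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Hypothetical helper that mirrors the capacity formula quoted above.
    static int recomputedCapacity(int size, float loadFactor) {
        return (int) Math.min((float) size * Math.min(1.0F / loadFactor, 4.0F),
                              (float) MAXIMUM_CAPACITY);
    }

    public static void main(String[] args) {
        // One element, default load factor 0.75: 1 * (1 / 0.75) = 1.33..., truncated to 1.
        System.out.println(recomputedCapacity(1, 0.75f));   // prints 1
        // Twelve elements at 0.75 request a capacity of 16.
        System.out.println(recomputedCapacity(12, 0.75f));  // prints 16
    }
}
```

For a one-element set the requested capacity is 1, not the 16 that was stored. The 2 that shows up in bytesB is presumably the table size the backing HashMap settles on after applying its own sizing rules when the element is re-inserted.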

Alex R :

If you create a HashSet using the no-argument constructor, it creates a backing HashMap with the default initial capacity of 16.

If you deserialize it, the capacity may be initialized to less than 16 if your set contains fewer entries. This is what happens in this case.

Take a look at the readObject implementation of HashSet to see how the size is calculated.

Printing the two byte arrays hints that this is indeed what happened:

[..., 16, ...]
[..., 2, ...]
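If you want to confirm that only the stored capacity changed and not the contents, a rough sketch like the following compares the sets with equals() and locates the first differing byte. The class and helper names are hypothetical, and Arrays.mismatch requires Java 9 or later. Whether the bytes differ at all depends on the JDK's readObject implementation, so the example deliberately does not state an expected index.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class RoundTripCheck {
    // Hypothetical helper: serialize an object to a byte array.
    static byte[] serialize(Object o) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed (and therefore flushed) before the bytes are read.
        return bos.toByteArray();
    }

    // Hypothetical helper: read a single object back from a byte array.
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Set<String> original = new HashSet<>();
        original.add("aaaaaaaaaa");

        byte[] bytesA = serialize(original);
        Object fromBytes = deserialize(bytesA);
        byte[] bytesB = serialize(fromBytes);

        // Logical equality survives the round trip even if the bytes do not match.
        System.out.println("sets equal: " + original.equals(fromBytes));
        // Arrays.mismatch returns -1 when the arrays are identical.
        System.out.println("first differing index: " + Arrays.mismatch(bytesA, bytesB));
    }
}
```

Comparing deserialized objects with equals() rather than comparing raw bytes sidesteps the issue, since the capacity is an internal detail of the backing HashMap and not part of the set's contents.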
