Should a HashSet be allowed to be added to itself in Java?

davidmerrick :

According to the contract for a Set in Java, "it is not permissible for a set to contain itself as an element" (source). However, this is possible in the case of a HashSet of Objects, as demonstrated here:

Set<Object> mySet = new HashSet<>();
mySet.add(mySet);
assertThat(mySet.size(), equalTo(1));

This assertion passes, but I would expect the behavior to be to either have the resulting set be 0 or to throw an Exception. I realize the underlying implementation of a HashSet is a HashMap, but it seems like there should be an equality check before adding an element to avoid violating that contract, no?

Marco13 :

Others have already pointed out why it is questionable from a mathematical point of view, by referring to Russell's paradox.

This does not answer your question on a technical level, though.

So let's dissect this:

First, once more the relevant part from the JavaDoc of the Set interface:

Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.

Interestingly, the JavaDoc of the List interface makes a similar, although somewhat weaker, and at the same time more technical statement:

While it is permissible for lists to contain themselves as elements, extreme caution is advised: the equals and hashCode methods are no longer well defined on such a list.

And finally, the crux is in the JavaDoc of the Collection interface, which is the common ancestor of both the Set and the List interface:

Some collection operations which perform recursive traversal of the collection may fail with an exception for self-referential instances where the collection directly or indirectly contains itself. This includes the clone(), equals(), hashCode() and toString() methods. Implementations may optionally handle the self-referential scenario, however most current implementations do not do so.

(Emphasis by me)

The bold part is a hint at why the approach that you proposed in your question would not be sufficient:

it seems like there should be an equality check before adding an element to avoid violating that contract, no?

This would not help you here. The key point is that you'll always run into problems when the collection will directly or indirectly contain itself. Imagine this scenario:

Set<Object> setA = new HashSet<Object>();
Set<Object> setB = new HashSet<Object>();
setA.add(setB);
setB.add(setA);

Obviously, neither of the sets contains itself directly. But each of them contains the other - and therefore, itself indirectly. This could not be avoided by a simple referential equality check (using == in the add method).


Avoiding such an "inconsistent state" is basically impossible in practice. Of course it is possible in theory, using referential Reachability computations. In fact, the Garbage Collector basically has to do exactly that!

But it becomes impossible in practice when custom classes are involved. Imagine a class like this:

class Container {

    Set<Object> set;

    @Override 
    int hashCode() {
        return set.hashCode(); 
    }
}

And messing around with this and its set:

Set<Object> set = new HashSet<Object>();
Container container = new Container();
container.set = set;
set.add(container);

The add method of the Set basically has no way of detecting whether the object that is added there has some (indirect) reference to the set itself.

Long story short:

You cannot prevent the programmer from messing things up.

Guess you like

Origin http://10.200.1.11:23101/article/api/json?id=423639&siteId=1