Correct implementation for property of all objects that are equal

scristalli :

The problem

Consider an implementation of a graph, SampleGraph<N>. Consider an implementation of the graph nodes, Node extends N, correctly overriding hashCode and equals to mirror logical equality between two nodes.

Now, let's say we want to add some property p to a node. Such a property is bound to the logical node rather than to a particular instance, i.e. for Node n1, n2, n1.equals(n2) implies p(n1) = p(n2).

If I simply add the property as a field of the Node class, this is what happens:

  • I define Node n1, n2 such that n1.equals(n2) but n1 != n2
  • I add n1 and n2 to a graph: n1 when inserting the logical node, and n2 when referring to the node during insertion of edges. The graph stores both instances.
  • Later, I retrieve the node from the graph (n1 is returned) and set the property p on it to some value. Later, I traverse all the edges of the graph, and retrieve the node from one of them (n2 is returned). The property p is not set, causing a logical error in my model.

To summarize, current behavior:

graph.addNode(n1) // n1 is added
graph.addEdge(n2,nOther) // graph stores n2
graph.queryForNode({query}) // n1 is returned
graph.queryForEdge({query}).sourceNode() // n2 is returned
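
The behavior above can be reproduced with a minimal sketch. LabeledNode, NaiveGraph and FieldPropertyDemo are hypothetical names made up for illustration, not the classes of the real library; the only assumption is that the graph keeps whatever instance it is handed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node: equality depends on the id only; label plays the role of property p.
final class LabeledNode {
    final String id;
    String label; // property p stored as a plain field

    LabeledNode(String id) { this.id = id; }

    @Override public boolean equals(Object o) {
        return o instanceof LabeledNode && ((LabeledNode) o).id.equals(id);
    }

    @Override public int hashCode() { return id.hashCode(); }
}

// Hypothetical graph that stores every instance it receives, without canonicalizing.
final class NaiveGraph {
    final List<LabeledNode> nodes = new ArrayList<>();
    final List<LabeledNode[]> edges = new ArrayList<>();

    void addNode(LabeledNode n) { nodes.add(n); }

    void addEdge(LabeledNode from, LabeledNode to) {
        edges.add(new LabeledNode[] {from, to});
    }
}

final class FieldPropertyDemo {
    // Sets p through the node list, then reads it back through an edge.
    static String labelSeenThroughEdge() {
        NaiveGraph graph = new NaiveGraph();
        LabeledNode n1 = new LabeledNode("a");
        LabeledNode n2 = new LabeledNode("a"); // n1.equals(n2) but n1 != n2
        graph.addNode(n1);
        graph.addEdge(n2, new LabeledNode("b"));
        graph.nodes.get(0).label = "visited"; // sets p on n1
        return graph.edges.get(0)[0].label;   // reads p from n2
    }

    public static void main(String[] args) {
        System.out.println(labelSeenThroughEdge()); // prints null: p was set on n1 only
    }
}
```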

The question

All the following statements seem reasonable to me. None of them fully convinces me over the others, so I'm looking for best practice guidelines based on software engineering canons.

S1 - The graph implementation is poor. Upon adding a node, the graph should always check internally whether it already stores an instance of the same node (equals evaluates to true). If so, that instance should be the only reference the graph ever uses.

graph.addNode(n1) // n1 is added
graph.addEdge(n2,nOther) // graph internally checks that n2.equals(n1), doesn't store n2
graph.queryForNode({query}) // n1 is returned
graph.queryForEdge({query}).sourceNode() // n1 is returned
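
One way S1 could look inside a graph is sketched below. CanonicalizingGraph and its methods are hypothetical and only illustrate the idea; a real graph library may be organized quite differently. The sketch assumes that hashCode and equals of N do not change while a node is stored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical graph that keeps exactly one instance per logical node.
final class CanonicalizingGraph<N> {
    // Maps each logical node to the single instance the graph holds for it.
    private final Map<N, N> canonical = new HashMap<>();
    private final List<List<N>> edges = new ArrayList<>();

    // Returns the instance already held for an equal node, or stores and returns this one.
    N addNode(N node) {
        N existing = canonical.putIfAbsent(node, node);
        return existing != null ? existing : node;
    }

    // Both endpoints are replaced by their canonical instances before the edge is stored.
    void addEdge(N from, N to) {
        edges.add(Arrays.asList(addNode(from), addNode(to)));
    }

    N edgeSource(int edgeIndex) {
        return edges.get(edgeIndex).get(0);
    }
}
```

With this design, calling addNode(n1) and then addEdge(n2, nOther) leaves n1 as the edge's source, at the cost of one extra map lookup per endpoint.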

S2 - Assuming the graph behaves as in S1 is a mistake. The programmer should make sure that the same instance of a node is always passed to the graph.

graph.addNode(n1) // n1 is added
graph.addEdge(n1,nOther) // the programmer uses n1 every time he refers to the node
graph.queryForNode({query}) // n1 is returned
graph.queryForEdge({query}).sourceNode() // n1 is returned
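
If S2 is the contract, the caller needs some way to get hold of the one shared instance. A minimal sketch of such a helper follows; NodeInterner is a hypothetical name and not part of any library mentioned here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical caller-side helper: hands out one shared instance per logical node.
final class NodeInterner<N> {
    private final Map<N, N> pool = new HashMap<>();

    // Returns the first instance seen that is equal to the candidate.
    N intern(N candidate) {
        return pool.computeIfAbsent(candidate, key -> key);
    }
}
```

The caller would then write graph.addNode(interner.intern(n1)) and graph.addEdge(interner.intern(n2), ...), so the graph only ever sees a single instance per logical node.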

S3 - The property is not implemented the right way. It should be a piece of information external to the class Node. A collection such as a HashMap<N, Property> would work just fine, treating different instances as the same key based on hashCode and equals.

HashMap<N, Property> properties;

graph.addNode(n1) // n1 is added
graph.addEdge(n2,nOther) // graph stores n2
graph.queryForNode({query}) // n1 is returned
graph.queryForEdge({query}).sourceNode() // n2 is returned

// get the property. Difference in instances does not matter
properties.get(n1)
properties.get(n2) //same property is returned
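
A slightly fuller sketch of S3 is given below. NodeProperties is a hypothetical wrapper around the map; it assumes that hashCode and equals of N do not depend on state that changes while the node is used as a key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical side table for one property, keyed by logical node equality.
final class NodeProperties<N, P> {
    private final Map<N, P> values = new HashMap<>();

    void set(N node, P value) {
        values.put(node, value);
    }

    // Returns null when no value has been set for any equal node.
    P get(N node) {
        return values.get(node);
    }
}
```

Setting the property through n1 and reading it through n2 returns the same value, because the lookup goes through hashCode and equals rather than reference identity. Note that the map keeps its keys reachable, so entries have to be removed explicitly when a node leaves the graph.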

S4 - Same as S3, but we could hide the implementation inside Node, this way:

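import java.util.HashMap;
import java.util.Map;

class Node {
  // Shared side table, keyed by logical equality (hashCode/equals) rather than by instance
  private static final Map<Node, Property> properties = new HashMap<>();

  public Property getProperty() {
    return properties.get(this);
  }
}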

Edit: added code snippets for current behavior and tentative solutions following Stephen C's answer. To clarify, the whole example comes from using a real graph data structure from an open source Java project.

Stephen C :

To my mind, it comes down to choosing between APIs with strong or weak abstraction.

  • If you choose strong abstraction, the API would hide the fact that Node objects have identity, and would canonicalize them when they are added to the SampleGraph.

  • If you choose weak abstraction, the API would assume that Node objects have identity, and it would be up to the caller to canonicalize them before adding them to the SampleGraph.

The two approaches lead to different API contracts and require different implementation strategies. The choice is also likely to have performance implications, if performance matters for your use-case.

Then there are finer details of the API design that may or may not match your specific use-case for the graphs.

The point is that you need to make the choice.

(This is a bit like deciding to use the collections List interface and its clean model, versus implementing your own linked list data structure so that you can efficiently "splice" 2 lists together. Either approach could be correct, depending on the requirements of your application.)

Note that you usually can make a choice, though the choice may be a difficult one. For example, if you are using an API designed by someone else:

  • You can choose to use it as-is. (Suck it up!)
  • You can choose to try to influence the design. (Good luck!)
  • You can choose to switch to a different API; i.e. a different vendor.
  • You can choose to fork the API and adjust it to your own requirements (or preferences, if that is what this is about).
  • You can choose to design and implement your own API from scratch.

And if you really don't have a choice, then this question is moot. Just use the API.


If this is an open-source API then you probably don't have the choice of getting the designers to change it. Significant API overhauls have a tendency to create a lot of work for other people; i.e. the many other projects that depend on the API. A responsible API designer / design team takes this into account. Otherwise they find that they lose relevance because their APIs get a reputation for being unstable.

So ... if you are aiming to influence an existing open-source API design ... 'cos you think they are doing it incorrectly (for some definition of incorrect) ... you are probably better off "forking" the API and dealing with the consequences.


And finally, if you are looking for "best practice" advice, be aware that there are no best practices. And this is not just a philosophical issue. This is about why you will get screwed if you go asking for / looking for "best practice" advice, and then follow it.


As a footnote: have you ever wondered why the Java and Android standard class libraries don't offer any general-purpose graph APIs or implementations? And why they took such a long time to appear in 3rd party libraries (Guava version 20.0)?

The answer is that there is no consensus on what such an API should be like. There are just too many conflicting use-cases and requirement sets.
