But wait! There is more to object equality. Even though overriding
== worked for simple equality comparisons, there are some cases where that just isn't enough.
In the following example, we build an array of duplicate
Item objects and apply
uniq on it. See what happens to the result. We expected
Array#uniq to return only one element, since the rest were duplicates, but it returned everything. Clearly, #uniq did not work. We did override the
== method to return
true if the items are identical and we verified that it works by comparing an Item to its clone. So, what went wrong?
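A minimal sketch of the situation described above. The Item class here is hypothetical: the name and price attributes are assumptions made for illustration, and only == is overridden.

```ruby
# Hypothetical Item class; the name and price attributes are assumptions.
class Item
  attr_reader :name, :price

  def initialize(name, price)
    @name = name
    @price = price
  end

  # Only == is overridden; eql? and hash keep their default behavior.
  def ==(other)
    other.is_a?(Item) && name == other.name && price == other.price
  end
end

item = Item.new("Keyboard", 50)
items = [item, item.clone, item.clone]

puts item == item.clone  # true, thanks to our == override
puts items.uniq.size     # 3, uniq still treats the clones as distinct
```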
The short answer is that we failed to implement two other methods that are crucial to getting object equality right: the
eql? and hash methods. Why do we need these two over and above the simple ==?
There are a lot of operations in Ruby that need to check the equality of two objects. While
== serves the purpose well, it is not really fast. For operations that might involve a large number of equality checks (like
Array#uniq and Hash lookups), the speed disadvantage adds up and becomes an overhead. To get around this, Ruby provides a
hash method on every object. It returns an integer that is usually different for objects that are not equal.
In the following example, we print the hash values for different objects. Take a look:
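Here is a small sketch using a few built-in objects. The actual numbers are not shown because Ruby typically produces different hash codes on every run of the program; only the relationships between them are stable within one run.

```ruby
# Every object responds to hash and returns an Integer.
puts 1.hash
puts "ruby".hash
puts :ruby.hash
puts [1, 2].hash
puts Object.new.hash

# Two equal strings report the same hash code within a single run.
puts "ruby".hash == "ruby".hash  # true
```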
Do not confuse the method
hash, which returns a hash code, with the data structure Hash. The hash code of an object is usually a short (and in Ruby, always numeric) identifier of that object. Hash is a data structure that uses the hash codes of objects for fast key lookup, and that is where it gets its name.
So instead of comparing two objects using
==, which could be expensive when the objects are large, Ruby uses the
hash of the object when possible. Because the hash code is a simple numeric value, comparing it is almost always faster than comparing the various instance variables of the underlying object.
The Array#uniq method, as you might have guessed, uses the result of
hash to compare objects and identify duplicates. Let us see how this works out in practice:
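One way this might look, as a sketch. The Item class, its name and price attributes, and the exact bodies of eql? and hash are assumptions made for illustration.

```ruby
# Hypothetical Item class; the name and price attributes are assumptions.
class Item
  attr_reader :name, :price

  def initialize(name, price)
    @name = name
    @price = price
  end

  def ==(other)
    other.is_a?(Item) && name == other.name && price == other.price
  end

  # Called by Array#uniq and Hash when two objects report the same hash code.
  def eql?(other)
    self == other
  end

  # Combine the hash codes of the state-defining attributes with XOR.
  def hash
    name.hash ^ price.hash
  end
end

item = Item.new("Keyboard", 50)
items = [item, item.clone, item.clone]

puts items.uniq.size  # 1, the clones are now recognized as duplicates
```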
Array#uniq now works correctly for the
Item objects. This is because we implemented two methods: eql? and hash.
What is the
hash method doing? The
^ operator used is the binary XOR. The
hash method returns the result of XORing all the instance variables that determine the state of the object. As a result, whenever the state of the object changes, the hash code will in practice change as well. Having distinct hash codes for distinct objects is a highly desirable property, because it makes operations on collections faster.
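Hash lookups benefit from the same pair of methods. The sketch below uses a hypothetical Item class (the attributes and the stock variable are assumptions) to show that a freshly built but equal Item finds the entry stored under another instance.

```ruby
# Hypothetical Item class; the name and price attributes are assumptions.
class Item
  attr_reader :name, :price

  def initialize(name, price)
    @name = name
    @price = price
  end

  def ==(other)
    other.is_a?(Item) && name == other.name && price == other.price
  end

  def eql?(other)
    self == other
  end

  def hash
    name.hash ^ price.hash
  end
end

stock = { Item.new("Keyboard", 50) => 12 }

# An equal item has the same hash code and is eql?, so the lookup succeeds.
puts stock[Item.new("Keyboard", 50)]          # 12
# An item with a different state is not eql?, so nothing is found.
puts stock[Item.new("Keyboard", 60)].inspect  # nil
```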
We also introduced the
eql? method in the above example. In fact it was called by
Array#uniq twice to check the equality of the elements of the array. Even though we use
== to check for equality of objects, routines like
Array#uniq use
eql? instead. This means that we must implement the
eql? method as well whenever we override
==. In most cases, these two methods will be identical, so you can implement the actual comparison in one method and have the other method just call it.
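As a sketch of that idea, the comparison can live in == and eql? can simply be an alias for it. The Item class and its attributes are again assumptions made for illustration.

```ruby
# Hypothetical Item class; the name and price attributes are assumptions.
class Item
  attr_reader :name, :price

  def initialize(name, price)
    @name = name
    @price = price
  end

  # The actual comparison is written once, here.
  def ==(other)
    other.is_a?(Item) && name == other.name && price == other.price
  end

  # eql? becomes another name for ==, so the two can never disagree.
  alias_method :eql?, :==

  def hash
    name.hash ^ price.hash
  end
end

a = Item.new("Mouse", 20)
b = Item.new("Mouse", 20)

puts a == b            # true
puts a.eql?(b)         # true
puts [a, b].uniq.size  # 1
```

The alias has to come after the definition of ==, since it captures the method as it exists at that point in the class body.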
To summarize, if you ever override any of the ==, eql? or hash methods, you must override the others as well.