C#: Using IEquatable<>
In C#, IEquatable<> is used to implement equality comparison for custom types.
Different solutions exist for a proper implementation of the interface, depending on the surrounding requirements. In particular, the requirement for a hashing function is not immediately obvious in the context of comparison.
It took me some time to figure out the best approach. So, here it is.
Purpose of IEquatable<>
What makes IEquatable<> useful? Every class has Equals() inherited from object, so one can use it as long as one does not need more than the comparison of references. In fact, for reference types the default object.Equals() has the same semantics as object.ReferenceEquals(). As soon as a more detailed comparison of contents is required, one could simply override object.Equals() for one's own purposes.
So why bother with IEquatable<>? The sole reason is that we would like to avoid casting and, for value types, boxing. Let us look at the declaration of object.Equals().
public virtual bool Equals (object obj);
Note that it takes an object as parameter. Now, if you overrode it and used it to compare two instances of your type, you would write something like the following.
a1.Equals(a2);
Here, a2 would be implicitly converted to object only to be cast back to your type in your implementation. For a class this costs a type check at execution-time; for a struct the argument additionally has to be boxed, that is, copied into a newly allocated object on the heap. This is more overhead than necessary and not justified for such a fundamental operation.
So, while it is still useful to override object.Equals() for situations where the other instance is really an object, the common case should avoid unnecessary conversion.
Purpose of IEquatable<>: IEquatable<> provides a strongly typed Equals() in order to limit the impact on performance for comparison.
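To make the boxing argument concrete, here is a minimal sketch. It uses a hypothetical struct Point, which is not part of the sample in this post, and the two static counters exist only to show which overload the compiler binds to.

```csharp
using System;

// Hypothetical value type for illustration; not the article's sample class.
public struct Point : IEquatable<Point>
{
    public int X;
    public int Y;

    // Counters that record which overload was called (illustration only).
    public static int TypedCalls;
    public static int ObjectCalls;

    // Strongly typed: the argument is passed as a Point, no boxing involved.
    public bool Equals(Point other)
    {
        TypedCalls++;
        return X == other.X && Y == other.Y;
    }

    // Weakly typed: a Point argument has to be boxed into an object first
    // and is unboxed again by the type pattern below.
    public override bool Equals(object obj)
    {
        ObjectCalls++;
        return obj is Point p && X == p.X && Y == p.Y;
    }

    public override int GetHashCode() => HashCode.Combine(X, Y);
}

public static class Demo
{
    public static void Main()
    {
        var a = new Point { X = 1, Y = 2 };
        var b = new Point { X = 1, Y = 2 };

        bool typed = a.Equals(b);           // binds to Equals(Point)
        bool boxed = a.Equals((object)b);   // binds to Equals(object), boxes b

        Console.WriteLine($"{typed} {boxed}");
        Console.WriteLine($"typed calls: {Point.TypedCalls}, object calls: {Point.ObjectCalls}");
    }
}
```

With only the override of Equals(object) available, the first call would bind to it as well and box b on every comparison.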
Implementation of IEquatable<>
After having understood why we want to implement IEquatable<>, let us look at a sample class containing some data.
class Class
{
    public uint Data1 { get; set; }
    public uint Data2 { get; set; }
}
We would like to support comparison of its instances and use as little code as possible in order to accomplish that. In particular, we would like to avoid the verbose example in the documentation of IEquatable<>.Equals() and implement the comparison with regard to both properties.
As Class instances should be comparable to other instances thereof, we let it implement IEquatable<Class> and provide the following implementation.
class Class : IEquatable<Class>
{
    public bool Equals(Class c) =>
        c is Class && (Data1, Data2) == (c.Data1, c.Data2);

    public override bool Equals(object o) => (o is Class c) && Equals(c);

    public static bool operator == (in Class c1, in Class c2) => Equals(c1, c2);
    public static bool operator != (in Class c1, in Class c2) => !Equals(c1, c2);

    public override int GetHashCode() => HashCode.Combine(Data1, Data2);

    public uint Data1 { get; set; }
    public uint Data2 { get; set; }
}
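Note that this approach also works in the context of nullable reference types, which were introduced in C# 8.0. With those enabled, the parameters would be declared as Class? and object? to state that null is an accepted argument.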
Let us look at this piece by piece.
Equals(Class)
The only method that IEquatable<Class> requires is Equals(Class). It is the place where the actual comparison of the contents takes place.
public bool Equals(Class c) => c is Class && ( Data1, Data2) == (c.Data1, c.Data2);
First, we check whether the parameter is null. As this itself cannot be null, a null parameter is sufficient to determine inequality already. The null check c is Class has slightly unfortunate syntax in C# and might seem redundant, since we take the parameter as Class already. However, the type pattern never matches null, so the expression returns false if c is null. Since C# 9.0 the same check can be written as c is not null.
With this approach we avoid using ReferenceEquals() or the operator ==. That way we bypass overloads of == and anticipate a possible hiding [1] of ReferenceEquals(), so we can rely on the semantics of the check. Note that this approach works well with nullable reference types: after the check the compiler knows that c is not null, so we can avoid the damn-it operator, which is officially called the null-forgiving operator.
If the parameter were declared as a non-nullable Class under nullable reference types, it might seem that we could drop the check entirely. However, nullability is only verified by the static analysis at compile-time and nothing prevents the parameter from being null at execution-time. Therefore, the check for null is actually required in the general case.
The rest of the method is the comparison of contents. Here, we create value tuples containing the properties of interest and compare those with == [2], which tuples support since C# 7.3 and which compares them element by element.
With Equals(Class) we have implemented the foundation of our comparison.
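The behaviour described above can be tried out with a reduced version of the sample class that contains nothing but the typed Equals(Class). The Demo class and the chosen property values are only there for illustration.

```csharp
using System;

// The sample class, reduced to the strongly typed Equals().
public class Class : IEquatable<Class>
{
    public uint Data1 { get; set; }
    public uint Data2 { get; set; }

    public bool Equals(Class c) =>
        c is Class && (Data1, Data2) == (c.Data1, c.Data2);
}

public static class Demo
{
    public static void Main()
    {
        var c1 = new Class { Data1 = 1, Data2 = 2 };
        var c2 = new Class { Data1 = 1, Data2 = 2 };
        var c3 = new Class { Data1 = 1, Data2 = 3 };
        Class none = null;

        Console.WriteLine(c1.Equals(c2));    // True: same contents
        Console.WriteLine(c1.Equals(c3));    // False: Data2 differs
        Console.WriteLine(c1.Equals(none));  // False: the type pattern rejects null
    }
}
```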
Equals(object)
Next, we want to override object.Equals(). Without this override, the following assertion would fail.
Class c1 = new Class();
object c3 = new Class();
Debug.Assert(c1.Equals(c3));
As the argument is of type object, the compiler would not pick Equals(Class) and would bind to object.Equals() instead, which compares the references and returns false as those do not match.
Instead, we are providing a proper way of dealing with the other instance of our class even if it comes in the shape of an object.
public override bool Equals(object o) => (o is Class c) && Equals(c);
Here, we check if the other object is actually a Class. If it is, we treat it as such and perform the value comparison relying on Equals(Class). If it is null or an instance of any other type, the pattern does not match and the result is false.
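A short sketch of how the override behaves for the three kinds of arguments it can receive. The Demo class and the sample values are assumptions for illustration; GetHashCode() is included only to keep the compiler quiet and is discussed further below.

```csharp
using System;

public class Class : IEquatable<Class>
{
    public uint Data1 { get; set; }
    public uint Data2 { get; set; }

    public bool Equals(Class c) =>
        c is Class && (Data1, Data2) == (c.Data1, c.Data2);

    // Forwards to the typed overload if the argument is a non-null Class.
    public override bool Equals(object o) => (o is Class c) && Equals(c);

    public override int GetHashCode() => HashCode.Combine(Data1, Data2);
}

public static class Demo
{
    public static void Main()
    {
        var c1 = new Class { Data1 = 1, Data2 = 2 };
        object same = new Class { Data1 = 1, Data2 = 2 };
        object text = "not a Class";
        object none = null;

        Console.WriteLine(c1.Equals(same));  // True: a Class with the same contents
        Console.WriteLine(c1.Equals(text));  // False: a different type
        Console.WriteLine(c1.Equals(none));  // False: null
    }
}
```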
operator ==
I would like to perform comparison of contents more concisely using the operators, as if we had value types at hand. You can skip this section if you still want to use == for reference comparison, probably because you want to steer clear of some pitfalls regarding those.
var c1 = new Class();
var c2 = new Class();
Debug.Assert(c1 == c2);
In order to support value comparison through ==, we implement the corresponding comparison operators.
public static bool operator == (in Class c1, in Class c2) => Equals(c1, c2);
public static bool operator != (in Class c1, in Class c2) => !Equals(c1, c2);
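As these are new operators and not overrides of existing members, we are free to mark the parameters as read-only using the in keyword. Note that Equals(c1, c2) resolves to the static method object.Equals(object, object), which every class inherits. It returns true if both references are identical (including both being null), returns false if exactly one of them is null, and otherwise calls c1.Equals(c2), which ends up in our override of Equals(object).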
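The null handling of the operators is easy to get wrong, so here is a small sketch that exercises it. The class is the one from this post; the Demo class and the variable names are assumptions for illustration.

```csharp
using System;

public class Class : IEquatable<Class>
{
    public uint Data1 { get; set; }
    public uint Data2 { get; set; }

    public bool Equals(Class c) =>
        c is Class && (Data1, Data2) == (c.Data1, c.Data2);

    public override bool Equals(object o) => (o is Class c) && Equals(c);

    // Both operators delegate to the static object.Equals(object, object),
    // which deals with null on either side before calling Equals(object).
    public static bool operator ==(in Class c1, in Class c2) => Equals(c1, c2);
    public static bool operator !=(in Class c1, in Class c2) => !Equals(c1, c2);

    public override int GetHashCode() => HashCode.Combine(Data1, Data2);
}

public static class Demo
{
    public static void Main()
    {
        var c1 = new Class { Data1 = 1, Data2 = 2 };
        var c2 = new Class { Data1 = 1, Data2 = 2 };
        Class none1 = null;
        Class none2 = null;

        Console.WriteLine(c1 == c2);        // True: same contents
        Console.WriteLine(c1 != c2);        // False
        Console.WriteLine(c1 == none1);     // False: only one side is null
        Console.WriteLine(none1 == c1);     // False: no NullReferenceException
        Console.WriteLine(none1 == none2);  // True: both sides are null
    }
}
```

Calling c1.Equals(c2) inside the operator instead would throw a NullReferenceException as soon as the left operand is null, which is why the static method is the safer choice here.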
The following assertion will still fail.
Class c1 = new Class();
object c3 = new Class();
Debug.Assert(c1 == c3);
The compiler issues a level-2 warning in this case.
warning CS0253: Possible unintended reference comparison; to get a value comparison, cast the right hand side to type 'Class'
What happens? The compiler looks for a fitting operator ==. The one we have implemented does not quite fit, as it expects both operands to be Class, while one of them is an object here. So, the next best fit is the default one, which compares references. This might not be what we want, so the compiler issues a warning.
There are two possible solutions in this case. Either you do what the compiler suggests and cast one of the operands to match the signature of the required operator ==, or, if it happens in many places throughout the code, you simply define fitting operators in Class.
public static bool operator == (in object c1, in Class c2) => Equals(c1, c2);
public static bool operator == (in Class c1, in object c2) => Equals(c1, c2);
public static bool operator != (in object c1, in Class c2) => !Equals(c1, c2);
public static bool operator != (in Class c1, in object c2) => !Equals(c1, c2);
I have omitted those in my solution above, since they make the general case more convoluted than necessary, but you might want to keep this solution in mind in case it comes in handy.
GetHashCode()
Why GetHashCode()?
If we omit the implementation of GetHashCode() in our class, we get level-3 warnings for diverging from the safe path.
warning CS0659: 'Class' overrides Object.Equals(object o) but does not override Object.GetHashCode()
warning CS0661: 'Class' defines operator == or operator != but does not override Object.GetHashCode()
Usually one would assume the following semantics if no hashing function is provided for Class.
- If Class is not used in a hashed container, we would prefer to not be bothered with a warning. This is the case for many other languages such as C++, Rust and Ada. While we are at it, what else is missing in Class? An enumerator to use its instances in a foreach loop, just in case?
- If Class is actually used in a hashed container, we would like to see a compile-time error. In particular, we would like to avoid the usage of instances in a hashed container without a proper hashing function. In C++ you have to provide a hash function object for your class, typically a specialization of std::hash with its operator (). In Rust you have to implement the Hash trait for your type. In Ada you have to provide a hashing function to the container.
Hashing is not required here as we are not using instances of Class in a hashed container, so why does the compiler warn us? Let us understand the connection between hashing and our simple comparison.
The story begins, as stories often do, with the architecture of Java. In Java, every class is at least implicitly derived from the base class Object, and Object provides some common methods such as hashCode().
When C# was designed, this aspect of the architecture of Java was borrowed as well. Hence, object contains GetHashCode() and all classes are implicitly derived from object, including Class. Therefore, every class implicitly contains GetHashCode(). As a consequence, instances of Class can very well be used in a hashed container such as HashSet<>. This holds even if Class does not override GetHashCode().
But how does object determine the hash for Class, especially since it does not know the data of interest? It does not determine that. The default object.GetHashCode() is absolutely not fitting [3] for our case and the compiler knows it.
What the compiler does not know is that we do not want to use Class in a hashed container. It actually does not care. The static analysis does not go that far. The fact that we have implemented some equality functionality is enough for it to assume that we might use its hash, so it warns us. It issues no error, however, since it does not determine if a hashing function is actually required.
This is something I regard as a design defect in the language. The better approach would have been to offer an interface like IHashable for types that need hashing. However, generics were added three years after the initial design of the type system, and until then a hash table had to store keys of arbitrary types as object. Take a look at Jon Skeet's perspective on hypothetical improvements on object for more context.
In conclusion, we have to simply accept this flaw in C# and implement GetHashCode() anyway, just in case we need it. Ignoring the warning might bite us later on.
Implementation of GetHashCode()
After having understood why we have to provide a hashing method in the presence of equality methods or operators, the implementation is straightforward.
public override int GetHashCode() => HashCode.Combine(Data1, Data2);
We are using HashCode.Combine() in order to provide a hash combined from the hashes of the underlying properties. Note that HashCode.Combine() is defined for up to eight parameters only [4]. Therefore, in order to support hashing for more than eight properties using this method, you have to split those into groups and combine the hashes of the partial hashes.
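As a sketch of what this could look like, consider a hypothetical class Wide with nine fields, which is not part of this post's sample. NestedHash() follows the splitting approach described above. GetHashCode() shows an alternative that I find easier to extend: the HashCode struct can also be used as an accumulator through its Add() and ToHashCode() methods.

```csharp
using System;

// Hypothetical class with nine fields, one more than HashCode.Combine() accepts.
public class Wide : IEquatable<Wide>
{
    public int A, B, C, D, E, F, G, H, I;

    public bool Equals(Wide w) =>
        w is Wide
        && A == w.A && B == w.B && C == w.C
        && D == w.D && E == w.E && F == w.F
        && G == w.G && H == w.H && I == w.I;

    public override bool Equals(object o) => (o is Wide w) && Equals(w);

    // Variant 1: hash the first eight fields, then combine the result with the ninth.
    public int NestedHash() =>
        HashCode.Combine(HashCode.Combine(A, B, C, D, E, F, G, H), I);

    // Variant 2: feed the fields one by one into a HashCode accumulator.
    public override int GetHashCode()
    {
        var hash = new HashCode();
        hash.Add(A); hash.Add(B); hash.Add(C);
        hash.Add(D); hash.Add(E); hash.Add(F);
        hash.Add(G); hash.Add(H); hash.Add(I);
        return hash.ToHashCode();
    }
}

public static class Demo
{
    public static void Main()
    {
        var w1 = new Wide { A = 1, I = 9 };
        var w2 = new Wide { A = 1, I = 9 };

        // Equal instances must produce equal hashes with either variant.
        Console.WriteLine(w1.GetHashCode() == w2.GetHashCode());
        Console.WriteLine(w1.NestedHash() == w2.NestedHash());
    }
}
```

The two variants are not expected to return the same number for the same instance. What matters is that each of them returns the same hash for equal instances within one run of the program.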
What exact requirements GetHashCode() needs to meet is beyond the scope of this post. Consult the great post by Eric Lippert for details.
Difference between Equals()
and ==
We have implemented IEquatable<> in our class by providing Equals(Class). Then, we have overridden Equals(object). We have implemented the corresponding equality operators and we have even provided a matching hashing function.
While testing the implementation, you will find that the following assertion fails anyway.
object c3 = new Class();
object c4 = new Class();
Debug.Assert(c3 == c4);
This might be surprising since using Equals() is just fine.
object c3 = new Class();
object c4 = new Class();
Debug.Assert(c3.Equals(c4));
First, we need to understand what the type of c3 is when we write something like the following.
object c3 = new Class();
We are saying that c3 is of type object. This is the static type, which is fixed at compile-time. However, at execution-time the variable refers to an instance of some more specific type. In this case we are saying that it is more specifically a Class.
So, the type of c3 and c4 is object at compile-time and Class at execution-time.
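The final piece is the following.
Difference between Equals() and ==: Operator == is bound at compile-time based on the static types of its operands, while Equals() is a virtual method that is dispatched at execution-time based on the actual type of the instance.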
As both parameters are of type Class at execution-time, our override of Equals() is found and used to determine equality. On the other hand, as both parameters are of type object at compile-time, our overload of == is not found and the default one is used instead. The default == compares references, which do not match. Therefore, it returns false and the assertion fails.
There is no way to make == itself behave differently here and the compiler issues no warnings. You just have to keep this in mind, especially if you have some kind of container that holds its elements as object.
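What does help in such places is to not use == at all and to go through a method that is dispatched at execution-time. The following sketch shows two options: the static object.Equals(object, object) and EqualityComparer<T>.Default. The helper Same<T>() is a hypothetical name introduced only for this example.

```csharp
using System;
using System.Collections.Generic;

public class Class : IEquatable<Class>
{
    public uint Data1 { get; set; }
    public uint Data2 { get; set; }

    public bool Equals(Class c) =>
        c is Class && (Data1, Data2) == (c.Data1, c.Data2);

    public override bool Equals(object o) => (o is Class c) && Equals(c);

    public static bool operator ==(in Class c1, in Class c2) => Equals(c1, c2);
    public static bool operator !=(in Class c1, in Class c2) => !Equals(c1, c2);

    public override int GetHashCode() => HashCode.Combine(Data1, Data2);
}

public static class Demo
{
    // Hypothetical helper: compares by value for any T without relying on ==.
    public static bool Same<T>(T a, T b) => EqualityComparer<T>.Default.Equals(a, b);

    public static void Main()
    {
        object c3 = new Class();
        object c4 = new Class();

        Console.WriteLine(c3 == c4);               // False: reference comparison
        Console.WriteLine(object.Equals(c3, c4));  // True: calls the override of Equals(object)
        Console.WriteLine(Same(c3, c4));           // True: T is object, Equals(object) is used
        Console.WriteLine(Same(new Class(), new Class()));  // True: T is Class, Equals(Class) is used
    }
}
```

Both options are also safe if one or both operands are null.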
Conclusion
If Class lived in Rust, we would simply derive the PartialEq and Eq traits for it. This is clean and simple.
#[derive(Eq, PartialEq)]
struct Class {
    data1: u32,
    data2: u32,
}
However, in C# no macros exist and for an ordinary class no default implementation can be requested from the compiler. Only record types, which were introduced in C# 9.0, get value equality generated for them. Therefore, more code and consideration is needed to find the best approach. I have shown my current solution for comparison along with some things I have learned on the way. The greatest take-away here is the template for your own types along with the explanation of why the hashing function is required.
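If you can use C# 9.0 or later and your type fits the constraints of a record, the closest equivalent to the Rust derive is a record declaration. The following sketch uses a hypothetical record ClassRecord as a counterpart of Class. The compiler generates Equals(), GetHashCode() and the operators == and != for it, based on all of its properties.

```csharp
using System;

// Hypothetical record counterpart of Class; requires C# 9.0 or later.
public record ClassRecord(uint Data1, uint Data2);

public static class Demo
{
    public static void Main()
    {
        var r1 = new ClassRecord(1, 2);
        var r2 = new ClassRecord(1, 2);
        object o1 = r1;
        object o2 = r2;

        Console.WriteLine(r1 == r2);                             // True: generated operator
        Console.WriteLine(r1.Equals(o2));                        // True: generated Equals(object)
        Console.WriteLine(r1.GetHashCode() == r2.GetHashCode()); // True: generated hash
        Console.WriteLine(o1 == o2);                             // False: still a reference comparison
    }
}
```

Note the last line: the difference between Equals() and == for operands of static type object applies to records as well.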
Have fun with it and write solid code.
[1] Although object.ReferenceEquals() cannot be overridden, it can still be hidden either by some base class or by Class itself. Using the keyword new in the declaration would silence the corresponding warning.
[2] One might ask about the performance impact of creating value tuples for the sole purpose of comparison. The alternative would have been to compare each element individually and to make use of short-circuit evaluation right away.
public bool Equals(Class c) => /* […] */ && Data1.Equals(c.Data1) && Data2.Equals(c.Data2);
I have run some quick and dirty performance tests on both variations with optimizations enabled. Those are the results for 10 million iterations.
[ms]       Value Tuples   Direct Comparison
Maximum    184.32         181.14
Average    183.70         178.43
Minimum    183.13         177.35
The difference is about 3 ms for the maximum and about 5 ms for the average, which is less than a nanosecond per comparison. Looking at the IL code, all data is gathered first for value tuples, while the direct approach stops at the first element that differs.
In a time critical context using a language like C, C++, Rust or Ada we probably would discuss the particular case. However, in no such context would C# with its garbage collection be used. Therefore, I am using the comparison via value tuples with the benefit of simplicity.
[3] You might wonder what the consequence would be of not providing GetHashCode() for your class and using it in a hashed container anyway. Consider the following example.
var c1 = new Class();
var c2 = new Class();
var set = new HashSet<Class>();
set.Add(c1);
Debug.Assert(set.Contains(c2));
We are adding c1 to a hashed container. Then, we check if an instance with the same property values exists in it. Obviously, it does. However, the assertion will still fail, because without a proper hashing function the default hash is based on the identity of the instance rather than on its contents. The hashes do not match and therefore c1 and c2 are never compared in HashSet<>.Contains().
The compiler does not issue an error here because Class has GetHashCode() inherited from object, so some hashing function is available.
[4] The upper bound on the number of generic parameters comes from the fact that C# does not support an equivalent of variadic templates in C++. As a consequence, the number of generic parameters of a method is fixed. However, one is free to provide multiple overloads of the same method with different numbers of parameters. So, in order to support a variable number of generic parameters, a signature has to be provided explicitly for each supported count. Therefore, HashCode.Combine() is defined for up to eight parameters only. Another instance of this restriction is Tuple.Create().