C#: Using IEquatable<>

2019-12-10

Michael Schupikov

In C#, IEquatable<> is used to implement equality comparison for custom classes. Different approaches exist for implementing the interface properly, depending on the surrounding requirements. In particular, the requirement for a hashing function is not immediately obvious in the context of comparison.

It took me some time to figure out the best approach. So, here it is.

Purpose of `IEquatable<>`

What makes IEquatable<> useful? Every class inherits Equals() from object, so one can use it as long as one does not need more than the comparison of references. In fact, for classes, the default object.Equals() has the same semantics as object.ReferenceEquals(). As soon as a more detailed comparison of contents is required, one could simply override object.Equals() for one's own purposes.

So why implement IEquatable<> at all? The main reason is that we would like to avoid casting and especially boxing. Let us look at the declaration of object.Equals().

public virtual bool Equals (object obj);

Note that it takes an object as parameter. Now, if you overrode it and used it to compare two instances of your class, you would write something like the following.

a1.Equals(a2);

Here, a2 would be implicitly converted to object, only to be cast back to your type in your implementation. This costs more than necessary: for a class, the cast back requires a type check at execution-time, and for a struct, the conversion to object means boxing, i.e. copying the value to the heap. Neither is justified for such a fundamental operation.

So, while it is still useful to override object.Equals() for situations where the other instance is only known as an object, the common case should avoid unnecessary conversions.
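To make the cost concrete, here is a minimal sketch with a hypothetical struct Point that is not part of the original post. The static counters exist only to show which overload the compiler picks: passing a Point directly binds to Equals(Point), while passing it as object boxes it and goes through Equals(object) first.

```csharp
#nullable enable
using System;

var p1 = new Point(1, 2);
var p2 = new Point(1, 2);

Console.WriteLine(p1.Equals(p2));         // True: binds to Equals(Point), no boxing
Console.WriteLine(p1.Equals((object)p2)); // True: p2 is boxed, then unboxed in Equals(object)
Console.WriteLine($"typed: {Point.TypedCalls}, object: {Point.ObjectCalls}"); // typed: 2, object: 1

// Hypothetical value type used only for this illustration.
readonly struct Point : IEquatable<Point>
{
    // Call counters for illustration only; not thread-safe.
    public static int TypedCalls;
    public static int ObjectCalls;

    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }

    public int X { get; }
    public int Y { get; }

    // Strongly typed: receives the struct by value, no conversion involved.
    public bool Equals(Point other)
    {
        TypedCalls++;
        return (X, Y) == (other.X, other.Y);
    }

    // Weakly typed: the argument arrives boxed and must be type-checked and unboxed.
    public override bool Equals(object? obj)
    {
        ObjectCalls++;
        return obj is Point p && Equals(p);
    }

    public override int GetHashCode() => HashCode.Combine(X, Y);
}
```

The second call increments both counters, because the override forwards to the strongly typed method after unboxing.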

Purpose of IEquatable<>.

IEquatable<> is needed to limit the performance impact of comparison.

Implementation of `IEquatable<>`

After having understood why we want to implement IEquatable<>, let us look at a sample class containing some data.

class Class
{
  public uint Data1 { get; set; }
  public uint Data2 { get; set; }
}

We would like to support comparison of its instances and use as little code as possible to accomplish that. In particular, we would like to avoid the verbose example in the documentation of IEquatable<>.Equals() and implement the comparison based on both properties.

As instances of Class should be comparable to each other, we let Class implement IEquatable<Class> and provide the following implementation.

using System;

class Class : IEquatable<Class>
{
  public bool Equals(Class c)
    => c is Class
    && (  Data1,   Data2)
    == (c.Data1, c.Data2);

  public override bool Equals(object o)
    => (o is Class c) && Equals(c);

  public static bool operator == (in Class c1, in Class c2)
    => Equals(c1, c2);

  public static bool operator != (in Class c1, in Class c2)
    => !Equals(c1, c2);

  public override int GetHashCode()
    => HashCode.Combine(Data1, Data2);

  public uint Data1 { get; set; }
  public uint Data2 { get; set; }
}

Note that this approach works particularly well with nullable reference types, which were introduced in C# 8.0.
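A quick way to check the template is to run it. The following sketch restates the class above with nullable annotations (Class? and object?), in the spirit of the remark on nullable reference types, and adds a few hypothetical sample values; the members themselves are unchanged.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

var first  = new Class { Data1 = 1, Data2 = 2 };
var second = new Class { Data1 = 1, Data2 = 2 };
var third  = new Class { Data1 = 1, Data2 = 3 };

Console.WriteLine(first == second);                               // True: same contents, different instances
Console.WriteLine(first != third);                                // True: Data2 differs
Console.WriteLine(first.Equals(second));                          // True: binds to Equals(Class)
Console.WriteLine(new HashSet<Class> { first }.Contains(second)); // True: equal hashes, then Equals(Class)

class Class : IEquatable<Class>
{
  public bool Equals(Class? c)
    => c is Class
    && (  Data1,   Data2)
    == (c.Data1, c.Data2);

  public override bool Equals(object? o)
    => (o is Class c) && Equals(c);

  public static bool operator == (in Class? c1, in Class? c2)
    => Equals(c1, c2);

  public static bool operator != (in Class? c1, in Class? c2)
    => !Equals(c1, c2);

  public override int GetHashCode()
    => HashCode.Combine(Data1, Data2);

  public uint Data1 { get; set; }
  public uint Data2 { get; set; }
}
```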

Let us look at this piece by piece.

`Equals(Class)`

The only method that IEquatable<Class> requires is Equals(Class). It is the place where the actual comparison of the contents takes place.

public bool Equals(Class c)
  => c is Class
  && (  Data1,   Data2)
  == (c.Data1, c.Data2);

First, we check whether the parameter is null. Since this itself can never be null, a null parameter is already enough to determine inequality. The check

c is Class

for null has slightly unfortunate syntax in C# and might seem redundant, since the parameter is already of type Class. However, a type pattern never matches null, so the expression is false exactly when c is null.

With this approach we avoid using ReferenceEquals() or operator ==. Inside Class, == would route through our own overload, which is itself built on Equals(), and ReferenceEquals() could be hidden [1], whereas the meaning of is cannot be changed by any member declaration. So we can rely on the semantics of the check. This approach also works well with nullable reference types, since the compiler's flow analysis understands the check, so we can avoid the damn-it operator, officially called the null-forgiving operator (!).

If the parameter were declared as non-nullable under nullable reference types, we might be tempted to drop the check entirely. However, nullability is only checked by static analysis at compile-time, and nothing prevents the parameter from being null at execution-time, for example when the caller has nullable reference types disabled. Therefore, the check for null is required in the general case.
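The following sketch, with a hypothetical helper Describe(), shows that the type pattern used above behaves like the other pattern-based null checks: each is false exactly for null, and none of them calls a user-defined operator ==, so our own overloads cannot interfere. Note that is not null requires C# 9.0, which was released after this post; is object and is { } already work in C# 8.0.

```csharp
#nullable enable
using System;

Class? missing = null;
var present = new Class();

Console.WriteLine(NullChecks.Describe(missing)); // False False False False
Console.WriteLine(NullChecks.Describe(present)); // True True True True

static class NullChecks
{
    // Each pattern is false exactly when c is null; none of them calls operator ==.
    public static string Describe(Class? c)
    {
        bool typePattern  = c is Class;
        bool objectTest   = c is object;
        bool emptyPattern = c is { };
        bool notNull      = c is not null; // C# 9.0 and later
        return $"{typePattern} {objectTest} {emptyPattern} {notNull}";
    }
}

// Hypothetical class; an overloaded == here would not affect the patterns above.
class Class { }
```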

The rest of the method compares the contents. Here, we create value tuples of the properties of interest and compare them with == [2], which C# supports for tuples since version 7.3.

With Equals(Class) we have implemented the foundation of our comparison.

`Equals(object)`

Next, we want to override object.Equals(). Without such an override, an assertion like the following would fail.

Class  c1 = new Class();
object c3 = new Class();

Debug.Assert(c1.Equals(c3));

The compiler would not pick Equals(Class), since the argument c3 is of type object, and would call object.Equals() instead, which compares the references and returns false, as those do not match. By overriding it, we provide a proper way of dealing with another instance of our class even if it comes in the shape of an object.

public override bool Equals(object o)
  => (o is Class c) && Equals(c);

Here, we check if the other object is actually a Class. If it is, we treat it as such and perform the value comparison relying on Equals(Class).
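A minimal sketch of the situation above, restating only the members needed here (the operators are omitted, and the sample values are hypothetical): with the override in place, the call with an object-typed argument performs a value comparison.

```csharp
#nullable enable
using System;

Class  c1 = new Class { Data1 = 1, Data2 = 2 };
object c3 = new Class { Data1 = 1, Data2 = 2 };

Console.WriteLine(c1.Equals(c3));            // True: Equals(object) forwards to Equals(Class)
Console.WriteLine(c1.Equals("text"));        // False: not a Class
Console.WriteLine(c1.Equals((object?)null)); // False: a type pattern never matches null

class Class : IEquatable<Class>
{
    public bool Equals(Class? c)
        => c is Class && (Data1, Data2) == (c.Data1, c.Data2);

    // Chosen whenever the argument is statically typed as object.
    public override bool Equals(object? o)
        => (o is Class c) && Equals(c);

    public override int GetHashCode()
        => HashCode.Combine(Data1, Data2);

    public uint Data1 { get; set; }
    public uint Data2 { get; set; }
}
```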

`operator ==`

I would like to perform comparison of contents more concisely using the operators, as if we had value types at hand. You can skip this section if you still want to use == for reference comparison, for example to steer clear of some pitfalls regarding those operators.

var c1 = new Class();
var c2 = new Class();

Debug.Assert(c1 == c2);

In order to support value comparison through ==, we implement the corresponding comparison operators.

public static bool operator == (in Class c1, in Class c2)
  => Equals(c1, c2);

public static bool operator != (in Class c1, in Class c2)
  => !Equals(c1, c2);

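Since operators are static and cannot be overridden, we are free to mark the parameters as read-only using the in keyword. For reference types, in only prevents reassigning the parameter inside the operator, not modifying the object it refers to. Note that Equals(c1, c2) here is the static object.Equals(object, object), which every class inherits from object: it returns true if both arguments are the same reference (including both being null), false if only one of them is null, and otherwise calls the virtual c1.Equals(c2). As we have overridden Equals(object), that call ends up in our value comparison.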
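Because the operators delegate to the static object.Equals(object, object), they handle null without any extra code. A small sketch, restating the class with nullable annotations and hypothetical sample values:

```csharp
#nullable enable
using System;

var first  = new Class { Data1 = 1 };
var second = new Class { Data1 = 1 };
Class? none1 = null;
Class? none2 = null;

Console.WriteLine(first == second); // True: different references, neither null, so first.Equals((object)second) decides
Console.WriteLine(first == none1);  // False: exactly one operand is null, Equals() is never called
Console.WriteLine(none1 == none2);  // True: both operands are null
Console.WriteLine(none1 != second); // True

class Class : IEquatable<Class>
{
    public bool Equals(Class? c)
        => c is Class && (Data1, Data2) == (c.Data1, c.Data2);

    public override bool Equals(object? o)
        => (o is Class c) && Equals(c);

    // Both operators delegate to the static object.Equals(object, object).
    public static bool operator ==(in Class? c1, in Class? c2) => Equals(c1, c2);
    public static bool operator !=(in Class? c1, in Class? c2) => !Equals(c1, c2);

    public override int GetHashCode() => HashCode.Combine(Data1, Data2);

    public uint Data1 { get; set; }
    public uint Data2 { get; set; }
}
```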

The following check will still fail.

Class  c1 = new Class();
object c3 = new Class();

Debug.Assert(c1 == c3);

The compiler issues a level-2 warning in this case.

warning CS0253: Possible unintended reference comparison;
                to get a value comparison,
                cast the right hand side to type 'Class'

What happens? The compiler looks for a fitting operator ==. The one we have implemented does not fit, as it expects both operands to be of type Class, while the right-hand operand is of type object. So, the next best fit is the predefined one, which compares references. This is probably not what we want, so the compiler issues a warning.

There are two possible solutions in this case. Either you do what the compiler suggests and cast one of the operands to match the signature of our operator ==, or, if this happens in many places throughout the code, you take the better approach and simply define fitting operators in Class.

public static bool operator == (in object c1, in Class c2)
  => Equals(c1, c2);

public static bool operator == (in Class c1, in object c2)
  => Equals(c1, c2);

public static bool operator != (in object c1, in Class c2)
  => !Equals(c1, c2);

public static bool operator != (in Class c1, in object c2)
  => !Equals(c1, c2);

I have omitted those in my solution above, since they make the general case more convoluted than necessary, but you might want to keep this solution in mind in case it comes in handy.
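With these additional operators in place, the mixed comparison from above becomes a value comparison, and warning CS0253 no longer applies. A sketch restating the class with all six operators; the nullable annotations and the sample values are additions for illustration.

```csharp
#nullable enable
using System;

Class  left  = new Class { Data1 = 1, Data2 = 2 };
object right = new Class { Data1 = 1, Data2 = 2 };

Console.WriteLine(left == right); // True: operator ==(in Class, in object)
Console.WriteLine(right == left); // True: operator ==(in object, in Class)
Console.WriteLine(left != right); // False

class Class : IEquatable<Class>
{
    public bool Equals(Class? c)
        => c is Class && (Data1, Data2) == (c.Data1, c.Data2);

    public override bool Equals(object? o)
        => (o is Class c) && Equals(c);

    public static bool operator ==(in Class? c1, in Class? c2) => Equals(c1, c2);
    public static bool operator !=(in Class? c1, in Class? c2) => !Equals(c1, c2);

    // Mixed overloads: at least one operand of a user-defined operator must be Class.
    public static bool operator ==(in object? c1, in Class? c2) => Equals(c1, c2);
    public static bool operator ==(in Class? c1, in object? c2) => Equals(c1, c2);
    public static bool operator !=(in object? c1, in Class? c2) => !Equals(c1, c2);
    public static bool operator !=(in Class? c1, in object? c2) => !Equals(c1, c2);

    public override int GetHashCode()
        => HashCode.Combine(Data1, Data2);

    public uint Data1 { get; set; }
    public uint Data2 { get; set; }
}
```

When both operands are of type Class, overload resolution still picks the (Class, Class) pair, as it is the most specific match.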

`GetHashCode()`

Why `GetHashCode()`?

If we omit the implementation of GetHashCode() in our class, we get level-3 warnings for diverging from the safe path.

warning CS0659: 'Class' overrides Object.Equals(object o)
                but does not override Object.GetHashCode()

warning CS0661: 'Class' defines operator == or operator !=
                but does not override Object.GetHashCode()

One would usually expect the following semantics if no hashing function is provided for Class.

If Class is not used in a hashed container, we would prefer not to be bothered with a warning. This is the case for many other languages such as C++, Rust and Ada. While we are at it, what else is missing in Class? An enumerator to use its instances in a foreach loop, just in case?
If Class is actually used in a hashed container, we would like to see a compile-time error. In particular, we would like to avoid using instances in a hashed container without a proper hashing function. In C++ you have to specialize std::hash for your class or provide a hasher type with an operator (). In Rust you have to implement the Hash trait for your type. In Ada you have to provide a hashing function to the container.

Hashing is not required here as we are not using instances of Class in a hashed container, so why does the compiler warn us? Let us understand the connection between hashing and our simple comparison.

The story begins, as stories often do, with the architecture of Java. In Java, every class is at least implicitly derived from the base class Object, and Object provides some common methods such as hashCode(). When C# was designed, this aspect of Java's architecture was borrowed as well. Hence, object contains GetHashCode(), and all classes, including Class, are implicitly derived from object. Therefore, every class implicitly has GetHashCode(). As a consequence, instances of Class can very well be used in a hashed container such as HashSet<>, even if Class does not override GetHashCode().

But how does object determine the hash for Class, especially since it does not know the data of interest? It does not. The default object.GetHashCode() is based on the identity of the instance rather than on its contents, so it does not fit our case at all [3], and the compiler knows it.

What the compiler does not know is that we do not want to use Class in a hashed container. It actually does not care; the static analysis does not go that far. The fact that we have implemented some equality functionality is enough for it to assume that we might use the hash, so it warns us. It issues no error, however, since it cannot determine whether a hashing function is actually required.

This is something I regard as a design defect in the language. The better approach would have been to offer an interface like IHashable for types that need hashing. However, generics were added only three years after the initial design of the type system, and without generics a hash table had to store its keys as object, so every object needed a hash code. Take a look at Jon Skeet's perspective on hypothetical improvements to object for more context.

In conclusion, we simply have to accept this flaw in C# and implement GetHashCode() anyway, just in case we need it. Ignoring the warning might bite us later on.

Implementation of `GetHashCode()`

After having understood why we have to provide a hashing method in the presence of equality methods or operators, the implementation is straightforward.

public override int GetHashCode()
  => HashCode.Combine(Data1, Data2);

We use HashCode.Combine() to provide a hash combined from the hashes of the underlying properties. Note that HashCode.Combine() is only defined for up to eight parameters [4]. Therefore, in order to hash more than eight properties with this method, you have to split them into groups and combine the hashes of the groups.
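For types with more than eight properties, here is a sketch of two options on a hypothetical class Wide with ten properties: the splitting approach described above, and, as an alternative, an instance of the System.HashCode struct, whose Add() and ToHashCode() methods accept any number of values. HashCode is randomly seeded per process, so the actual numbers differ between runs; only the equality of hashes within one run is meaningful.

```csharp
using System;

var w1 = new Wide(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
var w2 = new Wide(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

Console.WriteLine(w1.HashViaNesting() == w2.HashViaNesting()); // True
Console.WriteLine(w1.HashViaAdd() == w2.HashViaAdd());         // True

// Hypothetical type with more properties than HashCode.Combine() accepts.
sealed class Wide
{
    public Wide(int a, int b, int c, int d, int e, int f, int g, int h, int i, int j)
        => (A, B, C, D, E, F, G, H, I, J) = (a, b, c, d, e, f, g, h, i, j);

    public int A { get; } public int B { get; } public int C { get; } public int D { get; }
    public int E { get; } public int F { get; } public int G { get; } public int H { get; }
    public int I { get; } public int J { get; }

    // Split the properties: hash the first eight, then combine that hash with the rest.
    public int HashViaNesting()
        => HashCode.Combine(HashCode.Combine(A, B, C, D, E, F, G, H), I, J);

    // Alternative: feed any number of values into a HashCode instance.
    public int HashViaAdd()
    {
        var hash = new HashCode();
        foreach (var value in new[] { A, B, C, D, E, F, G, H, I, J })
            hash.Add(value);
        return hash.ToHashCode();
    }
}
```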

What exact requirements GetHashCode() needs to meet is beyond the scope of this post. Consult the great post by Eric Lippert for details.

Difference between `Equals()` and `==`

Our class implements IEquatable<>, and we have implemented Equals(Class). Then, we have overridden Equals(object). We have implemented the corresponding equality operators, and we have even provided a matching hashing function.

While testing the implementation, you will find that the following check fails anyway.

object c3 = new Class();
object c4 = new Class();

Debug.Assert(c3 == c4);

This might be surprising since using Equals() is just fine.

object c3 = new Class();
object c4 = new Class();

Debug.Assert(c3.Equals(c4));

First, we need to understand what the type of c3 is when we write something like the following.

object c3 = new Class();

We are saying that c3 is of type object. This is fixed at compile-time. At execution-time, however, c3 refers to an instance of a more specific type, in this case Class.

So, the type of c3 and c4 is object at compile-time and Class at execution-time.

The final piece is the following.

Difference between Equals() and ==.

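Which operator == is used is determined at compile-time from the static types of the operands, while which Equals() implementation runs is determined at execution-time from the actual type of the instance.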

As both operands are of type Class at execution-time, our override of Equals() is found through virtual dispatch and used to determine equality. On the other hand, as both operands are of type object at compile-time, our overload of == is not considered and the predefined one is used instead. The predefined == compares references, which do not match. Therefore, it returns false and the assertion fails.

There are no good solutions to this in C# and the compiler issues no warnings. You just have to keep this in mind especially if you have some kind of container with generic objects in it.
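The following sketch restates the class (with nullable annotations and hypothetical sample values) and makes the difference visible. One workaround when operands may be typed as object, suggested here rather than taken from the post, is to call the static object.Equals(object, object), which dispatches to the virtual Equals() at execution-time and also handles null.

```csharp
#nullable enable
using System;

object c3 = new Class { Data1 = 1, Data2 = 2 };
object c4 = new Class { Data1 = 1, Data2 = 2 };

Console.WriteLine(c3.GetType().Name);     // Class: the execution-time type
Console.WriteLine(c3 == c4);              // False: object == object, chosen at compile-time, compares references
Console.WriteLine(c3.Equals(c4));         // True: virtual dispatch reaches Class.Equals(object)
Console.WriteLine(object.Equals(c3, c4)); // True: null-safe, then calls c3.Equals(c4)

class Class : IEquatable<Class>
{
    public bool Equals(Class? c)
        => c is Class && (Data1, Data2) == (c.Data1, c.Data2);

    public override bool Equals(object? o)
        => (o is Class c) && Equals(c);

    // Never considered when both operands are statically typed as object.
    public static bool operator ==(in Class? c1, in Class? c2) => Equals(c1, c2);
    public static bool operator !=(in Class? c1, in Class? c2) => !Equals(c1, c2);

    public override int GetHashCode() => HashCode.Combine(Data1, Data2);

    public uint Data1 { get; set; }
    public uint Data2 { get; set; }
}
```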

Conclusion

If Class lived in Rust, we would simply derive PartialEq and Eq for it. This is clean and simple.

#[derive(Eq, PartialEq)]
struct Class {
    data1: u32,
    data2: u32,
}

However, C# 8.0 has no such macros, and no default implementation can be requested from the compiler. Therefore, more code and consideration are needed to find the best approach. I have shown my current solution for comparison along with some things I have learned on the way. The greatest take-away here is the template for your own types, along with the explanation of why the hashing function is required.
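For completeness: C# 9.0, released in November 2020 after this post was written, added records, for which the compiler does generate value equality. A minimal sketch; note that the positional properties of a record are init-only, unlike the settable properties of the class above, and that the generated equality also requires both instances to have the same execution-time type.

```csharp
using System;
using System.Collections.Generic;

var r1 = new Class(1, 2);
var r2 = new Class(1, 2);

Console.WriteLine(r1 == r2);                               // True
Console.WriteLine(r1.Equals(r2));                          // True
Console.WriteLine(new HashSet<Class> { r1 }.Contains(r2)); // True: GetHashCode() is generated too
Console.WriteLine(r1 == new Class(1, 3));                  // False

// The compiler synthesizes IEquatable<Class>, Equals(Class), Equals(object),
// == and != and GetHashCode() from the positional properties.
record Class(uint Data1, uint Data2);
```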

Have fun with it and write solid code.

[1]

Although object.ReferenceEquals() cannot be overridden, it can still be hidden, either by some base class or by Class itself. Using the keyword new in the declaration would even silence the corresponding warning.
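The following sketch illustrates the risk with two hypothetical classes: a base class hides ReferenceEquals() with deliberately broken semantics, and the unqualified call in the derived class silently binds to it, while the is pattern is unaffected.

```csharp
#nullable enable
using System;

var d = new Derived();
Console.WriteLine(d.IsNullViaReferenceEquals(null)); // True
Console.WriteLine(d.IsNullViaReferenceEquals(d));    // True (!): the hiding method ignores its arguments
Console.WriteLine(d.IsNullViaPattern(d));            // False: the pattern cannot be hidden

class Base
{
    // Hypothetical: hides object.ReferenceEquals with broken semantics.
    // Without 'new', the compiler would report warning CS0108.
    public static new bool ReferenceEquals(object? a, object? b) => true;
}

class Derived : Base
{
    // The unqualified name now binds to Base.ReferenceEquals, not object.ReferenceEquals.
    public bool IsNullViaReferenceEquals(Derived? other) => ReferenceEquals(other, null);

    // 'is' is a language construct, so no member declaration can change its meaning.
    public bool IsNullViaPattern(Derived? other) => !(other is Derived);
}
```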

[2]

One might ask about the performance impact of creating value tuples for the sole purpose of comparison. The alternative would have been to compare each element individually and to make use of short-circuit evaluation right away.

public bool Equals(Class c)
  => /* […] */
  && Data1.Equals(c.Data1)
  && Data2.Equals(c.Data2);

I have run some quick and dirty performance tests on both variants with optimizations enabled. These are the results for 10 million iterations.

[ms]	Value Tuples	Direct Comparison
Maximum	184.32	181.14
Average	183.70	178.43
Minimum	183.13	177.35

The difference is about 3 ms for the worst case and about 5 ms for the average case, i.e. roughly 0.5 ns per comparison on average. Looking at the IL code, the value tuple variant gathers all data first, while the direct approach short-circuits.

In a time-critical context, using a language like C, C++, Rust or Ada, we would probably discuss this particular case. However, C# with its garbage collection would not be used in such a context in the first place. Therefore, I use the comparison via value tuples for the benefit of simplicity.
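The difference in evaluation order mentioned above can be observed directly. In this sketch, a hypothetical helper Get() counts how many operands each variant evaluates when the first pair already differs: tuple equality evaluates all operands before comparing, while && stops after the first mismatch.

```csharp
using System;

// Hypothetical helper: returns its argument and counts how often it was evaluated.
int evaluations = 0;
uint Get(uint value) { evaluations++; return value; }

// Tuple equality: all four operands are evaluated before any comparison takes place.
bool viaTuples = (Get(1), Get(2)) == (Get(9), Get(2));
Console.WriteLine($"{viaTuples} after {evaluations} evaluations"); // False after 4 evaluations

// Direct comparison: && skips the second pair once the first pair differs.
evaluations = 0;
bool direct = Get(1) == Get(9) && Get(2) == Get(2);
Console.WriteLine($"{direct} after {evaluations} evaluations");    // False after 2 evaluations
```

For plain property reads, as in Class, evaluating the extra operands is cheap, which is consistent with the small difference in the measurements.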

[3]

You might wonder what the consequence would be of not providing GetHashCode() for your class and using it in a hashed container anyway.

Consider the following example.

var c1 = new Class();
var c2 = new Class();

var set = new HashSet<Class>();
set.Add(c1);

Debug.Assert(set.Contains(c2));

We add c1 to a hashed container. Then, we check whether an instance with the same property values exists in it. Obviously, it does. However, the check will still fail, because without a proper hashing function the default hash is derived from the identity of each instance. The hashes of c1 and c2 almost certainly do not match, and therefore c1 and c2 are never compared in HashSet<>.Contains().

The compiler does not issue an error here because Class has GetHashCode() derived from object, so some hashing function is available.
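A sketch of this footnote with two hypothetical classes, one without and one with a GetHashCode() override. The first result is not strictly guaranteed: two distinct instances can, rarely, end up with the same identity-based hash, in which case Equals() would be called and Contains() would return true.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

var withoutHash = new HashSet<NoHash> { new NoHash { Data = 1 } };
Console.WriteLine(withoutHash.Contains(new NoHash { Data = 1 })); // almost always False

var withHash = new HashSet<WithHash> { new WithHash { Data = 1 } };
Console.WriteLine(withHash.Contains(new WithHash { Data = 1 }));  // True

#pragma warning disable CS0659 // GetHashCode() is omitted on purpose for this illustration
class NoHash
{
    public int Data { get; set; }
    public override bool Equals(object? o) => o is NoHash n && n.Data == Data;
}
#pragma warning restore CS0659

class WithHash
{
    public int Data { get; set; }
    public override bool Equals(object? o) => o is WithHash w && w.Data == Data;
    public override int GetHashCode() => Data.GetHashCode();
}
```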

[4]

The upper bound on the number of generic parameters comes from the fact that C# has no equivalent of the variadic templates in C++. As a consequence, the number of generic parameters of each method is fixed. However, one is free to provide multiple overloads of the same method with different numbers of parameters. So, in order to support a variable number of generic parameters, a signature has to be provided explicitly for each count. Therefore, HashCode.Combine() is only defined for up to eight parameters. Another instance of this restriction is Tuple.Create().
