Why was IEnumerable<T> made covariant in C# 4?
Marc's and CodeInChaos's answers are pretty good, but just to add a few more details:
First off, it sounds like you are interested in learning about the design process we went through to make this feature. If so, then I encourage you to read my lengthy series of articles that I wrote while designing and implementing the feature. Start from the bottom of the page:
Covariance and contravariance blog posts
Is it just to make the annoying casts in LINQ expressions go away?
No, it is not just to avoid Cast<T>
expressions, but doing so was one of the motivators that encouraged us to do this feature. We realized that there would be an uptick in the number of "why can't I use a sequence of Giraffes in this method that takes a sequence of Animals?" questions, because LINQ encourages the use of sequence types. We knew that we wanted to add covariance to IEnumerable<T>
first.
We actually considered making IEnumerable<T>
covariant even in C# 3 but decided that it would be strange to do so without introducing the whole feature for anyone to use.
Won't this introduce the same problems like with
string[] <: object[]
(broken array variance) in C#?
It does not directly introduce that problem because the compiler only allows variance when it is known to be typesafe. However, it does preserve the broken array variance problem. With covariance, IEnumerable<string[]>
is implicitly convertible to IEnumerable<object[]>
, so if you have a sequence of string arrays, you can treat that as a sequence of object arrays, and then you have the same problem as before: you can try to put a Giraffe into that string array and get an exception at runtime.
How was the addition of the covariance done from a compatibility point of view?
Carefully.
Will earlier code still work on later versions of .NET or is recompilation necessary here?
Only one way to find out. Try it and see what fails!
It's often a bad idea to try to force code compiled against .NET X to run against .NET Y if X != Y, regardless of changes to the type system.
What about the other way around?
Same answer.
Is it possible that certain use cases will behave different now?
Absolutely. Making an interface covariant where it was invariant before is technically a "breaking change" because it can cause working code to break. For example:
if (x is IEnumerable<Animal>)
ABC();
else if (x is IEnumerable<Turtle>)
DEF();
When IE<T>
is not covariant, this code chooses either ABC or DEF or neither. When it is covariant, it never chooses DEF anymore.
Or:
class B { public void M(IEnumerable<Turtle> turtles){} }
class D : B { public void M(IEnumerable<Animal> animals){} }
Before, if you called M on an instance of D with a sequence of turtles as the argument, overload resolution chooses B.M because that is the only applicable method. If IE is covariant, then overload resolution now chooses D.M because both methods are applicable, and an applicable method on a more-derived class always beats an applicable method on a less-derived class, regardless of whether the argument type match is exact or not.
Or:
class Weird : IEnumerable<Turtle>, IEnumerable<Banana> { ... }
class B
{
public void M(IEnumerable<Banana> bananas) {}
}
class D : B
{
public void M(IEnumerable<Animal> animals) {}
public void M(IEnumerable<Fruit> fruits) {}
}
If IE is invariant then a call to d.M(weird)
resolves to B.M. If IE suddenly becomes covariant then both methods D.M are applicable, both are better than the method on the base class, and neither is better than the other, so, overload resolution becomes ambiguous and we report an error.
When we decided to make these breaking changes, we were hoping that (1) the situations would be rare, and (2) when situations like this arise, almost always it is because the author of the class is attempting to simulate covariance in a language that doesn't have it. By adding covariance directly, hopefully when the code "breaks" on recompilation, the author can simply remove the crazy gear trying to simulate a feature that now exists.
In order:
Is it just to make the annoying casts in LINQ expressions go away?
It makes things behave like people generally expect ;p
Won't this introduce the same problems like with
string[] <: object[]
(broken array variance) in C#?
No; since it doesn't expose any Add
mechanism or similar (and can't; out
and in
are enforced at the compiler)
How was the addition of the covariance done from a compatibility point of view?
The CLI already supported it, this merely makes C# (and some of the existing BCL methods) aware of it
Will earlier code still work on later versions of .NET or is recompilation necessary here?
It is entirely backwards compatible, however: C# that relies on C# 4.0 variance won't compile in a C# 2.0 etc compiler
What about the other way around?
That is not unreasonable
Was previous code using this interface strictly invariant in all cases or is it possible that certain use cases will behave different now?
Some BCL calls (IsAssignableFrom
) may return differently now
Is it just to make the annoying casts in LINQ expressions go away?
Not only when using LINQ. It's useful everywhere you have an IEnumerable<Derived>
and the code expects a IEnumerable<Base>
.
Won't this introduce the same problems like with
string[]
<:object[]
(broken array variance) in C#?
No, because covariance is only allowed on interfaces that return values of that type, but don't accept them. So it's safe.
How was the addition of the covariance done from a compatibility point of view? Will earlier code still work on later versions of .NET or is recompilation necessary here? What about the other way around?
I think already compiled code will mostly work as is. Some runtime type-checks (is
, IsAssignableFrom
, ...) will return true where they returned false earlier.
Was previous code using this interface strictly invariant in all cases or is it possible that certain use cases will behave different now?
Not sure what you mean by that
The biggest problems are related to overload resolution. Since now additional implicit conversions are possible a different overload might be chosen.
void DoSomething(IEnumerabe<Base> bla);
void DoSomething(object blub);
IEnumerable<Derived> values = ...;
DoSomething(values);
But of course, if these overload behave differently, the API is already badly designed.