The performance implications of IEnumerable vs. IQueryable
It all started innocently enough. I was implementing a “Older Posts/Newer Posts” feature for my new web site and was writing code like this:
IEnumerable<Post> FilterByCategory(IEnumerable<Post> posts, string category) {
if( !string.IsNullOrEmpty(category) ) {
return posts.Where(p => p.Category.Contains(category));
}
}
...
var posts = FilterByCategory(db.Posts, category);
int count = posts.Count();
...
The “db” was an EF object context object, but it could just as easily been a LINQ to SQL context. Once I ran this code, it failed at run-time with a null reference exception on Category. “That’s strange,” I thought. “Some of my categories are null, but I expect the ‘like’ operation in SQL to which Contains maps to skip the null values.” That should’ve been my first clue.
Clue #2 was when I added the null check into my Where expression and found that their were far fewer results than I expected. Some experimentation revealed that the case of the category string mattered. “Hm. That’s really strange,” I thought. “By default, the ‘like’ operation doesn’t care about case.” Second clue unnoticed.
My 3rd and final clue was that even though my site was only showing a fraction of the values I knew where in the database, it had slowed to a crawl. By now, those of you experienced with LINQ to Entities/SQL are hollering from the audience: “Don’t go into the woods alone! IEnumerable kills all the benefits of IQueryable!”
See, what I’d done was unwittingly switched from LINQ to Entities, which takes my C# expressions and translates them into SQL, and was now running LINQ to Objects, which executes my expressions directly.
“But that can’t be,” I thought, getting hot under the collar (I was wearing a dress shirt that day — the girlfriend likes me to look dapper!). “To move from LINQ to Entities/SQL to LINQ to Objects, I thought I had to be explicit and use a method like ToList() or ToArray().” Au contraire mon fraire (the girlfriend also really likes France).
Here’s what I expected to be happening. If I have an expression like “db.Posts” and I execute that expression by doing a foreach, I expect the SQL produced by LINQ to Entities/SQL to look like this:
select * from Posts
If I add a Where clause, I expect the SQL to be modified:
select * from Posts where Category like '%whatever%'
Further, if I do a Count on the whole thing, e.g.
db.Posts.Where(p => p.Contains(category)).Count()
I expect that to turn into the following SQL:
select count(*) from Posts where Category like '%whatever%'
And that’s all true if I keep things to just “var” but I wasn’t — I was being clever and building functions to build up my queries. And because I couldn’t use “var” as a function parameter, I had to pick a type. I picked the wrong one: IEnumerable.
The problem with IEnumerable is that it doesn’t have enough information to support the building up of queries. Let’s take a look at the extension method of Count over an IEnumerable:
public static int Count<TSource>(this IEnumerable<TSource> source) {
...
int num = 0;
using (IEnumerator<TSource> enumerator = source.GetEnumerator()) {
while (enumerator.MoveNext()) { num++; }
}
return num;
}
See? It’s not composing the source IEnumerable over which it’s operating — it’s executing the enumerator and counting the results. Further, since our example IEnumerator was a Where statement, which was in turn a accessing the list of Posts from the database, the effect was filtering in the Where over objects constituted from the following SQL:
select * from Posts
How did I see that? Well, I tried hooking up the supremely useful SQL Profiler to my ISP’s database that was holding the data, but I didn’t have permission. Luckily, the SQL tab in LinqPad will show me what SQL is being executed and it showed me just that (or rather, the slightly more verbose and more correct SQL that LINQ to Entities generates in these circumstances).
Now, I had a problem. I didn’t want to pass around IEnumerable, because clearly that’s slowing things down. A lot. On the other hand, I don’t want to use ObjectSet
Unlike IEnumerable, IQueryable exposes the underlying expression so that it can be composed by the caller. In fact, if you look at the IQueryable implementation of the Count extension method, you’ll see something very different:
public static int Count<TSource>(this IQueryable<TSource> source) {
...
return source.Provider.Execute<int>(
Expression.Call(null,
((MethodInfo) MethodBase.GetCurrentMethod()).
MakeGenericMethod(
new Type[] { typeof(TSource) }),
new Expression[] { source.Expression }));
}
This code isn’t exactly intuitive, but what’s happening is that we’re forming an expression which is composed of whatever expression is exposed by the IQueryable we’re operating over and the Count method, which we’re then implementing. To get this code path to execute for our example, we simply have to replace the use of IEnumerable with IQueryable:
IQueryable<Post> FilterByCategory(IQueryable<Post> posts, string category) {
if( !string.IsNullOrEmpty(category) ) {
return posts.Where(p => p.Category.Contains(category));
}
}
...
var posts = FilterByCategory(db.Posts, category);
int count = posts.Count();
...
Notice that none of the actual code changes. However, this new code runs much faster and with the case- and null-insensitivity built into the ‘like’ operator in SQL instead of semantics of the Contains method in LINQ to Objects.
The way it works is that we stack one IQueryable implementation onto another, in our case Count works on the Where which works on the ObjectSet returned from the Posts property on the object context (ObjectSet itself is an IQueryable). Because each outer IQueryable is reaching into the expression exposed by the inner IQueryable, it’s only the outermost one — Count in our example — that causes the execution (foreach would also do it, as would ToList() or ToArray()).
Using IEnumerable, I was pulling back the ~3000 posts from my blog, then filtering them on the client-side and then doing a count of that.With IQueryable, I execute the complete query on the server-side:
select count(*) from Posts where Category like '%whatever%'
And, as our felon friend Ms. Stewart would say: “that’s a good thing.”