April 5, 2010 .net

The performance implications of IEnumerable vs. IQueryable

It all started innocently enough. I was implementing a Older Posts/Newer Posts” feature for my new web site and was writing code like this:

IEnumerable<Post> FilterByCategory(IEnumerable<Post> posts, string category) {
  if( !string.IsNullOrEmpty(category) ) {
return posts.Where(p => p.Category.Contains(category));
}
}
...
  var posts = FilterByCategory(db.Posts, category);
  int count = posts.Count();
...

The db” was an EF object context object, but it could just as easily been a LINQ to SQL context. Once I ran this code, it failed at run-time with a null reference exception on Category. That’s strange,” I thought. Some of my categories are null, but I expect the like’ operation in SQL to which Contains maps to skip the null values.” That should’ve been my first clue.

Clue #2 was when I added the null check into my Where expression and found that their were far fewer results than I expected. Some experimentation revealed that the case of the category string mattered. Hm. That’s really strange,” I thought. By default, the like’ operation doesn’t care about case.” Second clue unnoticed.

My 3rd and final clue was that even though my site was only showing a fraction of the values I knew where in the database, it had slowed to a crawl. By now, those of you experienced with LINQ to Entities/SQL are hollering from the audience: Don’t go into the woods alone! IEnumerable kills all the benefits of IQueryable!”

See, what I’d done was unwittingly switched from LINQ to Entities, which takes my C# expressions and translates them into SQL, and was now running LINQ to Objects, which executes my expressions directly.

But that can’t be,” I thought, getting hot under the collar (I was wearing a dress shirt that day — the girlfriend likes me to look dapper!). To move from LINQ to Entities/SQL to LINQ to Objects, I thought I had to be explicit and use a method like ToList() or ToArray().” Au contraire mon fraire (the girlfriend also really likes France).

Here’s what I expected to be happening. If I have an expression like db.Posts” and I execute that expression by doing a foreach, I expect the SQL produced by LINQ to Entities/SQL to look like this:

select * from Posts

If I add a Where clause, I expect the SQL to be modified:

select * from Posts where Category like '%whatever%'

Further, if I do a Count on the whole thing, e.g.

db.Posts.Where(p => p.Contains(category)).Count()

I expect that to turn into the following SQL:

select count(*) from Posts where Category like '%whatever%'

And that’s all true if I keep things to just var” but I wasn’t — I was being clever and building functions to build up my queries. And because I couldn’t use var” as a function parameter, I had to pick a type. I picked the wrong one: IEnumerable.

The problem with IEnumerable is that it doesn’t have enough information to support the building up of queries. Let’s take a look at the extension method of Count over an IEnumerable:

public static int Count<TSource>(this IEnumerable<TSource> source) {
...
int num = 0;
  using (IEnumerator<TSource> enumerator = source.GetEnumerator()) {
    while (enumerator.MoveNext()) { num++; }
  }
  return num;
}

See? It’s not composing the source IEnumerable over which it’s operating — it’s executing the enumerator and counting the results. Further, since our example IEnumerator was a Where statement, which was in turn a accessing the list of Posts from the database, the effect was filtering in the Where over objects constituted from the following SQL:

select * from Posts

How did I see that? Well, I tried hooking up the supremely useful SQL Profiler to my ISPs database that was holding the data, but I didn’t have permission. Luckily, the SQL tab in LinqPad will show me what SQL is being executed and it showed me just that (or rather, the slightly more verbose and more correct SQL that LINQ to Entities generates in these circumstances).

Now, I had a problem. I didn’t want to pass around IEnumerable, because clearly that’s slowing things down. A lot. On the other hand, I don’t want to use ObjectSet because it doesn’t compose, i.e. Where doesn’t return that. What is the right interface to use to compose separate expressions into a single SQL statement? As you’ve probably guessed by now from the title of this post, the answer is: IQueryable.

Unlike IEnumerable, IQueryable exposes the underlying expression so that it can be composed by the caller. In fact, if you look at the IQueryable implementation of the Count extension method, you’ll see something very different:

public static int Count<TSource>(this IQueryable<TSource> source) {
  ...
  return source.Provider.Execute<int>(
Expression.Call(null,
((MethodInfo) MethodBase.GetCurrentMethod()).
MakeGenericMethod(
new Type[] { typeof(TSource) }),
new Expression[] { source.Expression }));
}

This code isn’t exactly intuitive, but what’s happening is that we’re forming an expression which is composed of whatever expression is exposed by the IQueryable we’re operating over and the Count method, which we’re then implementing. To get this code path to execute for our example, we simply have to replace the use of IEnumerable with IQueryable:

IQueryable<Post> FilterByCategory(IQueryable<Post> posts, string category) {
  if( !string.IsNullOrEmpty(category) ) {
    return posts.Where(p => p.Category.Contains(category));
  }
}
...
  var posts = FilterByCategory(db.Posts, category);
  int count = posts.Count();
...

Notice that none of the actual code changes. However, this new code runs much faster and with the case- and null-insensitivity built into the like’ operator in SQL instead of semantics of the Contains method in LINQ to Objects.

The way it works is that we stack one IQueryable implementation onto another, in our case Count works on the Where which works on the ObjectSet returned from the Posts property on the object context (ObjectSet itself is an IQueryable). Because each outer IQueryable is reaching into the expression exposed by the inner IQueryable, it’s only the outermost one — Count in our example — that causes the execution (foreach would also do it, as would ToList() or ToArray()).

Using IEnumerable, I was pulling back the ~3000 posts from my blog, then filtering them on the client-side and then doing a count of that.With IQueryable, I execute the complete query on the server-side:

select count(*) from Posts where Category like '%whatever%'

And, as our felon friend Ms. Stewart would say: that’s a good thing.”