May 20, 2010 colophon

a whole new sellsbrothers.com

The new sellsbrothers.com implementation has been a while in the making. In fact, I’ve had the final art in my hands since August of 2005. I’ve tried several times to sit down and rebuild my 15-year-old sellsbrothers.com completely from scratch using the latest tools. This time, I had a book contract (“Programming Data,” Addison-Wesley, 2010) and I needed some real-world experience with Entity Framework 4.0 and OData, so I fired up Visual Studio 2010 a coupla months ago and went to town.

The Data Modeling

The first thing I did was design my data model. I started very small with just Post and Comment. That was enough to get most of my content in. And that led to my first principle (we all need principles):

thou shalt have no .html files served from the file system.

On my old site, I had a mix of static and dynamic content, which led to all kinds of trouble. This time, the HTML, at least, was going to be all dynamic. So, once I had my model defined, I had to import all of my static data into my live system. For that, I needed a tool to parse the static HTML and pull out structured data. Luckily, Phil Haack came to my rescue here.

Before he was a Microsoft employee in charge of MVC, Phil was the well-known author of the SubText open source CMS project. A few years ago, in one of my aborted attempts to get my site into a reasonable state (it has evolved from a single static text file into a mish-mash of static and dynamic content over 15 years), I asked Phil to help me get my site moved over to SubText. To help me out, he built the tool that parsed my static HTML, transforming the data into the SubText database format. For this, all I had to do was transform the data from his format into mine, but before I could do that, I had to hook my schema up to a real, live datastore. I didn’t want to have to take my old web site down at all; I wanted to have both sites up and running at the same time. This led to principle #2:

thou shalt keep both web sites running with the existing live set of data.

And, in fact, that’s what happened. For many weeks while I was building my new web site, I was dumping static data into the live database. However, since my old web site sorted things by date, there was only one place to even see this old data going in (the /news/archive.aspx page). Otherwise, it was all imperceptible.

To make this happen, I had to map my new data model onto my existing data. I could do this in one of two ways:

  1. I could create the new schema on my ISP-hosted SQL Server 2008 database (securewebs.com rocks, btw — highly recommended!) and move the data over.
  2. I could use my existing schema and just map it on the client-side using the wonder and beauty that was EF4.

Since I was trying to get real-world experience with our data stack, I tried to use the tools and processes that a real-world developer has, and real-world developers often don’t get to change the database without a real need, especially on a running system. So, I went with option #2.

And I’m so glad I did. It worked really, really well to change names and select the fields I cared about (or didn’t care about), all from the client side, without ever touching the database. Sometimes I had to make database changes, and when that happened, I was careful and deliberate, making the case to my inner DB administrator, but mostly I just didn’t have to.

And when I needed whole new tables of data, that led to another principle:

build out all new tables in my development environment first.

This way, I could make sure they worked in my new environment and could refactor to my heart’s content before disturbing my (inner) DB admin with request after request to change a live, running database. I used a very simple repository pattern in my MVC2 web site to hide the fact that I was actually accessing two databases, so when I switched everything to a single database, none of my view or controller code had to change. Beautiful!
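
To make the repository idea concrete, here’s a minimal sketch with hypothetical type and context names (not the site’s actual code): the controllers only ever see the repository interface, so whether the data comes from one database or two is invisible to them.

// Hypothetical names throughout -- a sketch of the repository pattern described above.
public interface IPostRepository {
  IQueryable<Post> Posts { get; }
  IQueryable<Comment> Comments { get; }
}

// During the transition, one implementation pulled posts from the live (old)
// database and the new tables from the development database. When everything
// moved to a single database, only this class changed; no view or controller did.
public class SplitPostRepository : IPostRepository {
  readonly OldSiteEntities oldDb = new OldSiteEntities();   // assumed EF context names
  readonly NewSiteEntities newDb = new NewSiteEntities();

  public IQueryable<Post> Posts { get { return oldDb.Posts; } }
  public IQueryable<Comment> Comments { get { return newDb.Comments; } }
}

public class PostsController : Controller {
  readonly IPostRepository repository = new SplitPostRepository();

  public ActionResult Details(int id) {
    return View(repository.Posts.FirstOrDefault(p => p.Id == id));
  }
}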

Data Wants To Be Clean

And even though I was careful to keep my schema the same on the backend and map it as I wanted in my new web site via EF, not all of my old data worked in my new world. For example, I was building a web site on my local box, so anything with a hard-coded link to sellsbrothers.com had to be changed. Also, I was using a set of

To do this cleaning, I used a combination of LINQPad, SSMS and EF-based C# code. That work yielded two tools that I’m still using:

  1. BlogEdit: An unimaginatively named general-purpose post and comment creation and editing tool. I built the first version of this long before WPF, so I kept hacking on it in WinForms (whose data binding sucks compared to WPF, btw) as I needed it to have new features. Eventually I gave this tool WYSIWYG HTML editing by shelling out to Expression Web, but I need real AtomPub support on the site so I can move to Windows Live Writer for that functionality in the future.
  2. BulkChangeDatabaseTable: This was an app that I’d use to run my queries to find “dirty” data, perform regular expression replaces and then — and this is the best part — show the changes in WinDiff so I could make sure I was happy with the changes before committing them to the database. This extra eyeballing saved me from wrecking a bunch of data. (A sketch of that workflow follows this list.)
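
Here’s a rough sketch of that workflow, under some loud assumptions: a hypothetical FooEntities EF context with a Posts set and a Content property, windiff.exe on the path, and the hard-coded-link cleanup mentioned below as the example replacement. The real tool surely differs in the details.

// A sketch, not the actual BulkChangeDatabaseTable code: regex-replace a column,
// dump before/after to temp files, eyeball the diff in WinDiff, then commit.
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

class BulkChangeSketch {
  static void Main() {
    // example cleanup: make hard-coded links to the live site root-relative
    var pattern = new Regex(@"http://(www\.)?sellsbrothers\.com/", RegexOptions.IgnoreCase);
    var before = new StringBuilder();
    var after = new StringBuilder();

    using (var db = new FooEntities()) {   // hypothetical EF context
      var dirty = db.Posts.Where(p => p.Content.Contains("sellsbrothers.com")).ToList();
      foreach (var post in dirty) {
        before.AppendLine(post.Content);
        post.Content = pattern.Replace(post.Content, "/");
        after.AppendLine(post.Content);
      }

      // show the changes before committing them to the database
      var beforeFile = Path.Combine(Path.GetTempPath(), "before.txt");
      var afterFile = Path.Combine(Path.GetTempPath(), "after.txt");
      File.WriteAllText(beforeFile, before.ToString());
      File.WriteAllText(afterFile, after.ToString());
      Process.Start("windiff.exe", "\"" + beforeFile + "\" \"" + afterFile + "\"").WaitForExit();

      Console.Write("Happy with the diff? (y/n) ");
      if (Console.ReadLine() == "y") { db.SaveChanges(); }
    }
  }
}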

During this data cleaning, I applied one simple rule that I adopted early and regretted whenever I ignored it:

thou shalt throw away no data.

Even if the data didn’t seem to have any use in the new world, I kept it. And it’s a good thing I did, because I always, always needed it.

For example, when I ran Phil’s tool to parse my static web pages, it pulled out the tags that went with all of my static posts. I wasn’t going to use them to build permalinks, so why did I need them?

I’ll tell you why: because I’ve got 2600 posts in my blog from 15 years of doing this, I cross-link to my own content all the live-long day and a bunch of those cross-links are to, you guessed it, what used to be static data. So, I have to turn links embedded in my content of the form “/writing/#footag” into links of the form “/posts/details/452”. But how do I look up the mapping between “footag” and 452? That’s right — I actually went to my (inner) DB admin and begged him for a new column on my live database called “EntryName”, where I tucked away the data as I imported it from Phil’s tool, even though I didn’t know why I might need it. It was a good principle.
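
As a rough sketch of the fix-up that column made possible (using the same hypothetical FooEntities context and property names as above, not the site’s actual schema), the old fragment links get rewritten to the new URL shape by looking up EntryName:

// A sketch: turn "/writing/#footag" links into "/posts/details/{id}" links
// by looking up the imported EntryName column.
// Assumes using System.Linq and System.Text.RegularExpressions.
static void FixLegacyAnchors(FooEntities db) {
  var fragmentLink = new Regex(@"/writing/#(\w+)");
  foreach (var post in db.Posts.ToList()) {
    post.Content = fragmentLink.Replace(post.Content, match => {
      var entryName = match.Groups[1].Value;
      var target = db.Posts.FirstOrDefault(p => p.EntryName == entryName);
      // rewrite known anchors; leave anything unrecognized alone
      return target != null ? "/posts/details/" + target.Id : match.Value;
    });
  }
  db.SaveChanges();
}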

Forwarding Old Links

And how did I even figure out I had all those broken links? Well, I asked my good friend and web expert Kent Sharkey how to make sure my site was at least internally consistent before I shipped it, and he recommended Xenu Link Sleuth for the job. This led to another principle:

thou shalt ship the new site with no broken internal links.

Which was followed closely by another principle:

thou shalt not stress over broken links to external content.

Just because I’m completely anal about making sure every link I ever pass out to the world stays valid for all eternity doesn’t mean that the rest of the world is similarly anal. That’s a shame, but there’s nothing I can do if little sites like microsoft.com decide to move things without a forwarding address. I can, however, make sure that all of my links work internally, and I used Xenu to do that. I started out with several hundred broken links and, before I shipped the new site, I had zero.

Not all of that was changing old content, however. In fact, most of it wasn’t. Because I wanted existing external links out in the world to find the same content in the new place, I had to make sure the old links still worked. That’s not to say I was a slave to the old URL format, however. I didn’t want to expose .aspx extensions. I wanted to do things the new, cool, MVC way, i.e. instead of /news/showTopic.aspx?ixTopic=452 (my old format), I wanted /posts/details/452. So, this led to a new principle:

thou shalt build the new web site the way you want and make the old URLs work externally.

I was using MVC and I wanted to do it right. That meant laying out the “URL space” the way it made sense in the new world (and it’s much nicer in general, imo). However, instead of changing my content to use this new URL schema, I used it as a representative sample of how links to my content in the real world might be coming into my site, which gave me initial data about what URLs I needed to forward. Ongoing, I’ll dig through 404 logs to find the rest and make those URLs work appropriately.

I used a few means of forwarding the old URLs:

  1. Mapping sub-folders to categories: In the old site, I physically had the files in folders that matched the sub-folders, e.g. /fun mapped to /fun/default.aspx. In the new world, /fun meant /posts/details/?category=fun. This sub-folder thing only works for the set of well-defined categories on the site (all of which are entries in the database, of course), but if you want to do sub-string search across categories on my site you can, e.g. /posts/details/?category=foo.
  2. Kept sub-folder URLs, e.g. /tinysells and /writing: I still liked these URLs, so I kept them and built controllers to handle them (a sketch of one such controller follows this list).
  3. Using the IIS URL Rewriter: This was the big gun. Jon Galloway, who was invaluable in this work, turned me onto it and I’m glad he did. The URL Rewriter is a small, simple add-in to IIS7 that lets you describe patterns and rules for forwarding when those patterns are matched. I have something like a dozen patterns that do the work to forward 100s of URLs that are in my own content and might be out in the world. And it works so, so well. Highly recommended.
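
To make the controller approach in item 2 concrete, here’s a minimal sketch of a /tinysells-style forwarding controller. The entity set, property names and route registration are assumptions, not the site’s actual code, and it reuses the hypothetical FooEntities context from earlier.

// assumed route registration in Global.asax.cs:
//   routes.MapRoute("Tinysells", "tinysells/{id}",
//                   new { controller = "Tinysells", action = "Index" });
public class TinysellsController : Controller {
  readonly FooEntities db = new FooEntities();   // hypothetical EF context

  // e.g. /tinysells/42 redirects to whatever URL tiny link #42 points at
  public ActionResult Index(int id) {
    var link = db.TinyLinks.FirstOrDefault(t => t.Id == id);
    if (link == null) { throw new HttpException(404, "No such tiny link"); }
    return Redirect(link.TargetUrl);   // 302; a permanent 301 needs a custom result in MVC2
  }
}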

So, with a combination of data cleaning to make my content work across both the old site and the new site under development, conventions I adopted (and wanted to keep) that let some of my old URLs keep working, and URL rewriting for the rest, I had a simple, feature-complete, 100% data-driven re-implementation of sellsbrothers.com.

What’s New?

Of course, I couldn’t just reimplement the site without doing something new:

  • Way, way faster. SQL Server 2008 and EF4 make the site noticeably faster. I love it. Surfing from my box, as soon as the browser window is visible, I’m looking at the content on my site. What’s better than that?
  • I made tinysells.com work again, e.g. tinysells.com/42. I broke it when I moved it from simpleurl.com to godaddy.com. Luckily, godaddy.com was just forwarding to sellsbrothers.com/tinysells/, so that was easy to implement with an MVC controller. That was all data I already had in the database because John Elliot, another helper I had on the site a while ago, set it up for me.
  • I added reCAPTCHA support: Now I’m hoping I won’t have to moderate comments at all. So far, so good. Also, I added the ability to post HTML content, which is encoded, so it comes right back out the way it went in, i.e. no scripts or links or anything else a spammer would want — just the characters a coder wants when putting content into a technical blog.
  • Per category ATOM and OData feeds (and RSS feeds, too, if you care). For example, if you click on the ATOM or OData icons on the home page, you’ll get the feed for everything. However, if you click on it on one of the category pages, e.g. /fun, you’ll get a feed filtered by category.
  • Paging in OData and HTML: This lets you scroll to the bottom of both the OData feed and the HTML page to scroll backwards and forwards in time.
  • New layout, including a fixed-size content area for readability, Google ads and Bing search (I’d happily replace Google ads with Bing ads if they’d let me).
  • Nearly every sub-page is category-driven, and even the ones that aren’t, e.g. /tinysells and /writing, are still completely data-driven. Further, the writing page is so data-driven that if the data is just an ISBN, it creates an ASIN associate ID for amazon.com (a sketch of that trick follows this list). Buy those books, people! : )
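
As a rough sketch of that writing-page trick: for books, Amazon’s ASIN is just the ISBN-10, so an ISBN sitting in the database is enough to build an associate link. The URL shape is the standard /dp/ form; the ISBN and associate tag below are placeholders, not real values.

// A sketch of building an amazon.com associate link from a stored ISBN-10.
static string AmazonLinkFromIsbn(string isbn10, string associateTag) {
  // for books, the ASIN and the ISBN-10 are the same value
  return string.Format("http://www.amazon.com/dp/{0}/?tag={1}", isbn10, associateTag);
}

// e.g. AmazonLinkFromIsbn("0000000000", "mytag-20")
//      -> "http://www.amazon.com/dp/0000000000/?tag=mytag-20"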

The Room for Improvement

As always, there’s a long list of things I wish I had time to do:

  • The way I handle layout is with tables cuz I couldn’t figure out how to make CSS do what I wanted. I’d love expert help!
  • Space preservation in comments so code is formatted correctly. I don’t actually know the right way to go about this.
  • Blog Conversations: The idea here is to let folks put their email on a forum comment so that when someone else comments, they’re notified. This happens on forums and Facebook now and I like it for maintaining a conversation over time.
  • In spite of my principle, I didn’t get 100% of the HTML content on the site into the database. Some of the older, obscure stuff is still in HTML. It’s still reachable, but I haven’t motivated myself to get every last scrap. I will.
  • I can easily expose more via OData. I can’t think why not to and who knows what folks might want to do with the data.
  • I could make the site a little more readable on mobile devices.
  • I really need full support for AtomPub so I can use Windows Live Writer.
  • I’d like to add the name of the article into the URL (apparently search engines like that kind of thing : ).
  • Pulling book covers on the writing page from the ISBN number would liven up the joint, I think.
  • Pass the SEO Toolkit check. (I’m not so great just now.)

Luckily, with the infrastructure I’ve got in place now, laying in these features over time will be easy, which was the whole point of doing this work in the first place.

Where are we?

All of this brings me back to one more principle. I call it Principle Zero:

thou shalt make everything data-driven.

I’m living the data-driven application dream here, people. When designing my data model and writing my code, I imagined that sellsbrothers.com was but one instance of a class of web applications and I kept all of the content, down to my name and email address, in the database. If I found myself putting data into the code, I figured out where it belonged in the database instead.

This led to all kinds of real-world uses of the database features of Visual Studio, including EF, OData, Database projects, SQL execution, live table-based data editing, etc. I lived the data-driven dream and it yielded a web site that’s much faster and runs on much less code:

  • Old site:
    • 191 .aspx files, 286 KB
    • 400 .cs files, 511 KB
  • New site:
    • 14 .aspx files, 19 KB
    • 34 .cs files,  80 KB

Do the math and that’s 100+% of the content and functionality for 10% of the code. I knew I wanted to do it to gain experience with our end-to-end data stack story. I had no idea I would love it so much.

May 11, 2010 data

Entity Framework 4.0 POCO Classes and Data Services

If you’ve flipped on the POCO (Plain Ol’ CLR Objects) code generation T4 templates for Entity Framework to enable testing or just cuz you like the code better, you might find that you lack the ability to expose that same model via Data Services as OData (the Open Data Protocol). If you surf to the feed, you’ll likely see something like this:

The XML page cannot be displayed

Cannot view XML input using XSL style sheet. Please correct the error and then click the Refresh button, or try again later.


The following tags were not closed: feed. Error processing resource ‘http://localhost:10749/MyODataEndpoint.svc/Posts’


There are two problems. The first is that we’re not reporting the problem very well. You can’t see what’s happening in IE8 with a simple View Source, as apparently IE won’t show malformed XML. Instead, you have to use Fiddler or some other tool (I’m a big tcpTrace fan) to see the actual error in the HTTP response:


<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<feed ...>
<title type="text">Posts</title>
<id>http://localhost:8080/MyODataEndpoint.svc/Posts</id>
<updated>2010-05-11T22:48:13Z</updated>
<link rel="self" title="Posts" href="Posts" />
<m:error>
<m:code></m:code>
<m:message xml:lang="en-US">Internal Server Error. The type
'System.Data.Entity.DynamicProxies.Post_CF2ABE5AD0B93AE51D470C9FDFD72E780956A6FD7294E0B4205C6324E1053422'
is not a complex type or an entity type.</m:message>
</m:error>


It’s in the creation of the OData feed that the error happens, so instead of clearing the response and just returning the error, we dump it into the middle of the output, making it very difficult to find. In this case, what we’re telling you is that you’ve mistakenly left dynamic proxy creation on, which doesn’t work with EF4 POCO objects and Data Services in .NET 4.0. To fix this, you need to override the CreateDataSource method in your DataService-derived class:

public class MyODataEndpoint : DataService<FooEntities> {
  public static void InitializeService(DataServiceConfiguration config) {
    ...
  }

  // turn off dynamic proxy creation so Data Services sees the plain POCO types
  protected override FooEntities CreateDataSource() {
    var dataSource = new FooEntities();
    dataSource.ContextOptions.ProxyCreationEnabled = false;
    return dataSource;
  }
}


This solution came from Shyam Pather, a Dev Manager on the EF team. He says that once you turn off proxy generation, you give up lazy loading and “immediate” change tracking. Instead, you’ll get “snapshot” change tracking, which means the context won’t be informed when the properties are changed, but the context still detects changes when you call DetectChanges() or SaveChanges(). For the internals of a Data Service, none of this matters, but any code you write in query interceptors, change interceptors, or service operations will have to be aware of this.
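
Here’s a small sketch of what that snapshot behavior looks like from code you might write yourself, reusing the same FooEntities context as above (the Title property is an assumption):

// With proxies off, setting a property doesn't notify the context immediately;
// DetectChanges (which SaveChanges calls for you) finds the edit by comparing
// the entity against its snapshot.
using (var db = new FooEntities()) {
  db.ContextOptions.ProxyCreationEnabled = false;

  var post = db.Posts.First();
  post.Title = "Updated title";       // the context isn't told about this yet

  db.DetectChanges();                 // snapshot comparison happens here...
  var state = db.ObjectStateManager.GetObjectStateEntry(post).State;   // now Modified

  db.SaveChanges();                   // ...and SaveChanges would have called it anyway
}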

These limitations apply only when the context is used from the OData endpoint, of course. The rest of your app will get proxy creation by default unless you turn it off.

May 7, 2010 fun

Working Hard: WhirlyBall

What my team does on an average Wednesday afternoon:

It was surprisingly fun.

May 6, 2010

We’re taking OData on the Road!

We have a series of free, day-long events we’re doing around the world to show off the beauty and wonder that is the Open Data Protocol. In the morning we’ll be showing you OData and in the afternoon we’ll help you get your OData services up and running. Come one, come all!

  • New York, NY - May 12, 2010
  • Chicago, IL - May 14, 2010
  • Mountain View, CA - May 18, 2010
  • Shanghai, China - June 1, 2010
  • Tokyo, Japan - June 3, 2010
  • Reading, United Kingdom - June 15, 2010
  • Paris, France - June 17, 2010

Your speakers are going to include Doug Purdy, so book now. Spots are going to go fast!

April 20, 2010 oslofeaturedcontent

SQL Server Modeling CTP (November 2009 Release 3) for Visual Studio 2010 RTM Now Available

Here’s what Kraig has to say about the November 2009 SQL Server Modeling CTP that matches the RTM of Visual Studio 2010:

An update of the SQL Server Modeling CTP (November 2009) that’s compatible with the official (RTM) release of Visual Studio 2010 is now available on the Microsoft Download Center. This release is strictly an updated version of the original November 2009 CTP release to support the final release of Visual Studio 2010 and .NET Framework 4.

We highly recommend you uninstall and install in the following order.

  1. Uninstall any existing SQL Server Modeling CTP from Add and Remove Programs
  2. Uninstall Visual Studio 2010 and .NET Framework 4 Beta 2 or RC from Add and Remove Programs
  3. Install Visual Studio 2010 and .NET Framework 4
  4. Install the SQL Server Modeling November 2009 CTP Release 3.

If you are unable to uninstall SQL Server Modeling CTP from Add and Remove Programs for any reason, you can remove each component using the following command lines.  You need to run all three in order to completely remove SQL Server Modeling CTP so you can install the new CTP:

M Tools: Msiexec /x {B7EE8AF2-3DCC-4AFE-8BD2-5A6CE9E85B3A}

Quadrant: Msiexec /x {61F3728B-1A7D-4dd8-88A5-001CBB9D2CFA}

Domains: Msiexec /x {11DA75C8-10AB-4288-A1BB-B3C2593524A7}

Note: These steps will not remove the SQL Server Modeling CTP entry in Add and Remove Programs but you will be able to install the new CTP.

Thank you and enjoy Visual Studio 2010!

Kraig Brockschmidt
Program Manager, Data Developer Center

April 5, 2010 .net

The performance implications of IEnumerable vs. IQueryable

It all started innocently enough. I was implementing an “Older Posts/Newer Posts” feature for my new web site and was writing code like this:

IEnumerable<Post> FilterByCategory(IEnumerable<Post> posts, string category) {
  if( !string.IsNullOrEmpty(category) ) {
    return posts.Where(p => p.Category.Contains(category));
  }
  return posts;
}
...
  var posts = FilterByCategory(db.Posts, category);
  int count = posts.Count();
...

The “db” was an EF object context, but it could just as easily have been a LINQ to SQL context. Once I ran this code, it failed at run time with a null reference exception on Category. “That’s strange,” I thought. “Some of my categories are null, but I expect the ‘like’ operation in SQL, to which Contains maps, to skip the null values.” That should’ve been my first clue.

Clue #2 was when I added the null check into my Where expression and found that there were far fewer results than I expected. Some experimentation revealed that the case of the category string mattered. “Hm. That’s really strange,” I thought. “By default, the ‘like’ operation doesn’t care about case.” Second clue unnoticed.

My 3rd and final clue was that even though my site was only showing a fraction of the values I knew were in the database, it had slowed to a crawl. By now, those of you experienced with LINQ to Entities/SQL are hollering from the audience: “Don’t go into the woods alone! IEnumerable kills all the benefits of IQueryable!”

See, what I’d done was unwittingly switch from LINQ to Entities, which takes my C# expressions and translates them into SQL, to LINQ to Objects, which executes my expressions directly.

“But that can’t be,” I thought, getting hot under the collar (I was wearing a dress shirt that day — the girlfriend likes me to look dapper!). “To move from LINQ to Entities/SQL to LINQ to Objects, I thought I had to be explicit and use a method like ToList() or ToArray().” Au contraire, mon frère (the girlfriend also really likes France).

Here’s what I expected to be happening. If I have an expression like “db.Posts” and I execute that expression by doing a foreach, I expect the SQL produced by LINQ to Entities/SQL to look like this:

select * from Posts

If I add a Where clause, I expect the SQL to be modified:

select * from Posts where Category like '%whatever%'

Further, if I do a Count on the whole thing, e.g.

db.Posts.Where(p => p.Category.Contains(category)).Count()

I expect that to turn into the following SQL:

select count(*) from Posts where Category like '%whatever%'

And that’s all true if I keep things to just “var”, but I wasn’t — I was being clever and building functions to build up my queries. And because I couldn’t use “var” as a function parameter type, I had to pick a type. I picked the wrong one: IEnumerable.

The problem with IEnumerable is that it doesn’t have enough information to support the building up of queries. Let’s take a look at the extension method of Count over an IEnumerable:

public static int Count<TSource>(this IEnumerable<TSource> source) {
  ...
  int num = 0;
  using (IEnumerator<TSource> enumerator = source.GetEnumerator()) {
    while (enumerator.MoveNext()) { num++; }
  }
  return num;
}

See? It’s not composing the source IEnumerable over which it’s operating — it’s executing the enumerator and counting the results. Further, since our example IEnumerable was a Where statement, which was in turn accessing the list of Posts from the database, the effect was filtering in the Where over objects constituted from the following SQL:

select * from Posts

How did I see that? Well, I tried hooking up the supremely useful SQL Profiler to my ISP’s database that was holding the data, but I didn’t have permission. Luckily, the SQL tab in LINQPad will show me what SQL is being executed, and it showed me just that (or rather, the slightly more verbose and more correct SQL that LINQ to Entities generates in these circumstances).
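
If you can’t attach a profiler and aren’t in LINQPad, there’s one more way to peek (a sketch, assuming the query is still a LINQ to Entities query over the same db context as above): LINQ to Entities queries are ObjectQuery<T> under the hood, and ObjectQuery can print the store command it would run.

// A sketch: dump the SQL a LINQ to Entities query would execute.
var query = db.Posts.Where(p => p.Category.Contains(category));
var objectQuery = query as System.Data.Objects.ObjectQuery<Post>;
if (objectQuery != null) {
  Console.WriteLine(objectQuery.ToTraceString());   // the generated SQL
}
// if the cast comes back null, the query has already dropped to LINQ to Objects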

Now, I had a problem. I didn’t want to pass around IEnumerable, because clearly that was slowing things down. A lot. On the other hand, I didn’t want to use ObjectSet, because it doesn’t compose, i.e. Where doesn’t return an ObjectSet. What is the right interface to use to compose separate expressions into a single SQL statement? As you’ve probably guessed by now from the title of this post, the answer is: IQueryable.

Unlike IEnumerable, IQueryable exposes the underlying expression so that it can be composed by the caller. In fact, if you look at the IQueryable implementation of the Count extension method, you’ll see something very different:

public static int Count<TSource>(this IQueryable<TSource> source) {
  ...
  return source.Provider.Execute<int>(
    Expression.Call(null,
      ((MethodInfo)MethodBase.GetCurrentMethod()).MakeGenericMethod(
        new Type[] { typeof(TSource) }),
      new Expression[] { source.Expression }));
}

This code isn’t exactly intuitive, but what’s happening is that we’re forming an expression composed of whatever expression is exposed by the IQueryable we’re operating over plus the Count method itself, and handing that to the provider to execute. To get this code path to execute for our example, we simply have to replace the use of IEnumerable with IQueryable:

IQueryable<Post> FilterByCategory(IQueryable<Post> posts, string category) {
  if( !string.IsNullOrEmpty(category) ) {
    return posts.Where(p => p.Category.Contains(category));
  }
  return posts;
}
...
  var posts = FilterByCategory(db.Posts, category);
  int count = posts.Count();
...

Notice that none of the actual code changes. However, this new code runs much faster and with the case- and null-insensitivity built into the ‘like’ operator in SQL instead of the semantics of the Contains method in LINQ to Objects.

The way it works is that we stack one IQueryable implementation onto another; in our case, Count works on the Where, which works on the ObjectSet returned from the Posts property on the object context (ObjectSet itself is an IQueryable). Because each outer IQueryable is reaching into the expression exposed by the inner IQueryable, it’s only the outermost one — Count in our example — that causes the execution (foreach would also do it, as would ToList() or ToArray()).

Using IEnumerable, I was pulling back the ~3000 posts from my blog, then filtering them on the client side and then doing a count of that. With IQueryable, I execute the complete query on the server side:

select count(*) from Posts where Category like '%whatever%'

And, as our felon friend Ms. Stewart would say: “that’s a good thing.”

April 2, 2010 spout

College info for my sophomore

I went to a college planning session at my sons’ high school not because I’m hung up on getting my Sophomore into a top school, but because I thought I’d get a jump on things. I learned I was actually behind.

For one, I learned that the high school has an online system that will do some amazing things:

  • It will give my son a personality test and an interest test.
  • From those tests, it will tell him what kinds of careers he might want to consider.
  • Based on those careers, what major he should have.
  • From the major, what schools around the country offer it.
  • In those schools, what the entrance requirements are.

That means that my son can answer questions about personality and interests and draw a straight line through to what he needs to do to get into a school so he can learn to do the jobs he’ll like and be good at. Holy cow. We didn’t have anything like that when I was a kid.

Further, the online system has two complete SAT and ACT tests in it, so, along with the PSAT that he’s already taking, he can do a practice ACT, figure out which test he’s best at (my 34 ACT score was way better than my 1240 SAT) and just take that test, since most schools these days take either SAT or ACT results.

This is all freely provided by the high school and, in fact, they have counseling sessions with the students at each grade level for them to get the most from this system.

It’s no wonder that 93% of students from this high school go on to 4- or 2-year college degree programs.

That was the good part.

The scary part is that my eldest, halfway through his Sophomore year, is essentially halfway through his high school career. Colleges only see his grades through the end of Junior year, since most college applications are due in the middle of January of Senior year at the latest. I have to sit down with my son and have the conversation about how “even if you get a 4.0 from now on, the best grades you can have are…”

Is it just me or is the world moving faster with each passing day?

March 27, 2010 tools

Updated the CsvFileTester for Jet 4.0

I was playing around with building a tool to let me edit a database table in Excel, so I updated my CsvFileTester project to work in a modern world, including the 32-bit-only Jet 4.0 driver you’ve probably got lying around on your HD.
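
For anyone poking at the same thing, here’s a minimal sketch (not the CsvFileTester code itself) of reading a .csv file through the Jet 4.0 text driver. The folder and file names are placeholders, and since Jet is 32-bit only, the project has to be compiled x86.

// A sketch: query a CSV file via the Jet 4.0 OLEDB text driver.
using System;
using System.Data.OleDb;

class CsvReadSketch {
  static void Main() {
    // Data Source is the *folder*; the file name goes in the SELECT
    var connectionString =
      @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\data;" +
      @"Extended Properties=""text;HDR=Yes;FMT=Delimited""";

    using (var connection = new OleDbConnection(connectionString))
    using (var command = new OleDbCommand("select * from [sample.csv]", connection)) {
      connection.Open();
      using (var reader = command.ExecuteReader()) {
        while (reader.Read()) {
          Console.WriteLine(reader[0]);
        }
      }
    }
  }
}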

Enjoy.

