The new sellsbrothers.com implementation has been a while in the making. In fact, I’ve had the final art in my hands
since August of 2005. I’ve tried several times to sit down and rebuild my
15-year-old sellsbrothers.com completely from scratch using the latest tools.
This time, I had a book contract (“Programming Data,” Addison-Wesley, 2010) and
I needed some real-world experience with Entity Framework 4.0 and OData, so I
fired up Visual Studio 2010 a coupla months ago and went to town.
The Data Modeling
The first thing I did was design my data model. I started very small with
just Post and Comment. That was enough to get most of my content in. And that
led to my first principle (we all need principles):
thou shalt have no .html files served from the file system.
On my old site, I had a mix of static and dynamic content, which led to all
kinds of trouble. This time, the HTML at least was going to be all dynamic. So, once I had my
model defined, I had to import all of my static data into my live system. For
that, I needed a tool to parse the static HTML and pull out structured data.
Luckily, Phil Haack came to my rescue here.
Before he was a Microsoft employee in charge of MVC, Phil was the well-known
author of the SubText open-source CMS project. A few years ago, in one of my
aborted attempts to get my site into a reasonable state (it has evolved from a
single static text file into a mish-mash of static and dynamic content over 15
years), I asked Phil to help me get my site moved over to SubText. To help me
out, he built
the tool that parsed my static HTML, transforming the data into the SubText database format.
For this, all I had to do was transform the
data from his format into mine, but before I could do that, I had to hook my
schema up to a real, live datastore. I didn't want to have to take my old web site
down at all; I wanted to have both sites up and running at the same time. This
led to principle #2:
thou shalt keep both web sites running with the existing live set of
data.
And, in fact, that’s what happened. For many weeks while I was building my
new web site, I was dumping static data into the live database. However, since
my old web site sorted things by date, there was only one place to even see
this old data being put in (the /news/archive.aspx page). Otherwise, it was all
imperceptible.
To make this happen, I had to map my new data model onto my existing data. I
could do this in one of two ways:
- I could create the new schema on my ISP-hosted SQL Server 2008 database
(securewebs.com rocks, btw — highly recommended!) and move the data over.
- I could use my existing schema and just map it on the client-side using
the wonder and beauty that was EF4.
Since I was trying to get real-world experience with our data stack, I tried
to use the tools and processes that a real-world developer has, and real-world
developers often don't get to change the database without a real need, especially on a running system. So, I went with option #2.
And
I'm so glad I did. It worked really, really well to change names and select the
fields I cared about (or didn't), all from the client side, without ever
touching the database. Sometimes I had to make database changes, and when that happened, I
was careful and deliberate, making the case to my inner DB administrator, but
mostly I just didn't have to.
And when I needed whole new tables of data, that led to another principle:
build out all new tables in my development environment first.
This way, I could make sure they worked in my new environment and could
refactor to my heart’s content before disturbing my (inner) DB admin with
request after request to change a live, running database. I used a very simple
repository pattern in my MVC2 web site to hide the fact that I was actually
accessing two databases, so when I switched everything to a single database,
none of my view or controller code had to change. Beautiful!
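To make that concrete, here's a minimal sketch of the kind of repository I'm describing. The names (IPostRepository, OldSiteEntities, NewSiteEntities, Post, Category) are made up for illustration and assume EF-generated ObjectContexts over the old live schema and the new development-only tables; this is the shape of the idea, not the actual site code.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical names for illustration -- not the actual sellsbrothers.com code.
public interface IPostRepository
{
    IEnumerable<Post> GetRecentPosts(int count);
    Post GetPost(int id);
    IEnumerable<string> GetCategoryNames();
}

// During the transition, one implementation straddled two databases:
// posts came from the old, live database; brand-new tables lived in the
// development database. Controllers only ever saw IPostRepository.
public class SplitPostRepository : IPostRepository
{
    readonly OldSiteEntities oldDb = new OldSiteEntities(); // EF model over the live schema
    readonly NewSiteEntities newDb = new NewSiteEntities(); // EF model over the new dev-only tables

    public IEnumerable<Post> GetRecentPosts(int count)
    {
        return oldDb.Posts
                    .OrderByDescending(p => p.PostedDate)
                    .Take(count)
                    .ToList();
    }

    public Post GetPost(int id)
    {
        return oldDb.Posts.Single(p => p.Id == id);
    }

    public IEnumerable<string> GetCategoryNames()
    {
        // Categories were one of the genuinely new tables, so they come from the dev database.
        return newDb.Categories.Select(c => c.Name).ToList();
    }

    // Collapsing to a single database later means replacing this class with a
    // one-context version; views and controllers don't change at all.
}
```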
Data Wants To Be Clean
And even though I was careful to keep my schema the same on the backend and
map it as I wanted in my new web site via EF, not all of my old data worked in my new
world. For example, I was building a web site on my local box, so anything with
a hard-coded link to sellsbrothers.com had to be changed. Also, I was using a
set of
To do this cleaning, I used a combination of LINQPad, SSMS and EF-based C#
code. This yielded two tools that I'm still using:
- BlogEdit: An unimaginatively named general-purpose post
and comment creation and editing tool. I built the first version of this
long before WPF, so I kept hacking on it in WinForms (whose data binding sucks
compared to WPF, btw) as I needed it to have new features. Eventually I gave
this tool WYSIWYG HTML editing by shelling out to Expression Web, but I need
real AtomPub support on the site so I can move to Windows Live Writer for
that functionality in the future.
- BulkChangeDatabaseTable: This was an app that I'd use
to run my queries to find “dirty” data, perform regular expression
replaces on it and then — and this is the best part — show the changes in
WinDiff so I could make sure I was happy with the changes before committing
them to the database. This extra eyeballing saved me from wrecking a bunch
of data.
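For the curious, here's roughly what that find/replace/diff loop looks like in EF-based C#. It's a sketch, not the actual BulkChangeDatabaseTable source: the example regex, the windiff.exe invocation and the SiteEntities/Posts/Content names are placeholders I made up.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

class BulkChangeSketch
{
    static void Main()
    {
        // Example cleaning task: absolute links back to the site become root-relative.
        var pattern = new Regex(@"http://(www\.)?sellsbrothers\.com/", RegexOptions.IgnoreCase);
        const string replacement = "/";

        using (var db = new SiteEntities()) // hypothetical EF ObjectContext
        {
            // Regex can't run inside a LINQ-to-Entities query, so pull the rows first.
            var dirty = db.Posts.ToList().Where(p => pattern.IsMatch(p.Content)).ToList();

            // Write before/after snapshots and eyeball them in WinDiff before touching anything.
            string before = Path.GetTempFileName();
            string after = Path.GetTempFileName();
            File.WriteAllLines(before, dirty.Select(p => p.Content).ToArray());
            File.WriteAllLines(after, dirty.Select(p => pattern.Replace(p.Content, replacement)).ToArray());
            Process.Start("windiff.exe", "\"" + before + "\" \"" + after + "\"").WaitForExit();

            Console.Write("Commit {0} changed rows? (y/n) ", dirty.Count);
            if (Console.ReadLine() == "y")
            {
                foreach (var post in dirty)
                    post.Content = pattern.Replace(post.Content, replacement);
                db.SaveChanges(); // only now does the database actually change
            }
        }
    }
}
```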
During this data cleaning, I applied one simple rule that I adopted early and
always regretted ignoring:
thou shalt throw away no data.
Even if the data didn’t seem to have any use in the new world, I kept it. And
it’s a good thing I did, because I always, always needed it.
For example, when I ran Phil's tool to parse my static web pages, it pulled
out the tags that went with all of my static posts. I wasn't
going to use them to build permalinks, so why did I need them?
I'll tell you why: because I've got 2600 posts in my blog from 15 years of
doing this, I cross-link to my own content all the live-long day and a bunch of
those cross-links are to, you guessed it, what used to be static data. So, I
have to turn links embedded in my content of the form “/writing/#footag” into
links of the form “/posts/details/452”. But how do I look up the mapping between
“footag” and “452”? That's right — I actually went to my (inner) DB admin and
begged him for a new column on my live database called “EntryName” where I
tucked away the data as I imported it from Phil's tool, even
though I didn't know why I might need it. It was a good principle.
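Here's the gist of what that EntryName column makes possible, as a hedged sketch. EntryName is the real column described above; the SiteEntities context, the Posts/Id names and the sample sub-folder list in the regex are my guesses, not the site's actual code.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class LegacyLinkFixer
{
    // Turns old anchor-style links like "/writing/#footag" into "/posts/details/452"
    // by looking the anchor up in the EntryName column. SiteEntities, Posts and Id
    // are illustrative names; EntryName is the column described above.
    public static string FixLegacyLinks(string html, SiteEntities db)
    {
        return Regex.Replace(
            html,
            @"/(writing|fun|news)/#(?<entry>[A-Za-z0-9_-]+)",  // sample of old sub-folder paths
            match =>
            {
                string entryName = match.Groups["entry"].Value;
                var post = db.Posts.FirstOrDefault(p => p.EntryName == entryName);

                // If there's no match, leave the link alone rather than break it.
                return post != null ? "/posts/details/" + post.Id : match.Value;
            });
    }
}
```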
Forwarding Old Links
And how did I even figure out I had all those broken links? Well, I asked my
good friend and web expert Kent Sharkey how to make sure my site was at least
internally consistent before I shipped it and he recommended
Xenu Link Sleuth for the
job. This led to another principle:
thou shalt ship the new site with no broken internal links.
Which was followed closely by another principle:
thou shalt not stress over broken links to external content.
Just because I’m completely anal about making sure every link I ever pass out
to the world stays valid for all eternity doesn’t mean that the rest of the
world is similarly anal. That's a shame, but there's nothing I can do if little
sites like microsoft.com decide to move things without a forwarding address. I
can, however, make sure that all of my internal links work, and I used Xenu
to do that. I started out with several hundred broken links and before I shipped
the new site, I had zero.
Not all of that was changing old content, however. In fact, most of it
wasn’t. Because I wanted existing external links out in the world to find the
same content in the new place, I had to make sure the old links still worked. That’s not to say I was a slave to the old URL format, however. I didn’t
want to expose .aspx extensions. I wanted to do things the new, cool, MVC way,
i.e. instead of /news/showTopic.aspx?ixTopic=452 (my old format), I wanted
/posts/details/452. So, this led to a new principle:
thou shalt build the new web site the way you want and make the old URLs
work externally.
I was using MVC and I wanted to do it right. That meant laying out the “URL
space” the way it made sense in the new world (and it’s much nicer in general,
imo). However, instead of changing my content to use this new URL schema, I
treated the old links embedded in my own content as a representative sample of
how links might be coming into my site from the real world, which gave me
initial data about which URLs I needed to forward. Going forward, I'll dig
through the 404 logs to find the rest and make those URLs work appropriately.
I used a few means of forwarding the old URLs:
- Mapping sub-folders to categories: In the old site, I
physically had the files in folders that matched the sub-folders, e.g. /fun
mapped to /fun/default.aspx. In the new world, /fun meant
/posts/details/?category=fun (there's a sketch of this route mapping after this
list). This sub-folder thing only works for the set
of well-defined categories on the site (all of which are entries in the
database, of course), but if you want to do sub-string search across
categories on my site you can, e.g. /posts/details/?category=foo.
- Kept sub-folder URLs, e.g. /tinysells and /writing: I
still liked these URLs, so I kept them and built controllers to handle them.
- Using the IIS
URL Rewriter: This was the big gun.
Jon Galloway, who was
invaluable in this work, turned me onto it and I’m glad he did. The URL
Rewriter is a small, simple add-in to IIS7 that lets you describe patterns and rules for
forwarding when those patterns are matched. I have something like a dozen
patterns that do the work to forward 100s of URLs that are in my own content
and might be out in the world. And it works so, so well. Highly recommended.
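The URL Rewriter rules themselves live in web.config, but the first two bullets above are plain MVC routing. Here's an illustrative sketch of that kind of route table; the names and the hard-coded category constraint are mine (on the real site, the categories are rows in the database), so treat it as a shape, not the actual code.

```csharp
using System.Web.Mvc;
using System.Web.Routing;

// Illustrative MVC2-style route registration, not the site's actual code.
public static class LegacyRouteConfig
{
    public static void RegisterRoutes(RouteCollection routes)
    {
        routes.IgnoreRoute("{resource}.axd/{*pathInfo}");

        // Old physical sub-folders map onto the new category-filtered posts page,
        // i.e. /fun behaves like /posts/details/?category=fun. The constraint list
        // is hard-coded here only for the sketch; the real categories live in the database.
        routes.MapRoute(
            "LegacyCategoryFolders",
            "{category}",
            new { controller = "Posts", action = "Details" },
            new { category = "fun|news" });

        // Kept sub-folder URLs like /tinysells get their own controllers.
        routes.MapRoute(
            "TinySells",
            "tinysells/{id}",
            new { controller = "TinySells", action = "Index" });

        // The new-world default: /posts/details/452.
        routes.MapRoute(
            "Default",
            "{controller}/{action}/{id}",
            new { controller = "Posts", action = "Index", id = UrlParameter.Optional });
    }
}
```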
So, with a combination of data cleaning (to make my content work across both
the old site and the new site under development), conventions I adopted to keep
some of my old URLs working, and URL rewriting, I had a simple,
feature-complete, 100% data-driven re-implementation of
sellsbrothers.com.
What’s New?
Of course, I couldn’t just reimplement the site without doing something new:
- Way, way faster. SQL Server 2008 and EF4 make the site
noticeably faster. I love it. Surfing from my box, as soon as the browser
window is visible, I’m looking at the content on my site. What’s better than
that?
- I made tinysells.com work again, e.g.
tinysells.com/42. I broke it when I moved
it from simpleurl.com to godaddy.com. Luckily, godaddy.com was just
forwarding to sellsbrothers.com/tinysells/, so that was easy to
implement with an MVC controller. That was all data I already had in the
database because John Elliot, another helper I had on the site a while ago,
set it up for me.
- I added reCAPTCHA support: Now I'm hoping I won't have
to moderate comments at all. So far, so good. Also, I added the ability to
add HTML content, which is encoded, so it comes right back the way it went
in, i.e. no active scripts or links or anything else a spammer would want, but
all the characters a coder wants when putting content into a technical blog.
- Per-category ATOM and OData feeds (and RSS feeds, too,
if you care). For example, if you click on the ATOM or OData icons on the
home page, you'll get the feed for everything. However, if you click on them
on one of the category pages, e.g. /fun, you'll get a feed filtered by
category.
- Paging in OData and HTML: This lets you scroll to the
bottom of both the OData feed and the HTML page and keep paging backwards and
forwards in time (there's a sketch of the HTML-side paging after this list).
- New layout, including a fixed-size content area for
readability, Google ads and Bing search (I'd happily replace the Google ads with
Bing ads if they'd let me).
- Nearly every sub-page is category-driven, although even
the ones that aren't, e.g. /tinysells and
/writing, are still completely
data-driven. Further, the writing page is so data-driven that if the data is
just an ISBN, it creates an ASIN associate link for amazon.com. Buy those
books, people! : )
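Here's a hedged sketch of what the HTML-side paging looks like with EF's Skip/Take; the controller shape, the SiteEntities context and the page-size constant are illustrative guesses, not the site's actual code. (On the OData side, WCF Data Services can do the equivalent with server-driven paging.)

```csharp
using System.Linq;
using System.Web.Mvc;

// Illustrative MVC2 paging sketch -- names are guesses, not the site's code.
public class PostsController : Controller
{
    const int PageSize = 10;

    // e.g. /posts/?page=2, or a category page with ?page=2
    public ActionResult Index(string category, int? page)
    {
        int pageIndex = page ?? 0;

        using (var db = new SiteEntities()) // hypothetical EF ObjectContext
        {
            IQueryable<Post> posts = db.Posts;
            if (!string.IsNullOrEmpty(category))
                posts = posts.Where(p => p.Category.Name.Contains(category)); // sub-string match

            var pageOfPosts = posts
                .OrderByDescending(p => p.PostedDate)
                .Skip(pageIndex * PageSize)   // paging back in time
                .Take(PageSize)
                .ToList();

            ViewData["NextPage"] = pageIndex + 1; // rendered as the "older posts" link
            return View(pageOfPosts);
        }
    }
}
```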
The Room for Improvement
As always, there’s a long list of things I wish I had time to do:
- The way I handle layout is with tables ’cuz I couldn’t
figure out how to make CSS do what I wanted. I’d love expert help!
- Space preservation in comments so code is formatted correctly.
I don’t actually know the right way to go about this.
- Blog Conversations: The idea here is to let folks put
their email on a forum comment so that when someone else comments, they’re
notified. This happens on forums and Facebook now and I like it for
maintaining a conversation over time.
- In spite of my principle, I didn’t get 100% of the HTML content
on the site into the database. Some of the older, obscure stuff is
still in HTML. It’s still reachable, but I haven’t motivated myself to get
every last scrap. I will.
- I can easily expose more via OData. I can’t think why
not to and who knows what folks might want to do with the data.
- I could make the site a little more readable on mobile devices.
- I really need full support for AtomPub so I can use
Windows Live Writer.
- I’d like to add the name of the article into the URL (apparently search
engines like that kind of thing : ).
- Pulling book covers on the writing page from the ISBN number
would liven up the joint, I think.
- Pass the SEO Toolkit check. (I’m
not so great just now.)
Luckily, with the infrastructure I’ve got in place now, laying in these
features over time will be easy, which was the whole point of doing this work in
the first place.
Where are we?
All of this brings me back to one more principle. I call it Principle Zero:
thou shalt make everything data-driven.
I’m living the data-driven application dream here, people. When designing my
data model and writing my code, I imagined that sellsbrothers.com was but one instance of a class of web
applications and I kept all of the content, down to my name and email address,
in the database. If I found myself putting data into the code, I figured out
where it belonged in the database instead.
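In practice, that looks something like this tiny, hypothetical sketch; the Settings table, the SiteEntities context and the key name are made up to show the shape of the idea, not the site's actual schema.

```csharp
using System.Linq;

// Principle Zero in miniature: even the site owner's name and email address
// come out of a table, not out of the code. All names here are hypothetical.
public static class SiteSettings
{
    public static string Get(string name)
    {
        using (var db = new SiteEntities()) // hypothetical EF ObjectContext
        {
            return db.Settings
                     .Where(s => s.Name == name)
                     .Select(s => s.Value)
                     .Single();
        }
    }
}

// Usage, instead of a hard-coded string in a view or controller:
//   var ownerEmail = SiteSettings.Get("OwnerEmail");
```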
This led to all kinds of real-world uses of the database features of Visual Studio, including EF, OData, Database
projects, SQL execution, live table-based data editing, etc. I lived the
data-driven dream and it yielded a web site that’s much faster and runs on much
less code:
- Old site:
- 191 .aspx files, 286 KB
- 400 .cs files, 511 KB
- New site:
- 14 .aspx files, 19 KB
- 34 .cs files, 80 KB
Do the math and that’s 100+% of the content and functionality for 10% of the
code. I knew I wanted to do it to gain experience with our end-to-end data stack
story. I had no idea I would love it so much.