May 2005 - Posts

Ramblings on Data Layers and Relational Database Shortcomings

Sahil recently posted his thoughts on web services and enterprise system design, and in his usual blunt (yet effective and entertaining) manner questioned the need for business layers that simply delegate to a data layer without any extra work. It got me thinking. I get asked this question too, and my usual answer is usually three fold:

1. "because you'll need to isolate the specifics of the database when you come to change the data model later on" -- I mean everyone has to change their data model at some point, just because the next version needs some new features that require table changes. It's to save time in the long run.

2. In many cases your most important asset is your business logic. You don't want your business logic written in terms of the specifics of your database, because that reduces the value of your business logic and makes it harder to be maintained by the people who know more about the business than the database.

3. Commingling data access code and business logic also makes it harder to have your business logic work on more than one data source, or new data sources in the future. The general rule of thumb is that the data layer *finds* the data, and the business layer *uses* the data. Maybe you'll need to integrate your application with another that passes data in through XML files, how will your SQL-ridden business layer handle that? Maintaining the separation between the data finder and the data consumer ensures your business logic remains agile enough to work with multiple data sources.

Of course for the first point you could say the same thing about the business layer. This too will change going forward and you'll still have to change method parameters, objects etc. for the next version.

The second point deserves some discussion as it makes more sense. But I think it underlines one very simple fact: Databases are too hard. You don't want your business logic and data manipulation code together because they're seemingly incompatible. The data access code isn't just accessing data, it's decoding data. The data layer tries to make sense of the data in the database - but if the database could model data the way we use it, life wouldn't be so complicated in the first place!

Relational databases are very powerful, but they're also very distant from how we interact with that data once we get hold of it. Relational databases were first developed in the 70s, and were in use long before modern object orientated programming languages, and it shows. In fact the funny thing is, commercial relational databases today aren't really all that relational at all -- it's not like relations are a central feature of relational databases - the ability to relate data is central, but not the relationship itself. When I approach the data I'm often faced with having to find the views that give me the model I'm looking for, or recreate the relationships all over again in a new select statement. The list of tables resembles a flat-file. There's no central relational data model as your first point of access.

Also in the majority of cases (in databases in which I've worked on), there is only one primary relationship between two entities. By primary I mean the relationship you would use 90% of the time. For example, tasks and sub-tasks. Folders and files. Projects and tasks. Cars and parts. Patients and their medical history records. Customers and orders. Stocks and their historic prices. Sure you will likely need to relate this data differently at some point, but wouldn't it be useful to at least be given the primary relationships for free without having to do any joins at all? The way relational databases work today, you really have to dig around to find those relationships - even when they're obvious.

The primary relationship also helps identify the meaning of the data. For example, an 'orders' table doesn't tell you much other than the fact it probably contains orders for something. Opening up the table and seeing all those foreign keys probably won't help you much either. But when you can see a primary relationship between customers and orders, such that when you look at a customer it can take you straight to an order, it tells you that these are orders placed by your customers, as opposed to purchase req's for example. It adds meaning to the table that isn't otherwise obvious.

I think it's these failings of relational databases that are central to how unmanageable databases can become, and how complex data layers can develop.

Hierarchical data is a great way to present a primary relationship between entities. Hierarchies provide a very clear relationship -- child and parent -- between the different entities in your database. Relational databases don't make it easy to view your data hierarchically. I think this is why XML databases are coming into fashion. XML databases already have a hierarchical structure in place, and the hierarchy represents the primary relationship between entities. All major relational database vendors seem to be rushing to integrate both XML and semi-structured data access into their databases.

Another good example of hierarchical data is an object database. Object databases usually just persist the data in your objects transparently. Have you ever heard of a data layer in a system using an object database? You don't really need it because the data representation on disk is transparent: the data *is* the business objects. In object databases you maintain a separation between the specific data source and the business logic through interfaces, and this is usually enough of a separation to not require a data layer.

Exposing primary relationships is one of the goals of a good data layer. Yet no matter how good the data layer is, someone usually ends up bypassing it, especially in smaller shops where there's no real DBA. Reports are a classic example of this. How many times have systems spent months designing a great data layer and business layer and then developed their reports as stored procedures and crystal reports files? It's the easiest option and I see it all the time.

Yet if the database was structured and exposed in a more meaningful way not to require a data layer, directly accessing the data might not be such a bad thing. If the query language could return data in a format that fits in with modern programming language structures, then having the business layer talk straight to the query language might not be such a bad thing.

XML databases, XML access layers and the XML query language, XQuery, hold a lot of promise for revolutionizing the way we interact with persistent data. SQLXML, SQL Server 2005's XML integration, Oracle's 10g v2 and IBM's Viper efforts to expose relational data as hierarchical XML are great steps towards solving these problems and taking the focus away from complex data layers.

This brings me on to the third point -- data source dependency. Common wisdom is that your business logic should never request the data itself, it should always be provided the data, so as to remove any dependency on that particular source. Once it has the data, it doesn't matter where it came from so long as it's in a format it can understand -- be that a dataset, collection, XML or whatever. This is the foundation of what's become known as "contract first" programming.

A part of this problem is rooted in the fact that database query languages are so different to one another. Once a query language is adopted, you're bound to a particular data platform. Business logic uses OOP techniques to query data in an object model, the data layer uses SQL to query data in the database, and the user interface calls the business logic methods to query data. If they all used the same query language, we might not need quite so much separation and layering.

In fact, with the same query language, the only extra layer we would require would be a translator that can translate one model to another. And this translation could be accomplished with a declarative language like XSLT, not a general purpose programming language. If the data source was configurable externally and used the same query language, then switching data sources or models wouldn't require any code changes regardless of how many layers there were.

Perhaps W3C's XQuery or language innovations such as Microsoft Research's C-Omega project hold the key to solving a part of this problem, providing one ubiquitous query syntax for all data sources and all layers. Why should my UI be limited to a very restrictive business logic API for querying data? Why shouldn't it be able to run a query on the business layer? After all, objects are data too.

Anyway these are all just ongoing thoughts and a work-in-progress, as such there's probably lots of holes in my reasoning. If you have anything to add on the subject I'd be more than open to hearing your opinions.

The Usefulness Of #warning In C#

There are often times when I'm writing code that I knowingly take a shortcut, sometimes because of a deadline, sometimes because I'm impatient and just want it to work, and sometimes because of pure laziness. It might not even be a shortcut, just that I'm aware that this one particular bit of code will be critical to the performance of the application and might need reviewing to make sure it's written to run as optimal as possible. 

The guilt of writing such code is often unbearable so I decorate the area of code with something that reminds me to go back and visit it later. For this I use the #warning, which lets you generate a level 1 warning in your source code. The warning will appear every time you compile so it's quite difficult to ignore. I know that some use // TO DO, but personally I don't find this very useful because the todo item appears in your task list and is lost / filtered out when you build.

It's also something that's easy to search on, for those days when you're looking for some code to optimize - you just do a find-in-files for #warning and that's your task list for the day.

tilde: XML Websites Without Any Programming

Click here to download a tool I wrote for creating data-driven websites without any programming or databases, from any editor capable of outputting HTML.

I recently finished recreating my website (http://www.servicestuff.com, in about 3 hours) using this tool. I've called the tool "tilde the site builder", it's simple to setup and use, and its purpose is to let you create data-driven websites in your favorite designer (Word, FrontPage, anything that outputs HTML). The data comes from an XML file, and simple commands that can be inserted along with the text of your site let you bind to different parts of the XML file.

eg. When editing the web page, rather than type your company name, put ~/companyInfo/name~ (an xpath) and tilde will automatically pull that path out of the XML file and insert it into the file. It also pulls sequences out of the file (eg. $/companyInfo/employee$) and can put the result into tables with similar syntax. And you don't have to edit any HTML by hand, you just focus on the WYSIWYG aspect of editing the page. And this substitution is all done server side so you don't have to rely on any specific browser.

Tilde is a very cool tool for several reasons:

1. It's really simple.
2. I can use whichever designer/editor I want -- I'm not stuck with Visual Studio. A friend of mine used Excel to generate a data-driven website! Personally I use a combination of FrontPage and Microsoft Publisher.
3. I just need to design a couple of pages and the XML creates the rest of the site.
4. No need for a SQL database and the headache of setting up and maintaining a database, not to mention the price.
5. I can keep a backup and history of my entire site content, by putting the XML file into source control.
6. It uses XPath to address parts of the XML, and XPath is cool.
7. It even supports master pages, so you don't have to copy banners and navigation bars to each page.
8. It comes with a little web server of its own, so you can design your site locally off-line and just upload it when you're done.
9. It works with any ASP.Net enabled site, just by copying files on to the server. You can make the site live in minutes.
10. It's a really simple way to target multiple platforms without having to recreate code or data (WEP, PocketPC etc.)

It's a lot like using XSLT without having to know XSLT. XPath and the tilde commands are really simple to learn compared to having to know all of XSLT, I even provide a quick intro to XPath in my guide to using tilde-SiteBuilder.

Anyway, while I'm testing it you can download it for free, so go ahead and try it out and let me know what you think.

Also, if you're interested in the templates I used on my site, add a comment and I'll mail them to you.