Database
Database
I was finally able to attend the June Atlanta Cutting Edge .NET UG meeting yesterday. Shawn Wildermuth gave a great presentation the on VS2008 SP1 data technologies: ADO.NET Entity Framework and ADO.NET Data Services (Astoria).
Shawn's recommendations were:
- The Entity Framework (EF) is for enterprises with a large, stable data store
- The most important part of EF is the Entity Data Model (EDM) itself. It is very robust in the mappings it can handle and the extensibility points available.
- Enterprises should hold off and see whether Microsoft and third parties will invest in the EDM. He recommends nHibernate for now.
- ADO.NET Data Services (DS) are only useful for Internet (not intranet or, presumably extranet) where data security is not an issue. His biggest concern was the serialization overhead. I basically interpret that to mean that you should not use DS for essentially 2 tier development.
Interesting points were:
- Shawn was unsure whether EF exposes the model to consumers. I will have to investigate. He did believe, however, that the model is extensible.
- They don't have model/db generation from objects, but they DO have round tripping between the model and the data store.
- As discussed in the July 2007 MSDN article on EF, the Designer is lightweight and you should prepare yourself to be designing in angle brackets for the 1.0 release at least.
- In 1.0, at least, EF will not support Unit of Work and POCO objects. It will support, however, what they call IPOCO, which means the standard data binding interfaces of INotifyPropertyChanged and INotifyCollectionChanged.
- EF can only model/update a single database in 1.0
- DS is a simple proxy layer (my words) over any LINQ source (IQueryable).
- DS provides a command line code code generation step that provides a client side LINQ interface for .NET and Silverlight and a LINQ-like interface for ECMAScript. Nice. Anyone have a MSBuild task for it yet?
It is common to use SQL Scripts to create or initialize a DB from a setup program. For example TS DB Pro integrates with SQLCMD variable syntax.
One issue with this is that an MSI may need access to SQLCMD.exe. I've picked up the bits from a local SQL Server 2005 installation, but you don't have to do that. Feature Pack for Microsoft SQL Server 2005 provides a redistributable package for just SQLCMD:
Microsoft SQL Server 2005 Command Line Query Utility
The SQLCMD utility allows users to connect, send Transact-SQL batches, and output rowset information from SQL Server 7.0, SQL Server 2000, and SQL Server 2005 instances. SQLCMD is a replacement for ISQL and OSQL, but can coexist with installations that have ISQL or OSQL installed.
Note: Microsoft SQL Server 2005 Command Line Query Utility requires Microsoft SQL Server Native Client, also available on this page.
Audience(s): Customer, Partner, Developer
A compilation of SQL Server T-SQL Guidelines.
-
Best Practices Analyser for Microsoft SQL Server 2000 1.0
-
Brian Walker
-
-
Vyas Kondreddi
-
CMS, Centers for Medicare and Medicaid Services, US Department of Health and Human Services
-
Craig Mullins
Let me know if you know of any other in print or online.
Update: I guess Intellisense isn't there. It's a mystery. Please give lots of comments so people know we are serious about wanting Intellisense.
I was extremely disappointed when SQL Server 2005 Management Studio and even the text editor for SQL scripts in Visual Studio 2005 did not support Intellisense for database objects. I was incredulous. This is a feature that is probably available in Team Edition for Database Professionals but not in the base IDE. I am shocked that basic editing, source control, and project features afforded other modules in VS2005 is not available to database projects.
However:
red-gate software has stepped into the gap and provided SQL Prompt, which provides Intellisense features in SQL Server 2005 Management Studio and SQL Server 2000 Query Analyser, for free.
Run, don't walk.
I've been helping my client move to new technologies (VS2005, Team System, EntLib2). Unfortunately, SQL Server 2005 isn't yet in the road map. The pilot application has the typical Employee hierarchy. In SQL Server 2005, recursive queries are a breeze using CTEs once you get the hang of it...but they're not available in SQL SERVER 2000. To approximate recursive queries, one uses Table-valued User-defined Functions (TVF) in SQL Server 2000. Unfortunately, it's impossible to use anything but a constant as the parameters in SQL Server 2000. To get around this limitation, SQL Server 2005 now has the APPLY query operator.
I call this double jeopardy!
More info on CTEs, TVFs and APPLY operator for SQL Server Magazine (subscription required)
It took me longer to find this than I expected so I'll post it here.
I've frequently been harping on SQL Server Express (SSE) as a replacement for Access and any kind of local persistent data caching or storage. Here's how you open a local database mdf file. In addition SSE can create an instance for you where the server is called a User Instances. So pick your poison. I'll have to play with it to figure out mine. If you've been playing with it for a while, what do you recommend?
Update: Added Link fro Sql Server Magazine.
Since 2001 I have advocated Guids as surrogate primary keys in database tables over identity fields. The reasons are two fold. First, I can design for "one way traffic" in that I don't need to query the identity field and wire it back in saving a lot of goo code. Second, I can wire up an object graph with persistent ids and save them in one go, saving significant round trip costs. A third side benefit is that any data error in the object graph can be tagged with the Guid so that any client can navigate to the offending object. All in all, an ORM-geek's dream.
The catch is clustered key index fragmentation. I'll point you shortly to blog post by Kimberly Tripp that explains the issue. Suffice it to say that Guid primary keys it has been a hard sell to DBAs (Having a surrogate key itself can be a hard sell). However, I believe that "Code for Clarity. Optimize Empirically" applies to databases as well. A clear data model makes everyone's life easier. So what to do?
Well, SQL Server 2005 now supports the newsequentialid function. Defined only for the default clause, it creates a unique monotonically increasing uniqeidentifer field. Kimberly Tripp gives a thumbs-up along with some great analysis. A must read.
Now the issue, of course, is that I want to create Guids in the Id properties of the objects. I can think of two steps to enable compatibility. The first step is a configurablity switch so that some model objects do not create their Guids. You would need to inactivate the objects after they are persisted or implement round-tripping the keys, which is untenable for object graph persistence. However if you need the performance you already are special-casing your code, so some extra thought for these objects should not be a real issue.
Paul Wilson asked:
I've always been under the impression that the main reason to choose guids over int (or bigint) identities was for either replication and/or so clients could generate their own unique keys. But it doesn't seem like this new sequential "guid" can be of any value in either of these situations -- so why not just use an identity?
For me the issue is object graphs. To save an object graph performantly you need to put all the inserts/updates/deletes in one batch. When you do so, you need some prewiring of the objects, since, on insert their persistant ids (if you are not using identities) would not exist yet. Even if you did some sort of use of @@identity in the batch you still have the problem of handling errors in the batch. If you have an error persisting change in an object in an object graph (or a collection of objects, for that matter) how do you communicate which object erred? You need a prewired id. Having a Guid solves all these problems.
Moreover, this benefit reaches the whole way up and down the call stack. If you have a JScript Ajax client call a web service call an other web service call the database, you need to propagate the ids and/or errors the whole way up the call chain. This either produces a lot of cumbersome coding or painful architectonic. The Guid, however is painless: anyone can generate a Guid, so objects can be created anywhere along the chain and anyone who needs a persistent handle to the object a priori gets one for free.
Thus the question remains: how do you integrate a frame work that depends on early generation of Guids with newsequentailid, which requires late generation of Guids. That was what I was addressing in this post.
Oh, a final benefit is it's easy to encrypt a Guid; some ints are rather small for encrypting, in my opinion. An encrypted Guid makes a nice opaque id in a URL. Remember, don't forget the salt!
More on newsequentialid from SQL Server Magazine (subscription required)
On a recent project, a reviewer mentioned that SQL Server Express should never be used in a corporate application because companies would never install a database on the client. This is wrong headed because of two important trends: IT Centralization and ClickOnce Deployment.
As part of an organization that had many hundreds of Access applications to migrate to a new Wide Area File System (WAFS) infrastructure I can inform you on the pain of herding a plethora of unmanaged databases. As the tools to centrally manage IT improve, it will become more and more untenable for a corporation to accept unmanageable databases.
I say unmanageable databases, because the second trend, ClickOnce Deployment, not only speaks of centralized deployment but also of unprivileged user installations; really, self-service installations. This is the "trick" of Express Edition that older school architects are missing, in my opinion. You get the deployment and database management ease of an MDB file but the power of a real database. With Expression Edition being virtually part of the operating system (something you can enforce with centralized IT), internal teams can play with MDF files just as easy as they did with MDBs, but, when it is time to grow up, the switch to a centralized server is painless.
A final trend is simply inevitability: JET and MDAC are dead; Long Live SQL Server Express Edition. Deal with it.
Karl Gram created
dbUnit, a T-SQL based unit test creation tool that integrates with NUnit. Sweet.
Here's more news on Oracle jumping on the .NET bandwagon.
Indian blogospherian Anjana notes that Oracle has just launched a .NET developer center. Time for you cross-breeds to link up. Anjana has more details.
Actually, several groups at our company are big .NET and Oracle houses. If only we had forty hour days...
One of the most frustrating features of T-SQL versus PL/SQL was always that nested scripts and command line variables were not defined. This made native “build from scratch” database scripts impossible to write in a straight forward manner. One was left with either using SQLDMO or having a script template and a preprocessor to build the scripts on the fly.
In fact, the latter is the approach I took for one former client. I defined an XML vocabulary for sequencing definable groups of SQL scripts and a token replacement scheme to approximate command line variables and pseudo-variables. Like WSH all over again. No fun...and now, no longer necessary!
Having been a version control manager for several years, I am a strong advocate of “build from scratch” databases rather than what I call “sourdough” database builds, where you have an “all the cooks in the soup” development database that you clean and copy using “Backup/Restore” or DTS. Of course there are diffing programs and code you can write to automatically shove new database objects to a version control system. I don't think it's wise to play fast and loose with, what I consider, the most valuable part of your code base.
Now databases can be bootstrapped using the standard SQL Server 2005 tool-set via SQLCMD. Sweet!
Update: Early Adoper has also picked up on the significance of using SQL Server Express (SSX) to replace Jet. Go there and read the example code. Kent Tengles shares some caveats to remember accessing SSX from OLEDB. Via Benjamin Mitchell.
Ok, a little hyperbole. Access will probably never die. However, the Jet engine can be replaced!
In his post about changes to ASP.NET 2.0, Shanku Niyogi, announces that AccessDataSource, which enabled binding to Access databases using the Jet engine is being dropped in favor of one that provides binding to SQL Server 2005 Express Edition. Some are upset, believing that Microsoft is pushing people toward SQL Server Express Edition as a marketing ploy. However, Benjamin Mitchell, and the rest of us that have had to deal with Access databases that have long outgrown their britches are rejoicing.
As referencd in the SQL Server 2005 Express Edition Overview, The key enabling feature on SQL Server Express Edition is XCopy deployment of database files. Now you can deploy real, robust databases as easily as you could the very corrupatble access database files.
MSDE was a notorious deployment bad boy. The Express Edition Overview, referenced above, explains that the problematic merge module technology is no longer available. However that presents a non-trivial deployment story. A self-installing executable or MSI file can be used. There is also a 36M auto-installing web download available. Either way, a seamless installation is probably not easily doable. The Jet installations are 4 MB although there is a different one for each operating system. More over SQL Server 2005 Express Edition requires that the .NET 2.0 runtime also be installed. Some installation creativity required.
The Express Edition install includes 50 product licenses, meaning that up to fifty programs can use the Express Edition engine for free. You only need to install it once. If only it were part of the operating system... Looks like we may have to wait until 2008 for that.
Ingo Rammer, N-Tier development expert, shares three steps for better performing and more scalable database access. His team, thinktecture, does a whole lot of architectural consulting on these issues. Highly recommended.
The three steps are:
- Separate your data according to transaction volume
- Choose an UPDATE strategy to eliminate deadlocks
- Reduce the number of database round trips with application-specific, optimized SQL.
Here's an awesome demo of C-omega with Gavin Bierman of MSR UK. It shows SQL, XML, and XQuery built into a MSIL language. Very Nice! Best of all, the compiler preview is available for download and compiles down to MSIL like any other .NET language: You can write your favorite application in it if you want!
Gavin was asked whether C-omega would be making its way into the next C#. One thing to consider is that one feature, nullable types, have already made it in!
It also would be a good reason as to why MS has strategically chosen XQuery over XSLT.
Given Don Box's recent musing on “the impedance mismatch” which was previously championed by Dare Obasanjo, “The Great Convergence” does not look so far off.
By the way, this, to me, is probably the most important reason to switch to C# from VB.NET, etc. C# is a future language with international standardization muscle; the others are just legacy language reruns. Don't write new code, especially class library code, in them!
Don Demsak (DonXML) has a great posting that compares the relative performance of four ways to stuff an object from the database:
- DataReader
- DataSet
- XPathNavigator
- XmlSerialization
They are in order :-)
Not surprisingly, XmlSerialization is the slowest (by 47%). Something to think about, no?
Don also references a nice anti-DataSet for Web Services post by Scott Hansleman.
Brad Abrams leads us to a new BCL Blog posting on DateTime by Anthony Moore.
After noting that changing DateTime to use Utc Dates is impractical, Anthony recommends how to serialize dates in binary and text formats.
The comments section raises issues I have in mind. Namely:
- Whidbey should include a new class that supports Utc internal date format.
- All dates should be stored as Utc. My personal feeling is that only display should be tied to timezone of the local user, everything else should be UTC.
- Some dates are truly timezone neutral, such as time-of-day values and dates, such as birth dates.
An interesting note here is that SQL Server 2005 will include a DateTime that remembers it's timezone. I couldn't find more information on it (Does anyone have a good posting explaining the new datatypes?), but I did find that (as is just mentioned) .NET 2.0 will not support the new Date and Time types.
Note MSDN already has a good outline of DateTime best practices.