posted on Wednesday, August 25, 2004 10:01 AM by Benjy

Data Extraction Challenges

Data Extraction layer development should be classified with Health hazard warnings. Still keeping my head down (or rather, its been forced down with this millstone) and trying to finish off my current set of components.

As I mentioned in some of my previous entries, one of the requirements I have is to develop this component that extracts data from a legacy DB into XML Messages. Actually , the DB is not that old. In fact the application is only a couple of years old and implemented with SQL Server as data store. The key problems though are (a) the application is completely monolithic and not designed with any extraction requirements or services in mind. (It does what it needs to do, but only when used with its own front end) and  (b) from a DB design point of view, suffice to say that it does not pass muster.

Since the XSD's that govern the messages are so FAR FAR APART from the DB Design, writing SQLXML will not work. (At least, with my limited knowledge of SQL XML it wont work and I dont have the time to put in a 'spike' as the XP folk call it to investigate). I looked briefly at typed datasets but threw them away. Too much code to write.

So what did I settle for then?

  1. I created a set of classes for the XSD using the XSD.exe. I kept this in a separate TYPES dll.
  2. I created a set of SQL DB views corresponding to the major elements in the XSD. This encapsulated all the mappings to various tables, string manipulations, data format conversions, joins etc.
  3. In the main application I instantiated classes from the types DLL (collections wherever necessary) and then wrote the SQL Code to get the data from the views and fill up the class members.
  4. I wrote a little utility that generated the 'standard' code which instantiated the data reader and assigned values to the members from the data reader. This saved me a lot of time.
  5. Once i got the main objects, I tied them up to the root class and then used the default serialisation (XmlSerialiser) to generate the Xml file.

This works fine. In fact I'm thinking of more code generation options using the basic approach listed above.

Maybe the points above will help someone else. I'll explain why the typed datasets approach is no good for my requirements in a later post.

Have fun.

 

Comments