XML
XML-related technologies. Also see Web Services and WSE
At the end of Aaron Skonnard's September 2006 Service Station article on System.Xml 2.0, he summarizes with a list of guidelines. Read and do.
- Always use the static Create factory methods for creating readers and writers, even when you need support for things such as validation.
- If you care about performance, you should always use XPathDocument as your in-memory store when querying or transforming the document.
- Only use XmlDocument when you need an editable store, and when you do need one, use XPathNavigator to write the updating logic.
- Always use XslCompiledTransform to execute XSLT transformations when you're concerned about performance.
- Take advantage of the various API improvements to simplify your code.
- Read "What's New in System.Xml for Visual Studio 2005 and the .NET Framework 2.0 Release," by Mark Fussell
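A few of these guidelines can be sketched in one small program. This is only an illustration, not code from Aaron's article: the sample XML, the stylesheet, and the class and method names are all made up for the example. It creates readers through the static XmlReader.Create factory, queries through a read-only XPathDocument, and transforms with XslCompiledTransform.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

public static class GuidelinesSketch
{
    // Hypothetical sample data, invented for this sketch.
    const string Orders =
        "<orders><order id='1' total='10'/><order id='2' total='32'/></orders>";

    // Hypothetical stylesheet that outputs the number of orders as text.
    const string Stylesheet =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output method='text'/>" +
        "<xsl:template match='/'><xsl:value-of select='count(/orders/order)'/></xsl:template>" +
        "</xsl:stylesheet>";

    // Guidelines 1 and 2: factory-created reader feeding a read-only XPathDocument.
    public static double SumTotals()
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = true;
        using (XmlReader reader = XmlReader.Create(new StringReader(Orders), settings))
        {
            XPathDocument doc = new XPathDocument(reader);
            XPathNavigator nav = doc.CreateNavigator();
            // XPath sum() evaluates to a number, which comes back boxed as a double.
            return (double)nav.Evaluate("sum(/orders/order/@total)");
        }
    }

    // Guideline 4: XslCompiledTransform over an XPathDocument.
    public static string CountOrders()
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(styleReader);
        }

        XPathDocument doc;
        using (XmlReader reader = XmlReader.Create(new StringReader(Orders)))
        {
            doc = new XPathDocument(reader);
        }

        StringWriter output = new StringWriter();
        xslt.Transform(doc, null, output);
        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(SumTotals());
        Console.WriteLine(CountOrders());
    }
}
```

In a real application the loaded XslCompiledTransform would normally be kept around and reused, since compiling the stylesheet is the expensive part.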
Tim Ewald shares his solution for XML Schema versioning problems. Instead of using xs:any, which causes lots of problems, have the schema validator simply ignore nodes it doesn't recognise and validate those it does recognise. Tim says schema validators will allow you to ignore validation errors, and says he will have an example soon. The contract is:
- A service can evolve its contract in a controlled way without breaking clients
- Clients must assume that the contract they get is a snapshot in time and the service is free to evolve its contract in a controlled way
- An application producing an XML instance should make sure it matches the schema that application is using
- An application consuming an XML instance should assume it matches the schema that application is using plus additional elements
- If an application consuming an XML instance wants to schema validate it, it should be forgiving in how it deals with unknown elements in the stream and should not simply throw exceptions
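Until Tim's example shows up, here is a rough sketch of the "forgiving" idea in System.Xml 2.0. It is not Tim's code. The schema, the instance, and the names are invented for the example, and it only shows the basic mechanism: attach a ValidationEventHandler so that problems are reported to your code instead of being thrown as exceptions. Deciding which reported problems are "unknown element" cases that can be safely ignored is left out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class ForgivingValidation
{
    // Hypothetical "version 1" contract: an order with a single id element.
    const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "<xs:element name='order'><xs:complexType><xs:sequence>" +
        "<xs:element name='id' type='xs:int'/>" +
        "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Reads the whole instance and returns the validation problems it saw.
    // Because a handler is attached, the reader reports problems and keeps going.
    public static List<string> Validate(string xml)
    {
        List<string> problems = new List<string>();

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));
        settings.ValidationEventHandler += delegate(object sender, ValidationEventArgs e)
        {
            // Record instead of throwing; the caller decides what is fatal.
            problems.Add(e.Severity + ": " + e.Message);
        };

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return problems;
    }

    public static void Main()
    {
        // A newer producer added giftWrap, which the old schema does not know.
        string newer = "<order><id>7</id><giftWrap>true</giftWrap></order>";
        foreach (string problem in Validate(newer))
        {
            Console.WriteLine(problem);
        }
    }
}
```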
He has his code and several more entries:
Dare's article on Schematron a ways back showed how to use Schematron for validation and mix it with XML Schema. The architectural question is: when should you use it?
As a corporate architect I have heard of many projects where agreement on the XSD contract was the easy part. It is the rest of the taxonomic rules that took the most work between partner companies. This is where Schematron comes in.
You do not need Schematron if you:
- own both sides of the contract and they are deployed together. In that case you should have a helper or adapter class that owns the taxonomic business rules.
You should use Schematron if you are:
- making a standard spec used by third parties
- working with a partner company that owns the "other side" of the contract
You can use Schematron
- all the time as a documentation tool for otherwise implied semantic rules
Any other reasons you can think of?
Started with a client that uses AS/400 for fulfillment. We'll be passing XML via MQ Series from a .NET app.
So how do you process XML in RPG? Answer: XML4PR. IBM provides it as part of the XML Toolkit for iSeries. Contents here. Documentation: types, procedures.
After a million years, the XInclude spec has been released. Unfortunately it is seriously flawed, as described here by Dare Obasanjo, the former System.Xml PM. The good news is that the API workaround he mentioned will be implemented in .NET 2.0, but it still does not bode well for XSD schema designers. Dare has more on the W3C's reasons for doing what they did and why they are wrongheaded here.
If you've ever worked on a large XML file that a lot of people work on, you'll know why you want XInclude. Pity it isn't transparent.
Someone in the blogosphere pointed me to this excellent series from June 2003 that explains the .NET XML APIs and provides guidelines on when to use each. Written by Aaron Skonnard himself. Highly recommended.
Note that this will be dated material when .NET 2.0 is released.
Dare Obasanjo shares an important update on the Beta 2 version of System.Xml 2.0. Highlights:
- XQuery has been dropped!
- As expected, XPathDocument editing has been dropped. It was dropped once before in Beta 2 of 1.0. Maybe they'll drop it again in 3.0 Beta2... :-)
- A schema validator object has been added that allows an in-memory XML representation, such as XmlDocument, to be validated without having to run it through a parser. Very nice!
- XPathEditableNavigator has been merged into XPathNavigator. The Xml usage guidelines now state that the XPathNavigator is the preferred API for exposing XML to the world. I'll repost this point.
- There are now APIs in XmlReader/Writer for dealing with large data stream content.
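The in-memory validation point in the list above can be tried out with XmlDocument.Validate, which is the entry point I am assuming most people will reach for in .NET 2.0. A minimal sketch, with a made-up schema and made-up names: the document is edited in memory and then validated directly, without serializing it and parsing it again.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class InMemoryValidation
{
    // Hypothetical schema invented for this sketch.
    const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "<xs:element name='order'><xs:complexType><xs:sequence>" +
        "<xs:element name='id' type='xs:int'/>" +
        "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Loads a valid order, overwrites the id in memory, then validates the
    // edited tree. Returns the number of validation events raised.
    public static int ValidateAfterEdit(string newId)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<order><id>7</id></order>");
        doc.Schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));

        // Edit the in-memory tree; nothing is written out or re-read.
        doc.DocumentElement.FirstChild.InnerText = newId;

        int events = 0;
        doc.Validate(delegate(object sender, ValidationEventArgs e) { events++; });
        return events;
    }

    public static void Main()
    {
        Console.WriteLine(ValidateAfterEdit("8"));     // still a valid xs:int
        Console.WriteLine(ValidateAfterEdit("seven")); // no longer a valid xs:int
    }
}
```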
Dare provides no word on any changes to their XSLT 2.0 decision.
Here's an awesome demo of C-omega with Gavin Bierman of MSR UK. It shows SQL, XML, and XQuery built into an MSIL language. Very Nice! Best of all, the compiler preview is available for download and compiles down to MSIL like any other .NET language: You can write your favorite application in it if you want!
Gavin was asked whether C-omega would be making its way into the next C#. One thing to consider is that one feature, nullable types, has already made it in!
It would also help explain why MS has strategically chosen XQuery over XSLT.
Given Don Box's recent musing on “the impedance mismatch” which was previously championed by Dare Obasanjo, “The Great Convergence” does not look so far off.
By the way, this, to me, is probably the most important reason to switch to C# from VB.NET, etc. C# is a future language with international standardization muscle; the others are just legacy language reruns. Don't write new code, especially class library code, in them!
Dare shares guidelines on how to design extensible, versionable XML vocabularies.
The outline is:
Message Transfer Negotiation vs. Versioning Message Payloads
Version Numbers vs. Namespace Names
The Difference Between Versioning and Extensibility
Guidelines for Designing Extensible XML Formats
- XML formats should be designed to be extensible.
- Extensions must not use the namespace of the XML format.
- All XML elements in the format should allow any extension attributes, and elements with complex content should allow for extension elements as children.
- Formats that support extensibility must specify a processing model for dealing with extensions.
Why XML Formats Should Be Designed to Be Extensible
Why Extensions Mustn't Use the Namespace of the XML Format
Using XML Schema to Design an Extensible XML Format
Guidelines for Designing Versionable XML Formats
- If the next version of a format is backward compatible with previous versions, then the old namespace name must be used in conjunction with XML's extensibility model.
- A new namespace name must be used when backward compatibility is not permitted. That is, software must break if it does not understand the new language components.
- Formats should specify a mustUnderstand model for dealing with backward incompatible changes to the format that don't change the namespace name.
Using XML Schema to Design a Versionable XML Format
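One of the simplest processing models for extensions is "must ignore": a consumer handles the elements in the format's own namespace and silently skips everything else. The sketch below is my own illustration of that idea, not code from Dare's article. The namespace URIs, element names, and class name are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class IgnoreExtensions
{
    // Hypothetical namespace of the format itself.
    const string FormatNs = "urn:example:feed";

    // Returns the local names of the child elements that belong to the
    // format's namespace; elements from any other namespace are extensions
    // and are ignored rather than treated as errors.
    public static List<string> ReadKnownElements(string xml)
    {
        List<string> known = new List<string>();
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        foreach (XmlNode node in doc.DocumentElement.ChildNodes)
        {
            XmlElement child = node as XmlElement;
            if (child == null) continue;          // skip text, comments, etc.
            if (child.NamespaceURI == FormatNs)
            {
                known.Add(child.LocalName);
            }
            // else: an extension from someone else's namespace; ignore it.
        }
        return known;
    }

    public static void Main()
    {
        string xml =
            "<feed xmlns='urn:example:feed' xmlns:x='urn:example:ext'>" +
            "<title>t</title><x:rating>5</x:rating><updated>u</updated></feed>";
        foreach (string name in ReadKnownElements(xml))
        {
            Console.WriteLine(name);
        }
    }
}
```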
I believe I failed to post about this back when it was announced in May: .NET 2.0 will support XQuery 1.0 but will not support XPath 2.0/XSLT 2.0. [However it will still support XPath/XSLT 1.0] See the posting by Dare Obasanjo's boss Mark Fussell and Dare's post here. (Arpan Desai, the XQuery program manager, has more analysis here.) The posts also imply that they are not inclined to support XSLT 2.0 in the future.
My colleague who sits on a working group for an industry standard XML vocabulary was very chagrined to read about Microsoft's direction and the implications for his standard.
The waters do seem a bit cloudy, however. Arpan Desai seems to suggest in a response to Dare's post, that XPathNavigator and other “XPath” queries will just use the XQuery data model, so there's no need to worry about whether you can use the new syntax in the standard XML classes; you can. However, XSL Transformations proper must be XSLT 1.0 stuff.
All hope is not lost. In a later post by Dare asking for input for the Longhorn Framework XML library, folks asked for XSLT 2.0 support. He even muses about it in a later post, but is presented with a surprise.
Christian Weyer suggests passing message objects that are XmlSerialization attribute encrusted back and forth from ASMX web services.
Well...I suppose it's easier to read than staring at WSDL. However, it's certainly not as easy as:
[WebMethod]
public string HelloWorld()
{
    return "Hello again...";
}
Dino Chiesa's response to my WSDL First: A road to Pain? posting was to do both together. That is, do your web method, look at the WSDL, and change your web method to improve the WSDL.
Of course, you can't use [SoapDocumentService(ParameterStyle=SoapParameterStyle.Bare)] when your web method has multiple parameters. It's a lot easier to read and write in code, though. Maintenance of the contract, however, may be another story.
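For comparison, a message object in the style Christian describes might look something like the sketch below. This is my own guess at the shape, not his code: the HelloRequest type, its namespace, and the helper names are invented. It runs the type through XmlSerializer directly, outside of ASMX, so you can see what the attributes do to the XML.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical message type; the attributes pin down the XML shape.
[XmlRoot("HelloRequest", Namespace = "urn:example:hello")]
public class HelloRequest
{
    [XmlElement("name")]
    public string Name;
}

public static class MessageSketch
{
    // Serializes the message object to an XML string.
    public static string ToXml(HelloRequest request)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(HelloRequest));
        StringWriter writer = new StringWriter();
        serializer.Serialize(writer, request);
        return writer.ToString();
    }

    // Reads the XML back into a message object.
    public static HelloRequest FromXml(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(HelloRequest));
        return (HelloRequest)serializer.Deserialize(new StringReader(xml));
    }

    public static void Main()
    {
        HelloRequest request = new HelloRequest();
        request.Name = "World";
        Console.WriteLine(ToXml(request));
    }
}
```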
Don Demsak (DonXML) has a great posting that compares the relative performance of four ways to stuff an object from the database:
- DataReader
- DataSet
- XPathNavigator
- XmlSerialization
They are in order :-)
Not surprisingly, XmlSerialization is the slowest (by 47%). Something to think about, no?
Don also references a nice anti-DataSet for Web Services post by Scott Hanselman.
A colleague of mine is struggling with how to define his XML vocabulary in XML Schema. Specifically, he is struggling with whether he should define an id attribute on every element. On one hand, IDs are very easy for business type folks to wrap their heads around, as opposed to XPath expressions. Moreover, schema validations are vastly simplified. On the other hand, some implementers are complaining that for various language implementations, there is a large object creation overhead to represent simple elements as objects that contain attributes, just to support the id attribute.
A few thoughts on this:
The bigger question is whether the XmlSerializer should be used at all if performance is an issue to you. Many have written that XmlReader/Writer and XPathDocument are better approaches. [1] In fact Microsoft has reversed direction and is once again beefing up XPathDocument for .NET 2.0, suggesting that it be used over XmlDocument for writing as well as reading. [2] Finally, note that we are encouraged to use XmlElement when passing XML arguments in ASMX 1.x, though there are faster ways!
[1] See a great summary of .NET XML technologies by Scott Hanselman here. The BCL 1.x definitive word by Aaron Skonnard here. Don Box on using XPathDocument for argument passing here (video). See Dare's BCL 1.x definitive Best Practices for Representing XML in the .NET Framework.
[2] Aaron Skonnard's PDC 2003 Report explains the move to XPathDocument. Required reading!
Of course I haven't answered the question definitively on which way my colleague should go. However, I think that he can be confident that there are performant ways to handle the plethora of IDs he's contemplating.
Daniel Cazzulino shares five tips to high performance XML [Via Don Box who likes #3]
- Dynamic XPath expression compilation
- XPath execution tips & using XPathCache
- WebService XML sans XmlDocument
- Subtree transformations without re-parsing
- In-memory XML Schema validation without re-parsing
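The general idea behind the XPath tips, as I read them, is to avoid re-parsing the same expression string on every call. The sketch below is not Daniel's code and does not use his XPathCache class. It only shows the plain framework version of the idea with XPathExpression.Compile, using a made-up expression and made-up names.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class CompiledXPath
{
    // Compiled once and reused across calls. The expression is hypothetical.
    // This sketch is single-threaded; for concurrent use, consider giving
    // each thread its own copy via XPathExpression.Clone().
    static readonly XPathExpression ExpensiveOrders =
        XPathExpression.Compile("/orders/order[@total > 20]/@id");

    // Returns the id of the first order with a total above 20, or null.
    public static string FirstExpensiveOrderId(string xml)
    {
        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();
        XPathNodeIterator matches = nav.Select(ExpensiveOrders);
        return matches.MoveNext() ? matches.Current.Value : null;
    }

    public static void Main()
    {
        string xml = "<orders><order id='1' total='10'/><order id='2' total='32'/></orders>";
        Console.WriteLine(FirstExpensiveOrderId(xml));
    }
}
```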
Here's a Microsoft best practices document on how to use the new XML features of Yukon.
Kirk Evans has an important post on XML design patterns for XML Schema (XSD). One should keep improving one's XML vocabulary design skills. Even if you are hiding your schema behind ASMX classes, you should still be thinking about what the resulting vocabulary will be.