
The XSLT-based Version of CleanXHTML

When Brian Jones of the Microsoft Office System team began to blog, he caused a chain reaction with huge positive implications for data management solutions at SonghaySystem.com. One of his blog posts attracted a comment from Evan Lenz, co-author of Office 2003 XML. This comment led me to the sample chapter, “The WordprocessingML Vocabulary.”

Reading this chapter allowed me, for the first time, to look at WordprocessingML files as something other than a jumble of strings that happens to be well-formed XML, strings more for machines than humans. The clear, precise, masterful discussion of the format made it, well, clear and precise. The scales fell from my eyes! I have written an XSLT template that extracts a clean subset of XHTML. This took less than two weeks. When I took on the Microsoft Word COM-based object model in earlier versions of this product to perform the same task, it took over two years! I look forward to a future version of this book covering the Office 12 formats. Such a book will be a required reference for the Songhay shelf.
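For anyone who has not opened a WordprocessingML file, here is a rough sketch of the kind of mapping involved. This is not the CleanXHTML template, and it is written in Python rather than XSLT so it can be run without an XSLT processor. The function names are made up for illustration, and the rules (paragraphs become p, bold and italic runs become strong and em) are only my assumption about what a "clean subset" might start with.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Namespace of the Word 2003 WordprocessingML vocabulary.
W = "{http://schemas.microsoft.com/office/word/2003/wordml}"


def run_to_xhtml(run):
    """Turn one w:r (run) into escaped text, wrapped in em/strong if flagged.

    Simplification: a w:b or w:i element counts as "on" whenever it is
    present; an explicit w:val="off" is not checked in this sketch.
    """
    text = escape("".join(t.text or "" for t in run.iter(W + "t")))
    if not text:
        return ""
    props = run.find(W + "rPr")
    if props is not None:
        if props.find(W + "i") is not None:
            text = "<em>" + text + "</em>"
        if props.find(W + "b") is not None:
            text = "<strong>" + text + "</strong>"
    return text


def wordml_to_xhtml(wordml):
    """Map every w:p in the document to an XHTML p element, one per line."""
    root = ET.fromstring(wordml)
    paragraphs = []
    for para in root.iter(W + "p"):
        inner = "".join(run_to_xhtml(r) for r in para.iter(W + "r"))
        paragraphs.append("<p>" + inner + "</p>")
    return "\n".join(paragraphs)
```

Everything else in the file (fonts, style tables, document properties) is simply never visited, which is the appeal of working from the markup instead of the object model.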

What remains now is how to redesign CleanXHTML for an XSLT-based solution. I am so used to brawling with the Word object model. What is needed now is a lighter touch…

rasx()