
The Pretty Pink Pages of Peter Sefton

I don’t like the colors of his Web site (cyan and yellow?), but I like his healthy respect for word processing and XHTML. From a Peter Sefton blog entry:

However, for your average office document, Microsoft Word is in my opinion closer to Tim Bray’s vision than the Sun-sponsored OpenOffice.org. Why? In a rare stroke of (near) genius, Microsoft Word 2000 and upwards (is that version 8?) offers a ‘save as html’ option which is nearly but not quite XML but is also a complete Word document. It’s a few simple transforms, described in this PDF, to turn the office html format into XML. It would be less than a hundred lines of script in Perl or Python. Once you have it in XML, you can take it to XHTML, and use the web ready image renditions that Word has also generated for you. Or, you can change your original document or generate a new one and round-trip the XML back to that weird Microsoft format. As far as I’m concerned this is more useful than WordML, and more practical than custom XML schemas. I can get good web pages out of it, and make changes to, or create, documents. If my Word template maps to XHTML then I have everything I need for most publishing systems.
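To give a feel for the kind of "few simple transforms" Peter mentions, here is a minimal Python sketch. It is not his method and it is not taken from the PDF he links to; the function names `clean_word_html` and `is_well_formed` are hypothetical, and the three cleanup rules are only my assumption about the most common Word residue (conditional comments, Office-namespaced tags such as `<o:p>`, and inline `style` attributes carrying `mso-*` properties). Real Word output needs more work than this, for example quoting bare attribute values and closing empty elements.

```python
import re
import xml.etree.ElementTree as ET


def clean_word_html(html: str) -> str:
    """Strip some common Word-specific residue from 'save as html' output.

    Illustrative only: these three rules are assumptions, not a complete
    or authoritative conversion.
    """
    # 1. Drop conditional-comment blocks: <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", "", html, flags=re.DOTALL)
    # 2. Drop namespaced tags such as <o:p> and </o:p>, keeping any text inside
    html = re.sub(r"</?[a-z]+:[^>]*>", "", html, flags=re.IGNORECASE)
    # 3. Drop inline style attributes, which mostly carry mso-* properties
    html = re.sub(r"\s+style=(\"[^\"]*\"|'[^']*')", "", html, flags=re.IGNORECASE)
    return html


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML with a single root element."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


sample = (
    "<div><!--[if gte mso 9]><xml><w:View>Print</w:View></xml><![endif]-->"
    "<p class=\"MsoNormal\" style='mso-margin-top-alt:auto'>Hello<o:p></o:p></p></div>"
)
cleaned = clean_word_html(sample)
print(cleaned)
print(is_well_formed(cleaned))
```

Regular expressions are a blunt instrument for markup, so a sketch like this is best treated as a first pass before handing the result to a real XML parser or an XSLT step.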

And let’s emphasize that last sentence: “If my Word template maps to XHTML then I have everything I need for most publishing systems.” This is why I have spent years struggling with this issue, and I have yet to fully realize the potential power introduced to me in my article “XHTML Schemas in Word 2003 Documents.” Ever since I was laying out hip-hop magazines in PageMaker on a 286, Microsoft Word has served as the heart of my publishing system. For me, it has always been the first stop in the workflow diagram that maps out a publishing house (no love or envy for the Quark Publishing System here). I was not going to let something called the Internet stop this living tradition. And now, with Office System 2003 and the .NET Framework, it is possible to author one Word document that can be imported into a tool like InDesign for print and also have its content routed as automatic data entry into a database for the Web.

I wonder what Peter will think when he discovers that people like me are writing tools designed to export XHTML not just at the document level but also at the paragraph level. Would he welcome the ability to automatically mark up Word documents with XHTML using the robust automation features of Office System 2003 and/or Visual Studio .NET? Would he find routing nodes of XHTML from Word through a Web service useful? What about saving bits of a Word document as XHTML by selecting text and sending it, marked up, to the Clipboard?
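The paragraph-level idea is easier to picture with a small example. What follows is only a Python sketch of the concept, not the Office or .NET tooling I am describing: the style names in `STYLE_TO_ELEMENT` and the function `paragraph_to_xhtml` are hypothetical, and I am assuming each paragraph arrives as a simple pair of style name and plain text.

```python
from xml.sax.saxutils import escape

# Hypothetical mapping from Word template style names to XHTML elements.
STYLE_TO_ELEMENT = {
    "Heading 1": "h1",
    "Heading 2": "h2",
    "Normal": "p",
    "Quote": "blockquote",
}


def paragraph_to_xhtml(style_name: str, text: str) -> str:
    """Wrap one paragraph's text in the XHTML element mapped to its style.

    Unknown styles fall back to <p>; the text is escaped so that &, < and >
    do not break the markup.
    """
    tag = STYLE_TO_ELEMENT.get(style_name, "p")
    return f"<{tag}>{escape(text)}</{tag}>"


# Each paragraph becomes its own XHTML node, ready to be routed on its own.
paragraphs = [
    ("Heading 1", "Tools & Templates"),
    ("Normal", "Word is the first stop in the workflow."),
]
for style_name, text in paragraphs:
    print(paragraph_to_xhtml(style_name, text))
```

Because every paragraph yields a self-contained node, the same fragment could in principle go to a Web service, a database field, or the Clipboard without touching the rest of the document.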

Peter Sefton details his post-processing of Word HTML in “Word to XML and Back Again” at XML.com.

Meanwhile, not far from Redmond (and XAML) there is an exploration of the possibility of making an XHTML-CSS word processor.

rasx()