There’s been a lot of noise these last few days about the Microsoft Office XML file formats; the world doesn’t need my opinion again. I’d vaguely noted that Mac Office would be a little behind on the new XML, then Simon Phipps shot me links to a couple of closer looks, which shed an instructive light.
Andrew Shebanow does some numbers based on what we’re hearing and tries to estimate how many person-years it would take to implement MSOXML; yow!
The impact is that for the next few months, Mac people, just like Linux and Solaris people and everyone else who doesn’t use Windows, aren’t going to be able to read Office’s native file format. That’s OK, Sheridan Jones suggests a workaround: “For now, we recommend that Mac users advise their friends and colleagues using Office 2007 to save their documents as a ‘Word/Excel/PowerPoint 97-2003 Document’ (.doc, .xls, .ppt) to ensure the documents can be shared across platforms.” Right, then.
Students of message management will be amused at the conversation launched by Erik Schwiebert in Conversion factors. He dives deep on the Mac Office file-format issues. One fairly astounding statement was “there are certainly a variety of XML parsers out there, including libxml, but the only one that ships on Mac OS by default is libxml and it doesn’t support everything that the new file formats need.” Now that’s guaranteed to raise eyebrows in markup-land. Predictably, the comments got a little heated, and it didn’t help when Erik added “libxml didn’t handle the latest open standards that the XML spec details”. I’m trying hard to find a way to see that as anything other than a blatant lie, but it’s tough.
Eventually Rick Schaut (who seems like a Real Smart Guy) pulled aside the curtain of marketing weasel-speak and laid out the actual engineering issues regarding libxml and MSXML. They’re not surprising or nefarious, but you rarely see such a nakedly exposed linguistic framing gradient.
I should close by saying that I’m a huge fan of the MacBU and think that Mac Office is probably the single best piece of software that Microsoft ships, and that I’ll probably end up buying it.
Comment feed for ongoing:
From: Cameron Watters (Dec 07 2006, at 23:24)
I'm no expert on this stuff, but I think his comment about libxml is limited to the version of libxml that ships with Panther (OS X 10.3). In his comment, he references another developer's post on the issue in which he specifically spells out "the version of libXML that ships on Panther doesn't support SAX 2.0's namespace changes" (ref: http://blogs.msdn.com/rick_schaut/archive/2005/06/01/424086.aspx)
If that's the case, and the libxml version shipping with Panther is dated, then it seems to be a fair statement.
Unfortunately, because I'm not inclined to go figure it out, and there's no specific refutation of the specific issue pointed out, I have absolutely no idea whether or not what I said helps at all.
As such, this comment is worth what you paid for it (or less).
From: Xavier Borderie (Dec 08 2006, at 02:12)
Funny thing is, even contacts at MS are starting to send .docx files. "Could you please send that back to me in a readable format?", I answered.
From: Chris Ryland (Dec 08 2006, at 06:23)
The saddest part of all this is that both of the "open" office XML formats enshrine the 30-year-old "computer as smart typewriter" document model. Time for some forward thinking from the industry!
From: Kevin Hamilton (Dec 08 2006, at 06:57)
Speaking from very recent experience (this week) where I was getting repeated Segmentation Faults when running XPath queries against an Excel 2003 XML Document, I can vouch for the fact that libxml2 as late as 2.6.19 would be unreliable. (Note: reportedly it is now fixed; see http://bugs.php.net/bug.php?id=32912 )
According to http://www.explain.com.au/oss/libxml2xslt.html Mac OS X 10.4 (Tiger) ships with libxml2 2.6.16. So if that is a target platform for the new Mac Office, I can see how they would be unable to depend on libxml.
From: Schwieb (Dec 08 2006, at 07:25)
I suppose my statement was a little astounding, but it was based on my understanding of the Panther version of libxml, which was the lowest common denominator at the time we were considering which XML library made the most sense to use. If you read through the comments chain on that post on my blog, you can see where Rick Schaut detailed more specifically what we needed and why. Apparently Tiger and Leopard versions of libxml do have more of what we need, but we'd already taken the MSXML fork in the road.
From: Keith Fahlgren (Dec 08 2006, at 10:33)
Thankfully, it's very easy to write some XSLT to get all the value out of documents in the new XML format:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|node()"/>
</xsl:stylesheet>