[Irl-dean] Converting WORD Documents to DAISY Format (Review)
Eoin Campbell
ecampbell at xmlw.ie
Wed May 28 12:49:48 IST 2008
As a self-confessed expert on Word to XML/XHTML conversion, I downloaded
and installed the Daisy converter and had a look at the output.
I was pleasantly surprised at the quality of the output, although it is
far from perfect.
Headings, lists, tables (including spanned cells and heading rows), links
and footnotes are all converted properly, as far as I can tell.
Page numbers are included sometimes, but not always.
My 175-page test document had only 4 page numbers generated.
I'm not sure if this feature is important for DAISY book readers.
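As a rough illustration, the page-number count could be checked with a few lines of Python. This is only a sketch: it assumes the converter output is DTBook XML in which page numbers are marked up as pagenum elements, and the function name count_pagenums and the sample markup are made up for the example. It needs a well-formed file, so any well-formedness problems in the output have to be repaired first.

```python
import xml.etree.ElementTree as ET


def count_pagenums(root):
    """Count <pagenum> elements under root, with or without a namespace."""
    # ElementTree reports namespaced tags as "{uri}local", so compare local names.
    return sum(1 for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "pagenum")


# Hypothetical miniature DTBook fragment, for illustration only.
SAMPLE = (
    '<dtbook xmlns="http://www.daisy.org/z3986/2005/dtbook/">'
    "<book><bodymatter><level1>"
    '<pagenum id="p1">1</pagenum><p>Text</p><pagenum id="p2">2</pagenum>'
    "</level1></bodymatter></book></dtbook>"
)

print(count_pagenums(ET.fromstring(SAMPLE)))
```

For a real file, ET.parse(path).getroot() would take the place of ET.fromstring(SAMPLE).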
The markup is clean and does not contain all the gunk usually generated
by Microsoft.
However it is not well-formed.
a. The encoding declaration seems to always be UTF-8, while the text
is ISO-8859-1. Manually changing the XML declaration fixed this.
b. Ampersands inside a/@href attributes are not escaped using the &amp;
entity, although ampersands appearing in the text are.
Fixing these errors gave a valid document against the DTB DTD
http://www.daisy.org/z3986/2005/dtbook-2005-3.dtd
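The two fixes described above could be scripted rather than done by hand. The following is a minimal sketch, not part of the converter: it assumes the file really is ISO-8859-1 throughout, rewrites the XML declaration to say so, and escapes bare ampersands inside href attribute values. The names repair, DECL, HREF and BARE_AMP and the sample input are invented for the example, and the regular expressions are a quick approximation rather than a full XML parser.

```python
import re
import xml.etree.ElementTree as ET

# XML declaration that claims UTF-8 (assumed to be wrong for this output).
DECL = re.compile(r'(<\?xml[^>]*?encoding\s*=\s*)(["\'])UTF-8\2', re.IGNORECASE)
# An href attribute and its quoted value.
HREF = re.compile(r'(href\s*=\s*)(["\'])(.*?)\2', re.DOTALL)
# An ampersand that does not already start an entity or character reference.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def repair(raw: bytes) -> bytes:
    """Return a copy of raw with the declaration and href ampersands fixed."""
    # ISO-8859-1 maps every byte to one character, so this round-trips safely.
    text = raw.decode("iso-8859-1")
    # Fix (a): make the declaration match the actual encoding.
    text = DECL.sub(
        lambda m: m.group(1) + m.group(2) + "ISO-8859-1" + m.group(2),
        text,
        count=1,
    )
    # Fix (b): escape bare ampersands, but only inside href values.
    text = HREF.sub(
        lambda m: m.group(1)
        + m.group(2)
        + BARE_AMP.sub("&amp;", m.group(3))
        + m.group(2),
        text,
    )
    return text.encode("iso-8859-1")


# Hypothetical input showing both problems: a Latin-1 byte under a UTF-8
# declaration, and an unescaped ampersand in a link.
SAMPLE = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<p>caf\xe9 &amp; <a href="http://example.org/?a=1&b=2&amp;c=3">x</a></p>'
)

fixed = repair(SAMPLE)
root = ET.fromstring(fixed)  # parses only once both fixes are applied
```

This only restores well-formedness. Validating the result against the DTD still needs a validating parser, which the Python standard library does not provide.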
The output is a single XML file, and no NCC file is generated
(cf. http://www.daisy.org/z3986/specifications/daisy_202.html#ncc).
Presumably there is a utility somewhere out there
which can create an NCC file from a DTB XML file.
The documentation is poor, and the user interface, while simple, is also poor.
Errors were reported, but a file was generated anyway, and the reported
errors turned out not to be errors at all once the issues mentioned above
were corrected.
My test file had hierarchically structured headings and properly styled
lists, so the results were very good. Obviously this is crucial to the
output quality.
I won't be throwing away our YAWC Online converter (www.yawconline.com)
yet, but any tool that rewards authors/editors for using Word styles
properly has to be good.
--
Eoin Campbell, Technical Director, XML Workshop Ltd.
10 Greenmount Industrial Estate, Harolds Cross, Dublin, Ireland.
Phone: +353 1 4547811; fax: +353 1 4496299.
Email: ecampbell at xmlw.ie; web: www.xmlw.ie
YAWC: One-click web publishing from Word!
YAWC Online: www.yawconline.com