
[Irl-dean] Converting WORD Documents to DAISY Format (Review)

Eoin Campbell ecampbell at xmlw.ie
Wed May 28 15:06:15 IST 2008


Correction re validation (encoding, ampersands in href attributes):
When the Word to DAISY converter reports that the document
has been successfully converted, then the character encoding
of the output document is correct, and ampersands
inside href attributes are also escaped properly.
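
As a rough illustration of the ampersand point (a Python sketch, not the
converter's own code; the URL and the is_well_formed helper are made up for
this example): a bare & inside an href value makes the XML ill-formed, while
the escaped form &amp; parses and reads back as the original URL.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Hypothetical URL with a query string, for illustration only.
url = "http://www.example.com/search?a=1&b=2"


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Raw ampersand copied straight into the attribute: not well-formed.
bad = '<a href="' + url + '">link</a>'

# quoteattr() escapes &, < and > and adds the surrounding quotes.
good = "<a href=" + quoteattr(url) + ">link</a>"

print(is_well_formed(bad))
print(is_well_formed(good))

# The parser turns &amp; back into &, so the original URL is recovered.
print(ET.fromstring(good).get("href") == url)
```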

I tested this with small documents.
When I try to convert fragments of my original, larger
document, the converter displays a dialog box, which then
disappears without any error or success message,
and no DTB XML file is created.

So it is all a bit flaky at the moment.

The conversion process seems to be as follows:

1. Save Word document in .docx format
    (the Word 2007 Zipped XML file format).

2. Prompt for some details (Title, Creator, etc.)
    for which default values are extracted from the Word file
    properties.

3. Run XSLT script to convert OpenXML into DTB XML.

4. Inspect Word file and warn about loss-of-fidelity issues,
    where Word content is not converted properly.

5. Validate DTB XML file against DTD, and report errors, if any.
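
Steps 1 and 2 above can be sketched in Python, using only the standard
library. This is my own illustration of how default values might be pulled
from the file properties, not the converter's actual code: a .docx file is a
Zip package, and its title and creator are stored in docProps/core.xml. The
function names and the in-memory sample package are assumptions made for the
example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by docProps/core.xml in an OpenXML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_default_metadata(docx):
    """Return Title and Creator defaults from a .docx path or file object."""
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    return {
        "title": root.findtext("dc:title", default="", namespaces=NS),
        "creator": root.findtext("dc:creator", default="", namespaces=NS),
    }


def make_sample_docx(title, creator):
    """Build a tiny in-memory package holding only the core properties.

    A real .docx contains many more parts; this is just enough for the demo.
    The title and creator are assumed to be plain text with no XML markup.
    """
    core = (
        '<cp:coreProperties xmlns:cp="{cp}" xmlns:dc="{dc}">'
        "<dc:title>{title}</dc:title>"
        "<dc:creator>{creator}</dc:creator>"
        "</cp:coreProperties>"
    ).format(title=title, creator=creator, **NS)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core)
    buffer.seek(0)
    return buffer


sample = make_sample_docx("Annual Report", "A. N. Author")
print(read_default_metadata(sample))
```

With a real document, read_default_metadata("report.docx") would be called
with the path instead (the file name here is hypothetical). Steps 3 and 5
(XSLT and DTD validation) need an XSLT processor and a validating parser,
which the Python standard library does not provide, so they are left out.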

-- 
Eoin Campbell, Technical Director, XML Workshop Ltd.
10 Greenmount Industrial Estate, Harolds Cross, Dublin, Ireland.
Phone: +353 1 4547811; fax: +353 1 4496299.
Email: ecampbell at xmlw.ie; web: www.xmlw.ie
YAWC: One-click web publishing from Word!
YAWC Online: www.yawconline.com




