
[CEUD-ICT] Tools for conversion of MS Word documents to HTML

Eoin Campbell ecampbell.xmlw at gmail.com
Thu Feb 12 14:14:11 GMT 2009


Hi all,
I had a look at the tools Donal mentioned, and below are my comments.
Note that my company offers a subscription-based Word to HTML conversion service
which competes with these tools, so feel free to disregard all my
points.

I am always interested in reviewing new Word to HTML conversion tools,
hoping to find a good one, but I have to say I invariably end up with
a feeling of 'schadenfreude', as the output quality of most tools is
not very good from an accessibility perspective, and the 3 tools mentioned
are typical in this regard.

There are a few simple checks that quickly indicate whether a converter
is any good:
1. Are styled Word headings converted to corresponding HTML heading elements?
2. Are lists (including nested lists) converted into hierarchical HTML list elements?
3. Are table heading rows converted into heading cells in the output?
    (i.e. <table><thead><tr><th>Column heading</th>... )
4. Is the output well-formed (and preferably valid) XHTML?
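For anyone who wants to run these four checks mechanically, here is a rough sketch in Python using only the standard library. It is not how any of the tools above work, and the function and key names are made up for illustration. It assumes you convert a test Word document that you know contains styled headings, a nested list and a table with a heading row, so that the mere presence of the corresponding elements in the output is meaningful. It only tests well-formedness, not validity, and it will wrongly report output that uses named entities such as &nbsp; as not well-formed, because no DTD is loaded.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class StructureChecker(HTMLParser):
    """Collects the structural facts behind checks 1 to 3."""

    def __init__(self):
        super().__init__()
        self.open_tags = []            # stack of currently open elements
        self.has_heading = False       # check 1: any h1..h6 element
        self.has_nested_list = False   # check 2: a list inside a list item
        self.has_th = False            # check 3: a th cell inside a table

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.has_heading = True
        if tag in ("ul", "ol") and "li" in self.open_tags:
            self.has_nested_list = True
        if tag == "th" and "table" in self.open_tags:
            self.has_th = True
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed elements.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def check_converter_output(xhtml):
    """Return a dict of the four quick checks for one converted document."""
    checker = StructureChecker()
    checker.feed(xhtml)
    checker.close()
    try:
        ET.fromstring(xhtml)           # check 4: raises if not well-formed
        well_formed = True
    except ET.ParseError:
        well_formed = False
    return {
        "headings": checker.has_heading,
        "nested_lists": checker.has_nested_list,
        "table_heading_cells": checker.has_th,
        "well_formed": well_formed,
    }
```

Calling check_converter_output() on the text of a converted file returns one True/False value per check. A "Sort of" result for lists would typically show up as "nested_lists" being False even though the source document had sublists.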

For Zapadoo Word Cleaner 4.2, the results are: 1. Yes, 2. Sort of, 3. No, 4. Yes (valid).
For Textism Word HTML Cleaner, the results are: 1. Yes, 2. Sort of, 3. No, 4. Yes (well-formed).

I don't have results for Virtual508 Accessible Web Publishing Wizard to
hand, but the results are somewhat similar, if I recall correctly.

For lists, Zapadoo and Textism convert styled Word lists into HTML list elements,
but don't nest sublists properly.
Lists created using Word's built-in list styles ("List Number", "List Bullet")
are converted into paragraphs, not lists.

Donal J. Rice wrote:
> Folks,
> We're doing some research into tools for the conversion of content in MS
> Word documents into accessible (X)HTML.  One of the biggest issues faced by
> public bodies, ourselves included, is the large scale publishing of such
> content to a website and the cost and effort in conversions.  At the
> moment I am researching free or commercially available tools.  I know of a
> couple of offerings in Ireland including the YAWC facility and the
> Riverdocs converter, although the latter is no longer available.  My
> colleague Alan is looking at the following:
>    "Word Cleaner" software from http://www.zapadoo.com
>    "Word HTML Cleaner" at http://www.textism.com/wordcleaner/,
>    "Virtual508.com Accessible Web Publishing Wizard" from
>    http://www.virtual508.com/download.html (as recommended by WebAIM
>    http://webaim.org/techniques/word/ )
>    Word's own "Save as Web Page, Filtered" option
> 
> What are people's experience of trying these or other tools.  Main issues
> in my experience with this work is that MS Word documents are typically
> poorly structured so that the web author has a lot of work to do on
> repurposing the MS Word prior to conversion or the HTML post conversion.
> Also for larger reports it is desirable that the output is broken into a
> number of HTML pages, with some form of navigation within the page.
> 
> Looking forward to hearing people's experiences.

-- 
Eoin Campbell
ecampbell.xmlw at gmail.com

