
[Irl-dean] Accessibility of PDF Format Resources?

Barry McMullin mcmullin at eeng.dcu.ie
Mon Nov 20 21:59:24 GMT 2006


Many thanks to Eoin for his excellent and insightful comments.  I
will react a little to the fine detail, but before doing that I'd
like to just make clear that I think we are probably all
completely agreed on about 99% of what is at issue here.

Specifically:

1. The *big*, critical, gap is between *any* inaccessible
   document and *any* accessible document, completely independent
   of the specific format of either. That's where the major
   challenge lies, and I would say it probably counts for about
   90% of the job that needs to be done here. So if we can move
   progressively more publishers into providing accessible
   documents - *even* if, for whatever reasons, that means
   "accessible PDF" - then I, for one, will still be very very
   happy with that progress!

2. If a publisher is willing and able to do so, my *ideal*
   scenario (in the current state of browsers, plugins, assistive
   technology etc.) is for documents to be provided in *both*
   "accessible HTML" *and* "accessible PDF". Each *does* have
   some distinctive advantages in certain circumstances, which
   the other lacks. It is almost always good (for accessibility
   and general usability) to give consumers *extra* options.  So
   I put this at the notional 9% of what could be done over and
   above point 1. (The last 1% would be for going completely
   above and beyond the strict call of duty ... for example,
   *also* providing a DAISY format version, a zipped bundle of
   the HTML etc...)

So what we are debating here is *only* scenarios somewhere in
between 1 and 2; and where they might score in accessibility
terms on this (admittedly very ill-defined!) scale between 90%
and 99%.

Roughly then, I am saying that I think "accessible HTML",
without *any* PDF version, might typically score about 95%;
and "accessible PDF" without any HTML version is more like 90%.
So *if*, for whatever reason, a publisher is *only* going to
provide one or the other, then I would advocate HTML.

*Whereas* I think Eoin is inclined, "on average", to give that
scoring the other way around (?); and to conclude that, if only
one "accessible" version is to be provided, then it should be
(accessible) PDF.

I think this is a fair point, and probably deserves some closer
debate.  Though I suspect that we'll have to conclude just that "it
depends" - on the particular circumstances of a particular
document.  Sometimes "accessible HTML" might win, sometimes
"accessible PDF".

BUT: I think all this *should* anyway be a fairly temporary state
of affairs.  While tool development is still immature in this
area, I don't think there is any serious technical obstacle to
more or less *completely automated* conversion from "accessible
PDF" to "accessible HTML" and even vice versa.  It wouldn't give
quite as good a result as a process that still has some manual,
format-specific, "post-production"; but for many many practical
(accessibility) purposes it would be perfectly good enough. So if
and when such tools do become generally available, then there
will really be no need for any publisher to even think about
"only" publishing one or the other, so we will be straight back
up to the 99% level of case 2 above (yeah!).

Well ... OK: there is at least one exception to this rosy
scenario: it doesn't get off the ground with any PDF document
that uses encryption, AKA "digital rights management".  But
that's a whole other can of worms that I'd prefer not to open at
this point!  I'll just note that nobody *makes* any publisher
encrypt their documents, and no document consumer has ever, ever,
spontaneously *requested* that they should be restricted in what
they can do with a document.  And if you really still want to
chase this will o' the wisp, I'll just recommend this one link:

 <http://lquilter.net/blog/archives/2004/09/22/computer-industry-lied>

So, that all said, let me just react to Eoin's specific points;
but emphasising that this is really (for me) down in relatively
fine details, not remotely as important as the original step from
"inaccessible" to "accessible" documents.

> Here are the advantages of accessible PDF documents over their HTML
> equivalent, in particular for longer reports:

> - Easier navigation: a PDF document is much easier to navigate
>   via the automatically-created PDF bookmark mechanism, unless a
>   considerable effort is made to include document-specific
>   navigation in the HTML version.

Well ... there are indeed various cases here, depending on the
size and complexity of the document, and whether the (proposed)
HTML version is a single file or split across multiple files.  In
the single file case, the Firefox "document map" extension gives
exactly the same functionality as the PDF bookmark mechanism.  Of
course that assumes using a particular browser and that a
particular extension is installed.  But using "accessible" PDF
*also* makes quite severe assumptions about the reader <wink>.
In any case, I'll just say that any HTML document publishing
toolchain worth its salt should be able to automatically generate
as much (or as little) of this navigation as is desired (whether
it's a single file or multi-file output) without any manual
intervention.  So no, I can't really see this as something
involving "considerable effort" (unless I have misunderstood what
Eoin is referring to).
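
As a rough illustration of how little is involved, here is a minimal
Python sketch that derives a bookmark-style outline from the headings
of an HTML document. It assumes the document uses h1-h6 elements in a
sensible hierarchy; the names HeadingCollector and build_outline are
hypothetical, chosen only for this example, and a real publishing
toolchain would emit linked navigation rather than plain text.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collect (level, id, text) for every heading element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.headings = []    # finished headings, in document order
        self._current = None  # [level, id, text parts] while inside a heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._current = [int(tag[1]), dict(attrs).get("id"), []]

    def handle_data(self, data):
        # Text inside nested inline markup (e.g. <em>) is collected too.
        if self._current is not None:
            self._current[2].append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._current is not None:
            level, anchor, parts = self._current
            self.headings.append((level, anchor, "".join(parts).strip()))
            self._current = None


def build_outline(html_text):
    """Return a plain-text outline: one line per heading, indented by level."""
    collector = HeadingCollector()
    collector.feed(html_text)
    lines = []
    for level, anchor, text in collector.headings:
        target = f" (#{anchor})" if anchor else ""
        lines.append("  " * (level - 1) + text + target)
    return "\n".join(lines)
```

Feeding a whole report through build_outline gives the same kind of
nested outline that PDF bookmarks provide; turning each line into a
link to the heading's id is a small further step.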

> - Easier searching: it is much easier to search a PDF document for a
>   particular term and get a set of results limited to the document
>   itself. If a long report is a single HTML page, browsers offer a
>   limited in-page search.

So in the "single HTML page" case, HTML and PDF are just the
same. But:

> If it [the HTML version] is split into multiple pages,
> it is difficult to limit the search to a particular document.

Well ... if the nature of the document is such that this is a
real problem, one might offer in-document server side search; or,
even simpler, offer both "single-page" and "multi-page" *HTML*
versions.  Again, the technical requirements to interconvert
between these, completely automatically, are trivial.  (No, I'm
not suggesting that individual, non-technical,
authors could do this; but an organisation or agency of any size
certainly could!)
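
To make the "single-page" versus "multi-page" point concrete, here is
a deliberately crude Python sketch (Python 3.7 or later). It assumes
that each new page should start at an h2 element and that only the
body content is being handled; split_into_pages and join_pages are
hypothetical names for this example. A production toolchain would use
a proper HTML parser and would also rewrite cross-page links.

```python
import re


def split_into_pages(body_html):
    """Split a single-page body into chunks, starting a new chunk at each <h2>."""
    # The zero-width lookahead keeps each <h2 ...> tag at the start of its chunk.
    chunks = re.split(r"(?=<h2\b)", body_html, flags=re.IGNORECASE)
    return [chunk.strip() for chunk in chunks if chunk.strip()]


def join_pages(pages):
    """Rebuild a single-page body by concatenating the page chunks in order."""
    return "\n".join(pages)
```

Going from multi-page back to single-page is just concatenation,
which is why offering both versions costs so little once the process
is automated.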

> - Easier downloading for offline electronic reading: A single PDF is a
>   convenient container for downloading a report to read and navigate
>   offline. A long report in multiple HTML pages is very difficult to
>   download as a self-contained package.

Well ... yes and no.

If this *is* a significant use case (for a particular document),
the publisher can easily (yes, I do mean "easily") provide a zip
archive HTML download for offline reading.  This *would* be more
complex to handle than a PDF. It requires the user to understand,
and go through at least one extra step after the download ... but
I don't think I could agree that this would be "very difficult".
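
For what "easily" might look like in practice, here is a minimal
Python sketch using the standard zipfile module. It assumes the HTML
version lives in one directory tree and that the archive is written
somewhere outside that tree; bundle_html is a hypothetical name for
this example.

```python
import zipfile
from pathlib import Path


def bundle_html(source_dir, archive_path):
    """Zip every file under source_dir and return the archived names.

    Paths inside the archive are kept relative to source_dir, so links
    between pages still resolve after the reader unpacks the bundle.
    Assumes archive_path is outside source_dir.
    """
    source = Path(source_dir)
    names = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                name = path.relative_to(source).as_posix()
                archive.write(path, arcname=name)
                names.append(name)
    return names
```

Run as part of the publishing step, this produces the offline bundle
alongside the online pages with no extra manual work.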

And on the other hand, a PDF is typically 5-10 times bigger than
a corresponding zipped HTML bundle. Hmmm.

I think I might still call that one a tie.

> - Easier printing: It is easier to print a single PDF document than
>   multiple HTML pages, and it prints on much fewer pages (in general).

Absolutely.  This is what PDF was designed for, and it is very
good at it.  Indeed, I make extensive use of PDF for this very
purpose (but never without providing HTML also). But note that,
for *this* narrow purpose, there is no particular benefit in
"accessible" PDF, so it's perfectly OK to rely on naive,
completely automatic, PDF generation tools.

> So in my view, a report in HTML format _should also_ be made available
> in accessible PDF format, so that readers can benefit from the many
> advantages PDF offers to people with or without disabilities.

And so (thankfully) we get back to where we started.  While we do
have slight differences in preference between PDF and HTML, that
only matters *if* we are forced to choose.  But far and away the
best situation is where this choice is *not* forced: both
accessible HTML and accessible PDF versions are on offer, and
each of us can use whichever works better for us in each
particular case.

(And ... maybe I need to re-phrase the "summary recommendation"
of my white paper yet again.  Still, this is very helpful, so
thanks for the feedback.)

Best - Barry.



