
[Irl-dean] Accessibility of PDF Format Resources?

Barry McMullin mcmullin at eeng.dcu.ie
Thu Nov 23 09:42:22 GMT 2006


[Warning: this is yet another quite long and meandering response
on my part; so apologies to anybody who has already had quite
enough of this subject.  But ... it *is* an important and topical
one, so I would still like to probe it a little further!]

On Tue, 21 Nov 2006, Eoin Campbell wrote:

> The class of documents under discussion is "documents currently
> being published on the web in PDF format". In general, this
> class of documents encompasses longish reports of from 50 to
> 500 pages, which are also published in print format, and which
> are then made available electronically for the publishers
> convenience (saving print costs, allowing people serve
> themselves, etc.).

Um ... not quite.  There are a *lot* of documents being published
on the Web in PDF that are much shorter - less than 10 pages.
These are pamphlets, brochures, posters, information leaflets,
press releases, speeches, etc. etc.  But, I'm happy to agree that
the considerations we are talking about do vary according to the
nature (particularly length and complexity) of the document in
question.  The e-Access lab white paper deliberately did not
explore that distinction, just in the interests of brevity.  But
maybe it should.  For the moment at any rate, I'm very happy to
explore the specific category of "long, complex" documents more
carefully here on irl-dean.

> The question in my mind is then: "What is the best way to
> publish a long report electronically on a website?"

> Another way of phrasing this question is to ask: "What do I
> want to do with the information in the report?"  As a reader, I
> want to be able to do the following things.

> a. Print out the full report so I can read it on the bus, stick
> post-its on important pages, underline sections, etc.

Yes but ... let's just emphasise that this use case obviously
doesn't apply to *all* readers.  Some will be completely unable
to access print materials.  Others may be able to (and wish to)
use print, but need special accommodation - large print, braille
printing, high contrast, specific colour scheme etc. Others may
be able to use print, but find it much harder than on-screen
reading - perhaps because they need easy access to dictionary
lookup, thesaurus, acronym expansion, or hypertext links to
referenced resources etc.

So yes, conventional printing is an important use case, and one
that should definitely be provided for.  But it would be bad for
that to dominate considerations; unfortunately, in many cases I'm
afraid that that is exactly what happens...

> b. Search it so I can look for mentions of particular topics in the report
>     (and only in the report).

Agreed.

But this is at least as easy in a "single page" HTML version as
in a PDF version.  Arguably it's even easier, because it uses the
"normal" browser user interface, rather than switching to a
potentially different user interface in a plug-in or external
reader.

> c. Download it so I can read it online, even when not connected
> to the Internet (because I only have dial-up access at home).

Agreed.

But again ... the "single page" HTML version is very nearly as
easy for this purpose as a PDF version.  "Save as complete
document" functionality is pretty standard in browsers
now. Publishers may also provide zip-packaged versions
specifically for download/offline use, which will work equally
well as "single page" or "multi-page".  There is essentially zero
cost to doing this (i.e., given that a HTML version is available
at all, *also* offering a zip bundle download option is
essentially free). And a HTML-based version will almost always be
a noticeably faster download on a dial-up connection.
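
To illustrate how little is involved in the zip bundle option, here
is a minimal sketch in Python. It assumes the HTML version already
sits in one directory along with its images and stylesheets; the
function name bundle_site and the paths in the usage note are
hypothetical, chosen only for illustration.

```python
import zipfile
from pathlib import Path


def bundle_site(source_dir, zip_path):
    """Zip every file under source_dir, keeping relative paths.

    Relative paths are preserved so that links between pages, images
    and stylesheets still resolve after unpacking for offline use.
    Returns the list of archive member names that were written.
    """
    source = Path(source_dir)
    zip_path = Path(zip_path)
    names = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as bundle:
        for path in sorted(source.rglob("*")):
            # Skip directories, and skip the archive itself in case it
            # is being written inside the source directory.
            if path.is_file() and path.resolve() != zip_path.resolve():
                name = path.relative_to(source).as_posix()
                bundle.write(path, arcname=name)
                names.append(name)
    return names
```

A publisher might run something like bundle_site("report-html",
"report-html.zip") once per release and link to the result next to
the online version.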

> d. Navigate it easily so I can jump around the document (particularly for
>     reference documents that I use a lot).

Agreed. Also, as already noted, to navigate outside the
document.

But the ease of this is not (?) determined by the format, but by
the way the document has been implemented in any particular
format.

Eoin raised a particular comment about the "PDF bookmark"
capability.  But given a single-page HTML document with correct
header structure, this is easily accomplished either client side
(e.g., with a Firefox "document map" extension) or server side with
a short XSLT script (or Perl or almost any scripting
environment), either dynamically or statically.  The same applies
with a multi-page format: given a properly marked up source, it
is technically trivial to generate this level of navigation
support. No, I don't suggest individual authors could do this,
but any modest sized organisation, or "web" agency (as opposed to
"print" agency!) worth its salt, should certainly be able to.
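
As a rough illustration of the server-side option, here is a
minimal sketch that derives a bookmark-style outline from the
heading structure of a single-page HTML document. It is written in
Python rather than the XSLT or Perl mentioned above, uses only the
standard library, and assumes headings are marked up as h1 to h6
(optionally with id attributes to link to). The names
HeadingCollector and build_toc are illustrative, not part of any
existing tool.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collect (level, id, text) for each h1..h6 element, in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []    # finished headings
        self._current = None  # [level, id, text parts] while inside a heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS and self._current is None:
            self._current = [int(tag[1]), dict(attrs).get("id"), []]

    def handle_data(self, data):
        # Text inside nested inline markup (em, a, ...) is collected too.
        if self._current is not None:
            self._current[2].append(data)

    def handle_endtag(self, tag):
        if (tag in HEADING_TAGS and self._current is not None
                and int(tag[1]) == self._current[0]):
            level, anchor, parts = self._current
            text = " ".join("".join(parts).split())  # normalise whitespace
            self.headings.append((level, anchor, text))
            self._current = None


def build_toc(html_text):
    """Return a plain-text outline, one indented line per heading."""
    collector = HeadingCollector()
    collector.feed(html_text)
    collector.close()
    lines = []
    for level, anchor, text in collector.headings:
        target = f" (#{anchor})" if anchor else ""
        lines.append("  " * (level - 1) + text + target)
    return "\n".join(lines)
```

The same list of headings could just as easily be emitted as a
nested HTML list of links and placed at the top of the page; the
point is only that correct heading markup makes the navigation
mechanical to generate.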

> Purely as a reader, regardless of my level of ability, and even
> though my company offers the service of converting such
> publications into accessible HTML, I honestly believe that for
> these particular requirements, accessible PDF better suits my
> needs than accessible HTML.

Agreed ... if you do specifically mean "my" as in "Eoin
Campbell's" <wink>.

But seriously, yes, I would probably agree that a significant
number of "mainstream" web users, perhaps even a majority, would
share this preference of Eoin's for a PDF version of such (long,
complex) documents.  I say this despite the fact that I have
argued above that *even* for "long complex" documents (the most
favourable scenario for PDF usage?) the differential between PDF
and HTML is not *that* big.

But I don't think that is actually quite the point at issue.  I
would say that the question is whether there is a significant
category of people, particularly people with various
disabilities, who would find the HTML better suited to their needs?
If there *is*, then that justifies my advice to make an
accessible HTML version available, specifically on accessibility
grounds (but absolutely not *exclusively* available: by all
means, provide PDF *also*, to still address all the use cases
Eoin is identifying!).

Of course, I don't actually "know" in any "scientific" sense,
whether there is such a "significant category" of users who would
prefer HTML, and that is why I welcome further input on the
question from irl-dean members.  It is a very difficult question
precisely because there is currently so *little* "accessible PDF"
out there.  So while it is certainly the case that I hear a lot
of negative comment from people with disabilities about PDF
content, it is not clear how much of that is attributable just to
the use of PDF; how much to *inaccessible* use of PDF; and how
much to the use of even slightly older PDF viewers or assistive
technologies that don't yet support accessible PDF anyway.  As I
said before, I would guess, very roughly, that "accessible PDF"
would probably "fix" of the order of 90% of the problem - at
least in principle, as soon as users have access to compatible
readers and AT; but, for the reasons I've explained, I still
argue that there is a further, non-negligible, chunk of the
problem that even "accessible PDF" will not fix, but that
"accessible HTML" would address.

But again, at the risk of repeating myself *too* many times, I
have absolutely nothing against PDF per se (I use it all the time
myself, albeit only for certain, strictly limited, use cases);
and certainly nothing against "accessible PDF".  I think
"accessible PDF" is a very valuable innovation, and I would love
to see all existing PDF resources displaced, overnight, by
"accessible PDF" resources, if that were somehow, magically,
possible!  And I definitely do not want PDF format resources
removed from the web - precisely because I agree with Eoin that
there are many people (including many people with disabilities)
who will find this the optimal format for many purposes.

My *only* plea is that any deployment of "accessible PDF" should
not be at the expense of, or to the exclusion of, an "accessible"
HTML version.

> The other issue that I want to raise is regarding cost.  For
> long reports, the cost of preparing accessible versions varies
> wildly, depending on the tool used.
[...]

I agree.

The claim of the white paper was (deliberately) a bit abstract,
or even idealistic, in this respect.  It was saying that since,
"in principle", essentially the same things need to be done to
make an "accessible PDF" as to make an "accessible HTML"
document, then the costs *should* generally be much the same. I
still stand over that as a generalisation, and indeed, have
*some* experience and anecdotal evidence that at least mildly
supports it. But I also agree that for any *particular* designer,
or any *particular* agency, using any *particular* tools, for a
*particular* document, then there may well be significant
cost variation between "accessible HTML" and "accessible PDF".

But: all of these things - the qualifications and skills of
authors and designers, the choice of tools, the choice of
agencies - all interact with what clients *ask for*.  So I am
very wary of having the tail wag the dog here.  I think it is
important for any person and organisation commissioning document
design for web publication to firstly articulate what they would
ideally *like*, and only secondarily (if at all) to accept
current constraints of particular individual designers, agencies
or tools.  This is the only way that designers, agencies and tool
developers will ever be encouraged to meet the "ideal"
requirements. ("Maybe not today; maybe not tomorrow; but some
day...")

And from that perspective, I (still!) don't hesitate in
recommending the ideal (for "long complex" documents) as being to
provide *both* "accessible HTML" and "accessible PDF".

A very close second best (for me!) would be "accessible HTML" +
"presentational PDF". Indeed, this is a model I myself use quite
a lot; and my own "excuse" for not going all the way to
accessible PDF also is precisely a reflection of limitations of
my own current (heavily customised and idiosyncratic) tool set!

Third would be "accessible HTML" only. (This level is surely
appropriate for most shorter, simpler, documents, where PDF
really doesn't bring anything extra to the party. But wherever
PDF *does* offer any significant added value, it seems to me to
be easy, and of negligible cost, to shift from this level "up" to
the previous level of also offering "presentational PDF"....)

Fourth would be "accessible PDF" only.

Depending on the document - its length and complexity - I guess
Eoin might suggest swapping the order of the previous two. I
think that is probably a reasonable difference of opinion or
emphasis between us; and we will probably have to agree to
differ.  But I hope we can agree that nothing *very* much hangs
on this particular ranking? That is, in any case where this
judgement might actually come into play, I think we might both
really be encouraging organisations to go up to at least the
level of "accessible HTML" + "presentational PDF", if not to
"accessible HTML" + "accessible PDF"?

In any case, a *long* way behind all of these would be
"presentational HTML" + "presentational PDF", and the very last
option would be "presentational PDF" only.  Unfortunately, this
last case is *way* too common, which is where the whole
conversation started...

[...]
> This is a very important issue when considering what advice to
> give to Irish public sector organisations. Telling them that
> the best delivery format for the web is accessible HTML isn't
> much use when they have a Quark-generated PDF file in their
> hand.

Hmmm.

Well I, for one, am not saying that "the best delivery format for
the web is accessible HTML".  I'm saying something more like "the
best delivery format for long complex documents on the web is
accessible HTML *plus* accessible PDF". (OK, for shorter, simpler
documents, any kind of PDF version, even as an option, becomes
progressively less useful: so yes, at *some* point on the
spectrum I would revert to the simpler formulation of "accessible
HTML is best"; but Eoin framed this particular discussion as
being at the other end of the spectrum.)

But most importantly, I want to say this *before* an organisation
has a "Quark-generated PDF file in their hand" in the first
place...

> We should focus more on getting these organisations to adopt
> publishing procedures and guidelines that will make it easier
> for them to publish information in an accessible way, [...]

Absolutely agree.

> [...] by
> a) ensuring that the report is prepared using an
> application that supports the creation of accessible PDFs
> (whether prepared internally or externally), and

Well ... I *want* to say something more like:

  a) ensuring that the report is prepared using an application
  that supports the creation of a single source accessible master
  that can be automatically transformed into multiple,
  accessible, target formats, including at least accessible HTML
  (in both single and multi-page formats) and accessible PDF; and
  also, ideally, large print, embossed braille, accessible DAISY
  and additional new mainstream "ebook" formats as any of these
  achieve significant adoption rates.

Technically this is not rocket science.  But yes, as is clear
from the discussion here, the tools are still immature, and still
fall at least somewhat short of this ideal in various ways.
That's why I have now presented above a sort of "sliding scale"
of desirability.  And yes, the tradeoff is complex, and will
certainly vary with the particular document type.  So, let me
acknowledge that the formulation in the current text of the
e-Access lab white paper may be a little dangerous, if not
positively misleading, in trying to give relatively simplistic
"catch all" advice...

So, many thanks to Eoin, and everybody else, for their
comments. I've certainly learned a few useful things. I *will*
re-visit the text of the white paper at least one more time to
see if I can better reflect the complexities of the issue!

Best regards,

- Barry.



