This article attempts to explain the concept of "reading order" in PDF files. Why is this necessary?
Many have come to use the term "reading order" as functionally synonymous with the logical order provided by PDF tags, but this interpretation is incorrect.
When you create a PDF, you're painting a picture. Your "paintbrush" is the combined effect of the software used to create the source document and the software you've chosen to convert that document into PDF.
Like brushstrokes, each character, line and image is created independently, yet together they interact to produce particular visual effects. On a PDF page, objects are connected by a coordinate system and little else. There's no logical connection between the letters comprising a word; characters simply appear at a series of locations on the rendered page.
As originally designed, PDF is a system for painting on a page. There's no innate concept of words, sentences, paragraphs, columns, headings, images, tables, lists or footnotes, or of any other semantic structure that distinguishes a "document" from a heap of letters, shapes and colors. PDF is fundamentally about how the document appears on the page, not about what its content means once abstracted from the page.
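To make the painting model concrete, here is a minimal Python sketch. It does not parse a real PDF: the `Glyph` class, the `painted` list and both helper functions are hypothetical names invented for this illustration, and the coordinates are made up. The only assumption taken from PDF itself is that, by default, the page origin sits at the lower left, so larger `y` values are higher on the page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    """One painted character: a shape at a location, nothing more."""
    char: str
    x: float  # horizontal position, in points
    y: float  # vertical position, in points (larger y = higher on the page)


# Hypothetical page content, listed in the order the producing software
# happened to paint it. Nothing here says which glyphs form a word or a line.
painted = [
    Glyph("D", 110, 680),
    Glyph("H", 100, 700),
    Glyph("F", 120, 680),
    Glyph("i", 110, 700),
    Glyph("P", 100, 680),
]


def paint_order_text(glyphs):
    """Concatenate characters in the order they were painted."""
    return "".join(g.char for g in glyphs)


def visual_order_text(glyphs):
    """Guess at lines by sorting top-to-bottom, then left-to-right."""
    lines = {}
    for g in sorted(glyphs, key=lambda g: (-g.y, g.x)):
        # Glyphs sharing a y value are treated as one line of text.
        lines.setdefault(g.y, []).append(g.char)
    return "\n".join("".join(chars) for chars in lines.values())


print(paint_order_text(painted))   # text as painted
print(visual_order_text(painted))  # text as a reader would see it
```

Both functions see exactly the same glyphs, and the page would render identically whatever the paint order. The coordinate sort is only a guess at what a reader sees; it works on this toy page because there is a single column with evenly aligned baselines.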
(Originally posted on appligent.com. Read the rest of the article there)