Saturday, September 25, 2010

The TeX engine as a solution for dynamically typesetting ebooks

During the last month I watched a couple of conference presentations (William Cheswick on TeX and the iPad, and Kaveh Bazargan on TeX as an ebook reader) that discussed the possibility of using the TeX typesetting system on the current generation of ebook reading devices, particularly the iPad.  LaTeX has traditionally been used to typeset mathematical, scientific and technical publications for electronic and print distribution.  (TeX, the underlying engine on which LaTeX is built as a set of markup macros, was invented by Donald Knuth over thirty years ago to address the problem of typesetting equations for his The Art of Computer Programming.)  But the language more than adequately handles typesetting for books in the humanities and social sciences, and many reviewers believe that TeX kerns text better than Adobe PageMaker/InDesign or QuarkXPress.  (Notably, you can see from the copyright page that Cambridge University Press uses LaTeX to typeset many of its more recently published books.)

What I find really interesting about Cheswick's and Bazargan's proposals is that they try to solve one of the fundamental problems that have confronted publishers of electronic texts.  Unlike the static PDF files that InDesign and Quark produce, which fix forever a document's pagination and fonts, TeX is capable of dynamically re-typesetting a text to fit different orientations of an ebook reader, or to accommodate a reader's preference for a larger font size (which means, in essence, that TeX generates a new DVI or PDF file on demand).  Of course, one of the traditional strengths of electronic texts (such as the plain text ebooks that one can download from Project Gutenberg) has been this kind of plasticity: it is easy to open a TXT file in a word processor and customize it to one's heart's content.  But as anyone who has tried to read a very long TXT document on a computer knows, these texts are not very pretty.  The standard kerning and tracking between characters, especially in a basic monospace font, are very crude.  Plain ASCII text also has no support for a host of typographical conventions that have shaped how we have read the codex book for the last five centuries, including footnotes, sidenotes, glosses and various textual ornaments.  This is why typeset PDFs are preferable in many ways for electronic reading.  But these files have never been very plastic; even zooming in on a page to make the font bigger entails constant panning from left to right and up and down.  This can be particularly tedious if you are reading on a small screen, such as that of an iPhone or iPod Touch.
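To make the idea of re-typesetting on demand a little more concrete, here is a minimal sketch in Python that writes out a LaTeX source file sized to a given screen.  It is only an illustration under my own assumptions, not how Cheswick's or Bazargan's software works: the function name latex_source, the 5 mm margin and the screen dimensions in the usage note are all hypothetical.  The one real dependency is the standard LaTeX geometry package, whose paperwidth, paperheight and margin options set the page size.

```python
def latex_source(body: str, width_mm: int, height_mm: int, font_pt: int = 10) -> str:
    """Return a complete LaTeX document sized to a screen (hypothetical helper)."""
    # The standard article class only provides these three base sizes.
    if font_pt not in (10, 11, 12):
        raise ValueError("article class accepts only 10pt, 11pt or 12pt")
    margin_mm = 5  # assumed margin, chosen arbitrarily for the sketch
    lines = [
        rf"\documentclass[{font_pt}pt]{{article}}",
        # geometry sets the "paper" to the size of the device's screen
        rf"\usepackage[paperwidth={width_mm}mm,paperheight={height_mm}mm,"
        rf"margin={margin_mm}mm]{{geometry}}",
        r"\begin{document}",
        body,
        r"\end{document}",
    ]
    return "\n".join(lines)


# The same text, regenerated for two orientations and two font sizes.
portrait = latex_source("Call me Ishmael.", 148, 197, font_pt=10)
landscape = latex_source("Call me Ishmael.", 197, 148, font_pt=12)
```

Running each result through TeX would yield a freshly broken set of lines and pages for that screen.  The text itself never changes; only the preamble does, which is what separates this approach from zooming into a fixed PDF.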

It is great that there are researchers who are thinking about how to achieve the best of both worlds.  In a way this use of TeX is an extension of what HTML already does, since that markup language has also, in a more limited way, balanced page design features (such as tables, different fonts or block quotes) with the ability to dynamically reset the text to fit between the vertical boundaries of a web-browser window.  (Incidentally, I have learned from my own experience formatting ebooks for the Amazon Kindle that the best results are achieved by submitting a file in HTML for conversion.  I would thus not be surprised if Amazon's proprietary AZW format uses something very close to HTML for formatting its books.  This would also explain why the Kindle presents a fairly decent typographical reading experience: HTML supports far more typographic structure than plain text.  But it would also explain why Kindle books are not as pretty as the text, say, in a Folio Society book.)

I found it interesting that Bazargan said that his implementation of TeX on the iPhone/iPod Touch did not support pagination.  It is clear from the presentation that the software typesets a document as one long page, so that the reader can freely scroll up and down the document.  This design decision sidesteps the problems inherent in letting software, no matter how smart, automatically break text, figures or equations over two pages.  From my experience using InDesign, I can say that it does take a human eye to decide how best to set text around a page break.
