Saturday, September 25, 2010

The TeX engine as a solution for dynamically typesetting ebooks

During the last month I watched a couple of conference presentations (namely, William Cheswick on TeX and the iPad and Kaveh Bazargan on TeX as an ebook reader) that discussed the possibility of using the TeX typesetting system on the current generation of e-book reading devices, and in particular on the iPad.  LaTeX has traditionally been used to typeset mathematical, scientific and technical publications for electronic and print distribution.  (TeX, the base engine for which LaTeX is a front-end markup language, was invented by Donald Knuth over thirty years ago in order to address the problem of typesetting equations for his The Art of Computer Programming.)  But the language more than adequately handles typesetting for books in the humanities and social sciences, and many reviewers believe that TeX kerns words better than Adobe PageMaker/InDesign or QuarkXPress.  (Notably, you will be able to see from the copyright page that Cambridge University Press uses LaTeX to typeset many of its more recently published books.)

What I find really interesting about Cheswick's and Bazargan's proposals is that they try to solve one of the fundamental problems that has confronted publishers of electronic texts.  Unlike the static PDF files that InDesign and Quark produce, which fix forever a document's pagination and fonts, TeX is capable of dynamically generating pretty-print text in order to fit different orientations for an ebook reader, or to accommodate a reader's preference for a larger font size (which means, in essence, that TeX instantly generates a new DVI or PDF file as needed).  Of course, one of the traditional strengths of electronic texts (such as the plain text ebooks that one can download from Project Gutenberg) has been this kind of plasticity: it is easy to open a TXT file in a word processor and to customize it to one's heart's content.  But as anyone who has tried to read a very long TXT document on their computer knows, these texts are not very pretty.  The standard kerning and tracking between characters, especially for a basic monospace font, is very crude.  Plain ASCII text also has no support for a host of typographical conventions that have informed how we have read the codex book for the last five centuries, including footnotes, sidenotes, glosses and various textual ornaments.  This is why typeset PDFs are preferable in many ways for electronic reading.  But these texts have never been very plastic; even zooming in on a page to make the font bigger entails constant panning from left to right and up and down.  This can be particularly tedious if you are reading on a small screen, such as that of an iPhone or iPod Touch.

It is great that there are researchers who are thinking about how to achieve the best of both worlds.  In a way this use of TeX is an extension of HTML, since that markup language has also, in a more limited way, supported balancing page design features (such as tables, different fonts or block quotes) with the ability to dynamically reset the text in order to fit between the vertical boundaries of a web-browser window.  (Incidentally, I have learned in my own experience formatting ebooks for the Amazon Kindle that the best results are achieved by submitting a file in HTML for conversion.  I would thus not be surprised if Amazon's proprietary AZW format was using something very close to HTML for formatting its books.  This also explains why the Kindle presents a fairly decent typographical reading experience: HTML is more competent for reading than plain text.  But this also explains why Kindle books are not as pretty as the text, say, in a Folio Society book.)

I found it interesting that Bazargan said that his implementation of TeX on the iPhone/iPod Touch did not support pagination.  It is clear from the presentation that the software typesets a document as one long page, so that the reader can freely scroll up and down the document.  This design decision sidesteps the problems implicit in letting software, no matter how smart, automatically break text, figures or equations over two pages.  From my experience using InDesign, I can say that it does take a human eye to decide how best to set text around a page break.

Tuesday, September 14, 2010

The problem of making searchable PDFs

It has been a very laborious process trying to discover a free (or at least cheap) solution for making image-based PDFs searchable using software that runs in either Mac OS X or Linux.  This is a particularly pressing need for me given the number of books and other paper-based documents that I scan on a regular basis.  Interestingly, the packaged software that came with my flatbed scanner, a CanoScan LiDE 70, was able to effortlessly add a text layer to my scans under Windows XP.  However, since I changed computers and operating systems, I have been using VueScan as my scanning app.  While the version of this software (8.6.23) that I have been using has the ability to OCR text and write the output to a TXT file, it is not able to produce searchable PDFs.  (I just noticed that a newer version (8.6.33) released this past May actually does add support for creating searchable PDFs.  I will definitely download this.  I should also note in passing that VueScan adds functionality that Canon's packaged drivers and software lacked, such as the ability to operate continuously through a multi-page scan, eliminating the need to constantly hit the scan button.)

In any case, I need a solution for converting the numerous files that I have already produced that are simply image-based.  My goal has been to find a way out of buying an expensive OCR and PDF creation suite, such as OmniPage Pro or ABBYY FineReader, which can create searchable PDFs.  Most of the free software that I have been able to find through Googling has been designed to work from the Linux command-line.  I am willing to use this software as a solution because I have an older laptop on which I have installed Ubuntu 9.10, and I am not against shuffling PDFs between my MacBook Pro and that machine in order to post-process my scans.  (This workflow seems also to be the engineering solution of choice, especially in larger networked settings, since there is a Live-CD based Linux distro designed just for handling this task.)

The first software that I tried was pdfocr.  I was able to successfully install all the necessary packages.  I was initially encouraged that the software processed the first PDF that I fed it page-by-page without balking.  However, the script constantly complained that each page image was not at an anticipated resolution of 300 DPI.  There does not seem to be a command-line option that allows this value to be changed.  (Most of the book scans that I have done are at 150 DPI, mostly because this resolution is usable for screen reading and it speeds up the scanning process.  At resolutions of 300 DPI and above the scanning head on my scanner simply crawls.)  The final output was disappointing.  Though pdfocr successfully added an OCR layer to each page, the underlying text was set at far too large a point size and was thus out of all proportion to the image text.  This layer is not at all usable either for highlighting using PDF annotation software or for searching to find where a word or phrase specifically occurs.
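To make the resolution issue concrete, here is a minimal sketch of the arithmetic that any tool must perform when it places OCR text over a page image: pixel coordinates from the scan have to be converted to PDF points (72 per inch) using the scan's DPI.  The word box below is hypothetical, and I have not verified that this is what goes wrong inside pdfocr; the sketch only shows that assuming the wrong DPI misplaces and missizes the text layer by the ratio of the two resolutions.

```python
POINTS_PER_INCH = 72.0  # PDF user space uses 72 points per inch


def px_to_pt(pixels, dpi):
    """Convert a length in scan pixels to PDF points at the given resolution."""
    return pixels * POINTS_PER_INCH / dpi


def bbox_to_points(bbox, dpi):
    """Convert an (x0, y0, x1, y1) box from pixels to points."""
    return tuple(px_to_pt(value, dpi) for value in bbox)


def box_height(bbox):
    """Height of an (x0, y0, x1, y1) box."""
    return bbox[3] - bbox[1]


# Hypothetical word box, in pixels, on a page scanned at 150 DPI.
word_box_px = (150, 300, 450, 330)

# Box placed using the real scan resolution.
correct = bbox_to_points(word_box_px, dpi=150)

# Same box placed by a tool that assumes every scan is 300 DPI.
assumed = bbox_to_points(word_box_px, dpi=300)

# Ratio between the mis-scaled and the correct text height.
scale_error = box_height(assumed) / box_height(correct)

if __name__ == "__main__":
    print("correct box (pt):", correct)
    print("assumed box (pt):", assumed)
    print("scale error:", scale_error)
```

Whatever the direction of the error in a given tool, the point is that the text layer and the page image only line up when both are scaled with the same resolution.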

The second command-line based software that I tried, a custom bash script described in this blog post, suffered from the same problem.  This script uses the same OCR engine, Cuneiform, and the same OCR data format, hOCR, as the first software I tried.  This tells me that, whatever its OCR accuracy, the combination of Cuneiform and hOCR may not be suitable for this application.  At the very least, a programmer with more knowledge than I have needs to create more robust options in order to work with my set of files.

Given that Google Book Search is able to use its Tesseract OCR software to produce accurate (and accurately placed) text data for page scans, it should not be that difficult to find a free and efficient solution to use on my own computer.

Friday, August 20, 2010

Review of my first Kindle, the Kindle 1

In 2007 I received a first-generation Kindle.  In summary, I would say that the Kindle matched the functionality of the Rocket eBook while adding a host of technological improvements and one key feature that NuvoMedia could never muster (i.e., seamless integration with an online bookstore).  In fact, the free, always-on connection to the cellphone tower gave the device access not only to the Kindle Store but also the World Wide Web.  Granted, the browser in the device was very crude -- suitable for displaying text-based websites only.  This made the Kindle a very good Wikipedia reader, for example.  The programmers included shortcuts in the search system that made using the web browser in this way easier.  For example, prefacing your search with the term "@wiki" would search Wikipedia for a specified term and automatically load the most relevant article.  Similarly, "@web" allowed for quick Google searches.

The Kindle was most special for utilizing E-Ink for its display technology.  The screen went a long way toward relieving eyestrain by mimicking the properties of a printed page.  Unlike CRT or LCD computer monitors, which project light out to you, ambient light illuminates the E-Ink display.  This is why a Kindle reads very well in direct sunlight or under a reading light.  Of course, in darkness it can be a hassle to always have a reading light.  Perhaps one of the advantages of an older e-reader, like the Rocket eBook, or even a laptop, is that they provide their own backlight illumination.

The Kindle was also the first e-reader that I was able to finagle into displaying foreign language texts.  Mind you, this was not because the Kindle came with any native support for foreign alphabets (The Kindle 1 only supported the ISO 8859-1 (Latin 1) character set).  I was only able to read Anna Karenina in Russian on my Kindle due to the fact that the device has a hidden image viewing application that can be used to display page images.  Follow these directions in order to reproduce my workflow for preparing a text: 

1. Download a foreign-language text in HTML or plain text. 
2. Typeset it in a modern word processor (I use OpenOffice) using the custom page dimensions 3.5" x 5" (which approximates the size of the Kindle display).  The margins on all sides should be 0.1".
3. Export a PDF of the document.
4. Use an application like PDF2PNG to create a batch set of image files from the PDF, one for each page of the text.  These files should be placed inside a folder labeled with the title of the work.  This will be the title that displays on the Kindle's main menu.
5. Drag this folder to a "pictures" folder on the Kindle.
6. Press the keys "Alt-Z" while at the home screen to make the book you added appear in the list of available reading matter.
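
The rendering step (step 4) can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: it assumes the Kindle 1 screen is 600 x 800 pixels, it substitutes the pdftoppm tool from Poppler for PDF2PNG, and the file and folder names are made up.  The sketch works out the largest resolution at which a 3.5" x 5" page still fits the screen and builds (but does not run) the conversion command.

```python
import math

KINDLE_SCREEN_PX = (600, 800)  # assumption: Kindle 1 display, width x height
PAGE_SIZE_IN = (3.5, 5.0)      # page dimensions from step 2, in inches


def fit_dpi(page_in, screen_px):
    """Largest whole-number DPI at which the page fits inside the screen."""
    by_width = screen_px[0] / page_in[0]
    by_height = screen_px[1] / page_in[1]
    return math.floor(min(by_width, by_height))


def rendered_size(page_in, dpi):
    """Pixel size of the page image when rendered at the given DPI."""
    return (round(page_in[0] * dpi), round(page_in[1] * dpi))


def pdftoppm_command(pdf_path, out_dir, dpi):
    """Build a pdftoppm command that writes one PNG per page into out_dir."""
    return ["pdftoppm", "-png", "-r", str(dpi), pdf_path, f"{out_dir}/page"]


dpi = fit_dpi(PAGE_SIZE_IN, KINDLE_SCREEN_PX)
size = rendered_size(PAGE_SIZE_IN, dpi)
# Hypothetical input file and output folder (the folder name becomes the title).
command = pdftoppm_command("anna_karenina.pdf", "Anna Karenina", dpi)

if __name__ == "__main__":
    print("render at", dpi, "DPI ->", size, "pixels")
    print(" ".join(command))
```

With these assumptions the page height is the limiting dimension, so the images come out slightly narrower than the screen.  To actually run the conversion, the command list could be passed to subprocess.run on a machine with Poppler installed, after creating the output folder.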

Unlike what was true of the Rocket eBook, the Kindle made it easy to extract your textual annotations to your computer for use in other applications.  All annotations were collected into a plain text file that could easily be copied to the computer when the Kindle was attached via USB port.  As of last year it also became possible to sync and view these annotations online at Amazon's website.  Of course, this is not the same as being able to transfer text and annotations together and, in turn, view them together outside of the device.  I do not think that these ways of recording and presenting notes compare favorably to what is possible with good PDF annotation software on a computer (see my earlier post).

I should also mention that the Kindle was a much more flimsy device than the Rocket eBook.  In actual fact I broke the screen of my first Kindle within weeks of receiving it (I had mistakenly placed the device under a heavy book which cracked the screen).  Thankfully, Amazon replaced the device free of charge.  The second device that I was then sent in early 2008 has lasted to the present.  However, I have had to replace the battery once, and most recently the modem has started to work only intermittently, forcing me to use the e-reader via USB if I want to be able to reliably transfer documents and books.

I no longer use this Kindle as my primary e-book reader, having purchased late last year a Kindle 2.  However, I will not discuss this device separately since it has many of the same features and functionality as the Kindle 1.

Saturday, August 14, 2010

Remembering the Rocket eBook, the true pioneer of eBook readers

I was thinking that this blog would be an appropriate venue to discuss eBook readers, especially since in recent years they have really started to come into their own as separate appliances.  Certainly it could be argued that these devices have reached a tipping point in the mass consciousness.  I have actually used an eBook reader of one sort or another on and off for the last ten years.  For much of that time I used a Rocket eBook 1000.  This device was by no measure common and really did not gain a wide following.  It appears that the page that I linked to is an advertisement from circa 2000 (I would link to a Wikipedia article, but there is none).  It is amusing that the page boasts that the now defunct NuvoMedia has sold "tens of thousands" of the reader.  Note that Amazon has sold three or four million Kindles, and this is also supposedly a niche device for serious readers.

In late high school and early college I used the reader to take advantage of Project Gutenberg public domain texts.  Especially at that time, reading a whole book on a curved, CRT monitor was a much more daunting prospect than reading on a modern, flat, high-resolution LCD screen.  The reader's low-resolution black-on-green display was as good as a Palm Pilot's, and yet the screen was large enough (as large as the Kindle's, in fact) to be able to read comfortably for hours at a time.

My Rocket eBook was the way in which I read all of the Constance Garnett translations of Russian literature, including War and Peace, Anna Karenina, The Gambler, Crime and Punishment and Dead Souls.  I made many annotations and underlined just as many passages from these works.  The only problem was that at the end of the reader's life it was difficult to transfer this information back to my computer.  For that matter, it was difficult getting any information, including the actual books themselves, off the device.  Naturally, the reader was not very good for any kind of reading where one could expect to incorporate annotations into a Word document on a computer, for example.

The device could display only ASCII text.  This means that trying to use the reader for reading anything but English-language texts was nigh impossible.  After I started learning Russian I racked my brains trying to figure out a way to trick the device into displaying Cyrillic.  (Since the reader could display GIF-based images, I even experimented converting pages of Russian text into small image files.  This, alas, did not really work very well.  I will talk about how I implemented this solution on my first-generation Kindle in my next post).

The Rocket eBook anticipated Apple's current generation of mobile devices by basing the whole interface around a touch screen.  You selected text with the stylus in order to make underlines, and tapped an on-screen keyboard to enter notes.  (The handwriting recognition, like the Palm's, was truly awful.)  And also like the iPad, iPhone or iPod Touch, the device could display text in either portrait or landscape mode.

For its time the Rocket eBook was a very nice appliance.  It was built using hard plastics that I do not see in many consumer electronics today.  The fact that it survived from 2000 to 2007 through near daily use speaks to the quality of its construction.  (The fact that the screen showed nary a scratch after seven years of tapping and dragging with the stylus is perhaps more impressive).  I only retired it because I received a Kindle for Christmas 2007.  I fetched a handsome price for the Rocket eBook when I sold it on eBay (the reader does indeed have a small following of devoted fans), and the lady who won the auction wrote me an email afterwards describing how much she loved her first Rocket eBook.

Monday, August 9, 2010

OS X and the Life of Reading

In my first post I wanted to evaluate the functionality of Mac OS X in terms of my workflow of digital reading, which occurs mainly in the form of PDF and multi-page TIF files (the latter contain scanned journal articles for which I write abstracts).

1. The ability to annotate a text is necessary for any kind of serious reading, electronic or analog; the process of marking a text and writing notes is bound up in my ability to fully digest and comprehend a text.  (To judge by the extensive record of readers' marks and marginalia in printed books going back to the fifteenth century, this kind of active engagement with the text has long been a hallmark of reading for others too). Indeed, Preview is touted for its ability to annotate PDFs.  But I was not more than a day into using my new Mac last December before I realized that my PDF annotations were not being saved to the PDF file in such a way that other applications could read them.  In searching for alternatives I quickly came across Skim.  But Skim saves all annotations to a separate file in the same directory with the PDF.  Thankfully this app allows the user to save annotations permanently to the PDF file so that other software can see them.  But this option is not automatic and must be manually selected from a menu.

When I was using Windows I really fell in love with PDF-XChange.  This was the first PDF viewer that I used which could truly annotate my readings.  Like other enhanced third-party viewers and editors, as well as Adobe Acrobat itself, PDF-XChange can highlight text and add notes, whether in the form of embedded speech-balloons or direct writing in the margins.  But unlike many other apps, PDF-XChange allows you to draw semi-transparent boxes that can be used to highlight text in image-based PDFs.  All highlights can be double-clicked in order to add embedded notes.  Everything that I do in PDF-XChange appears without a hitch in other PDF viewers, including Preview and Adobe Acrobat Reader.

I knew from my experience using Ubuntu that PDF-XChange ran very well under Wine in Linux.  As it turns out, it also runs very well under Wine in OS X, and it is currently my PDF viewer and annotator of choice.  (N.B. This tutorial provides a good explanation of how to install Wine in OS X.)

Another feature that is important for PDF readers that are used for long-form reading is the ability of the software to remember your place in a document between sessions.  PDF-XChange is generally very good at this, but I noticed this evening that if you move your PDF document into another directory on your computer the program will forget your place in the document.  Of course, for more serious marking of one's place in the text it is also possible to bookmark.

2. TIF handling seems to be particularly poor in OS X.  If you open a PDF file in Preview, you can set a zoom level and the option to display pages continuously and not just one at a time.  However, a multi-page TIF file must be viewed page-by-page, and the zoom has to be reset for each new page.  One natural solution to this predicament is to just convert TIF files to PDFs.  Preview in fact gives you this option from the "Save As" menu.  However, the app invariably crashes when it tries to convert large TIF files running to a hundred pages or more.  I initially tried to find an alternative native application that would allow me better TIF viewing and PDF conversion.  Nothing (e.g., CocoViewX) seemed to work any better than Preview.  Finally I found a Windows application, Advanced TIFF Editor, which I was able to run via Wine instead.  This works without a hitch!  It can easily rotate images and convert TIF files to PDF.

3. One way that I record excerpts from books on the fly when I don't have use of my scanner is by taking pictures using a digital camera.  This method obviously requires post-processing of the image files, and in particular the ability to rotate images.  Preview has difficulty rotating a set of images en masse.  I have not found much discussion of this problem around the Internet.  (I am using 10.5.  I realize that Snow Leopard may have fixed this problem).  Preview has no problem rotating JPG or GIF images individually.  However, if I try to open a series of these files at once, select all of them and press command-R (or -L), the images all appear to rotate.  I then select "Save All" from the file menu, and it appears that each file is being saved.  However, if I try to open any of these files after closing the current Preview window, the images are still all unrotated.