Making Images Zaurus Friendly

How many times do you download an image file to your Zaurus, only to have your pdf viewer or image viewer crash or tell you the image cannot be displayed? This could be because the image file is corrupt, because you do not have software capable of reading that particular image format, because the image is too big or has more pixels than your combination of hardware and software can display, or just because you do not have enough free cpu or internal memory to successfully view the file.

I have learned to always make sure I have plenty of free buffer space on my Collie before attempting to open an image file: at least 2000 k, and more is better. You can get this information from the Memory section of the System Info tab on ROM 2.38. Free buffer space is not displayed in the System Info tab on ROM 1.12 on the sl6000, however, so on ROMs that do not display it there, you need to enter the "free" command from a terminal to get that information.

Since the output of "free" can be a little work to figure out, especially if you want to use the results in a script, and opening the Sysinfo Tab on the Collie uses more time and cpu, I have written an alias that is easier to decipher. To see exactly how to install and use the alias that does this, read the introduction and the very first example on my page about aliases.
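As a rough illustration of pulling one number out of "free", here is a minimal sketch. It is not the alias mentioned above. The function name mem_free_kb is hypothetical, and it assumes the common layout where the "Mem:" line lists total, used and free, in kilobytes, as its second, third and fourth fields. Which figure your ROM reports as free buffer space may differ, so check the column headings on your own Zaurus first.

```shell
# Hypothetical helper, not the alias described above.
# Reads the output of "free" on standard input and prints the 4th field
# of the "Mem:" line, which is the "free" column in the assumed layout.
mem_free_kb() {
    awk '/^ *Mem:/ { print $4; exit }'
}

# Typical use from a terminal:
#   free | mem_free_kb
```

If the number printed is below about 2000, it is probably worth closing other applications before opening a large image.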

Now, before attempting to view an image on my Collie, I also check to make sure I have at least as many free bytes of internal memory as the size of the file, although twice as much space is preferable as many image files need to expand when they load. To get a copy of the short script I use to do this without having to do the math in my head, check out my fb script.
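The comparison described above can be sketched in a few lines of shell. This is only an illustration of the rule of thumb, not the fb script itself. The function name memory_verdict and the three words it prints are my own assumptions, and you have to supply the free memory figure in bytes yourself.

```shell
# Hypothetical helper, not the fb script mentioned above.
# Usage: memory_verdict FILE FREE_BYTES
# Prints "comfortable" when FREE_BYTES is at least twice the file size,
# "tight" when it is at least the file size, and "too little" otherwise.
memory_verdict() {
    size=$(wc -c < "$1") || return 2
    size=$((size + 0))    # strip any padding some versions of wc add
    if [ "$2" -ge $((size * 2)) ]; then
        echo "comfortable"
    elif [ "$2" -ge "$size" ]; then
        echo "tight"
    else
        echo "too little"
    fi
}
```

For example, "memory_verdict census.tiff 2048000" checks the file against roughly 2000 k of free memory.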

Remember that the cache of your console display can take up a lot of space in some ROMs, so be sure to close the console and then open it up again to clear that cache. Memory management differs on various Zauruses, but however it works on yours, closing as many other applications as possible before viewing images can improve your ability to view them. If you need to use the console at the same time, close it and open it up again first to clear its cache.

Speaking of clearing caches, it also can help to clear your browser's cache. If you have Opera and it is not open, you can clear significant disk space by removing the file .opera/vlink4. And if you have installed Netfront, and the "clear cache" button does not work, you can clear its cache by removing all files with the .dcf extension in ~/Applications/netfront3/cache/. Do NOT remove all files in the Netfront cache directory. One of these files contains important settings, and if you accidentally remove it, Netfront will revert to Japanese. If that happens, you will need to restore the file from a backup or extract a fresh version of cache.fat from the ipk.
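To reduce the risk of deleting the wrong file in the Netfront cache, the removal can be limited to the .dcf files explicitly. The following is a minimal sketch under that assumption. The function name clear_netfront_cache is hypothetical, and the default directory is the one given above, so adjust it if Netfront is installed elsewhere on your Zaurus.

```shell
# Hypothetical helper. Removes only the *.dcf files from the Netfront
# cache directory and leaves every other file, such as cache.fat, alone.
# Usage: clear_netfront_cache [DIRECTORY]
clear_netfront_cache() {
    dir=${1:-"$HOME/Applications/netfront3/cache"}
    [ -d "$dir" ] || return 1
    for f in "$dir"/*.dcf; do
        # the test also skips the unexpanded pattern when nothing matches
        [ -f "$f" ] && rm -f "$f"
    done
    return 0
}
```

Run it only while Netfront is closed, and keep a backup of the cache directory the first time you try it.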

Most of you know that some files that cannot be viewed with either your browser or the built-in image viewer can be viewed after adding a pdf viewer, such as qpdf or qpdf2, or a tiff viewer.

But what do you do, if you have an appropriate viewer, and maybe even plenty of free buffer space and internal memory, and still are having problems viewing an image? You can cheat by examining and modifying the image files on your Zaurus or on a desktop machine! And while the modifications may need to be made on the desktop if you do not have the required commands installed on your Zaurus, if you have syncing, ssh or ftp access to the desktop, you can do this all in the comfort of your lounge chair, in the back yard, with your Zaurus, using various command-line image utilities.

The commands and scripts I am about to discuss utilize the Linux "identify", "pdfimages", and "convert" commands, and have been tested by me using ssh from both my Collie with ROM 2.38, and my sl6000L running ROM 1.12, to connect to Redhat Fedora and Ubuntu servers.

I discovered at OESF that it is possible to install these image processing commands on the Zaurus if you are running Angstrom or pdaXii13, and after hunting around, it looks like they are available for other ROMs as well. So, many of us may be able to run at least some of these commands and scripts directly on our Zaurii instead of via ssh with a host computer. Although it is going to be a while before I can finish checking out other ports, if you want to install ImageMagick on your Zaurus, see my page listing places where you can find ImageMagick for the Zaurus.

Another approach to dealing with difficult images is to use the perl script that Lyndon Hill wrote for the SL3200. The script can run through a directory resizing all the images in it to fit the Zaurus display, and is dependent on the "identify" and "convert" commands from ImageMagick. You can read about and download Koan's resize script here. I do not know how it would handle multimegabyte images or images requiring more workspace than may be available on the older ROMs, and I am a little wary of this approach to resizing when there is visual data requiring no loss of resolution, but it is a great idea to try when detail is less important than the size of the image and efficiency in loading it to view.

Anyhow, whatever distro you are running, if you have done much online genealogy, especially if you are using ROM 2.38 or earlier, you probably know how difficult it can be to download and/or view census images from Heritage Quest and commercial genealogy sites such as Fold3 and Ancestry.com. The tiff files from Heritage Quest crash my tiff viewer, pdf files crash qpdf, and the jpg files are often much too big to download or view comfortably.

But my buddy Quickening has written a simple script to convert pdf files, such as those from Heritage Quest, to tiff files that are both compatible and small enough for me to open up with tiffviewer on my Collie and my Tosa. The script must be run on a Linux box which has both Glyph & Cog's "pdfimages" command from http://www.foolabs.com/xpdf/about.html, and ImageMagick's "convert" command from http://www.imagemagick.org/ installed. He called the script persi. It will take a black and white pdf file and create a tiff file sampled at 50%. If you want an html-free copy of the script, which also changes negative images to positive ones and supports conversion of pdfs containing more than one page, click here. If not, you can copy the following, simpler version of the script, which will only convert one pdf page:

#!/bin/sh
# this script may be distributed under terms of the GPL
# expect pdf filename in 1st arg

# derive the output basename by stripping the directory and the .pdf extension
base=$(basename "$1" .pdf)

# extract images
pdfimages "$1" "$base"

# assume only 1 image. only tiff supports 2 color
convert "$base-000.pbm" -sample 50% -depth 1 "$base.tiff"

# clean up
rm -f "$base-000.pbm"

The script expects just one argument, the name of the file to be converted, and creates a tiff file with the same basename. If the sampling rate of 50% is not legible enough, you can change it to suit your taste.

In the following example, persi will create the file "mycensusimage.tiff":

bash# persi mycensusimage.pdf

Again, as noted above, if your pdf contains more than one page, or looks like a negative, you can download a copy of the enhanced version of "persi" from http://www.sdjf.esmartdesign.com/files/persi.

In some cases, however, even the enhanced version of persi produces an image that my Zaurus renders with white characters on a black background, and can be very hard to read. I have this problem with fax images, and with newer pdfs downloaded from HeritageQuest. If you find this is a problem, you can download and try using the alternately enhanced version of "persi" which samples at 99% and negates the image. I call this version "persineg" and you can get it from http://www.sdjf.esmartdesign.com/files/persineg.

If you have a pdf file that contains text rather than an image or picture, then "persi" will not be able to process it, and you will instead need to use the very simple pdftotext or pdftohtml utilities, which come with PDF toolkits such as xpdf or poppler-utils, not with ImageMagick.

Please note that "persi" will NOT convert color pdf image files to tiffs. It was written specifically for the conversion of black and white pdf image files, and will fail if you try to run it on a pdf that is multicolored. I am sure some of you might want to run it on a colored pdf, but that level of flexibility would require more complex coding.

I have found that many of the poor resolution census images I find at HeritageQuest are too difficult to decipher at 50% sampling. So, I personally prefer running persi at 99%, and then using the "convert" utility or my crop.sh script to pare it down further, if necessary, once I see the size of the resulting tiff file. To change the default sampling percentage in persi, I put "persi" into a text editor, and changed the 50% to 99%. If you are nervous about or wary of editing the persi script yourself, you can get a copy of the versions I use for 99% sampling, by clicking here for persi99 and here for persineg at 99%.

In some cases, persi will crash without completing conversion, but you may be able to retrieve bitmap files and convert them to jpg or tiff format yourself. Use "ls -t" to view the most recent files created, and if you see any files with "pbm" as the file extension, use the convert command to convert them to the desired format. Here is an example. All you need to do is give the input and output file names as arguments to convert:

convert image001.pbm image001.jpg
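If a crash leaves several pbm files behind, the same command can be wrapped in a loop. This is a minimal sketch, assuming the leftover files are in the current directory and that ImageMagick's "convert" is installed. The function name rescue_pbm is hypothetical.

```shell
# Hypothetical helper. Converts every leftover .pbm file in the current
# directory to the format given as the first argument (jpg by default).
# Usage: rescue_pbm [EXTENSION]
rescue_pbm() {
    ext=${1:-jpg}
    for f in *.pbm; do
        [ -f "$f" ] || continue    # nothing matched the pattern
        convert "$f" "${f%.pbm}.$ext"
    done
}
```

Running "rescue_pbm tiff" turns image001.pbm into image001.tiff and leaves the original pbm file in place, so you can delete it once you have checked the result.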

But many census, as well as other data image files, are just way too big for me to view without my Zaurus either crashing or refusing to open them up, even after conversion using "persi", so I have also written a complementary script that enables me to pick the piece of the jpeg, tiff, gif, or png file I want to read. I call it "crop.sh", and you can read about it here.

Another great trick for paring down the size of an image file is to use "convert" to take a sample of it. The quality of the image is decreased, but if you could not otherwise view the image, then it is better than not viewing the image on your Zaurus at all. There are exceptions, but as a rule of thumb, I find that sampling an image with the "convert" command at 80% creates an image a little more than half the size of the original without losing significant quality in most cases. The legibility after sampling at various percentages depends on the quality of the original image.

Again, please note that I am talking about running "convert" on a host box, where there is plenty of space to process images before transferring them to the Zaurus. Some options of "convert" work beautifully on the Zaurus itself, but I am finding that sampling is not one of them because "convert" runs out of memory, unless the image to be processed is quite small to begin with, at least with my early Sharp ROMs.

Here are examples, showing the sizes of the files before and after sampling at 50% and 75% via ssh to a host box. The first example is a jpg file of nearly 3 megabytes:

bash# ls -l morris-1910SC.jpg
-rw-r--r-- 1 sdjf sdjf 2901466 May 27 08:48 morris-1910SC.jpg

Converting it at 75% sampling gives me a file of nearly 2 megabytes, and sampling the file at 50% gives me a file of about 1 megabyte:

bash# convert morris-1910SC.jpg -sample 50% morris-1910SC50%.jpg
bash# convert morris-1910SC.jpg -sample 75% morris-1910SC75%.jpg
bash# ls -l morris-1910SC*
-rw-r--r-- 1 sdjf sdjf  914138 Aug 26 10:36 morris-1910SC50%.jpg
-rw-r--r-- 1 sdjf sdjf 1821542 Aug 26 10:36 morris-1910SC75%.jpg
-rw-r--r-- 1 sdjf sdjf 2901466 May 27 08:48 morris-1910SC.jpg

The syntax for the "convert" command is as follows, where "nn" is the percent sampling rate desired. Be sure to include a proper image filetype in the file names, such as jpg, png, or tiff:

convert name-of-file-to-convert -sample nn% name-of-output-file

And here are some examples showing a roughly 1.4 megabyte file sampled at 60% and at 80%.

bash# ls -l JosephJones-1900.tiff
-rw-r--r-- 1 sdjf sdjf 1386620 Jul 29 00:10 JosephJones-1900.tiff

Sampling at 60% takes this image down to about half a megabyte, and sampling at 80% takes it down to just under a megabyte.

bash# convert JosephJones-1900.tiff -sample 60% JosephJones-1900-60%.tiff
bash# convert JosephJones-1900.tiff -sample 80% JosephJones-1900-80%.tiff
bash# ls -l JosephJones*tiff
-rw-r--r-- 1 sdjf sdjf  500100 Aug 26 10:47 JosephJones-1900-60%.tiff
-rw-r--r-- 1 sdjf sdjf  888014 Aug 26 10:48 JosephJones-1900-80%.tiff
-rw-r--r-- 1 sdjf sdjf 1386620 Jul 29 00:10 JosephJones-1900.tiff
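To see how much each sampling rate actually saved, you can work out the new size as a percentage of the original with shell arithmetic. The sketch below uses the byte counts from the two listings above. The function name pct_of is hypothetical, and because the shell does integer division, the results are rounded down.

```shell
# Hypothetical helper: print NEW_SIZE as a whole percentage of OLD_SIZE.
# Usage: pct_of NEW_SIZE OLD_SIZE
pct_of() {
    echo $(( $1 * 100 / $2 ))
}

pct_of 914138 2901466     # morris-1910SC sampled at 50%    -> 31
pct_of 1821542 2901466    # morris-1910SC sampled at 75%    -> 62
pct_of 500100 1386620     # JosephJones-1900 sampled at 60% -> 36
pct_of 888014 1386620     # JosephJones-1900 sampled at 80% -> 64
```

The file shrinks faster than the sampling percentage because the percentage applies to both the width and the height of the image. The exact savings also depend on how well each image compresses.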

The "convert" utility also has another option to "scale" an image. The usage and approximate amounts for reducing the image are about the same. In some cases, it results in a clearer image of reduced size than the option to "sample". Syntax for this option:

convert name-of-file-to-convert -scale nn% name-of-output-file

Example using "scale" option follows. I like putting the percentage in the name of the output file, but you can name the output file anything that you want.

convert picture.jpg -scale 70% picture-70%.jpg
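If you find yourself typing the percentage twice every time, once for the option and once in the output name, a small wrapper can do it for you. This is a minimal sketch, assuming ImageMagick's "convert" is in your path. The function name shrink_image and its argument order are my own invention, not part of ImageMagick.

```shell
# Hypothetical wrapper around "convert".
# Usage: shrink_image FILE PERCENT [sample|scale]
# Writes FILE's basename plus "-PERCENT%" with the same extension,
# so picture.jpg at 70 becomes picture-70%.jpg.
shrink_image() {
    src=$1
    pct=$2
    method=${3:-sample}
    case $method in
        sample|scale) ;;
        *) echo "method must be sample or scale" >&2; return 1 ;;
    esac
    convert "$src" "-$method" "${pct}%" "${src%.*}-${pct}%.${src##*.}"
}
```

For example, "shrink_image picture.jpg 70 scale" runs the same command as the example above, and leaving off the third argument uses "sample" instead.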

Another neat trick is that the "convert" utility takes the end of the name of the output file (i.e., the extension, such as tiff, jpg, png) as a directive for what image type to use for the new file. So, if you want to convert a png file to a jpg and scale it at 60%, here is an example that does that:

convert picture.png -scale 60% picture-60%.jpg

Or, if you just want to change the image format, you can do that, without doing any sampling or scaling:

convert picture.png picture.jpg

For more information about the multitude of options for the "convert" command, read its man page, or read the online documentation at http://www.imagemagick.org/.

Another possibility to consider, if you need to pare down the size of image files that contain only printed, typewritten or handwritten data, is optical character recognition (OCR). I have heard that high caliber OCR software can decipher handwriting, and even elaborate handwritten script, but is very expensive. To do OCR, I use, via ssh to a Redhat Fedora server from my Zaurus, the free Linux OCR utility called gocr.

However, if accuracy is essential, OCR cannot take the place of viewing the original image, as even the very best OCR software makes transcription or translation errors.
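For anyone who wants to try gocr on a host box, here is a minimal sketch of one way to run it. It assumes that gocr takes its input image with -i and writes its text with -o, as its man page describes, and that the image is first converted to pnm format with "convert". The function name ocr_image is hypothetical, and this is not necessarily how I run it myself.

```shell
# Hypothetical helper. Converts an image to pnm, then runs gocr on it.
# Usage: ocr_image FILE
# Writes the recognized text to FILE's basename with a .txt extension.
ocr_image() {
    src=$1
    pnm="${src%.*}.pnm"
    txt="${src%.*}.txt"
    convert "$src" "$pnm" &&
    gocr -i "$pnm" -o "$txt"
}
```

Running "ocr_image page.tiff" leaves page.pnm and page.txt beside the original. The text file is usually tiny compared to the image, which makes it easy to transfer to and read on the Zaurus.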

As I mentioned above, all these scripts were written to be run on a remote computer with the purpose of paring down the size of image files to make them Zaurus compatible. Possibly the OCR, and definitely the ImageMagick image utilities have been ported to be run on Cacko, pdaXrom, Angstrom and possibly other ROMs on the Zaurus itself, but the memory requirements may make it an impractical endeavor for some of us to run directly, especially for very large images.

However, they are great tools and still available via ssh onto host boxes. If you cannot run them on your Zaurus or home computer, try getting a free account at the Public Access Unix System http://sdf.org, which I understand also has these utilities available for members. Just be forewarned that the initial Korn shell which members are given access to is quite limited, and you cannot get access to a bash shell until after you make a donation or are validated by another member.

While I have now installed ImageMagick on my sl6000 running Sharp ROM, it is very slow and can run out of memory and crash even on my Tosa, so I am going to continue to do a lot of image processing using the above commands and scripts via ssh.

Have fun!

Revised December 29, 2011