A primary reason I have so few free kilobytes in internal memory on my Zaurus is that I want easy access to all the files and email I have written or downloaded, no matter which SD card I happen to be using, and whether or not I am using my CF modem at the moment. I do not want to have to putz around, hunting through backups and then extracting the files; I want them quickly and easily.
But I have gotten very tired of the lack of space, and started wondering if I could find a way to quickly and easily locate old files I need and extract them. I have not found a perfect solution yet, not one good enough to make me feel comfortable about erasing a bunch of files I think I might need access to on a moment's notice, but I thought I would share what I have done so far, as I have found some techniques that are quite helpful when I need to examine the contents of a backup file.
Yes, theoretically I should be able to put these files on my main SD card, but that is also fairly full. Since the card is FAT16, which here means the space used by a file grows in 16k increments (i.e., a 1k file uses 16k bytes on the card, a 14k file uses 16k, and a 17k file uses 32k, for example), the files would waste a lot of space that I would rather use in other ways, unless I put everything in big tar files, and there we are again, looking at backups. The card is full of other things; I have already been moving my old backups off of it onto a CF card so I can use the space for research data I am gathering.
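The rounding described above can be sketched as a small shell function. This is only an illustration: the name "cluster_usage" is made up for this example, and the 16384-byte default is simply the 16k figure from the text, not something read from the card.

```shell
# Space a file occupies on a FAT volume: its size rounded up to whole clusters.
# Usage: cluster_usage SIZE_BYTES [CLUSTER_BYTES]
# The default cluster size of 16384 bytes is the 16k figure quoted in the
# text; it is an assumption, not a value read from any card.
cluster_usage() {
    size=$1
    cluster=${2:-16384}
    # Integer ceiling division, then convert back to bytes.
    echo $(( (size + cluster - 1) / cluster * cluster ))
}

cluster_usage 1024     # a 1k file
cluster_usage 14336    # a 14k file
cluster_usage 17408    # a 17k file
```

The three calls print 16384, 16384 and 32768, matching the 16k, 16k and 32k in the example above.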
So, I asked a Linux pal what he would do, and he suggested extracting all the files on the backup to "stdout" and grepping those resultant virtual files. I want to try identifying the file first with "grep," though, and then extract the file or files I definitely need. Using "stdout" might work if space were not an issue (I can think of a slick way to do it), but I do not have that kind of extra space. If I did, I would not be writing this page!
Okay, so here is what I am doing: I am looking at the backup file with "grep," and having it extract and number all the lines that contain the filename of the file being examined, and all the lines that contain my keyword or phrase. Then I pipe those results through "grep" again, having it select just those lines pertaining to files containing my keyword(s), along with printing the lines containing my keyword(s). The "-B1" on the second grep prints the line just before each keyword line, which in the filtered output is normally the header of the file it belongs to.
I have used "-wn" to tell grep to look for just whole word phrases (which should make it run more quickly), and to number the lines. I am using "ustar" as a keyword because I know it appears in the tar header of every file in my backups, and is an easy way to tell grep to look for the lines containing the file names. The three backslashes in a row, just before the pipe, are needed so that grep receives "\|", which tells it to find all lines which contain either my keyword or the header. The number of backslashes required might be different if you put this code into a script, but it is what works for me on the command line.
bash-2.05# grep -wn "Got your email"\\\|ustar mc/2005-12-18-21-39.backup| grep -B1 "Got your email"
60562:00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000home/root/Applications/qtmail/outbox.txt0000000000000000000000000000000000000000000000000000000000000100755?0000000?0000000?00001477576?10351417176?017531? 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000ustar ?root0000000000000000000000000000root000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000?newmailid = 1008
60886:subject = Got your email
--
119007:00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000home/root/Applications/qtmail/ooutbox1205bad.txt00000000000000000000000000000000000000000000000000000100755?0000000?0000000?00001000607?10345076665?020647? 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000ustar ?root0000000000000000000000000000root000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000?newmailid = 1008
119331:subject = Got your email
--
132595:0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000?home/root/Applications/qtmail/CPoutbox1214.txt0000000000000000000000000000000000000000000000000000000100755?0000000?0000000?00001500561?10350046524?020233? 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000ustar ?root0000000000000000000000000000root000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000?newmailid = 1008
132919:subject = Got your email
--
160843:0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000home/root/Applications/qtmail/Copy of outbox.txt00000000000000000000000000000000000000000000000000000100755?0000000?0000000?00001501070?10351237505?020756? 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000ustar ?root0000000000000000000000000000root000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000?newmailid = 1008
161167:subject = Got your email
bash-2.05#
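To see what the three backslashes actually hand to grep, it can help to print the pattern rather than run it. The sketch below assumes a grep that treats "\|" as "or" in its default pattern syntax (GNU grep does; other versions may not), and the four-line sample is made up to stand in for a backup file.

```shell
# What the shell passes to grep for the mixed quoted/unquoted pattern:
# outside the quotes, \\ becomes \ and \| becomes |, so grep receives \| .
printf '%s\n' "Got your email"\\\|ustar

# The same pattern written inside single quotes, where the shell does no
# escaping and there are no backslashes to count.
printf '%s\n' 'Got your email\|ustar'

# Made-up sample text standing in for a backup file.
sample='header ustar root
newmailid = 1008
subject = Got your email
date = Sun Nov 13 2005'

# Both spellings should select the same numbered lines.
printf '%s\n' "$sample" | grep -wn "Got your email"\\\|ustar
printf '%s\n' "$sample" | grep -wn 'Got your email\|ustar'
```

If the single-quoted form behaves the same on your system, it may be the easier one to carry into a script, since the shell leaves everything inside single quotes alone.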
I considered using the grep command to examine more than one line in the file before deciding which file to extract, but that did not work. When I tried using "grep" to filter the results by adding additional lines with the "-A" option, I could not retrieve the file name. The following did not give me the file names:
bash-2.05# grep -wn -A1 "Got your email"\\\|ustar mc/2005-12-18-21-39.backup| grep -B1 -A1 "Got your email"
60886:subject = Got your email
60887-date = Sun Nov 13 2005
--
--
119331:subject = Got your email
119332-date = Sun Nov 13 2005
--
--
132919:subject = Got your email
132920-date = Sun Nov 13 2005
--
--
161167:subject = Got your email
161168-date = Sun Nov 13 2005
bash-2.05#
But that makes sense. I know from the earlier example that the header for the file of interest is on the 60562nd line of the backup, and both examples show that the line I am interested in is the 60886th line. Simple math with the "expr" command will tell me how far my keyword or phrase is from the start of the file of interest. Yes, I could have used the calculator on my Zaurus, but it was much faster and easier to just paste the numbers into a command:
bash-2.05# expr 60886 - 60562
324
bash-2.05#
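The same subtraction can be done for every match at once by letting awk remember the most recent header line number. This is a rough sketch rather than something tested on a real backup: "match_offsets" is a made-up name, the input lines are an abbreviated stand-in for the first grep's output, and it assumes your awk copes with the NUL bytes in real tar headers (running the output through tr -d '\000' first might help if it does not).

```shell
# match_offsets KEYWORD
# Reads "grep -n" style lines (NUMBER:text) on stdin. Lines containing
# "ustar" are treated as tar headers; for each line containing KEYWORD,
# prints the header line number, the match line number, and the difference.
match_offsets() {
    awk -F: -v kw="$1" '
        /ustar/       { header = $1 }                    # latest header seen
        index($0, kw) { print header, $1, $1 - header }  # literal keyword match
    '
}

# Abbreviated, made-up stand-in for the first grep output, reusing the
# line numbers from the session above.
printf '%s\n' \
    '60562:...qtmail/outbox.txt...ustar ...newmailid = 1008' \
    '60886:subject = Got your email' \
    '119007:...qtmail/ooutbox1205bad.txt...ustar ...newmailid = 1008' \
    '119331:subject = Got your email' |
    match_offsets 'Got your email'
```

For this sample it prints "60562 60886 324" and "119007 119331 324", the same 324 that expr gave.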
This tells me that adding "-A1" to the first grep did not work because my keyword phrase is 324 lines past the header, so the header is no longer on the line just before the match. So, I suggest that, if you want to examine more than one line in the file before deciding which file to extract, you should probably pipe the results of the first grep through "less" or "more" to read them.
It would take some fairly complex code to completely mechanize this type of ideal search for a file in a backup, and I have not decided how I want to approach the problem. I think I would have to use a tool like sed to select the lines I want, and do not think it will be easy for me to get the filename, along with the content, without a lot more work. So, for now, I think I will not be erasing very many files, since I have not yet come up with an easy, automated method of retrieving the exact files I want.
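One possible direction, offered only as an untested sketch and not as the method used on this page, is to ask tar for its list of names and stream each file to grep one at a time. Nothing is written to disk, so it should not need extra space, but the backup is re-read once for every file in it, which could be very slow on a large backup. The function name "find_in_backup" is made up, and the sketch assumes your tar supports "-t" and "-O" and your grep supports "-q" and "-F".

```shell
# find_in_backup ARCHIVE KEYWORD
# Prints the name of every file in ARCHIVE whose contents contain KEYWORD
# (matched as a fixed string, not a pattern). Each file is streamed to
# grep with "tar -xO", so nothing is extracted to disk.
find_in_backup() {
    archive=$1
    keyword=$2
    tar -tf "$archive" | while IFS= read -r name; do
        case $name in
            */) continue ;;   # skip directory entries
        esac
        if tar -xOf "$archive" "$name" 2>/dev/null | grep -qF -e "$keyword"; then
            printf '%s\n' "$name"
        fi
    done
}
```

With the backup from the first example, the call would look like: find_in_backup mc/2005-12-18-21-39.backup "Got your email"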
In the meantime, I do know that whenever I erase any files from internal memory, it will be easy for me to extract those files that I decide I want to retrieve from the tar backup files.
At that point, I will be able to use the method described in detail in my post about how to extract a single file from a tar backup.
By the way, if you just want to find a few lines, and do not care about which file they are in, you could do something like the following, where I asked tar to just feed to grep all the files in my tkcMail "cur" subdirectories. I limited output to the first 30 lines, using the "head" command.
(NOTE: I have modified the server numbers shown in the example below for security reasons)
bash-2.05# tar -xOf /mnt/cf/2008-04-27-20-39.backup home/root/tkcMail/*/cur/* | grep -B1 -A1 server| head -n30
> > bash-2.05# less /etc/resolv.conf
> > nameserver 69.16.159.116
> > nameserver 66.71.1.254
> > bash-2.05# ping 69.16.159.116
--
> pages and then upload them to my
> server. On the homepage I have
> several links to different pages. No
--
> up dead stuff, so to speak. Now,
> with the code that the server adds, I
> get a gazillion validation errors.
--
Received: from mx2.internal (mx2.internal [10.202.2.201])
by server1.messagingengine.com (Cyrus v2.3-alpha) with LMTPA;
Tue, 07 Feb 2006 01:42:36 -0500
--
Received: from mx2.internal (mx2.internal [10.202.2.201])
by server1.messagingengine.com (Cyrus v2.3-alpha) with LMTPA;
Wed, 08 Feb 2006 14:21:05 -0500
--
> Browsing the web, I run out of memory? Same thing with retreiving my
> email -- even with attachments sitting on the server.
--
use it
with my IMAP server but it wants to pull down headers from *every* e-mail I
have stored on the server and thus crashing with out-of-memory errors. That
bash-2.05#
If you want to examine more lines before, change "-B1" to "-B2" or whatever number you want, and if you want to see more lines after, then change "-A1" to "-A2" or whatever number you want.
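If you change the context numbers often, the pipeline can be wrapped in a small function. This is a minimal sketch: "backup_grep" is a made-up name, and it assumes a tar that supports "-O" and a grep that supports "-B" and "-A", as the ones in the session above do.

```shell
# backup_grep ARCHIVE KEYWORD CONTEXT FILE...
# Streams the named files from ARCHIVE to stdout and shows each line that
# matches KEYWORD, with CONTEXT lines before and after it.
backup_grep() {
    archive=$1
    keyword=$2
    context=$3
    shift 3                      # whatever remains is the list of files
    tar -xOf "$archive" "$@" | grep -B"$context" -A"$context" -e "$keyword"
}
```

For example, backup_grep followed by the archive path, the word server, the number 2, and the files to examine would show two lines on each side of every match.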
If you want to view all the results, you can use the "more" or "less" commands. Here is the syntax for "more":
tar -xOf exact-path-to-tarfile exact-path-list-of-files-to-examine | grep -B1 -A1 your-keyword-phrase | more
But if there are many lines or files, and you need to examine more than that one line, I recommend using "less" along with asking grep for additional lines before and after the line containing the keyword or phrase, as in:
tar -xOf exact-path-to-tarfile exact-path-list-of-files-to-examine | grep -B1 -A1 your-keyword-phrase | less
When you pipe the results through "less," you can search back and forth through the results, instead of having to painfully scroll up and down using the space key to go forward and your stylus to scroll backwards, reading every line. Instead, you can enter a "g" to go back to the first window, a "G" to go to the last window, a "w" to go back one window, or a slash followed by a keyword or phrase ("/keyword") to look for a specific keyword or phrase in the output. "less" does not wrap around, so just be sure to start your search from the beginning of the file (using "g" to go back there).
The "less" command is not built in to Sharp ROMs, but you can find out more about it on my "less" command page, which is here.