grep for pdfs

Have you ever wished you could run a full-text search across multiple PDF files from the Linux command line?

With the Linux command grep one can search for a given text in multiple files. If you don't know it already, the grep man page (man grep) is a good place to start. Sadly, it cannot be used to search PDF files, because the text in them is usually stored compressed rather than as plain text. That is a pity, since searching PDFs is certainly an important task. Imagine you have some thousand PDF files archived on your hard drive and you are looking for some information contained in them. It is far too much work to open each of them in your PDF viewer and search for the needed information. In this situation a tool like grep would be quite handy.

A few days ago I found an interesting tool called pdfgrep. It works much like grep, but can search in PDF files. You can download it from SourceForge and then build it from source.

For Gentoo users there is, as usual, an easier way. I wrote a simple ebuild for pdfgrep. You can download the ebuild here: [download#41]

To use the ebuild, just copy it to /usr/local/portage/app-text/pdfgrep/. You will probably have to create the directory first. Then run

ebuild /usr/local/portage/app-text/pdfgrep/pdfgrep-1.1.ebuild digest

Be sure to include the following line in your /etc/make.conf.

PORTDIR_OVERLAY="/usr/local/portage"

Afterwards just emerge pdfgrep.

Sadly, pdfgrep cannot search directory trees recursively the way egrep -r can, which is exactly what you need to search a complete PDF collection. This is not a big problem, though. Just use the following line of code, in which $1 stands for the search term:

find . -name "*.pdf" -exec pdfgrep -C50 -Hni "$1" '{}' ';'

For convenient use place it into a script file:

echo "find . -name \"*.pdf\" -exec pdfgrep -C50 -Hni \"\$1\" '{}' ';'" > /usr/local/bin/pdfrgrep

And make it executable:

chmod +x /usr/local/bin/pdfrgrep

Now you can just cd to the directory of your pdf collection and search it by entering:

pdfrgrep [searchterm]
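
If you prefer a shell function over a script file, the same idea could look roughly like the sketch below. It is only a variation on the one-liner above, not a tested replacement: the optional directory argument and the PDFGREP_BIN variable are additions made up for this example (pdfgrep itself does not read such a variable), and the pdfgrep flags are simply the ones used above.

```shell
# Sketch: recursive PDF search as a shell function.
# Usage: pdfrgrep SEARCHTERM [DIRECTORY]
# PDFGREP_BIN is a hypothetical override for the search command,
# introduced here only for illustration.
pdfrgrep() {
    term=$1
    dir=${2:-.}    # default: search below the current directory

    if [ -z "$term" ]; then
        echo "usage: pdfrgrep SEARCHTERM [DIRECTORY]" >&2
        return 2
    fi

    # -iname also matches files ending in .PDF.
    # '{}' + hands many files to each pdfgrep call instead of
    # starting one process per file.
    find "$dir" -type f -iname '*.pdf' \
        -exec "${PDFGREP_BIN:-pdfgrep}" -C50 -Hni "$term" '{}' +
}
```

Quoting the search term keeps multi-word searches such as pdfrgrep "full text search" intact, which the unquoted form would split into separate arguments.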

Regards

Jürgen


ImportError: No module named layman.config

Today, when I tried to sync the portage overlays on my Gentoo Linux boxes, I got the error:

ImportError: No module named layman.config

The recent update from python-2.5 to python-2.6 broke several applications. Running python-updater, which re-emerges all broken Python packages, solved the issue. Apart from breaking layman, the Python upgrade may cause several other problems in portage and in the rest of the system, so be sure to run python-updater after the upgrade.
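
To find out whether a box is affected before you try to sync, a quick import test might help. The following is only a sketch: the function name py_module_ok and the PYTHON variable are made up for this example, and it assumes that the interpreter layman runs under is reachable as python.

```shell
# Sketch: succeed (status 0) if the given Python module can be imported.
# PYTHON is a hypothetical override for the interpreter to test with.
py_module_ok() {
    "${PYTHON:-python}" -c "import $1" 2>/dev/null
}

if ! py_module_ok layman.config; then
    echo "layman.config cannot be imported; python-updater may be needed" >&2
fi
```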

Jürgen


check cd/dvd script

I sometimes have the problem that k3b fails to compare a burned disc with an ISO image because it fails to reload the disc. So I wrote a small script that compares a CD or DVD against an ISO image. Since it can be quite handy, I would like to share the script with you.

It takes the ISO image to compare against as its first parameter. The second, optional parameter is the drive the disc is located in (e.g. /dev/sr0). The script needs the commands awk, md5sum and pv, which therefore have to be installed on your system. Chances are good that precompiled packages exist for your distribution. When the comparison is done, the script prints whether it succeeded and returns 0 on success and -1 on failure, so it can be used from within other programs.
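
The basic idea behind such a comparison can be sketched in a few lines. Note that this is not the downloadable script, just a minimal illustration under some assumptions: the function name compare_disc and the default drive /dev/sr0 are chosen for this example, the progress display with pv is left out, and failure is reported with status 1 instead of -1, because shell exit statuses are limited to the range 0 to 255.

```shell
# Sketch: compare an ISO image against a disc using MD5 checksums.
# Usage: compare_disc IMAGE [DRIVE]
compare_disc() {
    iso=$1
    drive=${2:-/dev/sr0}    # assumed default drive

    if [ ! -r "$iso" ]; then
        echo "cannot read image: $iso" >&2
        return 1
    fi

    # Size of the image in bytes. A disc may contain padding after the
    # image data, so only the first $size bytes of the drive are read.
    size=$(wc -c < "$iso" | tr -d '[:space:]')

    # md5sum prints "<checksum>  <name>"; awk keeps only the checksum.
    iso_sum=$(md5sum < "$iso" | awk '{ print $1 }')
    disc_sum=$(head -c "$size" "$drive" | md5sum | awk '{ print $1 }')

    if [ "$iso_sum" = "$disc_sum" ]; then
        echo "OK: disc matches $iso"
        return 0
    fi
    echo "FAILED: disc differs from $iso" >&2
    return 1
}
```

Called as compare_disc image.iso /dev/sr0, it can be chained with other commands, for example to eject the disc only after a successful comparison.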

You may download the script from here:

[download#11] (GPLv3)

Jürgen

