wgrep

Wgrep is a tiny script that makes a list of hyperlinks from grep output. Wgrep (like the other scripts on this page) is in the public domain, so you can use and distribute it freely.

Since wgrep uses grep and w3m, you need to get w3m, a character-based WWW browser. Because w3m can read an HTML document from standard input, the wgrep script is strikingly simple.

Since wgrep passes its command-line options directly to grep, it is used almost exactly like grep. For example,

$ wgrep keyword *

makes a hyperlinked list of the files in the current working directory that contain keyword.
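Wgrep only works when grep prefixes each match with the file name. grep does this when it searches more than one file, as with `*` above, but not for a single file; there you can pass -H (supported by GNU and BSD grep), e.g. `wgrep -H keyword notes.txt`. The sketch below shows the difference in a throwaway directory; the directory and file names are made up for the demonstration.

```sh
#!/bin/sh
# Show the "file name:matching line" format that lf.pl turns into links.
# The temporary directory and the file names are only for this demo.
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
printf 'one keyword here\n' > "$demo/a.txt"
printf 'nothing to see\n'   > "$demo/b.txt"
cd "$demo" || exit 1

grep keyword *          # several files: "a.txt:one keyword here"
grep keyword a.txt      # one file: "one keyword here" (no file name)
grep -H keyword a.txt   # -H forces the name: "a.txt:one keyword here"
```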

Wgrep consists of the following two scripts, wgrep and lf.pl. Lf.pl is a Perl script that makes an HTML document from the grep output. Put both scripts in a directory on your PATH and make them executable with the 'chmod +x' shell command.

wgrep

#!/bin/sh

grep "$@" | lf.pl | w3m -T text/html

# wgrep

lf.pl

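#!/usr/bin/perl
use strict;
use warnings;

# HTML entities for the characters that are special in HTML
my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');

# Turn grep's "filename:matching line" output into a list of links
while (<>) {
    chomp;
    my ($url, $line) = /^(.*?):(.*)/ or next;   # skip lines without "file:"
    s/([&<>"])/$entity{$1}/g for $url, $line;   # escape both parts
    print "<a href=\"$url\">$url</a><br>\n$line<br><br>\n";
}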

# lf.pl
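To see the transformation lf.pl performs without Perl, here is a sed-only sketch of the same idea. grep_to_html is a hypothetical name, not one of the scripts on this page; unlike lf.pl it prints each entry on a single line, which a browser renders essentially the same way.

```sh
#!/bin/sh
# Hypothetical sed-only equivalent of lf.pl.
grep_to_html() {
    sed -e '/:/!d' \
        -e 's/&/\&amp;/g' \
        -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g' \
        -e 's/"/\&quot;/g' \
        -e 's|^\([^:]*\):\(.*\)$|<a href="\1">\1</a><br>\2<br><br>|'
    # 1st line drops lines without "file:"; & is escaped before the other
    # characters so the new entities are not escaped twice; [^:]* stops at
    # the first colon, like the non-greedy (.*?) in lf.pl.
}

# Example: grep keyword * | grep_to_html | w3m -T text/html
```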

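You can use lynx instead of w3m with the following lgrep and lf_lynx.pl scripts. Because lgrep saves the list as ~/.lgrep.html and opens that file in lynx, lf_lynx.pl puts the current directory in front of relative file names so that the links still point to the right files.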

lgrep

#!/bin/sh

grep "$@" | lf_lynx.pl > ~/.lgrep.html
exec lynx ~/.lgrep.html

# lgrep

lf_lynx.pl

#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);

my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
# Prefix for relative file names (the list is opened from ~/.lgrep.html)
my $pwd = getcwd() . '/';

while (<>) {
	chomp;
	my ($url, $line) = /^(.*?):(.*)/ or next;
	# Absolute names are used as-is; relative ones get the current directory
	my $href = $url =~ m{^/} ? $url : $pwd . $url;
	s/([&<>"])/$entity{$1}/g for $href, $url, $line;
	print "<a href=\"$href\">$url</a><br>\n$line<br><br>\n";
}

# lf_lynx.pl
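The path rule in lf_lynx.pl (keep names that start with /, put the working directory in front of the rest) can be sketched in plain sh, for instance as a starting point for a shell-only variant. abs_name is a hypothetical helper, not part of the scripts above.

```sh
#!/bin/sh
# Hypothetical helper: print an absolute path for a file name from grep.
abs_name() {
    case $1 in
        /*) printf '%s\n' "$1" ;;                # already absolute: keep it
        *)  printf '%s/%s\n' "${PWD%/}" "$1" ;;  # relative: prefix the cwd
    esac                                         # (${PWD%/} avoids "//" in /)
}
```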

wlocate

Using the same idea as wgrep, you can turn the output of locate, the file and directory search program, into hyperlinks. Wlocate consists of two scripts, wlocate and locf.pl, and is used much like locate:

$ wlocate (part of)filename

If the list of matches is too long, you can narrow it down with grep by using locf.pl directly, as follows:

$ locate php | grep manual.html | locf.pl | w3m -T text/html

wlocate

#!/bin/sh

locate "$@" | locf.pl | w3m -T text/html

# wlocate

locf.pl

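#!/usr/bin/perl
use strict;
use warnings;

my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
while (<>) {
    chomp;                        # chomp, unlike chop, only removes a newline
    s/([&<>"])/$entity{$1}/g;     # escape characters special in HTML
    print "<a href=\"$_\">$_</a><br>\n";
}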

# locf.pl
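Since locf.pl only needs one path name per line, the same idea works for any program that prints paths, not just locate. Below is a sed-only sketch of the same conversion; paths_to_html is a hypothetical name, and the find command in the example is only an illustration.

```sh
#!/bin/sh
# Hypothetical sed-only equivalent of locf.pl: one link per path name.
paths_to_html() {
    sed -e '/^$/d' \
        -e 's/&/\&amp;/g' \
        -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g' \
        -e 's/"/\&quot;/g' \
        -e 's|.*|<a href="&">&</a><br>|'
    # In the last command "&" stands for the whole (already escaped) line.
}

# Example: find ~/doc -name '*.html' | paths_to_html | w3m -T text/html
```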

You can use lynx instead of w3m with the following llocate script.

llocate

#!/bin/sh

locate "$@" | locf.pl > ~/.locf.html
exec lynx ~/.locf.html

# llocate

Namazu (Japanese for "catfish")

Namazu is a text retrieval program similar to WAIS, except that you have to create the document index yourself. This is not really an inconvenience, because it makes Namazu simpler to use than WAIS.

Suppose you have HTML documents under the ~/mydoc directory and want to search them by keyword. First, create a directory to hold their index files, such as ~/index/mydoc_index, and move into it with the cd command. With ~/index/mydoc_index/ as the working directory, type the following command:

$ mknmz ~/mydoc/

mknmz then makes index files for the HTML files under ~/mydoc and puts them into ~/index/mydoc_index/. And THAT'S ALL YOU HAVE TO DO before searching the HTML documents with arbitrary keywords. Now type the following command to get the list of HTML files containing the keyword:

$ namazu keyword ~/index/mydoc_index/
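The index-building steps above can be wrapped in one shell function. This is a sketch under assumptions: make_index and the MKNMZ override (handy for a dry run) are hypothetical names, not part of Namazu; the function only creates the index directory and runs mknmz from inside it, as described above.

```sh
#!/bin/sh
# Hypothetical wrapper for the steps above. MKNMZ may name another command
# for a dry run; it defaults to the real mknmz.
make_index() {
    # Resolve the document directory first, because we cd away from here.
    docdir=$(cd "$1" && pwd) || return 1
    indexdir=$2
    mkdir -p "$indexdir" || return 1
    # mknmz is run from inside the index directory, where the index files go.
    ( cd "$indexdir" && ${MKNMZ:-mknmz} "$docdir" )
}

# Example: make_index ~/mydoc ~/index/mydoc_index
```

Since an index is updated the same way it is created, running the same command again later should also update it.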

With a little trick, you can also get the list retrieved by namazu as hyperlinks. Create the following shell script, mynmz (the name of the script is arbitrary).

#!/bin/sh
namazu -h "$@" ~/index/mydoc_index/ | w3m -T text/html
# mynmz

You need w3m for this script to work. Then type the following line at the console to get a hyperlinked list of the HTML documents that contain the keyword.

$ mynmz keyword

By default the list shows 20 items. To list all the documents that include the keyword, use the -a option.

$ mynmz -a keyword

To show 10 items starting from the 5th, use the -w and -n command-line options (-w counts from 0, so the 5th item is -w 4).

$ mynmz -w 4 -n 10 keyword
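For page-by-page browsing, the -w value is just (page - 1) times the page size, since -w counts from 0. The helper below is a hypothetical sketch that prints the two options; it is not part of namazu.

```sh
#!/bin/sh
# Hypothetical helper: namazu options for one "page" of results.
# -w is the zero-based index of the first item, -n the number of items.
nmz_page_args() {
    page=$1    # 1 for the first page, 2 for the second, ...
    size=$2    # items per page
    printf '%s\n' "-w $(( (page - 1) * size )) -n $size"
}

# Example (left unquoted so it splits into separate options):
#   mynmz $(nmz_page_args 3 10) keyword
```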

You update an index the same way you create it, by running mknmz again in the index directory. To save time, only the new HTML documents are processed.

Have fun!

Tomokiyo Nomura (Nobeoka, Japan), Dec 10, 2000