February 11, 2008

"Find in Files" in the *Nix Shell

Another language I'm slowly learning is the *nix shell. It is terribly difficult to learn and terribly powerful at the same time. Its difficulty is the reason it is powerful and its power is the reason it is difficult. As we curse the *nix shell, we must remember it was created by programmers for programmers, so it does not subscribe to Cooper's (and others') teaching that in software less is more. Anyway, on to the point...

As with most programmers, something I have to do all the time is find a specific piece of text in a huge list of files. There are probably hundreds of ways to do this on the *nix command line, but I've recently settled on what I think is the simplest and most concise way for my common needs. Assume we need to search all files ending in .cs in the current directory and all of its subdirectories for the text zip. We can do this by entering the following at the command line:

egrep zip `find ./ -iname "*.cs" -type f`

Note that zip is my regular expression (albeit a very simple one) and that find is a command with arguments surrounded on both sides by the grave accent character (the backtick), which makes the shell run find first and substitute its output into the egrep command line. It's worth noting that I alias egrep in my profile to mark matches in color, which makes viewing the output much easier.
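To make the substitution step concrete, here is a small sketch you can run in a scratch directory. The sandbox layout and file names (Address.cs, Counter.cs, readme.txt) are made up for illustration, $(...) is used as the modern spelling of the backticks, and grep -E stands in for egrep since the two behave the same.

```shell
# Build a throwaway sandbox so the example is self-contained
# (all file names here are hypothetical).
sandbox=$(mktemp -d)
mkdir -p "$sandbox/src/util"
printf 'var zip = "12345";\n' > "$sandbox/src/Address.cs"
printf 'int count = 0;\n'     > "$sandbox/src/util/Counter.cs"
printf 'zip notes\n'          > "$sandbox/src/readme.txt"
cd "$sandbox" || exit 1

# Step 1: this is what the shell pastes into the grep command line.
# Only the .cs files are listed; readme.txt is filtered out by find.
file_list=$(find ./ -iname "*.cs" -type f)
echo "$file_list"

# Step 2: the same search as above. -l prints only the names of the
# files that contain a match instead of the matching lines.
matches=$(grep -E -l zip $(find ./ -iname "*.cs" -type f))
echo "$matches"
```

Only Address.cs should be reported: Counter.cs has no match, and readme.txt never reaches grep even though it contains the text.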

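Now, the only problem with the simpler expression above is that it doesn't work with large lists of files: once the expanded command line grows past the system's limit, it typically fails with an "Argument list too long" error. Instead you need a loop like the following to handle huge lists of files: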

for myfile in `find ~/projects/mySourceDir -iname "*.cs" -type f -print`
do
egrep -H zip "$myfile"
done

In this case I'm looping to pass only one file at a time to egrep. I'm also using a more qualified path to get find to return fully qualified pathnames (I'm sure find can do that a better way, I just haven't figured out how). The -H option just makes sure the filename is printed too. The only problem I have with the second one is that path names containing spaces are split into separate words by the for loop, so egrep never sees them intact.
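One possible answer to both open questions in that paragraph, offered as a sketch rather than the definitive way: starting find from "$PWD" makes every path it prints absolute, and find's own -exec ... {} + form hands the file names straight to grep, so the shell never gets a chance to split them at spaces. The sandbox and the "my project" directory below are invented for the demonstration, and grep -E again stands in for egrep.

```shell
# Throwaway sandbox with a space in a directory name (hypothetical layout).
sandbox=$(mktemp -d)
mkdir -p "$sandbox/my project"
printf 'string zip;\n' > "$sandbox/my project/Address.cs"
printf 'int n;\n'      > "$sandbox/my project/Counter.cs"
cd "$sandbox" || exit 1

# "$PWD" is the absolute path of the current directory, so find prints
# absolute names. "{} +" makes find pass many file names to each grep
# invocation, starting a new one only when the argument list fills up.
results=$(find "$PWD" -iname "*.cs" -type f -exec grep -E -H zip {} +)
echo "$results"
```

The single line of output should begin with a slash and name Address.cs inside "my project", followed by the matching line.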

While trying to solve the problem with spaces in directory names, I found you can use xargs (xargs is a real gem) to sweeten this syntax a bit more:

find ./ -iname "*.cs" -type f -print0 | xargs -0 egrep --color=always -H zip

The trick to solving the spaces in filenames problem is using the -print0 option of find and the -0 option of xargs. -print0 uses null characters instead of newlines to separate file names, and -0 tells xargs to expect arguments to be separated by null characters instead of whitespace. I added --color=always because xargs won't use aliased commands (my alias is where I normally add the color option); aliases belong to the shell, and xargs runs egrep directly instead.
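A quick way to see the difference for yourself is to run both variants against a path that contains a space. This is only a sketch: the sandbox and the "my project" directory are made up, grep -E stands in for egrep, and it assumes a find and xargs that support -print0 and -0 (the GNU and BSD versions both do).

```shell
# Throwaway sandbox with a space in a directory name (hypothetical layout).
sandbox=$(mktemp -d)
mkdir -p "$sandbox/my project"
printf 'string zip;\n' > "$sandbox/my project/Address.cs"
cd "$sandbox" || exit 1

# Newline-separated names: xargs splits "./my project/Address.cs" at the
# space, so grep is asked to open "./my" and "project/Address.cs", neither
# of which exists. Errors are discarded and the failure status is ignored.
plain=$(find . -iname "*.cs" -type f -print | xargs grep -E -H zip 2>/dev/null || true)

# Null-separated names: the whole path reaches grep as one argument.
safe=$(find . -iname "*.cs" -type f -print0 | xargs -0 grep -E -H zip)

echo "plain: ${plain:-<no matches>}"
echo "safe:  $safe"
```

The first variant finds nothing, while the second reports the match in "./my project/Address.cs".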

Read about standard input/output streams and command substitution, grep, find, regular expressions, and our saving grace xargs, and spend some time letting the *nix command line spirit flow into your brain. Eventually it will become obvious why all of these things work (and do the same thing).
