Writing Utilities in Python: fang

While working at Salesforce, I hacked together a nice little utility in Perl called fang, which is short for “find and grep”. It had some nice features, and I’ve often wanted something like it that I could use at home and share freely. I started rewriting it in Python, and have implemented some of the major features, along with a test program. I’m going to walk through the code in this post, because I think it makes a nice, simple example of a Python utility. From the top:

#!/usr/bin/python

import argparse
import os
import re
import sys

The shebang at the top tells Linux (or OSX) to use the Python interpreter /usr/bin/python to execute the utility when you run it (once the file has been made executable with chmod +x). The imports are all standard library modules:

  • argparse is the standard command line argument parser. It’s not as nice as docopt, but it’s built into the standard library.
  • os is the operating system interface, giving access to files and directories. Most utilities need this.
  • re is the regular expression module, allowing powerful pattern matching.
  • sys is the system interface, giving access to things like standard output. Most utilities need this.

argumentParser = argparse.ArgumentParser(description="List the files in a directory"
    + " (default .), or the lines of files matching a pattern if specified")
fileCriteria = argumentParser.add_mutually_exclusive_group()
fileCriteria.add_argument("-e", "--extension",
    help="file names must end in the string 'EXTENSION'")
fileCriteria.add_argument("-f", "--file-pat",
    help="file name must match the regular expression /FILE_PAT/")
argumentParser.add_argument("-N", "--no-line-numbers", action='store_true',
    help="don't include line numbers in output")
argumentParser.add_argument("directory", help="directory to search (defaults to '.')",
    default=".", nargs='?')
argumentParser.add_argument("pattern", nargs='?',
    help="pattern to search for (list files by default)")
arguments = argumentParser.parse_args()

All of the above code sets up parsing of the command line arguments. Here’s the usage message you get when you run the utility with the -h option:

usage: fang [-h] [-e EXTENSION | -f FILE_PAT] [-N] [directory] [pattern]

List the files in a directory (default .), or the lines of files matching a
pattern if specified

positional arguments:
  directory             directory to search (defaults to '.')
  pattern               pattern to search for (list files by default)

optional arguments:
  -h, --help            show this help message and exit
  -e EXTENSION, --extension EXTENSION
                        file names must end in the string 'EXTENSION'
  -f FILE_PAT, --file-pat FILE_PAT
                        file name must match the regular expression /FILE_PAT/
  -N, --no-line-numbers
                        don't include line numbers in output

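The parsed arguments are returned as an argparse.Namespace and stored in the variable arguments. Each option becomes an attribute named after its long form, with dashes converted to underscores, so --file-pat becomes arguments.file_pat.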
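To see how the parser behaves without running the whole utility, you can pass parse_args an explicit list of strings instead of letting it read sys.argv. The sketch below (Python 3) rebuilds the same arguments, minus the help text, inside a hypothetical build_parser function introduced only for illustration, and shows the optional positional defaults, the underscore attribute names, and what happens when -e and -f are combined.

```python
import argparse
import contextlib
import io


def build_parser():
    # Hypothetical helper: fang's arguments wrapped in a function so the
    # parser can be exercised with explicit argument lists.
    parser = argparse.ArgumentParser(prog="fang")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-e", "--extension")
    group.add_argument("-f", "--file-pat")
    parser.add_argument("-N", "--no-line-numbers", action="store_true")
    parser.add_argument("directory", default=".", nargs="?")
    parser.add_argument("pattern", nargs="?")
    return parser


parser = build_parser()

# No arguments: directory falls back to ".", pattern is None (list mode).
defaults = parser.parse_args([])
print(defaults.directory, defaults.pattern, defaults.no_line_numbers)

# Dashes in long option names become underscores in attribute names.
args = parser.parse_args(["-f", r"\.py$", "-N", "src", "import"])
print(args.file_pat, args.no_line_numbers, args.directory, args.pattern)

# -e and -f are mutually exclusive: argparse reports an error and exits with 2.
status = None
try:
    with contextlib.redirect_stderr(io.StringIO()):
        parser.parse_args(["-e", ".c", "-f", "x"])
except SystemExit as exc:
    status = exc.code
print("exit status", status)
```

Because both positionals use nargs='?', a lone argument such as fang src fills directory, not pattern, so searching always requires naming the directory first.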

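pattern = re.compile(arguments.pattern) if arguments.pattern else None
filePat = re.compile(arguments.file_pat) if arguments.file_pat else None

The above lines precompile the pattern to look for in the files (if any), along with the file name pattern given with -f, which fileIsWanted uses below. Compiling up front matters for efficiency, because in a big file tree the pattern will be matched against thousands of lines. Python’s re module does cache recently compiled patterns, but reusing a compiled object skips even that lookup.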
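One variation worth considering, and not how fang currently does it, is to let argparse compile the patterns itself by passing a type function. argparse only turns ArgumentTypeError, TypeError and ValueError into usage errors, and re.error is none of those, so the hypothetical regex helper below translates it. A malformed pattern then produces a normal usage message and exit status 2 instead of a traceback. This is a Python 3 sketch under those assumptions.

```python
import argparse
import contextlib
import io
import re


def regex(text):
    # Hypothetical type function: compile once at parse time and turn a bad
    # pattern into an argparse error rather than an uncaught re.error.
    try:
        return re.compile(text)
    except re.error as exc:
        raise argparse.ArgumentTypeError(
            "invalid regular expression %r: %s" % (text, exc))


parser = argparse.ArgumentParser(prog="fang")
parser.add_argument("-f", "--file-pat", type=regex)
parser.add_argument("directory", default=".", nargs="?")
parser.add_argument("pattern", nargs="?", type=regex)

args = parser.parse_args(["-f", r"\.c$", ".", "TODO"])
print(args.file_pat.search("main.c") is not None)  # already a compiled pattern
print(args.pattern.pattern)

# An unbalanced bracket is reported as a usage error (exit status 2).
status = None
try:
    with contextlib.redirect_stderr(io.StringIO()):
        parser.parse_args(["-f", "[unclosed"])
except SystemExit as exc:
    status = exc.code
print("exit status", status)
```

With this approach the explicit re.compile calls become unnecessary, since arguments.pattern and arguments.file_pat would already be compiled (or None when omitted, because argparse does not apply type to a None default).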

def fileIsWanted(filePath, fileName=None):
    if arguments.extension:
        if not filePath.endswith(arguments.extension):
            return False

    elif arguments.file_pat:
        if not filePat.match(fileName if fileName else os.path.basename(filePath)):
            return False

    return True

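This little function takes a file path and, optionally, a file name (the “basename” of the path, which is derived from the path when needed if you don’t pass it in), and determines whether they meet the filtering criteria specified on the command line. With -e, the extension is checked against the end of the full path; with -f, the regular expression is matched against the file name alone.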
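fileIsWanted filters with match, which anchors a regular expression at the start of the string but not at the end, so a -f pattern only has to match a prefix of the file name. The small Python 3 sketch below compares match, fullmatch and search on a few made-up file names; anchoring with a trailing $ (or using fullmatch) is what makes -f behave as a whole-name filter.

```python
import re

names = ["main.c", "util.h", "parser.cpp", "notes.c.txt", "README"]
filePat = re.compile(r".*\.[ch]")

# match anchors only at the start: any name containing ".c" or ".h" passes.
matched = [n for n in names if filePat.match(n)]
print(matched)  # ['main.c', 'util.h', 'parser.cpp', 'notes.c.txt']

# fullmatch (like match with a trailing $) requires the whole name to match.
full = [n for n in names if filePat.fullmatch(n)]
print(full)     # ['main.c', 'util.h']

# search looks anywhere in the string, which is what fang uses on file lines.
found = [n for n in names if re.search(r"\.c", n)]
print(found)    # ['main.c', 'parser.cpp', 'notes.c.txt']
```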

directories = os.walk(arguments.directory, followlinks=True)

for directory in directories:
    # if not grepping, apply filters and if met, list the directory

    if not pattern and fileIsWanted(directory[0]):
        print(directory[0])

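Finding all the files under a directory is done using the os.walk function. It generates the directories in depth first order, by default yielding each directory before its subdirectories. Each is a triple of (directory-path, list-of-subdirectories, list-of-files). If a pattern to look for in files isn’t specified, the directories themselves may need to be listed, which is what the above if statement does. Note that followlinks=True makes os.walk descend into symbolic links to directories, and because it doesn’t track directories it has already visited, a link pointing back up the tree can cause infinite recursion.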
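With the default top-down traversal, os.walk honours in-place changes to the subdirectory list before it descends, which gives a simple way to make the whole traversal repeatable or to prune directories. The Python 3 sketch below is a variation, not what fang does: it builds a small throwaway tree with tempfile and walks it with sorted subdirectories, skipping .git as an assumed example of pruning. walk_sorted is a hypothetical name.

```python
import os
import tempfile


def walk_sorted(top):
    # Yield (dirpath, sorted filenames) in a repeatable order.
    for dirpath, dirnames, filenames in os.walk(top):
        # Assigning to dirnames[:] (topdown=True is the default) controls
        # which subdirectories os.walk visits next, and in what order.
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        yield dirpath, sorted(filenames)


with tempfile.TemporaryDirectory() as root:
    for sub in ("b", "a", os.path.join("a", "inner"), ".git"):
        os.makedirs(os.path.join(root, sub))
    for rel in ("top.txt", os.path.join("a", "x.c"), os.path.join("b", "y.h"),
                os.path.join(".git", "config")):
        with open(os.path.join(root, rel), "w") as f:
            f.write("hello\n")

    visited = [(os.path.relpath(d, root), files) for d, files in walk_sorted(root)]

for entry in visited:
    print(entry)
```

Each directory is reported before its subdirectories, and a/inner is finished before b is visited, which is the depth first order described above.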

    for fileName in sorted(directory[2]):
        filePath = os.path.join(directory[0], fileName)

        if not fileIsWanted(filePath, fileName):
            continue

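The above code loops through the files in the directory in sorted order. If a file doesn’t meet the criteria you specified, it’s skipped. Only the file names are sorted; subdirectories are visited in whatever order the operating system returns them.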

        # If not grepping, just print the file name

        if not pattern:
            print(filePath)
            continue

That’s the end of the code that just lists files, which is what the Linux find command does.
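Pulled out of the main loop, the listing half of fang is essentially a filtered walk. The Python 3 sketch below restates it as a generator; find_files and its wanted parameter are hypothetical names, with wanted standing in for fileIsWanted. It yields what fang prints in list mode: each wanted directory, then that directory’s wanted files in sorted order.

```python
import os
import tempfile


def find_files(top, wanted=lambda path: True):
    # Hypothetical generator mirroring fang's list mode (no search pattern).
    for dirpath, dirnames, filenames in os.walk(top, followlinks=True):
        if wanted(dirpath):
            yield dirpath
        for fileName in sorted(filenames):
            filePath = os.path.join(dirpath, fileName)
            if wanted(filePath):
                yield filePath


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "src"))
    for rel in ("README", os.path.join("src", "main.c"),
                os.path.join("src", "main.h"), os.path.join("src", "notes.txt")):
        open(os.path.join(root, rel), "w").close()

    # Roughly what "fang -e .c <root>" would list.
    hits = [os.path.relpath(p, root)
            for p in find_files(root, lambda p: p.endswith(".c"))]
    # With no filter, directories and files are both listed.
    everything = [os.path.relpath(p, root) for p in find_files(root)]

print(hits)
print(everything)
```

Returning paths instead of printing them makes the listing logic easy to unit test, which is one reason to prefer a generator here.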

        with open(filePath) as fileStream:
            fileLines = fileStream.readlines()

The above code slurps a file in as a list of lines.
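readlines() is simple, but it holds the whole file in memory. A file object can also be iterated line by line, which keeps memory use flat on large files. The sketch below assumes Python 3, where text files are decoded; errors="replace" is an assumed choice that keeps a stray byte from aborting the search with a UnicodeDecodeError, something fang as written doesn’t handle. matching_lines is a hypothetical name.

```python
import os
import re
import tempfile


def matching_lines(filePath, pattern):
    # Hypothetical helper: stream the file instead of slurping it with
    # readlines(), yielding 1-based line numbers and lines without newlines.
    with open(filePath, encoding="utf-8", errors="replace") as fileStream:
        for lineNumber, line in enumerate(fileStream, 1):
            if pattern.search(line):
                yield lineNumber, line.rstrip("\n")


with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "sample.txt")
    with open(path, "wb") as f:
        # Line 3 contains a byte that is not valid UTF-8.
        f.write(b"first line\nTODO: fix this\n\xff stray byte TODO\nlast\n")
    hits = list(matching_lines(path, re.compile("TODO")))

print(hits)
```

The invalid byte is decoded as the Unicode replacement character, so the line is still searched and reported.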

        for (lineNumber, line) in enumerate(fileLines, 1):
            if pattern.search(line):
                # line keeps its own newline, so write it as-is (1-based numbers, like grep -n)
                sys.stdout.write(filePath
                                 + ("" if arguments.no_line_numbers else ":" + str(lineNumber))
                                 + ":" + line)

In the final bit of code above, each line is searched for the pattern. If it’s found, the line is printed, prefixed by the file path and (unless -N was given) the line number. This is what the Linux grep command does.
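The output format mirrors grep -n, path:line-number:text, and grep numbers lines from 1, which is why the sketch below uses enumerate(..., 1). It separates formatting from printing so the format can be checked in isolation; format_match and grep_lines are hypothetical names, and show_line_numbers plays the role of the inverse of -N. This is a Python 3 sketch.

```python
import re


def format_match(filePath, lineNumber, line, show_line_numbers=True):
    # Build "path:number:text" (or "path:text" when numbers are suppressed),
    # keeping the line's own trailing newline.
    prefix = filePath + (":" + str(lineNumber) if show_line_numbers else "")
    return prefix + ":" + line


def grep_lines(filePath, lines, pattern, show_line_numbers=True):
    # enumerate(..., 1) gives 1-based line numbers, matching grep -n.
    return [format_match(filePath, n, line, show_line_numbers)
            for n, line in enumerate(lines, 1) if pattern.search(line)]


lines = ["int main(void)\n", "{\n", "    return 0;\n", "}\n"]
pattern = re.compile(r"return|main")
print("".join(grep_lines("src/main.c", lines, pattern)), end="")
print("".join(grep_lines("src/main.c", lines, pattern, show_line_numbers=False)), end="")
```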

A very common command line I run on Linux to look for something in every C file (including C header files):

find directory -name '*.[ch]' | xargs grep -n pattern

Can be replaced with:

fang -f '.*\.[ch]$' directory pattern
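If you already think in find’s shell-style globs, Python’s standard fnmatch module can translate one into an equivalent regular expression. The Python 3 sketch below uses fnmatch.translate on the glob *.[ch]; the exact regex text it produces differs between Python versions, so it is only checked by behaviour here. Wiring this into fang, say as an extra option, would be a hypothetical extension.

```python
import fnmatch
import re

globPattern = "*.[ch]"
regexText = fnmatch.translate(globPattern)  # exact text varies by Python version
filePat = re.compile(regexText)

# The translated pattern is anchored at the end, so match() tests the whole name.
results = {name: bool(filePat.match(name))
           for name in ["main.c", "util.h", "parser.cpp", "notes.c.txt"]}
for name, ok in results.items():
    print(name, ok)
```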

Some of the features I have yet to implement, like suppressing matches in comments and ignoring generated code and binary files, are the real reasons I built this utility in the first place, and I’ll probably add them in the future. The latest version of fang, along with a unit test program, is available for free download from GitHub: https://github.com/jimbelton/jools
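Ignoring binary files is one of the features not yet implemented. A common heuristic, used here purely as an assumption and not as fang’s eventual design, is to treat a file as binary if its first block contains a NUL byte. looks_binary and the 8192-byte block size are hypothetical choices in this Python 3 sketch.

```python
import os
import tempfile


def looks_binary(filePath, blockSize=8192):
    # Hypothetical heuristic: ordinary text files rarely contain NUL bytes,
    # while many binary formats (object files, images, archives) do early on.
    with open(filePath, "rb") as fileStream:
        return b"\0" in fileStream.read(blockSize)


with tempfile.TemporaryDirectory() as root:
    textPath = os.path.join(root, "notes.txt")
    binPath = os.path.join(root, "blob.bin")
    with open(textPath, "w") as f:
        f.write("plain text\n")
    with open(binPath, "wb") as f:
        f.write(b"\x7fELF\x02\x01\x01\x00\x00\x00")
    results = (looks_binary(textPath), looks_binary(binPath))

print(results)  # (False, True)
```

One known weakness of this check is that UTF-16 text contains NUL bytes and would be skipped as binary.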

About jimbelton

I'm a software developer, and a writer of both fiction and non-fiction, and I blog about movies, books, and philosophy. My interest in religious philosophy and the search for the truth inspires much of my writing.
This entry was posted in programming.
