While working at Salesforce, I hacked together a nice little utility in Perl called fang, short for “find and grep”. It had some nice features, and I’ve often wanted something like it that I could use at home and share freely. I started hacking in Python and now have some of the major features implemented, along with a test program. I’m going to walk through the code in this post, because I think it makes a nice, simple example of a Python utility. From the top:
The shebang at the top tells Linux (or OS X) to use the Python interpreter /usr/bin/python to execute the utility when you run it. The imports are all standard Python modules:
- argparse is the standard command line argument parser. It’s not as nice as docopt, but it’s built into the standard library.
- os is the operating system interface, giving access to files. Most utilities need this.
- re is the regular expression module, allowing powerful pattern matching.
- sys is the system interface. Most utilities need this.
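For reference, a header matching the description above would look roughly like this. It is a sketch reconstructed from the description, not copied from the source file:

```python
#!/usr/bin/python
# Sketch of the script header described above: a shebang plus four standard imports.

import argparse  # command line argument parsing
import os        # file system access (os.walk, os.path)
import re        # regular expressions
import sys       # interpreter/system interface
```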
argumentParser = argparse.ArgumentParser(description="List the files in a directory"
    + " (default .), or the lines of files matching a pattern if specified")
fileCriteria = argumentParser.add_mutually_exclusive_group()
fileCriteria.add_argument("-e", "--extension",
    help="file names must end in the string 'EXTENSION'")
fileCriteria.add_argument("-f", "--file-pat",
    help="file name must match the regular expression /FILE_PAT/")
argumentParser.add_argument("-N", "--no-line-numbers", action='store_true',
    help="don't include line numbers in output")
argumentParser.add_argument("directory", help="directory to search (defaults to '.')",
    nargs="?", default=".")
argumentParser.add_argument("pattern", nargs="?",
    help="pattern to search for (list files by default)")
arguments = argumentParser.parse_args()
All of the above code is setting up to parse the command line arguments. Here’s the usage message you get when you run the utility with the -h option:
usage: fang [-h] [-e EXTENSION | -f FILE_PAT] [-N] [directory] [pattern]

List the files in a directory (default .), or the lines of files matching a
pattern if specified

positional arguments:
  directory             directory to search (defaults to '.')
  pattern               pattern to search for (list files by default)

optional arguments:
  -h, --help            show this help message and exit
  -e EXTENSION, --extension EXTENSION
                        file names must end in the string 'EXTENSION'
  -f FILE_PAT, --file-pat FILE_PAT
                        file name must match the regular expression /FILE_PAT/
  -N, --no-line-numbers
                        don't include line numbers in output
The arguments you pass get stored in the variable arguments.
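To see what ends up in that variable, here is a small experiment with a cut-down parser. The parser below is a hypothetical stand-in that mirrors the options in the usage message, and the argument values are made up. Note how argparse turns dashes in option names into underscores in attribute names:

```python
import argparse

# Hypothetical, cut-down parser mirroring the options in the usage message.
parser = argparse.ArgumentParser(prog="fang")
group = parser.add_mutually_exclusive_group()
group.add_argument("-e", "--extension")
group.add_argument("-f", "--file-pat")
parser.add_argument("-N", "--no-line-numbers", action="store_true")
parser.add_argument("directory", nargs="?", default=".")
parser.add_argument("pattern", nargs="?")

# parse_args accepts an explicit list, which is handy for experiments and tests.
args = parser.parse_args(["-e", ".py", "src", "TODO"])

# --file-pat becomes args.file_pat, --no-line-numbers becomes args.no_line_numbers.
print(args.extension, args.file_pat, args.directory, args.pattern, args.no_line_numbers)
```

Options that were not given come back as None (or False for store_true flags), which is why the rest of the code can test them with a plain if.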
pattern = re.compile(arguments.pattern) if arguments.pattern else None
The above line precompiles the pattern to look for in the files (if any). It’s important to do this for efficiency, because in a big file tree, the pattern will be matched against thousands of lines.
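As a minimal illustration of the idea, with made-up sample lines: the pattern is compiled once, outside the loop, and the compiled object is reused for every line. The re module does keep its own cache of recently compiled patterns, so the gain is mostly in skipping that lookup on each call, but holding on to the compiled object is still the usual idiom.

```python
import re

# Made-up sample input standing in for the lines of a file.
lines = ["int main(void)", "// TODO: fix this", "return 0;"]

# Compile once, outside the loop...
todo = re.compile(r"TODO")

# ...then reuse the compiled object for every line.
hits = [line for line in lines if todo.search(line)]
```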
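def fileIsWanted(filePath, fileName=None):  # filePat (assumed): compiled -f regex, or None
    if arguments.extension and not filePath.endswith(arguments.extension):
        return False
    if filePat and not filePat.match(fileName if fileName else os.path.basename(filePath)):
        return False
    return True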
This little function takes a file path and, optionally, a file name (the “basename” of the file path, which is derived if needed when you don’t provide it) and determines whether the file meets the filtering criteria specified on the command line.
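The same filtering logic can be sketched as a standalone function that takes the criteria as parameters instead of reading them from globals, which makes it easy to try out. The name file_is_wanted and its parameters are hypothetical, chosen for this illustration only:

```python
import os
import re

def file_is_wanted(file_path, extension=None, file_pat=None, file_name=None):
    """Return True if file_path passes whichever filter was supplied."""
    if extension and not file_path.endswith(extension):
        return False
    if file_pat:
        # Derive the basename only if the caller did not supply it.
        name = file_name if file_name else os.path.basename(file_path)
        if not file_pat.match(name):
            return False
    return True

c_files = re.compile(r".*\.[ch]$")
wanted = [p for p in ["src/a.c", "src/a.h", "src/a.py"]
          if file_is_wanted(p, file_pat=c_files)]
```

With no filter at all, every file is wanted, which matches the behaviour of listing everything by default.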
directories = os.walk(arguments.directory, followlinks=True)
for directory in directories:
    # if not grepping, apply filters and if met, list the directory
    if not pattern and fileIsWanted(directory[0]):
        print(directory[0])
Finding all the files under a directory is done using the os.walk function. It yields the directories in the tree, visited top-down in depth-first order. Each is a triple of (directory-path, list-of-subdirectories, list-of-files). If a pattern to look for in files isn’t specified, the directories themselves may need to be listed, which is what the above if statement does.
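To make the shape of those triples concrete, here is a small experiment that builds a throwaway tree in a temporary directory and walks it. The file and directory names are made up for the illustration:

```python
import os
import shutil
import tempfile

# Build a tiny throwaway tree: root/a.c and root/sub/b.h
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"))
for rel in ("a.c", os.path.join("sub", "b.h")):
    open(os.path.join(root, rel), "w").close()

# Each item from os.walk is (directory-path, list-of-subdirectories, list-of-files).
walked = [(os.path.relpath(dir_path, root), sorted(sub_dirs), sorted(files))
          for dir_path, sub_dirs, files in os.walk(root, followlinks=True)]

shutil.rmtree(root)  # clean up the temporary tree
```

Here walked holds one triple for the root (shown as "." after os.path.relpath) followed by one for sub, since the parent directory is visited before its children.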
    for fileName in sorted(directory[2]):
        filePath = directory[0] + "/" + fileName
        if not fileIsWanted(filePath, fileName):
            continue
The above code begins looping through the files in the directory in sorted order. If the file doesn’t meet the criteria you specified, it’s skipped.
        # If not grepping, just print the file name
        if not pattern:
            print(filePath)
            continue
The above code is the last of the code that just lists files (what the Linux find command does).
        with open(filePath) as fileStream:
            fileLines = fileStream.readlines()
The above code slurps a file in as a list of lines.
        for (lineNumber, line) in enumerate(fileLines, 1):  # number lines from 1, as grep -n does
            if pattern.search(line):
                print (filePath + ("" if arguments.no_line_numbers else ":" + str(lineNumber))
                       + ":" + line),
In this final bit of code, each line is searched for the pattern. If found, the line is printed, prefixed by the file path and line number. This is what the Linux grep command does.
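The grep step can also be sketched as a small generator that works on any list of lines, which makes it easy to test without touching the file system. The function name grep_lines and its parameters are hypothetical, and the sketch assumes lines are numbered from 1, as grep -n does:

```python
import re

def grep_lines(file_path, lines, pattern, show_numbers=True):
    """Yield 'path:number:line' (or 'path:line') for each matching line."""
    for number, line in enumerate(lines, 1):  # 1-based, like grep -n
        if pattern.search(line):
            prefix = file_path + (":" + str(number) if show_numbers else "")
            yield prefix + ":" + line.rstrip("\n")

sample = ["int x;\n", "// TODO: rename x\n", "return x;\n"]
numbered = list(grep_lines("main.c", sample, re.compile(r"TODO")))
plain = list(grep_lines("main.c", sample, re.compile(r"TODO"), show_numbers=False))
```

The show_numbers flag plays the role of the -N option, inverted.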
A very common command line I run on Linux looks for something in every C file (including C header files):
find directory -name '*.[ch]' | xargs grep -n pattern
It can be replaced with:
fang -f '.*\.[ch]$' directory pattern
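One thing to watch with -f: the filter function shown earlier calls match on the file pattern, and match anchors only at the start of the name, not the end. A quick check with some made-up file names shows the difference a trailing $ makes:

```python
import re

loose = re.compile(r".*\.[ch]")    # anchored at the start only
strict = re.compile(r".*\.[ch]$")  # must also end right after the extension

names = ["main.c", "util.h", "app.cpp", "notes.txt"]
loose_hits = [name for name in names if loose.match(name)]
strict_hits = [name for name in names if strict.match(name)]
```

The loose pattern also accepts app.cpp, because "app.c" is a matching prefix of the name, while the strict one accepts only main.c and util.h. Quoting the pattern on the command line keeps the shell from expanding it or stripping the backslash before fang sees it.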
Some of the features I have yet to implement, such as suppressing searches in comments and ignoring generated code and binary files, are the real reasons I built this utility in the first place, and I’ll probably add them in the future. The latest version of fang, along with a unit test program, is available for free download from GitHub: https://github.com/jimbelton/jools