Linux Format

Scripts for finding files

John Schwartzman shows how to write a Bash shell script that will find your files and directories, as well as text in those files.


In this tutorial you’re going to see how to build file and text searching tools, and how to reuse those tools to search other object types without too much duplication of effort.

For example, sooner or later every developer asks themselves the question: “Where did I define the class Strstrmbuf?” And “How many times is Strstrmbuf used in the C++ code, and in what C++ files?”

The answers to these questions are shown in Figures 1 and 2. The first command is findh --match 'class Strstrmbuf'. In plain English, that means: in all the C/C++ header files (*.h files) that you can see from here, show me all the matches (-m or --match) that contain the string 'class Strstrmbuf'. Notice that you put the string you’re searching for in quotes, since it contains a space. If you want to see a little more of the files containing the matching strings, you can add the -k or --context option to the previous command. That will display the line before the match, the matching line and the line after the match. That’s the second command shown in Figure 1.

The third command (Figure 2) is findcpp --count Strstrmbuf. In other words: in all the *.cpp files that you can see, how many files (-c or --count) contain the string Strstrmbuf? For the fourth command you remove the --count option to just show the C++ files that contain Strstrmbuf. If you wanted to highlight all of the occurrences of 'Strstrmbuf', you could add the --match option to the findcpp command.

The fifth command adds the -q or --query option to the fourth command, and we get the rather long answer partly shown in Figure 2:

find -type f -regextype posix-egrep -regex '^.+\.(cpp|cc)$' 2>/dev/null | xargs grep -l --color 'Strstrmbuf' 2>/dev/null

So, if you wanted to look for matches to ‘Strstrmbuf’ in other files, you’d know exactly how to ask the question. Copy the --query output to the command line and modify it to search for something else (like *.hpp or *.h files). In fact, why not simply add those other file types for which you want to search to your arsenal of file and text searching commands?
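As a hedged illustration of the kind of pipeline that --query prints, the sketch below wraps a similar find and grep combination in a function. The name list_matching_files and its arguments are hypothetical and are not part of findit. Unlike the query above, it uses -print0 and xargs -0 so that filenames containing spaces survive the pipe, and it assumes GNU find and GNU xargs.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of findit: list the files under a directory
# whose whole path matches an extended regex and whose contents contain a
# fixed string.
list_matching_files() {
    local dir=$1 path_regex=$2 text=$3
    # -print0 / -0 keep odd filenames intact; -r skips grep when nothing is found
    find "$dir" -type f -regextype posix-egrep -regex "$path_regex" -print0 2>/dev/null |
        xargs -0 -r grep -l -F -- "$text" 2>/dev/null
}
```

Calling list_matching_files . '^.+\.(hpp|h)$' Strstrmbuf would ask the same question of header files instead of *.cpp files.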

Chances are, they’re already in findit. Use findit --help to see all of its aliases. Our file searching and text matching command is called findit; it uses find and sometimes grep (with a little help from xargs) to do its work. The findh, findc, findcpp, findxml, findjava, findtxt and all related tools are six-byte symbolic links or aliases to the findit Bash shell script, which itself weighs in at 34K – see Figure 3.

Devs who’ve seen this have asked “What on earth are you doing with a 34K script?” For them, we’ve divided findit into four separate shell script files of around 5K-16K that we stitch together at build time. That makes the program slightly more manageable to work on. But we firmly believe that you can write well-crafted shell scripts that aren’t “write once, read never”!

We’ll start by building findit and its symlinks or aliases. Since you are going to be writing to the /usr/local/bin directory, you must issue the build command as root. First, make sure that your copy of makefindit.sh is executable. If it’s not, issue the command mx makefindit.sh in your working directory. mx is an alias (defined by alias mx='chmod +x') that makes a file executable for everyone. Change to the directory where you are going to build the project and issue:

$ mx makefindit.sh

$ sudo make

Hopefully, everything built properly and makefindit.sh reported success. Issue this command to find files in the current directory:

$ findit --level=1 --extended

The -e or --extended option means show file details (user, group, type, permissions, size) and colour-code the output. The --level=1 option tells findit to only search in the current directory.

As you can see in Figure 3, the project consists of a template file, findit-template.sh; a usage file, findit-usage.sh; a getoptions file, findit-getoptions.sh; a getscript file, findit-getscript.sh; and finally a build script, makefindit.sh, with a makefile called Makefile to build and deploy the program.

Why do we need Makefile when we have makefindit.sh? makefindit.sh is the tool we use to build findit, but make and Makefile have the intelligence to decide whether findit needs to be rebuilt. make will only rebuild findit if one of its dependencies is newer than the version of findit in /usr/local/bin. If so, Makefile will invoke ./makefindit.sh to perform the build. Running sudo make creates findit and, at last count, 64 symbolic links to it. Running sudo make clean removes findit and its 64 symlinks from /usr/local/bin. We can see more info about the script files in the directory by asking which files are writable:

$ findsh --extended --writable

And which files are not writable:

$ findsh --extended --notwritable

The outputs from these commands are shown in Figure 3. They make sense because the source files are owned by user js, and he’s the one asking the questions. The output file findit.sh, however, was built in a root process, so it isn’t writable by a mere js!
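Returning briefly to the rebuild decision that make takes on findit’s behalf: the same test can be approximated in Bash with the -nt (newer than) operator. This is only a sketch of the idea, not the project’s Makefile, and needs_rebuild is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (return 0) if the target is missing or if any
# listed source file is newer than it, which is roughly the test make applies
# to a target and its dependencies.
needs_rebuild() {
    local target=$1 src
    shift
    [ -e "$target" ] || return 0
    for src in "$@"; do
        if [ "$src" -nt "$target" ]; then
            return 0
        fi
    done
    return 1
}
```

A call such as needs_rebuild /usr/local/bin/findit findit-template.sh findit-usage.sh would then report whether a build is due. make does this bookkeeping for you, which is why the project keeps a Makefile alongside the build script.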

While looking at permissions, let’s go a little deeper and ask in which *.sh files the user has read, write and execute permissions (see Figure 4):

$ findsh --extended --permission -u+rwx

The minus sign in -u+rwx means that you don’t care about the permissions of group or other. As you can see in Figure 4, the user js has execute permission only in the executable files. In the source files, the user js only has read and write permissions. You can also invert the question and ask in which *.sh files the user doesn’t have read, write and execute permission:

$ findsh --extended --nopermission -u+rwx
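The two permission questions can be sketched directly with find. This assumes, without confirmation from the findit source, that the permission option is handed to find’s -perm test and that the inverted form simply negates it. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed mapping, not confirmed by the findit source: the permission option
# becomes find's -perm test, and the inverted option negates that test.
files_with_user_rwx() {
    # -perm -u+rwx: every one of the user's r, w and x bits must be set
    find "$1" -maxdepth 1 -type f -name '*.sh' -perm -u+rwx | sort
}

files_without_user_rwx() {
    # '!' inverts the test: at least one of the three bits is missing
    find "$1" -maxdepth 1 -type f -name '*.sh' ! -perm -u+rwx | sort
}
```

Run against a source directory, the first function should list only the executable scripts and the second only the plain source files, mirroring the split described for Figure 4.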

Now use findit to look at the outputs of our build:

$ findlink --dir=/usr/local/bin --linkto=findit --extended

$ findx --dir=/usr/local/bin --Name=findit --extended

The first command shows all of the symbolic links in the directory /usr/local/bin that link to findit. The second command shows all of the files in that directory that are executable ( findx ) that have the full name findit . The outputs of the two commands are shown in Figure 5. The --Name=findit option tells findit that you’re looking for one specific file. If you leave out the --NAME option, findx will find all of the executable files in /usr/local/bin.

The build assembles findit.sh from the template file, which contains placeholders for its three sub-files. It creates all of the files shown in Figure 5. The build file, makefindit.sh, contains some interesting features, so we’ll look at it first. It declares some variables and a simple array called FILES that contains all of the symbolic link names. (If you add an alias to findit, remember to add the alias name to the FILES array of makefindit.sh.)
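A minimal sketch of how an array of alias names can drive the creation of the symlinks follows. The array holds only a small sample of names, and create_links is an illustrative function, not the createsymlinks() found in makefindit.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: FILES holds just a sample of alias names, and
# create_links stands in for whatever makefindit.sh really does.
FILES=(findh findc findcpp findsh findtxt)

create_links() {
    local bindir=$1 target=$2 name
    for name in "${FILES[@]}"; do
        # -s makes a symbolic link; -f replaces a link left by an earlier build
        ln -sf "$target" "$bindir/$name"
    done
}
```

Linking to the relative name findit, as in create_links /usr/local/bin findit, gives links whose size is the length of that target string, which is consistent with the six-byte links mentioned earlier.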

The build then checks that the user has root privileges (checkforroot()) and then it checks for a known OS type (checkos()). So far, findit has only been built on Linux and macOS; it would very much like to be tried out on a UNIX system and a Cygwin system. You can see that makefindit.sh modifies some variables based on the OS type. makefindit.sh knows that the other components of the program are located in the same directory as itself, so it gets the name of its directory from its command line argument $0 and does a cd to that directory (buildfindit()):

cd $(dirname $0)
DEVDIR=$PWD

The variable DEVDIR represents the source or development directory. It’s the working directory where you’re building the program. The variable BINDIR is set to /usr/local/bin. It represents the output or destination directory. Why use /usr/local/bin rather than ~/bin? Well, /usr/local/bin is accessible to any user of your computer, while ~/bin is accessible only to you. You are going to use $TEMPLATE_FILE (findit-template.sh) as the pattern for the output file $SHSCRIPT.sh (that is, the file which will be findit.sh). You’ll build $SHSCRIPT.sh in $DEVDIR before copying it to $BINDIR and renaming it to findit.
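The preliminary checks described above can be sketched as follows. These are hypothetical stand-ins, written under the assumption that the root check compares the effective user ID with 0 and that the OS check looks at uname; the real checkforroot() and checkos() in makefindit.sh may well differ.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for checkforroot() and checkos(); the real functions
# in makefindit.sh may differ.
is_root() {
    [ "$(id -u)" -eq 0 ]
}

os_family() {
    case "$(uname -s)" in
        Linux)  echo linux ;;
        Darwin) echo macos ;;
        *)      echo unknown ;;
    esac
}

# Print the directory that holds a script. The subshell keeps the cd from
# changing the caller's working directory.
script_dir() {
    ( cd "$(dirname "$1")" && pwd )
}
```

Quoting "$0" and the command substitution, as script_dir does, keeps the cd working when the development directory has spaces in its name.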

The editor we use to copy $TEMPLATE_FILE into $SHSCRIPT.sh is the stream editor sed. It reads the input stream $TEMPLATE_FILE and directs its output to the output stream $SHSCRIPT.sh, both of which are in $DEVDIR. The sed option -n suppresses automatic printing of the pattern space, the s command substitutes text (s/this/that/ replaces this with that) and the p command prints the selected lines to the output stream. The first use of sed copies lines 1 to 15 of $TEMPLATE_FILE to $SHSCRIPT.sh. Notice that we use the > symbol to create $SHSCRIPT.sh. After the first step, we’ll use the >> symbol to append to $SHSCRIPT.sh.

# copy lines 1 to 15 of the template file to findit.sh
sed -n '1,15p' $TEMPLATE_FILE > $SHSCRIPT.sh

sed then looks at line 16 of $TEMPLATE_FILE, strips out <> and replaces it with the values of the variables BUILDDATE and OSTYPE.

# replace the placeholder with the build date and OSTYPE
sed -n "16s/<>/$BUILDDATE for OSTYPE = $OSTYPE/p" \
    $TEMPLATE_FILE >> $SHSCRIPT.sh

sed then copies some declarations from lines 17 to 33 of $TEMPLATE_FILE to $SHSCRIPT.sh. Next, you echo some newly created declarations that are based on the operating system type to $SHSCRIPT.sh. You then use sed to copy lines 34 to 44.

You now look for three placeholders, #<>, #<> in the template file and replace them with USAGE_FILE (findit-usage.sh), GETSCRIPT_FILE (findit-getscript.sh) and GETOPTIONS_FILE (findit-getoptions.sh). The functions in these files do most of findit’s work. Replacing the first of the placeholders is done like this:

# replace the #<> placeholder on line 45 of TEMPLATE_FILE
# with nothing and then cat USAGE_FILE to findit.sh
sed -n "45s/#<>//p" $TEMPLATE_FILE >> $SHSCRIPT.sh
cat $USAGE_FILE >> $SHSCRIPT.sh

The second and third placeholders are handled in the same way. Notice that all of the placeholders are placed inside comments. That’s important. You’ll get all kinds of errors if you don’t put them in comments; we found that out the hard way. Using sed, you then copy lines 48 to the end of $TEMPLATE_FILE to $SHSCRIPT.sh and then cd to $BINDIR.
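The whole assembly can be tried out in miniature. In the sketch below the template is only five lines long, and the placeholder names <STAMP> and #<USAGE>, the line numbers and the function name assemble are all invented for the illustration; they are not taken from findit-template.sh.

```shell
#!/usr/bin/env bash
# Hypothetical miniature of the build. It expects a five-line template:
#   lines 1-2  header
#   line 3     contains <STAMP>
#   line 4     is the comment placeholder #<USAGE>
#   line 5 on  the rest of the script
assemble() {
    local template=$1 usage=$2 out=$3 stamp=$4
    sed -n '1,2p' "$template" > "$out"                  # '>' creates the output file
    sed -n "3s/<STAMP>/$stamp/p" "$template" >> "$out"  # fill in the build stamp
    sed -n '4s/#<USAGE>//p' "$template" >> "$out"       # blank out the placeholder...
    cat "$usage" >> "$out"                              # ...and splice in the sub-file
    sed -n '5,$p' "$template" >> "$out"                 # copy the remainder
}
```

Because the stamp is pasted into the sed expression, it must not contain a slash in this sketch. Each sed call prints only the lines it addresses, so the output grows in template order with the sub-file landing exactly where the placeholder was.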

makefindit.sh then deletes the old symbolic links in $BINDIR and the old findit in $BINDIR (the removefindit() function). It then copies $DEVDIR/$SHSCRIPT.sh to $BINDIR/$SHSCRIPT (this is copyfindit()) and creates the new symbolic links to findit (this is createsymlinks()). You can follow the code in makefindit.sh and see the output in Figure 5.

You now need to look at the functions you’ve put into findit to see what they do. The <> functions consist of the usual usage() function, an alias() function which shows all the aliases to findit and supplements usage(), plus the usual version() function. The usage() and alias() functions use less text and an alternate screen to display their output without mucking up the text already on the screen. findit-usage.sh also contains a few helper functions.

<> contains the getoptions() function, which processes all the command line options that you can add to a findit alias command. There are a prodigious number of options available and we use GNU getopt, the central component of getoptions(), to handle them. It also contains some helper functions which are used to decide if the syntax and content of the user’s options are correct. Finally, <> contains the getscript() function.

getscript() examines the script variable ${0##*/}, which contains the filename portion of the path and indicates which alias was used to invoke findit: for example, findh, findcpp, findxml and so on. getscript() contains a case statement with all the possible aliases to findit. getscript() may populate an extension variable, $ext, with the last portion of a regular expression (regex) used to find files with a particular extension. The findjava symbolic link is used to search for files with ext='\.java$', while the findasm symbolic link is used to search for assembly language files that have ext='\.(asm|s)$' (*.asm or *.s files). The findsh alias is used to find shell scripts that have ext='\.sh$'.

For each alias, getscript() initialises a file or directory description variable called fdesc, which is used by the usage() and alias() functions. For the findphp alias, it’s fdesc='php files'. For the findgit alias, fdesc='git repositories'. In the case of findgit, the case statement sets type='-type d' as an option which tells find to search for directories. The default type is type='-type f', which tells find to search for files. The findlink alias has type='-type l' to search for links.

Many find commands have a regular expression that describes the entire path of a file or directory or link. In the case of findgit, regex='^.?/([^/]+/)*\.git$'. This tells find to search for a path expression that has any number of directories but has the final (hidden) directory .git. There is also a findhfile alias for finding hidden files. findhfile uses regex='^.+/\..+$'. Matching paths will have any number (greater than or equal to 1) of characters, but will have a final name starting with a dot (.).
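The dispatch on the invoking name can be sketched like this. It is a cut-down, hypothetical version of the idea behind getscript(): only a handful of aliases appear, the function name pick_alias is invented, and the fdesc strings other than 'git repositories' are guesses.

```shell
#!/usr/bin/env bash
# Hypothetical, cut-down dispatcher in the spirit of getscript(). Only a few
# aliases are shown, and most fdesc strings are invented for the sketch.
pick_alias() {
    local script=${1##*/}       # strip the directory part, as ${0##*/} does
    type='-type f' ext='' regex='' fdesc=''
    case $script in
        findjava) ext='\.java$';    fdesc='java files' ;;
        findasm)  ext='\.(asm|s)$'; fdesc='assembly language files' ;;
        findsh)   ext='\.sh$';      fdesc='shell scripts' ;;
        findgit)  type='-type d';   regex='^.?/([^/]+/)*\.git$'
                  fdesc='git repositories' ;;
        findlink) type='-type l';   fdesc='symbolic links' ;;
        *)        return 1 ;;
    esac
    # an alias that set no regex gets a whole-path regex built from ext
    [ -n "$regex" ] || regex="^.+$ext"
}
```

In a real script the argument would be "$0", so the same file behaves differently depending on which symlink was used to start it.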

There is also a findhdir alias for finding hidden directories. Its regex is '^.+/\..+$'. findhdir matches any number of characters followed by a final directory name that begins with a dot. In Linux and Unix, a directory or file whose name starts with . is hidden in a normal ls operation. To see it you must use ls -a. In the world of regular expressions, the symbol ^ anchors the match at the beginning of the path and the symbol $ anchors it at the end. When we create a regex, our search must encompass the entire path and not just the part of the path we care about!

Many aliases define an ext variable that will later be turned into a regex if no regex variable has been defined. Some aliases, like findshell, construct a multiple-part extension variable such as ext='\.(sh|pl|py|rb)$'. This is used to form a regex that matches any number of initial directories but looks for scripts which are either plain old sh, Perl, Python or Ruby files. findtxt finds text files with ext='\.te?xt$'. It will look for files ending with .txt or .text.
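A quick way to experiment with these whole-path expressions, without running a search, is to feed candidate paths to grep -E, which understands the same extended syntax that find uses with -regextype posix-egrep. The helper name path_matches is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the path (second argument) matches the
# extended regex (first argument). The ^ and $ anchors in the regex make the
# match cover the entire path, as find's -regex test requires.
path_matches() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

# A few paths to try against the expressions quoted in the text
path_matches '^.+\.te?xt$' './notes.text' && echo 'text file'
path_matches '^.+/\..+$' './.bashrc' && echo 'hidden file'
path_matches '^.?/([^/]+/)*\.git$' './proj/.git' && echo 'git repository'
```

Trying a near miss such as ./proj/.gitignore against the findgit expression is a good way to convince yourself that the trailing $ matters.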

The getoptions() function is used to reconcile the myriad options you can pass to a findit alias. There are options to find executable files, or links, or pipes, or sockets, or directories, or just plain files. You can specify a starting directory with -d thing or --dir=thing. The default starting directory is $PWD, but you can also set the starting directory to ., which will use $PWD but will give you relative paths starting with ./ instead of paths starting at the root of the file system.

You can specify whether or not you are interested in finding either files or pattern matches in files (using grep). You can specify whether you want to see matches in the files you find (-m 'text' or --match='text') or simply the count of the number of files that contain a match (-c or --count). You can specify whether you want to ignore case in a filename (-I or --ignore-case-find) or in a pattern match (-i or --ignore-case-grep). You can specify -l or --level, which is the maximum depth to search. You can specify a partial (-n or --name) or complete name (-N or --NAME) to match. Finally, there is the -q or --query option. This doesn’t search, but instead displays the complete find (or find and grep) search expression assembled by findit.
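For readers who have not used GNU getopt, here is a much-reduced sketch of option handling in that style. It covers only five of the options named above, treats --count as a plain flag, and uses the invented function name parse_options; the real getoptions() is far more thorough and may treat these options differently. It needs the enhanced getopt from util-linux, not the older BSD one.

```shell
#!/usr/bin/env bash
# Hypothetical, much-reduced option parser; not the getoptions() from findit.
# Requires the enhanced (GNU/util-linux) getopt.
parse_options() {
    local parsed
    parsed=$(getopt -o d:m:cl:q --long dir:,match:,count,level:,query \
                    -n findit -- "$@") || return 2
    eval set -- "$parsed"       # re-split getopt's safely quoted output
    dir=$PWD match='' count=0 level='' query=0
    while true; do
        case $1 in
            -d|--dir)   dir=$2;   shift 2 ;;
            -m|--match) match=$2; shift 2 ;;
            -c|--count) count=1;  shift ;;
            -l|--level) level=$2; shift 2 ;;
            -q|--query) query=1;  shift ;;
            --)         shift; break ;;
            *)          return 2 ;;
        esac
    done
    # counts and matches cannot be shown together
    if [ "$count" -eq 1 ] && [ -n "$match" ]; then
        echo 'findit: use --match or --count, not both' >&2
        return 2
    fi
}
```

getopt normalises the command line first, so --dir=/tmp, --dir /tmp and -d /tmp all reach the case statement in the same form, which keeps the loop short.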

If findit doesn’t fit your needs exactly, the --query option provides you with a search expression starting point that you can modify and use on the command line. If it turns out to be useful, you can easily modify findit to add a new alias (see the box).

After invoking getscript() and getoptions(), findit will either perform your search, or show you its query, or inform you that the options you provided don’t make sense. This could be something like specifying --match with no pattern to match, or specifying --match and --count together: findit can show counts or matches, but not both.

But, as they say in infomercials, there’s more. You can look at your computer’s block devices by using the alias findblock --extended. The result is shown in Figure 6. Change the command to findchar --extended and you’ll see those familiar character devices, /dev/null, /dev/zero, /dev/urandom and /dev/tty0 to /dev/tty9, and so on. Try narrowing the list by specifying --group=lp and, as you can see in Figure 6, we see all of the line printers.
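If you are curious what such a device query might look like underneath, the sketch below assumes that findblock and findchar come down to find’s -type b and -type c tests under /dev, and that --group becomes -group. That mapping is a guess, and list_devices is a hypothetical name.

```shell
#!/usr/bin/env bash
# Assumption: the device aliases map onto find's -type b (block) and
# -type c (character) tests, optionally narrowed by owning group.
list_devices() {
    local kind=$1 group=${2:-}
    if [ -n "$group" ]; then
        find /dev -type "$kind" -group "$group" 2>/dev/null | sort
    else
        find /dev -type "$kind" 2>/dev/null | sort
    fi
}
```

list_devices b lists block devices, list_devices c lists character devices, and list_devices c lp narrows the character devices to those owned by the lp group, if your system has one.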

Try using findsh (or any alias) with -h or --help to see what findit has to say about itself, its options and its aliases. Most of all, please examine findit.sh or its component files, whichever you find easier, to see how it works. I hope you’ll find findit to be a useful tool in its own right, but also a platform that is extremely easy to modify and expand. Have fun!

Figure 1: The findit command at work.
Figure 2: Some of the questions you can ask findit.
Figure 3: More questions you can ask findit!
Figure 4: Still more questions you can ask findit…
Figure 5: The many, many files created by sudo make.
Figure 6: Your computer’s block devices and character devices listed.
