Automating Simple Tasks on Linux (Shell Scripts and other Simple Tools)

Monty Stein Mar 24, 2000

Shells
There are two major groupings of shells in common use. Bourne-derived shells evolved from the first Unix shell; Bourne (sh), Korn (ksh), and the Bourne Again Shell (bash) are the major variants in use. The other evolutionary branch is the C shells from the early Berkeley Unix systems; csh is the form that most people see. Apparently it is a necessary feature of any shell to have the name form a pun.

Everything below will use Bourne style syntax since it will work on the broadest set of shell programs.

When using a shell it isn't immediately apparent how much power you have at your fingertips. Shells understand a full programming language, and it is available both when you are typing at an interactive prompt and when commands are collected into files as scripts.
Globbing
Globbing is the step that is performed on wildcards in filenames to expand them into real filenames. Thus *.log is expanded to a list of all the files in the directory that match the pattern.

*
matches any number of characters
?
matches any one character
[]
matches a range of characters (like [0-9] for all the numbers)

For example:

echo * # when you don’t have enough of
# a system left to run ls

Input/Output
There are 3 I/O streams that are set up by the shell for any program that it starts: Standard Input (stdin, or stream number 0), Standard Output (stdout, or #1), and Standard Error (stderr, or #2). Normally a program reads its input from stdin, sends output to stdout, and reports any errors to stderr. These streams can be manipulated before the program sees them by redirecting them. The normal forms of redirection are simply to and from files:

someprog <input_file >output_file

In this case the input and output are handled by files and any errors would likely be reported to the screen. (Programs, of course, are not limited to using only these streams. However, most shell programming uses support programs of this form.)
Streams can be joined by referencing them by number. Thus:

someprog >output.log 2>&1

joins stderr (#2) and stdout (#1) together and puts them into output.log. Ordering is critical with this form: the redirections are processed from left to right, so stdout must be pointed at the file before stderr is joined to it.

There is one more form of redirection, called a “Here Document”. Its use is rare; check the manual page for more information.
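
A quick sketch of the idea: the lines between the delimiters are fed to the command on stdin (the delimiter word EOF is arbitrary):

cat <<EOF
everything up to the line
reading EOF goes to cat
EOF
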
Variables
Environment variables are most commonly used to pass some tidbit of information to a program that needs login or machine specific configuration (such as what X display your programs should display to).

To set a variable, the name of the variable is used with an equals sign immediately following (naming convention for environment variables is to use upper case). To access the contents of the variable, use a dollar sign and then the variable name:


/tmp-> VAR=contents
/tmp-> echo $VAR
contents

In places where the name of the variable would touch another alphanumeric character, the variable can be bracketed by curly braces to force the correct behavior:


mv $VAR ${VAR}old

(in this case, the shell would be looking for a variable VARold if the braces were not used)

Variables can be exportable or not. Exported variables are passed to programs that the shell starts (and to any that they start). Unexported variables are restricted to the current shell. Any number of variables can be exported with the export command:

/tmp-> VAR=contents
/tmp-> sh -c 'echo $VAR' # start a subshell as another process

/tmp-> export VAR
/tmp-> sh -c 'echo $VAR' # start a subshell with the exported value
contents

Remember, under the design of the Unix process model, programs spawned by the shell cannot set variables in the parent shell. The way around this is to source a file in the current shell:

/tmp-> . somefile

would run somefile as if it were typed in at that point.
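
A minimal sketch (assuming a file settings.sh containing just the line VAR=contents):

/tmp-> sh settings.sh # runs in a child process
/tmp-> echo $VAR # the parent shell is unchanged

/tmp-> . settings.sh # runs in the current shell
/tmp-> echo $VAR
contents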

Shorts:

env
will print the environment: all the exported variables that will be passed to programs.
export
without any arguments will print all exported variables.
unset
will remove a variable.

Special Variables

$?
return code of the last program
$!
PID of last background process spawned
$$
PID of the current shell, useful for creating temp files
$*
all the arguments to this shell
$0
what this shell was called by
$1 … $9
Arguments to the shell script (more args are there, just call shift to get to them)

Quoting
The shell supports a rich variety of common quoting types (all right, 5).
Double quotes bracket strings and allow variables to be expanded.
Single quotes bracket strings and do not allow any shell string operations inside.
Back quotes will run the contents in a separate shell and return the output IN PLACE.
A backslash will quote a single character.
A pound sign “#” will comment out the rest of the line.

/tmp-> VAR=contents
/tmp-> echo "this string has $VAR"
this string has contents
/tmp-> echo "this string has \$VAR"
this string has $VAR
/tmp-> echo 'this string has $VAR'
this string has $VAR
/tmp-> echo `echo $VAR|tr a-z A-Z`
CONTENTS

Job Control
Programs can be run in the background by putting an ampersand “&” at the end of the line. The process ID of the last backgrounded job is in the “!” variable. wait will wait on all background jobs to finish when called with no arguments, or on a single job when a PID is given as the argument. wait returns the exit code of the process that it waited on.
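
A minimal sketch (sleep stands in for a long-running job):

/tmp-> sleep 30 & # run in the background
/tmp-> BGPID=$! # remember its process ID
/tmp-> wait $BGPID # block until it finishes
/tmp-> echo $? # exit code of the waited-on job
0
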
Testing and Control Structures
Commands that have completed successfully return a value of 0. This way errors can be identified by a rich set of return codes (thus TRUE is equal to 0 and FALSE is everything else). The last command’s return value is stored in the “?” variable.

The test command sets its return code based on an expression that can return information about a file or compare strings. The test command itself can be accessed by the “[” “]” pair. (For the historically minded: The original Bourne shell didn’t have the square bracket shortcuts and the test program had to be called directly. That is why if you want to get a list of all that test can test you have to run man test.)
A call to test would look something like:

[ -r /tmp/output ]

This would read: If the file /tmp/output is readable, return a TRUE exit status.
Note: When testing the contents of variables, always put them in double quotes. This avoids the problem of:

VAR=""
[ $VAR = junk ]

The shell sees:

[ = junk ]

If $VAR is quoted, the empty string stays visible to the shell as a word and the test is well formed.
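
The safe form is:

[ "$VAR" = junk ]
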
Check the manual page for test for all the different options that it takes. Note: Some of the test options are platform dependent. Keep them simple for portability's sake; use -r (readable) instead of -e (exists), which does not work with HP-UX's test.
if/then/elif/else/fi

if expression
then
...
elif expression
then
...
else
...
fi

The classical if statement. If the result of expression is TRUE, then execute the then block. The expression can be a call to test or any other program.

if [ ! -d tempdir ] # create a temporary directory
then # if it doesn't already exist
mkdir tempdir
fi

Lazy Evaluation
Lazy evaluation takes advantage of the fact that if A and B must both be true in order for something to happen and A is not true, there isn't any point in evaluating B (the converse for “or” also applies; it just flips the logic). It can shorten simple tests in scripts.
For example, this:

if [ -r somefile ]
then
cat somefile
fi

will run identically to:

[ -r somefile ] && cat somefile

For “or” the logic inverts. If the first command is not TRUE, execute the second:

[ "$RETURNCODE" -eq 0 ] || echo "command failed"

case/esac
A case statement allows simplification of lots of nested if/then blocks. It takes the form of:

case value in
pattern1)
...
;;
pattern2)
...
;;
esac

The patterns are matched against the value using the same expansion rules that would be used for filename globbing.
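
For example, a small sketch that dispatches on the first argument to a script:

case "$1" in
start)
echo "starting"
;;
stop)
echo "stopping"
;;
*) # the catch-all pattern, like a final else
echo "usage: $0 start|stop"
;;
esac
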
Functions
Functions encompass small, often-called portions of a script. From the outside, a function looks like another program. Inside the function, everything looks as if it is running in a separate shell, except that all the variables of the parent script are available to the function for reading and writing.
The return (exit) value is passed with a call to return.

Functions look like:
somefunc () {
echo "hello $1"
return 0
}

And are called like:

somefunc "world"

Looping
for/do/done
For loops iterate over a list of values and execute a block for each entry in the list. They look like:

for variable in list
do
...
done

list can be the output from a program or a globbed list. For example:

for filename in *.c
do
cp "$filename" "$filename.backup"
done

while/do/done
While loops look and act a lot like for loops, but keep looping until an expression returns FALSE. They look like:

while expression
do
...
done

For example:

while [ ! -r "STABLE" ]
do
echo "waiting on STABLE flag"
sleep 60
done

Common Commands
ls
ls will LiSt the contents of a directory. Since most people learn about this command fairly quickly, I’ll focus on the more useful flags:
-S sort by size (GNU only)
-t sort by time last modified
-r reverse sort
-a all files
-d list a directory itself rather than its contents
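
For example, combining the sort flags:

ls -rt # sort by modification time, reversed, so the
# most recently changed files end up at the bottom
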
cut
cut will cut lines of text by column or by delimiter.
-c10-24 would output columns 10 through 24
-d: -f1,3 would output the first and third fields delimited by a colon

this:that:the other

cut -d: -f1,3 returns

this:the other

sed
Stream EDitor. sed will run ed-style commands on files. The most common way to use it is for search and replace on the fly.

the first line

sed s/first/second/g returns

the second line

tr
character TRanslator. tr can translate one set of characters into another, as well as suppress duplicated input characters.

lower case

tr a-z A-Z returns

LOWER CASE

The -s switch will force the suppression of duplicated sequences of characters:

this    that  the     other

tr -s ' ' returns

this that the other

(useful to preprocess a tabular report into something that cut can work on). The -d switch will delete characters (it can be very useful with the -c complement switch to return only a given set of characters).

wc -l somefile|tr -cd "0-9" # gives the number of lines
# w/ no other chars

sort
sorts files. The most common way to use this is sort -u to suppress duplicated lines after the sort. A -r will reverse the sort; -n will attempt to convert string-based numbers into machine numbers for the sort.
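
For example, a sketch that lists the biggest items last:

du -s *|sort -n # numeric sort on the size column, so the
# largest entries end up at the bottom
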
find
Find recursively descends into filesystems and (in the simplest form) prints filenames based on certain criteria.

find . -type f -print

will print the names of all files below the current directory

find . -newer /etc/lastbackup -type d -print

will print the names of all directories that have had files added or deleted since the file /etc/lastbackup was last modified.
xargs
xargs will build up command lines from standard input. When supplied a command to run, it will execute that command with as many arguments built up from its input as the OS will allow. -n num will limit the number of arguments passed to each command invocation.

find . -type f -print|xargs grep "your keys"

would search all files below and in the current directory for the string “your keys”
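
The -n flag can be sketched with echo standing in for a real command:

ls|xargs -n 3 echo # echo is run repeatedly,
# three arguments at a time
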
Running Jobs at Certain Times
cron
The crond daemon wakes once a minute and runs any jobs scheduled by crontab or at. It normally handles all the recurring jobs that maintain the system. It can also be a huge security hole (there was a notable problem with the vixie-cron system in the Red Hat 5 series). Because of the problems that its use can cause, the cron system has built into it a way of restricting use via the allow and deny files that are stored in /etc.
crontab
Each user (that is allowed to) has a crontab file that is read/written with the crontab command. Crontabs are used for jobs that need to be run at regular recurring points in time. The crontab file has this structure:

minute hour month-day month weekday job

So, to fetch mail using a script called /usr/local/bin/getmymail every minute during business hours:

* 7-17 * * 1-5 /usr/local/bin/getmymail

Read as: for every minute of the hours 7am through 5pm, Monday (day 1) through Friday (day 5), run the job /usr/local/bin/getmymail.

Use crontab -l to get the contents of your crontab entries. It is a very good idea to keep a master copy that you can edit and reload.

A possible edit session would be:

crontab -l >mycrontabfile

edit mycrontabfile

crontab mycrontabfile

The scripts that are run will have nothing but the minimal user environment set. Any script that is run should set up the variables it needs (an expanded $PATH variable, for example) or assume nothing about the environment it will be running in. Any output generated will be mailed to the user.
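
A minimal sketch of a cron-friendly script (the paths are assumptions; adjust to taste):

#!/bin/sh
# cron provides almost no environment, so set it up explicitly
PATH=/bin:/usr/bin:/usr/local/bin
export PATH
cd /tmp || exit 1 # never assume a working directory
date >>cronjob.log
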
at
Runs a job at a specific time. It differs from crontab in that it will run the job only once and that all environment variables are carried through from the shell that called it. The script comes in via standard input; any output will be mailed to the user.

At allows easy setting of the time that the script is to be run:

echo "myscript"|at now + 5 hours

would run myscript 5 hours from now.

echo "someotherscript"|at 5 pm Tuesday

would run someotherscript at the next 5pm on a Tuesday. Be certain to double check the date that at reports when the job is scheduled so that it is what you expected.

At will also run jobs by date:

echo "were you fooled?"|at 5 pm april 1

at -l will list all pending jobs.
atrm (or at -r on some systems) will remove a numbered job.
batch
Close to at now, but holds the job until the load average falls below 0.8, as well as running the job at low priority. Play nice.
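
For example (bigjob is a hypothetical script):

echo "bigjob"|at now # runs as soon as at gets to it
echo "bigjob"|batch # runs when the machine isn't busy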