Automating Simple Tasks on Linux
(Shell Scripts and other Simple Tools)

Shells


There are two major families of shells in common use. The Bourne-derived shells descend from the Bourne shell (sh), which became the standard shell with Version 7 Unix; the Bourne shell itself, the Korn shell (ksh) and the Bourne Again Shell (bash) are the major variants in use. The other evolutionary branch is the C shell family from the early Berkeley Unix systems, and csh is the form that most people see. Apparently it is a necessary feature of any shell that its name form a pun.

Everything below will use Bourne style syntax since it will work on the broadest set of shell programs.


When you use a shell, it isn't obvious how much power you have at your fingertips. Shells understand a complete programming language, and the same language works whether you type it at an interactive prompt or save it in a file as a shell script.
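As a minimal sketch of that idea, the commands below write two lines you could just as well type at the prompt into a file, then run the file with sh. The file name hello.sh and the use of mktemp -d for a scratch directory are illustrative choices, not requirements.

```shell
#!/bin/sh
# Make a scratch directory and a script file inside it
# (hello.sh is only an example name).
dir=`mktemp -d`
script="$dir/hello.sh"

# The same commands you could type interactively, saved in a file.
echo 'greeting="hello from a script"' >"$script"
echo 'echo "$greeting"'               >>"$script"

# Run the file with sh; its output is captured like any program's.
output=`sh "$script"`
echo "$output"

rm -rf "$dir"
```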

Globbing

Globbing is the step in which the shell expands wildcard patterns into real filenames. Thus *.log is expanded into a list of all the files in the current directory whose names end in .log. If nothing matches, the pattern is normally left as-is.
*
matches any number of characters (including none)
?
matches exactly one character
[]
matches any one of the enclosed characters; a range such as [0-9] matches any digit
For example:
     echo *    # when you don't have enough of
               # a system left to run ls
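The sketch below shows what each wildcard matches. It assumes mktemp -d is available to make an empty scratch directory, and the file names are invented for the example.

```shell
#!/bin/sh
# Work in an empty scratch directory so the patterns only see our files.
dir=`mktemp -d`
cd "$dir" || exit 1
touch a.log b.log long.log data1 data7 dataX notes.txt

star=`echo *.log`          # any run of characters before .log
one=`echo ?.log`           # exactly one character before .log
digits=`echo data[0-9]`    # "data" followed by a single digit
none=`echo *.tmp`          # nothing matches, so the pattern is left as-is

echo "$star"
echo "$one"
echo "$digits"
echo "$none"

cd / && rm -rf "$dir"
```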

Input/Output

There are three I/O streams that the shell sets up for any program it starts: Standard Input (stdin, stream number 0), Standard Output (stdout, #1), and Standard Error (stderr, #2). Normally a program reads its input from stdin, sends its output to stdout and reports any errors to stderr. These streams can be changed before the program sees them by redirecting them. The most common forms of redirection are simply to and from files:
   someprog <input_file >output_file
In this case input comes from input_file, output goes to output_file, and any errors would still be reported to the screen. (Programs, of course, are not limited to using only these streams. However, most shell programming relies on support programs that work this way.)
Streams can be joined by referencing them by number. Thus:
   someprog <input >output.log 2>&1
sends stdout (#1) to output.log and then makes stderr (#2) a copy of stdout, so both end up in output.log. Ordering is critical with this form: the shell processes redirections from left to right, so writing 2>&1 >output.log instead would point stderr at the old stdout (usually the screen) before stdout is moved to the file.
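The left-to-right rule is easy to check. In this sketch, noisy is a made-up stand-in for someprog that writes one line to each stream, and the output file comes from mktemp; the line counts show which order captures both streams.

```shell
#!/bin/sh
# noisy is a hypothetical stand-in for someprog:
# one line to stdout, one line to stderr.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

out=`mktemp`

# Left to right: stdout goes to the file, then stderr copies stdout,
# so both lines land in the file.
noisy >"$out" 2>&1
both=`wc -l <"$out" | tr -d ' '`

# Reversed: stderr copies stdout while stdout still points wherever it
# did before (normally the screen), then stdout moves to the file.
# Only one line lands in the file.
noisy 2>&1 >"$out"
one=`wc -l <"$out" | tr -d ' '`

echo "file got $both lines, then $one line"
rm -f "$out"
```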

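There is one more form of redirection, called a "here document" (<<), which feeds lines written in the script itself to a program's standard input. It is mostly useful inside scripts. Check the manual page for your shell for more information.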
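A short sketch of a here document follows. The delimiter word EOF is only a convention (any word works), and the temporary file from mktemp just holds the result so it can be compared. Quoting the delimiter word turns off variable expansion inside the document.

```shell
#!/bin/sh
NAME=world
file=`mktemp`

# Lines up to the one holding only EOF become cat's standard input.
# The delimiter is unquoted, so $NAME is expanded.
cat >"$file" <<EOF
hello $NAME
EOF
expanded=`cat "$file"`

# Quoting the delimiter word turns expansion off, like single quotes.
cat >"$file" <<'EOF'
hello $NAME
EOF
literal=`cat "$file"`

echo "$expanded"
echo "$literal"
rm -f "$file"
```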

Variables

Environment variables are most commonly used to pass some tidbit of information to a program that needs login- or machine-specific configuration (such as which X display your programs should display to).

To set a variable, write its name followed immediately by an equals sign and the value, with no spaces around the equals sign (the naming convention for environment variables is to use upper case). To access the contents of the variable, use a dollar sign followed by the variable name:
/tmp-> VAR=contents  
/tmp-> echo $VAR
contents
In places where the name of the variable would touch another alphanumeric character, the variable can be bracketed by curly braces to force the correct behavior:
mv $VAR ${VAR}old
(in this case, the shell would be looking for a variable VARold if the braces were not used)

Variables can be exported or not. Exported variables are passed to programs that the shell starts (and to any programs that those start). Unexported variables are restricted to the current shell. Any number of variables can be exported with the export command:
/tmp-> VAR=contents
/tmp-> sh -c 'echo $VAR'  # start a subshell as another process

/tmp-> export VAR
/tmp-> sh -c 'echo $VAR'  # start a subshell with the exported value
contents
Remember, under the design of the Unix process model, programs spawned by the shell cannot set variables in the parent shell. The way around this is to source a file in the current shell:
/tmp-> . somefile
would run somefile as if it was typed in at that point.
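This sketch contrasts running a file as a separate process with sourcing it. The variable name GREETING and the mktemp file are examples; ${GREETING:-unset} substitutes the word unset when the variable is empty or not set.

```shell
#!/bin/sh
# A file that sets a variable (the name is only an example).
file=`mktemp`
echo 'GREETING=hello' >"$file"

# Run as a separate process: the child sets GREETING and exits,
# and this shell never sees the change.
sh "$file"
after_run=${GREETING:-unset}

# Sourced with ".": the lines run in this shell, so the value sticks.
. "$file"
after_source=${GREETING:-unset}

echo "after sh: $after_run"
echo "after . : $after_source"
rm -f "$file"
```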

Shorts:
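env
will print the environment, that is, the exported variables that programs started from this shell will see.
set
without any arguments will print all the variables that the shell knows about, exported or not.
export
without any arguments will print all exported variables.
unset
will remove a variable.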

Special Variables
$?
return code of the last program
$!
PID of last background process spawned
$$
PID of the current shell, useful for creating temp files
$*
all the arguments to this shell
$0
the name this shell or script was invoked by
$1 ... $9
arguments to the shell script (there can be more than nine; call shift to move the later ones down into reach)
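The special variables can be tried without writing a separate script. In this sketch, "set --" replaces the positional parameters to imitate a script run as "myscript one two three" (the argument values and the /tmp/example name are made up), and an ls of a directory assumed not to exist supplies a failing return code.

```shell
#!/bin/sh
# Pretend the script was run as: myscript one two three
set -- one two three

first=$1          # one
all="$*"          # one two three

shift             # drop $1; the remaining arguments move down
after_shift=$1    # two

ls /nonexistent-example-dir 2>/dev/null   # a command that fails
status=$?         # nonzero return code of that ls

tmpfile=/tmp/example.$$    # the PID makes the name unique to this shell

echo "first=$first all=$all after_shift=$after_shift"
echo "status=$status tmpfile=$tmpfile"
```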

Quoting

The shell supports a rich variety of common quoting types (all right, 5).
  • Double quotes bracket strings and allow variables to be expanded.
  • Single quotes bracket strings and do not allow any shell string operations inside.
  • Back quotes will run the contents in a separate shell and return the output IN PLACE.
  • A backslash will quote a single character.
  • A pound sign "#" will comment out the rest of the line
    /tmp-> VAR=contents
    /tmp-> echo "this string has $VAR"
    this string has contents
    /tmp-> echo "this string has \$VAR"
    this string has $VAR
    /tmp-> echo 'this string has $VAR'
    this string has $VAR
    /tmp-> echo `echo $VAR|tr a-z A-Z`
    CONTENTS
    

    Job Control

    Programs can be run in the background by putting an ampersand "&" at the end of the line. The process ID of the most recently backgrounded job is in the "!" variable ($!). Called with no arguments, wait waits for all background jobs to finish; given a PID as its argument, it waits for just that process and returns that process's exit code.
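Here is a small sketch of $! and wait working together. The sleep commands stand in for real work, and the exit code 3 is arbitrary, chosen so the value picked up by wait is easy to recognize.

```shell
#!/bin/sh
# Two background jobs; sleep stands in for real work.
sleep 1 &
first_pid=$!                 # PID of the job just started

sh -c 'sleep 1; exit 3' &    # a job that fails with exit code 3
second_pid=$!

wait "$second_pid"           # wait for that one job...
second_status=$?             # ...and pick up its exit code

wait                         # then wait for anything still running
echo "job $first_pid finished; job $second_pid exited with $second_status"
```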

    Testing and Control Structures

    Commands that complete successfully return a value of 0, which leaves all the other values free to identify different errors (thus TRUE is equal to 0 and FALSE is everything else). The last command's return value is stored in the "?" variable ($?).

    The test command sets its return code based on an expression that can return information about a file or compare strings. The test command itself can be accessed by the "[" "]" pair. (For the historically minded: The original Bourne shell didn't have the square bracket shortcuts and the test program had to be called directly. That is why if you want to get a list of all that test can test you have to run man test.)
    A call to test would look something like:

       [ -r /tmp/output ]
    
    This would read: If the file /tmp/output is readable, return a TRUE exit status.
    Note: When testing the contents of variables, always put them in double quotes. This avoids the problem of:
       VAR=""
       [ $VAR = junk ]
    
    The shell sees:
       [ = junk ]
    
    If $VAR is quoted ([ "$VAR" = junk ]), test receives the empty string as an argument and the comparison works as intended.
    Check the manual page for test for all the different options that it takes. Note: some test options are platform dependent, so keep tests simple for portability's sake; for example, use -r (readable) instead of -e (exists), which does not work with HP-UX's test.
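The quoting problem can be seen directly by comparing return codes. In this sketch, the unquoted form hands test the broken expression "[ = junk ]", which it reports as an error (silenced here with 2>/dev/null) with a status above 1; the quoted form is an ordinary FALSE.

```shell
#!/bin/sh
VAR=""

# Unquoted: the empty variable vanishes and test sees "[ = junk ]",
# a syntax error, so the status is greater than 1.
[ $VAR = junk ] 2>/dev/null
unquoted=$?

# Quoted: test sees an empty string and simply answers FALSE (1).
[ "$VAR" = junk ]
quoted=$?

echo "unquoted: $unquoted, quoted: $quoted"
```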

    if/then/elif/else/fi

        if expression
        then
          ...
        elif expression
        then
          ...
        else
          ...
        fi
    
    The classic if statement. If expression returns TRUE (exit status 0), the then block runs; otherwise each elif expression is tried in turn, and the else block runs if none of them is TRUE. The expression can be a call to test or any other program.
        if [ ! -d tempdir ]   # create a temporary directory
        then                  # if it doesn't already exist
          mkdir tempdir
        fi
    

    Lazy Evaluation

    Lazy evaluation takes advantage of the fact that if A and B must be true in order for something to happen and A is not true, there isn't any point in evaluating B (the converse for "or" also applies, it just flips the logic). It can shorten simple tests in scripts.
    For example, this:
            if [ -r somefile ]
            then                           
              cat somefile
            fi
    
    Will run identically to:
           [ -r somefile ] && cat somefile
    
    For "or" the logic inverts. If the first command is not TRUE, execute the second:
          [ "$RETURNCODE" -eq 0 ] || echo "command failed" 
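A runnable sketch of both forms, using a scratch directory from mktemp; somefile matches the name used above, while missing is a file that is deliberately never created.

```shell
#!/bin/sh
dir=`mktemp -d`
echo hello >"$dir/somefile"

# &&: the right side runs only if the test on the left is TRUE.
contents=""
[ -r "$dir/somefile" ] && contents=`cat "$dir/somefile"`

# ||: the right side runs only if the left side is FALSE.
message=""
[ -r "$dir/missing" ] || message="missing is not readable"

echo "$contents"
echo "$message"
rm -rf "$dir"
```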
    

    case/esac

    A case statement allows simplification of lots of nested if/then blocks. It takes the form of:
        case value in
          pattern1)
            ...
            ;;
          pattern2)
            ...
            ;;
        esac
    
    The patterns are matched against the value using the same expansion rules that would be used for filename globbing.
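A concrete case statement, wrapped in a hypothetical filetype function so it can be called with different values. It also shows two features of the patterns: | separates alternatives that share a branch, and a final *) catches anything not matched earlier.

```shell
#!/bin/sh
# Hypothetical helper: classify a filename by its pattern.
filetype() {
    case "$1" in
      *.c|*.h)
        echo "C source"
        ;;
      *.sh)
        echo "shell script"
        ;;
      [Mm]akefile)
        echo "makefile"
        ;;
      *)
        echo "unknown"        # default branch: * matches anything
        ;;
    esac
}

filetype main.c
filetype Makefile
filetype notes.txt
```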

    Functions

    Functions package small, frequently called portions of a script. From the outside, a function looks like another program: you call it by name with arguments, and it returns an exit status. From the inside, it sees its own arguments as $1, $2, ... just as a separate script would, but it runs in the same shell, so all the variables of the parent script are available to the function for reading and writing.
    The return (exit) value is passed back with a call to return.
    Functions look like this (ksh and bash also accept the form function somefunc { ... }, but the plain Bourne shell does not):
            somefunc() {
              echo "hello $1"
              return 0
            }
    
    And are called like:
            somefunc "world" 
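This sketch shows both halves of that description: a hypothetical greet function uses its own $1, changes a variable owned by the calling script, and reports failure through return. It uses expr with back quotes for the arithmetic, in keeping with Bourne-style syntax.

```shell
#!/bin/sh
count=0

# Hypothetical example function: greets its first argument and updates
# count, a variable that belongs to the calling script.
greet() {
    if [ -z "$1" ]
    then
        return 1              # no name given: report failure
    fi
    echo "hello $1"
    count=`expr $count + 1`   # functions can change the script's variables
    return 0
}

greet "world"
greet ""
status=$?

echo "count=$count status=$status"
```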
    

    Looping

    for/do/done

    For loops iterate over a list of values and execute a block for each entry in the list. They look like
            for variable in list
            do
              ...
            done
    
    list can be the output from a program or a globbed list. For example:
            for filename in *.c
            do
              cp "$filename" "$filename.backup"
            done
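Both kinds of list in one sketch: a glob over .c files in a mktemp scratch directory, and a list produced by a program through back quotes. The second loop assumes the seq command from GNU coreutils, which is normally present on Linux.

```shell
#!/bin/sh
dir=`mktemp -d`
cd "$dir" || exit 1
touch one.c two.c

# A globbed list: copy every .c file (quotes keep odd names intact).
for filename in *.c
do
  cp "$filename" "$filename.backup"
done
backups=`echo *.backup`

# A list produced by a program: the output of seq 1 4.
total=0
for n in `seq 1 4`
do
  total=`expr $total + $n`
done

echo "$backups"
echo "total=$total"
cd / && rm -rf "$dir"
```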
    

    while/do/done

    While loops look a lot like for loops, but instead of walking through a list they repeat for as long as expression returns TRUE, stopping the first time it returns FALSE. It looks like:
          while expression
          do
            ...
          done
    
    For example:
            while [ ! -r "STABLE" ]
            do
              echo "waiting on STABLE flag"
              sleep 60
            done 
    

    Common Commands

    ls

    ls will LiSt the contents of a directory. Since most people learn about this command fairly quickly, I'll focus on the more useful flags:
    -S    sort by size (GNU only)
    -t    sort by time last modified
    -r    reverse sort
    -a    all files, including those whose names begin with a dot
    -d    list a directory itself rather than its contents
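A quick sketch of -t, -r, -a and -d on a scratch directory. The file names are invented, and touch -t backdates old.txt so the time sort has an unambiguous oldest entry.

```shell
#!/bin/sh
dir=`mktemp -d`
cd "$dir" || exit 1
touch -t 200001010000 old.txt   # modification time set to Jan 1, 2000
touch new.txt .hidden           # a normal file and a dot file, both new
mkdir sub

oldest=`ls -t | tail -n 1`      # -t lists newest first, so oldest is last
first=`ls -tr | head -n 1`      # -r reverses that: oldest comes first
plain=`ls | grep -c hidden`     # plain ls skips names starting with "."
all=`ls -a | grep -c hidden`    # -a shows them
self=`ls -d sub`                # -d names the directory, not its contents

echo "$oldest $first $plain $all $self"
cd / && rm -rf "$dir"
```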

    cut

    cut will cut lines of text by column or by delimiter.
    -c10-24    would output columns 10 through 24
    -d: -f1,3    would output the first and third fields, using a colon as the delimiter; given
       this:that:the other
    
    it returns
       this:the other
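The same two options as a runnable sketch, feeding the example line through echo. Both fields and character positions are counted from 1.

```shell
#!/bin/sh
line='this:that:the other'

# -f1,3 keeps the first and third colon-separated fields.
fields=`echo "$line" | cut -d: -f1,3`

# -c1-4 keeps character positions 1 through 4.
chars=`echo "$line" | cut -c1-4`

echo "$fields"    # this:the other
echo "$chars"     # this
```

On a real system, cut -d: -f1 /etc/passwd is a handy way to list login names, since the user name is the first colon-separated field of each line.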
    

    sed

    Stream EDitor. It runs ed-style commands on its input. The most common use is search and replace on the fly; given
        the first line
    
    sed 's/first/second/g' returns
        the second line
    
    (the trailing g replaces every match on a line rather than just the first).
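The effect of the trailing g is easiest to see on a line with two matches. A minimal sketch:

```shell
#!/bin/sh
# Without g only the first match on the line is replaced...
once=`echo "first and first" | sed 's/first/second/'`

# ...with g every match on the line is replaced.
every=`echo "first and first" | sed 's/first/second/g'`

echo "$once"      # second and first
echo "$every"     # second and second
```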
    

    tr

    character TRanslator. tr translates one set of characters into another and can also squeeze out repeated characters. It reads only from standard input. Given the input
        lower case
    
    tr a-z A-Z returns
        LOWER CASE
    
    The -s switch squeezes each run of a repeated character down to one, so
       this that  the  other
    
    piped through tr -s ' ' returns
       this that the other
    
    (useful to preprocess a tabular report into something that cut can work on). The -d switch deletes characters, which is very useful together with the -c (complement) switch to keep only a given set of characters:
    wc -l somefile|tr -cd "0-9"   # gives the number of lines
                                  # w/ no other chars
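The tr examples above, collected into one runnable sketch. The space-aligned report line is invented to show the tr -s then cut combination.

```shell
#!/bin/sh
upper=`echo "lower case" | tr a-z A-Z`               # LOWER CASE
squeezed=`echo "this that  the  other" | tr -s ' '`  # this that the other

# Squeeze the padding out of a space-aligned line so cut can split it
# on single spaces.
report='root      0     /root'
second=`echo "$report" | tr -s ' ' | cut -d' ' -f2`  # 0

# -c complements the set and -d deletes it: everything except digits
# goes, including the trailing newline.
count=`echo "  42 lines" | tr -cd '0-9'`             # 42

echo "$upper / $squeezed / $second / $count"
```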
    

    sort

    sort sorts lines of text from files or standard input. The most common way to use it is sort -u, which suppresses duplicated lines after the sort. -r reverses the sort, and -n compares the lines as numbers rather than as strings (so 10 sorts after 9 instead of before 2).
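A sketch comparing the flags on the same four lines. The helper name sorted is hypothetical; it prints the lines 10, 9, 2, 9, sorts them with whatever flags it is given, and lets xargs (which runs echo by default) join the result onto one line.

```shell
#!/bin/sh
# Hypothetical helper: sort a fixed set of lines with the given flags.
sorted() {
    printf '%s\n' 10 9 2 9 | sort "$@" | xargs
}

sorted           # as strings:        10 2 9 9
sorted -n        # as numbers:        2 9 9 10
sorted -n -u     # numbers, no dups:  2 9 10
sorted -n -r     # numbers, reversed: 10 9 9 2
```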

    find

    Find recursively descends into filesystems and (in the simplest form) prints filenames based on certain criteria.
    find . -type f -print
    
    will print the names of all files below the current directory
    find . -newer /etc/lastbackup -type d -print
    
    will print the names of all directories that have had files added or deleted since the file /etc/lastbackup was last modified.
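Both find commands can be tried on a throwaway tree. In this sketch, a file called stamp plays the part of /etc/lastbackup, and touch -t backdates both the stamp and one directory that never changes afterwards, so only the other directories count as newer.

```shell
#!/bin/sh
top=`mktemp -d`
mkdir -p "$top/src/old" "$top/src/new"
touch "$top/src/a.c" "$top/src/new/b.c"
touch -t 200001010000 "$top/src/old"    # nothing changed here since 2000
touch -t 200001010000 "$top/stamp"      # stands in for /etc/lastbackup

# Every plain file below $top: a.c, b.c and stamp.
nfiles=`find "$top" -type f -print | wc -l | tr -d ' '`

# Directories modified after the stamp: $top, src and src/new,
# but not src/old.
changed=`find "$top" -newer "$top/stamp" -type d -print`

echo "$nfiles files"
echo "$changed"
rm -rf "$top"
```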

    xargs

    xargs builds up command lines from standard input. Given a command to run, it executes that command with as many arguments taken from its input as the OS will allow.
    -n num    limits the number of arguments passed to each invocation of the command
    find . -type f -print|xargs grep "your keys"
    
    would search all files below and in the current directory for the string "your keys"
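A small sketch of the same pipeline on two invented files, plus the effect of -n. It adds grep's -l flag, which prints only the names of matching files, to keep the result short.

```shell
#!/bin/sh
dir=`mktemp -d`
echo "your keys are on the hook" >"$dir/notes.txt"
echo "nothing to see here"       >"$dir/other.txt"

# find supplies the file names, xargs hands them all to one grep.
hits=`find "$dir" -type f -print | xargs grep -l "your keys"`

# -n 1 runs the command once per argument instead of once for all.
runs=`echo a b c | xargs -n 1 echo | wc -l | tr -d ' '`

echo "$hits"
echo "$runs"
rm -rf "$dir"
```

One caveat: xargs splits its input on blanks, so file names containing spaces will be broken apart. GNU find and xargs provide -print0 and -0 to handle such names.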

    Running Jobs at Certain Times

    cron

    The cron daemon (crond) wakes up once a minute and runs any jobs that are due, including those scheduled with crontab (on many Linux systems, jobs queued with at are handled by a separate daemon, atd). It normally handles all the recurring jobs that maintain the system. It can also be a huge security hole (there was a notable problem with vixie-cron in the Red Hat 5 series). Because of the problems that its use can cause, the cron system has a built-in way of restricting who may use it: the allow and deny files stored in /etc (cron.allow and cron.deny).

    crontab

    Each user (that is allowed to) has a crontab file that is read/written with the crontab command. Crontabs are used for jobs that need to be run at regular recurring points in time. The crontab file has this structure:
               minute hour month-day month weekday job
    
    So, to fetch mail using a script called /usr/local/bin/getmymail every minute during business hours:
      * 7-17 * * 1-5 /usr/local/bin/getmymail
    
    Read as: every minute from 7:00am through 5:59pm (hours 7 to 17), Monday (day 1) to Friday (day 5), run the job /usr/local/bin/getmymail.

    Use crontab -l to get the contents of your crontab entries. It is a very good idea to keep a master copy that you can edit and reload.

    A possible edit session would be to save the current entries:
         crontab -l >mycrontabfile
    
    edit mycrontabfile with your editor of choice, then load it back in:
         crontab mycrontabfile
    
    Jobs run from cron get only a minimal environment, not the variables from your login session. Any scripts that are run should set up the variables they need (an expanded $PATH, for example) and assume nothing about the environment they will be running in. Any output generated will be mailed to the user.
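A sketch of how the top of a cron job might set up its own environment. The PATH value and the choice of /tmp as working directory are examples only; list the directories and settings your own job actually needs.

```shell
#!/bin/sh
# Cron supplies only a minimal environment, so set what the job needs
# instead of relying on your login settings.
PATH=/usr/local/bin:/usr/bin:/bin
export PATH

# Don't rely on the starting directory either.
cd /tmp || exit 1

# ...the real work would go here; anything it prints is mailed to you.
echo "running from `pwd` with PATH=$PATH"
```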

    at

    Runs a job at a specific time. It differs from crontab in that it runs the job only once, and that the environment of the shell that called it (its exported variables and current directory) is carried through to the job. The commands to run are read from standard input, and any output is mailed to the user.

    At allows easy setting of the time that the script is to be run:
        echo "myscript"|at now + 5 hours
    
    would run myscript 5 hours from now.
        echo "someotherscript"|at 5 pm Tuesday 
    
    would run someotherscript at the next 5pm on a Tuesday. Be certain to double check the date that at reports when the job is scheduled so that it is what you expected.

    At will also run jobs by date:
          echo "were you fooled?"|at 5 pm april 1
    
    at -l will list all pending jobs.
    atrm (or at -r on some systems) will remove a numbered job.

    batch

    Similar to at now, but holds the job until the system load average falls below a limit (0.8 in many versions) and runs it at low priority. Play nice.
    Monty Stein Mar 24, 2000