Goin' grep'in

Introduction

grep is one of the most useful tools with one of the strangest names. GREP stands for "Globally search for the Regular Expression and Print". It can search for plain text strings as well as for patterns written in a small language called regular expressions.

Warning: The more esoteric operators are not interpreted the same way by all implementations of regular expression engines. Test your patterns when in doubt.


This is the text used for the examples, stored in a file named demotext (from H.G. Wells' The War of the Worlds):
    some of the mental habits of those departed days.  At most
    terrestrial men fancied there might be other men upon Mars,
    perhaps inferior to themselves and ready to welcome a mis-
    sionary enterprise.  Yet across the gulf of space, minds that
    are to our minds as ours are to those of the beasts that perish,
    intellects vast and cool and unsympathetic, regarded this
    earth with envious eyes, and slowly and surely drew their
    plans against us.  And early in the twentieth century came
    the great disillusionment.

Basic string searching

Any character matches itself unless it is a metacharacter. Thus
    grep men demotext
returns:
    some of the mental habits of those departed days.  At most
    terrestrial men fancied there might be other men upon Mars,
    the great disillusionment.
The string "men" anywhere in the line (even in the middle of a word) will cause a match and the line to be output by grep.
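
To follow along, the sample file can be recreated and the search above repeated in a short shell session. This is only a sketch: it assumes a POSIX-style shell with mktemp available, and it assumes the nine lines are stored without the indentation used for display above. The scratch directory and the variable name match_count are illustrative and not part of the original example.

```shell
# Work in a scratch directory so no existing file is overwritten (illustrative choice).
cd "$(mktemp -d)"

# Store the nine sample lines as demotext; the quoted 'EOF' keeps the text literal.
cat > demotext <<'EOF'
some of the mental habits of those departed days.  At most
terrestrial men fancied there might be other men upon Mars,
perhaps inferior to themselves and ready to welcome a mis-
sionary enterprise.  Yet across the gulf of space, minds that
are to our minds as ours are to those of the beasts that perish,
intellects vast and cool and unsympathetic, regarded this
earth with envious eyes, and slowly and surely drew their
plans against us.  And early in the twentieth century came
the great disillusionment.
EOF

# Print every line that contains the string "men" anywhere.
grep men demotext

# -c prints the number of matching lines instead of the lines themselves.
match_count=$(grep -c men demotext)
echo "$match_count"
```

The count confirms that "mental" and "disillusionment" match just as the standalone word "men" does.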


Metacharacters

Metacharacters are reserved characters that mean more than a normal character would:
\ metacharacter quote
^ start of line
. any character
$ end of line
| infix operator (think: "or")
[] character range; within the brackets:
    ^ not (to match a caret, it cannot be first in the range)
    - range (to match a dash, it should be last in the range)
    . matches a dot
\< \> word boundaries (start and end)
( ) grouping
\w alphanumeric characters
\W non-alphanumeric characters
\b word boundary
\B non-word boundary
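
The anchors and word boundaries from the list can be tried on a few lines of the sample text. This is a minimal sketch, assuming a grep that supports \< and \> (GNU grep does; other implementations may not). The variable names are made up for the illustration.

```shell
# Three lines borrowed from the sample text, held in a variable for illustration.
sample='terrestrial men fancied there might be other men upon Mars,
plans against us.  And early in the twentieth century came
the great disillusionment.'

# ^ anchors the pattern to the start of a line: only the last line starts with "the".
starts=$(printf '%s\n' "$sample" | grep -c '^the')

# $ anchors the pattern to the end of a line: only the second line ends in "came".
ends=$(printf '%s\n' "$sample" | grep -c 'came$')

# Unanchored, "men" also matches inside "disillusionment".
anywhere=$(printf '%s\n' "$sample" | grep -c 'men')

# \< and \> restrict "men" to a whole word, so "disillusionment" no longer counts.
whole_word=$(printf '%s\n' "$sample" | grep -c '\<men\>')

echo "$starts $ends $anywhere $whole_word"
```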

To search for a metacharacter as a normal character just prefix it with a backslash.

    grep . demotext
returns 9 lines (since any character will match), but
    grep "\." demotext
returns 4 lines. The quotes are needed since the backslash is also a special character for the shell (where it serves the same purpose).

A character range matches any one of the characters listed between the brackets. Thus
    grep "in[tf]" demotext
returns:
    perhaps inferior to themselves and ready to welcome a mis-
    intellects vast and cool and unsympathetic, regarded this
This is the same as writing
    egrep "int|inf" demotext
(more on the different flavors of grep later).
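
The effect of quoting a metacharacter and of a character range can be checked side by side on a three-line excerpt of the sample text. This is a sketch under a few assumptions: single quotes are used instead of double quotes (either protects the backslash from the shell), grep -E stands in for egrep, and the variable names are illustrative.

```shell
# Three lines borrowed from the sample text.
sample='perhaps inferior to themselves and ready to welcome a mis-
sionary enterprise.  Yet across the gulf of space, minds that
intellects vast and cool and unsympathetic, regarded this'

# An unquoted dot matches any character, so every non-empty line matches.
any_char=$(printf '%s\n' "$sample" | grep -c '.')

# A backslash-quoted dot matches only a literal period.
literal_dot=$(printf '%s\n' "$sample" | grep -c '\.')

# [tf] accepts either "t" or "f" after "in" ...
range=$(printf '%s\n' "$sample" | grep -c 'in[tf]')

# ... which is the same as spelling out both alternatives.
alternation=$(printf '%s\n' "$sample" | grep -cE 'int|inf')

# A leading ^ inside the brackets inverts the range: "in" followed by anything but t or f.
negated=$(printf '%s\n' "$sample" | grep -c 'in[^tf]')

echo "$any_char $literal_dot $range $alternation $negated"
```

Only "minds" satisfies the negated range, since the other two occurrences of "in" are followed by "f" and "t".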

Grouping is handled by parentheses. Groups are numbered from left to right, and \number refers back to the text matched by that group. Thus

     egrep "(.)(.)\2\1" demotext 
returns all the lines containing two pairs of characters where the second pair is the reverse of the first ("erre" in "terrestrial" and "elle" in "intellects"):
     terrestrial men fancied there might be other men upon Mars,
     intellects vast and cool and unsympathetic, regarded this
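
To see exactly which characters the back-references matched, the matched text alone can be printed. This sketch assumes a grep with the -o flag (print only the matched part) and with back-references in extended patterns; both are available in GNU grep but are not guaranteed everywhere. grep -E stands in for egrep, and the variable names are illustrative.

```shell
# Two matching lines and one non-matching line from the sample text.
sample='terrestrial men fancied there might be other men upon Mars,
intellects vast and cool and unsympathetic, regarded this
the great disillusionment.'

# -o prints each matched substring on its own line instead of the whole line.
mirrored=$(printf '%s\n' "$sample" | grep -oE '(.)(.)\2\1')

echo "$mirrored"
```

The third line contains "ll" but not a mirrored pair around it ("illu"), so it produces no output.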

Pattern counting

The real power of regular expressions lies in the operators that limit or extend the number of occurrences that will make a pattern match. The operators for that are:
? match once at most
* match zero or more times
+ match one or more times
{n} match exactly n times
{n,} match n or more times
{,m} match at most m times
{n,m} match at least n times, but not more than m times
Thus:
     egrep "\<[0-9]+\>"
would match any run of digits that stands as a word on its own.
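
The counting operators are easiest to compare on a tiny made-up input. The sketch below is not from the original text: the helper name repeat_demo, the test words, and the sample sentence are all hypothetical. It assumes grep -E (standing in for egrep), the -x flag (match the whole line), and, for the last command, GNU-style -o and \< \> support.

```shell
# Hypothetical helper: list which of the words b, ab, aab, aaab a pattern matches in full.
repeat_demo() {
    printf 'b\nab\naab\naaab\n' | grep -xE "$1" | paste -s -d ' ' -
}

repeat_demo 'a?b'      # zero or one "a"
repeat_demo 'a*b'      # zero or more
repeat_demo 'a+b'      # one or more
repeat_demo 'a{2}b'    # exactly two
repeat_demo 'a{2,}b'   # two or more
repeat_demo 'a{1,2}b'  # one or two

# The number pattern from above: "15" in "x15" is skipped because it is not a word on its own.
numbers=$(printf '%s\n' 'room 42 has 7 chairs and 1 model x15 desk' | grep -oE '\<[0-9]+\>')
echo "$numbers"
```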

grep flavors

There are three flavors of grep (not counting the regular expression engines in C, Perl, awk, etc.).
fgrep
matches only strings and does no metacharacter operations at all. It is often faster than grep or egrep. Also available as grep -F.
grep
handles "basic" expressions, where * is recognized but ?, +, { }, | and ( ) are ordinary characters unless written with a leading backslash (support for the backslashed forms varies between implementations).
egrep
handles the full range of metacharacters and operators. Also available as grep -E.
There is also zgrep which is a wrapper that allows grep to directly read compressed files.
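
The difference between the flavors shows up when the same pattern is given to each one. This sketch uses grep -F and grep -E, the flag spellings of fgrep and egrep, on a three-line excerpt of the sample text; the variable names are illustrative. The "|| true" only keeps the shell from treating grep's "no match" exit status as an error.

```shell
# Three lines borrowed from the sample text.
sample='perhaps inferior to themselves and ready to welcome a mis-
sionary enterprise.  Yet across the gulf of space, minds that
intellects vast and cool and unsympathetic, regarded this'

# Fixed strings: the dot is an ordinary character, so only lines with a period match.
fixed_dot=$(printf '%s\n' "$sample" | grep -cF '.')

# Basic expressions: the dot matches any character, so every line matches.
basic_dot=$(printf '%s\n' "$sample" | grep -c '.')

# Basic expressions: an unescaped | is a literal character, and no line contains "int|inf".
basic_alt=$(printf '%s\n' "$sample" | grep -c 'int|inf' || true)

# Extended expressions: | means "or", so "inferior" and "intellects" both match.
extended_alt=$(printf '%s\n' "$sample" | grep -cE 'int|inf')

echo "$fixed_dot $basic_dot $basic_alt $extended_alt"
```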

Flags

There are a ton of flags that grep uses. The more common flags are:
-c count lines that match
-i ignore case
-l show filenames that have matches
-n prefix matched lines with a line number
-v invert, show lines that don't match
-- turn off flag processing, needed for patterns with a leading dash
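
Each of these flags can be tried on a small file. The sketch below assumes mktemp is available and writes three lines of the sample text to a file named excerpt; the file name and the variable names are made up for the illustration.

```shell
# Work in a scratch directory so no existing file is overwritten (illustrative choice).
cd "$(mktemp -d)"

cat > excerpt <<'EOF'
terrestrial men fancied there might be other men upon Mars,
perhaps inferior to themselves and ready to welcome a mis-
the great disillusionment.
EOF

count=$(grep -c men excerpt)          # number of matching lines
nocase=$(grep -c -i mars excerpt)     # "mars" matches "Mars" when case is ignored
files=$(grep -l men excerpt)          # name of each file with at least one match
numbered=$(grep -n inferior excerpt)  # matching line prefixed with its line number
inverted=$(grep -v men excerpt)       # lines that do NOT contain "men"
dashed=$(grep -c -- '-$' excerpt)     # -- lets the pattern itself start with a dash

printf '%s\n' "$count" "$nocase" "$files" "$numbered" "$inverted" "$dashed"
```

Without the "--", grep would try to read the pattern '-$' as a flag and fail with a usage error.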

Monty Stein - Sept 23, 1999