CSCI 431: Regular Expressions Using Egrep
- While context-free grammars can be used to describe the tokens (i.e.,
symbols) of a programming language (do you remember why?), regular
grammars offer a more convenient notation for describing their simple
structure.
- Regular grammars and regular expressions are equivalent. In this
handout we focus on regular expressions.
- Regular expressions are formed using the alphabet of the language
plus the following symbols:
- "·" to concatenate items (juxtaposition is often used for the same purpose),
- "|" to separate alternatives (often "+" is used for the same purpose),
- "*" to indicate that the previous item may be repeated zero or
more times, and
- "(" and ")" for grouping.
Regular Expressions and Egrep
- The UNIX egrep command uses regular expressions to search for text.
- Why look at egrep? Egrep is not only one of the most useful
commands; mastery of egrep also opens the gates to
mastery of other tools such as awk and sed.
- egrep searches text line by line. More precisely,
egrep foo file prints every line of the file "file" that contains a string matching the expression "foo".
- We can think of an expression as a string. So egrep
returns all matching lines that contain foo as a substring.
- Another way of using egrep is to have it accept data from
STDIN instead of having it search a file. For example,
ls | egrep blah lists all files in the current
directory whose names contain the string "blah".
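The two ways of calling egrep described above can be sketched as follows. The sample lines and the temporary file are made up for illustration. The sketch uses grep -E, the POSIX spelling of egrep, because some recent systems print a warning when the egrep name is used.

```shell
# Hypothetical sample data written to a temporary file.
file=$(mktemp)
printf '%s\n' 'some food here' 'nothing to see' 'a foolish idea' > "$file"

# Search a named file for lines containing the substring "foo".
file_matches=$(grep -E foo "$file")

# Search standard input instead: the same lines arrive through a pipe.
stdin_matches=$(cat "$file" | grep -E foo)

printf '%s\n' "$file_matches"
# prints:
#   some food here
#   a foolish idea

rm -f "$file"
```

Both variables end up holding the same two lines, since only the source of the text differs.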
The Basics: Wildcards for egrep
- Egrep uses regular expressions which, as we know,
go beyond wildcards, but we'll start with wildcards.
- The canonical wildcard character is the dot "." Here is an example:
>egrep 'b.g' file
- Notice that boogy didn't match, since the "." matches exactly one character.
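As a rough illustration of the dot, the sketch below runs the same expression over a made-up word list (it is not the handout's own example file) and uses grep -E as the POSIX spelling of egrep.

```shell
# Hypothetical word list, one word per line.
words=$(printf '%s\n' big bug boogy bg bag)

# "." stands for exactly one character, so b.g needs b, any one character, g.
dot_matches=$(printf '%s\n' "$words" | grep -E 'b.g')
printf '%s\n' "$dot_matches"
# prints:
#   big
#   bug
#   bag
```

Neither boogy (two characters between b and g) nor bg (none) is printed.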
- To match arbitrary strings, use the star (the Kleene star), which
works in the following way:
the expression consisting of a character followed by a star matches any number
(possibly zero) of repetitions of that character. In particular, .* matches
any string, and hence acts as a "wildcard".
The File for These Examples
>egrep 'b.*g' file
>egrep 'b.*g.' file
>egrep 'ggg*' file
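To see what these three expressions select, here is a sketch that runs them over a made-up word list. The words are an assumption for illustration and are not the handout's example file. grep -E is used as the POSIX spelling of egrep.

```shell
# Hypothetical word list, one word per line.
words=$(printf '%s\n' big boogy bg egg baggage gag)

# b, then any string (possibly empty), then g.
star_matches=$(printf '%s\n' "$words" | grep -E 'b.*g')
# prints: big, boogy, bg, baggage

# As above, but at least one more character must follow the g.
star_dot_matches=$(printf '%s\n' "$words" | grep -E 'b.*g.')
# prints: boogy, baggage

# Two g's followed by zero or more further g's, i.e. at least two in a row.
ggg_matches=$(printf '%s\n' "$words" | grep -E 'ggg*')
# prints: egg, baggage

printf '%s\n' "$star_matches" "$star_dot_matches" "$ggg_matches"
```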
The "escape" character
- Now we move on to grouping expressions, in order to find a way of making an
expression that matches either "Fred Smith" or "Frederic Smith".
- We start with the ? operator.
An expression consisting of a character followed by a question mark
matches zero or one instance of that character.
matches all of the following:
- Now how to group expressions. In our example, we want to make
the whole string "ederic" following "Fred" optional, not just one
character.
An expression surrounded by parentheses is treated as a single character.
'Fred(eric)? Smith' matches
Fred Smith or
Frederic Smith.
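The difference between applying ? to a group and to a single character can be sketched as follows. The names in the list are invented test data, and grep -E is used as the POSIX spelling of egrep.

```shell
# Hypothetical list of names, one per line.
names=$(printf '%s\n' 'Fred Smith' 'Frederic Smith' 'Frederi Smith' 'Frederick Smith')

# With parentheses, ? makes the whole group "eric" optional.
grouped=$(printf '%s\n' "$names" | grep -E 'Fred(eric)? Smith')
printf '%s\n' "$grouped"
# prints:
#   Fred Smith
#   Frederic Smith

# Without parentheses, ? applies only to the single character "c".
ungrouped=$(printf '%s\n' "$names" | grep -E 'Frederic? Smith')
printf '%s\n' "$ungrouped"
# prints:
#   Frederic Smith
#   Frederi Smith
```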
- Note that we have to be careful when our expressions contain white space.
When this happens, we need to enclose them in quotes so that the shell does not
misinterpret the command. So to use our example above, we would need to type
egrep 'Fred(eric)? Smith' file
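The effect of the quotes can be made visible with a small helper that counts the arguments the shell hands over. count_args is a hypothetical function written for this sketch. The unquoted call uses plain words only, because unquoted parentheses would be a shell syntax error.

```shell
# Hypothetical helper: reports how many arguments the shell passed to it.
count_args() { echo "$#"; }

unquoted=$(count_args Fred Smith)          # the space splits this into 2 arguments
quoted=$(count_args 'Fred(eric)? Smith')   # the quotes keep it as 1 argument

echo "$unquoted $quoted"
# prints: 2 1
```

With quotes, egrep receives the whole expression as its first argument. Without them it would treat the second word as a file name.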
Other useful operators
- To match a selection of characters, use [ ].
'[Hh]ello' matches lines containing
hello or Hello.
- Ranges of characters are also permitted.
[0-3] is the same as [0123]
[a-k] is the same as [abcdefghijk]
[A-C] is the same as [ABC]
[A-Ca-k] is the same as [ABCabcdefghijk]
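A short sketch of bracket lists and ranges, using invented sample lines and grep -E as the POSIX spelling of egrep. Setting LC_ALL=C for the range is an assumption made here so that the range covers exactly the characters 0, 1, 2 and 3 regardless of the local language settings.

```shell
# Hypothetical sample lines.
lines=$(printf '%s\n' Hello hello jello 'room 2' 'room 7')

# A bracket list matches any one of the listed characters.
set_matches=$(printf '%s\n' "$lines" | grep -E '[Hh]ello')
# prints: Hello, hello

# A range is shorthand for listing every character in between.
range_matches=$(printf '%s\n' "$lines" | LC_ALL=C grep -E 'room [0-3]')
list_matches=$(printf '%s\n' "$lines" | grep -E 'room [0123]')
# both print: room 2

printf '%s\n' "$set_matches" "$range_matches" "$list_matches"
```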
- The [ ] may be used to search for non-matches. This is done by
putting a caret ^ as the first character inside the square brackets.
egrep '[^aeiou]' file returns any line containing a character other
than a lowercase vowel (a consonant, a digit, a space, ...), which is
not very useful.
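The sketch below shows how much a negated list lets through. The sample lines are made up, and grep -E is used as the POSIX spelling of egrep.

```shell
# Hypothetical sample lines.
lines=$(printf '%s\n' aeiou sky 'a e' AEIOU)

# Any single character that is not a lowercase vowel is enough for a match.
negated=$(printf '%s\n' "$lines" | grep -E '[^aeiou]')
printf '%s\n' "$negated"
# prints:
#   sky
#   a e
#   AEIOU
```

Only the line made up entirely of lowercase vowels is rejected. The space in "a e" and the capital letters in "AEIOU" both count as non-vowels.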
The Start of the Line and End of the Line
- Suppose you want to search for lines consisting only of the word
hello, possibly surrounded by white space. Let us
start with an example.
>egrep 'hello' file
- This is not what we wanted. So what went wrong?
- The problem is that egrep searches for lines containing the string
"hello", and all the lines specified contain this. To get around this
problem, we introduce the end and beginning of line characters.
The $ character matches the end of the line. The ^ character matches the beginning of the line.
- Returning to our previous example,
egrep '^[ ]*hello[ ]*$' file
- This does what we want (only returns one line)
- Another example:
egrep '^[^aeiou]*$' file returns all lines that contain no lowercase vowels.
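The effect of the two anchors can be sketched by counting matches with and without them. The sample lines are invented and will not reproduce the handout's own counts. grep -E is used as the POSIX spelling of egrep, and -c prints the number of matching lines.

```shell
# Hypothetical sample lines.
lines=$(printf '%s\n' 'hello' '   hello' 'hello world' 'say hello')

# Unanchored: every line that contains "hello" anywhere is counted.
plain_count=$(printf '%s\n' "$lines" | grep -c -E 'hello')

# Anchored: the line may hold only spaces, hello, and spaces.
anchored_count=$(printf '%s\n' "$lines" | grep -c -E '^[ ]*hello[ ]*$')

echo "$plain_count $anchored_count"
# prints: 4 2

# Anchoring a negated list makes it apply to the whole line.
no_vowels=$(printf '%s\n' rhythm sky hello | grep -E '^[^aeiou]*$')
printf '%s\n' "$no_vowels"
# prints:
#   rhythm
#   sky
```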
Matching one of two strings
The expression consisting of two expressions separated by the or operator "|"
matches lines containing a match for either of those two expressions.
- Note that you must enclose this inside single or double quotes.
egrep 'cat|dog' file
matches lines containing the string "cat" or the string "dog".
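A sketch of the or operator on invented sample lines, both on its own and inside parentheses, using grep -E as the POSIX spelling of egrep. It also shows that the match is on substrings, not on whole words.

```shell
# Hypothetical sample lines.
lines=$(printf '%s\n' 'I am a cat' 'I am a dog' 'I am a bird' 'concatenate' 'You are a dog')

# Either alternative may appear anywhere in the line, even inside a longer word.
either=$(printf '%s\n' "$lines" | grep -E 'cat|dog')
printf '%s\n' "$either"
# prints:
#   I am a cat
#   I am a dog
#   concatenate
#   You are a dog

# Parentheses limit the alternatives to one part of a longer expression.
grouped_either=$(printf '%s\n' "$lines" | grep -E 'I am a (cat|dog)')
printf '%s\n' "$grouped_either"
# prints:
#   I am a cat
#   I am a dog
```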
egrep 'I am a (cat|dog)' matches lines containing the string
"I am a cat" or the string "I am a dog".
- In egrep, the following characters are considered special:
? + | \ . [ ] ^ $ * ( ) { }
- A closing square bracket loses its special meaning if placed first in a list.
[]12] matches ], 1, or 2.
- A caret ^ loses its special meaning if it is not placed first in a list.
- Most special characters lose their meaning inside square brackets.
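The rules above can be tried out on a few invented lines. The sketch uses grep -E as the POSIX spelling of egrep. It also shows the backslash, which is in the list of special characters and removes the special meaning of the character that follows it.

```shell
# Hypothetical sample lines.
lines=$(printf '%s\n' 'a.b' 'axb' 'a]b' '1^2')

# Unescaped, the dot matches any character: a.b, axb and a]b all match.
any_count=$(printf '%s\n' "$lines" | grep -c -E 'a.b')

# A backslash or a bracket list turns the dot into an ordinary character.
escaped=$(printf '%s\n' "$lines" | grep -E 'a\.b')
bracketed=$(printf '%s\n' "$lines" | grep -E 'a[.]b')

# A closing bracket placed first in the list is an ordinary list member.
bracket_first=$(printf '%s\n' "$lines" | grep -E '[]12]')

# A caret that is not first in the list is an ordinary list member too.
caret_later=$(printf '%s\n' "$lines" | grep -E '[x^]')

printf '%s\n' "$any_count" "$escaped" "$bracketed" "$bracket_first" "$caret_later"
```

Here any_count is 3, escaped and bracketed both hold only a.b, bracket_first holds a]b and 1^2, and caret_later holds axb and 1^2.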
- Single quotes are the safest to use, because they protect your
regular expression from the shell. For example,
egrep "!" file
will often produce an error (since the shell thinks that "!" is referring
to the shell command history) while
egrep '!' file
searches for the character "!" as intended.
- When should you use double quotes?
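- The answer is this: if you want to use shell variables, you need
double quotes. For example,
egrep "$HOME" file
searches file for the name of your home directory, while
egrep '$HOME' file
passes the characters $HOME to egrep unchanged. Because $ is itself
special to egrep, matching the literal string $HOME requires
egrep '\$HOME' file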
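The difference between the two kinds of quotes can be sketched with a made-up variable. Here name stands in for $HOME so that the result does not depend on the machine, and the sample lines are invented. grep -E is used as the POSIX spelling of egrep.

```shell
# Hypothetical variable and sample lines.
name=alice
lines=$(printf '%s\n' 'alice logged in' 'bob logged in' 'the variable is $name')

# Double quotes: the shell replaces $name with alice before grep runs.
expanded=$(printf '%s\n' "$lines" | grep -E "$name")
printf '%s\n' "$expanded"
# prints: alice logged in

# Single quotes: grep receives the pattern exactly as typed. The backslash is
# for grep, which would otherwise read $ as the end-of-line anchor.
literal=$(printf '%s\n' "$lines" | grep -E '\$name')
printf '%s\n' "$literal"
# prints: the variable is $name
```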
For another source of information on using egrep see An introduction to
UNIX by Dean Brock