Using sed and awk Together (sed & awk, Second Edition)
2.5. Using sed and awk Together
In UNIX, pipes can be used to pass the output from one program as
input to the next program. Let's look at a few examples that combine
sed and awk to produce a report. The sed script that replaced the
postal abbreviation of a state with its full name is general enough
that it might be used again as a script file named
nameState:
$ cat nameState
s/ CA/, California/
s/ MA/, Massachusetts/
s/ OK/, Oklahoma/
s/ PA/, Pennsylvania/
s/ VA/, Virginia/
Of course, you'd want to handle all states, not just five, and if you
were running it on documents other than mailing lists, you should make
sure that it does not make unwanted replacements.
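One way to reduce unwanted replacements is to anchor each pattern to the end of the line. The sketch below assumes that the state abbreviation is always the last thing on an address line, as the output in this section implies; the `note` line is a made-up example of text that should be left alone.

```shell
# Sample address line; the format is inferred from the output shown in this section.
line='Amy Wilde, 334 Bayshore Pkwy, Mountain View CA'
# Hypothetical non-address line that happens to contain " CA".
note='Call CA Jones on Monday'

# Unanchored: replaces the first " CA" found anywhere on the line.
loose=$(printf '%s\n' "$note" | sed 's/ CA/, California/')

# Anchored with $: replaces " CA" only when it ends the line.
strict_note=$(printf '%s\n' "$note" | sed 's/ CA$/, California/')
strict_line=$(printf '%s\n' "$line" | sed 's/ CA$/, California/')

printf '%s\n' "$loose" "$strict_note" "$strict_line"
```

The unanchored command rewrites the note, while the anchored one changes only the address line.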
The output for this program, using the input file
list, is the same as we have already seen. In
the next example, the output produced by nameState
is piped to an awk program that extracts the name of the state from
each record.
$ sed -f nameState list | awk -F, '{ print $4 }'
Massachusetts
Virginia
Oklahoma
Pennsylvania
Massachusetts
Virginia
California
Massachusetts
The awk program is processing the output produced by the sed script.
Remember that the sed script replaces the abbreviation with a comma
and the full name of the state. In effect, it splits the third field
containing the city and state into two fields. "$4" references the
fourth field.
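To see the extra field appear, you can ask awk for NF, the number of fields in a record, before and after the sed substitution. This is a small sketch on a single sample line whose format is inferred from the output shown in this section.

```shell
# One record in the assumed input format: name, street, city and state.
line='John Daggett, 341 King Road, Plymouth MA'

# Field count of the raw record.
before=$(printf '%s\n' "$line" | awk -F, '{ print NF }')

# Field count after sed inserts a comma before the state name.
after=$(printf '%s\n' "$line" |
    sed 's/ MA/, Massachusetts/' |
    awk -F, '{ print NF }')

echo "fields before: $before, after: $after"
```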
What we are doing here could be done completely in sed, but probably
with more difficulty and less generality. Also, since awk allows you
to replace the string you match, you could achieve this result
entirely with an awk script.
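As a rough sketch of the awk-only approach, the substitutions can be made with awk's sub() function. Only two of the five states are handled here to keep the example short, the sample records are inferred from the output shown in this section, and the end-of-line anchor is an added assumption.

```shell
# Two sample records in the assumed input format.
result=$(printf '%s\n' \
    'John Daggett, 341 King Road, Plymouth MA' \
    'Amy Wilde, 334 Bayshore Pkwy, Mountain View CA' |
awk -F, '
{
    # sub() edits the record in place; awk then re-splits it on commas.
    sub(/ MA$/, ", Massachusetts")
    sub(/ CA$/, ", California")
    print $4
}')
printf '%s\n' "$result"
```

Because sub() changes the whole record, the fields are split again afterward, so the state name is available as the fourth field just as in the sed version.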
While the result of this program is not very useful, it could
be passed to sort | uniq -c, which would sort the states
into an alphabetical list with a count of the number of occurrences
of each state.
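A minimal sketch of that counting step follows. The state names are typed in directly, in the order shown above, standing in for the output of the sed and awk pipeline.

```shell
# sort groups identical lines together; uniq -c prefixes each distinct
# line with the number of times it occurred.
counts=$(printf '%s\n' Massachusetts Virginia Oklahoma Pennsylvania \
    Massachusetts Virginia California Massachusetts |
sort | uniq -c)
printf '%s\n' "$counts"
```

Massachusetts is counted three times, Virginia twice, and the remaining states once each. The exact spacing before each count varies between implementations of uniq.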
Now we are
going to do something more interesting. We want to produce a report
that sorts the names by state and lists the name of the state followed
by the name of each person residing in that state. The following
example shows the byState program.
#! /bin/sh
# byState: list each state followed by the people who live there
awk -F, '{
	print $4 ", " $0
}' "$@" |
sort |
awk -F, '
$1 == LastState {
	print "\t" $2
}
$1 != LastState {
	LastState = $1
	print $1
	print "\t" $2
}'
This shell script has three parts. The program invokes awk to produce
input for the sort program and then invokes awk
again to test the sorted input and determine if the name of the state
in the current record is the same as in the previous record. Let's see
the script in action:
$ sed -f nameState list | byState
California
	Amy Wilde
Massachusetts
	Eric Adams
	John Daggett
	Sal Carpenter
Oklahoma
	Orville Thomas
Pennsylvania
	Terry Kalkas
Virginia
	Alice Ford
	Hubert Sims
The names are sorted by state. This is a typical example of using
awk to generate a report from structured data.
To examine how the byState program works, let's
look at each part separately. It's designed to read input
from the nameState program and expects "$4" to be
the name of the state. Look at the output produced by the first
awk command in the program:
$ sed -f nameState list | awk -F, '{ print $4 ", " $0 }'
Massachusetts, John Daggett, 341 King Road, Plymouth, Massachusetts
Virginia, Alice Ford, 22 East Broadway, Richmond, Virginia
Oklahoma, Orville Thomas, 11345 Oak Bridge Road, Tulsa, Oklahoma
Pennsylvania, Terry Kalkas, 402 Lans Road, Beaver Falls, Pennsylvania
Massachusetts, Eric Adams, 20 Post Road, Sudbury, Massachusetts
Virginia, Hubert Sims, 328A Brook Road, Roanoke, Virginia
California, Amy Wilde, 334 Bayshore Pkwy, Mountain View, California
Massachusetts, Sal Carpenter, 73 6th Street, Boston, Massachusetts
The sort program, by default, sorts lines in
alphabetical order, looking at characters from left to right. In
order to sort records by state, and not names, we insert the state as
a sort key at the beginning of the record. Now the
sort program can do its work for us. (Notice that
using the sort utility saves us from having to
write sort routines inside awk.)
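For comparison, sort can also be told which field to use as its key, which avoids copying the state to the front of the record. This is an alternative technique, not what byState does: the sketch below uses the standard -t option to set the field separator and -k to select the fourth field, on three sample records taken from the output above.

```shell
# Sort comma-separated records on the fourth field (the state) only.
sorted=$(printf '%s\n' \
    'John Daggett, 341 King Road, Plymouth, Massachusetts' \
    'Alice Ford, 22 East Broadway, Richmond, Virginia' \
    'Amy Wilde, 334 Bayshore Pkwy, Mountain View, California' |
sort -t, -k4,4)
printf '%s\n' "$sorted"
```

The records come out in state order, but the state is not moved to the first field, so the second awk program in byState would have to be changed to test "$4" and print "$1" instead.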
The second time awk is invoked we perform a programming task. The
script looks at the first field of each record (the state) to
determine if it is the same as in the previous record. If it is not
the same, the name of the state is printed followed by the person's
name. If it is the same, then only the person's name is printed.
$1 == LastState {
	print "\t" $2
}
$1 != LastState {
	LastState = $1
	print $1
	print "\t" $2
}
There are a few significant things here, including
assigning a value to a variable, comparing the first field of each
input line with the string stored in that variable, and printing
a tab to align the output data. Note that we don't have to assign
to a variable before using it (because awk variables are initialized
to the empty string). This is a small script, but you'll see the same
kind of routine used to compare index entries in a much larger
indexing program in Chapter 12, "Full-Featured Applications". However, for now,
don't worry too much about understanding what each statement is
doing. Our point here is to give you an overview of what sed and
awk can do.
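The same grouping logic can be written more compactly, because the person's name is printed in both cases. The following is a sketch of an equivalent variant, not the version used in byState; the three input records are sample lines with the state already in the first field.

```shell
report=$(printf '%s\n' \
    'California, Amy Wilde' \
    'Massachusetts, Eric Adams' \
    'Massachusetts, John Daggett' |
awk -F, '
# New state: remember it and print it as a heading.
$1 != LastState {
    LastState = $1
    print $1
}
# Every record: print the name, indented with a tab.
{ print "\t" $2 }')
printf '%s\n' "$report"
```

The first rule fires only when the state changes, and the second rule, which has no pattern, runs for every record.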
In this chapter, we have covered the basic operations of sed and awk.
We have looked at important command-line options and introduced you to
scripting. In the next chapter, we are going to look at regular
expressions, something both programs use to match patterns in the
input.
Copyright © 2003 O'Reilly & Associates. All rights reserved.