Language Summary for awk (sed & awk, Second Edition)
B.2. Language Summary for awk
This section summarizes how awk processes input records and
describes the various syntactic elements that make up an awk program.
B.2.1. Records and Fields
Each line of input is split into fields. By default, the field
delimiter is one or more spaces and/or tabs. You can change the
field separator by using the -F command-line
option. Doing so also sets the value of FS. The
following command-line changes the field separator to a colon:
awk -F: -f awkscr /etc/passwd
You can also assign the delimiter to the system variable FS.
This is typically done in the BEGIN procedure, but the assignment
can also be passed as a parameter on the command line.
awk -f awkscr FS=: /etc/passwd
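As a quick check of both forms, the following sketch feeds one made-up passwd-style line to awk on standard input instead of reading /etc/passwd. The sample line and the shell variable names are assumptions for illustration only.

```shell
# Sample input: one passwd-style record (made up for illustration).
line='alice:x:1001:1001:Alice:/home/alice:/bin/sh'

# Set the field separator with the -F option.
with_F=$(printf '%s\n' "$line" | awk -F: '{ print $1 }')

# Set it with an FS=: assignment placed before the input file;
# "-" stands for standard input.
with_FS=$(printf '%s\n' "$line" | awk '{ print $1 }' FS=: -)

echo "$with_F $with_FS"
```

Both commands should print the first colon-delimited field.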
Each input line forms a record containing any number of fields. Each
field can be referenced by its position in the record. "$1" refers to
the value of the first field; "$2" to the second field, and so on.
"$0" refers to the entire record. The following action prints the
first field of each input line:
{ print $1 }
The default record separator is a newline. The following procedure
sets FS and RS so that awk
interprets an input record as any number of lines up to a blank line,
with each line being a separate field.
BEGIN { FS = "\n"; RS = "" }
It is important to know that when RS is set to the
empty string, newline always separates fields, in
addition to whatever value FS may have. This is
discussed in more detail in both The AWK Programming
Language and Effective AWK
Programming.
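A minimal sketch of multiline records, assuming two made-up address entries separated by a blank line. Each entry becomes one record and each of its lines becomes one field.

```shell
# Two made-up records separated by a blank line.
records=$(printf 'Ann\n1 Main St\n\nBob\n2 Oak Ave\n' |
  awk 'BEGIN { FS = "\n"; RS = "" }
       { print NR ": " $1 " has " NF " fields" }')

echo "$records"
# 1: Ann has 2 fields
# 2: Bob has 2 fields
```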
B.2.2. Format of a Script
An awk script is a set of pattern-matching rules and
actions:
pattern { action }
An action is one or more statements that will be performed on those
input lines that match the pattern. If no pattern is specified, the
action is performed for every input line. The following example uses
the print statement to print each line in the input
file:
{ print }
If only a pattern is specified, then the default action consists of
the print statement, as shown above.
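For instance, a rule consisting of a pattern alone prints every matching line. The three input lines here are invented for the example.

```shell
# Pattern with no action: the default action prints matching lines.
matched=$(printf 'apple\nbanana\ncherry\n' | awk '/an/')

echo "$matched"
# banana
```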
Function definitions can also appear:
function name (parameter list) { statements }
This syntax defines the function name, making
available the list of parameters for processing in the body of the
function. Variables specified in the parameter-list are treated as
local variables within the function. All other variables are global
and can be accessed outside the function. When calling a user-defined
function, no space is permitted between the name of the function and
the opening parenthesis. Spaces are allowed in the function's
definition. User-defined functions are described in Chapter 9, "Functions".
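The sketch below defines and calls a small function. The name max and its parameters are hypothetical, chosen only to illustrate the syntax; the extra parameter tmp is listed solely so that it is local to the function.

```shell
result=$(awk '
# max() is a hypothetical helper; tmp is an extra parameter
# that serves as a local variable.
function max(a, b,   tmp) {
    tmp = a
    if (b > a)
        tmp = b
    return tmp
}
# In the call, no space appears between "max" and "(".
BEGIN { print max(3, 7) }
')

echo "$result"
# 7
```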
B.2.2.1. Line termination
A line in an awk script is terminated by a newline or a semicolon.
Using semicolons to put multiple statements on a line, while
permitted, reduces the readability of most programs. Blank lines are
permitted between statements.
Program control statements (do,
if, for, or
while) continue on the next line, where a dependent
statement is listed. If multiple dependent statements are specified,
they must be enclosed within braces.
if (NF > 1) {
name = $1
total += $2
}
You cannot use a semicolon to avoid using braces for multiple
statements.
You can type a single statement over multiple lines by escaping the
newline with a backslash (\). You can also break
lines following any of the following characters:
, { && ||
Gawk also allows you to continue a line after either a "?" or a ":".
Strings cannot be broken across a line (except in gawk,
using "\" followed by a newline).
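A short sketch of these continuation rules, using a made-up input line with two numbers. The condition is broken after &&, and the assignment is continued with a backslash.

```shell
out=$(printf '5 10\n' | awk '
{
    # A newline may follow && without a backslash.
    if ($1 > 0 &&
        $2 > 0) {
        # A backslash lets one statement span two lines.
        total = $1 + \
                $2
    }
    print total
}')

echo "$out"
# 15
```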
B.2.2.2. Comments
A comment begins with a "#" and ends with a newline. It can appear on
a line by itself or at the end of a line. Comments are descriptive
remarks that explain the operation of the script. Comments cannot be
continued across lines by ending them with a backslash.
B.2.3. Patterns
A pattern can be any of the following:
/regular expression/
relational expression
BEGIN
END
pattern, pattern
Regular expressions use the extended set of metacharacters and must be
enclosed in slashes. For a full discussion of regular expressions,
see Chapter 3, "Understanding Regular Expression Syntax".
Relational expressions use the relational operators
listed under "Expressions" later in this appendix.
The BEGIN pattern is applied before the first line
of input is read and the END pattern is applied
after the last line of input is read.
Use ! to negate the match; i.e., to handle
lines not matching the pattern.
You can address a range of lines, just as in sed:
pattern, pattern
Patterns, except BEGIN and END,
can be expressed in compound forms using the following operators:
&&
Logical And
||
Logical Or
Sun's version of nawk (SunOS 4.1.x) does not support treating regular
expressions as parts of a larger Boolean expression. E.g.,
"/cute/ && /sweet/" or "/fast/ || /quick/"
do not work.
In addition, the C conditional operator ?:
(pattern ? pattern :
pattern) may be used in a pattern.
Patterns can be placed in parentheses to ensure proper evaluation.
BEGIN and END patterns must be
associated with actions. If multiple BEGIN and
END rules are written, they are merged into a
single rule before being applied.
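The following sketch exercises a range pattern, a negated pattern, and a compound pattern against five invented input lines. It assumes an awk that accepts regular expressions inside Boolean expressions, which current implementations do.

```shell
# Invented sample input.
input=$(printf 'start\nalpha 1\nbeta 20\nend\ngamma 3\n')

# Range pattern: count the lines from "start" through "end".
in_range=$(printf '%s\n' "$input" | awk '/start/, /end/ { n++ } END { print n }')

# Negated pattern: print lines that do not contain "a".
negated=$(printf '%s\n' "$input" | awk '!/a/')

# Compound pattern: a regular expression combined with a relational expression.
compound=$(printf '%s\n' "$input" | awk '/a/ && $2 > 5 { print $1 }')

echo "$in_range $negated $compound"
# 4 end beta
```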
B.2.4. Regular Expressions
Table B.1 summarizes the regular expressions
as described in Chapter 3, "Understanding Regular Expression Syntax". The metacharacters are
listed in order of precedence.
Table B.1. Regular Expression Metacharacters

Special Characters  Usage
------------------  -----
c                   Matches any literal character c that is not a
                    metacharacter.
\                   Escapes any metacharacter that follows, including
                    itself.
^                   Anchors following regular expression to the
                    beginning of string.
$                   Anchors preceding regular expression to the end of
                    string.
.                   Matches any single character, including newline.
[...]               Matches any one of the class of characters enclosed
                    between the brackets. A circumflex (^) as the first
                    character inside brackets reverses the match to all
                    characters except those listed in the class. A
                    hyphen (-) is used to indicate a range of
                    characters. The close bracket (]) as the first
                    character in a class is a member of the class. All
                    other metacharacters lose their meaning when
                    specified as members of a class, except \, which
                    can be used to escape ], even if it is not first.
r1|r2               Between two regular expressions, r1 and r2, it
                    allows either of the regular expressions to be
                    matched.
(r1)(r2)            Used for concatenating regular expressions.
r*                  Matches any number (including zero) of the regular
                    expression that immediately precedes it.
r+                  Matches one or more occurrences of the preceding
                    regular expression.
r?                  Matches 0 or 1 occurrences of the preceding regular
                    expression.
(r)                 Used for grouping regular expressions.

Regular expressions can also make use of the escape sequences for
accessing special characters, as defined in Section B.2.5.2 later in this appendix.
Note that ^ and $ work on
strings; they do not match against newlines
embedded in a record or string.
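As an illustration, the sketch below uses the anchors with the ? metacharacter on four made-up words, then tests ^ against a string containing an embedded newline. The variable names are assumptions for the example.

```shell
# ^ and $ anchor the whole line; "u?" makes the "u" optional.
words=$(printf 'color\ncolour\ncolours\ncollar\n' | awk '/^colou?r$/')

# ^ anchors only at the start of the string, not after an embedded newline.
anchor=$(awk 'BEGIN {
    s = "one\ntwo"
    r = (s ~ /^two/) ? "match" : "no match"
    print r
}')

echo "$words"
echo "$anchor"
```

The first command prints color and colour; the second prints no match.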
Within a pair of brackets, POSIX allows special notations for
matching non-English characters. They are described in
Table B.2.
Table B.2. POSIX Character List Facilities

Notation     Facility
--------     --------
[.symbol.]   Collating symbols. A collating symbol is a multi-character
             sequence that should be treated as a unit.
[=equiv=]    Equivalence classes. An equivalence class lists a set of
             characters that should be considered equivalent, such as
             "e" and "è".
[:class:]    Character classes. Character class keywords describe
             different classes of characters such as alphabetic
             characters, control characters, and so on.
[:alnum:]    Alphanumeric characters
[:alpha:]    Alphabetic characters
[:blank:]    Space and tab characters
[:cntrl:]    Control characters
[:digit:]    Numeric characters
[:graph:]    Printable and visible (non-space) characters
[:lower:]    Lowercase characters
[:print:]    Printable characters
[:punct:]    Punctuation characters
[:space:]    Whitespace characters
[:upper:]    Uppercase characters
[:xdigit:]   Hexadecimal digits

Note that these facilities (as of this writing) are still not
widely implemented.
B.2.5. Expressions
An expression can be made up of constants, variables, operators and
functions. A constant is a string (any sequence of characters) or a
numeric value. A variable is a symbol that references a value. You
can think of it as a piece of information that retrieves a particular
numeric or string value.
B.2.5.1. Constants
There are two types of constants, string and numeric. A string
constant must be quoted while a numeric constant is not.
B.2.5.2. Escape sequences
The escape sequences described in Table B.3
can be used in strings and regular expressions.
Table B.3. Escape Sequences

Sequence   Description
--------   -----------
\a         Alert character, usually ASCII BEL character
\b         Backspace
\f         Formfeed
\n         Newline
\r         Carriage return
\t         Horizontal tab
\v         Vertical tab
\ddd       Character represented as 1 to 3 digit octal value
\xhex      Character represented as hexadecimal value[91]
\c         Any literal character c (e.g., \" for ")[92]

[91]POSIX does not provide "\x", but it is commonly available.
[92]Like ANSI C, POSIX leaves it purposely undefined what you get when
you put a backslash before any character not listed in the table.
In most awks, you just get that character.
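A small sketch using a few of these sequences in a string constant: a tab, a newline, the octal value \101 (the letter A in ASCII), and an escaped double quote. The text being printed is invented.

```shell
escaped=$(awk 'BEGIN { printf "name\tvalue\n\101\"quoted\"\n" }')

echo "$escaped"
```

The first output line holds name and value separated by a tab; the second is A"quoted".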
B.2.5.3. Variables
There are three kinds of variables: user-defined, built-in, and
fields. By convention, the names of built-in or system variables
consist of all capital letters.
The name of a variable cannot start with a digit.
Otherwise, it consists of letters, digits, and underscores.
Case is significant in variable names.
A variable does not need to be declared or initialized. A variable
can contain either a string or numeric value. An uninitialized
variable has the empty string ("") as its string value and 0
as its numeric value. Awk attempts to decide whether a value should
be processed as a string or a number depending upon the operation.
The assignment of a variable has the form:
var = expr
It assigns the value of the expression to
var. The following expression assigns a
value of 1 to the variable x.
x = 1
The name of the variable is used to reference the value:
{ print x }
prints the value of the variable x. In this case,
it would be 1.
See Section B.2.5.5 later in this appendix for information on
built-in variables. A field variable is referenced using
$n, where
n is any number 0 to NF,
that references the field by position. It can be supplied by a
variable, such as $NF meaning the last field, or
constant, such as $1 meaning the first field.
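The sketch below shows both values of an uninitialized variable and two field references computed from NF. The variable name unset and the input line are assumptions for the example.

```shell
# "unset" is never assigned: it is "" as a string and 0 as a number.
# $NF is the last field; $(NF - 1) is the one before it.
out=$(printf 'a b c\n' | awk '{ print "[" unset "]", unset + 0, $NF, $(NF - 1) }')

echo "$out"
# [] 0 c b
```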
B.2.5.4. Arrays
An array is a variable that can be used to store a set of values. The
following statement assigns a value to an element of an array:
array[index] = value
In awk, all arrays are associative arrays. What
makes an associative array unique is that its index can be a string or
a number.
An associative array makes an "association" between the indices and
the elements of an array. For each element of the array, a pair of
values is maintained: the index of the element and the value of the
element. The elements are not stored in any particular order as in a
conventional array.
You can use the special for loop to read all the
elements of an associative array.
for (item in array)
The index of the array is available as
item, while the value of an element of the
array can be referenced as
array[item].
You can use the operator in to test that an element
exists by testing to see if its index exists.
if (index in array)
tests that
array[index]
exists, but you cannot use it to test the value of the element
referenced by
array[index].
You can also delete individual elements of the array using the
delete statement.
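A minimal sketch that combines these array operations, assuming a made-up list of color names as input. Because the for loop visits elements in no particular order, the example only counts them.

```shell
out=$(printf 'red\nblue\nred\ngreen\nred\n' | awk '
{ count[$1]++ }                     # string index, numeric value
END {
    delete count["green"]           # remove one element
    if ("blue" in count) print "blue seen"
    if (!("green" in count)) print "green deleted"
    n = 0
    for (item in count) n++         # visit every remaining index
    print n " colors left; red = " count["red"]
}')

echo "$out"
# blue seen
# green deleted
# 2 colors left; red = 3
```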
B.2.5.5. System variables
Awk defines a number of special variables that can be referenced or
reset inside a program, as shown in Table B.4 (defaults are listed in parentheses).
Table B.4. Awk System Variables

Variable   Description
--------   -----------
ARGC       Number of arguments on command line
ARGV       An array containing the command-line arguments
CONVFMT    String conversion format for numbers (%.6g). (POSIX)
ENVIRON    An associative array of environment variables
FILENAME   Current filename
FNR        Like NR, but relative to the current file
FS         Field separator (a blank)
NF         Number of fields in current record
NR         Number of the current record
OFMT       Output format for numbers (%.6g)
OFS        Output field separator (a blank)
ORS        Output record separator (a newline)
RLENGTH    Length of the string matched by match() function
RS         Record separator (a newline)
RSTART     First position in the string matched by match() function
SUBSEP     Separator character for array subscripts (\034)

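As a brief illustration, the sketch below resets OFS and reads NR and NF for two invented input lines.

```shell
out=$(printf 'a b c\nd e\n' | awk '
BEGIN { OFS = "-" }                 # reset the output field separator
      { print NR, NF, $1 }          # record number, field count, first field
END   { print "records: " NR }      # NR holds the total at the end
')

echo "$out"
# 1-3-a
# 2-2-d
# records: 2
```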
B.2.5.6. Operators
Table B.5 lists the operators
in the order of precedence (low to high) that are available in awk.
Table B.5. Operators

Operators                 Description
---------                 -----------
= += -= *= /= %= ^= **=   Assignment
?:                        C conditional expression
||                        Logical OR
&&                        Logical AND
~ !~                      Match regular expression and negation
< <= > >= != ==           Relational operators
(blank)                   Concatenation
+ -                       Addition, subtraction
* / %                     Multiplication, division, and modulus
+ - !                     Unary plus and minus, and logical negation
^ **                      Exponentiation
++ --                     Increment and decrement, either prefix or postfix
$                         Field reference

NOTE:
While "**" and "**=" are common extensions, they are not
part of POSIX awk.
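The sketch below shows a few consequences of this precedence order, using only POSIX operators. The expected values in the comments follow from Table B.5.

```shell
out=$(awk 'BEGIN {
    print 2 + 3 * 4      # multiplication binds tighter than addition: 14
    print -2 ^ 2         # exponentiation binds tighter than unary minus: -4
    print 1 " " 2 + 3    # addition binds tighter than concatenation: 1 5
}')

echo "$out"
```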
B.2.6. Statements and Functions
An action is enclosed in braces and consists of one or more statements
and/or expressions. The difference between a statement and a function
is that a function returns a value, and its argument list is specified
within parentheses. (The formal syntactical difference does not always
hold true: printf is considered a statement, but
its argument list can be put in parentheses;
getline is a function that does not use
parentheses.)
Awk has a number of predefined arithmetic and string functions. A
function is typically called as follows:
return = function(arg1,arg2)
where return is a variable created to hold
what the function returns. (In fact, the return value of a function
can be used anywhere in an expression, not just on the right-hand side
of an assignment.) Arguments to a function are specified as a
comma-separated list. The left parenthesis follows the name of
the function. (With built-in functions, a space is permitted between
the function name and the parentheses.)
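A short sketch of both uses of a return value, with the standard built-in functions length(), substr(), toupper(), and index(). The strings passed to them are invented for the example.

```shell
out=$(awk 'BEGIN {
    n = length("awk")                          # return value stored in a variable
    print n, toupper(substr("sed & awk", 7))   # return values used inside expressions
    if (index("gawk", "awk") == 2)             # return value used in a comparison
        print "found at 2"
}')

echo "$out"
# 3 AWK
# found at 2
```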
Copyright © 2003 O'Reilly & Associates. All rights reserved.