awk Command
Purpose
Finds lines in files that match a pattern and performs specified actions on those lines.
Syntax
awk [ -u ] [ -F Ere ] [ -v Assignment ] ... { -f ProgramFile | 'Program' } [ [ File ... | Assignment ... ] ]
...
Description
The awk command utilizes a set of user-supplied instructions to compare a set of files, one line at a time, to
extended regular expressions supplied by the user. Then actions are performed upon any line that matches the
extended regular expressions.
The pattern searching of the awk command is more general than that of the grep command, and it allows the user
to perform multiple actions on input text lines. The awk command programming language requires no compiling, and
allows the user to use variables, numeric functions, string functions, and logical operators.
The awk command is affected by the LANG, LC_ALL, LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_NUMERIC, NLSPATH, and
PATH environment variables.
The following topics are covered in this article:
* Input for the awk Command
* Output for the awk Command
* File Processing with Records and Fields
* The awk Command Programming Language
* Patterns
* Actions
* Variables
* Special Variables
* Flags
* Examples
Input for the awk Command
The awk command takes two types of input: input text files and program instructions.
Input Text Files
Searching and actions are performed on input text files. The files are specified by:
* Specifying the File variable on the command line.
* Modifying the special variables ARGV and ARGC.
* Providing standard input in the absence of the File variable.
If multiple files are specified with the File variable, the files are processed in the order specified.
Program Instructions
Instructions provided by the user control the actions of the awk command. These instructions come from either
the 'Program' variable on the command line or from a file specified by the -f flag together with the ProgramFile
variable. If multiple program files are specified, the files are concatenated in the order specified and the
resultant order of instructions is used.
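As a minimal sketch of the two ways to supply program instructions, the following runs the same one-line program first from the command line and then from a program file named with the -f flag. The sample input, the variable names, and the temporary file created with mktemp are assumptions made for illustration only.

```shell
# Hypothetical sample input: two whitespace-separated fields per line.
input='alpha 1
beta 2'

# 1. Program text supplied directly on the command line.
inline=$(printf '%s\n' "$input" | awk '{ print $2, $1 }')

# 2. The same program stored in a file and loaded with the -f flag.
progfile=$(mktemp)
printf '%s\n' '{ print $2, $1 }' > "$progfile"
from_file=$(printf '%s\n' "$input" | awk -f "$progfile")
rm -f "$progfile"

printf '%s\n' "$inline"
printf '%s\n' "$from_file"
```

Both invocations should print the fields of each line in swapped order, since the program text is identical in each case.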
Output for the awk Command
The awk command produces three types of output from the data within the input text file:
* Selected data can be printed to standard output, without alteration to the input file.
* Selected portions of the input file can be altered.
* Selected data can be altered and printed to standard output, with or without altering the contents of the
input file.
All of these types of output can be performed on the same file. The programming language recognized by the awk
command allows the user to redirect output.
File Processing with Records and Fields
Files are processed in the following way:
1 The awk command scans its instructions and executes any actions specified to occur before the input file is
read.
The BEGIN statement in the awk programming language allows the user to specify a set of instructions to be
done before the first record is read. This is particularly useful for initializing special variables.
2 One record is read from the input file.
A record is a set of data separated by a record separator. The default value for the record separator is
the new-line character, which makes each line in the file a separate record. The record separator can be
changed by setting the RS special variable.
3 The record is compared against each pattern specified by the awk command's instructions.
The command instructions can specify that a specific field within the record be compared. By default,
fields are separated by white space (blanks or tabs). Each field is referred to by a field variable. The
first field in a record is assigned the $1 variable, the second field is assigned the $2 variable, and so
forth. The entire record is assigned to the $0 variable. The field separator can be changed by using the -F
flag on the command line or by setting the FS special variable. The FS special variable can be set to the
values of: blank, single character, or extended regular expression.
4 If the record matches a pattern, any actions associated with that pattern are performed on the record.
5 After the record is compared to each pattern, and all specified actions are performed, the next record is
read from input; the process is repeated until all records are read from the input file.
6 If multiple input files have been specified, the next file is then opened and the process repeated until
all input files have been read.
7 After the last record in the last file is read, the awk command executes any instructions specified to
occur after the input processing.
The END statement in the awk programming language allows the user to specify actions to be performed after
the last record is read. This is particularly useful for sending messages about what work was accomplished
by the awk command.
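The processing steps above can be sketched with a short program that has a BEGIN action, one pattern-action statement, and an END action. The colon-separated sample data and the variable names are hypothetical; only FS, NR, BEGIN, and END come from the awk language itself.

```shell
# Hypothetical colon-separated records: name, numeric ID, directory.
data='root:0:/root
guest:100:/home/guest
ann:101:/home/ann'

summary=$(printf '%s\n' "$data" | awk '
BEGIN { FS = ":" }                            # runs before the first record is read
$2 >= 100 { print $1; n++ }                   # runs for each record whose second field is at least 100
END { print n " of " NR " records matched" }  # runs after the last record is read
')

printf '%s\n' "$summary"
```

Setting FS in the BEGIN action has the same effect here as passing -F : on the command line, because it takes effect before the first record is split into fields.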
The awk Command Programming Language
The awk command programming language consists of statements in the form:
Pattern { Action }
If a record matches the specified pattern, or contains a field which matches the pattern, the associated action
is then performed. A pattern can be specified without an action, in which case the entire line containing the
pattern is written to standard output. An action specified without a pattern is performed for every input
record.
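The three forms of a statement (pattern only, action only, and pattern with action) can be illustrated as follows. This is only a sketch; the sample lines and variable names are made up for the example.

```shell
# Hypothetical input: a name and a count on each line.
lines='apple 3
banana 12
cherry 7'

# Pattern only: each matching record is written unchanged.
pattern_only=$(printf '%s\n' "$lines" | awk '/an/')

# Action only: the action is performed for every record.
action_only=$(printf '%s\n' "$lines" | awk '{ print $1 }')

# Pattern and action: the action is performed only for matching records.
both=$(printf '%s\n' "$lines" | awk '$2 > 5 { print $1 }')

printf '%s\n' "$pattern_only" "$action_only" "$both"
```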
Patterns
There are four types of patterns used in the awk command language syntax:
* Regular Expressions
* Relational Expressions
* Combinations of Patterns
* BEGIN and END Patterns
Regular Expressions
The extended regular expressions used by the awk command are similar to those used by the grep or egrep command.
The simplest form of an extended regular expression is a string of characters enclosed in slashes. For an
example, suppose a file named testfile had the following contents:
smawley, andy
smiley, allen
smith, alan
smithern, harry
smithhern, anne
smitters, alexis
Entering the following command line:
awk '/smi/' testfile
prints to standard output all records that contain an occurrence of the string smi. In this example,
the program '/smi/' for the awk command is a pattern with no action. The output is:
smiley, allen
smith, alan
smithern, harry
smithhern, anne
smitters, alexis
The following special characters are used to form extended regular expressions:
Character
Function
+
Specifies that a string matches if one or more occurrences of the character or extended regular expression
that precedes the + (plus) are within the string. The command line:
awk '/smith+ern/' testfile
prints to standard output any record that contains the characters smit, followed by one or more h
characters, followed by the characters ern. The output in this example is:
smithern, harry
smithhern, anne
?
Specifies that a string matches if zero or one occurrences of the character or extended regular expression
that precedes the ? (question mark) are within the string. The command line:
awk '/smith?/' testfile
prints to standard output all records that contain the characters smit, followed by zero or one instance
of the h character. The output in this example is:
smith, alan
smithern, harry
smithhern, anne
smitters, alexis
|
Specifies that a string matches if either of the strings separated by the | (vertical line) is within the
string. The command line:
awk '/allen|alan/' testfile
prints to standard output all records that contain the string allen or alan. The output in this
example is:
smiley, allen
smith, alan
( )
Groups strings together in regular expressions. The command line:
awk '/a(ll)?(nn)?e/' testfile
prints to standard output all records that contain the string ae, alle, anne, or allnne. The output in this
example is:
smiley, allen
smithhern, anne
{m}
Specifies that a string matches if exactly m occurrences of the pattern are within the string. The command
line:
awk '/l{2}/' testfile
prints to standard output:
smiley, allen
{m,}
Specifies that a string matches if at least m occurrences of the pattern are within the string. The command
line:
awk '/t{2,}/' testfile
prints to standard output:
smitters, alexis
{m,n}
Specifies that a string matches if between m and n, inclusive, occurrences of the pattern are within the
string (where m <= n). The command line:
awk '/er{1,2}/' testfile
prints to standard output:
smithern, harry
smithhern, anne
smitters, alexis
[String]
Signifies that the regular expression matches any characters specified by the String variable within the
square brackets. The command line:
awk '/sm[a-h]/' testfile
prints to standard output all records with the characters sm followed by any character in alphabetical
order from a to h. The output in this example is:
smawley, andy
[^String]
A ^ (caret) within the [ ] (square brackets) and at the beginning of the specified string indicates that
the regular expression matches any character except those within the square brackets. Thus, the command line:
awk '/sm[^a-h]/' testfile
prints to standard output:
smiley, allen
smith, alan
smithern, harry
smithhern, anne
smitters, alexis
~, !~
Signifies a conditional statement that a specified variable matches (tilde) or does not match (exclamation
point, tilde) the regular expression. The command line:
awk '$1 ~ /n/' testfile
prints to standard output all records whose first field contains the character n. The output in this
example is:
smithern, harry
smithhern, anne
^
Signifies the beginning of a field or record. The command line:
awk '$2 ~ /^h/' testfile
prints to standard output all records with the character h as the first character of the second field.
The output in this example is:
smithern, harry
$
Signifies the end of a field or record. The command line:
awk '$2 ~ /y$/' testfile
prints to standard output all records with the character y as the last character of the second field.
The output in this example is:
smawley, andy
smithern, harry
. (period)
Signifies any one character except the terminal new-line character at the end of a record. The command line:
awk '/a..e/' testfile
prints to standard output all records with the characters a and e separated by two characters. The
output in this example is:
smawley, andy
smiley, allen
smithhern, anne
* (asterisk)
Signifies zero or more occurrences of the character or extended regular expression that precedes the *
(asterisk). The command line:
awk '/a.*e/' testfile
prints to standard output all records with the characters a and e separated by zero or more characters.
The output in this example is:
smawley, andy
smiley, allen
smithhern, anne
smitters, alexis
\ (backslash)
The escape character. When preceding any of the characters that have special meaning in extended regular
expressions, the escape character removes any special meaning for the character. For example, the
pattern:
/a\/\//
matches the string a//, since the backslashes remove the usual meaning of the slash as a delimiter of
the regular expression. To specify the backslash itself as a character, use a double backslash. See the
following item on escape sequences for more information on the backslash and its uses.
Recognized Escape Sequences
The awk command recognizes most of the escape sequences used in C language conventions, as well as several that
are used as special characters by the awk command itself. The escape sequences are:
Escape Sequence
Character Represented
\"
" (double-quotation mark)
\/
/ (slash) character
\ddd
Character whose encoding is represented by a one-, two- or three-digit octal integer, where d represents an
octal digit
\\
\ (backslash) character
\a
Alert character
\b
Backspace character
\f
Form-feed character
\n
New-line character (see following note)
\r
Carriage-return character
\t
Tab character
\v
Vertical-tab character
Note: Except in the gsub, match, split, and sub built-in functions, the matching of extended regular
expressions is based on input records. Record-separator characters (the new-line character by default)
cannot be embedded in the expression, and no expression matches the record-separator character. If the
record separator is not the new-line character, then the new-line character can be matched. In the four
built-in functions specified, matching is based on text strings, and any character (including the record
separator) can be embedded in the pattern so that the pattern matches the appropriate character. However,
in all regular-expression matching with the awk command, the use of one or more NULL characters in the
pattern produces undefined results.
Relational Expressions
The relational operators < (less than), > (greater than), <= (less than or equal to), >= (greater than or equal
to), == (equal to), and != (not equal to) can be used to form patterns. For example, the pattern:
$1 < $4
matches records where the first field is less than the fourth field. The relational operators also work with
string values. For example:
$1 != "q"
matches all records where the first field is not a q. String values can also be matched on collation values. For
example:
$1 <= "d"
matches all records where the first field collates at or before the string d, such as a field that starts
with a, b, or c, or a field that is exactly d. If no other information is given, field variables are compared
as string values.
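A brief sketch of relational patterns follows. The input rows and variable names are hypothetical, and the data is chosen so that the comparisons give the same result whether the fields are compared as numbers or as strings.

```shell
# Compare two fields of the same record.
rows='3 a b 7
9 a b 5'
less=$(printf '%s\n' "$rows" | awk '$1 < $4')

# Compare a field against a string constant.
names='q
r
q
s'
not_q=$(printf '%s\n' "$names" | awk '$1 != "q"')

# Compare by collation order: only fields that sort at or before "d" match.
words='apple
cherry
d
date
fig'
upto_d=$(printf '%s\n' "$words" | awk '$1 <= "d"')

printf '%s\n' "$less" "$not_q" "$upto_d"
```

Note that date does not match the last pattern, because a longer string that begins with d collates after the single character d.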
Combinations of Patterns
Patterns can be combined using three options:
* Ranges are specified by two patterns separated with a , (comma). Actions are performed on every record
starting with the record that matches the first pattern, and continuing through and including the record
that matches the second pattern. For example:
/begin/,/end/
matches the record containing the string begin, and every record between it and the record containing the
string end, including the record containing the string end.
* Parentheses ( ) group patterns together.
* The boolean operators || (or), && (and), and ! (not) combine patterns into expressions that match if they
evaluate true, otherwise they do not match. For example, the pattern:
$1 == "al" && $2 == "123"
matches records where the first field is al and the second field is 123.
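The range and boolean combinations described above can be tried on a small sample. The input text and variable names below are assumptions for illustration; the two patterns are the ones used in this section.

```shell
# Hypothetical input mixing a begin/end block with two-field records.
text='header
begin
one
end
al 123
al 456'

# Range: from the record matching /begin/ through the record matching /end/.
range=$(printf '%s\n' "$text" | awk '/begin/,/end/')

# Boolean AND: both comparisons must be true for the record to match.
both=$(printf '%s\n' "$text" | awk '$1 == "al" && $2 == "123"')

printf '%s\n' "$range" "$both"
```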
BEGIN and END Patterns
Actions specified with the BEGIN pattern are performed before any input is read. Actions specified with the END
pattern are performed after all input has been read. Multiple BEGIN and END patterns are allowed and processed
in the order specified. An END pattern can precede a BEGIN pattern within the program statements. If a program
consists only of BEGIN statements, the actions are performed and no input is read. If a program consists only of
END statements, all the input is read prior to any actions being taken.
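The following sketch illustrates these rules with three small programs: one with only a BEGIN action, one with two BEGIN actions, and one with only an END action. The message strings and variable names are made up for the example.

```shell
# BEGIN only: the action runs and no input is read, so no file is named.
begin_only=$(awk 'BEGIN { print "no input needed" }')

# Multiple BEGIN patterns are processed in the order specified.
ordered=$(awk 'BEGIN { printf "first " } BEGIN { print "second" }')

# END only: all input is read before the action runs, so NR holds the record count.
end_only=$(printf 'a\nb\nc\n' | awk 'END { print NR " records read" }')

printf '%s\n' "$begin_only" "$ordered" "$end_only"
```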
Actions
There are several types of action statements:
* Action Statements
* Built-in Functions
* User-Defined Functions
* Conditional Statements
* Output Actions
Action Statements
Action statements are enclosed in { } (braces). If the statements are specified without a pattern, they are
performed on every record. Multiple actions can be specified within the braces, but must be separated by
new-line characters or ; (semicolons), and the statements are processed in the order they appear. Action
statements include:
Arithmetical Statements
The mathematical operators + (plus), - (minus), / (division), ^ (exponentiation), * (multiplication), %
(modulus) are used in the form:
Expression Operator Expression
Thus, the statement:
$2 = $1 ^ 3
assigns the value of the first field raised to the third power to the second field.
Unary Statements
The unary - (minus) and unary + (plus) operate as in the C programming language:
+Expression or -Expression
Increment and Decrement Statements
The pre-increment and pre-decrement statements operate as in the C programming language:
++Variable or --Variable
The post-increment and post-decrement statements operate as in the C programming language:
Variable++ or Variable--
Assignment Statements
The assignment operators += (addition), -= (subtraction), /= (division), and *= (multiplication) operate as in
the C programming language, with the form:
Variable += Expression
Variable -= Expression
Variable /= Expression
Variable *= Expression
For example, the statement:
$1 *= $2
multiplies the field variable $1 by the field variable $2 and then assigns the new value to $1.
The assignment operators ^= (exponentiation) and %= (modulus) have the form:
Variable1^=Expression1
and
Variable2%=Expression2
and they are equivalent to the C programming language statements:
Variable1=pow(Variable1, Expression1)
and
Variable2=fmod(Variable2, Expression2)
where pow is the pow subroutine and fmod is the fmod subroutine.
String Concatenation Statements
String values can be concatenated by stating them side by side. For example:
$3 = $1 $2
assigns the concatenation of the strings in the field variables $1 and $2 to the field variable $3.
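As a small sketch, the three example statements from this section can be combined in one action and applied to each record. The input values and the variable name are hypothetical.

```shell
result=$(printf '2 0 x\n3 0 y\n' | awk '{
    $2 = $1 ^ 3    # arithmetic: the cube of field 1 is stored in field 2
    $1 *= $2       # assignment: field 1 becomes field 1 times field 2
    $3 = $1 $2     # concatenation: fields 1 and 2 placed side by side
    print
}')

printf '%s\n' "$result"
```

For the first record, $2 becomes 8, $1 becomes 16, and $3 becomes the string 168. Because the statements run in order, each one sees the fields as modified by the statements before it.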
Built-In Functions
The awk command language uses arithmetic functions, string functions, and general functions. The close
Subroutine statement is necessary if you intend to write a file, then read it later in the same program.
Arithmetic Functions
The following arithmetic functions perform the same actions as the C language subroutines by the same name:
atan2( y, x )
Returns arctangent of y/x.
cos( x )
Returns cosine of x; x is in radians.
sin( x )
Returns sine of x; x is in radians.
exp( x )
Returns the exponential function of x.
log( x )
Returns the natural logarithm of x.
sqrt( x )
Returns the square root of x.
int( x )
Returns the value of x truncated to an integer.
rand( )
Returns a random number n, with 0 <= n < 1.
srand( [Expr] )
Sets the seed value for the rand function to the value of the Expr parameter, or uses the time of day if the
Expr parameter is omitted. The previous seed value is returned.
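A few of these functions can be exercised from a BEGIN action, which needs no input file. This is only a sketch; the seed value 42 and the variable names are arbitrary choices for the example.

```shell
funcs=$(awk 'BEGIN {
    print int(7.9)                      # truncated toward zero
    print int(-7.9)
    print sqrt(16)
    print exp(0), log(1)
    printf "%.4f\n", 4 * atan2(1, 1)    # pi, since atan2(1, 1) is pi/4
    srand(42); first = rand()           # seed explicitly, then draw a number
    srand(42); again = rand()           # same seed, so the same number is drawn
    in_range = (first >= 0 && first < 1)
    same = (first == again)
    print in_range, same
}')

printf '%s\n' "$funcs"
```

Reseeding with the same value before each call to rand makes the sequence repeatable, which can be useful when testing a program that otherwise depends on random numbers.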