Channel: CSDN Blog Recommended Articles

LINUX AWK TUTORIALS : A PERFECT ONE



awk Command

Purpose

       Finds lines in files that match a pattern and performs specified actions on those lines.

Syntax

       awk [ -u  ] [ -F Ere ] [ -v Assignment ] ... { -f ProgramFile | 'Program' } [ [ File ... | Assignment ... ] ]
       ...

Description

       The awk command utilizes a set of user-supplied instructions to compare a set of files, one line at a time, to
       extended regular expressions supplied by the user. Then actions are performed upon any line that matches the
       extended regular expressions.

       The pattern searching of the awk command is more general than that of the grep command, and it allows the user
       to perform multiple actions on input text lines. The awk command programming language requires no compiling, and
       allows the user to use variables, numeric functions, string functions, and logical operators.

       The awk command is affected by the LANG, LC_ALL, LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_NUMERIC, NLSPATH, and
       PATH environment variables.

       The following topics are covered in this article:
       *    Input for the awk Command
       *    Output for the awk Command
       *    File Processing with Records and Fields
       *    The awk Command Programming Language
              *    Patterns
              *    Actions
              *    Variables
              *    Special Variables
       *    Flags
       *    Examples

Input for the awk Command

       The awk command takes two types of input: input text files and program instructions.

Input Text Files

       Searching and actions are performed on input text files. The files are specified by:
       *    Specifying the File variable on the command line.
       *    Modifying the special variables ARGV and ARGC.
       *    Providing standard input in the absence of the File variable.

       If multiple files are specified with the File variable, the files are processed in the order specified.

Program Instructions

       Instructions provided by the user control the actions of the awk command. These instructions come from either
       the 'Program' variable on the command line or from a file specified by the -f flag together with the ProgramFile
       variable. If multiple program files are specified, the files are concatenated in the order specified and the
       resultant order of instructions is used.
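
       As a minimal sketch of the two sources of program instructions, the following runs the same logic once as an
       inline 'Program' and once from two program files given with -f. The scratch directory and the file names
       (input.txt, first.awk, second.awk) are made up for illustration.

```shell
# Work in a scratch directory; all file names here are illustrative.
dir=$(mktemp -d)
printf 'one\ntwo\n' > "$dir/input.txt"

# 1. Program given inline on the command line.
inline=$(awk '{ print NR ": " $0 }' "$dir/input.txt")

# 2. The same logic split across two program files; awk concatenates
#    them in the order the -f flags appear.
printf '{ n = NR }\n' > "$dir/first.awk"
printf '{ print n ": " $0 }\n' > "$dir/second.awk"
fromfiles=$(awk -f "$dir/first.awk" -f "$dir/second.awk" "$dir/input.txt")

printf '%s\n' "$inline" "$fromfiles"
rm -r "$dir"
```

       Both invocations should print the same numbered lines, because the two program files are read as one program.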

Output for the awk Command

       The awk command produces three types of output from the data within the input text file:

       *    Selected data can be printed to standard output, without alteration to the input file.
       *    Selected portions of the input file can be altered.
       *    Selected data can be altered and printed to standard output, with or without altering the contents of the
            input file.

       All of these types of output can be performed on the same file. The programming language recognized by the awk
       command allows the user to redirect output.

File Processing with Records and Fields

       Files are processed in the following way:
       1    The awk command scans its instructions and executes any actions specified to occur before the input file is
            read.

            The BEGIN statement in the awk programming language allows the user to specify a set of instructions to be
            done before the first record is read. This is particularly useful for initializing special variables.
       2    One record is read from the input file.

            A record is a set of data separated by a record separator. The default value for the record separator is
            the new-line character, which makes each line in the file a separate record. The record separator can be
            changed by setting the RS special variable.
       3    The record is compared against each pattern specified by the awk command's instructions.

            The command instructions can specify that a specific field within the record be compared. By default,
            fields are separated by white space (blanks or tabs). Each field is referred to by a field variable. The
            first field in a record is assigned the $1 variable, the second field is assigned the $2 variable, and so
            forth. The entire record is assigned to the $0 variable. The field separator can be changed by using the -F
            flag on the command line or by setting the FS special variable. The FS special variable can be set to the
            values of: blank, single character, or extended regular expression.
       4    If the record matches a pattern, any actions associated with that pattern are performed on the record.
       5    After the record is compared to each pattern, and all specified actions are performed, the next record is
            read from input; the process is repeated until all records are read from the input file.
       6    If multiple input files have been specified, the next file is then opened and the process repeated until
            all input files have been read.
       7    After the last record in the last file is read, the awk command executes any instructions specified to
            occur after the input processing.

            The END statement in the awk programming language allows the user to specify actions to be performed after
            the last record is read. This is particularly useful for sending messages about what work was accomplished
            by the awk command.
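
       The processing steps above can be sketched with a short program. The name:quantity input data is invented for
       illustration; BEGIN sets the field separator, the middle rule runs once per matching record, and END reports a
       total.

```shell
# Sample data is made up for illustration: name:quantity pairs.
result=$(printf 'apple:3\npear:5\nfig:2\n' | awk '
BEGIN { FS = ":" }                  # runs before the first record is read
$2 > 2 { total += $2; print $1 }    # runs for each record whose 2nd field exceeds 2
END   { print "total", total }      # runs after the last record is read
')
printf '%s\n' "$result"
```

       With this input the program prints apple, pear, and then total 8; the fig record is read but matches no pattern.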

The awk Command Programming Language

       The awk command programming language consists of statements in the form:

       Pattern { Action }

       If a record matches the specified pattern, or contains a field which matches the pattern, the associated action
       is then performed. A pattern can be specified without an action, in which case the entire line containing the
       pattern is written to standard output. An action specified without a pattern is performed for every input
       record.
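
       A small sketch of the three possible statement shapes, using invented sample data: a pattern alone, an action
       alone, and a pattern with an action.

```shell
# Three records of made-up data.
data=$(printf 'alpha 1\nbeta 2\ngamma 3\n')

# Pattern without an action: matching records are printed whole.
pattern_only=$(printf '%s\n' "$data" | awk '/eta/')

# Action without a pattern: the action runs for every record.
action_only=$(printf '%s\n' "$data" | awk '{ print $1 }')

# Pattern with an action: the action runs only for matching records.
both=$(printf '%s\n' "$data" | awk '$2 >= 2 { print $1 }')

printf '%s\n' "$pattern_only" "$action_only" "$both"
```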

Patterns

       There are four types of patterns used in the awk command language syntax:
       *    Regular Expressions
       *    Relational Expressions
       *    Combinations of Patterns
       *    BEGIN and END Patterns

Regular Expressions

       The extended regular expressions used by the awk command are similar to those used by the grep or egrep command.
       The simplest form of an extended regular expression is a string of characters enclosed in slashes. For an
       example, suppose a file named testfile had the following contents:

       smawley, andy
       smiley, allen
       smith, alan
       smithern, harry
       smithhern, anne
       smitters, alexis

       Entering the following command line:

       awk '/smi/' testfile

       prints to standard output all records that contain an occurrence of the string smi. In this example,
       the program '/smi/' for the awk command is a pattern with no action. The output is:

       smiley, allen
       smith, alan
       smithern, harry
       smithhern, anne
       smitters, alexis

       The following special characters are used to form extended regular expressions:
       Character
            Function
       +
            Specifies that a string matches if one or more occurrences of the character or extended regular expression
            that precedes the + (plus) are within the string. The command line:

            awk '/smith+ern/' testfile

            prints to standard output any record that contained a string with the characters smit, followed by one or
            more h characters, and then ending with the characters ern. The output in this example is:

            smithern, harry
            smithhern, anne
       ?
            Specifies that a string matches if zero or one occurrences of the character or extended regular expression
            that precedes the ? (question mark) are within the string. The command line:

            awk '/smith?/' testfile

            prints to standard output all records that contain the characters smit, followed by zero or one instance
            of the h character. The output in this example is:

            smith, alan
            smithern, harry
            smithhern, anne
            smitters, alexis
       |
            Specifies that a string matches if either of the strings separated by the | (vertical line) are within the
            string. The command line:

            awk '/allen|alan/' testfile

            prints to standard output all records that contain the string allen or alan. The output in this
            example is:

            smiley, allen
            smith, alan
       ( )
            Groups strings together in regular expressions. The command line:

            awk '/a(ll)?(nn)?e/' testfile

            prints to standard output all records with the string ae, alle, anne, or allnne. The output in this
            example is:

            smiley, allen
            smithhern, anne
       {m}
            Specifies that a string matches if exactly m occurrences of the pattern are within the string. The command
            line:

            awk '/l{2}/' testfile

            prints to standard output:

            smiley, allen
       {m,}
            Specifies that a string matches if at least m occurrences of the pattern are within the string. The command
            line:

            awk '/t{2,}/' testfile

            prints to standard output:

            smitters, alexis
       {m,n}
            Specifies that a string matches if between m and n, inclusive, occurrences of the pattern are within the
            string (where m <= n). The command line:

            awk '/er{1,2}/' testfile

            prints to standard output:

            smithern, harry
            smithhern, anne
            smitters, alexis
       [String]
            Signifies that the regular expression matches any characters specified by the String variable within the
            square brackets. The command line:

            awk '/sm[a-h]/' testfile

            prints to standard output all records with the characters sm followed by any character in alphabetical
            order from a to h. The output in this example is:

            smawley, andy

       [^String]
            A ^ (caret) within the [ ] (square brackets) and at the beginning of the specified string indicates that
            the regular expression does not match any characters within the square brackets. Thus, the command line:

            awk '/sm[^a-h]/' testfile

            prints to standard output:

            smiley, allen
            smith, alan
            smithern, harry
            smithhern, anne
            smitters, alexis
       ~, !~
            Signifies a conditional statement that a specified variable matches (tilde) or does not match (exclamation
            point, tilde) the regular expression. The command line:

            awk '$1 ~ /n/' testfile

            prints to standard output all records whose first field contains the character n. The output in this
            example is:

            smithern, harry
            smithhern, anne
       ^
            Signifies the beginning of a field or record. The command line:

            awk '$2 ~ /^h/' testfile

            prints to standard output all records with the character h as the first character of the second field.
            The output in this example is:

            smithern, harry
       $
            Signifies the end of a field or record. The command line:

            awk '$2 ~ /y$/' testfile

            prints to standard output all records with the character y as the last character of the second field.
            The output in this example is:

            smawley, andy
            smithern, harry
       . (period)
            Signifies any one character except the terminating new-line character at the end of a record. The command line:

            awk '/a..e/' testfile

            prints to standard output all records with the characters a and e separated by two characters. The
            output in this example is:

            smawley, andy
            smiley, allen
            smithhern, anne
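       * (asterisk)
            Signifies zero or more occurrences of the preceding character or extended regular expression. Preceded
            by a . (period), it matches zero or more of any characters. The command line: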

            awk '/a.*e/' testfile

            prints to standard output all records with the characters a and e separated by zero or more characters.
            The output in this example is:

            smawley, andy
            smiley, allen
            smithhern, anne
            smitters, alexis
       \ (backslash)
            The escape character. When preceding any of the characters that have special meaning in extended regular
            expressions, the escape character removes any special meaning for the character. For example, the
            pattern:

            /a\/\//

            matches the string a//, since the backslashes negate the usual meaning of the slash as a delimiter of
            the regular expression. To specify the backslash itself as a character, use a double backslash. See the
            following item on escape sequences for more information on the backslash and its uses.
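
       A brief sketch of the backslash as an escape character inside a regular expression. The three input lines are
       invented; the first pattern matches a literal a// and the second matches a literal backslash.

```shell
# Three made-up lines: "a//b", "a/b", and "c\d".
slashes=$(printf 'a//b\na/b\nc\\d\n' | awk '/a\/\//')

# A double backslash inside the pattern stands for one literal backslash.
backslash=$(printf 'a//b\na/b\nc\\d\n' | awk '/\\/')

printf '%s\n' "$slashes" "$backslash"
```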

Recognized Escape Sequences

       The awk command recognizes most of the escape sequences used in C language conventions, as well as several that
       are used as special characters by the awk command itself. The escape sequences are:
       Escape Sequence
            Character Represented
       \"
            " (double-quotation mark)
       \/
            / (slash) character
       \ddd
            Character whose encoding is represented by a one-, two- or three-digit octal integer, where d represents an
            octal digit
       \\
            \ (backslash) character
       \a
            Alert character
       \b
            Backspace character
       \f
            Form-feed character
       \n
            New-line character (see following note)
       \r
            Carriage-return character
       \t
            Tab character
       \v
            Vertical tab.

            Note: Except in the gsub, match, split, and sub built-in functions, the matching of extended regular
            expressions is based on input records. Record-separator characters (the new-line character by default)
            cannot be embedded in the expression, and no expression matches the record-separator character. If the
            record separator is not the new-line character, then the new-line character can be matched. In the four
            built-in functions specified, matching is based on text strings, and any character (including the record
            separator) can be embedded in the pattern so that the pattern matches the appropriate character. However,
            in all regular-expression matching with the awk command, the use of one or more NULL characters in the
            pattern produces undefined results.
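
       A few of these escape sequences can be seen in string constants. This is only an illustrative sketch; the octal
       value 101 is assumed to be the letter A, which holds for ASCII-compatible encodings.

```shell
escapes=$(awk 'BEGIN {
    printf "a\tb\n"        # \t is a tab, \n is a new-line
    print "say \"hi\""     # \" is a double-quotation mark inside a string
    print "\101"           # octal 101 is A in ASCII
}')
printf '%s\n' "$escapes"
```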

Relational Expressions

       The relational operators < (less than), > (greater than), <= (less than or equal to), >= (greater than or equal
       to), == (equal to), and != (not equal to) can be used to form patterns. For example, the pattern:

       $1 < $4

       matches records where the first field is less than the fourth field. The relational operators also work with
       string values. For example:

       $1 != "q"

       matches all records where the first field is not a q. String values can also be matched on collation values. For
       example:

       $1 >= "d"

       matches all records where the first field collates at or after the string d, for example fields that start
       with d, e, or f. If no other information
       is given, field variables are compared as string values.
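
       The relational patterns above can be tried on a small invented data set. This sketch compares the second and
       third fields rather than $1 and $4, and sets LC_ALL=C as an assumption so that string collation is predictable.

```shell
# Made-up records: a name followed by two numbers.
names=$(printf 'apple 5 9\ndate 7 2\nq 1 1\nzebra 3 8\n')

# Numeric-looking fields are compared as numbers here.
less=$(printf '%s\n' "$names" | LC_ALL=C awk '$2 < $3 { print $1 }')

# Comparison against a string constant is a string comparison.
not_q=$(printf '%s\n' "$names" | LC_ALL=C awk '$1 != "q" { print $1 }')
from_d=$(printf '%s\n' "$names" | LC_ALL=C awk '$1 >= "d" { print $1 }')

printf '%s\n' "$less" "$not_q" "$from_d"
```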

Combinations of Patterns

       Patterns can be combined using three options:
       *    Ranges are specified by two patterns separated with a , (comma). Actions are performed on every record
            starting with the record that matches the first pattern, and continuing through and including the record
            that matches the second pattern. For example:

            /begin/,/end/

            matches the record containing the string begin, and every record between it and the record containing the
            string end, including the record containing the string end.
       *    Parentheses ( ) group patterns together.
       *    The boolean operators || (or), && (and), and ! (not) combine patterns into expressions that match if they
            evaluate true, otherwise they do not match. For example, the pattern:

            $1 == "al" && $2 == "123"

            matches records where the first field is al and the second field is 123.
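
       The three ways of combining patterns can be sketched on a short invented input: a range, a boolean AND, and a
       negation.

```shell
# Six made-up records.
log=$(printf 'noise\nbegin\nal 123\nal 999\nend\nal 123\n')

# Range: from the record matching /begin/ through the record matching /end/.
range=$(printf '%s\n' "$log" | awk '/begin/,/end/')

# Boolean AND: print the record numbers where both comparisons hold.
both=$(printf '%s\n' "$log" | awk '$1 == "al" && $2 == "123" { print NR }')

# Negation: records that do not contain the string al.
negated=$(printf '%s\n' "$log" | awk '!/al/')

printf '%s\n' "$range" "$both" "$negated"
```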

BEGIN and END Patterns

       Actions specified with the BEGIN pattern are performed before any input is read. Actions specified with the END
       pattern are performed after all input has been read. Multiple BEGIN and END patterns are allowed and processed
       in the order specified. An END pattern can precede a BEGIN pattern within the program statements. If a program
       consists only of BEGIN statements, the actions are performed and no input is read. If a program consists only of
       END statements, all the input is read prior to any actions being taken.
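
       A short sketch of these rules, with invented messages: a program made only of BEGIN statements runs without
       reading input, and an END pattern written before the BEGIN patterns still runs last.

```shell
# A BEGIN-only program performs its action and reads no input.
begin_only=$(awk 'BEGIN { print "no input needed" }')

# END is written first here but still runs after all input is read;
# the two BEGIN patterns run in the order they appear.
ordered=$(printf 'x\ny\n' | awk '
END   { print "records:", NR }
BEGIN { print "first BEGIN" }
BEGIN { print "second BEGIN" }
')

printf '%s\n' "$begin_only" "$ordered"
```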

Actions

       There are several types of action statements:
       *    Action Statements
       *    Built-in Functions
       *    User-Defined Functions
       *    Conditional Statements
       *    Output Actions

Action Statements

       Action statements are enclosed in { } (braces). If the statements are specified without a pattern, they are
       performed on every record. Multiple actions can be specified within the braces, but must be separated by
       new-line characters or ; (semicolons), and the statements are processed in the order they appear. Action
       statements include:
       Arithmetical Statements
       The mathematical operators + (plus), - (minus), / (division), ^ (exponentiation), * (multiplication), %
       (modulus) are used in the form:

       Expression Operator Expression

       Thus, the statement:

       $2 = $1 ^ 3

       assigns the value of the first field raised to the third power to the second field.
       Unary Statements
       The unary - (minus) and unary + (plus) operate as in the C programming language:

       +Expression or -Expression
       Increment and Decrement Statements
       The pre-increment and pre-decrement statements operate as in the C programming language:

       ++Variable or --Variable

       The post-increment and post-decrement statements operate as in the C programming language:

       Variable++ or Variable--
       Assignment Statements
       The assignment operators += (addition), -= (subtraction), /= (division), and *= (multiplication) operate as in
       the C programming language, with the form:

       Variable += Expression

       Variable -= Expression

       Variable /= Expression

       Variable *= Expression

       For example, the statement:

       $1 *= $2

       multiplies the field variable $1 by the field variable $2 and then assigns the new value to $1.

       The assignment operators ^= (exponentiation) and %= (modulus) have the form:

       Variable1^=Expression1

       and

       Variable2%=Expression2

       and they are equivalent to the C programming language statements:

       Variable1=pow(Variable1, Expression1)

       and

       Variable2=fmod(Variable2, Expression2)

       where pow is the pow subroutine and fmod is the fmod subroutine.

       String Concatenation Statements
       String values can be concatenated by stating them side by side. For example:

       $3 = $1 $2

       assigns the concatenation of the strings in the field variables $1 and $2 to the field variable $3.
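
       The statement types described above can be combined in one action. The following is a minimal sketch with
       made-up variable names (x, y, z) and a made-up two-field input record.

```shell
# Arithmetic, assignment, increment, and concatenation in one action.
scalars=$(awk 'BEGIN {
    x = 2 ^ 3        # exponentiation: x is 8
    x += 2           # x is 10
    x %= 4           # x is 2
    y = x++          # y gets 2, then x becomes 3
    z = "ab" "cd"    # concatenation: z is "abcd"
    print x, y, z
}')

# Assigning to a field rebuilds the record with the new value.
fields=$(printf '2 0\n' | awk '{ $2 = $1 ^ 3; print }')

printf '%s\n' "$scalars" "$fields"
```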

Built-In Functions

       The awk command language uses arithmetic functions, string functions, and general functions. The close
       Subroutine statement is necessary if you intend to write a file, then read it later in the same program.

Arithmetic Functions

       The following arithmetic functions perform the same actions as the C language subroutines by the same name:
       atan2( y, x )
            Returns arctangent of y/x.
       cos( x )
            Returns cosine of x; x is in radians.
       sin( x )
            Returns sine of x; x is in radians.
       exp( x )
            Returns the exponential function of x.
       log( x )
            Returns the natural logarithm of x.
       sqrt( x )
            Returns the square root of x.
       int( x )
            Returns the value of x truncated to an integer.
       rand( )
            Returns a random number n, with 0 <= n < 1.
       srand( [Expr] )
            Sets the seed value for the rand function to the value of the Expr parameter, or uses the time of day if the
            Expr parameter is omitted. The previous seed value is returned.
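
       A brief sketch exercising several of these functions. The seed values 42 and 7 are arbitrary; the second srand
       call is expected to return the seed set by the first.

```shell
funcs=$(awk 'BEGIN {
    print int(3.9), int(-3.9)        # truncation drops the fractional part
    print sqrt(16), exp(0), log(1)
    printf "%.4f\n", atan2(0, -1)    # arctangent of 0/-1 is pi
    srand(42); prev = srand(7)       # the second call returns the previous seed
    print prev
    r = rand(); print (r >= 0 && r < 1)
}')
printf '%s\n' "$funcs"
```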

Author: universsky. Posted 2013-5-15 10:35:49. Original link