Channel: CSDN Blog Recommended Articles (CSDN博客推荐文章)

AWK ---- (CONTINUED )


String Functions

       The string functions are:
       gsub( Ere, Repl, [ In ] )
            Performs exactly as the sub function, except that all occurrences of the regular expression are replaced.
       sub( Ere, Repl, [ In ] )
            Replaces the first occurrence of the extended regular expression specified by the Ere parameter in the
            string specified by the In parameter with the string specified by the Repl parameter. The sub function
            returns the number of substitutions. An & (ampersand) appearing in the string specified by the Repl
            parameter is replaced by the string in the In parameter that matches the extended regular expression
            specified by the Ere parameter. If no In parameter is specified, the default value is the entire record
            (the $0 record variable).
       index( String1, String2 )
            Returns the position, numbering from 1, within the string specified by the String1 parameter where the
            string specified by the String2 parameter occurs. If the String2 parameter does not occur in the String1
            parameter, a 0 (zero) is returned.
       length [(String)]
            Returns the length, in characters, of the string specified by the String parameter. If no String parameter
            is given, the length of the entire record (the $0 record variable) is returned.
       blength [(String)]
            Returns the length, in bytes, of the string specified by the String parameter. If no String parameter is
            given, the length of the entire record (the $0 record variable) is returned.
       substr( String, M, [ N ] )
            Returns a substring with the number of characters specified by the N parameter. The substring is taken from
            the string specified by the String parameter, starting with the character in the position specified by the
            M parameter. The M parameter is specified with the first character in the String parameter as number 1. If
            the N parameter is not specified, the length of the substring will be from the position specified by the M
            parameter until the end of the String parameter.
       match( String, Ere )
            Returns the position, in characters, numbering from 1, in the string specified by the String parameter
            where the extended regular expression specified by the Ere parameter occurs, or else returns a 0 (zero) if
            the Ere parameter does not occur. The RSTART special variable is set to the return value. The RLENGTH
            special variable is set to the length of the matched string, or to -1 (negative one) if no match is found.
       split( String, A, [Ere] )
            Splits the string specified by the String parameter into array elements A[1], A[2], . . ., A[n], and
            returns the value of the n variable. The separation is done with the extended regular expression specified
            by the Ere parameter or with the current field separator (the FS special variable) if the Ere parameter is
            not given. The elements in the A array are created with string values, unless context indicates a
            particular element should also have a numeric value.
       tolower( String )
            Returns the string specified by the String parameter, with each uppercase character in the string changed
            to lowercase. The uppercase and lowercase mapping is defined by the LC_CTYPE category of the current
            locale.
       toupper( String )
            Returns the string specified by the String parameter, with each lowercase character in the string changed
            to uppercase. The uppercase and lowercase mapping is defined by the LC_CTYPE category of the current
            locale.
       sprintf(Format, Expr, Expr, . . . )
            Formats the expressions specified by the Expr parameters according to the printf subroutine format string
            specified by the Format parameter and returns the resulting string.
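
       The string functions described above can be tried on a single line of input. The following is a minimal
       sketch, assuming a POSIX-conforming awk; the sample sentence and the variable names (line, word, parts) are
       made up for illustration. The blength function is left out because it may not exist outside this
       implementation.

```shell
result=$(echo "the cat sat on the mat" | awk '{
    line = $0
    n = gsub(/at/, "[&]", line)          # & stands for the text that matched
    pos = match($0, /s[a-z]+/)           # also sets RSTART and RLENGTH
    word = substr($0, RSTART, RLENGTH)   # extract the matched text
    count = split($0, parts, " ")        # returns the number of elements
    print n, line
    print pos, RLENGTH, word
    print count, parts[count], index($0, "on"), length($0)
}')
echo "$result"
```

       The first output line shows the substitution count followed by the modified copy; $0 itself is unchanged
       because gsub was given line as its third argument.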

General Functions

       The general functions are:
       close( Expression )
            Closes the file or pipe opened by a print or printf statement or a call to the getline function with the
            same string-valued Expression parameter. If the file or pipe is successfully closed, a 0 is returned;
            otherwise a non-zero value is returned. Calling the close function is necessary if you intend to write a
            file, then read the file later in the same program.
       system(Command )
            Executes the command specified by the Command parameter and returns its exit status. Equivalent to the
            system subroutine.
       Expression | getline [ Variable ]
            Reads a record of input from a stream piped from the output of a command specified by the Expression
            parameter and assigns the value of the record to the variable specified by the Variable parameter. The
            stream is created if no stream is currently open with the value of the Expression parameter as its command
            name. The stream created is equivalent to one created by a call to the popen subroutine with the Command
            parameter taking the value of the Expression parameter and the Mode parameter set to a value of r. Each
            subsequent call to the getline function reads another record, as long as the stream remains open and the
            Expression parameter evaluates to the same string. If a Variable parameter is not specified, the $0 record
            variable and the NF special variable are set to the record read from the stream.
       getline [ Variable ] < Expression
            Reads the next record of input from the file named by the Expression parameter and sets the variable
            specified by the Variable parameter to the value of the record. Each subsequent call to the getline
            function reads another record, as long as the stream remains open and the Expression parameter evaluates to
            the same string. If a Variable parameter is not specified, the $0 record variable and the NF special
            variable are set to the record read from the stream.
       getline [ Variable ]
            Sets the variable specified by the Variable parameter to the next record of input from the current input
            file. If no Variable parameter is specified, the $0 record variable is set to the value of the record, and the
            NF, NR, and FNR special variables are also set.

            Note: All forms of the getline function return 1 for successful input, zero for end of file, and -1 for an
            error.
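
       As an illustration of close and the redirected forms of getline, the sketch below writes a file, closes
       it, reads it back, and then reads from a command. It assumes a POSIX-conforming awk and that the mktemp
       utility is available; the variable names (file, line, word, last) are hypothetical.

```shell
tmp=$(mktemp)
result=$(awk -v file="$tmp" 'BEGIN {
    print "alpha" > file
    print "beta"  > file
    close(file)                            # close before reading the file back
    while ((getline line < file) > 0)      # 1 = record read, 0 = end of file
        n++
    close(file)
    cmd = "echo one; echo two"
    while ((cmd | getline word) > 0)       # the same string names the same stream
        last = word
    status = close(cmd)
    print n, last, status
}')
rm -f "$tmp"
echo "$result"
```

       Comparing the getline result with > 0 matters: a bare while (getline line < file) would loop forever
       on an error, because -1 counts as true.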

User-Defined Functions

       User-defined functions are declared in the following form:

       function Name (Parameter, Parameter,...)  { Statements }

       A function can be referred to anywhere in an awk command program, and its use can precede its definition. The
       scope of the function is global.

       Function parameters can be either scalars or arrays. Parameter names are local to the function; all other
       variable names are global. The same name should not be used for different entities; for example, a parameter
       name should not be duplicated as a function name, or special variable. Variables with global scope should not
       share the name of a function. Scalars and arrays should not have the same name in the same scope.

       The number of parameters in the function definition does not have to match the number of parameters used when
       the function is called. Excess formal parameters can be used as local variables. Extra scalar parameters are
       initialized with a string value equivalent to the empty string and a numeric value of 0 (zero); extra array
       parameters are initialized as empty arrays.

       When invoking a function, no white space is placed between the function name and the opening parenthesis.
       Function calls can be nested and recursive. Upon return from any nested or recursive function call, the values
       of all the calling function's parameters shall be unchanged, except for array parameters passed by reference.
       The return statement can be used to return a value.

       Within a function definition, the new-line characters are optional before the opening { (brace) and after the
       closing } (brace).

       An example of a function definition is:

       function average ( g,n,    i,sum)
         {
               # i and sum are extra parameters, so they are local to the function
               for (i in g)
                  sum=sum+g[i]
               return sum/n
         }

       The function average is passed an array, g, and a variable, n, with the number of elements in the array. The
       function then obtains an average and returns it.
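
       The rules above about local names, recursion, and array parameters can be sketched as follows. The
       function names fact and fill are hypothetical, and a POSIX-conforming awk is assumed.

```shell
result=$(awk '
# Recursive function; the extra parameter r serves as a local variable.
function fact(n,    r) {
    if (n <= 1) return 1
    r = n * fact(n - 1)
    return r
}
# Arrays are passed by reference, so the caller sees the new elements.
function fill(arr, n,    i) {
    for (i = 1; i <= n; i++) arr[i] = i * i
}
BEGIN {
    r = "untouched"              # the global r is separate from the local r in fact
    fill(sq, 3)
    print fact(5), r, sq[3]
}')
echo "$result"
```

       Separating the extra parameters from the real ones with additional spaces is only a common convention;
       the awk command does not require it.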

Conditional Statements

       Most conditional statements in the awk command programming language have the same syntax and function as
       conditional statements in the C programming language. All of the conditional statements allow the use of { }
       (braces) to group together statements. An optional new-line can be used between the expression portion and the
       statement portion of the conditional statement, and new-lines or ; (semicolon) are used to separate multiple
       statements in { } (braces). Five conditional statements that follow C-language rules are:
       if
            Requires the following syntax:

            if ( Expression ) { Statement } [ else Action ]
       while
            Requires the following syntax:

            while ( Expression ) { Statement }
       for
            Requires the following syntax:

            for ( Expression ; Expression ; Expression ) { Statement }

       break
            Causes the program loop to be exited when the break statement is used in either a while or for statement.
       continue
            Causes the program loop to move to the next iteration when the continue statement is used in either a while
            or for statement.

       Six statements in the awk command programming language that do not follow C-language rules are:
       for...in
            Requires the following syntax:

            for ( Variable in Array ) { Statement }

            The for...in statement sets the Variable parameter to each index value of the Array variable, one index at
            a time and in no particular order, and performs the action specified by the Statement parameter with each
            iteration. See the delete statement for an example of a for...in statement.
       if...in
            Requires the following syntax:

            if ( Variable in Array ) { Statement }

            The if...in statement searches for the existence of the Array element. The statement is performed if the
            Array element is found.
       delete
            Requires the following syntax:

            delete Array [ Expression ]

            The delete statement deletes both the array element specified by the Array parameter and the index
            specified by the Expression parameter. For example, the statements:

            for (i in g)
               delete g[i];

            would delete every element of the g[] array.
       exit
            Requires the following syntax:

            exit [ Expression ]

            The exit statement first invokes all END actions in the order they occur, then terminates the awk command
            with an exit status specified by the Expression parameter. No subsequent END actions are invoked if the
            exit statement occurs within an END action.
       #
            Requires the following syntax:

            # Comment

            The # statement places comments. Comments should always end with a new-line but can begin anywhere on a
            line.
       next
            Stops the processing of the current input record and proceeds with the next input record.
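
       Several of these statements can be seen working together in the sketch below, which assumes a
       POSIX-conforming awk. The input lines and the names seen, left, and k are made up for illustration.

```shell
result=$(printf 'apple\n# skip me\nbanana\napple\nstop\ncherry\n' | awk '
/^#/         { next }            # skip this record, continue with the next one
$0 == "stop" { exit }            # stop reading; the END action still runs
             { seen[$0]++ }
END {
    if ("apple" in seen) n = seen["apple"]
    delete seen["apple"]
    for (k in seen) left++       # visits the remaining indexes in no particular order
    print n, left, ("cherry" in seen)
}')
echo "$result"
```

       The last value printed is 0 because the exit statement ended input before the cherry line was read.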

Output Statements

       Two output statements in the awk command programming language are:
       print
            Requires the following syntax:

            print [ ExpressionList ] [ Redirection ] [ Expression ]

            The print statement writes the value of each expression specified by the ExpressionList parameter to
            standard output. Each expression is separated by the current value of the OFS special variable, and each
            record is terminated by the current value of the ORS special variable.

            The output can be redirected using the Redirection parameter, which can specify the three output
            redirections with the > (greater than), >> (double greater than), and the | (pipe). The Redirection
            parameter specifies how the output is redirected, and the Expression parameter is either a path name to a
            file (when the Redirection parameter is > or >>) or the name of a command (when the Redirection parameter
            is a |).
       printf
            Requires the following syntax:

            printf Format [ , ExpressionList ] [ Redirection ] [ Expression ]

            The printf statement writes to standard output the expressions specified by the ExpressionList parameter in
            the format specified by the Format parameter. The printf statement functions exactly like the printf
            command, except for the c conversion specification (%c). The Redirection and Expression parameters function
            the same as in the print statement.

            For the c conversion specification: if the argument has a numeric value, the character whose encoding is
            that value will be output. If the value is zero or is not the encoding of any character in the character
            set, the behavior is undefined. If the argument does not have a numeric value, the first character of the
            string value will be output; if the string does not contain any characters the behavior is undefined.

            Note: If the Expression parameter specifies a path name for the Redirection parameter, the Expression
            parameter should be enclosed in double quotes to ensure that it is treated as a string.
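
       The effect of the OFS and ORS special variables on print, and of the c conversion in printf, can be
       sketched as follows. The example assumes a POSIX-conforming awk and an ASCII-compatible character set, in
       which the value 65 encodes the letter A.

```shell
result=$(echo "a b c" | awk 'BEGIN { OFS = "-"; ORS = "!\n" }
{
    print $1, $2, $3                           # joined by OFS, terminated by ORS
    printf "%s|%03d|%c|%c\n", $3, 7, 65, "xyz" # %c: a number gives a character, a string gives its first character
}')
echo "$result"
```

       Note that printf does not append ORS; the new-line character has to be written in the format string.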

Variables

       Variables can be scalars, field variables, arrays, or special variables. Variable names cannot begin with a
       digit.

       Variables can be used just by referencing them. With the exception of function parameters, they are not
       explicitly declared. Uninitialized scalar variables and array elements have both a numeric value of 0 (zero) and
       a string value of the null string ("").

       Variables take on numeric or string values according to context. Each variable can have a numeric value, a
       string value, or both. For example:

       x = "4" + "8"

       assigns the value of 12 to the variable x. For string constants, expressions should be enclosed in " " (double
       quotation) marks.

       There are no explicit conversions between numbers and strings. To force an expression to be treated as a number,
       add 0 (zero) to it. To force an expression to be treated as a string, append a null string ("").
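
       The conversion rules above can be checked with a short sketch, assuming a POSIX-conforming awk. The
       variable names are hypothetical; unset is simply a variable that is never assigned.

```shell
result=$(awk 'BEGIN {
    x = "4" + "8"        # both strings are used as numbers
    s = 10 ""            # appending the null string forces a string
    n = "3.0" + 0        # adding 0 forces a number
    print x, (s < 9), (n == 3), length(unset), unset + 0
}')
echo "$result"
```

       The comparison s < 9 is true here because s is a string, so "10" and "9" are compared character by
       character rather than numerically.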

Field Variables

       Field variables are designated by a $ (dollar sign) followed by a number or numerical expression. The first
       field in a record is assigned the $1 variable, the second field is assigned the $2 variable, and so forth.
       The $0 field variable is assigned to the entire record. New field variables can be created by assigning a value
       to them. Assigning a value to a non-existent field, that is, any field whose number is larger than the current
       value of the NF special variable, forces the creation of any intervening fields (set to the null string),
       increases the value of
       the NF special variable, and forces the value of the $0 record variable to be recalculated. In the recalculated
       record, the fields are separated by the current output field separator (the value of the OFS special
       variable). Blanks and tabs are the default input field separators. To change the input field separator, use
       the -F flag, or assign the FS special variable a different value in the awk command program.
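
       A minimal sketch of assigning to a field beyond the last one, assuming a POSIX-conforming awk and the
       default separators. The variable name last is hypothetical.

```shell
result=$(echo "a b c" | awk '{
    last = $NF                   # a field selected by a numeric expression
    $5 = "e"                     # creates an empty $4 and raises NF to 5
    print last, NF, "[" $4 "]"
    print $0                     # the record has been rebuilt
}')
echo "$result"
```

       The rebuilt record contains two adjacent spaces where the empty fourth field sits.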

Arrays

       Arrays are initially empty and their sizes change dynamically. Arrays are represented by a variable with
       subscripts in [ ] (square brackets). The subscripts, or element identifiers, can be numbers or strings, which
       provide a type of associative array capability. For example, the program:

       /red/  { x["red"]++ }
       /green/ { y["green"]++ }

       increments counts for both the red counter and the green counter.

       Arrays can be indexed with more than one subscript, similar to multidimensional arrays in some programming
       languages. Because programming arrays for the awk command are really one dimensional, the comma-separated
       subscripts are converted to a single string by concatenating the string values of the separate expressions, with
       each expression separated by the value of the SUBSEP special variable. Therefore, the following two index
       operations are equivalent:

       x[expr1, expr2,...exprn]

       and

       x[expr1SUBSEPexpr2SUBSEP...SUBSEPexprn]

       When using the in operator, a multidimensional Index value should be contained within parentheses. Except for
       the in operator, any reference to a nonexistent array element automatically creates that element.
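
       The following sketch illustrates how comma-separated subscripts are joined with SUBSEP and how the in
       operator differs from a plain reference. It assumes a POSIX-conforming awk; SUBSEP is set to a colon only
       to make the joined index visible, and the array name grid is hypothetical.

```shell
result=$(awk 'BEGIN {
    SUBSEP = ":"                         # visible separator, for demonstration only
    grid[1, 2] = "x"
    for (key in grid) {
        split(key, idx, SUBSEP)          # recover the separate subscripts
        print key, idx[1], idx[2]
    }
    a = ((1, 2) in grid)                 # parenthesized multidimensional index
    b = ((2, 1) in grid)                 # testing with in does not create the element
    c = ("1:2" in grid)                  # the same element, addressed by its joined index
    print a, b, c
    if (grid[9, 9] == "")                # a plain reference creates the element
        created = ((9, 9) in grid)
    print created
}')
echo "$result"
```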

Special Variables

       The following variables have special meaning for the awk command:
       ARGC
            The number of elements in the ARGV array. This value can be altered.
       ARGV
            The array with each member containing one of the File variables or Assignment variables, taken in order
            from the command line, and numbered from 0 (zero) to ARGC -1. As each input file is finished, the next
            member of the ARGV array provides the name of the next input file, unless:
              *    The next member is an Assignment statement, in which case the assignment is evaluated.
              *    The next member has a null value, in which case the member is skipped. Programs can skip selected
                   input files by setting the member of the ARGV array that contains that input file to a null value.
              *    The next member is the current value of ARGV [ARGC -1], which the awk command interprets as the end
                   of the input files.
       CONVFMT
            The printf format for converting numbers to strings (except for output statements, where the OFMT special
            variable is used). The default is "%.6g".
       ENVIRON
            An array representing the environment under which the awk command operates. Each element of the array is of
            the form:

            ENVIRON [ "EnvironmentVariableName" ] = EnvironmentVariableValue

            The values are set when the awk command begins execution, and that environment is used until the end of
            execution, regardless of any modification of the ENVIRON special variable.
       FILENAME
            The path name of the current input file. During the execution of a BEGIN action, the value of FILENAME is
            undefined. During the execution of an END action, the value is the name of the last input file processed.
       FNR
            The number of the current input record in the current file.
       FS
            The input field separator. The default value is a blank. If the input field separator is a blank, any
            number of locale-defined spaces can separate fields. The FS special variable can take two additional
            values:
              *    With FS set to a single character, fields are separated by each single occurrence of the character.
              *    With FS set to an extended regular expression, each occurrence of a sequence matching the extended
                   regular expression separates fields.
       NF
            The number of fields in the current record, with a limit of 99. Inside a BEGIN action, the NF special
            variable is undefined unless a getline function without a Variable parameter has been issued previously.
            Inside an END action, the NF special variable retains the value it had for the last record read, unless a
            subsequent, redirected, getline function without a Variable parameter is issued prior to entering the END
            action.
       NR
            The number of the current input record. Inside a BEGIN action the value of the NR special variable is 0
            (zero). Inside an END action, the value is the number of the last record processed.
       OFMT
            The printf format for converting numbers to strings in output statements. The default is "%.6g".
       OFS
            The output field separator (default is a space).
       ORS
            The output record separator (default is a new-line character).
       RLENGTH
            The length of the string matched by the match function.
       RS
            Input record separator (default is a new-line character). If the RS special variable is null, records are
            separated by sequences of one or more blank lines; leading or trailing blank lines do not result in empty
            records at the beginning or end of input; and the new-line character is always a field separator,
            regardless of the value of the FS special variable.
       RSTART
            The starting position of the string matched by the match function, numbering from 1. Equivalent to the
            return value of the match function.
       SUBSEP
            Separates multiple subscripts. The default is \031.
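
       As an illustration of a null RS together with the NR and NF special variables, the sketch below reads
       records separated by blank lines. It assumes a POSIX-conforming awk and keeps the default FS, so each
       new-line inside a record acts as ordinary field-separating white space; the sample data is made up.

```shell
result=$(printf 'Ann Oslo\nage 30\n\n\nBob Rome\nage 41\n' | awk '
BEGIN { RS = "" }                # records are separated by one or more blank lines
      { print NR, NF, $1, $4 }   # each two-line record has four fields
END   { print "records:", NR }   # NR keeps the number of the last record')
echo "$result"
```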

Flags

       -f ProgramFile
            Obtains instructions for the awk command from the file specified by the ProgramFile variable. If the -f
            flag is specified multiple times, the concatenation of the files, in the order specified, will be used as
            the set of instructions.
       -u
            Displays the output in an unbuffered mode. If this flag is used, the awk command does not buffer the
            output. Instead, it displays the output instantaneously. By default, the awk command displays the output in
            a buffered mode.
       -F Ere
            Uses the extended regular expression specified by the Ere variable as the field separator. The default
            field separator is a blank.
       -v Assignment
            Assigns a value to a variable for the awk command's programming language. The Assignment parameter is in
            the form of Name = Value. The Name portion specifies the name of the variable and can be any combination of
            underscores, digits, and alphabetic characters, but it must start with either an alphabetic character or an
            underscore. The Value portion is also composed of underscores, digits, and alphabetic characters, and is
            treated as if it were preceded and followed by a " (double-quotation character, similar to a string value).
            If the Value portion is numeric, the variable will also be assigned the numeric value.

            The assignment specified by the -v flag occurs before any portion of the awk command's program is executed,
            including the BEGIN section.
       Assignment
            Assigns a value to a variable for the awk command's programming language. It has the same form and function
            as the Assignment variable with the -v flag, except for the time each is processed. The Assignment
            parameter is processed just prior to the input file (specified by the File variable) that follows it on the
            command line. If the Assignment parameter is specified just prior to the first of multiple input files, the
            assignments are processed just after the BEGIN sections (if any). If an Assignment parameter occurs after
            the last file, the assignment is processed before the END sections (if any). If no input files are
            specified, the assignments are processed before the standard input is read.
       File
            Specifies the name of the file that contains the input for processing. If no File variable is specified, or
            if a - (minus) sign is specified, standard input is processed.
       'Program'
            Contains the instructions for the awk command. If the -f flag is not specified, the Program variable should
            be the first item on the command line. It should be bracketed by ' ' (single quotes).
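
       The difference in timing between the -v flag and an Assignment parameter can be sketched as follows,
       assuming a POSIX-conforming awk. The variable names early and late are hypothetical, and the - (minus)
       sign stands for standard input.

```shell
result=$(echo "x:y" | awk -F: -v early=1 '
BEGIN { print "BEGIN", early, "[" late "]" }   # late has not been assigned yet
      { print "MAIN", early, late, $2 }
' late=2 -)
echo "$result"
```

       The BEGIN action prints empty brackets for late, because that assignment is processed only when the awk
       command reaches it on the command line, just before the input that follows it.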

Exit Status

       This command returns the following exit values:
       0
            Successful completion.
       >0
            An error occurred.

       You can alter the exit status within the program by using the exit [ Expression ] conditional statement.
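
       A minimal sketch of setting the exit status from within a program, assuming a POSIX-conforming awk and a
       POSIX shell. The input lines and the variable name bad are made up for illustration.

```shell
status=0
printf 'ok\nERROR disk full\nok\n' | awk '
/ERROR/ { bad = 1; exit }        # stop reading at the first error line
END     { exit bad }             # 0 if no error line was seen, otherwise 1
' || status=$?
echo "exit status: $status"
```

       Setting the status in the END action keeps the decision in one place, because the exit statement in the
       main action still passes control to END.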

Examples
       1    To display the lines of a file that are longer than 72 characters, enter:

            awk  'length  >72'  chapter1

            This selects each line of the chapter1 file that is longer than 72 characters and writes these lines to
            standard output, because no Action is specified. A tab character is counted as 1 byte.
       2    To display all lines between the words start and stop, including "start" and "stop", enter:

            awk  '/start/,/stop/'  chapter1
       3    To run an awk command program, sum2.awk, that processes the file, chapter1, enter:

            awk  -f  sum2.awk  chapter1

            The following program, sum2.awk, computes the sum and average of the numbers in the second column of the
            input file, chapter1:

                {
                   sum += $2
                }
            END {
                   print "Sum: ", sum;
                   print "Average:", sum/NR;
                }

            The first action adds the value of the second field of each line to the variable sum. All variables are
            initialized to the numeric value of 0 (zero) when first referenced. The pattern END before the second
            action causes those actions to be performed after all of the input file has been read. The NR special
            variable, which is used to calculate the average, is a special variable specifying the number of records
            that have been read.
       4    To print the first two fields in opposite order, enter:

            awk '{ print $2, $1 }' chapter1
       5    To run an awk command program, sum3.awk, that processes the file, chapter2, enter:

            awk -f sum3.awk chapter2

            The following program, sum3.awk, prints the first two fields of each line of chapter2, with input fields
            separated by a comma or by blanks and tabs, then adds up the first column and prints the sum and average:

            BEGIN  {FS = ",|[ \t]+"}
                   {print $1, $2}
                   {s += $1}
            END    {print "sum is",s,"average is", s/NR }
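
       Examples 2 and 5 can be tried without creating the chapter files by piping sample input to the awk
       command. The following sketch assumes a POSIX-conforming awk; the input data is made up, and the program
       text of example 5 is given inline instead of through the -f flag.

```shell
# Example 2: the range pattern prints from the start line through the stop line.
ranged=$(printf 'a\nstart\nb\nstop\nc\n' | awk '/start/,/stop/')
echo "$ranged"

# Example 5: fields separated by a comma, or by blanks and tabs.
summed=$(printf '1,2\n3 4\n5\t6\n' | awk '
BEGIN  {FS = ",|[ \t]+"}
       {print $1, $2}
       {s += $1}
END    {print "sum is", s, "average is", s/NR}')
echo "$summed"
```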

Related Information

       The egrep command, fgrep command, grep command, lex command, printf command, sed command.

       The popen subroutine, printf subroutine, system subroutine.

 

Author: universsky. Published 2013-5-15 10:36:30.