AWK
Mohamed Mukthar Ahmed
CONTENTS
Introduction
Structure of awk Program
First awk Program
Searching For a String
Records and Fields
Awk Relational and Logical Operators
Formatted Output
BEGIN and END Statements
Awk Defined Variables
User Defined Variables
Regular Expressions
Match Operators
Concatenation Operator
Combining Patterns
Pattern Range
Awk Defined Variables Again
Built In Arithmetic Functions
Built In String Functions
Control Flow Statements
next Statement
Arrays in Awk
for Statement
User Defined Functions
Output to File
Output to Pipes
Awk Input
getline( ) Function
Introduction
A powerful pattern matching and scanning tool.
Special-purpose language for line-oriented pattern
processing.
Typically used to scan an input string, grab certain
portions of the string, then output the information in
another format.
Developed by Alfred Aho, Peter Weinberger, and
Brian Kernighan (hence the name AWK).
There are several variants of awk: standard awk
(awk), GNU awk (gawk), and new awk (nawk) are the
most common.
Introduction
AWK is a UNIX/Linux programming language used for
manipulating data and generating reports.
AWK can be used at the command line for simple
operations.
It can also be written into programs or scripts for
larger applications.
Structure of AWK Program
AWK scans a file (or any input source) line by line,
searching for lines that match a certain pattern
(regular expression) or condition.
For each pattern, an action is specified. The action is
performed when the pattern matches.
In short, an awk program consists of a number of
patterns and associated actions.
Actions are enclosed in curly braces; statements within
an action are separated by semicolons or newlines.
If there is an action without a pattern, the action is
performed for every input line.
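The pattern-action structure can be sketched with a small example. The three input lines are made up for illustration; the program has one pattern with an action, one action without a pattern, and an END action.

```shell
# Hypothetical three-line input; the data is invented for this sketch.
out=$(printf 'alpha 1\nbeta 2\ngamma 3\n' | awk '
/beta/ { print "matched:", $0 }   # pattern with an action
       { n++ }                    # action without a pattern: runs for every line
END    { print "lines seen:", n }
')
printf '%s\n' "$out"
```

Only the second line triggers the first action, while the counter is incremented for all three lines.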
First AWK Program
Consider the following simple awk program.
{ print $0 }
The action prints field 0 (the entire line).
$ awk -f myawk1 /etc/group
The general syntax of awk is as follows:
awk -Fchar '/pattern/ {action}' input_file
awk -Fchar -f awkscript input_file
By default, fields are delimited by spaces or tabs.
If the input uses a different delimiter, state it
explicitly using the -F option.
$ awk -F: -f myawk1 /etc/passwd
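A minimal sketch of the -F option, using a made-up passwd-style record instead of the real /etc/passwd (the user name and numbers are assumptions for illustration):

```shell
# Invented record in passwd format; a real system file will differ.
line='mukthar:x:501:505:Sample User:/home/mukthar:/bin/sh'

# -F: tells awk to split each line on colons instead of spaces or tabs.
out=$(printf '%s\n' "$line" | awk -F: '{ print $1, $3 }')
printf '%s\n' "$out"
```

With the colon as the delimiter, $1 is the login name and $3 is the UID.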
Searching For a String
To search for a string in an input line, specify it as a
pattern.
Patterns are enclosed using forward slash symbols.
# Searching for the pattern "mukthar"
/mukthar/ { print $0 }
$ awk -f myawk2 /etc/passwd
$ who | awk -f myawk2
Records and Fields
awk sees input as a table of rows and columns.
The table consists of rows that represent records.
The entire row is identified by $0.
Columns in the table are the fields. The columns are
identified as $1, $2, $3, and so on.
awk expects the fields to be delimited by spaces or
tabs. We can change the delimiter by using the -F
option.
# Searching for a condition
$3 == 501 { print $0 }
AWK Relational Operators
awk uses the following relational and logical operators.
Relational Operators
Operator   Meaning
==         Equal to
!=         Not equal to
>          Greater than
>=         Greater than or equal to
<          Less than
<=         Less than or equal to
~          Matches
!~         Does not match
Logical Operators
Operator   Meaning
&&         AND
||         OR
!          NOT
Formatted Output
awk uses printf( ) for formatted output.
It works like printf( ) in C.
# Formatted output
/mukthar/ { printf("UID = %d\tGID = %d\n", $3, $4) }
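As a rough illustration of printf( ) formatting, the sketch below adds a left-justified string column to the example above. The input record is made up, and the %-10s width is an arbitrary choice.

```shell
# Invented passwd-style record; only the first four fields matter here.
out=$(printf 'mukthar:x:501:505\n' |
  awk -F: '/mukthar/ { printf("%-10s UID = %d\tGID = %d\n", $1, $3, $4) }')
printf '%s\n' "$out"
```

%-10s pads the name to ten characters on the right, and %d prints the third and fourth fields as integers.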
BEGIN and END Statements
The keywords BEGIN and END are used to perform
specific actions relative to the program's execution.
BEGIN   Action performed before the first input line is read.
END     Action performed after all input lines have been
        processed.
# myawk4
# Searching for the pattern "mukthar"
BEGIN { print "Locating User mukthar" }
/mukthar/ { printf("UID = %d\tGID = %d\n", $3, $4) }
END { print "End of Report" }
$ awk -F: -f myawk4 /etc/passwd
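A small sketch of BEGIN and END framing a report: a heading is printed before any input is read and a total after the last line. The three numbers are invented sample data.

```shell
out=$(printf '10\n20\n30\n' | awk '
BEGIN { print "Report" }       # runs once, before the first input line
      { sum += $1 }            # runs for every input line
END   { print "Total:", sum }  # runs once, after the last input line
')
printf '%s\n' "$out"
```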
AWK Defined Variable
awk supports a number of pre-defined variables.
Pre-defined Variables
Variable   Meaning
NR         Number of the current input line (records read so far).
NF         Number of fields in the input line.
# myawk5
# Number of valid users
END { print "There are", NR, "users" }
$ awk -F: -f myawk5 /etc/passwd
User Defined Variable
awk supports user-defined variables.
There is no need to initialize a variable explicitly;
an unset variable acts as zero (or the empty string).
# myawk5b
# Counting users of the training group
$4 == 505 { training++ }
END { print "The number of users in";
      print "training group is", training
}
$ awk -F: -f myawk5b /etc/passwd
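The default initialization can be checked with a short sketch. The three records and the variable name other are assumptions made for this example; other is never assigned, so adding 0 to it shows its numeric default.

```shell
# Invented passwd-style records; the group id is the fourth field.
out=$(printf 'u1:x:1:505\nu2:x:2:100\nu3:x:3:505\n' | awk -F: '
$4 == 505 { training++ }   # training starts out unset and counts up from zero
END { print "training:", training, "other:", other + 0 }
')
printf '%s\n' "$out"
```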
Regular Expressions
awk provides more comprehensive pattern matching
through regular expressions.
These are similar to those supported by the UNIX / Linux
grep command.
# myawk6
# Searching for user1 to user5
BEGIN { print "Locating User1 to User5" }
/^user[1-5]/ { print $1, $3, $4 }
END { print "End of Report" }
$ awk -F: -f myawk6 /etc/passwd
Regular Expressions
Use the ~ or !~ match operators to match a field against a pattern.
# myawk7
# Searching for UIDs 501-505
BEGIN { print "Locating UIDs 501-505" }
$3 ~ /^50[1-5]$/ { print $1, $3, $4 }   # anchored so that 1501 or 5012 do not match
END { print "End of Report" }
Pattern   Meaning
.         Any one character (dot)
\         Despecialize (escape) the next character
[ ]       Any one character in the list (character class)
[^ ]      Any one character not in the list
^         Beginning of line (anchor)
$         End of line (anchor)
Regular Expressions
We can match a repeating pattern by adding a
modifier or repetition operator.
Regular expressions can have any of the following
modifiers.
Modifier   Meaning
?          Match the preceding character at most once
*          Zero or more occurrences of the preceding character
+          One or more occurrences of the preceding character
{n}        Match the preceding character exactly n times
{n,}       Match the preceding character at least n times
{n,m}      Match the preceding character at least n times but
           not more than m times
Alternative patterns can be specified using the
alternation separator | (pipe).
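The ?, + and | operators can be tried on a few made-up words. This sketch deliberately leaves out the {n,m} forms, since not every awk variant supports them by default.

```shell
out=$(printf 'color\ncolour\ncolouur\ncat\ndog\nbird\n' | awk '
/^colou?r$/   { print "optional u:", $0 }      # ? means zero or one u
/^(cat|dog)$/ { print "alternation:", $0 }     # | means either alternative
/^colou+r$/   { print "one or more u:", $0 }   # + means at least one u
')
printf '%s\n' "$out"
```

The line "colour" satisfies both the ? and the + pattern, so it is reported twice; "bird" matches nothing and is not printed.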
Concatenation Operator
Within a regular expression, the plus (+) symbol matches
one or more occurrences of the preceding character; the
pieces of a pattern written next to each other are
concatenated.
# myawk8
# Searching for the pattern $unix
BEGIN { print "Locating $Unix or $unix" }
$1 ~ /\$+[Uu]nix/ { print $0 }
END { print "End of Report" }
awk interprets any string or variable on the right
side of ~ or !~ as a regular expression.
Thus, the regular expression can be assigned to a
variable, and the variable can be used in pattern
matching.
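A sketch of a regular expression held in a variable. The variable name re and the sample words are assumptions; the dollar sign is written as [$] so that no backslash escaping is needed inside the string.

```shell
out=$(printf '$unix\n$Unix\nlinux\n' | awk '
BEGIN { re = "^[$]+[Uu]nix$" }    # the pattern is stored as an ordinary string
$1 ~ re { print "match:", $1 }    # the right side of ~ is used as a regular expression
')
printf '%s\n' "$out"
```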
Combining Patterns
Patterns can be combined to provide more
powerful and complex matching.
# myawk9
# Combining patterns
BEGIN { print "Combining Patterns" }
$1 == 486 && $5 > 250 { print $0 }
END { print "End of Report" }
A pattern range is specified by two patterns separated
by a comma.
The action is performed for each input line from the
first line matching the first pattern through the next
line matching the second pattern, inclusive.
# myawk9b
/user1/,/user8/ { print $0 }
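The range pattern can be tried on a short made-up list of names, which shows that both boundary lines are included:

```shell
out=$(printf 'user0\nuser1\nuser2\nuser8\nuser9\n' |
  awk '/user1/,/user8/ { print $0 }')
printf '%s\n' "$out"
```

The lines before user1 and after user8 are skipped.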
AWK Defined Variable - Again
awk supports a number of pre-defined variables.
Pre-defined Variables
Variable   Meaning
NR         Number of the current input line (records read so far).
NF         Number of fields in the input line.
FS         Input field separator.
FILENAME   Name of current input file.
FNR        Record number in current input file.
OFS        Output field separator.
ORS        Output record separator.
ARGC       Number of command line arguments.
ARGV       Array of command line arguments.
Examples
Examples on using pre-defined variables of awk.
# myawk10
# Print the first five input lines.
FNR == 1, FNR == 5 { print $0 }
# myawk11
# Print each input line with line number.
# Print the heading with file name.
# FILENAME is not yet set in BEGIN, so print the heading at the first line.
FNR == 1 { print "File :", FILENAME }
{ print NR, ":\t", $0 }
# myawk12
# ARGV[0] is the command name; the arguments proper start at ARGV[1].
BEGIN { print "There are", ARGC, "parameters on the command line";
print "The first argument is", ARGV[0];
print "The second argument is", ARGV[1];
}
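A brief sketch of NR, NF and OFS working together. The two input lines are made up, and the dash chosen for OFS is arbitrary.

```shell
out=$(printf 'a b c\nd e\n' | awk '
BEGIN { OFS = "-" }          # separator placed between the items of print
      { print NR, NF, $NF }  # line number, field count, last field
')
printf '%s\n' "$out"
```

$NF refers to the last field, because NF holds the number of fields in the current line.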
AWK Built In Arithmetic Functions
Following is a summary of awk's built-in arithmetic
functions.
All operations are done in floating-point format.
Arithmetic Functions
Name       Description
int(x)     Integer part of x
sqrt(x)    Square root of x
rand()     Random number between 0 and 1
srand(x)   x is a new seed for rand( )
exp(x)     Exponential function of x
log(x)     Natural logarithm of x
sin(x)     Sine of x, with x in radians
cos(x)     Cosine of x, with x in radians
Examples
Examples on built-in functions of awk.
# myawk13
# Print the square root of input value
{ print sqrt( $1 ) }
$ awk -f myawk13
2
1.41421
3
1.73205
4
2
If no data file is specified, awk reads from standard
input (i.e. the keyboard).
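A non-interactive variant of the same idea, sketched with piped input and printf( ) so the number of decimals is fixed. The input values are arbitrary.

```shell
out=$(printf '16\n2\n' |
  awk '{ printf("%d %.3f %d\n", $1, sqrt($1), int($1 / 3)) }')
printf '%s\n' "$out"
```

int( ) truncates toward zero, so 16 / 3 becomes 5 and 2 / 3 becomes 0.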
AWK Built In String Functions
Following is a summary of awk's built-in string
functions.
Strings are enclosed within double quotes ( " " ).
String Functions
Name            Description
length(s)       Returns length of s
substr(s,p)     Returns substring of s from position p
substr(s,p,n)   Returns substring of s from position p of
                length n
index(s,t)      Returns position of t in string s
match(s,r)      Returns position in s where regular expression r matches
split(s,a)      Splits s into elements of a defined by FS
split(s,a,fs)   Splits s into elements of a defined by fs
AWK Built In String Functions
String Functions (continued)
Name                     Description
gsub(r,s)                Substitutes s in place of r in $0 globally.
                         Returns the number of substitutions made.
gsub(r,s,t)              Substitutes s in place of r in t globally. Returns the
                         number of substitutions made.
sub(r,s)                 Substitutes s for first r in $0.
sub(r,s,t)               Substitutes s for first r in t.
sprintf(fmt,expr-list)   Returns expression list formatted according
                         to format string specified by fmt.
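Several of the string functions can be exercised in one BEGIN block. The strings and variable names below are invented for illustration.

```shell
out=$(awk 'BEGIN {
    s = "hello world"
    print length(s)                  # number of characters in s
    print substr(s, 7, 5)            # 5 characters starting at position 7
    print index(s, "wor")            # position of the substring
    n = split("a:b:c", parts, ":")   # n is the number of pieces
    print n, parts[2]
    c = gsub(/o/, "0", s)            # replaces every o in s, returns the count
    print c, s
}')
printf '%s\n' "$out"
```

Positions are counted from 1, so the "w" of "world" is at position 7.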
Control Flow Statements
awk provides constructs to implement selection and
iteration.
awk control flow statements are similar to C
language constructs.
# myawk15
# Finding biggest disk
{
    if (disksize < $5)
    {
        disksize = $5;
        computer = $0
    }
}
END { print computer }
$ awk -f myawk15 comp_data
Control Flow Statements - Examples
# myawk16
# Print every other field (1, 3, 5, ...) for 286 computers
BEGIN { printf("Type\tLoc\tDisk\n"); }
/286/ { field = 1;
        while (field <= NF)
        {
            printf("%s\t", $field);
            field += 2;
        }
        print "";
}
$ awk -f myawk16 comp_data
Control Flow Statements - 2
We are already familiar with the break and continue
statements.
The next statement skips to the next input line, then
restarts from the first pattern-action statement.
# myawk17
# Print out computer type 286 using next
{
    if ($1 != 286)
        next;
    print $0
}
$ awk -f myawk17 comp_data
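The same filtering can also be sketched with next in its own pattern-action statement. The three records are invented stand-ins for comp_data.

```shell
out=$(printf '286 a\n486 b\n286 c\n' | awk '
$1 != 286 { next }     # skip the remaining statements for this line
          { print $0 } # reached only by the 286 lines
')
printf '%s\n' "$out"
```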
Arrays in AWK
awk provides single-dimensional arrays.
Arrays need not be declared; they are created on first use.
Arrays are heterogeneous.
Array subscripts are strings.
# Array Examples
ARRAY["e001"] = 100
print ARRAY["e001"]
ARRAY["num"] and ARRAY[num] are not the same.
ARRAY[1] and ARRAY["1"] are the same.
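The subscript rules can be checked with a short sketch. The array name a and the values are assumptions made for this example.

```shell
out=$(awk 'BEGIN {
    a[1] = "one"
    print a["1"]              # same element as a[1]
    num = "k"
    a["num"] = "literal"      # subscript is the string num
    a[num] = "variable"       # subscript is the value of num, that is k
    print a["num"], a["k"]
}')
printf '%s\n' "$out"
```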
Arrays - Examples
# myawk18
# diskspace[] holds the sum of the disk space for all
# computers
# computers[] holds number of computers of specific type
$1 == "486" { computers["486"]++ }
$5 > 0 { diskspace[0] += $5 }
END {
    print "Number of 486 computers :", computers[486];
    print "Total disk space :", diskspace[0];
}
$ awk -f myawk18 comp_data
Arrays in AWK
Checking and removing array elements.
To check for a subscript in an array, awk provides
the in operator.
To remove an array element, awk provides the
delete statement.
# Array Examples
if ("e001" in ARRAY)
    delete ARRAY["e001"]
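A runnable sketch of in and delete, placed in a BEGIN block so that no input file is needed:

```shell
out=$(awk 'BEGIN {
    ARRAY["e001"] = 100
    if ("e001" in ARRAY) print "present"
    delete ARRAY["e001"]
    if (!("e001" in ARRAY)) print "deleted"
}')
printf '%s\n' "$out"
```

Testing with in does not create the element, unlike a plain reference such as ARRAY["e001"].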
for Statement
awk provides a for construct for iterating over arrays.
Syntax:
for ( var in array ) statement(s)
The var takes one subscript of the array at a time, and
the statement is executed for each one until all the
array elements are exhausted. The order is unspecified.
for - Examples
# myawk19
# Counting number of computers of all types
{ computers[$1]++ }
END {
    for (name in computers)
        print "Number of", name, "computers is", computers[name]
}
$ awk -f myawk19 comp_data
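Because the order of a for ( var in array ) loop is unspecified, a sketch that needs stable output can pipe the result through sort. The input below is a made-up stand-in for the first column of comp_data.

```shell
out=$(printf '286\n486\n286\n386\n' | awk '
{ computers[$1]++ }
END { for (name in computers) print name, computers[name] }
' | sort)
printf '%s\n' "$out"
```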
User Defined Functions
awk supports user defined functions.
Syntax:
function name(arg_list)
{
statements
}
When calling a function, there must be NO space
between the function name and the left parenthesis of
the argument list.
The return statement is used to return a value from the
function.
UDF - Examples
# myawk20
# Finding factorial
function factorial(n) {
    if (n <= 1) return 1
    else return n * factorial(n - 1)
} # End of function
{
    print "Factorial of", $1, "is", factorial($1)
}
$ awk -f myawk20
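Another small user-defined function, sketched with piped input. The function name max and the sample numbers are assumptions for this example; adding 0 forces the fields to be compared as numbers.

```shell
out=$(printf '3 9\n7 2\n' | awk '
function max(a, b) { return (a > b) ? a : b }
{ print max($1 + 0, $2 + 0) }
')
printf '%s\n' "$out"
```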
Output To File
awk output generated by print or printf can be
redirected to a file using the > redirection operator.
The name of the file MUST be in quotes.
# myawk21
# Output to file
$1 == "486" {
    print "Type = ", $1, "Location = ", $3 > "comp486.dat"
}
$ awk -f myawk21 comp_data
The output of awk programs can also be piped into a
UNIX / Linux command. This is termed Output To Pipes.
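Output to a pipe can be sketched as follows: each print sends its output to the sort command rather than to standard output. The fruit names are made-up input.

```shell
out=$(printf 'pear\napple\nfig\n' | awk '
    { print $1 | "sort" }   # the command name must be in quotes
END { close("sort") }       # close the pipe so sort can finish and print
')
printf '%s\n' "$out"
```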
AWK Input getline Function
The awk getline function reads input from the following:
Current Pipe
Current File
Specific File
Internal Pipe
# myawk22
# Using getline function
{
    if ($1 == "486") {
        FIRSTLINE = $0;
        getline;
        SECONDLINE = $0
    }
}
$ awk -f myawk22 comp_data
AWK Input getline Function
getline reads the next line and sets $0 and NF.
Moreover, it increments NR and FNR.
# myawk23
{
    print NR, $0
    getline;
    print NR, $0
}
$ echo 100 200 300 400 500 600 | awk -f myawk23
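The effect of a plain getline on $0 and NR can be sketched with four made-up input lines, one value per line. The return value is checked so that nothing is printed if the input ends early.

```shell
out=$(printf '100\n200\n300\n400\n' | awk '
{
    first = $0
    if ((getline) > 0)         # loads the next line into $0 and adds one to NR
        print NR, first, $0
}')
printf '%s\n' "$out"
```

Each pass of the main rule consumes two lines, so NR is 2 and then 4 when the print runs.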
AWK Input getline Function
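getline var      Read next line into var, increment NR and FNR
getline < file   Read next line from file. Set $0, NF
# myawk24
# Using getline function
# The file name must be a quoted string; the > 0 test stops the
# loop at end of file or if the file cannot be read.
BEGIN {
    while ((getline < "data") > 0)
        print $0;
}
$ awk -f myawk24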
AWK Input getline Function
getline var < file   Read next line from file into var (only var is set)
cmd | getline        Read next line from cmd. Set $0, NF
cmd | getline var    Read next line from cmd into var
# myawk25
# Using getline function
# who prints the user name in field 1 and the terminal in field 2.
BEGIN {
    while (("who" | getline) > 0)
        print "user", $1, "tty", $2;
}
$ awk -f myawk25
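Since the output of who depends on the machine, the cmd | getline var form can be sketched with a command whose output is fixed. The two echo commands and the variable names are assumptions for this example.

```shell
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"                 # any shell command line will do
    while ((cmd | getline line) > 0) {         # reads one line of output per pass
        n++
        last = line
    }
    close(cmd)
    print n, last
}')
printf '%s\n' "$out"
```

Comparing the result with 0 ends the loop cleanly at end of output and also when the command fails.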