Understanding Awk – Practical Guide

After covering sed in detail, it is also good to know awk (gawk), a programmable stream editor.

Awk helps with manipulating structured data and generating reports. It is actually a programming language with a syntax similar to C. An awk program is built from three ‘blocks’ of instructions (BEGIN, the main loop and END), and it selects lines with patterns in much the same way as line addressing in sed.

awk features

  • The ability to look upon a text file as a series of records
  • Variables
  • Arithmetic (floating point too) and string operators
  • Loops and conditions
  • Generate formatted reports
  • Define functions
  • Execute UNIX commands directly from scripts
  • Process the output of UNIX commands directly
  • Process command line arguments
  • Work with multiple input streams
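Some of these features do not appear in the examples later in this guide, so here is a minimal sketch of three of them: a user-defined function, reading the output of a UNIX command with getline, and running a command with system(). The function name max and the echo commands are invented for illustration only.

```shell
out=$(awk '
# user-defined function (hypothetical name): returns the larger of two numbers
function max(a, b) { return a > b ? a : b }

BEGIN {
	print "max:", max(3, 7)

	# read the first output line of a UNIX command into a variable
	"echo hello" | getline word
	close("echo hello")
	print "from command:", word

	# flush what was printed so far, then run a UNIX command directly
	fflush()
	system("echo done")
}')
echo "$out"
```

The explicit fflush() keeps the lines in order when the output is redirected to a pipe or a file.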

Programming model

Three ‘blocks’ of instructions are used in awk:

  • BEGIN, executed before the first input line is read
  • The main loop, executed for each line of input
  • END, executed after the last input line has been read

The BEGIN and END blocks are optional.
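The main loop can be left out as well. As a small sketch (the sample lines are the emp data used in this guide, fed in through printf), a program may consist of a BEGIN block alone, which runs without reading any input, or of an END block alone, which here reports the number of lines:

```shell
# BEGIN only: no input is read, so no file argument is needed
greeting=$(awk 'BEGIN { print "no input needed" }')
echo "$greeting"

# END only: every record is read but nothing is done with it,
# so NR ends up holding the number of lines
count=$(printf 'avi 1200 haifa\ndani 2300 tel aviv\nrina 3100 aco\n' | awk 'END { print NR }')
echo "$count"
```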

Each input line is treated as a record, referred to as $0, and each word (delimited by spaces or tabs) is treated as a field. Fields are referenced with a “$” ($1 is the first field, $2 the second, and so on).

Simple Example:

# cat emp 
avi 1200 haifa
dani 2300 tel aviv
rina 3100 aco

# awk '{ print $1,"-",$3 }' emp 
avi - haifa
dani - tel
rina - aco
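
In the output above, "tel aviv" is cut to "tel" because the space inside the city name starts a fourth field. One possible workaround, sketched here with the same data fed in through printf, is to loop over the remaining fields up to NF (the number of fields on the line):

```shell
cities=$(printf 'avi 1200 haifa\ndani 2300 tel aviv\nrina 3100 aco\n' | awk '{
	# join every field from the third one onward with single spaces
	city = $3
	for (i = 4; i <= NF; i++)
		city = city " " $i
	print $1, "-", city
}')
echo "$cities"
```

Another option is to use a separator that cannot occur inside a value, such as a comma.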

Simply print the file:

awk ' { print } ' filename

Use BEGIN and END:

# awk 'BEGIN { print "Customers List:\n==="} { print } END { print "====\nnum:" NR }' emp

Customers List:
===
avi 1200 haifa
dani 2300 tel aviv
rina 3100 aco
====
num:3

Writing a script:

#!/usr/bin/awk -f
{ print $1,"-",$3 }

Save the script as simp, make it executable (chmod +x simp) and run it:

# ./simp emp
avi - haifa
dani - tel
rina - aco

Script with blocks:

#! /usr/bin/awk -f

BEGIN { 
	print "Customers List:"
	print "==============="
} 

{ print NR , "-" ,$0 } 

END { 
	print "========="
	print "num:" NR 
}

Run it:

# ./simple emp
Customers List:
===============
1 - avi 1200 haifa
2 - dani 2300 tel aviv
3 - rina 3100 aco
=========
num:3

Note that NR gives the current line number in the main block and the total number of lines in the END block.

Line Addressing

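Commands can be restricted to lines that match a pattern. Every pattern is tested against every line, so a single line can trigger more than one action: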

#!/usr/bin/awk -f
BEGIN    { print "Header" }

  /[0-9]+/    { print "Found Number" }
  /[A-Za-z]+/ { print "Found Word"   }
  /^$/        { print "Found Blank line"  }

END      { print "Footer" }

Use it:

# cat ./uselinead 
hello 100
233
bye

hi

20
# ./linead uselinead 
Header
Found Number
Found Word
Found Number
Found Word
Found Blank line
Found Word
Found Blank line
Found Number
Footer
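
Patterns do not have to be regular expressions. As a rough sketch using the emp data from earlier (passed in through a shell variable, which is an assumption of this example), a pattern can also be a comparison on a field, a test on the record number, or a range between two patterns:

```shell
data='avi 1200 haifa
dani 2300 tel aviv
rina 3100 aco'

# comparison on a field: second field greater than 2000
high=$(printf '%s\n' "$data" | awk '$2 > 2000 { print $1 }')

# record number: only the second line
second=$(printf '%s\n' "$data" | awk 'NR == 2 { print $1 }')

# range: from the line starting with avi through the line starting with dani
range=$(printf '%s\n' "$data" | awk '/^avi/,/^dani/ { print $1 }')

echo "$high"; echo "$second"; echo "$range"
```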

 

Predefined awk values

  • FS – Field separator – default spaces and tabs
  • OFS – Output field separator – default space
  • RS – Record separator – default newline
  • ORS  – Output record separator – default newline
  • OFMT – Output format – default “%.6g”

Variables maintained by awk while it reads the input:

  • NF – Number of fields, i.e. the number of words on the current line
  • NR – Number of records, i.e. the number of lines read so far
  • FILENAME – The name of the file currently being processed
  • FNR – Line number within the current file (available in nawk, gawk and POSIX awk, but not in the original awk)
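
The difference between NR and FNR only shows up with more than one input file. The following sketch creates two small throwaway files (the names one and two and their contents are invented for this example) and prints all four values for every line:

```shell
dir=$(mktemp -d)
printf 'a b c\nd e\n' > "$dir/one"
printf 'f\n' > "$dir/two"

# NR keeps counting across files, FNR restarts at 1 for each file
report=$(cd "$dir" && awk '{ print FILENAME, NR, FNR, NF }' one two)
echo "$report"
rm -r "$dir"
```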

Example:

useemp

#!/usr/bin/awk -f
BEGIN{ FS=","} {print $3,$2}

useemp2

#!/usr/bin/awk -f
BEGIN{ FS=","; OFS="*"} {print $3,$2}

Run it:

# cat ./emp2
avi,1200,haifa
dani,2300,tel aviv
rina,3100,aco

# ./useemp emp2
haifa 1200
tel aviv 2300
aco 3100

# ./useemp2 emp2
haifa*1200
tel aviv*2300
aco*3100
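
The same settings can also be given on the command line instead of in a BEGIN block. This sketch uses the standard -F option (field separator) and -v option (variable assignment) and feeds the emp2 data in through printf:

```shell
swapped=$(printf 'avi,1200,haifa\ndani,2300,tel aviv\nrina,3100,aco\n' | awk -F, -v 'OFS=*' '{ print $3, $2 }')
echo "$swapped"
```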

 

Variables

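Variables are not declared, just given names and values. An uninitialized variable holds the empty string, which counts as zero in arithmetic. Whether a value is treated as a string or as a number depends on what was assigned to it and on how it is used.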
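
A short sketch of this behaviour (the variable names x and v are arbitrary):

```shell
result=$(awk 'BEGIN {
	# x was never assigned: it acts as 0 in arithmetic and as "" in a string
	print x + 5
	print "[" x "]"

	# the same variable can hold a number first and a string later
	v = 10; v = v * 2; print v
	v = "ten"; print v
}')
echo "$result"
```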

Example: calculate the total size of a set of files:

#!/usr/bin/awk -f
	{ print; numfiles=numfiles + 1; numbytes=numbytes + $5 }
END	{ print numfiles, "files,", numbytes, "bytes" }

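Run it through a pipe. Note that the “total 56” line is counted as well, so the script reports 15 files for the 14 that are listed: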

ls -l | ./calcsize
total 56
-rw-rw-r-- 1 developer developer 187 Aug 16  2017 avg
-rwxrwxr-x 1 developer developer 310 Aug 16  2017 avg.awk
-rwxrwxr-x 1 developer developer 117 Aug 16  2017 calcsize
-rwxrwxr-x 1 developer developer 382 Aug 14  2017 checkops
-rw-rw-r-- 1 developer developer  48 Aug 16  2017 emp
-rw-rw-r-- 1 developer developer  48 Feb 16 09:39 emp2
-rwxrwxr-x 1 developer developer 198 Feb 16 09:24 linead
-rw-rw-r-- 1 developer developer 254 Aug 14  2017 oplist
-rwxrwxr-x 1 developer developer 154 Feb 16 09:16 simple
-rwxrwxr-x 1 developer developer 148 Aug 16  2017 simple2
-rwxrwxr-x 1 developer developer  49 Feb 16 09:37 useemp
-rwxrwxr-x 1 developer developer  41 Feb 16 08:47 useemp1
-rwxrwxr-x 1 developer developer  57 Feb 16 09:37 useemp2
-rw-rw-r-- 1 developer developer  26 Feb 16 09:25 uselinead
15 files, 2019 bytes
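
The file count above is one too high because the summary line that ls -l prints first goes through the main block like any other line. A possible variant, shown only as a sketch, skips that line before counting. It assumes the summary line starts with the word "total", as it does in the listing above, and it is run here on a shortened copy of that listing:

```shell
listing='total 56
-rw-rw-r-- 1 developer developer 187 Aug 16  2017 avg
-rwxrwxr-x 1 developer developer 310 Aug 16  2017 avg.awk
-rwxrwxr-x 1 developer developer 117 Aug 16  2017 calcsize'

summary=$(printf '%s\n' "$listing" | awk '
	# the summary line that ls -l prints first is not a file
	$1 == "total" { next }
	{ numfiles++; numbytes += $5 }
END	{ print numfiles, "files,", numbytes, "bytes" }')
echo "$summary"
```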

Another example

Given the following file:

Name             CM Ph Cmp Math
avi levy         68 72 91  73
eli cohen        31 59 73  87
bibi netanyahu   83 80 89  61
donald tramp     53 72 78  93
Julia roberts    69 68 79  89

And the awk script:

#!/usr/bin/awk -f
BEGIN {
	print "grades report"
	print "============="
}

# skip the header line
NR == 1 { next }

{
	lines++
	fullname = $1 " " $2
	# average of the four grades for this student
	print fullname, ($3 + $4 + $5 + $6) / 4
	# running sums per subject
	sum1 += $3
	sum2 += $4
	sum3 += $5
	sum4 += $6
}

END {
	print ""
	print "Totals"
	print "======"
	# per-subject averages over all students
	print sum1/lines, sum2/lines, sum3/lines, sum4/lines
}

Run it (the line under “Totals” holds the average grade per subject):

# ./avg.awk ./avg
grades report
=============
avi levy 76
eli cohen 62.5
bibi netanyahu 78.25
donald tramp 74
Julia roberts 76.25

Totals
======
60.8 70.2 82 80.6
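
For a report with aligned columns, printf can replace print. The sketch below is not part of the original script; it reuses the header and the first two rows of the grades file and formats the name into a 16-character column and the average with two decimals:

```shell
report=$(printf 'Name CM Ph Cmp Math\navi levy 68 72 91 73\neli cohen 31 59 73 87\n' | awk '
NR == 1 { next }	# skip the header line
{
	# %-16s pads the name to 16 columns, %6.2f prints two decimals
	printf("%-16s%6.2f\n", $1 " " $2, ($3 + $4 + $5 + $6) / 4)
}')
echo "$report"
```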

 

Conditions and Loops

The syntax is similar to C.

Given the following input file:

# Year : Month : Day : Customer : D / W : Amount
2015:11:9:Joe:W:5.00
2015:11:12:Mary:W:5.50
2015:12:10:Joe:W:10.00
2015:12:15:Mary:W:10.00
2016:1:2:Hank:W:35.00
2016:1:31:David:D:100.00

Using loops and conditions:

#! /usr/bin/awk -f
# Year : Month : Day : Customer : D / W : Amount
BEGIN { FS = ":" }

# skip lines starting with #
/^[#]/	{ next }	

# simple conditions 
$5 == "W" { withdrawals[$4] += $6 }
$5 == "D" { deposits[$4] += $6 }

END {
	print "Deposit totals:"
	for (i in deposits)
		printf("\t%s: $%g\n", i, deposits[i])
	print ""
	print "Withdrawal totals:"
	for (i in withdrawals)
		if(withdrawals[i] > 15)
			printf("\t%s: $%g\n", i, withdrawals[i])
}

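Run it (the order in which for (i in ...) visits the keys is not specified, so the names may come out in a different order):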

# ./checkops ./oplist 
Deposit totals:
	David: $100

Withdrawal totals:
	Hank: $35
	Mary: $15.5
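
The script above uses the for (i in array) form. The C-style for, while and if / else statements work as well, as this small sketch with made-up numbers shows:

```shell
out=$(awk 'BEGIN {
	# C-style for loop with if / else
	for (i = 1; i <= 4; i++) {
		if (i % 2 == 0)
			print i, "even"
		else
			print i, "odd"
	}

	# while loop: sum the numbers 1 to 10
	n = 1
	while (n <= 10) {
		total += n
		n++
	}
	print "total", total
}')
echo "$out"
```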

 

 


5 thoughts on “Understanding Awk – Practical Guide”

  1. Thanks man. I always wanted to understand awk and your article is a nice summarization about how it works and how to use it.

  2. Thanks, informative read!

    “The BEGIN and END procedures are optional”

    I noticed that the main loop is optional too. You could for example get the number of lines in a file with just “END { print NR }”.

  3. Yo this site is AWESOME! New to data manipulation scripts and trying to find a nice, simple, accurate intro to awk’s been a challenge!

  4. […] via Understanding Awk – Practical Guide – Developers Area […]

  5. […] at the Developers Area, Liran B.H has a nice practical guide to understanding AWK. There’s a whole book dedicated to AWK so obviously a single blog post isn’t going to cover […]
