After covering sed in detail, it is also good to know awk (gawk), a programmable stream editor.
Awk helps with manipulating structured data and generating reports. It is actually a programming language with a syntax similar to C. Awk uses three ‘blocks’ of instructions (BEGIN, the main loop and END) and a line-addressing principle similar to that of sed.
awk features
- The ability to look upon a text file as a series of records
- Variables
- Arithmetic (floating point too) and string operators
- Loops and conditions
- Generate formatted reports
- Define functions
- Execute UNIX commands directly from scripts
- Process the output of UNIX commands directly
- Process command line arguments
- Work with multiple input streams
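Two of the features listed above, user-defined functions and formatted output, can be sketched in a few lines. The function name raise and the 10 percent figure are made up for this illustration, and the input is fed through printf so the snippet does not depend on any file:

```shell
# Hypothetical sample records: name, salary, city.
out=$(printf 'avi 1200 haifa\ndani 2300 tel aviv\nrina 3100 aco\n' | awk '
# User-defined function (illustrative name): n increased by pct percent.
function raise(n, pct) { return n * (1 + pct / 100) }
# Main block: formatted report line for every record.
{ printf "%s %.2f\n", $1, raise($2, 10) }
')
echo "$out"
```

Using printf with an explicit format keeps the result independent of the OFMT setting.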
Programming model
Three ‘blocks’ of instructions are used in awk:
- BEGIN, executed before the first input line is read
- The main loop executed for each line of input
- END executed after the last input line has been read
- The BEGIN and END procedures are optional
Each input line is treated as a record, referred to as $0 and each word (delimited by spaces or tabs) is treated as a field. Fields are referenced by using a “$” ($1 – first field, $2 – second, and so on).
Simple Example:
# cat emp
avi 1200 haifa
dani 2300 tel aviv
rina 3100 aco
# awk '{ print $1,"-",$3 }' emp
avi - haifa
dani - tel
rina - aco
Simply print the file:
awk ' { print } ' filename
Use BEGIN and END:
# awk 'BEGIN { print "Customers List:\n==="} { print } END { print "====\nnum:" NR }' emp
Customers List:
===
avi 1200 haifa
dani 2300 tel aviv
rina 3100 aco
====
num:3
Writing a script:
#!/usr/bin/awk -f
{ print $1,"-",$3 }
Run it:
# simp emp
avi - haifa
dani - tel
rina - aco
Script with blocks
#! /usr/bin/awk -f
BEGIN {
    print "Customers List:"
    print "==============="
}
{ print NR , "-" ,$0 }
END {
    print "========="
    print "num:" NR
}
Run it:
# ./simple emp
Customers List:
===============
1 - avi 1200 haifa
2 - dani 2300 tel aviv
3 - rina 3100 aco
=========
num:3
Note that the line numbers come from NR.
Line Addressing
Commands can be restricted to lines that match a pattern:
#!/usr/bin/awk -f
BEGIN { print "Header" }
/[0-9]+/ { print "Found Number" }
/[A-Za-z]+/ { print "Found Word" }
/^$/ { print "Found Blank line" }
END { print "Footer" }
Use it:
# cat ./uselinead
hello 100
233 bye

hi

20
# ./linead uselinead
Header
Found Number
Found Word
Found Number
Found Word
Found Blank line
Found Word
Found Blank line
Found Number
Footer
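Patterns are not limited to regular expressions matched against the whole line. As a small sketch (the sample records are piped in with printf and are only illustrative), a pattern can also be a comparison on a field, a range of records, or a regular expression applied to a single field:

```shell
out=$(printf 'avi 1200 haifa\ndani 2300 tel aviv\nrina 3100 aco\n' | awk '
$2 > 2000        { print "high:", $1 }            # comparison on field 2
NR == 2, NR == 3 { print "range:", $1 }           # range pattern: records 2 through 3
$3 ~ /^tel/      { print "regex on field:", $1 }  # regex matched against field 3 only
')
echo "$out"
```

Each record is tested against every pattern in order, so one record can trigger several actions.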
Predefined awk values
- FS – Field separator – default spaces and tabs
- OFS – Output field separator – default space
- RS – Record separator – default newline
- ORS – Output record separator – default newline
- OFMT – Output format – default “%.6g”
Variables maintained by awk:
- NF – Number of Fields, i.e. the number of words on the current line
- NR – Number of Records, i.e. the number of lines read so far
- FILENAME – The name of the current file being processed
- FNR – Current line number in the current file (not available in the original awk; supported by nawk, gawk and POSIX awk)
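The difference between NR and FNR only shows up with more than one input file. A minimal sketch, assuming two throwaway files (one.txt and two.txt are names invented for this example) created in a temporary directory:

```shell
# Create two small hypothetical input files in a scratch directory.
dir=$(mktemp -d)
printf 'a b c\nd e\n' > "$dir/one.txt"
printf 'f\n' > "$dir/two.txt"

# Print file name, global record number, per-file record number, field count.
out=$(cd "$dir" && awk '{ print FILENAME, NR, FNR, NF }' one.txt two.txt)
echo "$out"

rm -r "$dir"
```

NR keeps counting across files, while FNR restarts at 1 for each new file.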
Example:
useemp
#!/usr/bin/awk -f
BEGIN{ FS=","}
{print $3,$2}
useemp2
#!/usr/bin/awk -f
BEGIN{ FS=","; OFS="*"}
{print $3,$2}
Run it:
# cat ./emp2
avi,1200,haifa
dani,2300,tel aviv
rina,3100,aco
# ./useemp emp2
haifa 1200
tel aviv 2300
aco 3100
# ./useemp2 emp2
haifa*1200
tel aviv*2300
aco*3100
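The same separators can also be set from the command line instead of a BEGIN block. A short sketch using the standard -F option (field separator) and -v (variable assignment before the program starts), with the data piped in rather than read from emp2:

```shell
# -F, sets FS to a comma; -v OFS='*' sets the output field separator.
out=$(printf 'avi,1200,haifa\ndani,2300,tel aviv\n' | awk -F, -v OFS='*' '{ print $3, $2 }')
echo "$out"
```

This is handy for one-liners where writing a separate script file would be overkill.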
Variables
Variables are not declared; they are simply given names and values. An uninitialised variable acts as zero in a numeric context and as an empty string in a string context. The type (string or number) follows from the assignment.
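A quick way to see this behaviour is a BEGIN-only program, which needs no input at all. The variable names x and y are arbitrary:

```shell
out=$(awk 'BEGIN {
    print "[" x "]"     # x was never assigned: empty string in string context
    print x + 0         # the same variable in numeric context
    y = "3"             # y holds a string ...
    print y + 2         # ... but is converted to a number for arithmetic
}')
echo "$out"
```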
Example: calculate the total size of the files in a directory:
#!/usr/bin/awk -f
{ print; numfiles=numfiles + 1; numbytes=numbytes + $5 }
END { print numfiles, "files,", numbytes, "bytes" }
Run it using a pipe:
ls -l | ./calcsize
total 56
-rw-rw-r-- 1 developer developer 187 Aug 16 2017 avg
-rwxrwxr-x 1 developer developer 310 Aug 16 2017 avg.awk
-rwxrwxr-x 1 developer developer 117 Aug 16 2017 calcsize
-rwxrwxr-x 1 developer developer 382 Aug 14 2017 checkops
-rw-rw-r-- 1 developer developer 48 Aug 16 2017 emp
-rw-rw-r-- 1 developer developer 48 Feb 16 09:39 emp2
-rwxrwxr-x 1 developer developer 198 Feb 16 09:24 linead
-rw-rw-r-- 1 developer developer 254 Aug 14 2017 oplist
-rwxrwxr-x 1 developer developer 154 Feb 16 09:16 simple
-rwxrwxr-x 1 developer developer 148 Aug 16 2017 simple2
-rwxrwxr-x 1 developer developer 49 Feb 16 09:37 useemp
-rwxrwxr-x 1 developer developer 41 Feb 16 08:47 useemp1
-rwxrwxr-x 1 developer developer 57 Feb 16 09:37 useemp2
-rw-rw-r-- 1 developer developer 26 Feb 16 09:25 uselinead
15 files, 2019 bytes
Note that the "total 56" line is counted as a record too, so the reported count is one higher than the 14 files listed.
Another example
Given the following file:
Name CM Ph Cmp Math
avi levy 68 72 91 73
eli cohen 31 59 73 87
bibi netanyahu 83 80 89 61
donald tramp 53 72 78 93
Julia roberts 69 68 79 89
And the awk script:
#!/usr/bin/awk -f
BEGIN {
    print "grades report"
    print "============="
}
# skip the header line
NR == 1 { next }
{
    lines++
    fullname = $1 " " $2
    # per-student average of the four grades
    print fullname, ($3 + $4 + $5 + $6) / 4
    sum1 += $3; sum2 += $4; sum3 += $5; sum4 += $6
}
END {
    print ""
    print "Totals"
    print "======"
    # per-subject averages over all students
    print sum1/lines, sum2/lines, sum3/lines, sum4/lines
}
Run it:
# ./avg.awk ./avg
grades report
=============
avi levy 76
eli cohen 62.5
bibi netanyahu 78.25
donald tramp 74
Julia roberts 76.25

Totals
======
60.8 70.2 82 80.6
Conditions and Loops
The syntax is similar to C
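As a small illustration of the C-like control flow, the sketch below uses a for loop, an if/else and a while loop on two made-up input numbers. The variable names total, kind and steps are chosen only for this example:

```shell
out=$(printf '3\n6\n' | awk '{
    total = 0
    for (i = 1; i <= $1; i++)      # C-style for loop: sum 1..$1
        total += i
    if (total % 2 == 0)            # if/else as in C
        kind = "even"
    else
        kind = "odd"
    n = $1; steps = 0
    while (n > 1) {                # while loop: count integer halvings down to 1
        n = int(n / 2)
        steps++
    }
    print $1, total, kind, steps
}')
echo "$out"
```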
Given the following input file:
# Year : Month : Day : Customer : D / W : Amount
2015:11:9:Joe:W:5.00
2015:11:12:Mary:W:5.50
2015:12:10:Joe:W:10.00
2015:12:15:Mary:W:10.00
2016:1:2:Hank:W:35.00
2016:1:31:David:D:100.00
Using loops and conditions:
#! /usr/bin/awk -f
# Year : Month : Day : Recipient : D / W : Amount
BEGIN { FS = ":" }
# skip lines starting with #
/^[#]/ { next }
# simple conditions
$5 == "W" { withdrawals[$4] += $6 }
$5 == "D" { deposits[$4] += $6 }
END {
    print "Deposit totals:"
    for (i in deposits)
        printf("\t%s: $%g\n", i, deposits[i])
    print ""
    print "Withdrawal totals:"
    # report only withdrawals above 15
    for (i in withdrawals)
        if (withdrawals[i] > 15)
            printf("\t%s: $%g\n", i, withdrawals[i])
}
Run it:
# ./checkops ./oplist
Deposit totals:
        David: $100

Withdrawal totals:
        Hank: $35
        Mary: $15.5
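The order in which a for (i in array) loop visits its keys is not specified, so the order of names in a report like this one can differ between awk implementations. One possible way to get stable output, sketched here with a simplified two-field version of the data (name and amount only, an assumption for brevity), is to pipe the printf output through the external sort command from inside awk:

```shell
out=$(printf 'Joe:5\nMary:5.5\nJoe:10\nMary:10\nHank:35\n' | awk -F: '
{ total[$1] += $2 }                 # accumulate amount per name
END {
    for (name in total)             # visiting order is unspecified ...
        printf("%s: $%g\n", name, total[name]) | "sort"   # ... so sort the lines
    close("sort")                   # flush and wait for sort to finish
}')
echo "$out"
```

This also shows the feature mentioned earlier of running UNIX commands directly from a script.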