Understanding Awk – Practical Guide

After covering sed in detail, it is also good to know awk (gawk), a programmable text-processing tool.

Awk helps with manipulating structured data and generating reports. It is actually a programming language with a syntax similar to C. An awk program uses three ‘blocks’ of instructions (BEGIN, the main loop and END), and it restricts commands to particular lines on a principle similar to line addressing in sed.

awk features

  • The ability to look upon a text file as a series of records
  • Variables
  • Arithmetic (floating point too) and string operators
  • Loops and conditions
  • Generate formatted reports
  • Define functions
  • Execute UNIX commands directly from scripts
  • Process the output of UNIX commands directly
  • Process command line arguments
  • Work with multiple input streams

Programming model

Three ‘blocks’ of instructions are used in awk:

  • BEGIN, executed before the first input line is read
  • The main loop, executed for each line of input
  • END, executed after the last input line has been read
  • The BEGIN and END procedures are optional

Each input line is treated as a record, referred to as $0, and each word (delimited by spaces or tabs) is treated as a field. Fields are referenced with a “$”: $1 is the first field, $2 the second, and so on.
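To make records and fields concrete, here is a minimal sketch. The two input lines are made-up sample data, not taken from the original article; each one becomes a record, and $0, $1 and $2 pick out the whole line and its first two words.

```shell
# Two whitespace-separated records (hypothetical sample data).
out=$(printf 'alice 30 london\nbob 25 paris\n' |
  awk '{ print "record=" $0 " | first=" $1 " | second=" $2 }')
printf '%s\n' "$out"
```

For the first line this prints "record=alice 30 london | first=alice | second=30".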

Simple Example:

Simply print the file:
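The original listing is not reproduced here, so the following is only an illustrative sketch. It assumes a small temporary file created on the spot; a main block containing a bare print outputs every record unchanged.

```shell
# Hypothetical sample file, created only for this sketch.
file=$(mktemp)
printf 'first line\nsecond line\n' > "$file"

# A bare print outputs $0, i.e. the whole record, for every input line.
out=$(awk '{ print }' "$file")
printf '%s\n' "$out"
rm -f "$file"
```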

Use BEGIN and END:
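As a rough sketch of the three blocks working together (the input words are invented for illustration): BEGIN runs once before any input is read, the unnamed block runs for every line, and END runs once after the last line.

```shell
out=$(printf 'one\ntwo\nthree\n' |
  awk 'BEGIN { print "start" }
       { print "line: " $0 }
       END { print "done, " NR " lines read" }')
printf '%s\n' "$out"
```

The output starts with "start", lists the three lines, and ends with "done, 3 lines read".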

Writing a script:

Run it:
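The original script is not shown, so this is a stand-in sketch with a hypothetical script file. An awk program can be saved to a file and passed to awk with the -f option instead of being typed on the command line.

```shell
# Hypothetical script file; the name and contents are assumptions for illustration.
script=$(mktemp)
cat > "$script" <<'EOF'
# Print the first field of every record in upper case.
{ print toupper($1) }
EOF

# -f tells awk to read the program from a file.
out=$(printf 'hello world\nfoo bar\n' | awk -f "$script")
printf '%s\n' "$out"
rm -f "$script"
```

This prints HELLO and FOO, one per line.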

Script with blocks

Run it:

Note how NR supplies the line number.
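A minimal sketch of a script that uses all three blocks and prints the line number from NR. The script and its input are assumptions for illustration, not the article's original listing.

```shell
# Hypothetical script with BEGIN, main and END blocks.
script=$(mktemp)
cat > "$script" <<'EOF'
BEGIN { print "Listing:" }
      { print NR ": " $0 }     # NR is the number of the current record
END   { print "Total: " NR }
EOF

out=$(printf 'apple\nbanana\n' | awk -f "$script")
printf '%s\n' "$out"
rm -f "$script"
```

Each input line comes out prefixed with its number ("1: apple", "2: banana"), and the END block reports "Total: 2".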

Line Addressing

Commands can be restricted to the lines that match a pattern.

Use it:
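A sketch of three common kinds of pattern, using made-up passwd-style lines as input: a regular expression, a test on the line number, and a range that runs from one match through another.

```shell
# Hypothetical input, loosely shaped like /etc/passwd entries.
input='root:x:0
daemon:x:1
alice:x:1000
bob:x:1001'

# Regular expression: only lines containing "x:10".
by_regex=$(printf '%s\n' "$input" | awk '/x:10/ { print }')

# Line number: only the second record.
by_number=$(printf '%s\n' "$input" | awk 'NR == 2 { print }')

# Range: from the line starting with "daemon" through the one starting with "alice".
by_range=$(printf '%s\n' "$input" | awk '/^daemon/, /^alice/ { print }')

printf '%s\n---\n%s\n---\n%s\n' "$by_regex" "$by_number" "$by_range"
```

When a pattern is given without an action, awk prints the matching line, so the { print } parts above could be dropped.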

Predefined awk values

  • FS – Field separator – default a single space, which splits on runs of spaces and tabs
  • OFS – Output field separator – default space
  • RS – Record separator – default newline
  • ORS  – Output record separator – default newline
  • OFMT – Output format for numbers – default “%.6g”

Variables maintained by awk:

  • NF – Number of Fields, i.e. the number of words on the current line
  • NR – Number of Records, i.e. the number of lines read so far
  • FILENAME – The name of the current file being processed
  • FNR – Line number within the current file (not in the original awk; available in nawk, gawk and POSIX awk)
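To see a few of these values in action, here is a small sketch with invented colon-separated input. FS is changed so fields split on “:”, OFS controls what print puts between its comma-separated arguments, and NR and NF report the record number and field count.

```shell
out=$(printf 'alice:30:london\nbob:25\n' |
  awk 'BEGIN { FS = ":"; OFS = " | " }
       { print NR, NF, $1 }')
printf '%s\n' "$out"
```

The first line has three fields and the second has two, so this prints "1 | 3 | alice" followed by "2 | 2 | bob". The field separator can also be set from the command line with -F, e.g. awk -F: '...'.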

Example:

useemp

useemp2

Run it:

Variables

Variables are not declared; they are simply given names and values. Uninitialised variables start out as zero in a numeric context and as the empty string in a string context. The type (string or number) is determined by the value assigned.
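A quick sketch of these rules, with variable names chosen only for illustration: n and s are never assigned, so they behave as 0 and as the empty string, while count and name take their type from what is assigned to them.

```shell
out=$(awk 'BEGIN {
  print "unset as number: " (n + 0)
  print "unset as string: [" s "]"
  count = 3; name = "awk"
  count = count + 1
  print name, count
}')
printf '%s\n' "$out"
```

Because the program has only a BEGIN block, awk runs it without reading any input.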

Example – calculate the total size of a set of files:

Run it using a pipe:
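The original script is not shown, so this is a hedged sketch of the idea. It assumes the size is the fifth field, as in typical ls -l output, and feeds awk canned text in that shape (the file names and sizes are made up) so the result is predictable.

```shell
# Canned text shaped like `ls -l` output; names and sizes are hypothetical.
listing='total 12
-rw-r--r-- 1 user group 1200 Jan  1 10:00 a.txt
-rw-r--r-- 1 user group  300 Jan  2 11:00 b.txt
drwxr-xr-x 2 user group 4096 Jan  3 12:00 docs'

out=$(printf '%s\n' "$listing" |
  awk '/^-/ { sum += $5 }     # regular files only; sum starts at zero
       END  { print "total bytes: " (sum + 0) }')
printf '%s\n' "$out"
```

Against a real directory the same program would be run as ls -l | awk '...'. Here it prints "total bytes: 1500", since the directory entry and the "total" line are skipped.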

Another example:

Given the following file:

And the awk script:

Run it:

Conditions and Loops

The syntax of conditions and loops is similar to C: if/else, while, do/while and for are all available.

Given the following input file:

Using loops and conditions:

Run it:
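The original input file and script are not reproduced here, so the following is an illustrative sketch with invented data: each line holds a name followed by three scores. A for loop adds up the score fields and an if/else labels the average.

```shell
out=$(printf 'alice 80 90 70\nbob 40 50 45\n' |
  awk '{
         total = 0
         for (i = 2; i <= NF; i++) {   # loop over the score fields
           total += $i
         }
         avg = total / (NF - 1)
         if (avg >= 60) {
           print $1, avg, "pass"
         } else {
           print $1, avg, "fail"
         }
       }')
printf '%s\n' "$out"
```

This prints "alice 80 pass" and "bob 45 fail". The pass mark of 60 is an arbitrary choice for the example.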

3 thoughts on “Understanding Awk – Practical Guide”

  1. Thanks man. I always wanted to understand awk and your article is a nice summarization about how it works and how to use it.

  2. Thanks, informative read!

    “The BEGIN and END procedures are optional”

    I noticed that the main loop is optional too. You could for example get the number of lines in a file with just “END { print NR }”.

  3. Yo this site is AWESOME! New to data manipulation scripts and trying to find a nice, simple, accurate intro to awk’s been a challenge!
