Chapter 3: Built-in Variables and Field Manipulation
Transform AWK from a basic text processor into a data manipulation wizard. Master the FS, OFS, NR, and NF variables and reshape any data format you need.

You already saw a few built-in variables in the first chapter. Let's have a look at some other built-in variables along with the ones you have already seen. Repetition is good for reinforced learning.
Sample Data Files
Let me create some sample files for you to work with. Save these on your system to follow along with the tutorial.
Create access.log:
192.168.1.100 - alice [29/Jun/2024:10:15:22] "GET /index.html" 200 1234
192.168.1.101 - bob [29/Jun/2024:10:16:45] "POST /api/login" 200 567
192.168.1.102 - charlie [29/Jun/2024:10:17:10] "GET /images/logo.png" 404 0
10.0.0.50 - admin [29/Jun/2024:10:18:33] "GET /admin/panel" 403 892
192.168.1.100 - alice [29/Jun/2024:10:19:55] "GET /profile" 200 2456
Create inventory.csv:
laptop,Dell,XPS13,1299.99,5
desktop,HP,Pavilion,899.50,3
tablet,Apple,iPad,599.00,8
monitor,Samsung,27inch,349.99,12
keyboard,Logitech,MX Keys,99.99,15
FS (Field Separator): How you split your data
You have already used FS before. FS tells AWK how to slice each line into fields - think of it as choosing the right places to cut your data.
Default whitespace splitting
By default, the field separator is whitespace (spaces, tabs, etc.).
Let's extract user information from our access log:
awk '{print "IP:", $1, "User:", $3, "Status:", $7}' access.log
It automatically splits on spaces and extracts IP address, username, and HTTP status code.
Output:
IP: 192.168.1.100 User: alice Status: 200
IP: 192.168.1.101 User: bob Status: 200
IP: 192.168.1.102 User: charlie Status: 404
IP: 10.0.0.50 User: admin Status: 403
IP: 192.168.1.100 User: alice Status: 200
Custom field separators
Now let's process our CSV inventory. Here we tell AWK to cut the data at every comma with -F,:
awk -F, '{print $1, "by", $2, "costs $" $4}' inventory.csv
In this example, it uses comma as a separator to extract product type, manufacturer, and price from CSV.
laptop by Dell costs $1299.99
desktop by HP costs $899.50
tablet by Apple costs $599.00
monitor by Samsung costs $349.99
keyboard by Logitech costs $99.99
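By the way, -F, is just a command-line shortcut for setting FS. You can set it in a BEGIN block instead and get identical output:
awk 'BEGIN {FS=","} {print $1, "by", $2, "costs $" $4}' inventory.csv
Use whichever reads better to you; -F wins for one-liners, while BEGIN suits longer scripts.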
💡 You can also handle multiple separators.
Create mixed_data.txt:
server01::cpu::75::memory::4096
web02|admin|active|192.168.1.10
db-server,mysql,running,8192,16
cache:redis:online:1024
Now let's work on it.
awk -F'[:|,]' '{print "Server:", $1, "Service:", $2, "Info:", $4}' mixed_data.txt
It uses a character class to split on colons, pipes, or commas, thus handling inconsistent delimiters. One catch: consecutive separators like :: produce empty fields, which is why $2 and $4 come out empty for the first line (a fix follows the output):
Server: server01 Service:  Info:
Server: web02 Service: admin Info: 192.168.1.10
Server: db-server Service: mysql Info: 8192
Server: cache Service: redis Info: 1024
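To treat any run of consecutive separator characters as a single separator, add a + quantifier to the character class. This is a small tweak to the command above:
awk -F'[:|,]+' '{print "Server:", $1, "Service:", $2, "Info:", $4}' mixed_data.txt
Now the first line splits cleanly into server01, cpu, 75, memory, and 4096:
Server: server01 Service: cpu Info: memory
Server: web02 Service: admin Info: 192.168.1.10
Server: db-server Service: mysql Info: 8192
Server: cache Service: redis Info: 1024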
💡 Recent versions of GNU AWK (5.3 and later) also provide a --csv option to better deal with CSV files, as some fields may contain commas inside quotes.
OFS (Output Field Separator): How you join your data
OFS controls how fields appear in your output - it's like choosing the glue between your data pieces.
Let's convert our space-separated log to CSV:
awk 'BEGIN {OFS=","} {print $3, $1, $7}' access.log
It sets the output separator to a comma and creates CSV output with username, IP, and status.
alice,192.168.1.100,200
bob,192.168.1.101,200
charlie,192.168.1.102,404
admin,10.0.0.50,403
alice,192.168.1.100,200
Of course, you could simply use awk '{print $3 "," $1 "," $7}' access.log to achieve the same output, but that's not the point here. You can also set OFS from the command line with the -v option:
awk -v OFS="," '{print $3, $1, $7}' access.log
Similarly, let's convert our inventory CSV into a pipe-delimited report:
awk -F, 'BEGIN {OFS="|"} {print $2, $3, $4, $5}' inventory.csv
Here's what it would look like:
Dell|XPS13|1299.99|5
HP|Pavilion|899.50|3
Apple|iPad|599.00|8
Samsung|27inch|349.99|12
Logitech|MX Keys|99.99|15
Note that the original files are not touched. You see the output on STDOUT; nothing is written back to the input file.
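One gotcha worth knowing: OFS only takes effect when AWK rebuilds the record, which happens when you print fields separated by commas or modify a field. A plain print $0 keeps the original separators. The common idiom $1 = $1 forces a rebuild:
awk -F, 'BEGIN {OFS="|"} {$1 = $1; print}' inventory.csv
This prints laptop|Dell|XPS13|1299.99|5 and so on, with every comma swapped for a pipe.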
RS (Record Separator): How you define records
RS tells AWK where one record ends and another begins.
We'll use a new sample file, multiline_records.txt:
Name: John Smith
Age: 35
Department: Engineering
Salary: 75000

Name: Mary Johnson
Age: 42
Department: Marketing
Salary: 68000

Name: Bob Wilson
Age: 28
Department: Sales
Salary: 55000
Process these paragraph-style records with:
awk 'BEGIN {RS=""; FS="\n"} {
name = substr($1, 7)
age = substr($2, 6)
dept = substr($3, 13)
salary = substr($4, 9)
print name, age, dept, salary
}' multiline_records.txt
It is a bit complicated, but if you regularly deal with data files like this, it will be worth the effort. Setting RS to an empty string puts AWK in paragraph mode: blank lines separate records, and each line (\n) within a record becomes a field. The substr() calls then extract the values after the colons.
Look at the formatted output now:
John Smith 35 Engineering 75000
Mary Johnson 42 Marketing 68000
Bob Wilson 28 Sales 55000
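The hardcoded substr() offsets are fragile; change a label and they break. Here is an alternative sketch that sets FS to a regex matching either the ": " after a label or a newline, so the values land in the even-numbered fields:
awk 'BEGIN {RS=""; FS=": |\n"} {print $2, $4, $6, $8}' multiline_records.txt
It produces the same output without any character counting.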
ORS (Output Record Separator): How you end records
ORS controls what goes at the end of each output record - think of it as choosing your punctuation mark.
For example, if you use this command with the inventory.csv file:
awk -F, 'BEGIN {ORS=" | "} {print $1}' inventory.csv
It will replace newlines with " | " to create a continuous horizontal list of product types.
laptop | desktop | tablet | monitor | keyboard |
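Notice the trailing separator at the end: ORS is appended after every record, including the last one. If that bothers you, one sketch is to print the separator before every record except the first:
awk -F, 'NR > 1 {printf " | "} {printf "%s", $1} END {print ""}' inventory.csv
This gives laptop | desktop | tablet | monitor | keyboard with no trailing separator.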
A more practical, real-world use case would be to add HTML line breaks to your log output so that it is displayed properly in a web browser:
awk 'BEGIN {ORS="<br>\n"} {print $3, "accessed at", $4}' access.log
Here's the output, ready to be rendered as HTML:
alice accessed at [29/Jun/2024:10:15:22]<br>
bob accessed at [29/Jun/2024:10:16:45]<br>
charlie accessed at [29/Jun/2024:10:17:10]<br>
admin accessed at [29/Jun/2024:10:18:33]<br>
alice accessed at [29/Jun/2024:10:19:55]<br>
NR (Number of Records): Your line counter
Honestly, I like to remember it as number of rows. NR tracks which record you're currently processing - like a page number, I mean line number ;)
Add line numbers to the inventory file:
awk '{printf "%2d: %s\n", NR, $0}' inventory.csv
It prints a formatted line number followed by the original line. Déjà vu? We saw this in the first chapter, too.
1: laptop,Dell,XPS13,1299.99,5
2: desktop,HP,Pavilion,899.50,3
3: tablet,Apple,iPad,599.00,8
4: monitor,Samsung,27inch,349.99,12
5: keyboard,Logitech,MX Keys,99.99,15
Now, a better idea would be to use this information to deal with specific lines only.
awk -F, 'NR >= 2 && NR <= 4 {print "Item " NR ":", $1, $3}' inventory.csv
So now, AWK will process only lines 2-4, extracting product type and model.
Item 2: desktop Pavilion
Item 3: tablet iPad
Item 4: monitor 27inch
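NR is also handy in an END block, where it holds the total number of records processed. That makes for a quick line counter, similar to wc -l:
awk 'END {print "Total records:", NR}' access.log
For our access log, this prints Total records: 5.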
NF (Number of Fields): Your column counter
NF tells you how many fields are in each record (row/line). This is excellent when you have to loop over fields (discussed in later chapters) or need to grab the last column/field for processing.
Create variable_fields.txt:
web01 active
db02 maintenance scheduled friday
cache01 offline
backup01 running full-backup nightly
api-server online load-balanced
Let's work on this data file and make it display the number of fields in each line:
awk '{print "Server " $1 " has " NF " fields:", $0}' variable_fields.txt
As you can see, it displays the number of fields:
Server web01 has 2 fields: web01 active
Server db02 has 4 fields: db02 maintenance scheduled friday
Server cache01 has 2 fields: cache01 offline
Server backup01 has 4 fields: backup01 running full-backup nightly
Server api-server has 3 fields: api-server online load-balanced
Let's take another example that always prints the last field, irrespective of how many fields a line has:
awk '{print $1 ":", $NF}' variable_fields.txt
Works fine, right?
web01: active
db02: friday
cache01: offline
backup01: nightly
api-server: load-balanced
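Since NF is just a variable, you can also use it to manipulate fields. For instance, assigning to $(NF+1) appends a new last field to every line; here's a small sketch:
awk '{$(NF+1) = "[checked]"; print}' variable_fields.txt
Every line now ends with [checked], no matter how many fields it started with.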
FILENAME: Your file tracker
FILENAME shows which file is being processed. This is essential when you handle multiple files.
Create these two log files.
server1.log:
ERROR: Database connection failed
WARN: High memory usage
INFO: Backup completed
server2.log:
ERROR: Network timeout
INFO: Service restarted
ERROR: Disk space low
Track errors across multiple files and show which file each matching line comes from by printing FILENAME:
awk '/ERROR/ {print FILENAME ":", $0}' server1.log server2.log
As you can see, it finds all ERROR lines and shows which file they came from.
server1.log: ERROR: Database connection failed
server2.log: ERROR: Network timeout
server2.log: ERROR: Disk space low
FNR (File Number of Records): Your per-file counter
Another in-built AWK variable that helps while dealing with multiple files. FNR resets to 1 for each new file.
Imagine a situation where you are processing two files with AWK. If you use NR, it counts the rows of both files together. FNR, on the other hand, gives you the record number within each individual file.
Let's take an example:
awk '{print FILENAME, "line", FNR, "(overall line", NR "):", $0}' server1.log server2.log
It shows both the line number within each file (FNR) and the overall line number (NR) across all files.
server1.log line 1 (overall line 1): ERROR: Database connection failed
server1.log line 2 (overall line 2): WARN: High memory usage
server1.log line 3 (overall line 3): INFO: Backup completed
server2.log line 1 (overall line 4): ERROR: Network timeout
server2.log line 2 (overall line 5): INFO: Service restarted
server2.log line 3 (overall line 6): ERROR: Disk space low
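Because FNR resets at the start of every file, the condition FNR == 1 is a handy way to detect that a new file has begun, for example, to print a header before each file's content:
awk 'FNR == 1 {print "----", FILENAME, "----"} {print}' server1.log server2.log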
Field Manipulation: Changing Your Data
Modifying Existing Fields
Apply a 10% discount to all prices:
awk -F, 'BEGIN {OFS=","} {$4 = $4 * 0.9; print}' inventory.csv
What it does: Multiplies the price field (column 4) by 0.9 and rebuilds the line with commas.
Output:
laptop,Dell,XPS13,1169.991,5
desktop,HP,Pavilion,809.55,3
tablet,Apple,iPad,539.1,8
monitor,Samsung,27inch,314.991,12
keyboard,Logitech,MX Keys,89.991,15
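Notice the floating-point leftovers like 1169.991. When AWK modifies a numeric field, it prints the raw result of the arithmetic. A quick sketch that rounds the new price to two decimals with sprintf():
awk -F, 'BEGIN {OFS=","} {$4 = sprintf("%.2f", $4 * 0.9); print}' inventory.csv
Now the first line reads laptop,Dell,XPS13,1169.99,5.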
Adding New Fields
Calculate total inventory value:
awk -F, 'BEGIN {OFS=","} {
total_value = $4 * $5
print $0, total_value
}' inventory.csv
What it does: Multiplies price by quantity and adds the result as a new field.
Output:
laptop,Dell,XPS13,1299.99,5,6499.95
desktop,HP,Pavilion,899.50,3,2698.5
tablet,Apple,iPad,599.00,8,4792
monitor,Samsung,27inch,349.99,12,4199.88
keyboard,Logitech,MX Keys,99.99,15,1499.85
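Since this is essentially a report, you might want a header row. A BEGIN block prints it once, before any records are processed (the column names here are my own invention):
awk -F, 'BEGIN {OFS=","; print "type,brand,model,price,qty,total"} {print $0, $4 * $5}' inventory.csv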
Working with Multi-Character Delimiters
Create complex_log.txt:
2024-06-29::10:15:22::INFO::System started successfully
2024-06-29::10:16:45::ERROR::Database connection timeout
2024-06-29::10:17:10::WARN::Low disk space detected
2024-06-29::10:18:33::ERROR::Network unreachable
Parse double-colon separated data:
awk -F'::' '{print $1, $2, $3 ":", $4}' complex_log.txt
What it does: Uses double-colon as field separator to create readable timestamp and message format.
Output:
2024-06-29 10:15:22 INFO: System started successfully
2024-06-29 10:16:45 ERROR: Database connection timeout
2024-06-29 10:17:10 WARN: Low disk space detected
2024-06-29 10:18:33 ERROR: Network unreachable
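You can also combine a multi-character FS with a custom OFS to convert the format in one pass. This sketch reuses the $1 = $1 rebuild trick from the OFS section to turn the double colons into tabs:
awk -F'::' 'BEGIN {OFS="\t"} {$1 = $1; print}' complex_log.txt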
🪧 Time to recall
You now have powerful tools for data manipulation:
- FS/OFS: Control how you split input and join output
- RS/ORS: Define what constitutes records
- NR/FNR: Track line numbers globally and per-file
- NF: Count fields and access the last column
- FILENAME: Identify source files
These variables work together to give you complete control over how AWK processes your data.
Practice Exercises
Try these exercises with the sample files I've provided:
- Convert the access.log to CSV format with just IP, user, and status
- Add a 10% discount to the items in inventory.csv
- Find all inventory items with quantity less than 10
Add a new field in inventory.csv that shows inventory value by multiplying stock with price
- Add line numbers only to ERROR entries in the server logs
- Calculate the average price of all inventory items
- Process the variable_fields.txt file and show only lines with exactly 3 fields
In the next chapter, you'll learn mathematical operations and string functions that will turn AWK into your personal calculator and text processor!