Chapter 3: Built-in Variables and Field Manipulation
Transform AWK from a basic text processor into a data manipulation wizard. Master the FS, OFS, NR, and NF variables and reshape any data format you need.

You already saw a few built-in variables in the first chapter. Let's have a look at some other built-in variables along with the ones you have already seen. Repetition is good for reinforced learning.
Sample Data Files
Let me create some sample files for you to work with. Save these on your system to follow along with the tutorial.
Create access.log:
192.168.1.100 - alice [29/Jun/2024:10:15:22] "GET /index.html" 200 1234
192.168.1.101 - bob [29/Jun/2024:10:16:45] "POST /api/login" 200 567
192.168.1.102 - charlie [29/Jun/2024:10:17:10] "GET /images/logo.png" 404 0
10.0.0.50 - admin [29/Jun/2024:10:18:33] "GET /admin/panel" 403 892
192.168.1.100 - alice [29/Jun/2024:10:19:55] "GET /profile" 200 2456
Create inventory.csv:
laptop,Dell,XPS13,1299.99,5
desktop,HP,Pavilion,899.50,3
tablet,Apple,iPad,599.00,8
monitor,Samsung,27inch,349.99,12
keyboard,Logitech,MX Keys,99.99,15
FS (Field Separator): How you split your data
You have already used FS before. FS tells AWK how to slice each line into fields - think of it as choosing the right places to cut your data.
Default whitespace splitting
By default, the field separator is whitespace (spaces, tabs, etc.).
Let's extract user information from our access log:
awk '{print "IP:", $1, "User:", $3, "Status:", $7}' access.log
It automatically splits on spaces and extracts IP address, username, and HTTP status code.
Output:
IP: 192.168.1.100 User: alice Status: 200
IP: 192.168.1.101 User: bob Status: 200
IP: 192.168.1.102 User: charlie Status: 404
IP: 10.0.0.50 User: admin Status: 403
IP: 192.168.1.100 User: alice Status: 200
Custom field separators
Now let's process our CSV inventory. Here we tell AWK to cut the data at every comma with -F,:
awk -F, '{print $1, "by", $2, "costs $" $4}' inventory.csv
In this example, it uses comma as a separator to extract product type, manufacturer, and price from CSV.
laptop by Dell costs $1299.99
desktop by HP costs $899.50
tablet by Apple costs $599.00
monitor by Samsung costs $349.99
keyboard by Logitech costs $99.99
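By the way, -F, is just a command-line shortcut for setting FS. You can set it in a BEGIN block instead and get identical output:
awk 'BEGIN {FS=","} {print $1, "by", $2, "costs $" $4}' inventory.csv
Use whichever reads better to you; -F wins for one-liners, while BEGIN suits longer scripts.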
💡 You can also handle multiple separators.
Create mixed_data.txt:
server01::cpu::75::memory::4096
web02|admin|active|192.168.1.10
db-server,mysql,running,8192,16
cache:redis:online:1024
Now let's work on it.
awk -F'[:|,]' '{print "Server:", $1, "Service:", $2, "Info:", $4}' mixed_data.txt
It uses a character class to split on colons, pipes, or commas, thus handling inconsistent delimiters. One catch: consecutive separators like :: produce empty fields, which is why $2 and $4 come out empty for the first line (a fix follows the output):
Server: server01 Service:  Info:
Server: web02 Service: admin Info: 192.168.1.10
Server: db-server Service: mysql Info: 8192
Server: cache Service: redis Info: 1024
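To treat any run of consecutive separator characters as a single separator, add a + quantifier to the character class. This is a small tweak to the command above:
awk -F'[:|,]+' '{print "Server:", $1, "Service:", $2, "Info:", $4}' mixed_data.txt
Now the first line splits cleanly into server01, cpu, 75, memory, and 4096:
Server: server01 Service: cpu Info: memory
Server: web02 Service: admin Info: 192.168.1.10
Server: db-server Service: mysql Info: 8192
Server: cache Service: redis Info: 1024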
💡 Recent versions of GNU AWK (5.3 and later) also provide a --csv option to better deal with CSV files, as some fields may contain commas inside quotes.
OFS (Output Field Separator): How you join your data
OFS controls how fields appear in your output - it's like choosing the glue between your data pieces.
Let's convert our space-separated log to CSV:
awk 'BEGIN {OFS=","} {print $3, $1, $7}' access.log
It sets the output separator to a comma and creates CSV output with username, IP, and status.
alice,192.168.1.100,200
bob,192.168.1.101,200
charlie,192.168.1.102,404
admin,10.0.0.50,403
alice,192.168.1.100,200
Of course, you could simply use awk '{print $3 "," $1 "," $7}' access.log to achieve the same output, but that's not the point here. You can also set OFS from the command line with the -v option:
awk -v OFS="," '{print $3, $1, $7}' access.log
Similarly, let's convert our inventory CSV into a pipe-delimited report:
awk -F, 'BEGIN {OFS="|"} {print $2, $3, $4, $5}' inventory.csv
Here's what it would look like:
Dell|XPS13|1299.99|5
HP|Pavilion|899.50|3
Apple|iPad|599.00|8
Samsung|27inch|349.99|12
Logitech|MX Keys|99.99|15
Note that the original files are not touched. You see the output on STDOUT; nothing is written back to the input file.
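One gotcha worth knowing: OFS only takes effect when AWK rebuilds the record, which happens when you print fields separated by commas or modify a field. A plain print $0 keeps the original separators. The common idiom $1 = $1 forces a rebuild:
awk -F, 'BEGIN {OFS="|"} {$1 = $1; print}' inventory.csv
This prints laptop|Dell|XPS13|1299.99|5 and so on, with every comma swapped for a pipe.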
RS (Record Separator): How you define records
RS tells AWK where one record ends and another begins.
We'll use a new sample file, multiline_records.txt:
Name: John Smith
Age: 35
Department: Engineering
Salary: 75000

Name: Mary Johnson
Age: 42
Department: Marketing
Salary: 68000

Name: Bob Wilson
Age: 28
Department: Sales
Salary: 55000
Process these paragraph-style records with:
awk 'BEGIN {RS=""; FS="\n"} {
name = substr($1, 7)
age = substr($2, 6)
dept = substr($3, 13)
salary = substr($4, 9)
print name, age, dept, salary
}' multiline_records.txt
It is a bit complicated, but if you regularly deal with data files like this, it will be worth the effort. Setting RS to an empty string puts AWK in paragraph mode: blank lines separate records, and each line (\n) within a record becomes a field. The substr() calls then extract the values after the colons.
Look at the formatted output now:
John Smith 35 Engineering 75000
Mary Johnson 42 Marketing 68000
Bob Wilson 28 Sales 55000
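The hardcoded substr() offsets are fragile; change a label and they break. Here is an alternative sketch that sets FS to a regex matching either the ": " after a label or a newline, so the values land in the even-numbered fields:
awk 'BEGIN {RS=""; FS=": |\n"} {print $2, $4, $6, $8}' multiline_records.txt
It produces the same output without any character counting.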
ORS (Output Record Separator): How you end records
ORS controls what goes at the end of each output record - think of it as choosing your punctuation mark.
For example, if you use this command with the inventory.csv file:
awk -F, 'BEGIN {ORS=" | "} {print $1}' inventory.csv
It will replace newlines with " | " to create a continuous horizontal list of product types.
laptop | desktop | tablet | monitor | keyboard |
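Notice the trailing separator at the end: ORS is appended after every record, including the last one. If that bothers you, one sketch is to print the separator before every record except the first:
awk -F, 'NR > 1 {printf " | "} {printf "%s", $1} END {print ""}' inventory.csv
This gives laptop | desktop | tablet | monitor | keyboard with no trailing separator.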
A more practical, real-world use case would be to add HTML line breaks to your log output so that it is displayed properly in a web browser:
awk 'BEGIN {ORS="<br>\n"} {print $3, "accessed at", $4}' access.log
Here's the output, ready to be rendered as HTML:
alice accessed at [29/Jun/2024:10:15:22]<br>
bob accessed at [29/Jun/2024:10:16:45]<br>
charlie accessed at [29/Jun/2024:10:17:10]<br>
admin accessed at [29/Jun/2024:10:18:33]<br>
alice accessed at [29/Jun/2024:10:19:55]<br>
NR (Number of Records): Your line counter
Honestly, I like to remember it as number of rows. NR tracks which record you're currently processing - like a page number, I mean line number ;)
Add line numbers to the inventory file:
awk '{printf "%2d: %s\n", NR, $0}' inventory.csv
It prints a formatted line number followed by the original line. Déjà vu? We saw this in the first chapter, too.
1: laptop,Dell,XPS13,1299.99,5
2: desktop,HP,Pavilion,899.50,3
3: tablet,Apple,iPad,599.00,8
4: monitor,Samsung,27inch,349.99,12
5: keyboard,Logitech,MX Keys,99.99,15
Now, a better idea would be to use this information to deal with specific lines only.
awk -F, 'NR >= 2 && NR <= 4 {print "Item " NR ":", $1, $3}' inventory.csv
So now, AWK will process only lines 2-4, extracting product type and model.
Item 2: desktop Pavilion
Item 3: tablet iPad
Item 4: monitor 27inch
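NR is also handy in an END block, where it holds the total number of records processed. That makes for a quick line counter, similar to wc -l:
awk 'END {print "Total records:", NR}' access.log
For our access log, this prints Total records: 5.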
NF (Number of Fields): Your column counter
NF tells you how many fields are in each record (row/line). This is excellent when you have to loop over fields (discussed in later chapters) or need to grab the last column/field for processing.
Create variable_fields.txt:
web01 active
db02 maintenance scheduled friday
cache01 offline
backup01 running full-backup nightly
api-server online load-balanced
Let's work on this data file and make it display the number of fields in each line:
awk '{print "Server " $1 " has " NF " fields:", $0}' variable_fields.txt
As you can see, it displays the number of fields:
Server web01 has 2 fields: web01 active
Server db02 has 4 fields: db02 maintenance scheduled friday
Server cache01 has 2 fields: cache01 offline
Server backup01 has 4 fields: backup01 running full-backup nightly
Server api-server has 3 fields: api-server online load-balanced
Let's take another example that always prints the last field, irrespective of how many fields a line has:
awk '{print $1 ":", $NF}' variable_fields.txt
Works fine, right?
web01: active
db02: friday
cache01: offline
backup01: nightly
api-server: load-balanced
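Since NF is just a variable, you can also use it to manipulate fields. For instance, assigning to $(NF+1) appends a new last field to every line; here's a small sketch:
awk '{$(NF+1) = "[checked]"; print}' variable_fields.txt
Every line now ends with [checked], no matter how many fields it started with.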
FILENAME: Your file tracker
FILENAME shows which file is being processed. This is essential when you handle multiple files.
Create these two log files.
server1.log:
ERROR: Database connection failed
WARN: High memory usage
INFO: Backup completed
server2.log:
ERROR: Network timeout
INFO: Service restarted
ERROR: Disk space low
Track errors across multiple files and show which file each matching line comes from by printing FILENAME:
awk '/ERROR/ {print FILENAME ":", $0}' server1.log server2.log
As you can see, it finds all ERROR lines and shows which file they came from.
server1.log: ERROR: Database connection failed
server2.log: ERROR: Network timeout
server2.log: ERROR: Disk space low
FNR (File Number of Records): Your per-file counter
Another in-built AWK variable that helps while dealing with multiple files. FNR resets to 1 for each new file.
Imagine a situation where you are processing two files with AWK. If you use NR, it counts the rows of both files together. FNR, on the other hand, gives you the record number within each individual file.
Let's take an example:
awk '{print FILENAME, "line", FNR, "(overall line", NR "):", $0}' server1.log server2.log
It shows both the line number within each file (FNR) and the overall line number (NR) across all files.
server1.log line 1 (overall line 1): ERROR: Database connection failed
server1.log line 2 (overall line 2): WARN: High memory usage
server1.log line 3 (overall line 3): INFO: Backup completed
server2.log line 1 (overall line 4): ERROR: Network timeout
server2.log line 2 (overall line 5): INFO: Service restarted
server2.log line 3 (overall line 6): ERROR: Disk space low
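Because FNR resets at the start of every file, the condition FNR == 1 is a handy way to detect that a new file has begun, for example, to print a header before each file's content:
awk 'FNR == 1 {print "----", FILENAME, "----"} {print}' server1.log server2.log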
Field Manipulation: Changing Your Data
Modifying Existing Fields
Apply a 10% discount to all prices:
awk -F, 'BEGIN {OFS=","} {$4 = $4 * 0.9; print}' inventory.csv
What it does: Multiplies the price field (column 4) by 0.9 and rebuilds the line with commas.
Output:
laptop,Dell,XPS13,1169.991,5
desktop,HP,Pavilion,809.55,3
tablet,Apple,iPad,539.1,8
monitor,Samsung,27inch,314.991,12
keyboard,Logitech,MX Keys,89.991,15
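Notice the floating-point leftovers like 1169.991. When AWK modifies a numeric field, it prints the raw result of the arithmetic. A quick sketch that rounds the new price to two decimals with sprintf():
awk -F, 'BEGIN {OFS=","} {$4 = sprintf("%.2f", $4 * 0.9); print}' inventory.csv
Now the first line reads laptop,Dell,XPS13,1169.99,5.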
Adding New Fields
Calculate total inventory value:
awk -F, 'BEGIN {OFS=","} {
total_value = $4 * $5
print $0, total_value
}' inventory.csv
What it does: Multiplies price by quantity and adds the result as a new field.
Output:
laptop,Dell,XPS13,1299.99,5,6499.95
desktop,HP,Pavilion,899.50,3,2698.5
tablet,Apple,iPad,599.00,8,4792
monitor,Samsung,27inch,349.99,12,4199.88
keyboard,Logitech,MX Keys,99.99,15,1499.85
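Since this is essentially a report, you might want a header row. A BEGIN block prints it once, before any records are processed (the column names here are my own invention):
awk -F, 'BEGIN {OFS=","; print "type,brand,model,price,qty,total"} {print $0, $4 * $5}' inventory.csv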
Working with Multi-Character Delimiters
Create complex_log.txt:
2024-06-29::10:15:22::INFO::System started successfully
2024-06-29::10:16:45::ERROR::Database connection timeout
2024-06-29::10:17:10::WARN::Low disk space detected
2024-06-29::10:18:33::ERROR::Network unreachable
Parse double-colon separated data:
awk -F'::' '{print $1, $2, $3 ":", $4}' complex_log.txt
What it does: Uses double-colon as field separator to create readable timestamp and message format.
Output:
2024-06-29 10:15:22 INFO: System started successfully
2024-06-29 10:16:45 ERROR: Database connection timeout
2024-06-29 10:17:10 WARN: Low disk space detected
2024-06-29 10:18:33 ERROR: Network unreachable
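You can also combine a multi-character FS with a custom OFS to convert the format in one pass. This sketch reuses the $1 = $1 rebuild trick from the OFS section to turn the double colons into tabs:
awk -F'::' 'BEGIN {OFS="\t"} {$1 = $1; print}' complex_log.txt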
🪧 Time to recall
You now have powerful tools for data manipulation:
- FS/OFS: Control how you split input and join output
- RS/ORS: Define what constitutes records
- NR/FNR: Track line numbers globally and per-file
- NF: Count fields and access the last column
- FILENAME: Identify source files
These variables work together to give you complete control over how AWK processes your data.
Practice Exercises
Try these exercises with the sample files I've provided:
- Convert the access.log to CSV format with just IP, user, and status
- Add a 10% discount to the items in inventory.csv
- Find all inventory items with quantity less than 10
Add a new field in inventory.csv that shows inventory value by multiplying stock with price
- Add line numbers only to ERROR entries in the server logs
- Calculate the average price of all inventory items
- Process the variable_fields.txt file and show only lines with exactly 3 fields
In the next chapter, you'll learn mathematical operations and string functions that will turn AWK into your personal calculator and text processor!