Chapter 2: Pattern Matching and Basic Operations
Think of AWK patterns like a security guard at a nightclub - they decide which lines get past the velvet rope and which actions get executed. Master pattern matching, and you control exactly what AWK processes.
Pattern matching fundamentals
AWK patterns work like filters: they test each line and execute actions only when conditions are met. No match = no action.
Here are some very basic examples of pattern matching:
awk '/ERROR/ {print $0}' logfile.txt # Find lines containing "ERROR"
awk '/^root/ {print $1}' /etc/passwd # Lines starting with "root"
awk '/ssh$/ {print NR}' processes.txt # Lines ending with "ssh"
awk '/^ERROR$/ {print}' logfile.txt # Lines containing only ERROR
Regular expressions in AWK use extended regular expression syntax, much like grep -E and sed -E. The pattern sits between forward slashes: /pattern/.
You need a basic understanding of regex to use pattern matching.
Tip: Instead of multiple AWK calls:
awk '/ERROR/' file && awk '/WARNING/' file
Use one call with OR:
awk '/ERROR|WARNING/ {print}' file
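Here is a minimal sketch of the single-call approach that you can run without a log file. The log lines are made up for illustration, and the variable names log and matches are just placeholders. It also relies on the fact that a pattern with no action prints the matching line by default.

```shell
# Made-up log lines standing in for a real log file.
log='INFO service started
ERROR disk full
WARNING high memory
INFO request served
ERROR timeout'

# One pass over the input: the alternation matches either word.
# No action is given, so awk falls back to printing the matching line.
matches=$(printf '%s\n' "$log" | awk '/ERROR|WARNING/')
printf '%s\n' "$matches"
# ERROR disk full
# WARNING high memory
# ERROR timeout
```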
Conditional operations: making decisions with if-else
AWK's if-else statements work like traffic lights - they direct program flow based on conditions.
Create this file as performance.txt:
server01 cpu 75 memory 60 disk 45
server02 cpu 45 memory 30 disk 85
server03 cpu 95 memory 85 disk 70
server04 cpu 25 memory 40 disk 20
server05 cpu 65 memory 75 disk 90
Next, we'll use if-else to print output only for the lines that meet a condition.
Simple if statement: Binary decisions
Think of if like a bouncer - one condition, one action.
Let's use this command with the performance.txt file we created previously:
awk '{if ($3 > 80) print $1, "CPU ALERT"}' performance.txt
It selects the lines where CPU usage ($3, the third column) is greater than 80 and prints the server name ($1, the first column) followed by an alert label:
server03 CPU ALERT
Only server03 exceeds the 80% CPU threshold, so only it 'triggers the alert'.
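For a single condition like this, the test can also sit outside the braces as a pattern, just like the regex patterns earlier in the chapter. The sketch below inlines the performance.txt rows so it runs on its own; the variable names are placeholders.

```shell
# Same rows as performance.txt, inlined so the snippet is self-contained.
data='server01 cpu 75 memory 60 disk 45
server02 cpu 45 memory 30 disk 85
server03 cpu 95 memory 85 disk 70
server04 cpu 25 memory 40 disk 20
server05 cpu 65 memory 75 disk 90'

# The condition used as a pattern, outside the braces.
as_pattern=$(printf '%s\n' "$data" | awk '$3 > 80 {print $1, "CPU ALERT"}')

# The same condition inside an if statement.
as_if=$(printf '%s\n' "$data" | awk '{if ($3 > 80) print $1, "CPU ALERT"}')

printf '%s\n' "$as_pattern"   # server03 CPU ALERT
printf '%s\n' "$as_if"        # server03 CPU ALERT
```

The if form becomes useful once you need an else branch or several statements per line.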
if-else structure: Either-or logic
Think of if-else like a fork in the road - two paths, always take one.
Let's label the servers based on disk usage, which is the seventh field ($7) in performance.txt.
awk '{
    if ($7 > 70)
        print $1, "HIGH DISK"
    else
        print $1, "DISK OK"
}' performance.txt
Output:
server01 DISK OK
server02 HIGH DISK
server03 DISK OK
server04 DISK OK
server05 HIGH DISK
Every server gets classified - no line escapes without a label.
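When both branches only differ in the value they print, AWK's conditional (ternary) operator is a compact alternative. This is a sketch, assuming disk usage is the seventh field as laid out in performance.txt; the rows are inlined so it runs on its own.

```shell
# Same rows as performance.txt, inlined so the snippet is self-contained.
data='server01 cpu 75 memory 60 disk 45
server02 cpu 45 memory 30 disk 85
server03 cpu 95 memory 85 disk 70
server04 cpu 25 memory 40 disk 20
server05 cpu 65 memory 75 disk 90'

# condition ? value_if_true : value_if_false
# The parentheses keep > from being read as output redirection in print.
labels=$(printf '%s\n' "$data" |
    awk '{print $1, ($7 > 70 ? "HIGH DISK" : "DISK OK")}')
printf '%s\n' "$labels"
# server01 DISK OK
# server02 HIGH DISK
# server03 DISK OK
# server04 DISK OK
# server05 HIGH DISK
```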
if-else if-else chain: Multi-tier classification
Think of nested conditions like a sorting machine - items flow through multiple gates until they find their category.
awk '{
    # Disk usage is the seventh field in performance.txt
    if ($7 > 80)
        status = "CRITICAL"
    else if ($7 > 60)
        status = "WARNING"
    else if ($7 > 40)
        status = "MODERATE"
    else
        status = "OK"
    print $1, "disk:", $7"%", status
}' performance.txt
Output:
server01 disk: 45% MODERATE
server02 disk: 85% CRITICAL
server03 disk: 70% WARNING
server04 disk: 20% OK
server05 disk: 90% CRITICAL
Each server cascades through conditions until it hits its classification tier.
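If you want the same tiers for more than one metric, one option is to wrap the chain in a user-defined function. The sketch below is only an illustration: tier() is a hypothetical helper name, not an AWK built-in, and the thresholds are simply the ones from the chain above.

```shell
# Same rows as performance.txt, inlined so the snippet is self-contained.
data='server01 cpu 75 memory 60 disk 45
server02 cpu 45 memory 30 disk 85
server03 cpu 95 memory 85 disk 70
server04 cpu 25 memory 40 disk 20
server05 cpu 65 memory 75 disk 90'

report=$(printf '%s\n' "$data" | awk '
# tier() is a hypothetical helper: it maps a percentage to a label.
# The order of the tests matters, the first true condition wins.
function tier(v) {
    if (v > 80)
        return "CRITICAL"
    else if (v > 60)
        return "WARNING"
    else if (v > 40)
        return "MODERATE"
    return "OK"
}
{ print $1, "cpu:", tier($3), "mem:", tier($5), "disk:", tier($7) }
')
printf '%s\n' "$report"
# server01 cpu: WARNING mem: MODERATE disk: MODERATE
# server02 cpu: MODERATE mem: OK disk: CRITICAL
# server03 cpu: CRITICAL mem: CRITICAL disk: WARNING
# server04 cpu: OK mem: OK disk: OK
# server05 cpu: WARNING mem: WARNING disk: CRITICAL
```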
Complex multi-field analysis
Let's make it a bit more complicated by combining CPU, memory, and disk metrics and create a monitoring script:
awk '{
    cpu = $3; mem = $5; disk = $7
    if (cpu > 90 || mem > 90 || disk > 90)
        alert = "CRITICAL"
    else if (cpu > 70 && mem > 70)
        alert = "HIGH_LOAD"
    else if (cpu > 80 || mem > 80 || disk > 80)
        alert = "WARNING"
    else
        alert = "NORMAL"
    printf "%-10s CPU:%2d%% MEM:%2d%% DISK:%2d%% [%s]\n", $1, cpu, mem, disk, alert
}' performance.txt
It should show this output.
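server01   CPU:75% MEM:60% DISK:45% [NORMAL]
server02   CPU:45% MEM:30% DISK:85% [WARNING]
server03   CPU:95% MEM:85% DISK:70% [CRITICAL]
server04   CPU:25% MEM:40% DISK:20% [NORMAL]
server05   CPU:65% MEM:75% DISK:90% [WARNING]
Note that server05 lands in WARNING rather than CRITICAL: its disk usage is exactly 90, and the test is disk > 90, not disk >= 90.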
This tiered approach mimics real monitoring systems - critical issues trump warnings, and combined load factors matter. You can feed it data from /proc and schedule it with cron to turn it into an actual resource alert system.
Comparison operators
AWK comparison operators work like mathematical symbols - they evaluate to true or false. This gives you greater control over the logic you put in place.
We will use the following data files for our testing in this section.
A server_stats.txt file that has the hostname, cpu_cores, memory_mb, cpu_usage, status fields.
web01 8 4096 75 online
web02 4 2048 45 maintenance
db01 16 8192 90 online
db02 8 4096 65 online
cache01 2 1024 0 offline
backup01 4 2048 100 online
And a network_ports.txt file that has ip, service, port, protocol and state fields.
192.168.1.10 ssh 22 tcp open
192.168.1.10 http 80 tcp open
192.168.1.15 ftp 21 tcp closed
192.168.1.20 mysql 3306 tcp open
192.168.1.25 custom 8080 tcp open
192.168.1.30 ssh 22 tcp open
Numeric comparisons
Numeric comparisons are simple. You use the usual symbols for comparing numbers: <, <=, >, >=, == and !=. Note that equality is ==, because a single = is assignment.
Greater than - Like checking if servers exceed CPU thresholds:
awk '$4 > 70 {print $1, "high CPU:", $4"%"}' server_stats.txt
Output:
web01 high CPU: 75%
db01 high CPU: 90%
backup01 high CPU: 100%
Less than or equal - Like finding servers with limited resources:
awk '$2 <= 4 {print $1, "low core count:", $2}' server_stats.txt
Output:
web02 low core count: 4
cache01 low core count: 2
backup01 low core count: 4
Equals - Like finding servers with zero usage (probably offline):
awk '$4 == 0 {print $1, "zero CPU usage"}' server_stats.txt
Output:
cache01 zero CPU usage
Not equals - Like finding non-standard ports:
awk '$3 != 22 && $3 != 80 && $3 != 443 {print $1, "unusual port:", $3}' network_ports.txt
Output:
192.168.1.15 unusual port: 21
192.168.1.20 unusual port: 3306
192.168.1.25 unusual port: 8080
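One detail worth seeing in action: AWK compares values as numbers when both sides look numeric, and as strings otherwise. The sketch below is an illustration with a made-up two-field input. Appending an empty string ("") is a common idiom to force a value to be treated as a string.

```shell
# Two input fields that look like numbers: 10 and 9.
result=$(printf '10 9\n' | awk '{
    n = ($1 < $2)                # numeric comparison: 10 < 9 is false (0)
    s = (($1 "") < ($2 ""))      # string comparison: "10" sorts before "9" (1)
    print n, s
}')
printf '%s\n' "$result"   # 0 1
```

This is why a test like $3 != 22 behaves as you would expect on port numbers, while comparing against a quoted "22" would be a string comparison.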
String comparisons
You have different operators for comparing strings. They are quite easy to use and understand.
Exact string match (==)
Let's check servers with running status:
awk '$5 == "online" {print $1, "is running"}' server_stats.txt
Output:
web01 is running
db01 is running
db02 is running
backup01 is running
Pattern match (~)
Let's find services whose name contains sql, which usually points to a database:
awk '$2 ~ /sql/ {print $1, "database service on port", $3}' network_ports.txt
Output:
192.168.1.20 database service on port 3306
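To see the difference between == and ~ side by side, here is a small sketch using the network_ports.txt rows inlined as a variable (the variable names are placeholders). An exact comparison against "sql" finds nothing, while the regex match finds mysql.

```shell
# Same rows as network_ports.txt, inlined so the snippet is self-contained.
ports='192.168.1.10 ssh 22 tcp open
192.168.1.10 http 80 tcp open
192.168.1.15 ftp 21 tcp closed
192.168.1.20 mysql 3306 tcp open
192.168.1.25 custom 8080 tcp open
192.168.1.30 ssh 22 tcp open'

# == needs the whole field to be exactly "sql": no service qualifies.
exact=$(printf '%s\n' "$ports" | awk '$2 == "sql" {print $2}')

# ~ matches the regex anywhere inside the field: "mysql" qualifies.
partial=$(printf '%s\n' "$ports" | awk '$2 ~ /sql/ {print $2}')

printf 'exact: [%s]\n' "$exact"       # exact: []
printf 'partial: [%s]\n' "$partial"   # partial: [mysql]
```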
Does NOT match (!~)
Use !~ when you want to exclude the matches. For example:
echo -e "# This is a comment\nweb01 active\n# Another comment\ndb01 running" | awk '$1 !~ /^#/ {print "Valid config:", $0}'
The output will omit lines starting with #:
Valid config: web01 active
Valid config: db01 running
The ~ operator is like a smart search - it finds patterns within strings, not exact matches.
Logical operators: &&, || and !
Logical operators work like sentence connectors - they join multiple conditions into complex tests. You'll be using them as well to add complex logic to your scripts.
Here's the test file process_list.txt for this section:
nginx 1234 www-data 50 2.5 running
mysql 5678 mysql 200 8.1 running
apache 9012 www-data 30 1.2 stopped
redis 3456 redis 15 0.8 running
postgres 7890 postgres 150 5.5 running
sshd 2468 root 5 0.1 running
It has process, pid, user, memory_mb, cpu_percent and status fields.
AND Operator (&&) - Both conditions must be true
Let's find processes that are both memory AND CPU hungry in our input file by selecting the lines with memory usage above 100 MB and CPU usage above 5%.
awk '$4 > 100 && $5 > 5.0 {print $1, "resource hog:", $4"MB", $5"%"}' process_list.txt
Here's the output:
mysql resource hog: 200MB 8.1%
postgres resource hog: 150MB 5.5%
OR Operator (||) - Either condition can be true
This command selects important services like mysql or postgres, as well as any service with CPU usage greater than 7%:
awk '$1 == "mysql" || $1 == "postgres" || $5 > 7.0 {print $1, "needs attention"}' process_list.txt
Output:
mysql needs attention
postgres needs attention
NOT Operator (!) - Reverses the condition
Let's find all the services that are not running in our test file:
awk '!($6 == "running") {print $1, "not running, status:", $6}' process_list.txt
Here's the output:
apache not running, status: stopped
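Negating an equality test with ! gives the same result as using != directly. A quick sketch to confirm it, with the process_list.txt rows inlined (variable names are placeholders):

```shell
# Same rows as process_list.txt, inlined so the snippet is self-contained.
procs='nginx 1234 www-data 50 2.5 running
mysql 5678 mysql 200 8.1 running
apache 9012 www-data 30 1.2 stopped
redis 3456 redis 15 0.8 running
postgres 7890 postgres 150 5.5 running
sshd 2468 root 5 0.1 running'

# Negated equality test.
negated=$(printf '%s\n' "$procs" | awk '!($6 == "running") {print $1}')

# Direct inequality test.
not_equal=$(printf '%s\n' "$procs" | awk '$6 != "running" {print $1}')

printf '%s\n' "$negated"     # apache
printf '%s\n' "$not_equal"   # apache
```

The ! form pays off when the condition inside the parentheses is a longer expression that you want to flip as a whole.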
Complex combined Logic
You can combine them to test multiple criteria. Try and figure out what this command does:
awk '($3 == "root" && $4 > 10) || ($5 > 2.0 && $6 == "running") {
printf "Monitor: %-10s User:%-8s Mem:%3dMB CPU:%.1f%%\n", $1, $3, $4, $5
}' process_list.txt
The output should help you understand it:
Monitor: nginx      User:www-data Mem: 50MB CPU:2.5%
Monitor: mysql      User:mysql    Mem:200MB CPU:8.1%
Monitor: postgres   User:postgres Mem:150MB CPU:5.5%
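In AWK, && binds more tightly than ||, so the parentheses in that command are there for readability rather than correctness. The sketch below runs the condition both ways on the process_list.txt rows (inlined here, variable names are placeholders) and prints only the process name.

```shell
# Same rows as process_list.txt, inlined so the snippet is self-contained.
procs='nginx 1234 www-data 50 2.5 running
mysql 5678 mysql 200 8.1 running
apache 9012 www-data 30 1.2 stopped
redis 3456 redis 15 0.8 running
postgres 7890 postgres 150 5.5 running
sshd 2468 root 5 0.1 running'

# Explicit grouping.
grouped=$(printf '%s\n' "$procs" |
    awk '($3 == "root" && $4 > 10) || ($5 > 2.0 && $6 == "running") {print $1}')

# No parentheses: && is still evaluated before ||.
ungrouped=$(printf '%s\n' "$procs" |
    awk '$3 == "root" && $4 > 10 || $5 > 2.0 && $6 == "running" {print $1}')

printf '%s\n' "$grouped"
# nginx
# mysql
# postgres
```

sshd is owned by root but uses only 5 MB, so it fails the first group, and its 0.1% CPU fails the second. Keeping the parentheses is still a good habit, since it makes the intent obvious.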
Practical examples for system administrators
Now, let's see some real-world scenarios where you can use these operators. It will also have some elements from the previous chapters.
Example 1: Failed SSH login analysis
Find failed SSH attempts with IP addresses. Please note that this may not output anything if you are on a personal system that does not accept SSH connections.
awk '/Failed password/ && /ssh/ {
for(i=1; i<=NF; i++)
if($i ~ /^[0-9]+\.[0-9]+/)
print $1, $2, $i
}' /var/log/auth.log
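If you have no auth.log to test against, here is a sketch that runs the same idea on a few made-up lines. The sample lines only imitate the usual sshd message layout and use documentation IP ranges; they are not real log data. The regex is also slightly stricter than the one above, assuming you want a full dotted IPv4 address in the field.

```shell
# Made-up sample lines imitating sshd messages (not real log data).
sample='Jan 10 10:00:01 host sshd[1001]: Failed password for root from 203.0.113.5 port 50022 ssh2
Jan 10 10:00:05 host sshd[1002]: Failed password for invalid user admin from 203.0.113.5 port 50023 ssh2
Jan 10 10:01:00 host sshd[1003]: Accepted password for alice from 198.51.100.7 port 50100 ssh2
Jan 10 10:02:00 host sshd[1004]: Failed password for root from 198.51.100.9 port 50200 ssh2'

failed=$(printf '%s\n' "$sample" | awk '/Failed password/ && /ssh/ {
    for (i = 1; i <= NF; i++)
        # Four dot-separated groups of digits, nothing else in the field.
        if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/)
            print $1, $2, $i
}')
printf '%s\n' "$failed"
# Jan 10 203.0.113.5
# Jan 10 203.0.113.5
# Jan 10 198.51.100.9
```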
Example 2: Process memory monitoring
Let's create a script that will display processes with high memory consumption at the time when the script was run.
ps aux | awk 'NR > 1 {
if ($4 > 5.0)
printf "HIGH MEM: %s using %.1f%%\n", $11, $4
else if ($4 > 2.0)
printf "MEDIUM: %s using %.1f%%\n", $11, $4
}'
There are better ways to monitor resources and set up alerts, though.
Example 3: Disk space alerts
Check for filesystems that are more than 80% full.
df -h | awk 'NR > 1 {
    gsub(/%/, "", $5)    # Remove % symbol
    if ($5 + 0 > 80)     # Adding 0 forces a numeric comparison
        printf "WARNING: %s is %s%% full\n", $6, $5
}'
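One thing to watch in checks like this: after sub() or gsub() the value is a string, and in at least some AWK implementations comparing it with a number then happens as text, where "9" sorts after "80". Adding 0 forces a numeric comparison. The sketch below uses made-up df-style rows so it runs anywhere, and passes the threshold in with -v; the variable names limit and pct are placeholders.

```shell
# Made-up df-style rows (header plus three filesystems), not real output.
sample='Filesystem Size Used Avail Use% Mounted on
/dev/sda1 50G 45G 5G 90% /
/dev/sdb1 100G 9G 91G 9% /data
tmpfs 2G 2G 0 100% /tmp'

alerts=$(printf '%s\n' "$sample" | awk -v limit=80 'NR > 1 {
    pct = $5
    sub(/%/, "", pct)            # strip the % sign from a copy of the field
    if (pct + 0 > limit + 0)     # adding 0 forces a numeric comparison
        printf "WARNING: %s is %d%% full\n", $6, pct
}')
printf '%s\n' "$alerts"
# WARNING: / is 90% full
# WARNING: /tmp is 100% full
```

Passing the limit with -v keeps the threshold out of the program text, so the same command can be reused with a different value.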
Example 4: Log level filtering
Filter logs based on severity level. This is a dummy example, as you'll need a service whose logs carry these levels in the third field.
awk '{
if ($3 ~ /ERROR|FATAL/)
print "CRITICAL:", $0
else if ($3 ~ /WARNING|WARN/)
print "ATTENTION:", $0
else if ($3 ~ /INFO/)
print "INFO:", $4, $5, $6
}' application.log
Example 5: Network connection analysis
Analyze netstat output for suspicious connections:
netstat -an | awk '
$1 ~ /tcp/ && $6 == "ESTABLISHED" {
if ($4 !~ /:22$|:80$|:443$/)
print "Unusual connection:", $4, "->", $5
}
'
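This works better on servers.
You can also restrict a pattern to a single field. For example, to list users whose names start with adm:
awk -F: '$1 ~ /^adm/ {print}' /etc/passwd
With : as the field separator, only the first field contains the username.
🪧 Time to recall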
In this chapter, you've learned:
- Patterns filter which lines get processed
- Comparison operators test numeric and string conditions
- Logical operators combine multiple tests
- Regular expressions provide flexible string matching
- Field-specific patterns offer precision control
Practice Exercises
- Find all users with UID greater than 1000 in /etc/passwd
- Extract error and warning messages from a log file (if you have one)
- Show processes using more than 50% CPU from ps aux output
- Find /etc/ssh/sshd_config configuration lines that aren't comments
- Identify network connections on non-standard ports (see one of the examples above for reference)
In the next chapter, learn about built-in variables and field manipulation - where AWK transforms from a simple filter into a data processing powerhouse.