Chapter 2: Pattern Matching and Basic Operations

Control exactly what AWK processes with pattern matching. Master if-else statements, comparison operators, and complex filtering conditions.

Think of AWK patterns like a security guard at a nightclub - they decide which lines get past the velvet rope and which actions get executed. Master pattern matching, and you control exactly what AWK processes.

Pattern matching fundamentals

AWK patterns work like filters: they test each line and execute actions only when conditions are met. No match = no action.

Here are some very basic examples of pattern matching:

awk '/ERROR/ {print $0}' logfile.txt       # Find lines containing "ERROR"
awk -F: '/^root/ {print $1}' /etc/passwd   # Usernames starting with "root"
awk '/ssh$/ {print NR}' processes.txt      # Lines ending with "ssh"
awk '/^ERROR$/ {print}' logfile.txt        # Lines containing only ERROR

AWK uses POSIX extended regular expressions, the same flavor as grep -E and sed -E. The pattern sits between forward slashes: /pattern/.

A basic understanding of regex is all you need to follow the pattern matching in this chapter.
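
If your regex is rusty, here is a minimal sketch of the operators this chapter leans on most. The log lines are made up for illustration and held in a shell variable, so no file is needed.

```shell
# Made-up log lines, held in a variable so no file is needed
sample='ERROR disk full
WARNING low memory
INFO all good
ERROR'

printf '%s\n' "$sample" | awk '/^ERROR/'          # starts with ERROR (2 lines)
printf '%s\n' "$sample" | awk '/^ERROR$/'         # the whole line is ERROR (1 line)
printf '%s\n' "$sample" | awk '/^(ERROR|INFO)/'   # starts with ERROR or INFO (3 lines)
printf '%s\n' "$sample" | awk '/(full|memory)$/'  # ends with full or memory (2 lines)
```

A pattern with no action prints the matching line, which is why none of these commands needs {print}.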

Tip: Instead of multiple AWK calls:

awk '/ERROR/' file && awk '/WARNING/' file

Use one call with OR:

awk '/ERROR|WARNING/ {print}' file
💡
I recommend creating the data files and trying every command on your own system. You will understand the concepts far better than by just reading the text and the sample outputs.

Conditional operations: making decisions with if-else

AWK's if-else statements work like traffic lights - they direct program flow based on conditions.

Create this file as performance.txt:

server01 cpu 75 memory 60 disk 45
server02 cpu 45 memory 30 disk 85  
server03 cpu 95 memory 85 disk 70
server04 cpu 25 memory 40 disk 20
server05 cpu 65 memory 75 disk 90

Let's see how if-else can act on the values in this file.
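
Each metric in this file is preceded by a label, so the numbers sit in fields 3, 5 and 7 rather than 2, 3 and 4. A quick way to confirm field positions before writing a condition is to print every field with its number. This is only a sketch, with one row of performance.txt piped in so it runs on its own:

```shell
# One row of performance.txt, held in a variable
row='server02 cpu 45 memory 30 disk 85'

# Print each field next to its field number
echo "$row" | awk '{ for (i = 1; i <= NF; i++) printf "$%d = %s\n", i, $i }'
```

The last line printed is $7 = 85, which confirms that the disk percentage is the seventh field.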

Simple if statement: Binary decisions

Think of if like a bouncer - one condition, one action.

Let's use this command with the performance.txt we created previously:

awk '{if ($3 > 80) print $1, "CPU ALERT"}' performance.txt

It checks the CPU usage ($3, the third column) and, when it is greater than 80, prints the server name ($1, the first column) followed by an alert.

server03 CPU ALERT

Only server03 exceeds the 80% CPU threshold, so only it 'triggers the alert'.
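
The same filter can also be written as a pattern in front of the action, the style used in the first section. Here is a minimal sketch comparing both forms, with two rows of performance.txt held in a variable (the variable names are only for illustration):

```shell
rows='server01 cpu 75 memory 60 disk 45
server03 cpu 95 memory 85 disk 70'

# Condition tested inside the action with if
with_if=$(printf '%s\n' "$rows" | awk '{ if ($3 > 80) print $1, "CPU ALERT" }')

# Same condition used as the pattern; the action runs only when it is true
with_pattern=$(printf '%s\n' "$rows" | awk '$3 > 80 { print $1, "CPU ALERT" }')

echo "$with_if"
echo "$with_pattern"
```

Both commands print server03 CPU ALERT. The pattern form is shorter, while if becomes useful once you need else branches.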

if-else structure: Either-or logic

Think of if-else like a fork in the road - two paths, always take one.

Let's label the servers based on the disk usage.

awk '{
    # Disk usage is the 7th field ($5 is memory)
    if ($7 > 70)
        print $1, "HIGH DISK"
    else 
        print $1, "DISK OK"
}' performance.txt

Output:

server01 DISK OK
server02 HIGH DISK
server03 DISK OK  
server04 DISK OK
server05 HIGH DISK

Every server gets classified - no line escapes without a label.
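
For a simple either-or label, AWK also offers the conditional operator ?: as a more compact spelling. The sketch below assumes the performance.txt layout, where the disk percentage is the seventh field, and uses two of its rows:

```shell
rows='server01 cpu 75 memory 60 disk 45
server02 cpu 45 memory 30 disk 85'

# condition ? value-if-true : value-if-false
labels=$(printf '%s\n' "$rows" | awk '{ print $1, ($7 > 70 ? "HIGH DISK" : "DISK OK") }')
echo "$labels"
```

This prints server01 DISK OK and server02 HIGH DISK. The parentheses around the expression matter, because a bare > inside print means output redirection.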

📋
You can copy and paste the multi-line AWK command into the terminal as it is and it should run fine. It is a single command, but it is easier to read when split across lines. Inside bash scripts, always write it across multiple lines.

if-else if-else chain: Multi-tier classification

Think of an if-else if chain like a sorting machine - items pass through one gate after another until they find their category.

awk '{
    # Disk usage is the 7th field
    if ($7 > 80) 
        status = "CRITICAL"
    else if ($7 > 60) 
        status = "WARNING" 
    else if ($7 > 40)
        status = "MODERATE"
    else 
        status = "OK"
    print $1, "disk:", $7 "%", status
}' performance.txt

Output:

server01 disk: 45% MODERATE
server02 disk: 85% CRITICAL
server03 disk: 70% WARNING
server04 disk: 20% OK
server05 disk: 90% CRITICAL

Each server cascades through conditions until it hits its classification tier.
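
The order of the tests matters, because AWK stops at the first condition that is true. The sketch below deliberately puts the broadest test first to show what goes wrong; the input is a made-up line holding only a server name and a disk percentage.

```shell
# Made-up input: server name and disk percentage only
wrong=$(echo 'server02 85' | awk '{
    if ($2 > 40)      status = "MODERATE"   # 85 > 40 is true, so the chain stops here
    else if ($2 > 80) status = "CRITICAL"   # never reached for 85
    else              status = "OK"
    print $1, status
}')
echo "$wrong"
```

This prints server02 MODERATE even though 85 is above the critical threshold, so always test the strictest threshold first.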

Complex multi-field analysis

Let's make it a bit more complicated by combining CPU, memory, and disk metrics and create a monitoring script:

awk '{
    cpu = $3; mem = $5; disk = $7
    
    if (cpu > 90 || mem > 90 || disk > 90)
        alert = "CRITICAL"
    else if (cpu > 70 && mem > 70)
        alert = "HIGH_LOAD" 
    else if (cpu > 80 || mem > 80 || disk > 80)
        alert = "WARNING"
    else
        alert = "NORMAL"
        
    printf "%-10s CPU:%2d%% MEM:%2d%% DISK:%2d%% [%s]\n", $1, cpu, mem, disk, alert
}' performance.txt

It should show this output.

server01   CPU:75% MEM:60% DISK:45% [NORMAL]
server02   CPU:45% MEM:30% DISK:85% [WARNING]
server03   CPU:95% MEM:85% DISK:70% [CRITICAL]
server04   CPU:25% MEM:40% DISK:20% [NORMAL]
server05   CPU:65% MEM:75% DISK:90% [WARNING]

This tiered approach mimics real monitoring systems - critical issues trump warnings, and combined load factors matter. Note that server05 lands on WARNING, not CRITICAL: its disk is at exactly 90, and the first test uses >, not >=. Fed with live data from /proc and scheduled with cron, the same logic could become an actual system resource alert system.

Comparison operators

AWK comparison operators work like mathematical symbols - each comparison evaluates to true or false. This gives you finer control over the logic you put in place.
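
In AWK, true and false are simply the numbers 1 and 0. A small sketch makes this visible by printing the comparisons themselves, using one row in the server_stats.txt layout described in this section:

```shell
# hostname, cpu_cores, memory_mb, cpu_usage, status
row='web01 8 4096 75 online'

# Each parenthesized comparison prints as 1 (true) or 0 (false)
echo "$row" | awk '{ print ($4 > 70), ($4 <= 70), ($4 == 75), ($5 == "offline") }'
```

The output is 1 0 1 0: the CPU usage is above 70, it equals 75, and the status is not offline.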

We will use the following data files for our testing in this section.

A server_stats.txt file that has the hostname, cpu_cores, memory_mb, cpu_usage, status fields.

web01 8 4096 75 online
web02 4 2048 45 maintenance  
db01 16 8192 90 online
db02 8 4096 65 online
cache01 2 1024 0 offline
backup01 4 2048 100 online

And a network_ports.txt file that has ip, service, port, protocol and state fields.

192.168.1.10 ssh 22 tcp open
192.168.1.10 http 80 tcp open
192.168.1.15 ftp 21 tcp closed
192.168.1.20 mysql 3306 tcp open  
192.168.1.25 custom 8080 tcp open
192.168.1.30 ssh 22 tcp open

Numeric comparisons

Numeric comparisons are simple. You use the familiar symbols: <, <=, >, >=, == (equal) and != (not equal). Note that equality is ==, because a single = is assignment.

Greater than - Like checking if servers exceed CPU thresholds:

awk '$4 > 70 {print $1, "high CPU:", $4"%"}' server_stats.txt

Output:

web01 high CPU: 75%
db01 high CPU: 90%
backup01 high CPU: 100%

Less than or equal - Like finding servers with limited resources:

awk '$2 <= 4 {print $1, "low core count:", $2}' server_stats.txt

Output:

web02 low core count: 4
cache01 low core count: 2
backup01 low core count: 4

Equals - Like finding servers with zero usage (probably offline):

awk '$4 == 0 {print $1, "zero CPU usage"}' server_stats.txt

Output:

cache01 zero CPU usage
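
A common slip is typing = where == was meant. AWK accepts it without complaint, because an assignment is also a valid pattern, so it is worth seeing what happens. A minimal sketch on one row of server_stats.txt:

```shell
row='db01 16 8192 90 online'

# == compares: 90 is not 0, so nothing is printed
echo "$row" | awk '$4 == 0 { print $1, "zero CPU usage" }'

# = assigns: $4 becomes 0, the pattern evaluates to 0 (false), nothing is printed
echo "$row" | awk '$4 = 0 { print $1, "zero CPU usage" }'

# = with a non-zero value is always true and silently overwrites the field
echo "$row" | awk '$4 = 100 { print $1, "CPU:", $4 }'
```

Only the last command prints anything (db01 CPU: 100), and the real value 90 is lost along the way.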

Not equals - Like finding non-standard ports:

awk '$3 != 22 && $3 != 80 && $3 != 443 {print $1, "unusual port:", $3}' network_ports.txt

Output:

192.168.1.15 unusual port: 21
192.168.1.20 unusual port: 3306
192.168.1.25 unusual port: 8080

String comparisons

You have different operators for comparing strings. They are quite easy to use and understand.
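
One thing to watch for: the same operators compare either as numbers or as strings, depending on the operands. Input fields that look like numbers are compared numerically, while strings are compared character by character. The sketch below forces the string behavior by appending an empty string to each field:

```shell
result=$(echo '10 9' | awk '{
    print ($1 < $2)              # both fields look numeric: 10 < 9 is false, prints 0
    print (($1 "") < ($2 ""))    # appending "" forces strings: "10" sorts before "9", prints 1
}')
echo "$result"
```

If a value that should be a number compares oddly, adding 0 to it (for example $1 + 0) forces the numeric interpretation.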

Exact string match (==)

Let's check servers with running status:

awk '$5 == "online" {print $1, "is running"}' server_stats.txt

Output:

web01 is running
db01 is running
db02 is running
backup01 is running

Pattern match (~)

Let's find services whose name contains sql, such as databases:

awk '$2 ~ /sql/ {print $1, "database service on port", $3}' network_ports.txt

Output:

192.168.1.20 database service on port 3306

Does NOT match (!~):

Use it when you want to exclude the matches. For example:

printf '# This is a comment\nweb01 active\n# Another comment\ndb01 running\n' | awk '$1 !~ /^#/ {print "Valid config:", $0}'

The output will omit lines starting with #:

Valid config: web01 active
Valid config: db01 running
💡
The ~ operator is like a smart search - it finds patterns within strings, not exact matches.
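
To see the difference between == and ~ side by side, here is a small sketch on a made-up list of service names:

```shell
# Made-up service names, one per line
services='mysql
postgresql
sql
nosql-proxy'

printf '%s\n' "$services" | awk '$1 == "sql"'     # exact match: only sql
printf '%s\n' "$services" | awk '$1 ~ /sql/'      # contains sql: all four lines
printf '%s\n' "$services" | awk '$1 ~ /^sql$/'    # anchored on both ends: only sql
printf '%s\n' "$services" | awk '$1 ~ /sql$/'     # ends with sql: mysql, postgresql, sql
```

Anchoring with ^ and $ is how you narrow a regex match down when plain substring matching is too generous.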

Logical operators: &&, || and !

Logical operators work like sentence connectors - they join multiple conditions into complex tests. You'll be using them as well to add complex logic to your scripts.

Here's the test file process_list.txt for this section:

nginx 1234 www-data 50 2.5 running
mysql 5678 mysql 200 8.1 running
apache 9012 www-data 30 1.2 stopped
redis 3456 redis 15 0.8 running
postgres 7890 postgres 150 5.5 running
sshd 2468 root 5 0.1 running

It has process, pid, user, memory_mb, cpu_percent and status fields.

AND Operator (&&) - Both conditions must be true

Let's find processes that are both memory AND CPU hungry in our input file, that is, lines with memory usage above 100 MB and CPU usage above 5%.

awk '$4 > 100 && $5 > 5.0 {print $1, "resource hog:", $4"MB", $5"%"}' process_list.txt

Here's the output:

mysql resource hog: 200MB 8.1%
postgres resource hog: 150MB 5.5%

OR Operator (||) - Either condition can be true

This command selects important services like mysql or postgres, plus any service with CPU usage greater than 7:

awk '$1 == "mysql" || $1 == "postgres" || $5 > 7.0 {print $1, "needs attention"}' process_list.txt

Output:

mysql needs attention
postgres needs attention

NOT Operator (!) - Reverses the condition

Let's find all the services that are not active in our test file:

awk '!($6 == "running") {print $1, "not running, status:", $6}' process_list.txt

Here's the output:

apache not running, status: stopped

Complex combined Logic

You can combine them to test multiple criteria. Try and figure out what this command does:

awk '($3 == "root" && $4 > 10) || ($5 > 2.0 && $6 == "running") {
    printf "Monitor: %-10s User:%-8s Mem:%3dMB CPU:%.1f%%\n", $1, $3, $4, $5
}' process_list.txt

The output should help you understand it:

Monitor: nginx      User:www-data Mem: 50MB CPU:2.5%
Monitor: mysql      User:mysql    Mem:200MB CPU:8.1%
Monitor: postgres   User:postgres Mem:150MB CPU:5.5%
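
The parentheses in such conditions are not decoration: && binds more tightly than ||, so leaving them out can change the result. A minimal sketch on three rows of process_list.txt:

```shell
rows='mysql 5678 mysql 200 8.1 running
apache 9012 www-data 30 1.2 stopped
sshd 2468 root 5 0.1 running'

# Without parentheses, && is evaluated before ||:
# owned by root, OR (owned by mysql AND over 100 MB)
printf '%s\n' "$rows" | awk '$3 == "root" || $3 == "mysql" && $4 > 100 { print $1 }'

# Parentheses change the grouping:
# (owned by root OR owned by mysql) AND over 100 MB
printf '%s\n' "$rows" | awk '($3 == "root" || $3 == "mysql") && $4 > 100 { print $1 }'
```

The first command prints mysql and sshd, the second prints only mysql. When in doubt, add parentheses to state the grouping you mean.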

Practical examples for system administrators

Now, let's see some real-world scenarios where you can use these operators. They also use some elements from the previous chapters.

Example 1: Failed SSH login analysis

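Find failed SSH attempts along with their IP addresses. Note that this may print nothing on a personal system that does not accept SSH connections. The path /var/log/auth.log is the Debian and Ubuntu location; Red Hat based systems typically log to /var/log/secure instead.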

awk '/Failed password/ && /ssh/ {
    for(i=1; i<=NF; i++) 
        if($i ~ /^[0-9]+\.[0-9]+/) 
            print $1, $2, $i
}' /var/log/auth.log
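
If your machine has no SSH traffic to analyze, you can try the same program on canned input. The log lines below are made up in the traditional syslog style (month, day, time, host, process), and the addresses come from documentation ranges, so treat this as a sketch rather than real log data:

```shell
# Made-up auth log lines in the traditional syslog layout
log='Jan 10 10:15:01 host sshd[1201]: Failed password for root from 203.0.113.5 port 50122 ssh2
Jan 10 10:15:09 host sshd[1201]: Accepted password for alice from 198.51.100.7 port 50200 ssh2
Jan 10 10:16:44 host sshd[1302]: Failed password for invalid user admin from 192.0.2.44 port 41811 ssh2'

failed=$(printf '%s\n' "$log" | awk '/Failed password/ && /ssh/ {
    for (i = 1; i <= NF; i++)
        if ($i ~ /^[0-9]+\.[0-9]+/)
            print $1, $2, $i
}')
echo "$failed"
```

Only the two failed attempts are reported, each as month, day and source IP. The accepted login is skipped because the line does not contain Failed password.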

Example 2: Process memory monitoring

Let's create a command that lists the processes with high memory consumption at the moment it is run. In ps aux output, $4 is the %MEM column and $11 is the command.

ps aux | awk 'NR > 1 {
    if ($4 > 5.0) 
        printf "HIGH MEM: %s using %.1f%%\n", $11, $4
    else if ($4 > 2.0)
        printf "MEDIUM: %s using %.1f%%\n", $11, $4
}'

There are better ways to monitor systems and set up alerts, though.

Example 3: Disk space alerts

Check for filesystems that are more than 80% full.

df -h | awk 'NR > 1 {
    gsub(/%/, "", $5)    # Remove the % symbol
    # Adding 0 forces a numeric comparison on the modified field
    if ($5 + 0 > 80)
        printf "WARNING: %s is %s%% full\n", $6, $5
}'
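
To try the check without depending on your own disks, here is a sketch against canned df-style output (the filesystems are made up, and the Linux column layout is assumed). It converts the percentage with + 0, which keeps the comparison numeric instead of character by character:

```shell
# Made-up df -h style output, Linux column layout assumed
df_sample='Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   46G  4.0G  92% /
/dev/sdb1       200G   18G  182G   9% /data
/dev/sdc1       1.0T  1.0T     0 100% /backup'

alerts=$(printf '%s\n' "$df_sample" | awk 'NR > 1 {
    used = $5 + 0          # "92%" + 0 gives the number 92
    if (used > 80)
        printf "WARNING: %s is %d%% full\n", $6, used
}')
echo "$alerts"
```

This flags / at 92% and /backup at 100%, and leaves /data at 9% alone. A string comparison would get both of the last two wrong, since "9" sorts after "80" and "100" sorts before it.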

Example 4: Log level filtering

Filter logs based on severity levels. This is a dummy example, because you need a service that writes logs with these levels; it assumes the level is the third field.

awk '{
    if ($3 ~ /ERROR|FATAL/) 
        print "CRITICAL:", $0
    else if ($3 ~ /WARNING|WARN/)
        print "ATTENTION:", $0  
    else if ($3 ~ /INFO/)
        print "INFO:", $4, $5, $6
}' application.log
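
Since application.log is only a placeholder, here is a sketch with made-up log lines in a date, time, level, message layout, which is the layout the program expects:

```shell
# Made-up log lines: date, time, level, message
log='2024-05-01 10:00:01 INFO Service started ok
2024-05-01 10:00:05 WARN Cache nearly full
2024-05-01 10:00:09 ERROR Database connection lost
2024-05-01 10:00:10 DEBUG heartbeat'

filtered=$(printf '%s\n' "$log" | awk '{
    if ($3 ~ /ERROR|FATAL/)
        print "CRITICAL:", $0
    else if ($3 ~ /WARNING|WARN/)
        print "ATTENTION:", $0
    else if ($3 ~ /INFO/)
        print "INFO:", $4, $5, $6
}')
echo "$filtered"
```

The INFO line is shortened to its first three message words, the WARN and ERROR lines are printed in full with a prefix, and the DEBUG line is dropped because no branch matches it.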

Example 5: Network connection analysis

Analyze netstat output for suspicious connections:


netstat -an | awk '
    $1 ~ /tcp/ && $6 == "ESTABLISHED" {
        if ($4 !~ /:22$|:80$|:443$/)
            print "Unusual connection:", $4, "->", $5
    }
'

This is more useful on servers, where there are more connections to inspect.
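
On a quiet desktop you can still test the filter with canned input. The lines below are made up and assume the Linux netstat layout, where the local address is the fourth field, the remote address the fifth and the state the sixth:

```shell
# Made-up netstat -an style lines, Linux layout assumed
conns='tcp        0      0 10.0.0.5:22        203.0.113.9:51544    ESTABLISHED
tcp        0      0 10.0.0.5:8443      198.51.100.20:40210  ESTABLISHED
tcp        0      0 0.0.0.0:80         0.0.0.0:*            LISTEN'

unusual=$(printf '%s\n' "$conns" | awk '
    $1 ~ /tcp/ && $6 == "ESTABLISHED" {
        if ($4 !~ /:22$|:80$|:443$/)
            print "Unusual connection:", $4, "->", $5
    }
')
echo "$unusual"
```

Only the connection on port 8443 is reported. The SSH session is on a standard port, and the listener on port 80 is not in the ESTABLISHED state.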

💡
When it suits the data, searching a specific field is more precise than searching the entire line. For example, to find usernames that start with adm, use awk -F: '$1 ~ /^adm/ {print}' /etc/passwd, because only the first colon-separated field holds the username.
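
Here is a sketch of the difference on made-up passwd-style entries. Because /etc/passwd separates its fields with colons, the field search sets the separator with -F:

```shell
# Made-up entries in /etc/passwd format
passwd_sample='adm:x:3:4:adm:/var/adm:/sbin/nologin
admin:x:1000:1000:Admin User:/home/admin:/bin/bash
backup:x:34:34:backup:/var/adm/backup:/bin/sh'

# Whole-line search: also matches backup, because its home directory contains adm
printf '%s\n' "$passwd_sample" | awk '/adm/ { print }'

# Field search with : as the separator: only usernames starting with adm
printf '%s\n' "$passwd_sample" | awk -F: '$1 ~ /^adm/ { print $1 }'
```

The first command prints all three lines, the second prints only adm and admin.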

🪧 Time to recall

In this chapter, you've learned:

  • Patterns filter which lines get processed
  • Comparison operators test numeric and string conditions
  • Logical operators combine multiple tests
  • Regular expressions provide flexible string matching
  • Field-specific patterns offer precision control

Practice Exercises

  1. Find all users with UID greater than 1000 in /etc/passwd
  2. Extract error and warning messages from a log file (if you have one)
  3. Show processes using more than 50% CPU from ps aux output
  4. Find /etc/ssh/sshd_config configuration lines that aren't comments
  5. Identify network connections on non-standard ports (see Example 5 above for reference)

In the next chapter, learn about built-in variables and field manipulation - where AWK transforms from a simple filter into a data processing powerhouse.

Abhishek Prakash