Uniq Command in Linux Explained With Examples

The uniq command in Linux and Unix is used for removing duplicate lines from a file. Learn how to use the uniq command with these examples.

The uniq command in Unix and Linux is used for filtering duplicate text. It can be used by itself, but it is commonly used along with other commands, such as sort, to identify redundant information in a file.

Here’s the syntax of the uniq command:

uniq [options] <input-file> <output-file>

When you run uniq without file arguments, it reads its input from stdin and writes its output to stdout.

While using stdin is possible using the clipboard (copy/paste), this isn’t the most practical use.

Instead you would probably want to use this command on a file that you suspect contains duplicate information.
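A more practical way to see the stdin and stdout behavior is to pipe some text into uniq. This is only a minimal sketch: the fruit names and the variable name result are made up for the demonstration.

```shell
# Feed four lines to uniq on stdin; with no file arguments it writes to stdout.
# The command substitution captures that output in a variable.
result=$(printf 'apple\napple\norange\norange\n' | uniq)

# Print what uniq produced: one copy of each run of identical lines.
printf '%s\n' "$result"
```

The same pattern works with the output of any other command in place of printf.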

One limitation of the uniq command is that it will only identify duplicates that are adjacent, or next to each other, in the file. This is pretty straightforward, but let me show you an example so you can see it in action.

[linuxhandbook@fedora ~]$ cat apple.txt
apple
apple
orange
orange
apple 
orange
[linuxhandbook@fedora ~]$ uniq apple.txt 
apple
orange
apple 
orange

So, you know right away that you cannot trust the program to identify every duplicate on its own. There are some ways to get around this, and the usual one is to sort the input first with the sort command.

I’ll show it to you later in this article. First, let me run through some examples to familiarize you with ‘uniq’ before mixing in other commands and potentially confusing things.

7 examples of the uniq command in Linux

I used a real system log but edited it for demonstration purposes. Most of the file has already been sorted into adjacent order, but I’ve left a couple of lines “out of place” to show the functionality of the uniq command.

Download the sample text file: https://gist.github.com/abhishekpc/7dada8c6e57fd5b854f9d2dae72dddb0

Example 1: Using uniq command the default way

Although I showed you this already, let’s look at our sample file using the default syntax.

[linuxhandbook@fedora ~]$ uniq sample_log_file.txt 
/usr/lib/gdm3/gdm-x-session[1443]: (II) No input driver specified, ignoring this device.
/usr/lib/gdm3/gdm-x-session[1443]: (II) event9  - Intel HID events: device is a keyboard
/usr/lib/gdm3/gdm-x-session[1443]: (II) event9  - Intel HID events: device removed
/usr/lib/gdm3/gdm-x-session[1443]: (II) event9  - Intel HID events: is tagged by udev as: Keyboard
/usr/lib/gdm3/gdm-x-session[1443]: (II) No input driver specified, ignoring this device.
/usr/lib/gdm3/gdm-x-session[1443]: (II) systemd-logind: got fd for /dev/input/event10 13:74 fd 55 paused 0
/usr/lib/gdm3/gdm-x-session[1443]: (II) This device may have been added with another device file.
PackageKit: get-updates transaction /354_eebeebaa from uid 1000 finished with success after 1514ms
wpa_supplicant[898]: RRM: Ignoring radio measurement request: Not RRM network

You can see that a lot of the duplicate lines are consolidated, but the output still has redundant information. This is due to the functional limitation I already described. Let’s look at a few more examples and examine some of the options that are built into the ‘uniq’ command line utility.

Example 2: Output filtered results to destination file

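You may want to save this output so you can easily edit it or preserve it. If you give uniq a second file name, it writes the output to that file instead of the normal stdout (terminal). It is important to note that you cannot use this format to overwrite the original file: if you name the same file as both input and output, it is likely to be truncated before uniq has read it.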

[linuxhandbook@fedora ~]$ uniq sample_log_file.txt uniq_log_output.txt 

Here’s the content of the output file:

[linuxhandbook@fedora ~]$ cat uniq_log_output.txt 
/usr/lib/gdm3/gdm-x-session[1443]: (II) No input driver specified, ignoring this device.
/usr/lib/gdm3/gdm-x-session[1443]: (II) event9  - Intel HID events: device is a keyboard
/usr/lib/gdm3/gdm-x-session[1443]: (II) event9  - Intel HID events: device removed
/usr/lib/gdm3/gdm-x-session[1443]: (II) event9  - Intel HID events: is tagged by udev as: Keyboard
/usr/lib/gdm3/gdm-x-session[1443]: (II) No input driver specified, ignoring this device.
/usr/lib/gdm3/gdm-x-session[1443]: (II) systemd-logind: got fd for /dev/input/event10 13:74 fd 55 paused 0
/usr/lib/gdm3/gdm-x-session[1443]: (II) This device may have been added with another device file.
PackageKit: get-updates transaction /354_eebeebaa from uid 1000 finished with success after 1514ms
wpa_supplicant[898]: RRM: Ignoring radio measurement request: Not RRM network
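If you do want the filtered result to end up under the original file name, one common workaround is to write to a temporary file and then move it into place. This is a sketch under assumptions: the directory and the names fruits.txt and fruits.tmp are hypothetical and created only for the demonstration.

```shell
# Hypothetical demo file in a throwaway directory.
workdir=$(mktemp -d)
printf 'apple\napple\norange\n' > "$workdir/fruits.txt"

# Write the filtered lines to a separate file first, then replace the
# original only if uniq finished successfully.
uniq "$workdir/fruits.txt" "$workdir/fruits.tmp" &&
  mv "$workdir/fruits.tmp" "$workdir/fruits.txt"

# The original name now holds the filtered content.
cat "$workdir/fruits.txt"
```

Because the original is only replaced after uniq succeeds, a failed run leaves it untouched.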

Example 3: Using ‘-c’ to get the count of repeated lines

This option is pretty self-explanatory. The program will add the number of times each line occurred consecutively to the beginning of that line.

[linuxhandbook@fedora ~]$ uniq -c sample_log_file.txt
      2 /usr/lib/gdm3/gdm-x-session[1443]: (II) No input driver specified, ignoring this device.
      2 /usr/lib/gdm3/gdm-x-session[1443]: (II) event9  - Intel HID events: device is a keyboard
      1 /usr/lib/gdm3/gdm-x-session[1443]: (II) event9  - Intel HID events: device removed
      2 /usr/lib/gdm3/gdm-x-session[1443]: (II) event9  - Intel HID events: is tagged by udev as: Keyboard
      5 /usr/lib/gdm3/gdm-x-session[1443]: (II) No input driver specified, ignoring this device.
      1 /usr/lib/gdm3/gdm-x-session[1443]: (II) systemd-logind: got fd for /dev/input/event10 13:74 fd 55 paused 0
      7 /usr/lib/gdm3/gdm-x-session[1443]: (II) This device may have been added with another device file.
      1 PackageKit: get-updates transaction /354_eebeebaa from uid 1000 finished with success after 1514ms
      8 wpa_supplicant[898]: RRM: Ignoring radio measurement request: Not RRM network

Example 4: Only print repeated lines with ‘-d’

With the -d option, uniq prints only the lines that are repeated, showing one copy of each group of adjacent duplicates.

[linuxhandbook@fedora ~]$ uniq -d sample_log_file.txt
/usr/lib/gdm3/gdm-x-session[1443]: (II) No input driver specified, ignoring this device.
/usr/lib/gdm3/gdm-x-session[1443]: (II) event9  - Intel HID events: device is a keyboard
/usr/lib/gdm3/gdm-x-session[1443]: (II) event9  - Intel HID events: is tagged by udev as: Keyboard
/usr/lib/gdm3/gdm-x-session[1443]: (II) No input driver specified, ignoring this device.
/usr/lib/gdm3/gdm-x-session[1443]: (II) This device may have been added with another device file.
wpa_supplicant[898]: RRM: Ignoring radio measurement request: Not RRM network

Example 5: Only print unique lines with ‘-u’

Here, you get the inverse output of the previous command: only the lines that have no adjacent duplicate in the file.

[linuxhandbook@fedora ~]$ uniq -u sample_log_file.txt
/usr/lib/gdm3/gdm-x-session[1443]: (II) event9  - Intel HID events: device removed
/usr/lib/gdm3/gdm-x-session[1443]: (II) systemd-logind: got fd for /dev/input/event10 13:74 fd 55 paused 0
PackageKit: get-updates transaction /354_eebeebaa from uid 1000 finished with success after 1514ms
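To compare ‘-c’, ‘-d’ and ‘-u’ side by side, here is a small sketch on made-up input. The fruit names and variable names are assumptions for the demo, and awk is used only to strip the leading padding that uniq puts in front of the counts.

```shell
# Six lines: "apple" appears in two separate runs.
input=$(printf 'apple\napple\norange\napple\nbanana\nbanana\n')

# -c: count of each run of adjacent identical lines (padding removed by awk).
counts=$(printf '%s\n' "$input" | uniq -c | awk '{print $1, $2}')

# -d: one copy of each run that has more than one line.
repeated=$(printf '%s\n' "$input" | uniq -d)

# -u: only lines whose run has exactly one line.
singles=$(printf '%s\n' "$input" | uniq -u)

printf 'counts:\n%s\nrepeated:\n%s\nsingles:\n%s\n' "$counts" "$repeated" "$singles"
```

Note that ‘apple’ shows up under both ‘-d’ and ‘-u’ here, because uniq looks at each adjacent run separately rather than at the file as a whole.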

Example 6: Ignore fields or characters with uniq [‘-f’ and ‘-s’]

This is really two examples, but the functions are nearly identical. I will explain how they work and then provide some clarity on the differences between the two of them.

Each of them uses the following syntax:

Skip fields with:
uniq -f N <source_file>
Skip characters with:
uniq -s N <source_file>

In each of these examples, ‘N’ is the count of items that you wish to skip. When you skip this number of items, uniq will begin the comparison at that point rather than compare the entire line.

The ‘-f’ option will skip the given number of fields. Fields are separated by blank space (spaces or tabs).

[linuxhandbook@fedora ~]$ cat field_separated_values.txt 
blue fish
blue fish
blue fish
blue class
red fish
green fish
two class
two class

If you want to use the uniq command on the second column, you’ll have to skip the first field like this:

[linuxhandbook@fedora ~]$ uniq -f1 field_separated_values.txt  
blue fish
blue class
red fish
two class

As you can see, it treats ‘red fish’ and ‘green fish’ as the same line because the first field (with the colors) has been ignored. If you add the count option here, it will show you how many lines were merged into each line of the output:

[linuxhandbook@fedora ~]$ uniq -f1 -c field_separated_values.txt  
      3 blue fish
      1 blue class
      2 red fish
      2 two class

Why would you need that? I’ll give you a practical scenario. Many log files have the timestamp at the beginning of the lines. If you are looking to find only the unique lines in such a file, you can skip the first field with the timestamp with the -f option.
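The timestamp scenario might look like the sketch below. The log lines are invented for illustration, and it assumes the timestamp is a single blank-separated field; a timestamp spread over several fields (for example a month, a day and a time) would need a larger number after -f.

```shell
# Hypothetical log lines: one timestamp field followed by the message.
log=$(printf '%s\n' \
  '10:00:01 disk check started' \
  '10:00:02 disk check started' \
  '10:00:05 disk check finished' \
  '10:00:09 disk check finished')

# Skip the first field so only the message text is compared.
# uniq keeps the first line of each run, including its timestamp.
filtered=$(printf '%s\n' "$log" | uniq -f 1)

printf '%s\n' "$filtered"
```

The surviving lines carry the timestamp of the first occurrence in each run.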

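Similarly, you can skip a specific number of characters with ‘-s’. In this file no line is longer than 10 characters, so skipping 10 characters leaves nothing to compare; uniq treats every line as identical and prints only the first one.

[linuxhandbook@fedora ~]$ uniq -s 10 field_separated_values.txt 
blue fish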
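Here is a sketch of ‘-s’ on lines that share a fixed-width prefix. The three-character IDs (two digits and a colon) and the event names are hypothetical.

```shell
# Hypothetical lines: a 3-character ID prefix followed by an event name.
events=$(printf '01:login\n02:login\n03:logout\n04:login\n')

# Skip the first 3 characters so only the event name is compared.
collapsed=$(printf '%s\n' "$events" | uniq -s 3)

printf '%s\n' "$collapsed"
```

The last ‘login’ line survives because it is not adjacent to the first two.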

Example 7: Use ‘-w’ to compare only N characters

The ‘-w’ option allows us to specify an exact number of characters to use in our comparison.

For the previous couple of examples, I switched to a simpler file to limit confusion. Now let’s pull the log file back up and see what happens when you use only the first four characters of each line to find duplicates.

[linuxhandbook@fedora ~]$ uniq -w 4 sample_log_file.txt 
/usr/lib/gdm3/gdm-x-session[1443]: (II) No input driver specified, ignoring this device.
PackageKit: get-updates transaction /354_eebeebaa from uid 1000 finished with success after 1514ms
wpa_supplicant[898]: RRM: Ignoring radio measurement request: Not RRM network

All of the lines that begin ‘/usr’ are now identified as the “same” from the perspective of the program.

This might prove helpful if you are looking for a particular log event.
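As a smaller illustration, the sketch below groups made-up log lines by their first three characters. The messages are invented, and keep in mind that ‘-w’ is not part of the POSIX specification of uniq, so it may be missing from non-GNU implementations.

```shell
# Hypothetical log lines with a severity prefix.
messages=$(printf '%s\n' \
  'ERR disk full' \
  'ERR disk failing' \
  'WARN temp high' \
  'WARN fan slow' \
  'ERR disk full')

# Compare only the first 3 characters of each line.
by_prefix=$(printf '%s\n' "$messages" | uniq -w 3)

printf '%s\n' "$by_prefix"
```

Only the first line of each run is kept, so ‘ERR disk failing’ and ‘WARN fan slow’ disappear even though their full text differs from the line before them.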

Bonus: Avoid incomplete matches using ‘sort’ and ‘uniq’ at the same time.

You can run these commands separately to achieve the same effect, but if you have never used a pipe (the | character) in Linux this is a great way to learn about them.

You can use pipes to combine different commands, which saves keystrokes and improves your workflow. Data flows from left to right: the output of each command becomes the input of the next one.

This is the sample input I am going to use:

[linuxhandbook@fedora ~]$ cat apple.txt 
apple
orange
orange
apple
apple
banana
apple
banana

Now, let’s sort the input file and then use the uniq command on it. The sort command rearranges the text so that all items are in adjacent order first. Then when the uniq command is run, it finds only 3 unique lines in the file.

[linuxhandbook@fedora ~]$ sort apple.txt | uniq 
apple
banana
orange

If you reverse the order, things will change. Performing the ‘uniq’ command first will identify only the adjacent duplicates, and then they will each be sorted into alphabetical order using the ‘sort’ command.

[linuxhandbook@fedora ~]$ uniq apple.txt | sort
apple
apple
apple
banana
banana
orange

Pipes allow us to run multiple commands at the same time, but it is important to consider their order.

Note that the contents of the file remain unchanged, just as they would when running the commands individually. The pipe also passes the output of the first command directly to the second, so no intermediate file is needed. If you ran the commands separately, you would have to save the output of the first command to a new file and then run the second command on that file to get the same result.
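Two related pipelines are worth knowing; the sketch below recreates the sample input in a temporary directory (the directory and variable names are assumptions). The first shows that sort -u gives the same result as sort piped into uniq, and the second combines ‘-c’ with a numeric reverse sort to find the most frequent line.

```shell
# Recreate the sample input in a throwaway directory.
workdir=$(mktemp -d)
printf 'apple\norange\norange\napple\napple\nbanana\napple\nbanana\n' > "$workdir/apple.txt"

# Sorting first makes all duplicates adjacent, so uniq catches every one.
deduped=$(sort "$workdir/apple.txt" | uniq)

# sort -u is a shorter way to get the same list of distinct lines.
deduped_short=$(sort -u "$workdir/apple.txt")

# Count each distinct line, order by count (highest first), keep the top one.
# awk only strips the padding in front of the count.
top=$(sort "$workdir/apple.txt" | uniq -c | sort -rn | head -n 1 | awk '{print $1, $2}')

printf '%s\n' "$deduped" "$top"
```

The sort | uniq -c form is still needed whenever you want the counts, since sort -u has no equivalent of ‘-c’.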

Conclusion

As you might imagine, piping is an important concept in learning bash. These particular commands (sort and uniq) are often used together to quickly filter information from large files like our pseudo-log.