What is tar? Tar stands for “tape archive” and refers to a practice from the earlier days of computing when data was backed up to tapes. Despite the nostalgic origin of the name, tar is very powerful and uses modern technologies to archive and compress files.
Refresh the basics: Archive vs Compression
The tar command is important for Linux users to understand. Before we get too deep into the subject, let’s start things off with a little clarification.
- Archiving – The act of storing multiple files as one file.
- Compression – The act of shrinking a larger file or files.
Tar is an archiving tool. It creates a single file out of multiple files. Transferring one file instead of many saves network bandwidth, time and processing power: a single 100 MB file transfers faster than 100 files of 1 MB each, because every file adds its own overhead.
This is why you’ll often find software distributed as a ‘tarball’, the common term for a tar file.
While tar itself cannot compress files, you can use one of the common compression algorithms to compress the files while creating a tarball. I’ll show you how to do that later in this basic tar tutorial.
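To make the distinction concrete, here is a minimal sketch that archives first and compresses second, as two separate steps. The directory and file names (notes, file1.txt and so on) are made up for the demo, and it assumes tar, gzip and seq are installed.

```shell
set -eu
work=$(mktemp -d)              # scratch directory, removed on exit
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir notes
for i in 1 2 3; do
    seq 1 2000 > "notes/file$i.txt"   # sample text that compresses well
done

# Step 1: archive only. Three files become one, nothing shrinks.
tar cf notes.tar notes

# Step 2: compress the archive (-c writes to stdout, so notes.tar is kept)
gzip -c notes.tar > notes.tar.gz

tar_size=$(wc -c < notes.tar)
gz_size=$(wc -c < notes.tar.gz)
echo "archive only:   $tar_size bytes"
echo "archive + gzip: $gz_size bytes"
```

The z option shown later simply folds these two steps into one tar call.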
Common tar files
Here are some of the common “tar files” you’ll find:
.tar: This is a plain tarball. It is only an archive; no compression is performed.
.tar.gz or .tgz: This is the extension of an archive that has been compressed with gzip.
.tar.bz2 or .tbz: This is the extension of an archive that has been compressed with bzip2. It is newer than gzip and typically achieves a higher compression ratio, but that increased shrinking power means it takes a bit longer to complete.
.tar.xz or .txz, etc.: Tar also features built-in support for xz, lzip, and more. These tools are based on the same family of compression algorithms, LZMA. The popular 7z format that has become fairly common in Windows environments also uses this algorithm. Further differences in the files result from structure and metadata. I won’t go into these details in our examples, but I wanted to mention them.
It’s important to remember that extensions are not necessary on Linux and other Unix-based systems. Your Unix system can typically identify files by their headers regardless of extension, but using the common naming scheme can help avoid confusion. You may use the file command in Linux to know the type of a file.
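As a quick illustration of headers mattering more than names, the sketch below creates a gzipped tarball with a deliberately misleading name (backup.data, a made-up name) and then inspects it. The file command is only called if it is installed.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

echo "hello" > hello.txt

# A gzip-compressed tarball, but without the usual .tar.gz extension
tar czf backup.data hello.txt

# file looks at the header, not the name (skipped if file is not installed)
if command -v file >/dev/null 2>&1; then
    file backup.data
fi

# gzip -t verifies the contents are valid gzip data, whatever the name
gzip -t < backup.data && echo "backup.data holds gzip data"

# tar can still list it, because the content is what counts
listing=$(tar tzf backup.data)
echo "$listing"
```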
Tar command examples
For this article, I want to demonstrate some of the common methods for archiving and compressing files using tar.
I have assembled some files of different types in my Documents folder. There are stock images and some text files. We will look at how the file size is changed by the compression through several examples.
Here is a complete list of the documents that will be used.
~/Documents$ ll
total 7404
drwxr-xr-x  2 christopher christopher    4096 May  5 10:55 ./
drwxr-xr-x 17 christopher christopher    4096 May  5 10:59 ../
-rw-r--r--  1 christopher christopher 5094932 Apr 30 23:54 aerial-view-of-bushes-on-sand-field-3876435.jpg
-rw-r--r--  1 christopher christopher   13268 Apr 30 23:57 lorem1.txt
-rw-r--r--  1 christopher christopher   13268 Apr 30 23:57 lorem2.txt
-rw-r--r--  1 christopher christopher   13268 Apr 30 23:57 lorem3.txt
-rw-r--r--  1 christopher christopher   13268 Apr 30 23:56 lorem.txt
-rw-r--r--  1 christopher christopher 2411919 Apr 30 23:54 mountain-range-2397645.jpg
I’ve already touched on the capabilities of tar. It is a powerful tool with a lot of options. The various options and file types may make the command appear more complicated than it really is. As always, my goal is to demystify the command line. So let’s break things down into smaller pieces before combining several options in the examples.
Here is a table of commonly used options. Keep in mind, this is just the beginning. I recommend going through the help documentation yourself to find even more possibilities once you feel comfortable.
| Short | Long | Description |
|---|---|---|
| -c | --create | create a new archive |
| -d | --diff, --compare | find differences between archive and file system |
| -r | --append | append files to the end of an archive |
| -t | --list | list the contents of an archive |
| -u | --update | only append files newer than copy in archive |
| -x | --extract, --get | extract files from an archive |
| -j | --bzip2 | filter the archive through bzip2 |
| -z | --gzip, --gunzip, --ungzip | filter the archive through gzip |
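The short and long forms in the table are interchangeable. The sketch below builds the same archive both ways and compares the listings. It assumes a tar that accepts long options (GNU tar does), and the project directory and file names are invented for the example.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir project
echo "alpha" > project/a.txt
echo "beta"  > project/b.txt

# Short options
tar -c -z -f short.tar.gz project

# The same request spelled out with long options
tar --create --gzip --file=long.tar.gz project

short_list=$(tar -t -z -f short.tar.gz)
long_list=$(tar --list --gzip --file=long.tar.gz)

echo "$short_list"
[ "$short_list" = "$long_list" ] && echo "both archives list the same members"
```

Long options are handy in scripts, where readability matters more than typing speed.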
1) Create a tarball
Earlier I mentioned the common file types associated with the tar command. This is perhaps the most basic.
tar cvf output_tarball.tar source_directory
Compression will not be applied so the file will occupy at least as much space as the files in the documents folder.
~$ tar cvf doc.tar ~/Documents
tar cvf (create, verbose, file-archive): it creates a new tar file named doc.tar from all the files in ~/Documents.
~$ ll doc.tar
-rw-r--r-- 1 christopher christopher 7567360 May 5 11:00 doc.tar
Do note that the tarball preserves the directory structure of the source directory you passed to tar.
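Here is a small sketch of what "same directory structure" means in practice. It archives a directory by its absolute path and prints the member names that tar stored. The Documents directory here is a throwaway copy inside a temporary folder, not the real one from this article.

```shell
set -eu
work=$(mktemp -d)                       # mktemp returns an absolute path
trap 'rm -rf "$work"' EXIT

mkdir "$work/Documents"
echo "lorem ipsum" > "$work/Documents/lorem.txt"

# Archive by absolute path. tar keeps every directory level,
# but drops the leading / (it prints a notice about that on stderr).
tar cf "$work/doc.tar" "$work/Documents"

members=$(tar tf "$work/doc.tar")
echo "$members"
```

The printed names carry the whole path of the temporary folder, minus the leading slash, which is why extraction recreates those directories.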
2) Create a gzip tarball
Let’s try gzip compression while creating the tarball.
~$ tar cvzf doc.tar.gz ~/Documents
tar cvzf (create, verbose, g-zip, file-archive): it creates a new tar file named doc.tar.gz from all the files in ~/Documents. It uses gzip compression (with option z) while creating the tarball.
~$ ll doc.tar.gz
-rw-r--r-- 1 christopher christopher 7512498 May 5 11:01 doc.tar.gz
As you can see, the gzipped tarball is 54862 bytes (about 54 KB) smaller than the normal, uncompressed tarball. The saving is modest because most of the data is JPEG images, which are already compressed.
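How much gzip saves depends heavily on the input. The following sketch contrasts repetitive text with random bytes. The random file is only a stand-in I am assuming for already-compressed data such as JPEGs, and all file names are made up.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Highly repetitive text: ideal input for a compressor
awk 'BEGIN { for (i = 0; i < 5000; i++) print "the quick brown fox jumps over the lazy dog" }' > text.txt

# Random bytes: stand-in for data that is already compressed
head -c 100000 /dev/urandom > random.bin

tar cf  text.tar      text.txt
tar czf text.tar.gz   text.txt
tar cf  random.tar    random.bin
tar czf random.tar.gz random.bin

for f in text.tar text.tar.gz random.tar random.tar.gz; do
    printf '%-14s %8d bytes\n' "$f" "$(wc -c < "$f")"
done
```

The text tarball should shrink dramatically, while the random one barely changes.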
Pay attention when using a hyphen (-) with tar options
Usually, when you use options with a Linux command, you add a hyphen (-) before the options.
With tar, the hyphen before the options is not mandatory and is best avoided. This is why I haven’t used it in the examples.
If you do use a hyphen before the options, you should always keep the f at the end of the options. If you use tar -cvfz, the z becomes the argument of option f (the archive name) instead of enabling gzip. And then you’ll see an error like this:
tar: doc.tar.gz: Cannot stat: No such file or directory
This is why it is a good practice to use the option f at the end of all other options so that even if you use hyphen out of habit, it won’t create a problem.
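You can reproduce the pitfall safely in a scratch directory. This is a sketch with invented names (docs, good.tar.gz, wrong.tar.gz), and the exact error text may differ between tar implementations.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir docs
echo "sample" > docs/a.txt

# f comes last: the next word (good.tar.gz) is the archive name
tar -czf good.tar.gz docs

# f comes before z: "z" is taken as the archive name, gzip is never
# enabled, and wrong.tar.gz is treated as a file to add (it does not exist)
tar -cfz wrong.tar.gz docs || echo "tar reported an error, as expected"

# The directory now holds good.tar.gz and a stray archive called "z"
ls
```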
3) Create a bz2 tarball
Say you want to create a bzip2 tarball. The steps are the same as in the previous example. You just need to change the option z (gzip) to j (bzip2). Refer to the option table shown earlier.
~$ tar cvjf doc.tar.bz2 ~/Documents
tar cvjf (create, verbose, bzip2, file-archive).
~$ ll doc.tar.bz2
-rw-r--r-- 1 christopher christopher 7479782 May 5 11:04 doc.tar.bz2
Notice the size? It is even less than the gzipped tarball.
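If you want to compare the compressors on your own data, a loop like the following may help. It is only a sketch: the sample file is generated text, the helper function try is my own invention, and each compressor is skipped if its program is not installed. The J option selects xz.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

awk 'BEGIN { for (i = 0; i < 5000; i++) print "line", i, "of some sample text" }' > sample.txt

tar cf sample.tar sample.txt
echo "none   $(wc -c < sample.tar) bytes"

# try FLAG TOOL EXT: create sample.tar.EXT with the given tar flag,
# but only if the matching compression program is available
try() {
    if command -v "$2" >/dev/null 2>&1; then
        tar "c${1}f" "sample.tar.$3" sample.txt
        echo "$2   $(wc -c < "sample.tar.$3") bytes"
    else
        echo "$2   not installed, skipped"
    fi
}

try z gzip  gz
try j bzip2 bz2
try J xz    xz
```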
4) List the contents of a tarball
You can use the -t option (instead of -c) to view the contents of an archive file. This works the same whether the file is compressed or not. It will list the actual file size, not the compressed size.
~$ tar tvf doc.tar
drwxr-xr-x christopher/christopher       0 2020-05-05 10:55 home/christopher/Documents/
-rw-r--r-- christopher/christopher 5094932 2020-04-30 23:54 home/christopher/Documents/aerial-view-of-bushes-on-sand-field-3876435.jpg
-rw-r--r-- christopher/christopher   13268 2020-04-30 23:57 home/christopher/Documents/lorem1.txt
-rw-r--r-- christopher/christopher   13268 2020-04-30 23:56 home/christopher/Documents/lorem.txt
-rw-r--r-- christopher/christopher 2411919 2020-04-30 23:54 home/christopher/Documents/mountain-range-2397645.jpg
-rw-r--r-- christopher/christopher   13268 2020-04-30 23:57 home/christopher/Documents/lorem3.txt
-rw-r--r-- christopher/christopher   13268 2020-04-30 23:57 home/christopher/Documents/lorem2.txt
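Listing also combines well with other tools. In this sketch a small gzipped tarball is created from placeholder files (photo.jpg is just text with an image-like name) and then listed, first by name only and then filtered with grep.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir Documents
echo "lorem ipsum" > Documents/lorem.txt
echo "lorem ipsum" > Documents/lorem1.txt
echo "not really an image" > Documents/photo.jpg   # placeholder file

tar czf doc.tar.gz Documents

# Without v, t prints only the member names
tar tzf doc.tar.gz

# With v, you also get permissions, owner, size and timestamp
tar tzvf doc.tar.gz

# Count the text files without extracting anything
txt_count=$(tar tzf doc.tar.gz | grep -c '\.txt$')
echo "text files in archive: $txt_count"
```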
5) Add more files to a tarball
You can append files to a tarball archive using -r. You cannot append to a compressed archive directly; it has to be decompressed first.
You can also append using the -u option for update. According to the help docs, this option is supposed to add only files that are newer than the copy in the archive, but in my practice, it worked the same as append, adding new copies of all the files.
~$ tar rvf doc.tar ~/Documents/
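The sketch below shows both behaviours on throwaway files: appending a single new file, and what happens when you append a whole directory that is already in the archive. The last part shows one possible way to handle a gzipped tarball, which is to decompress, append and recompress. All names are made up.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir Documents
echo "one" > Documents/a.txt
tar cf doc.tar Documents

# Append only the new file
echo "two" > Documents/b.txt
tar rf doc.tar Documents/b.txt

# Appending the whole directory again stores second copies of its files
tar rf doc.tar Documents
copies=$(tar tf doc.tar | grep -c 'a\.txt$')
echo "a.txt appears $copies times in the archive"

# Compressed archive: decompress, append, recompress
gzip doc.tar                    # doc.tar becomes doc.tar.gz
gunzip doc.tar.gz               # back to doc.tar
echo "three" > Documents/c.txt
tar rf doc.tar Documents/c.txt
gzip doc.tar
tar tzf doc.tar.gz
```

When duplicates exist, extraction writes them in order, so the last copy in the archive is the one that remains on disk.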
6) Extract a tarball
Now that you have seen how the different types of compression affect the overall file size, let’s look at extracting those files.
~$ cd docs
~/docs$ tar xvf ~/doc.tar.gz
I changed into a new directory called docs. Then I used tar xvf (extract, verbose, file-archive) to unpack the contents here.
~/docs/home/christopher/Documents$ ls
aerial-view-of-bushes-on-sand-field-3876435.jpg  lorem1.txt  lorem2.txt  lorem3.txt  lorem.txt  mountain-range-2397645.jpg
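It’s important to note that tar retains the directory structure, so when I extract the files, they end up in home/christopher/Documents beneath the current directory (tar drops the leading / when it creates the archive). To avoid this, you can switch to the desired directory (~/Documents) before creating the tarball and archive all files using the * wildcard instead of passing the full directory path.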
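An alternative to changing directory by hand is to let tar do it with -C when creating the archive. This is a sketch under my own assumptions: the paths are temporary stand-ins, and the archive stores members relative to the source directory (as ./lorem.txt) rather than with the full path.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/home/Documents" "$work/out"
echo "lorem ipsum" > "$work/home/Documents/lorem.txt"

# -C makes tar change into the directory first, then "." adds its contents
tar cf "$work/flat.tar" -C "$work/home/Documents" .

# Member names are now relative: ./ and ./lorem.txt
tar tf "$work/flat.tar"

# Extraction puts lorem.txt directly into the target, with no nested home/ path
tar xf "$work/flat.tar" -C "$work/out"
ls "$work/out"
```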
7) Extract the tarball to a specific directory
By default, the contents of a tarball are extracted into the current directory. That’s not always desirable.
You can extract a tarball to a specific directory in the following manner:
tar xvf tar_file -C destination_directory
The destination directory must exist so make sure to use mkdir command to create one beforehand.
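Put together, the two steps might look like this. The destination path (restore/2020) and the src directory are hypothetical, and the first extraction attempt is expected to fail because its target does not exist.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir src
echo "read me" > src/readme.txt
tar czf src.tar.gz src

# Extracting into a directory that does not exist fails
tar xzf src.tar.gz -C "$work/missing" || echo "extraction failed: no such directory"

# Create the destination first (-p also creates parent directories)
dest="$work/restore/2020"
mkdir -p "$dest"
tar xzf src.tar.gz -C "$dest"

ls -R "$dest"
```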
Here’s what you need to keep in mind while using tar in Linux:
It is almost always used in the tar cf or tar xvf form. Remember this:
- c stands for create: You use it for creating a tarball
- x stands for extract: You use it for extracting a tarball
- f stands for file: You use it to give the tar file name (for both creating and extracting). Try to put it at the end of the options.
- v stands for verbose: It’s optional but it shows what’s happening with the command.
Of course, you cannot use both c and x options in the same tar command.
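As a recap, here is a minimal round trip that uses each of these letters once: create, list, extract, then check that nothing changed. The names (original, bundle.tar.gz, restored) are invented for the sketch.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir original
echo "first"  > original/one.txt
echo "second" > original/two.txt

tar czf bundle.tar.gz original        # c = create, z = gzip, f = archive name
tar tzf bundle.tar.gz                 # t = list the contents

mkdir restored
tar xzf bundle.tar.gz -C restored     # x = extract, -C = where to put it

# diff -r prints nothing and succeeds when both trees are identical
diff -r original restored/original && echo "round trip OK"
```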
Did you enjoy our guide to the tar command? I hope all of these tips taught you something new.
If you like this guide, please share it on social media. If you have any comments or questions, leave them below.
If you have any suggestions for topics you’d like to see covered, feel free to leave those as well. Thanks for reading.