

paste Command Examples

Learn how to use the paste utility through practical examples to merge text files, and discover a couple of tricks and pitfalls of that command along the way.

In a previous article, we talked about the cut command which can be used to extract columns from a CSV or tabular text data file.

The paste command does the exact opposite: it merges several input files to produce a new delimited text file from them. We are going to see how to effectively use the paste command in Linux and Unix.

7 Practical examples of paste command in Linux

If you prefer videos, you can watch this video explaining the same paste command examples discussed in this article.

1. Pasting columns

In its most basic use case, the paste command takes N input files and joins them line by line in the output.

In the example below, I used the bash printf command, piped into tee, to create the input files.

sh$ printf "%s\n" {a..e} | tee letters

sh$ printf "%s\n" {1..5} | tee digits

sh$ paste letters digits
a    1
b    2
c    3
d    4
e    5

But let’s now leave the theory aside and work on a practical example. If you’ve downloaded the sample files used in the video above, you can see I have several data files, each corresponding to one column of a table:

sh$ head -3 *.csv
==> ACCOUNTLIB.csv <==

==> ACCOUNTNUM.csv <==

==> CREDIT.csv <==
<--- empty line
<--- empty line

==> DEBIT.csv <==

It is quite easy to produce a tab-delimited text file from those data:

sh$ paste *.csv | head -3
TIDE SCHEDULE    623477        00000001615,00
VAT BS/ENC    445452        00000000323,00

As you can see, when displayed on the console, the content of that tab-separated values file does not look like a perfectly formatted table. But this is by design: the paste command is not meant to create fixed-width text files, only delimited text files where one given character plays the role of field separator.

So, even if it is not obvious in the output above, there is actually one and only one tab character between each field. Let’s make that apparent by using the sed command:

sh$ paste *.csv | head -3 | sed -n l
TIDE SCHEDULE\t623477\t\t00000001615,00$
VAT BS/ENC\t445452\t\t00000000323,00$

Now, invisible characters are displayed unambiguously in the output, and you can see the tab characters displayed as \t. You may count them: there are always three tabs on every output line, one between each pair of adjacent fields. When you see two of them in a row, that only means there was an empty field there. This is often the case in my particular example files since, on each line, either the CREDIT or the DEBIT field is set, but never both at the same time.
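If you would rather check that on your own files than count tabs by eye, awk can report the number of tab-separated fields on each line. The sketch below assumes a POSIX shell with mktemp available; the count_fields helper and the sample file contents are made up for the illustration, with an empty CREDIT column mimicking the article's data.

```shell
# Hypothetical helper: paste the given files, then print how many
# tab-separated fields each output line has. With a single-tab
# separator, awk counts empty fields too.
count_fields() {
  paste "$@" | awk -F '\t' '{ print NF }'
}

# Made-up sample data; the CREDIT entries are deliberately empty.
dir=$(mktemp -d)
printf '%s\n' 'TIDE SCHEDULE' 'VAT BS/ENC' > "$dir/lib.txt"
printf '%s\n' 623477 445452 > "$dir/num.txt"
printf '%s\n' '' '' > "$dir/credit.txt"
printf '%s\n' '00000001615,00' '00000000323,00' > "$dir/debit.txt"

count_fields "$dir/lib.txt" "$dir/num.txt" "$dir/credit.txt" "$dir/debit.txt"
# prints 4 on each of the two lines

rm -r "$dir"
```

A line reporting a different count would point to a stray tab inside the data itself.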

2. Changing the field delimiter

As we’ve seen, the paste command uses the tab character as the default field separator (“delimiter”). This is something we can change using the -d option. Let’s say I would like to use a semicolon instead:

# The quotes around the ';' are used to prevent the
# shell from treating the semicolon as a command separator
sh$ paste -d ';' *.csv | head -3
TIDE SCHEDULE;623477;;00000001615,00
VAT BS/ENC;445452;;00000000323,00

No need to append the sed command to the pipeline here, since the separator we used is a printable character. Either way, the result is the same: on a given row, each field is separated from its neighbor by a one-character delimiter.
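Since paste does the opposite of cut, one way to sanity-check a semicolon-delimited result is to split it back with cut and compare it with the original input. This is a minimal sketch with a hypothetical roundtrips helper and made-up files, assuming a POSIX shell with mktemp; it also shows the main caveat: the round trip breaks as soon as the data themselves contain the delimiter.

```shell
# Hypothetical helper: merge two files with ';' and check that cut
# recovers the second file byte for byte. cmp reads the pipeline
# from stdin ('-') and compares it with the original file.
roundtrips() {
  paste -d ';' "$1" "$2" | cut -d ';' -f 2 | cmp -s - "$2"
}

dir=$(mktemp -d)
printf '%s\n' 'TIDE SCHEDULE' 'VAT BS/ENC' > "$dir/lib.txt"
printf '%s\n' 623477 445452 > "$dir/num.txt"
printf '%s\n' 'A;B' 'C' > "$dir/bad.txt"   # contains the delimiter

roundtrips "$dir/lib.txt" "$dir/num.txt" && echo "lib/num: round trip OK"
roundtrips "$dir/bad.txt" "$dir/num.txt" || echo "bad/num: round trip broken"

rm -r "$dir"
```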

3. Transposing data using the serial mode

The examples above have one thing in common: the paste command reads all its input files in parallel, something that is required so it can merge them on a line-by-line basis in the output.

But the paste command can also operate in the so-called serial mode, enabled using the -s flag. As its name implies, in serial mode the paste command reads the input files one after the other. The content of the first input file is used to produce the first output line. Then the content of the second input file is used to produce the second output line, and so on. That also means the output will have as many lines as there were files in the input.

More formally, the data taken from file N will appear as the Nth line in the output in serial mode, whereas it would appear as the Nth column in the default “parallel” mode. In mathematical terms, the table obtained in serial mode is the transpose of the table produced in the default mode (and vice versa).

To illustrate that, let’s consider a small subsample of our data:

sh$ head -5 ACCOUNTLIB.csv | tee ACCOUNTLIB.sample
sh$ head -5 ACCOUNTNUM.csv | tee ACCOUNTNUM.sample

In the default (“parallel”) mode, the input files’ data will serve as columns in the output, producing a table of two columns by five rows:

sh$ paste *.sample
ACCOUNTLIB    ACCOUNTNUM
TIDE SCHEDULE    623477
VAT BS/ENC    445452
PAYABLES    4356
ACCOMODATION GUIDE    623372

But in serial mode, the input files’ data will appear as rows, now producing a table of five columns by two rows:

sh$ paste -s *.sample
ACCOUNTLIB    TIDE SCHEDULE    VAT BS/ENC    PAYABLES    ACCOMODATION GUIDE
ACCOUNTNUM    623477    445452    4356    623372
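Because serial mode turns file N into output line N, it can also transpose a single tab-separated table, provided you first split that table into one file per column. The transpose_tsv function below is a hypothetical sketch of that idea using cut; it assumes a POSIX shell with mktemp, and a rectangular table where every row has as many fields as the first one.

```shell
# Hypothetical helper: transpose a tab-separated file by splitting it
# into one file per column (cut), then pasting those in serial mode.
transpose_tsv() {
  tsv=$1
  # Number of fields on the first line (assumes a rectangular table).
  ncols=$(head -n 1 "$tsv" | awk -F '\t' '{ print NF }')
  tmpdir=$(mktemp -d)
  set --                      # reuse the positional parameters as a list
  i=1
  while [ "$i" -le "$ncols" ]; do
    cut -f "$i" "$tsv" > "$tmpdir/col$i"
    set -- "$@" "$tmpdir/col$i"
    i=$((i + 1))
  done
  # Column file N becomes output line N.
  paste -s "$@"
  rm -r "$tmpdir"
}

# Usage with made-up data: 3 rows x 2 columns in, 2 rows x 3 columns out.
sample=$(mktemp)
printf 'a\t1\nb\t2\nc\t3\n' > "$sample"
transpose_tsv "$sample"
rm "$sample"
```

Building the argument list with set -- keeps the column files in numeric order, which a glob such as col* would not do once there are ten columns or more.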

4. Working with the standard input

Like many standard utilities, the paste command can read data from the standard input, either implicitly when no file name is given as an argument, or explicitly by using the special - file name. At first sight, though, this isn’t that useful:

# Here, the paste command is useless
sh$ head -5 ACCOUNTLIB.csv | paste

I encourage you to test it yourself, but the following syntax should produce the same result, making the paste command useless once again in that case:

sh$ head -5 ACCOUNTLIB.csv | paste -

So, what could be the point of reading data from the standard input? Well, with the -s flag, things become a lot more interesting, as we will see now.

4.1. Joining lines of a file

As we saw a couple of paragraphs earlier, in serial mode the paste command writes all lines of an input file on the same output line. This gives us a simple way to join all the lines read from the standard input into a single (potentially very long) output line:

sh$ head -5 ACCOUNTLIB.csv | paste -s -d':'
ACCOUNTLIB:TIDE SCHEDULE:VAT BS/ENC:PAYABLES:ACCOMODATION GUIDE

This is mostly what you could do with the tr command, with one difference though. Let’s use the diff utility to spot it:

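sh$ diff <(head -5 ACCOUNTLIB.csv | paste -s -d':') \
         <(head -5 ACCOUNTLIB.csv | tr '\n' ':')
1c1
< ACCOUNTLIB:TIDE SCHEDULE:VAT BS/ENC:PAYABLES:ACCOMODATION GUIDE
---
> ACCOUNTLIB:TIDE SCHEDULE:VAT BS/ENC:PAYABLES:ACCOMODATION GUIDE:
\ No newline at end of file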

As reported by the diff utility, the tr command has replaced every instance of the newline character with the given delimiter, including the very last one. The paste command, on the other hand, left the last newline character untouched. So, depending on whether you need the delimiter after the very last field or not, you will use one command or the other.
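The difference is easy to reproduce without the sample files. In this sketch, the join_with_paste and join_with_tr names are hypothetical wrappers around the two pipelines above, fed with made-up input:

```shell
# Hypothetical helpers joining the lines read on stdin with the
# one-character delimiter given as $1.
join_with_paste() { paste -s -d "$1" -; }   # keeps the final newline
join_with_tr()    { tr '\n' "$1"; }          # final newline becomes $1 too

printf '%s\n' one two three | join_with_paste ':'
# prints one:two:three, terminated by a newline

printf '%s\n' one two three | join_with_tr ':'
# prints one:two:three: with no newline at the end
echo   # end the line so the shell prompt is not glued to the output
```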

4.2. Multi-column formatting of one input file

According to the Open Group specifications, “the standard input shall be read one line at a time” by the paste command. So, passing several occurrences of the special - file name as arguments to the paste command results in as many consecutive lines of the input being written to the same output line:

sh$ seq 9 | paste - - -
1    2    3
4    5    6
7    8    9

To make things clearer, I encourage you to study the difference between the two commands below. In the first case, the paste command opens the same file three times, resulting in duplicated data in the output. In the second case, on the other hand, the ACCOUNTLIB file is opened only once (by the shell), but the paste command reads three of its lines for each output line, resulting in the file content being displayed as three columns:

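sh$ paste ACCOUNTLIB.csv ACCOUNTLIB.csv ACCOUNTLIB.csv | head -2
ACCOUNTLIB    ACCOUNTLIB    ACCOUNTLIB
TIDE SCHEDULE    TIDE SCHEDULE    TIDE SCHEDULE

sh$ paste - - - < ACCOUNTLIB.csv | head -2
ACCOUNTLIB    TIDE SCHEDULE    VAT BS/ENC
PAYABLES    ACCOMODATION GUIDE    VAT BS/ENC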

Given the behavior of the paste command when reading from the standard input, it is usually not advisable to use several - special file names in serial mode. In that case, the first occurrence would read the standard input until its end, and the subsequent occurrences of - would read from an already exhausted input stream, so no more data would be available:

# The following command produces 3 lines of output.
# But the first - exhausts the standard input,
# so the remaining two lines are empty
sh$ seq 9 | paste -s - - -
1    2    3    4    5    6    7    8    9
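Since, in parallel mode, each - consumes one input line per output line, the number of dashes sets the number of columns. When that number is only known at run time, a small wrapper can build the argument list. The to_columns name is hypothetical, and the sketch sticks to POSIX shell features:

```shell
# Hypothetical helper: lay stdin out in N tab-separated columns by
# passing N copies of the special '-' file name to paste.
to_columns() {
  n=$1
  set --                      # start from an empty argument list
  i=0
  while [ "$i" -lt "$n" ]; do
    set -- "$@" -
    i=$((i + 1))
  done
  paste "$@"
}

seq 6 | to_columns 3   # two rows: 1 2 3 and 4 5 6
seq 6 | to_columns 2   # three rows: 1 2, 3 4 and 5 6
```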

5. Working with files of different lengths

The Open Group specifications for the paste utility are quite clear:

If an end-of-file condition is detected on one or more input files, but not all input files, paste shall behave as though empty lines were read from the files on which end-of-file was detected, unless the -s option is specified.

So, the behavior is what you may expect: missing data are replaced by “empty” content. To illustrate that behavior, let’s record a couple more transactions into our “database”. In order to keep the original files intact, we will work on a copy of our data though:

# Copy files
sh$ for f in ACCOUNTNUM ACCOUNTLIB CREDIT DEBIT; do
      cp "${f}.csv" "NEW${f}.csv"
    done

# Update the copy
sh$ cat - << EOF >> NEWACCOUNTNUM.csv
1080
4356
EOF

sh$ cat - << EOF >> NEWDEBIT.csv
00000001207,35

EOF

sh$ cat - << EOF >> NEWCREDIT.csv

00000001207,35
EOF

With those updates, we have now registered a new capital movement from account #1080 to account #4356. However, as you may have noticed, I didn’t bother to update the ACCOUNTLIB file. This does not seem like a big issue, because the paste command will replace the missing rows with empty data:

sh$ paste -d';' NEWACCOUNTNUM.csv \
                NEWACCOUNTLIB.csv \
                NEWDEBIT.csv \
                NEWCREDIT.csv | tail
613866;RENTAL COSTS;00000000018,00;
657991;MISCELLANEOUS CHARGES;00000000015,00;
445333;VAT BS/DEBIT;00000000003,00;
626510;LANDLINE TELEPHONE;00000000069,14;
445452;VAT BS/ENC;00000000013,83;
1080;;00000001207,35; # <-- the account label is missing here
4356;;;00000001207,35 # <-- the account label is missing here
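Before relying on that padding, it may be worth checking that all the input files have the same number of lines. The same_length function below is a hypothetical guard, not part of paste, and assumes a POSIX shell with mktemp; note that wc -l counts newline characters, so a last line without a trailing newline is not counted.

```shell
# Hypothetical guard: succeed only when every file given has the same
# number of lines, so paste will not have to pad any of them.
same_length() {
  ref=$(wc -l < "$1" | tr -d ' ')   # tr strips the padding some wc add
  for f in "$@"; do
    if [ "$(wc -l < "$f" | tr -d ' ')" -ne "$ref" ]; then
      echo "line count mismatch: $f" >&2
      return 1
    fi
  done
}

dir=$(mktemp -d)
printf '%s\n' 1080 4356 > "$dir/num.txt"
printf '%s\n' 'ONLY ONE LABEL' > "$dir/lib.txt"
if same_length "$dir/num.txt" "$dir/lib.txt"; then
  paste -d ';' "$dir/num.txt" "$dir/lib.txt"
fi
rm -r "$dir"
```

Here the guard reports the mismatch on stderr and the paste is skipped.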

But beware: the paste command can only match lines by their physical position. All it can tell is that a file is “shorter” than another one, not where the data are missing. So it always treats the missing lines as if they were at the end of the shorter file, something that can cause unexpected offsets in your data. Let’s make that obvious by adding yet another transaction:

sh$ cat << EOF >> NEWACCOUNTNUM.csv

sh$ cat << EOF >> NEWACCOUNTLIB.csv

sh$ cat << EOF >> NEWDEBIT.csv


sh$ cat << EOF >> NEWCREDIT.csv


This time, I was more rigorous, since I properly updated the account number (ACCOUNTNUM) and its corresponding label (ACCOUNTLIB), as well as the CREDIT and DEBIT data files. But since data were missing in the previous record, the paste command is no longer able to keep the related fields on the same line:

sh$ paste -d';' NEWACCOUNTNUM.csv \
                NEWACCOUNTLIB.csv \
                NEWDEBIT.csv \
                NEWCREDIT.csv | tail
657991;MISCELLANEOUS CHARGES;00000000015,00;
445333;VAT BS/DEBIT;00000000003,00;
626510;LANDLINE TELEPHONE;00000000069,14;
445452;VAT BS/ENC;00000000013,83;
4356;WEB HOSTING;;00000001207,35

As you can see, account #4356 is reported with the label “WEB HOSTING” whereas, in reality, that label belongs on the row corresponding to account #3465.

In conclusion, if you have to deal with missing data, instead of the paste command you should consider using the join utility, since it matches rows based on their content, not on their position in the input file. That makes it much more suitable for “database”-style applications. I’ve already published a video about the join command, but it probably deserves an article of its own, so let us know if you are interested in that topic!
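As a rough illustration of that content-based matching, the sketch below joins made-up, semicolon-separated label and amount files on the account number. The join_by_account helper is hypothetical and assumes a POSIX shell with mktemp; it sorts both inputs first because join requires them to be sorted on the join field, and -a 1 keeps labels that have no matching amount.

```shell
# Hypothetical helper: join two ';'-separated files on their first
# field (the account number). join needs both inputs sorted on that
# field with the same collation, hence LC_ALL=C for sort and join.
join_by_account() {
  jtmp=$(mktemp -d)
  LC_ALL=C sort -t ';' -k 1,1 "$1" > "$jtmp/left"
  LC_ALL=C sort -t ';' -k 1,1 "$2" > "$jtmp/right"
  # -a 1 also prints lines of the first file that have no match.
  LC_ALL=C join -t ';' -a 1 "$jtmp/left" "$jtmp/right"
  status=$?
  rm -r "$jtmp"
  return "$status"
}

# Made-up data: the amounts file lacks one account and lists the
# others in a different order, which paste could not cope with.
dir=$(mktemp -d)
printf '%s\n' '4356;PAYABLES' '3465;WEB HOSTING' '445452;VAT BS/ENC' > "$dir/labels.txt"
printf '%s\n' '445452;00000000013,83' '3465;00000000010,00' > "$dir/amounts.txt"
join_by_account "$dir/labels.txt" "$dir/amounts.txt"
# 3465;WEB HOSTING;00000000010,00
# 4356;PAYABLES
# 445452;VAT BS/ENC;00000000013,83
rm -r "$dir"
```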

6. Cycling over delimiters

In the overwhelming majority of the use cases, you will provide only one character as the delimiter. This is what we have done until now. However, if you give several characters after the -d option, the paste command will cycle over them: the first character will be used as the first field delimiter on the row, the second character as the second field delimiter, and so on.

sh$ paste -d':+-' ACCOUNT*.csv CREDIT.csv DEBIT.csv | head -5
TIDE SCHEDULE:623477+-00000001615,00
VAT BS/ENC:445452+-00000000323,00
ACCOMODATION GUIDE:623372+-00000001333,00

Field delimiters can only appear between fields, not at the end of a line. And you can’t insert more than one delimiter between two given fields. As a trick to overcome these limitations, you may use the /dev/null special file as an extra input wherever you need an additional separator:

# Display the opening bracket between the
# ACCOUNTLIB field and the ACCOUNTNUM field, and
# the closing bracket between the ACCOUNTNUM field
# and the empty `/dev/null` field:
sh$ paste  -d'()' \
           ACCOUNT*.csv /dev/null | head -5
ACCOUNTLIB(ACCOUNTNUM)
TIDE SCHEDULE(623477)
VAT BS/ENC(445452)
PAYABLES(4356)
ACCOMODATION GUIDE(623372)

Something you may even abuse:

sh$ paste -d'# is ' \
          - ACCOUNTNUM.csv - - - ACCOUNTLIB.csv < /dev/null | tail -5
#657991 is MISCELLANEOUS CHARGES
#445333 is VAT BS/DEBIT
#4356 is PAYABLES
#626510 is LANDLINE TELEPHONE
#445452 is VAT BS/ENC

Needless to say, if you reach that level of complexity, it might be a clue that the paste utility was not the best tool for the job. In that case, it may be worth considering something else, like the sed or awk commands.
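For instance, the “#445333 is VAT BS/DEBIT” output above could be produced by letting paste merge the two columns with its default tab, and awk do the formatting. The describe_accounts helper and the sample data are hypothetical, and the sketch assumes a POSIX shell with mktemp:

```shell
# Hypothetical helper: paste merges the two files line by line with a
# tab, and awk formats each pair with a single printf.
describe_accounts() {
  paste "$1" "$2" | awk -F '\t' '{ printf "#%s is %s\n", $1, $2 }'
}

dir=$(mktemp -d)
printf '%s\n' 445333 4356 > "$dir/num.txt"
printf '%s\n' 'VAT BS/DEBIT' 'PAYABLES' > "$dir/lib.txt"
describe_accounts "$dir/num.txt" "$dir/lib.txt"
# #445333 is VAT BS/DEBIT
# #4356 is PAYABLES
rm -r "$dir"
```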

But what if the list contains fewer delimiters than needed to display a row in the output? Interestingly, the paste command will “cycle” over them: once the list is exhausted, the paste command jumps back to the first delimiter, something that probably opens the door to some creative usage. As for myself, I was not able to do anything really useful with that feature given my data, so you will have to be satisfied with the following, somewhat far-fetched example. But it will not be a complete waste of your time, since it is a good occasion to mention that you have to double the backslash (\\) when you want to use it as a delimiter:

sh$ paste -d'/\\' \
          - ACCOUNT*.csv CREDIT.csv DEBIT.csv - < /dev/null | tail -5
/MISCELLANEOUS CHARGES\657991/\00000000015,00/
/VAT BS/DEBIT\445333/\00000000003,00/
/LANDLINE TELEPHONE\626510/\00000000069,14/
/VAT BS/ENC\445452/\00000000013,83/
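Here is a smaller, self-contained check of both behaviors, using seq and printf as input instead of the sample files: the two-character list is reused as many times as needed, and '\\' yields a single backslash.

```shell
# Six fields need five delimiters: ',' ';' ',' ';' ','.
seq 6 | paste -d ',;' - - - - - -
# prints 1,2;3,4;5,6

# '\\' in the -d list stands for one literal backslash.
printf '%s\n' left right | paste -d '\\' - -
# prints left\right
```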

7. Multibyte character delimiters

Like most of the standard Unix utilities, the paste command was born at a time when one character was equivalent to one byte. But this is no longer the case: today, many systems use the UTF-8 variable-length encoding by default. In UTF-8, a character can be represented by 1, 2, 3 or 4 bytes. That allows us to mix the whole variety of human writing systems in the same text file, as well as tons of symbols and emojis, while maintaining backward compatibility with the legacy one-byte US-ASCII character encoding.

Let’s say, for example, I would like to use the WHITE DIAMOND (◇ U+25C7) as my field separator. In UTF-8, this character is encoded using the three bytes e2 97 87. This character might be hard to type on a keyboard, so if you want to try this yourself, I suggest you copy-paste it from the code block below:

# The sed part is only used as a little trick to add the
# row number as the first field in the output
sh$ sed -n = ACCOUNTNUM.csv |
       paste -d'◇' - ACCOUNT*.csv | tail -5
26�MISCELLANEOUS CHARGES�657991
27�VAT BS/DEBIT�445333
28�PAYABLES�4356
29�LANDLINE TELEPHONE�626510
30�VAT BS/ENC�445452

Quite disappointing, isn’t it? Instead of the expected white diamond, I have that “question mark” symbol (at least, this is how it is displayed on my system). It is not a “random” character, though. It is the Unicode replacement character, used “to indicate problems when a system is unable to render a stream of data to a correct symbol”. So, what has gone wrong?

Once again, examining the raw binary content of the output will give us some clues:

sh$ sed -n = ACCOUNTNUM.csv | paste -d'◇' - ACCOUNT*.csv | tail -5 | hexdump -C
00000000  32 36 e2 4d 49 53 43 45  4c 4c 41 4e 45 4f 55 53  |26.MISCELLANEOUS|
00000010  20 43 48 41 52 47 45 53  97 36 35 37 39 39 31 0a  | CHARGES.657991.|
00000020  32 37 e2 56 41 54 20 42  53 2f 44 45 42 49 54 97  |27.VAT BS/DEBIT.|
00000030  34 34 35 33 33 33 0a 32  38 e2 50 41 59 41 42 4c  |445333.28.PAYABL|
00000040  45 53 97 34 33 35 36 0a  32 39 e2 4c 41 4e 44 4c  |ES.4356.29.LANDL|
00000050  49 4e 45 20 54 45 4c 45  50 48 4f 4e 45 97 36 32  |INE TELEPHONE.62|
00000060  36 35 31 30 0a 33 30 e2  56 41 54 20 42 53 2f 45  |6510.30.VAT BS/E|
00000070  4e 43 97 34 34 35 34 35  32 0a                    |NC.445452.|

By now, your eyes should be sharp enough to spot the field delimiters in the byte stream. Looking closely, you will see the field separator after the line number is the byte e2. But if you continue your investigation, you will notice the second field separator is 97. Not only did the paste command not output the character I wanted, but it also didn’t use the same byte as the separator everywhere?!?

Wait a minute: doesn’t that remind you of something we already talked about? And those two bytes e2 97, aren’t they somewhat familiar to you? Well, familiar is probably a bit too strong, but if you jump back a few paragraphs you might find them mentioned somewhere…

So, did you find where it was? Earlier, I said that in UTF-8 the white diamond is encoded as the three bytes e2 97 87. And indeed, the paste command has considered that sequence not as a single three-byte character, but as three independent bytes, so it used the first byte as the first field separator, then the second byte as the second field separator.

I’ll let you re-run that experiment with one more column in the input data; you should see the third field separator is 87, the third byte of the UTF-8 representation of the white diamond.

OK, that’s the explanation: the paste command only accepts one-byte “characters” as separators. And that’s particularly annoying since, once again, I don’t know of any way to overcome that limitation except the /dev/null trick I already gave you:

sh$ sed -n = ACCOUNTNUM.csv |
    paste  -d'◇' \
           - /dev/null /dev/null \
           ACCOUNTLIB.csv /dev/null /dev/null \
           ACCOUNTNUM.csv | tail -5
26◇MISCELLANEOUS CHARGES◇657991
27◇VAT BS/DEBIT◇445333
28◇PAYABLES◇4356
29◇LANDLINE TELEPHONE◇626510
30◇VAT BS/ENC◇445452
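Another workaround, assuming your data never contain tab characters, is to let paste use its default one-byte tab and replace every tab with the multibyte delimiter afterwards. The paste_diamond helper is hypothetical, and the sketch assumes a POSIX shell with mktemp:

```shell
# Hypothetical helper: paste with the default tab delimiter, then
# rewrite each tab as the WHITE DIAMOND (U+25C7). Only safe when the
# input data contain no tab characters of their own.
paste_diamond() {
  tab=$(printf '\t')
  paste "$@" | sed "s/$tab/◇/g"
}

dir=$(mktemp -d)
printf '%s\n' 'VAT BS/DEBIT' 'VAT BS/ENC' > "$dir/lib.txt"
printf '%s\n' 445333 445452 > "$dir/num.txt"
paste_diamond "$dir/lib.txt" "$dir/num.txt"
# prints VAT BS/DEBIT◇445333 and VAT BS/ENC◇445452

rm -r "$dir"
```

Unlike the /dev/null trick, this does not require knowing how many bytes the delimiter takes.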

If you read my previous article about the cut command, you may remember I had similar issues with the GNU implementation of that tool. But I noticed at that time that the OpenBSD implementation correctly took the LC_CTYPE locale setting into account to identify multibyte characters. Out of curiosity, I’ve tested the paste command on OpenBSD too. Alas, this time with the same result as on my Debian box, despite the specifications for the paste utility mentioning the LC_CTYPE environment variable as determining “the locale for the interpretation of sequences of bytes of text data as characters (for example, single-byte as opposed to multi-byte characters in arguments and input files)”. From my experience, all the major implementations of the paste utility currently ignore multibyte characters in the delimiter list and assume one-byte separators. But I won’t claim to have tested that on the whole variety of *nix platforms. So if I missed something here, don’t hesitate to use the comment section to correct me!

Bonus Tip: Avoiding the \0 pitfall

For historical reasons, as the Open Group specifications point out:

The commands:
paste -d "\0" …
paste -d "" …
are not necessarily equivalent; the latter is not specified by this volume of IEEE Std 1003.1-2001 and may result in an error. The construct '\0' is used to mean "no separator" because historical versions of paste did not follow the syntax guidelines, and the command:
paste -d"" …
could not be handled properly by getopt().

So, the portable way of pasting files without any delimiter is to specify the \0 delimiter. This is somewhat counterintuitive since, for many commands, \0 means the NUL character, a character encoded as a byte made only of zero bits, which should not clash with any text content.
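A quick sketch of that portable “no separator” form, with made-up file names and contents and assuming a POSIX shell with mktemp: each pair of lines is simply concatenated, and the byte count confirms no NUL byte sneaks in between the fields.

```shell
# Hypothetical helper: concatenate corresponding lines of the
# given files with no separator at all.
concat_lines() {
  paste -d '\0' "$@"
}

dir=$(mktemp -d)
printf '%s\n' report data > "$dir/base.txt"
printf '%s\n' .txt .csv > "$dir/ext.txt"
concat_lines "$dir/base.txt" "$dir/ext.txt"
# prints report.txt and data.csv

# "report.txt\ndata.csv\n" is 20 bytes: no NUL byte was inserted.
concat_lines "$dir/base.txt" "$dir/ext.txt" | wc -c

rm -r "$dir"
```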

You might find the NUL character a useful separator, especially when your data may contain arbitrary characters (like when working with file names or user-provided data). Unfortunately, I’m not aware of any way to use the NUL character as the field delimiter with the paste command. But maybe you know how to do that? If so, I would be more than happy to read your solution in the comment section.

On the other hand, the paste implementation that is part of GNU Coreutils has the non-standard -z option to switch the line separator from the newline to the NUL character. But in that case, the NUL character will be used as the line separator for both the input and the output. So, to test that feature, we first need a zero-terminated version of our input files:

sh$ tr '\n' '\0' < ACCOUNTLIB.csv > ACCOUNTLIB.zero
sh$ tr '\n' '\0' < ACCOUNTNUM.csv > ACCOUNTNUM.zero

To see what has changed in the process, we can use the hexdump utility to examine the raw binary content of the files:

sh$ hexdump -C ACCOUNTLIB.csv | head -5
00000000  41 43 43 4f 55 4e 54 4c  49 42 0a 54 49 44 45 20  |ACCOUNTLIB.TIDE |
00000010  53 43 48 45 44 55 4c 45  0a 56 41 54 20 42 53 2f  |SCHEDULE.VAT BS/|
00000020  45 4e 43 0a 50 41 59 41  42 4c 45 53 0a 41 43 43  |ENC.PAYABLES.ACC|
00000030  4f 4d 4f 44 41 54 49 4f  4e 20 47 55 49 44 45 0a  |OMODATION GUIDE.|
00000040  56 41 54 20 42 53 2f 45  4e 43 0a 50 41 59 41 42  |VAT BS/ENC.PAYAB|
sh$ hexdump -C ACCOUNTLIB.zero | head -5
00000000  41 43 43 4f 55 4e 54 4c  49 42 00 54 49 44 45 20  |ACCOUNTLIB.TIDE |
00000010  53 43 48 45 44 55 4c 45  00 56 41 54 20 42 53 2f  |SCHEDULE.VAT BS/|
00000020  45 4e 43 00 50 41 59 41  42 4c 45 53 00 41 43 43  |ENC.PAYABLES.ACC|
00000030  4f 4d 4f 44 41 54 49 4f  4e 20 47 55 49 44 45 00  |OMODATION GUIDE.|
00000040  56 41 54 20 42 53 2f 45  4e 43 00 50 41 59 41 42  |VAT BS/ENC.PAYAB|

I’ll let you compare the two hex dumps above yourself to identify the difference between the “.zero” files and the original text files. As a hint, a newline is encoded as the 0a byte.

Hopefully, you took the time needed to locate the NUL characters in the “.zero” input files. Anyway, we now have a zero-terminated version of the input files, so we can use the -z option of the paste command to handle that data, producing a zero-terminated result in the output as well:

# Hint: in the hexadecimal dump:
#  the byte 00 is the NUL character
#  the byte 09 is the TAB character
# Look at any ASCII table to find the mapping
# for the letters or other symbols
sh$ paste -z *.zero | hexdump -C | head -5
00000000  41 43 43 4f 55 4e 54 4c  49 42 09 41 43 43 4f 55  |ACCOUNTLIB.ACCOU|
00000010  4e 54 4e 55 4d 00 54 49  44 45 20 53 43 48 45 44  |NTNUM.TIDE SCHED|
00000020  55 4c 45 09 36 32 33 34  37 37 00 56 41 54 20 42  |ULE.623477.VAT B|
00000030  53 2f 45 4e 43 09 34 34  35 34 35 32 00 50 41 59  |S/ENC.445452.PAY|
00000040  41 42 4c 45 53 09 34 33  35 36 00 41 43 43 4f 4d  |ABLES.4356.ACCOM|

# Using the `tr` utility, we can map \0 to newline
# in order to display the output on the console:
sh$ paste -z *.zero | tr '\0' '\n' | head -3
ACCOUNTLIB    ACCOUNTNUM
TIDE SCHEDULE    623477
VAT BS/ENC    445452

Since my input files do not contain embedded newlines in the data, the -z option is of limited usefulness here. But based on the explanations above, I’ll let you try to understand why the following example works “as expected”. To fully understand it, you probably need to download the sample files and examine them at the byte level using the hexdump utility, as we did above:

# Somehow, the head utility seems to be confused
# by the ACCOUNTS file content (I wonder why?;)

==> ACCOUNTS <==
# The output is quite satisfactory, putting the account number
# after the account name and keeping things surprisingly nicely formatted:
sh$ paste -z -d':' CATEGORIES ACCOUNTS | tr '\0' '\n' | head -5
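For a case where -z actually matters, consider records that contain a newline. The two records below are made up, the pair_records name is hypothetical, and since -z is a GNU Coreutils extension, this sketch assumes GNU paste:

```shell
# Hypothetical helper: merge consecutive pairs of NUL-terminated
# records from stdin, separated by ':' (GNU paste only).
pair_records() {
  paste -z -d ':' - -
}

# The first record contains a newline; with -z it stays part of the
# data instead of splitting the record in two.
printf 'first\nline\0second\0' | pair_records | tr '\0' '\n'
# prints "first", then "line:second"
```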


What’s more?

The paste command produces only delimited text output. But as illustrated at the end of the introductory video, if your system supports the BSD column utility, you can use it to obtain nicely formatted tables by converting the paste command output to a fixed-width text format. But that will be the subject of an upcoming article. So stay tuned, and as always, don’t forget to share this article on your favorite websites and social media!

Sylvain Leroux
France