In a previous article, I showed the basic usage of Sed, the stream editor, on a practical use case. Today, be prepared to gain more insight about Sed as we will take an in-depth tour of the sed execution model.
This will be also an opportunity to make an exhaustive review of all Sed commands and to dive into their details and subtleties.
So, if you are ready, launch a terminal, download the test files and sit comfortably before your keyboard: we will start our exploration right now!
A little bit of theory on Sed
A first look at the sed execution model
To truly understand Sed you must first understand the tool execution model.
When processing data, Sed reads one line of input at a time and stores it into the so-called pattern space. All Sed’s transformations apply to the pattern space. Transformations are described by one-letter commands provided on the command line or in an external Sed script file. Most Sed commands can be preceded by an address, or an address range, to limit their scope.
By default, Sed prints the content of the pattern space at the end of each processing cycle, that is, just before overwriting the pattern space with the next line of input. We can summarize that model like that:
- Try to read the next input line into the pattern space
If the read was successful:
- Apply in the script order all commands whose address matches the current input line
- If sed was not launched in quiet mode (
-n) print the content of the (potentially modified) pattern space
- got back to 1.
Since the content of the pattern space is lost after each line is processed, it is not suitable for long-term storage. For that purpose, Sed has a second buffer, the hold space. Sed never clears, puts or gets data from the hold space unless you explicitly request it. We will investigate that more in depth later when studying the exchange, get and hold commands.
The Sed abstract machine
The model explained above is what you will see described in many Sed tutorials. Indeed, it is correct enough to understand the most basic Sed programs. But when you start digging into more advanced commands, you will see it is not sufficient. So let’s try to be a little bit more formal now.
- three buffers to store arbitrary length text. Yes: three! In the basic execution model we talked about the pattern- and hold-space, but Sed has a third buffer: the append queue. From the Sed script perspective, it is a write-only buffer that Sed will flush automatically at predefined moments of its execution (broadly speaking before reading a new line from the input, or just before quitting).
- Sed also maintains two registers: the line counter (LC) which holds the number of lines read from the input, and the program counter (PC) which always hold the index (“position” in the script) of the next command to execute. Sed automatically increments the PC as part of its main loop. But using specific commands, a script can also directly modify the PC to skip or repeat parts of the program. This is how loops or conditional statements can be implemented with Sed. More on that in the dedicated branches section below.
- Finally two flags can modify the behavior of certain Sed commands: the auto-print flag (AP) the substitution flag (SF). When the auto-print flag is set, Sed will automatically print the content of the pattern space before overwriting it (notably before reading a new line of input but not only). When the auto-print flag is clear (“not set”), Sed will never print the content of the pattern space without an explicit command in the script. You can clear the auto-print flag by running Sed in “quiet mode” (using the
-ncommand line option or by using the special comment
#non the very first line or the script). The “substitution flag” is set by the substitution command (the
scommand) when both its address and search pattern match the content of the pattern space. The substitution flag is cleared at the start of each new cycle, or when a new line is read from input, or after a conditional branch is taken. Here again, we will revisit that topic in details in the branches section.
In addition, Sed maintains the list of commands having entered their address range (more on that of the range addresses section) as well as a couple of file handles to read and write data. You will find some more information on that in the read and write command description.