Today I ran part of the way to work. It was a cold, beautiful winter morning in Stockholm.
Sometimes, I solve programming problems by coding on paper. A few days ago, it looked like this:
I’ve started working on a code editor that is a mix of a text editor and a structured editor. It is all text, but parsers and pretty printers allow you to work with a tree structure and not think too much about syntax. It is a work in progress. Code is here.
We got some more snow. I like running in the winter. Especially when there is snow and the sun is shining.
I needed to submit some heic photos to a service that only accepted jpg. I didn’t know about the heic format, but a little searching gave me a solution:
$ heif-convert
bash: heif-convert: command not found...
Install package 'libheif' to provide command 'heif-convert'? [N/y] y
...
$ find . -iname '*.heic' -exec heif-convert -q 100 {} {}.jpg \;
Today was the first day of snow this season. Not much. I’m looking forward to many more runs on a white trail.
I was researching how to run Black (and possibly other formatters) from Vim and found Ergonomic mappings for code formatting in Vim. It was very helpful.
How would you improve this code?
def update_r_users(service):
    r_users = []
    for user in service.get_all_users():
        if "r" in user:
            r_users.append(user)
    service.set_users_in_group("users_with_r_in_name", r_users)
Find out what I did in my latest newsletter.
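One possible direction, which is not necessarily the one described in the newsletter, is to separate the pure filtering from the calls to the service. In this sketch, find_r_users and InMemoryService are hypothetical names introduced only for illustration; the real service is assumed to offer the same two methods as in the snippet above.

```python
def find_r_users(users):
    """Pure function: return the users whose name contains an "r"."""
    return [user for user in users if "r" in user]


def update_r_users(service):
    """Read all users, filter them, and store the result in the group."""
    r_users = find_r_users(service.get_all_users())
    service.set_users_in_group("users_with_r_in_name", r_users)


class InMemoryService:
    """Hypothetical stand-in for the real service, for illustration only."""

    def __init__(self, users):
        self.users = list(users)
        self.groups = {}

    def get_all_users(self):
        return list(self.users)

    def set_users_in_group(self, group, users):
        self.groups[group] = list(users)


if __name__ == "__main__":
    service = InMemoryService(["rick", "bob", "sara"])
    update_r_users(service)
    print(service.groups)
```

With the filtering in its own function, it can be exercised without any service at all.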
Today I learned about the Rison data serialization format. I wrote a function to convert a Python value to Rison format. It was an elegant recursive function with partial support for the format.
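A minimal sketch of what such a recursive function might look like is shown below. It is not the original function: to_rison and _rison_string are hypothetical names, and the sketch assumes only None, booleans, integers, strings, lists, and dicts with string keys need to be handled. To stay on the safe side, it quotes every string that is not a plain identifier.

```python
import re

# Conservative assumption: only plain identifiers are written unquoted.
_PLAIN_ID = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def _rison_string(text):
    """Encode a string, quoting it unless it is a plain identifier."""
    if _PLAIN_ID.match(text):
        return text
    # Inside quotes, "!" is the escape character for "!" and "'".
    escaped = text.replace("!", "!!").replace("'", "!'")
    return f"'{escaped}'"


def to_rison(value):
    """Convert a Python value to Rison (partial support only)."""
    if value is None:
        return "!n"
    if value is True:
        return "!t"
    if value is False:
        return "!f"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        return _rison_string(value)
    if isinstance(value, list):
        return "!(" + ",".join(to_rison(item) for item in value) + ")"
    if isinstance(value, dict):
        pairs = (f"{_rison_string(k)}:{to_rison(v)}" for k, v in value.items())
        return "(" + ",".join(pairs) + ")"
    raise TypeError(f"unsupported type: {type(value).__name__}")


if __name__ == "__main__":
    print(to_rison({"a": [1, True, None], "b": "hello world"}))
```

For the example value, this prints (a:!(1,!t,!n),b:'hello world'). Floats are left out on purpose, since their Python text form does not always match what Rison expects.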
I’ve used testing without mocks quite extensively now. I’ve also used it in a work project for more than a year. My experience is that it’s the best testing strategy that I’ve ever used. I’ve never felt more confident that my code works. I refactor code without fear of it breaking. It’s so good.
It’s getting dark. It gives variation to the running.
Various things have kept me from running for a while. Today I had enough. I just had to go for a short run. It was the first run with warmer clothes. The weather was nice. I reclaimed some energy.
Pull requests discourage experiments because changes can only propagate after approval. The idea behind PRs is to only approve “good” changes.
That has two costs. First, the learning opportunities that mistakes provide are gone. Second, you might lose interest in experimenting because you are afraid of making mistakes.
Today I just needed to run. I had not run since I hurt my Achilles tendon almost a month ago. I wanted to see if it still hurt. I felt something, but not too much. I think I still need to take it easy with running, but man it felt good moving again.
If you want to know how to implement a Bash-like shell, with support for redirects, in only 31 lines of Python, you should check out my latest blog post Bash Redirects Explained.
Do you know the difference between the following Bash commands?
program 2>&1 >/tmp/log.txt
program >/tmp/log.txt 2>&1
If not, you might be interested in my latest blog post Bash Redirects Explained.
Bash Redirects Explained
I thought I knew how Bash redirects worked.
If I wanted to redirect the output of a command to a file, I’d type this:
program > /tmp/log.txt
If I wanted to pipe both stdout and stderr to a text editor for further processing, I’d type this:
program 2>&1 | vim -
I knew that 2>&1 meant redirect stderr to stdout, making it appear on stdout as well.
I knew certain patterns for certain situations. But when I encountered situations where I had not learned a pattern, I was lost. For example, I could not explain the difference between
program 2>&1 >/tmp/log.txt
and
program >/tmp/log.txt 2>&1
And I got scared when I saw something like this:
program < input.txt > output.txt 2>&1
Have you also been there? What did you do?
I would search the Internet for a pattern that matched the use case, or just try different alternatives and notice how they behaved.
I did this until one day when I learned a mental model for how Bash redirects work. Now I no longer need to rely on patterns. I can easily parse any situation and use any combination of redirects for my purposes.
The rest of this article explains this mental model.
The Standard Streams
A process has three standard streams attached to it:
- stdin (0)
- stdout (1)
- stderr (2)
When we start a program from the terminal, Bash sets up the standard streams as follows:
- stdin: terminal/keyboard
- stdout: terminal
- stderr: terminal
What redirects do is to modify what the standard streams point to before the program starts executing.
- < means modify stdin.
- > means modify stdout.
- 2> means modify stderr.
That is the mental model: redirects modify standard streams before program execution.
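The same idea can be seen from Python, where the subprocess module lets the parent decide what a child's standard streams point to before the child starts. This is only a rough analogy to what Bash does; run_with_stdout_redirected is a hypothetical helper name, and the child program is a one-line Python script chosen for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_stdout_redirected(path):
    """Roughly like: python -c '...' some args >path"""
    child_code = "import sys; print(sys.argv[1:])"
    with open(path, "w") as out:
        # stdout is pointed at the file before the child starts executing.
        # The child only receives the arguments, never the redirect itself.
        subprocess.run(
            [sys.executable, "-c", child_code, "some", "args"],
            stdout=out,
            check=True,
        )
    return Path(path).read_text()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        print(run_with_stdout_redirected(Path(directory) / "out.txt"))
```

The child writes to its stdout as usual and has no way of telling from its arguments that the output ends up in a file.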
Let’s evaluate a few examples using this mental model to see how it works.
Logcat Utility
To be able to show what happens in different examples, we have a utility program, logcat.py, that makes use of all three streams. It reads text from stdin, logs the arguments and the length of the text to stderr, and writes the text to stdout. It looks like this:
#!/usr/bin/env python
import sys
text = sys.stdin.read()
sys.stderr.write(f"Args: {(sys.argv[1:])}\n")
sys.stderr.write(f"Read {len(text)} characters.\n")
sys.stdout.write(text)
Example: No Redirect
Let’s start with an example without redirects to see the operation of logcat.py:
$ ./logcat.py ignored arguments
Before logcat.py starts executing, Bash sets up the standard streams as follows:
- stdin: terminal/keyboard
- stdout: terminal
- stderr: terminal
When execution starts, logcat.py waits for input. If we type hello in the terminal (followed by a return and ctrl+d), the following is printed to the terminal:
Args: ['ignored', 'arguments']
Read 6 characters.
hello
We can see that it read our input from the terminal/keyboard and wrote the log messages along with our input to the terminal as well.
Example: Redirect Stdin
Now let’s modify stdin to be the logcat.py source code instead of the terminal/keyboard:
$ ./logcat.py ignored arguments <logcat.py
This instructs Bash to modify stdin to point to the file logcat.py.
Before logcat.py starts executing, Bash sets up the standard streams as follows:
- stdin: logcat.py (opened in read mode)
- stdout: terminal
- stderr: terminal
When execution starts, the following is printed to the terminal:
Args: ['ignored', 'arguments']
Read 182 characters.
#!/usr/bin/env python
import sys
text = sys.stdin.read()
sys.stderr.write(f"Args: {(sys.argv[1:])}\n")
sys.stderr.write(f"Read {len(text)} characters.\n")
sys.stdout.write(text)
We can see that the redirect operation is stripped from the arguments. Only Bash sees it and does not pass it along to the program. Furthermore, we can see that the logcat.py source code is printed to the terminal.
Example: Redirect Stdin and Stdout
Let’s say we’re only interested in the log messages, and want to throw away stdout:
$ ./logcat.py ignored arguments <logcat.py >/dev/null
This instructs Bash to modify stdin to point to the file logcat.py and to modify stdout to point to the file /dev/null.
Before logcat.py starts executing, Bash sets up the standard streams as follows:
- stdin: logcat.py (opened in read mode)
- stdout: /dev/null (opened in write mode)
- stderr: terminal
When execution starts, the following is printed to the terminal:
Args: ['ignored', 'arguments']
Read 182 characters.
We can see that the redirect operations are all stripped from the arguments and the source code has been written to /dev/null and is thus not shown in the terminal.
Extended Mental Model
Let’s extend the mental model to clarify how Bash operates.
When Bash parses a command, it divides it into two parts: the arguments and the redirects. Before it starts executing the program with the arguments, it goes through the redirects, in order, and configures the standard streams before execution.
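The split into arguments and redirects can be sketched in a few lines of Python. This is a simplification and not how Bash actually parses commands: split_command is a hypothetical helper that assumes words are separated by whitespace, that there is no quoting, and that a redirect is written without a space between the operator and its target.

```python
def split_command(command):
    """Split a command line into (arguments, redirects), keeping order."""
    arguments = []
    redirects = []
    for part in command.split():
        # Simplifying assumption: any word starting with <, > or 2> is a
        # redirect, and everything else is an argument.
        if part.startswith(("<", ">", "2>")):
            redirects.append(part)
        else:
            arguments.append(part)
    return arguments, redirects


if __name__ == "__main__":
    arguments, redirects = split_command("./logcat.py is <in.txt nice >out.txt")
    print("Arguments:", arguments)
    print("Redirects:", redirects)
```

The redirects can appear anywhere among the arguments. What matters is their order relative to each other, which the list preserves.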
Example: Redirect All Streams
Let’s see how we can interpret a more complex command using the extended mental model:
$ ./logcat.py <logcat.py is the >out.txt best 2>&1 thing
If we split this into arguments and redirects, we get this:
- Arguments: ./logcat.py, is, the, best, thing
- Redirects: <logcat.py, >out.txt, 2>&1
Now, let’s evaluate the redirects in order. The state of the standard streams at start is this:
- stdin: terminal/keyboard
- stdout: terminal
- stderr: terminal
Then we evaluate <logcat.py and get this:
- stdin: logcat.py (opened in read mode)
- stdout: terminal
- stderr: terminal
Then we evaluate >out.txt and get this:
- stdin: logcat.py (opened in read mode)
- stdout: out.txt (opened in write mode)
- stderr: terminal
Then we evaluate 2>&1, which means modify stderr (2>) to be whatever stdout points to (&1), and get this:
- stdin: logcat.py (opened in read mode)
- stdout: out.txt (opened in write mode)
- stderr: out.txt (opened in write mode)
After the standard streams have been set up, execution of ./logcat.py is the best thing starts. Nothing appears on the terminal since all output has been redirected to out.txt:
$ cat out.txt
Args: ['is', 'the', 'best', 'thing']
Read 182 characters.
#!/usr/bin/env python
import sys
text = sys.stdin.read()
sys.stderr.write(f"Args: {(sys.argv[1:])}\n")
sys.stderr.write(f"Read {len(text)} characters.\n")
sys.stdout.write(text)
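The step-by-step evaluation can also be sketched as a small simulation that tracks what each stream points to. The function evaluate_redirects is a hypothetical helper that models only the four redirect forms used in this article, and it records names instead of opening any files.

```python
def evaluate_redirects(redirects):
    """Return what each standard stream points to after all redirects."""
    streams = {
        "stdin": "terminal/keyboard",
        "stdout": "terminal",
        "stderr": "terminal",
    }
    for redirect in redirects:
        if redirect == "2>&1":
            # stderr becomes whatever stdout points to at this moment.
            streams["stderr"] = streams["stdout"]
        elif redirect.startswith("2>"):
            streams["stderr"] = redirect[2:]
        elif redirect.startswith(">"):
            streams["stdout"] = redirect[1:]
        elif redirect.startswith("<"):
            streams["stdin"] = redirect[1:]
        else:
            raise ValueError(f"unsupported redirect: {redirect}")
    return streams


if __name__ == "__main__":
    print(evaluate_redirects(["2>&1", ">/tmp/log.txt"]))
    print(evaluate_redirects([">/tmp/log.txt", "2>&1"]))
```

This also answers the question from the introduction. With 2>&1 >/tmp/log.txt, stderr is set to what stdout points to at that moment, which is the terminal, and only afterwards is stdout changed to the file. With >/tmp/log.txt 2>&1, stdout already points to the file when stderr is set, so both streams end up in the file.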
Mini Shell
I created a mini version of a shell to demonstrate how straightforward it is to implement redirects with POSIX system calls. It works exactly like the extended mental model, and because it is running software, it fills in some more details of the model. I would guess that Bash does something similar even though I haven’t read its source code.
First off, here is a demo that shows how the mini shell can replicate the complex example from above:
$ ./minishell.py
~~?~~> ./logcat.py <logcat.py is the >out.txt best 2>&1 thing
~~0~~> cat out.txt
Args: ['is', 'the', 'best', 'thing']
Read 182 characters.
#!/usr/bin/env python
import sys
text = sys.stdin.read()
sys.stderr.write(f"Args: {(sys.argv[1:])}\n")
sys.stderr.write(f"Read {len(text)} characters.\n")
sys.stdout.write(text)
And here is the implementation in only 31 lines of Python:
#!/usr/bin/env python

import os
import sys

STDIN = 0
STDOUT = 1
STDERR = 2

statuscode = "?"
while True:
    sys.stdout.write(f"~~{statuscode}~~> ")
    sys.stdout.flush()
    command = input()
    pid = os.fork()
    if pid == 0:
        args = []
        for part in command.split(" "):
            if part.startswith("<"):
                os.dup2(os.open(part[1:], os.O_RDONLY), STDIN)
            elif part.startswith(">"):
                os.dup2(os.open(part[1:], os.O_WRONLY|os.O_CREAT, 0o644), STDOUT)
            elif part == "2>&1":
                os.dup2(STDOUT, STDERR)
            elif part.startswith("2>"):
                os.dup2(os.open(part[2:], os.O_WRONLY|os.O_CREAT, 0o644), STDERR)
            else:
                args.append(part)
        os.execvp(args[0], args)
    else:
        _, statuscode = os.waitpid(pid, 0)
To understand how this works, you need some knowledge of the POSIX system calls fork, waitpid, open, dup2, and execvp. But even if you don’t understand the specifics, I think this codified model can help in understanding how Bash operates. Let’s look at an example.
Example: Duplicated Files
Let’s see if we can explain the difference between the following commands using the mini shell for the model:
$ ./logcat.py <logcat.py >out.txt 2>out.txt
$ ./logcat.py <logcat.py >out.txt 2>&1
At first glance, it looks like both commands redirect both stdout and stderr to the out.txt file. But if we evaluate them like the mini shell does, we see that the first example will open the file twice (two calls to os.open creating two file handles), whereas the second example will open the file only once and then duplicate the file handle for stderr.
When two file handles are created, each has its own position in the file, so writes to the two streams will attempt to write to the same location and they will overwrite each other. Furthermore, buffering might alter the order in which writes happen, so it is not clear what will actually end up in the file. So to make sure all output is captured in the file, the second example should be used, where the file is only opened once.
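The overwriting can be reproduced directly with os.open and os.dup. In this sketch, write_with_two_opens and write_with_dup are hypothetical helpers, and the written texts are made up for illustration. The first open uses os.O_TRUNC so that each run starts from an empty file, which the mini shell does not do.

```python
import os
import tempfile


def write_with_two_opens(path):
    """Like >out.txt 2>out.txt: two opens, two independent file offsets."""
    out = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    err = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    os.write(out, b"stdout\n")  # written at offset 0 of the first handle
    os.write(err, b"err\n")     # also written at offset 0, overwriting
    os.close(out)
    os.close(err)
    with open(path, "rb") as f:
        return f.read()


def write_with_dup(path):
    """Like >out.txt 2>&1: one open, duplicated, one shared file offset."""
    out = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    err = os.dup(out)
    os.write(out, b"stdout\n")  # advances the shared offset
    os.write(err, b"err\n")     # continues where the first write ended
    os.close(out)
    os.close(err)
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "out.txt")
        print(write_with_two_opens(path))
        print(write_with_dup(path))
```

With two opens, the file ends up containing b'err\nut\n', because the second write lands on top of the first. With the duplicated handle, it contains b'stdout\nerr\n'.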
Conclusion
There is still more to Bash redirects than what I have explained here. But this mental model (along with its extended version) has helped me reason about Bash redirects. I hope it will do the same for you.