
    Bash Redirects Explained

    I thought I knew how Bash redirects worked.

    If I wanted to redirect the output of a command to a file, I’d type this:

    program > /tmp/log.txt
    

    If I wanted to pipe both stdout and stderr to a text editor for further processing, I’d type this:

    program 2>&1 | vim -
    

    I knew that 2>&1 meant redirect stderr to stdout, making it appear on stdout as well.

    I knew certain patterns for certain situations. But when I encountered situations where I had not learned a pattern, I was lost. For example, I could not explain the difference between

    program 2>&1 >/tmp/log.txt
    

    and

    program >/tmp/log.txt 2>&1
    

    And I got scared when I saw something like this:

    program < input.txt > output.txt 2>&1
    

    Have you also been there? What did you do?

    I would search the Internet for a pattern that matched the use case, or just try different alternatives and notice how they behaved.

    I did this until the day I learned a mental model for how Bash redirects work. Now I no longer need to rely on patterns. I can easily parse any situation and use any combination of redirects for my purposes.

    The rest of this article explains this mental model.

    The Standard Streams

    A process has three standard streams attached to it:

    • stdin (0)
    • stdout (1)
    • stderr (2)

    (Figure: diagram of the three standard streams of a process.)
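    The numbers in parentheses are file descriptors. As a quick check, here is a minimal sketch (assuming a POSIX system and Python 3.7+) that starts a child process which writes raw bytes directly to descriptors 1 and 2. With capture_output=True, subprocess.run connects each of those descriptors to its own pipe, so the two writes come back separately:

```python
import subprocess
import sys

# The child writes directly to file descriptors 1 (stdout) and 2 (stderr),
# bypassing Python's sys.stdout/sys.stderr objects.
CHILD = "import os; os.write(1, b'to fd 1\\n'); os.write(2, b'to fd 2\\n')"

# capture_output=True gives the child one pipe for stdout and another for stderr.
result = subprocess.run([sys.executable, "-c", CHILD], capture_output=True)

print("stdout:", result.stdout)  # stdout: b'to fd 1\n'
print("stderr:", result.stderr)  # stderr: b'to fd 2\n'
```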

    When we start a program from the terminal, Bash sets up the standard streams as follows:

    • stdin: terminal/keyboard
    • stdout: terminal
    • stderr: terminal

    Redirects modify what the standard streams point to before the program starts executing.

    • < (short for 0<) means modify stdin.
    • > (short for 1>) means modify stdout.
    • 2> means modify stderr.

    That is the mental model: redirects modify standard streams before program execution.
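    In system-call terms, "modify a standard stream" means replacing file descriptor 0, 1, or 2 in a child process after fork and before exec. The sketch below does the equivalent of echo hello >out.txt by hand. It assumes a POSIX system and Python 3.9+ (for os.waitstatus_to_exitcode), and run_with_stdout_to is a hypothetical helper name, not part of any library:

```python
import os
import tempfile


def run_with_stdout_to(path, argv):
    """Hypothetical helper: run argv with stdout (fd 1) pointing at path.

    Roughly what a shell does for `argv... >path`. Returns the exit code.
    """
    pid = os.fork()
    if pid == 0:  # child process
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            os.dup2(fd, 1)  # the redirect: fd 1 now refers to the file
            os.close(fd)    # the original descriptor is no longer needed
            os.execvp(argv[0], argv)  # the program starts with fd 1 already changed
        finally:
            os._exit(127)  # only reached if open, dup2 or exec failed
    _, status = os.waitpid(pid, 0)  # parent: wait for the child to finish
    return os.waitstatus_to_exitcode(status)


# Equivalent of `echo hello >out.txt`, using a throwaway directory.
out_path = os.path.join(tempfile.mkdtemp(), "out.txt")
exit_code = run_with_stdout_to(out_path, ["echo", "hello"])
with open(out_path) as f:
    content = f.read()
print(exit_code, repr(content))  # 0 'hello\n'
```

    Note that echo itself never learns about out.txt: by the time it runs, fd 1 already points at the file.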

    Let’s evaluate a few examples using this mental model to see how it works.

    Logcat Utility

    To show what happens in the examples, we use a small utility program, logcat.py, that makes use of all three streams. It reads text from stdin, logs the arguments and the length of the text to stderr, and writes the text to stdout. It looks like this:

    #!/usr/bin/env python
    
    import sys
    
    text = sys.stdin.read()
    
    sys.stderr.write(f"Args: {(sys.argv[1:])}\n")
    sys.stderr.write(f"Read {len(text)} characters.\n")
    
    sys.stdout.write(text)
    

    Example: No Redirect

    Let’s start with an example without redirects to see the operation of logcat.py:

    $ ./logcat.py ignored arguments
    

    Before logcat.py starts executing, Bash sets up the standard streams as follows:

    • stdin: terminal/keyboard
    • stdout: terminal
    • stderr: terminal

    When execution starts, logcat.py waits for input. If we type hello in the terminal, followed by Enter and then Ctrl+D to signal end of input, the following is printed to the terminal:

    Args: ['ignored', 'arguments']
    Read 6 characters.
    hello
    

    We can see that it read our input from the terminal/keyboard and wrote the log messages along with our input to the terminal as well.

    Example: Redirect Stdin

    Now let’s redirect stdin so that it reads from the logcat.py source code instead of the terminal/keyboard:

    $ ./logcat.py ignored arguments <logcat.py
    

    This instructs Bash to modify stdin to point to the file logcat.py.

    Before logcat.py starts executing, Bash sets up the standard streams as follows:

    • stdin: logcat.py (opened in read mode)
    • stdout: terminal
    • stderr: terminal

    When execution starts, the following is printed to the terminal:

    Args: ['ignored', 'arguments']
    Read 182 characters.
    #!/usr/bin/env python
    
    import sys
    
    text = sys.stdin.read()
    
    sys.stderr.write(f"Args: {(sys.argv[1:])}\n")
    sys.stderr.write(f"Read {len(text)} characters.\n")
    
    sys.stdout.write(text)
    

    We can see that the redirect is stripped from the arguments: only Bash sees it, and it does not pass it along to the program. We can also see that the logcat.py source code is printed to the terminal.
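    From Python, the subprocess module can set up the same kind of redirect for a child process: passing an open file object as stdin= makes that file the child's fd 0 before the program runs. A minimal sketch, assuming Python 3.7+; the inline child program is a stand-in for logcat.py that only reports how many characters it read:

```python
import subprocess
import sys
import tempfile

# Assumed stand-in for logcat.py: read all of stdin, report its length on stderr.
CHILD = "import sys; t = sys.stdin.read(); sys.stderr.write(f'Read {len(t)} characters.\\n')"

# Create an input file to play the role of the file after `<`.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello\n")

# Like `python -c CHILD <file`: the file, opened in read mode, becomes fd 0.
with open(tmp.name) as infile:
    result = subprocess.run([sys.executable, "-c", CHILD],
                            stdin=infile, stderr=subprocess.PIPE, text=True)

print(result.stderr, end="")  # Read 6 characters.
```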

    Example: Redirect Stdin and Stdout

    Let’s say we’re only interested in the log messages and want to throw away stdout:

    $ ./logcat.py ignored arguments <logcat.py >/dev/null
    

    This instructs Bash to modify stdin to point to the file logcat.py and to modify stdout to point to the file /dev/null.

    Before logcat.py starts executing, Bash sets up the standard streams as follows:

    • stdin: logcat.py (opened in read mode)
    • stdout: /dev/null (opened in write mode)
    • stderr: terminal

    When execution starts, the following is printed to the terminal:

    Args: ['ignored', 'arguments']
    Read 182 characters.
    

    We can see that the redirects are again stripped from the arguments, and that the source code was written to /dev/null and is therefore not shown in the terminal.
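    The same idea as a Python sketch, assuming Python 3.7+: subprocess.DEVNULL plays the role of >/dev/null, and stderr is captured here only so it can be inspected (in the shell version it goes to the terminal). For brevity the input is fed through a pipe with input= rather than from a file, and the inline child program is again an assumed stand-in for logcat.py:

```python
import subprocess
import sys

# Assumed stand-in for logcat.py: log the input length to stderr, echo input to stdout.
CHILD = (
    "import sys; t = sys.stdin.read(); "
    "sys.stderr.write(f'Read {len(t)} characters.\\n'); sys.stdout.write(t)"
)

result = subprocess.run(
    [sys.executable, "-c", CHILD],
    input="hello\n",            # stdin delivered through a pipe, for brevity
    stdout=subprocess.DEVNULL,  # like `>/dev/null`: fd 1 points at os.devnull
    stderr=subprocess.PIPE,     # captured here instead of going to the terminal
    text=True,
)

print(result.stdout)          # None: stdout was discarded, not captured
print(result.stderr, end="")  # Read 6 characters.
```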

    Extended Mental Model

    Let’s extend the mental model to clarify how Bash operates.

    When Bash parses a command, it divides it into two parts: the arguments and the redirects. Before it starts executing the program with the arguments, it goes through the redirects in order, from left to right, and configures the standard streams accordingly.
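    The split into arguments and redirects can be sketched in a few lines of Python. This is a deliberately naive, hypothetical split_command: it splits on whitespace and only recognizes the <file, >file, 2>file and 2>&1 forms used in this article. Real Bash parsing also handles quoting, a space between operator and target, and many more operators:

```python
def split_command(command):
    """Naive sketch: separate a command line into (arguments, redirects).

    Only handles whitespace-separated words and the forms <file, >file,
    2>file and 2>&1, with no space between the operator and its target.
    """
    arguments, redirects = [], []
    for word in command.split():
        if word.startswith(("<", ">", "2>")):
            redirects.append(word)  # kept in their original order
        else:
            arguments.append(word)
    return arguments, redirects


args, redirs = split_command("./logcat.py <logcat.py is the >out.txt best 2>&1 thing")
print(args)    # ['./logcat.py', 'is', 'the', 'best', 'thing']
print(redirs)  # ['<logcat.py', '>out.txt', '2>&1']
```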

    Example: Redirect All Streams

    Let’s see how we can interpret a more complex command using the extended mental model:

    $ ./logcat.py <logcat.py is the >out.txt best 2>&1 thing
    

    If we split this into arguments and redirects, we get this:

    • Arguments: ./logcat.py, is, the, best, thing
    • Redirects: <logcat.py, >out.txt, 2>&1

    Now, let’s evaluate the redirects in order. The state of the standard streams at start is this:

    • stdin: terminal/keyboard
    • stdout: terminal
    • stderr: terminal

    Then we evaluate <logcat.py and get this:

    • stdin: logcat.py (opened in read mode)
    • stdout: terminal
    • stderr: terminal

    Then we evaluate >out.txt and get this:

    • stdin: logcat.py (opened in read mode)
    • stdout: out.txt (opened in write mode)
    • stderr: terminal

    Then we evaluate 2>&1, which means modify stderr (2>) to be whatever stdout points to (&1), and get this:

    • stdin: logcat.py (opened in read mode)
    • stdout: out.txt (opened in write mode)
    • stderr: out.txt (same file handle as stdout)

    After the standard streams have been set up, Bash starts executing ./logcat.py is the best thing. Nothing appears on the terminal, since all output has been redirected to out.txt:

    $ cat out.txt
    Args: ['is', 'the', 'best', 'thing']
    Read 182 characters.
    #!/usr/bin/env python
    
    import sys
    
    text = sys.stdin.read()
    
    sys.stderr.write(f"Args: {(sys.argv[1:])}\n")
    sys.stderr.write(f"Read {len(text)} characters.\n")
    
    sys.stdout.write(text)
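    The step-by-step evaluation above is mechanical enough to simulate. The sketch below uses a hypothetical apply_redirects that applies redirects left to right to a table of what fds 0, 1 and 2 point to; targets are just labels and nothing is actually opened. Running it on the two commands from the introduction shows why 2>&1 >/tmp/log.txt and >/tmp/log.txt 2>&1 behave differently:

```python
def apply_redirects(redirects):
    """Simulate redirects left to right on a table of fd -> target (labels only)."""
    streams = {0: "terminal", 1: "terminal", 2: "terminal"}
    for r in redirects:
        if r == "2>&1":
            streams[2] = streams[1]  # copy what stdout points to *right now*
        elif r.startswith("2>"):
            streams[2] = r[2:]
        elif r.startswith(">"):
            streams[1] = r[1:]
        elif r.startswith("<"):
            streams[0] = r[1:]
    return streams


print(apply_redirects(["<logcat.py", ">out.txt", "2>&1"]))
# {0: 'logcat.py', 1: 'out.txt', 2: 'out.txt'}

# The two commands from the introduction:
print(apply_redirects([">/tmp/log.txt", "2>&1"]))
# {0: 'terminal', 1: '/tmp/log.txt', 2: '/tmp/log.txt'}
print(apply_redirects(["2>&1", ">/tmp/log.txt"]))
# {0: 'terminal', 1: '/tmp/log.txt', 2: 'terminal'}
```

    In the last case, 2>&1 runs while stdout still points at the terminal, so stderr keeps pointing there even after stdout moves to the file.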
    

    Mini Shell

    I created a mini version of a shell to demonstrate how straightforward it is to implement redirects with POSIX system calls. It works exactly like the extended mental model, and because it is running software, it fills in some details that the model leaves out. I would guess that Bash does something similar, even though I haven’t read its source code.

    First off, here is a demo that shows how the mini shell can replicate the complex example from above:

    $ ./minishell.py
    ~~?~~> ./logcat.py <logcat.py is the >out.txt best 2>&1 thing
    ~~0~~> cat out.txt
    Args: ['is', 'the', 'best', 'thing']
    Read 182 characters.
    #!/usr/bin/env python
    
    import sys
    
    text = sys.stdin.read()
    
    sys.stderr.write(f"Args: {(sys.argv[1:])}\n")
    sys.stderr.write(f"Read {len(text)} characters.\n")
    
    sys.stdout.write(text)
    

    And here is the implementation in only 31 lines of Python:

    #!/usr/bin/env python
    
    import os
    import sys
    
    STDIN  = 0
    STDOUT = 1
    STDERR = 2
    
    statuscode = "?"
    while True:
        sys.stdout.write(f"~~{statuscode}~~> ")
        sys.stdout.flush()
        command = input()
        pid = os.fork()
        if pid == 0:  # child: apply redirects in order, then exec the program
            args = []
            for part in command.split(" "):
                if part.startswith("<"):
                    os.dup2(os.open(part[1:], os.O_RDONLY), STDIN)
                elif part.startswith(">"):
                    os.dup2(os.open(part[1:], os.O_WRONLY|os.O_CREAT|os.O_TRUNC, 0o644), STDOUT)
                elif part == "2>&1":
                    os.dup2(STDOUT, STDERR)  # stderr now shares stdout's file handle
                elif part.startswith("2>"):
                    os.dup2(os.open(part[2:], os.O_WRONLY|os.O_CREAT|os.O_TRUNC, 0o644), STDERR)
                else:
                    args.append(part)
            os.execvp(args[0], args)
        else:  # parent: wait for the child and show its exit code in the prompt
            statuscode = os.waitstatus_to_exitcode(os.waitpid(pid, 0)[1])
    

    To understand how this works, you need some knowledge of the POSIX system calls fork, waitpid, open, dup2, and execvp. But even if you don’t understand the specifics, I think this codified model can help in understanding how Bash operates. Let’s look at an example.

    Example: Duplicated Files

    Let’s see if we can explain the difference between the following commands using the mini shell as the model:

    $ ./logcat.py <logcat.py >out.txt 2>out.txt
    $ ./logcat.py <logcat.py >out.txt 2>&1
    

    At first glance, both commands seem to redirect stdout and stderr to the file out.txt. But if we evaluate them the way the mini shell does, we see that the first command opens the file twice (two calls to os.open, creating two file handles), whereas the second command opens the file only once and then duplicates its file handle for stderr.

    When the file is opened twice, each file handle keeps its own write position, and both start at the beginning of the file, so writes to the two streams overwrite each other. Furthermore, buffering can change the order in which writes reach the file, so it is not clear what will actually end up in it. To make sure all output is captured in the file, use the second command, where the file is opened only once and both streams share a single file handle.
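    The difference between two opens and one open plus a duplicate can be observed directly with the same kind of POSIX calls the mini shell uses. In this minimal sketch (assuming a POSIX system), spare descriptors stand in for fds 1 and 2 so the script's own output is untouched, and os.dup (the variant of dup2 that picks a free descriptor number) stands in for 2>&1. The hypothetical write_pair helper writes out and then err to the file:

```python
import os
import tempfile

FLAGS = os.O_WRONLY | os.O_CREAT | os.O_TRUNC  # what `>file` and `2>file` use


def write_pair(path, share_handle):
    """Write 'out' then 'err' through two descriptors; return the file's bytes."""
    a = os.open(path, FLAGS, 0o644)
    if share_handle:
        b = os.dup(a)                    # like 2>&1: same file handle, shared offset
    else:
        b = os.open(path, FLAGS, 0o644)  # like 2>file: second handle, own offset
    os.write(a, b"out\n")
    os.write(b, b"err\n")
    os.close(a)
    os.close(b)
    with open(path, "rb") as f:
        return f.read()


path = os.path.join(tempfile.mkdtemp(), "out.txt")
print(write_pair(path, share_handle=False))  # b'err\n': the second write overwrote the first
print(write_pair(path, share_handle=True))   # b'out\nerr\n': both writes kept
```

    These are unbuffered os.write calls, so only the offset effect shows up here; buffering in a real program can make the two-handle case even less predictable.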

    Conclusion

    There is still more to Bash redirects than what I have explained here. But this mental model (along with its extended version) has helped me reason about Bash redirects. I hope it will do the same for you.