How to write reliable socket servers that survive crashes and restarts?

Published on 2 September 2022.

A few months ago, I was researching how to do zero-downtime deployments and found the wonderful blog post Dream Deploys: Atomic, Zero-Downtime Deployments.

In it, Alan describes how separating listening on a socket and accepting connections on it into different processes can keep a socket “live” at all times even during a restart.

In this blog post I want to document that trick and my understanding of it.

The problem with a crashing server

To illustrate the problem with a crashing server, we use the example below:

  1. server-listen.py
import socket

with socket.socket() as s:
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("localhost", 9000))
    s.listen()
    print("listening on port 9000")
    while True:
        conn, addr = s.accept()
        print("accepting connection")
        with conn:
            data = conn.recv(100)
            number = int(data)
            conn.sendall(f"{number}*{number}={number*number}\n".encode("ascii"))

This is a TCP server that listens on port 9000, reads a number from each client, and returns the square of that number. It assumes that the input can be parsed as an integer. If parsing fails, the server crashes.

To test the behavior of the server, we use the following client:

  1. client.py
import socket
import time

def make_request(number):
    with socket.socket() as s:
        s.connect(("localhost", 9000))
        if number == 5:
            s.sendall(b"five\n")
        else:
            s.sendall(f"{number}\n".encode("ascii"))
        return s.recv(100).decode("ascii").rstrip()

for i in range(20):
    try:
        time_start = time.perf_counter()
        message = make_request(i)
        time_end = time.perf_counter()
        diff = int((time_end - time_start) * 1000)
        if message:
            print(f"{message} (request took {diff}ms)")
        else:
            print(f"No response for {i}")
    except:
        print(f"Connection failed for {i}")
    time.sleep(0.01)

It sends 20 requests to the server with a 10ms delay between them. For request number 5, however, it sends the string five instead of the number, which causes the server to crash.

If we start the server, then the client, the output looks as follows:

  1. server output
$ python server-listen.py 
listening on port 9000
accepting connection
accepting connection
accepting connection
accepting connection
accepting connection
accepting connection
Traceback (most recent call last):
  File "/home/rick/rickardlindberg.me/writing/reliable-socket-servers/server-listen.py", line 13, in <module>
    number = int(data)
ValueError: invalid literal for int() with base 10: b'five\n'
  1. client output
$ python client.py 
0*0=0 (request took 1ms)
1*1=1 (request took 0ms)
2*2=4 (request took 0ms)
3*3=9 (request took 0ms)
4*4=16 (request took 0ms)
No response for 5
Connection failed for 6
Connection failed for 7
Connection failed for 8
Connection failed for 9
Connection failed for 10
Connection failed for 11
Connection failed for 12
Connection failed for 13
Connection failed for 14
Connection failed for 15
Connection failed for 16
Connection failed for 17
Connection failed for 18
Connection failed for 19

In the client output, we see that request number 5 never receives a response and that all subsequent requests fail: the server has crashed, and no one is listening on port 9000.

Solution: restart the server in a loop

In order for subsequent requests to succeed, we need to start the server again after it has crashed. One way to do that is to run the server program in an infinite loop using a script like the one below:

  1. loop.sh
while true; do
    echo "$@"
    "$@" || true
    echo "restarting"
done

This Bash script takes a command as its arguments and runs that command in a loop, ignoring its exit code.

Invoking the server and client again, we get the following output:

  1. server output
$ bash loop.sh python server-listen.py
python server-listen.py
listening on port 9000
accepting connection
accepting connection
accepting connection
accepting connection
accepting connection
accepting connection
Traceback (most recent call last):
  File "/home/rick/rickardlindberg.me/writing/reliable-socket-servers/server-listen.py", line 13, in <module>
    number = int(data)
ValueError: invalid literal for int() with base 10: b'five\n'
restarting
python server-listen.py
listening on port 9000
accepting connection
accepting connection
accepting connection
accepting connection
accepting connection
accepting connection
  1. client output
$ python client.py 
0*0=0 (request took 1ms)
1*1=1 (request took 0ms)
2*2=4 (request took 0ms)
3*3=9 (request took 1ms)
4*4=16 (request took 0ms)
No response for 5
Connection failed for 6
Connection failed for 7
Connection failed for 8
Connection failed for 9
Connection failed for 10
Connection failed for 11
Connection failed for 12
Connection failed for 13
14*14=196 (request took 0ms)
15*15=225 (request took 0ms)
16*16=256 (request took 0ms)
17*17=289 (request took 0ms)
18*18=324 (request took 0ms)
19*19=361 (request took 1ms)

In the server output, we see that the server starts again after the crash and starts listening on port 9000.

In the client output, we see that request number 5 fails the same way, but after a few more failed requests, responses start coming back again at request number 14.

The problem with a restarting server

Running the server in a loop is an improvement. Instead of dropping all subsequent requests, we only drop a few.

But during the time between the server crash and a new server being up, there is no one listening on port 9000 and we still drop connections.

How can we make sure to handle all requests?

Solution: separate listening on a socket and accepting connections

The trick, as demonstrated in Alan’s blog post, is to listen on the socket in one process and to accept connections and process requests in another. That way, if processing fails and that process dies, the socket stays open because it is managed by another process.

Here is a program that listens on a socket and then spawns server processes in a loop to accept connections:

  1. server-listen-loop.py
import os
import socket

with socket.socket() as s:
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("localhost", 9000))
    s.listen()
    print("listening on port 9000")
    os.dup2(s.fileno(), 0)
    os.close(s.fileno())
    os.execvp("bash", ["bash", "loop.sh", "python", "server-accept.py"])

The first part of this program creates a socket and starts listening. This is what we had in the previous example.

The second part moves the file descriptor of the socket to file descriptor 0 (stdin) to make it available to child processes.

The third part replaces the current process with bash loop.sh python server-accept.py. At this point the process is listening on the socket and starts the server-accept.py program in a loop. As long as the loop.sh script doesn’t exit, there will be someone listening on port 9000.

The server-accept.py program is similar to server-listen.py, but instead of listening on port 9000, it just accepts connections on the socket that is passed to it as file descriptor 0 (stdin):

  1. server-accept.py
import socket

with socket.socket(fileno=0) as s:
    while True:
        conn, addr = s.accept()
        print("accepting connection")
        with conn:
            data = conn.recv(100)
            number = int(data)
            conn.sendall(f"{number}*{number}={number*number}\n".encode("ascii"))

Invoking the server and client again, we get the following output:

  1. server output
$ python server-listen-loop.py
listening on port 9000
python server-accept.py
accepting connection
accepting connection
accepting connection
accepting connection
accepting connection
accepting connection
Traceback (most recent call last):
  File "/home/rick/rickardlindberg.me/writing/reliable-socket-servers/server-accept.py", line 9, in <module>
    number = int(data)
ValueError: invalid literal for int() with base 10: b'five\n'
restarting
python server-accept.py
accepting connection
accepting connection
accepting connection
accepting connection
accepting connection
accepting connection
accepting connection
accepting connection
accepting connection
accepting connection
accepting connection
accepting connection
accepting connection
accepting connection
  1. client output
$ python client.py 
0*0=0 (request took 0ms)
1*1=1 (request took 0ms)
2*2=4 (request took 1ms)
3*3=9 (request took 0ms)
4*4=16 (request took 0ms)
No response for 5
6*6=36 (request took 106ms)
7*7=49 (request took 0ms)
8*8=64 (request took 1ms)
9*9=81 (request took 0ms)
10*10=100 (request took 0ms)
11*11=121 (request took 1ms)
12*12=144 (request took 0ms)
13*13=169 (request took 0ms)
14*14=196 (request took 0ms)
15*15=225 (request took 0ms)
16*16=256 (request took 1ms)
17*17=289 (request took 0ms)
18*18=324 (request took 0ms)
19*19=361 (request took 1ms)

Now all requests (except the one that causes the crash) get a response. Request number 6 takes longer to complete because the loop script first has to start a new server-accept.py, which then has to call accept on the socket. But the request doesn’t fail, and the client gets no connection error.
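
To try the hand-off without juggling several files, it can be squeezed into a single script. This is only a sketch, and it deviates from the programs above in ways that are my own assumptions: it binds to port 0 so that the operating system picks a free port, it passes the socket as stdin through subprocess.Popen instead of os.dup2 and execvp, and the accepting code (the hypothetical ACCEPTOR string) handles a single request and then exits.

```python
import socket
import subprocess
import sys

# Code for the accepting process: wrap file descriptor 0 in a socket
# object and serve exactly one request (an assumption of this sketch).
ACCEPTOR = """
import socket
with socket.socket(fileno=0) as s:
    conn, addr = s.accept()
    with conn:
        number = int(conn.recv(100))
        conn.sendall(f"{number}*{number}={number*number}\\n".encode("ascii"))
"""

def square_via_child(number):
    with socket.socket() as listener:
        listener.bind(("localhost", 0))  # port 0: the OS picks a free port
        listener.listen()
        port = listener.getsockname()[1]
        # The listening socket becomes the child's stdin (file descriptor 0).
        child = subprocess.Popen(
            [sys.executable, "-c", ACCEPTOR], stdin=listener.fileno()
        )
        with socket.socket() as client:
            client.settimeout(10)  # safety net so the demo cannot hang forever
            # The child may not have called accept yet; connecting works anyway.
            client.connect(("localhost", port))
            client.sendall(f"{number}\n".encode("ascii"))
            reply = client.recv(100).decode("ascii").rstrip()
        child.wait()
        return reply

reply = square_via_child(7)
print(reply)
```

The parent never calls accept itself. It only keeps the socket open, which is the role that the loop script plays in the real setup.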

And this is one way to write reliable socket servers that survive crashes and restarts.

Questions & Answers

How long will a socket wait before timing out?

I tried to modify the loop script to sleep for 60 seconds before restarting the server:

  1. loop-sleep.sh
while true; do
    echo "$@"
    "$@" || true
    echo "restarting"
    sleep 60
done

The client output looked like this:

  1. client output
...
No response for 5
6*6=36 (request took 60123ms)
7*7=49 (request took 0ms)
...

So the client got no error even though it had to wait 60 seconds for the response.

I suppose you can put a timeout in the client code. But this question was about how long the operating system on the server will keep the connection “alive” even though no one calls accept.
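
A client-side timeout could look something like the sketch below. The function name make_request_with_timeout and its port and timeout_seconds parameters are made up for this example, and the "server" is just a listening socket that never calls accept.

```python
import socket

def make_request_with_timeout(port, number, timeout_seconds):
    """Like make_request in client.py, but give up after timeout_seconds."""
    with socket.socket() as s:
        s.settimeout(timeout_seconds)  # applies to connect, send, and recv
        s.connect(("localhost", port))
        s.sendall(f"{number}\n".encode("ascii"))
        try:
            return s.recv(100).decode("ascii").rstrip()
        except socket.timeout:
            return None  # no response arrived in time

# Demo: someone is listening, but nobody ever accepts.
with socket.socket() as listener:
    listener.bind(("localhost", 0))  # port 0: the OS picks a free port
    listener.listen()
    port = listener.getsockname()[1]
    response = make_request_with_timeout(port, 6, timeout_seconds=0.2)

print(response)  # None
```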

I suppose the operating system has some kind of buffer. Say that there are multiple clients making requests at the same time and the server never calls accept during that time. Eventually some buffer must be exceeded and connections get dropped.

(It seems that the client hangs on the s.recv call. That means that the connection was established and the request was sent, so both must have been stored in some buffer on the server.)

If anyone can point me to documentation where I can read about this behavior, please drop me a line.
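
The buffer in question is most likely the listen backlog: the kernel completes the TCP handshake on its own and queues the connection until someone calls accept. The optional argument to listen is a hint for how many such connections may wait. The sketch below shows the queueing inside a single process. The backlog of 5 and the use of port 0 are arbitrary choices for the example.

```python
import socket

with socket.socket() as listener:
    listener.bind(("localhost", 0))  # port 0: the OS picks a free port
    listener.listen(5)  # backlog hint: un-accepted connections to queue
    port = listener.getsockname()[1]

    with socket.socket() as client:
        # Nobody has called accept, yet connect and sendall both succeed.
        client.connect(("localhost", port))
        client.sendall(b"6\n")

        # Only now does the "server" accept. The request is already waiting.
        conn, addr = listener.accept()
        with conn:
            queued_request = conn.recv(100)

print(queued_request)  # b'6\n'
```

What happens once the queue is full depends on the operating system and its settings, so I would not rely on a specific limit without measuring it.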

Can we decrease the startup time?

When the loop script restarts the server, it takes ~100ms for it to come up and process requests. How can we decrease that time?

One way would be to modify the loop script to spawn multiple server processes. That way, if one crashes, the other can serve the next request.

This would also make the server code concurrent. That is, no “global” state can reside in the server process, because we don’t know which server process will serve the next request.
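
A loop that keeps several processes alive could be sketched in Python roughly like this. The function keep_running and its parameters are hypothetical. The max_restarts limit only exists so that the example terminates, and the demo uses a stand-in command that exits immediately instead of server-accept.py.

```python
import os
import signal
import sys

def keep_running(command, count, max_restarts):
    """Keep `count` copies of `command` running, restarting any that exit.

    max_restarts only exists so that this sketch terminates; a real loop
    script would presumably run forever.
    """
    def spawn():
        # P_NOWAIT returns the child's pid right away. The child inherits
        # file descriptor 0, which is where the listening socket would be.
        return os.spawnvp(os.P_NOWAIT, command[0], command)

    pids = {spawn() for _ in range(count)}
    restarts = 0
    while restarts < max_restarts:
        exited_pid, status = os.wait()  # blocks until any child exits
        pids.discard(exited_pid)
        pids.add(spawn())
        restarts += 1

    # Clean up the children that are still around.
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
        os.waitpid(pid, 0)
    return restarts

# Demo with a stand-in command that exits immediately.
restarts = keep_running([sys.executable, "-c", "pass"], count=2, max_restarts=3)
print(restarts)  # 3
```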

Another solution might be to have a second process in standby mode. So the loop script starts a second server process, but it stops it right before calling accept. But then we would need a way to signal to the process to resume operation. Perhaps by sending it a signal?

  1. server-accept-standby.py
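import signal
import socket

# Assumption: SIGUSR1 is the "resume" signal. Block it first so that it
# stays pending if it arrives before we start waiting.
signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})

with socket.socket(fileno=0) as s:
    # wait for signal before proceeding
    signal.sigwait({signal.SIGUSR1})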
    while True:
        conn, addr = s.accept()
        print("accepting connection")
        with conn:
            data = conn.recv(100)
            number = int(data)
            conn.sendall(f"{number}*{number}={number*number}\n".encode("ascii"))

Both of these make the loop script more complicated. And if it gets more complicated, it is more likely to crash. And if it crashes, the socket gets closed, and subsequent requests will get connection errors.

Can we use this technique to create a load balancer?

Well, yes.

If the loop script spawns multiple server processes, the operating system will load balance between them.

No fancy load balancing software needed.
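
Here is a small sketch to see it in action. It is not the loop script: to stay in one file it forks the workers with os.fork (so it is Unix only), binds to port 0, and lets every worker answer with its own pid. The names worker and demo are made up for the example. How the kernel spreads connections over the workers is up to the kernel, so the distribution is not necessarily even.

```python
import os
import signal
import socket

def worker(listener):
    """Accept connections forever and answer each with this worker's pid."""
    while True:
        conn, addr = listener.accept()
        with conn:
            conn.recv(100)
            conn.sendall(f"{os.getpid()}\n".encode("ascii"))

def demo(worker_count=3, request_count=12):
    with socket.socket() as listener:
        listener.bind(("localhost", 0))  # port 0: the OS picks a free port
        listener.listen()
        port = listener.getsockname()[1]

        # Every forked worker inherits the same listening socket.
        worker_pids = []
        for _ in range(worker_count):
            pid = os.fork()
            if pid == 0:
                try:
                    worker(listener)
                finally:
                    os._exit(0)  # a worker must never return into the parent's code
            worker_pids.append(pid)

        # Record which worker served each request.
        served_by = []
        for i in range(request_count):
            with socket.socket() as client:
                client.connect(("localhost", port))
                client.sendall(f"{i}\n".encode("ascii"))
                served_by.append(int(client.recv(100)))

        for pid in worker_pids:
            os.kill(pid, signal.SIGTERM)
            os.waitpid(pid, 0)
        return worker_pids, served_by

worker_pids, served_by = demo()
print(sorted(set(served_by)))
```

The printed list contains the pids of the workers that served at least one request.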

More info in this blog post.

Why do we need to move the socket file descriptor?

In the middle of server-listen-loop.py we move the file descriptor of the socket, s.fileno(), to file descriptor 0 (stdin):

  1. server-listen-loop.py
import os
import socket

with socket.socket() as s:
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("localhost", 9000))
    s.listen()
    print("listening on port 9000")
    os.dup2(s.fileno(), 0)
    os.close(s.fileno())
    os.execvp("bash", ["bash", "loop.sh", "python", "server-accept.py"])

We do that to make the file descriptor available to child processes so that they can create a socket using it and then call accept.

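In Python (since version 3.4), file descriptors are not inheritable by default, and that includes the file descriptor of the socket. That is, a program started with exec will not be able to access it. File descriptor 0 (stdin), on the other hand, is inherited, and os.dup2 makes its target inheritable by default. That is why moving the socket to file descriptor 0 works.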
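
The inheritable flag can be inspected with os.get_inheritable. The sketch below duplicates the socket onto a spare file descriptor instead of file descriptor 0, so that it leaves the real stdin alone. That substitution is an assumption made for the example.

```python
import os
import socket

with socket.socket() as s:
    # Sockets created by Python are non-inheritable by default.
    original_inheritable = os.get_inheritable(s.fileno())

    # Reserve a spare descriptor number to stand in for descriptor 0.
    spare_fd = os.dup(s.fileno())
    os.dup2(s.fileno(), spare_fd)  # inheritable=True is the default
    copy_inheritable = os.get_inheritable(spare_fd)
    os.close(spare_fd)

print(original_inheritable, copy_inheritable)  # False True
```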

Another option might be to make the file descriptor inheritable. Something like this:

  1. server-listen-loop-inherit.py
import os
import socket

with socket.socket() as s:
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("localhost", 9000))
    s.listen()
    print("listening on port 9000")
    os.set_inheritable(s.fileno(), True)
    os.execvp("bash", ["bash", "loop.sh", "python", "server-accept-inherit.py", str(s.fileno())])

Then the file descriptor must also be passed to the server processes and used there instead of stdin:

  1. server-accept-inherit.py
import socket
import sys

with socket.socket(fileno=int(sys.argv[1])) as s:
    while True:
        conn, addr = s.accept()
        print("accepting connection")
        with conn:
            data = conn.recv(100)
            number = int(data)
            conn.sendall(f"{number}*{number}={number*number}\n".encode("ascii"))

This seems to work as well.

I think I chose the first approach because that is how Alan did it.

Not having to pass the file descriptor to the child processes might be preferable in some situations. I don’t know.

Why is execvp needed?

At the end of server-listen-loop.py we call execvp to start executing the loop script in the same process that started listening on the socket:

  1. server-listen-loop.py
import os
import socket

with socket.socket() as s:
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("localhost", 9000))
    s.listen()
    print("listening on port 9000")
    os.dup2(s.fileno(), 0)
    os.close(s.fileno())
    os.execvp("bash", ["bash", "loop.sh", "python", "server-accept.py"])

Why do we do that?

I did it because that is how Alan did it. But now that I think about it, I think we can just as well inline the loop script in server-listen-loop.py. That, of course, requires the loop script to be written in Python. Something like this:

  1. server-listen-loop-python.py
import os
import socket
import subprocess

with socket.socket() as s:
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("localhost", 9000))
    s.listen()
    print("listening on port 9000")
    os.dup2(s.fileno(), 0)
    os.close(s.fileno())
    while True:
        subprocess.Popen(["python", "server-accept.py"]).wait()
        print("restarting")

It seems to work as well.

If the loop script is a simple loop like this, perhaps it makes sense to inline it. But if the loop script is more complex, perhaps even a third party product to manage server processes, it makes sense to do the execvp.

Why set the socket option SO_REUSEADDR?

In server-listen.py, we set the socket option SO_REUSEADDR:

  1. server-listen.py
import socket

with socket.socket() as s:
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("localhost", 9000))
    s.listen()
    print("listening on port 9000")
    while True:
        conn, addr = s.accept()
        print("accepting connection")
        with conn:
            data = conn.recv(100)
            number = int(data)
            conn.sendall(f"{number}*{number}={number*number}\n".encode("ascii"))

Why?

I think this Stack Overflow answer explains it well:

This socket option tells the kernel that even if this port is busy (in the TIME_WAIT state), go ahead and reuse it anyway. If it is busy, but with another state, you will still get an address already in use error. It is useful if your server has been shut down, and then restarted right away while sockets are still active on its port.

Without it, the server cannot be restarted in a loop. The bind call fails with this error:

OSError: [Errno 98] Address already in use

Is this how Supervisor works?

Supervisor can create a process that listens on a socket and then pass that socket to child processes. For example like this:

[fcgi-program:test]
socket=tcp://localhost:9000
command=python server-accept.py

The server-accept.py program will get a socket passed to it as file descriptor 0 (stdin).

However, if server-accept.py crashes, it seems like Supervisor closes the socket and creates it again upon restart:

2022-05-10 21:46:28,734 INFO exited: test (exit status 1; not expected)
2022-05-10 21:46:28,734 INFO Closing socket tcp://localhost:9000
2022-05-10 21:46:29,736 INFO Creating socket tcp://localhost:9000
2022-05-10 21:46:29,737 INFO spawned: 'test' with pid 561624
2022-05-10 21:46:30,740 INFO success: test entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

So in this setup, we would still drop connections.

Why not make the server more reliable?

We could make the server more reliable so that it doesn’t crash. But sometimes a server needs to be restarted anyway. For example when configuration changes or a new version of the server should be deployed. The approach described in this blog post makes it possible to do those kinds of things without ever dropping connections as well.

Can this approach be used for zero-downtime deployments?

Well, yes, that is how I learned about it in Alan’s blog post.

Can we use a Unix domain socket instead of a TCP socket?

Well, yes.

In fact, the accepting server doesn’t know what kind of socket is passed to it. It could be either a Unix domain socket or a TCP socket:

  1. server-accept.py
import socket

with socket.socket(fileno=0) as s:
    while True:
        conn, addr = s.accept()
        print("accepting connection")
        with conn:
            data = conn.recv(100)
            number = int(data)
            conn.sendall(f"{number}*{number}={number*number}\n".encode("ascii"))

(Unix domain sockets are probably faster than TCP sockets when running on the same machine.)
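
To check that claim, here is a sketch that runs the same accepting code against both kinds of listening sockets. The helper names serve_one and round_trip are made up, and serve_one handles a single request instead of looping. It relies on socket.socket(fileno=...) detecting the socket family by itself, which it does in Python 3.7 and later.

```python
import os
import socket
import tempfile

def serve_one(listening_fd):
    """Same logic as server-accept.py, but for a single request."""
    with socket.socket(fileno=listening_fd) as s:  # family is auto-detected
        family = s.family
        conn, addr = s.accept()
        with conn:
            number = int(conn.recv(100))
            conn.sendall(f"{number}*{number}={number*number}\n".encode("ascii"))
    return family

def round_trip(listener, number):
    """Send one request to listener and serve it in the same process."""
    with socket.socket(listener.family) as client:
        # The connection waits in the backlog until serve_one accepts it.
        client.connect(listener.getsockname())
        client.sendall(f"{number}\n".encode("ascii"))
        # serve_one gets its own duplicate of the descriptor and closes it.
        family = serve_one(os.dup(listener.fileno()))
        return family, client.recv(100).decode("ascii").rstrip()

with socket.socket() as tcp_listener:
    tcp_listener.bind(("localhost", 0))  # port 0: the OS picks a free port
    tcp_listener.listen()
    tcp_result = round_trip(tcp_listener, 3)

with tempfile.TemporaryDirectory() as directory:
    with socket.socket(socket.AF_UNIX) as unix_listener:
        unix_listener.bind(os.path.join(directory, "server.sock"))
        unix_listener.listen()
        unix_result = round_trip(unix_listener, 4)

print(tcp_result[1], unix_result[1])  # 3*3=9 4*4=16
```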

