Connection Tracking Problems

A couple of years back I discovered that a VoIP server would sporadically lose its registration to the upstream VoIP provider and then, in spite of re-sending the registration requests, would seem completely unable to register again. Sometimes rebooting the device would let it re-register, but not always.

UPDATE: I can no longer reproduce this on newer Linux kernels. I’m not sure where in the version line this was fixed, but this no longer seems to be a problem on Linux itself. I confirmed this on kernel version 3.12.21. Newer versions of Mikrotik also no longer seem to suffer from this problem. So all of this information should be treated as historical, serving mostly as a reference, since there definitely are still routers out there suffering from this problem. It is for this reason alone that I’ve decided to leave this information here. If you run into this, the only “fix” for the VoIP situation is to use SIP over TCP or, if you don’t have that option, to shut down the SIP endpoint behind NAT, wait for the connection tracking entry to clear (or reboot the gateways), and then start the SIP endpoint again.

Eventually I managed to track this down to a problem with NAT, and specifically the way connection tracking is handled by the Linux kernel (other kernels may be affected too). To understand the problem one has to understand a little bit about the basics of connection tracking. I’ll try to keep it as simple as the particular use case here allows. It should also be understood that most of the information here is deduced.

For the purposes of this discussion I’ll pretend that 192.168.0.0/24 is our private range (i.e., not routable on the internet), that our ISP is using 172.16.0.0/16, and that our provider is in the 10.0.0.0/24 IP range. Our router is sitting on 192.168.0.1 (LAN IP) and has 172.16.0.1/32 as its public “PPP” IP. The internal VoIP server is at 192.168.0.2 and the upstream VoIP provider is sitting on 10.0.0.10 (we don’t care about its subnet and range at all).

When we register to 10.0.0.10 we send a UDP packet (usually from port 5060, but this really doesn’t matter; I’ll use 5555 just to be explicit about what’s where) with the following addressing:

protocol: udp
src ip: 192.168.0.2
dst ip: 10.0.0.10
src port: 5555
dst port: 5060

Since the default gateway is 192.168.0.1 we pass this packet to the default gateway, and it uses its routing table to determine that the packet should be sent out on the ppp link. At this point we all know “the source IP has to be changed”, so the router replaces it:

protocol: udp
src ip: 172.16.0.1
dst ip: 10.0.0.10
src port: 5555
dst port: 5060

In the Linux kernel this process is a little bit more involved (e.g., the source port may be changed too). The kernel needs to remember this mapping, since when a reply packet returns it needs to know which internal IP the packet should be forwarded to. So the kernel creates an entry that contains (amongst other information):

protocol: udp
src ip: 192.168.0.2
dst ip: 10.0.0.10
src port: 5555
dst port: 5060
reply src ip: 10.0.0.10
reply dst ip: 172.16.0.1
reply src port: 5060
reply dst port: 5555

As can be seen, the information for both directions of the flow is stored in such a way that we can look up packets in either direction and immediately know what the rewritten information needs to look like.
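To make the two halves of the entry concrete, here is a minimal Python sketch of how the reply tuple can be derived from the original one. The names `FlowTuple`, `ConntrackEntry`, `make_snat_entry`, `rewrite_outbound` and `rewrite_inbound` are made up for illustration and are not kernel structures; the real entry holds far more state, and the sketch ignores source port rewriting.

```python
from dataclasses import dataclass
from typing import NamedTuple


class FlowTuple(NamedTuple):
    proto: str
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


@dataclass
class ConntrackEntry:
    original: FlowTuple  # the packet as the LAN host sent it
    reply: FlowTuple     # the packet we expect back from the outside


def make_snat_entry(original: FlowTuple, public_ip: str) -> ConntrackEntry:
    """Reply tuple = original tuple reversed, with the LAN source IP
    replaced by the router's public IP (port rewriting is ignored)."""
    reply = FlowTuple(original.proto, original.dst_ip, public_ip,
                      original.dst_port, original.src_port)
    return ConntrackEntry(original, reply)


def rewrite_outbound(entry: ConntrackEntry) -> FlowTuple:
    """What the outgoing packet looks like on the wire: the reply reversed."""
    r = entry.reply
    return FlowTuple(r.proto, r.dst_ip, r.src_ip, r.dst_port, r.src_port)


def rewrite_inbound(entry: ConntrackEntry) -> FlowTuple:
    """What a returning packet looks like on the LAN: the original reversed."""
    o = entry.original
    return FlowTuple(o.proto, o.dst_ip, o.src_ip, o.dst_port, o.src_port)


registration = FlowTuple("udp", "192.168.0.2", "10.0.0.10", 5555, 5060)
entry = make_snat_entry(registration, "172.16.0.1")
print(entry.reply)
print(rewrite_outbound(entry))
```

The point of storing both tuples is that neither direction needs to recompute anything: each rewrite is just the other tuple read backwards.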

In the Linux kernel, netfilter consults the NAT table only the first time a connection is seen (i.e., when no entry in the connection tracking cache matches). So when we receive a packet, we look it up in the connection tracking table. If we find a hit, we rewrite the information and move along; otherwise we consult the nat table to find out what the rewrites should look like.
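The lookup order described above can be sketched as a cache in front of the NAT rules. This is a toy model built from my reading of the behaviour, not kernel code; `ToyConntrack`, `nat_lookup` and `nat_consults` are hypothetical names.

```python
from typing import Callable, Dict, Tuple

# (protocol, src ip, dst ip, src port, dst port)
Flow = Tuple[str, str, str, int, int]


class ToyConntrack:
    def __init__(self, nat_lookup: Callable[[Flow], str]) -> None:
        self.nat_lookup = nat_lookup        # stands in for the nat table rules
        self.table: Dict[Flow, Flow] = {}   # original packet -> rewritten packet
        self.nat_consults = 0               # how often the nat rules were asked

    def handle_outbound(self, pkt: Flow) -> Flow:
        hit = self.table.get(pkt)
        if hit is not None:
            # Known connection: reuse the stored rewrite, never ask the rules.
            return hit
        # First packet of a connection: ask the nat rules once, then cache.
        self.nat_consults += 1
        proto, _src, dst, sport, dport = pkt
        rewritten = (proto, self.nat_lookup(pkt), dst, sport, dport)
        self.table[pkt] = rewritten
        return rewritten


ct = ToyConntrack(nat_lookup=lambda pkt: "172.16.0.1")
registration = ("udp", "192.168.0.2", "10.0.0.10", 5555, 5060)
first = ct.handle_outbound(registration)
second = ct.handle_outbound(registration)
print(first, ct.nat_consults)
```

Whatever was decided for the first packet is what every later packet of that connection gets, which is exactly what makes a bad first decision so sticky.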

The core of the problem is that this entry gets created whether or not we have a route to pass the packet through. So if the internet is down and the router sees the original packet, the packet can’t be routed and the replacement source IP can’t be determined, and the entry ends up looking like this:

protocol: udp
src ip: 192.168.0.2
dst ip: 10.0.0.10
src port: 5555
dst port: 5060
reply src ip: 10.0.0.10
reply dst ip: 192.168.0.2 <--
reply src port: 5060
reply dst port: 5555

Now the Internet comes up, but the faulty tracking entry remains, and we are stuck waiting for that entry to time out before VoIP will work again. Since the endpoint keeps retrying from the same source port, each retry matches the faulty entry again, which is presumably why the endpoint has to go quiet before the entry can expire. A connection going down doesn't matter in itself, because the kernel clears entries from the table that use the IP of the removed link as reply-dst-ip. If the link is restored quickly enough that no SIP packet is seen whilst the link is down, everything is OK - even if we get a different IP from our ISP.

It is for this reason that we recommend using SIP over TCP if there is NAT involved. This forces the connection to break, and the source port will (hopefully) be different for every connection attempt, so the faulty connection tracking entry is not hit. For RTP this is not a major issue, since if the internet connection goes down you're dropping the call in all likelihood anyway (unless you happen to come back up quickly enough with the same public IP again).

So the sequence of events for this to happen:

1. Internet starts off down.
2. Device behind NAT sends packet to internet.
3. Faulty conntrack entry gets created.
4. Internet comes up.
5. NAT netfilter table doesn't get queried for new NAT information.

Point 5 results in the packets being routed out onto the provider link with the internal LAN IP as source, which not only exposes the internal LAN range but also prevents the world from being able to route traffic back.
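The five steps above can be replayed in a small simulation. This is a toy model of what I deduced, not how the kernel is implemented; `ToyRouter`, `link_up` and `send` are names invented for the example, and `send` returns the packet as it would leave the router, whether or not a link is actually there to carry it.

```python
from typing import Dict, Optional, Tuple

# (protocol, src ip, dst ip, src port, dst port)
Flow = Tuple[str, str, str, int, int]


class ToyRouter:
    def __init__(self) -> None:
        self.public_ip: Optional[str] = None   # None means the uplink is down
        self.conntrack: Dict[Flow, str] = {}   # original packet -> source IP to use

    def link_up(self, public_ip: str) -> None:
        # Note: nothing here touches existing conntrack entries.
        self.public_ip = public_ip

    def send(self, pkt: Flow) -> Flow:
        proto, src, dst, sport, dport = pkt
        if pkt not in self.conntrack:
            # First packet of the flow: decide the rewrite now. With the
            # uplink down there is no public IP, so the source stays as-is.
            self.conntrack[pkt] = self.public_ip or src
        # Every later packet reuses the stored decision.
        return (proto, self.conntrack[pkt], dst, sport, dport)


router = ToyRouter()                                        # 1. internet down
reg = ("udp", "192.168.0.2", "10.0.0.10", 5555, 5060)
while_down = router.send(reg)                               # 2 + 3. faulty entry
router.link_up("172.16.0.1")                                # 4. internet up
after_up = router.send(reg)                                 # 5. stale entry reused
print(after_up)
```

In this model the retry after the link comes up still leaves with 192.168.0.2 as its source, matching what I saw on the provider link.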

I've wondered about possible resolutions, and there are a number of things to consider. The use-cases get significantly more complex very quickly, especially when multiple internet uplinks are considered. For the simple case above there are essentially two possible fixes, both of which are building blocks for solving the more complex cases:

1. Flush the connection tracking cache when the Internet comes up. This is the simplest fix. It is also the bulldozer approach, and needs to be refined to remove only the incorrect entries in more complex scenarios. For the typical SOHO setup, this is the perfect solution.
2. Don't create a connection tracking entry if the packet can't be routed.

For the latter we really need to modify the way the kernel works - which is beyond the scope of my development skills. It does, however, solve (with the correct iptables and routing rules) the more complex use-cases.
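For the first fix, a rough sketch of what a link-up hook could run, assuming the `conntrack` tool from conntrack-tools is installed on the router and the hook runs as root. The helper names `conntrack_cleanup_argv` and `run_cleanup` are made up for this example, and how the hook gets triggered (a ppp ip-up script, for instance) depends on the distribution. Without arguments it builds the bulldozer `conntrack -F`; given a LAN source it builds a narrower delete for that host's UDP flows to port 5060.

```python
import subprocess
from typing import List, Optional


def conntrack_cleanup_argv(lan_src: Optional[str] = None,
                           proto: str = "udp",
                           dport: int = 5060) -> List[str]:
    """Build the conntrack command line to run when the uplink comes up."""
    if lan_src is None:
        # Bulldozer approach: flush the whole connection tracking table.
        return ["conntrack", "-F"]
    # Narrower: delete only flows originating from one LAN host to one port.
    return ["conntrack", "-D", "-p", proto,
            "--orig-src", lan_src, "--dport", str(dport)]


def run_cleanup(argv: List[str], dry_run: bool = True) -> Optional[int]:
    """Run the command unless dry_run is set; returns the exit code."""
    if dry_run:
        print(" ".join(argv))
        return None
    # check=False: as far as I know, a delete that matches no entries
    # exits non-zero, which is not an error for our purposes.
    return subprocess.run(argv, check=False).returncode


flush_all = conntrack_cleanup_argv()
flush_sip = conntrack_cleanup_argv("192.168.0.2")
run_cleanup(flush_sip)  # dry run: only prints the command
```

The dry-run default is deliberate, since flushing the table on a busy router drops state for every tracked connection, not just the broken one.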

I usually get asked why this only seems to affect VoIP, and it turns out the answer is deceptively simple. The vast majority of other protocols use "random" source ports, even when using UDP. This simply doesn't hold for VoIP. A number of games are likely to be affected too, since they tend to use "well known" ports (similar to VoIP), but in that use-case the user probably needs to restart the game when the internet goes down anyway, and by that time the connections have timed out on the router.
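A tiny illustration of why the source port matters: a retry only hits the faulty entry if all five fields match it. The port range 32768 to 60999 is an assumption (the usual Linux default for ephemeral ports), and `retries_hitting_stale_entry` is a name made up for this sketch.

```python
import random
from typing import Iterable, Set, Tuple

# (protocol, src ip, dst ip, src port, dst port)
Flow = Tuple[str, str, str, int, int]

# The faulty entry created while the internet was down.
STALE: Set[Flow] = {("udp", "192.168.0.2", "10.0.0.10", 5555, 5060)}


def retries_hitting_stale_entry(source_ports: Iterable[int]) -> int:
    """Count retries whose 5-tuple matches the faulty conntrack entry."""
    return sum(("udp", "192.168.0.2", "10.0.0.10", port, 5060) in STALE
               for port in source_ports)


# A SIP endpoint retrying from its fixed source port.
fixed_ports = [5555] * 5

# A client that picks a fresh "random" source port for every attempt.
rng = random.Random(0)
ephemeral_ports = [rng.randint(32768, 60999) for _ in range(5)]

print(retries_hitting_stale_entry(fixed_ports))
print(retries_hitting_stale_entry(ephemeral_ports))
```

Every fixed-port retry lands on the faulty entry, while a new source port is a new connection as far as connection tracking is concerned and gets a fresh NAT decision.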
