Right, so Kevin (one of my staff) had the savvy to take a few tcpdump traces on both the client and the server side of a failed PPTP VPN connection over the weekend. The result? It seems the great firewall of Vodacom has claimed yet another victim.
I’m not sure whether this is a result of too little testing, total ignorance or just incompetence. Either way, it looks like a bit of a race condition, and it hits something similar to what we in the office refer to as the “connection tracking bit bucket”. Basically, it seems that most connection tracking implementations (when combined with a stateful firewall such as the one used by Vodacom, as per their own admission in their last letter to me) result in certain flows being prematurely marked as “invalid”. In the example that Kevin captured for me, the server ends up being the first entity to send a GRE packet. This then gets (or got, seeing that it’s fixed again) intercepted by the firewall, perceived as an inbound connection to the client, and the uni-directional flow gets marked as invalid. When the client now sends GRE traffic to the server this gets allowed, but the return traffic still hits the “invalid” mark. I can only speculate as to the exact state of things, seeing that Vodacom doesn’t reveal exactly what software they are using (probably proprietary anyway), which makes this difficult. I will attempt to speculate as objectively as possible (not always easy).
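To make the speculated failure mode concrete, here is a minimal Python sketch of a naive stateful tracker. It is a toy model under my own assumptions, not Vodacom's actual firewall logic: the class `NaiveGreTracker`, the `replay` helper and the `client`/`server` labels are all hypothetical. The only rule it encodes is the one described above: an inbound GRE packet with no existing state gets its one-way flow marked invalid, and that mark sticks even after the client starts sending.

```python
from dataclasses import dataclass, field

CLIENT, SERVER = "client", "server"  # hypothetical labels for the two end points


@dataclass
class NaiveGreTracker:
    """Toy model of the speculated behaviour, tracking one-way (src, dst) flows."""
    allowed: set = field(default_factory=set)
    invalid: set = field(default_factory=set)

    def forward(self, src: str, dst: str) -> bool:
        """Return True if the packet is forwarded, False if it is dropped."""
        flow = (src, dst)
        if flow in self.invalid:
            # A stale "invalid" mark always wins, even over later state.
            return False
        if flow in self.allowed:
            return True
        if src == CLIENT:
            # Outbound packet creates state for both directions.
            self.allowed.update({flow, (dst, src)})
            return True
        # Inbound packet with no state: treated as an unsolicited
        # connection attempt and marked invalid.
        self.invalid.add(flow)
        return False


def replay(senders):
    """Feed a sequence of GRE senders through a fresh tracker."""
    tracker = NaiveGreTracker()
    results = []
    for src in senders:
        dst = SERVER if src == CLIENT else CLIENT
        results.append(tracker.forward(src, dst))
    return results


if __name__ == "__main__":
    # Server wins the race: its GRE packets never get through.
    print(replay([SERVER, CLIENT, SERVER]))  # [False, True, False]
    # Client sends first: everything flows.
    print(replay([CLIENT, SERVER, CLIENT]))  # [True, True, True]
```

The first replay mirrors the capture: the client's GRE packets reach the server, but nothing the server sends back makes it through.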
Seeing that there are two entities involved in this dump, and I want to do a side-by-side comparison, some ASCII art is in order. Essentially three columns are used: the sending agent indicates what is being sent, and if it was received by the destination I’ll mark that column with an ACK. I’ll also add (R) to retransmits on the sending side. The ISN mods still apply to the TCP connections; however, the data itself isn’t being tampered with in this case. Note that in the GRE traffic case there are still ACK packets being sent by the server; these ACK packets, however, get lost (as indicated in the packet sequence).
Client                        Direction   Server
SYN                           ->          ACK
ACK                           <-          SYN
PPTP (Start Req)              ->          ACK
ACK                           <-          PPTP (Start Resp)
                              <-          GRE (PPP-LCP Conf Req)
GRE (PPP-LCP Conf Req) (R)    ->          ACK (goes lost)
GRE (PPP-LCP Conf Req) (R)    ->          ACK (goes lost)
... a few more of these ...
GRE (PPP-LCP Conf Req) (R)    ->          ACK (goes lost)
PPTP (Call Clear Req)         ->          ACK
Once the Call Clear Req is received the TCP/IP teardown happens, surprisingly without the flurry of injected RST packets I’ve grown accustomed to, just a single out-of-order delivery between one ACK and a FIN/ACK packet.
What I would want (not sure what resolution they picked) is for them to either perform a routine inspection of the PPTP control traffic (specifically the Start Request and Start Reply packets) to determine the GRE traffic parameters (based on what I can see, just mark the fact that GRE is to be expected between the two given end points) and allow that traffic, or stop this firewalling nonsense. It’s only the cellular “ISPs” that perform actions such as these. The arguments for providing this protection are sound, but then it needs to be done sanely. For the most part I’ll have to admit that the firewall works and doesn’t cause too many problems.
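As a rough sketch of the first option, the following Python watches PPTP control messages (TCP port 1723) and simply records that GRE is to be expected between two end points once a Start Request has been answered by a Start Reply. The 12-byte header layout and the magic cookie 0x1A2B3C4D come from RFC 2637; the `GreExpectations` class, its method names and the example addresses are hypothetical, and this is an illustration of the idea rather than how any real firewall does it. A more careful helper would also follow the Outgoing-Call-Request/Reply exchange to learn the call IDs.

```python
import struct

PPTP_MAGIC = 0x1A2B3C4D          # magic cookie from RFC 2637
PPTP_CONTROL_MESSAGE = 1         # PPTP message type: control message
START_REQUEST, START_REPLY = 1, 2


def control_message_type(payload: bytes):
    """Return the PPTP control message type, or None if this is not one."""
    if len(payload) < 12:
        return None
    _length, msg_type, magic, ctrl_type, _reserved = struct.unpack(
        "!HHIHH", payload[:12]
    )
    if magic != PPTP_MAGIC or msg_type != PPTP_CONTROL_MESSAGE:
        return None
    return ctrl_type


class GreExpectations:
    """Hypothetical helper: remembers which end-point pairs may exchange GRE."""

    def __init__(self):
        self._pending = set()    # pairs that have sent a Start Request
        self.expected = set()    # pairs for which GRE should be allowed

    def observe(self, src: str, dst: str, payload: bytes) -> None:
        """Inspect one TCP payload from the PPTP control connection."""
        ctrl = control_message_type(payload)
        pair = frozenset((src, dst))
        if ctrl == START_REQUEST:
            self._pending.add(pair)
        elif ctrl == START_REPLY and pair in self._pending:
            self._pending.discard(pair)
            self.expected.add(pair)

    def gre_allowed(self, src: str, dst: str) -> bool:
        """GRE is allowed in either direction once the pair is expected."""
        return frozenset((src, dst)) in self.expected
```

With something like this in place it would no longer matter which side sends the first GRE packet, since the expectation is created from the control connection and covers both directions.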
Seeing that the problem has been resolved by Vodacom, I’ll let it rest, for now.
Are there any decent TCP testers? It would make debugging (mostly firewall-caused) TCP issues a lot easier… (Cisco firewalls also mess around with TCP flags, which breaks some applications… They also mess around with the ISN by default)
It should be able to test most TCP features (transfers, sequence numbers, urgent flag, common extensions) and give you an idea of what the network allows and what not…
Back to 3G: I wonder if their “internet connections” actually work properly for non-TCP/UDP-based protocols… (such as SCTP-based protocols…)
I always just use tcpdump. The problems of this kind that I run into are few and far between, and vary so much in nature that I believe it’ll be difficult to write a tool to reliably test for all possible things that can go wrong. In terms of probing you can probably use nmap to see what’s open and what’s not.
The more interesting part, as you suggest, is the question surrounding non-TCP and non-UDP. I know that GRE works in combination with PPTP (as per above, when they don’t break it). Other protocols may not work properly; this remains to be seen and I can’t really comment. But it brings me back to elementary networking … the OSI layer model. You have been given an IP, right? Which is routable? So why does the ISP feel it’s required to break the OSI layering by looking at anything higher than layer 3? Why do they feel it’s required to look at anything inside the IP packets when the IP header is all they actually need to look at in order to deliver traffic?
Hi Jaco,
I was surprised to find your blog entry when googling my symptoms! I’ve been experiencing the exact same problem again this week (2 years later), building VPN connections using Vodacom 3G. Interestingly, the problem only occurs if I get a 10.x.x.x IP, but everything works if I get a 41.x.x.x IP. I assume there might be 2 different firewalls. Anyway, did you speak to someone specific at Vodacom last time, log a general support call, or did the problem go away “by itself” :) Any tips would be appreciated.
Regards
Carl
Hi Carl.
Long time …
Yea, you’re right. The 10.x.x.x private IP ranges would require some form of NAT (obviously). So the firewalls would be different. If their firewall only tracks GRE at a protocol level instead of looking at the end-point descriptors in order to enforce a proper NAT mapping (like the PPTP NAT connection tracking module in the Linux kernel does), their firewall would indeed break PPTP when private IPs are assigned to clients. The public IPs would in that scenario be OK, provided that the station behind 3G transmits the first GRE packet, not the VPN concentrator.
Hope that makes sense. If you can, rather try using L2TP/IPSec, or even OpenVPN. L2TP/IPSec would detect the NAT layer and switch to encapsulating the IPSec packets in UDP packets, which are handled properly by the Great Vodacom Failwall.
Long time indeed. Thanks for the feedback. I agree, the 10.x.x.x network is private, implying NAT. I’m not really familiar with GRE, but if it’s on layer 4, then there is no TCP, so no ports involved. Then port translation would break it. But pure NAT should still work. It’s frustrating when something like this stops working after more than a year without a hitch. I’ll see what I can do here.
A solution: https://www.vodacom.co.za/personal/internet/broadbandonthemove/unrestrictedapn
The unrestricted APN isn’t a solution. It’s a kludge. The reality is that the unrestricted APN _forces_ the allocation of a publicly routable IP, with the firewall entirely disabled, so you can receive incoming connections too. This just outright works around the lack of proper connection tracking on their NAT platform.
GRE isn’t TCP/UDP, so there are no ports to do port translation with. Masquerading PPTP GRE connections isn’t part of the regular linux kernel: http://tldp.org/HOWTO/VPN-Masquerade-HOWTO-2.html (see the last section, 2.11). Also http://tldp.org/HOWTO/VPN-Masquerade-HOWTO-3.html
Not sure how up to date this information is though. I’m setting up openvpn anyway, it should get through anything.
Hi,
Yea, GRE sits next to TCP/UDP; however, there is a conntrack module that can track it based on the call ID numbers that are present in the protocol headers (similar to port numbers in TCP/UDP). From a quick look, the referenced pages are still based on a 2.0.x/2.2.x kernel, which still used ipchains. I’ve been using CONFIG_NF_CONNTRACK_PPTP and CONFIG_NF_NAT_PPTP in recent kernels to perform the required tracking.
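To illustrate what such tracking can key on, here is a small Python sketch that pulls the call ID out of the enhanced GRE header PPTP uses. The field layout (protocol type 0x880B, version 1, key field split into payload length and call ID, optional sequence and acknowledgment numbers) follows RFC 2637; the function name `parse_pptp_gre` is hypothetical, and this is not the kernel module's code. The input is assumed to start at the GRE header, i.e. after the IP header has been stripped.

```python
import struct

PPTP_GRE_PROTO = 0x880B   # protocol type used by PPTP's enhanced GRE


def parse_pptp_gre(packet: bytes):
    """Parse an enhanced GRE header; return a dict, or None if not PPTP GRE."""
    if len(packet) < 8:
        return None
    flags0, flags1, proto, payload_len, call_id = struct.unpack(
        "!BBHHH", packet[:8]
    )
    version = flags1 & 0x07
    key_present = bool(flags0 & 0x20)      # K bit
    if proto != PPTP_GRE_PROTO or version != 1 or not key_present:
        return None

    offset = 8
    seq = ack = None
    if flags0 & 0x10:                      # S bit: sequence number present
        if len(packet) < offset + 4:
            return None
        (seq,) = struct.unpack_from("!I", packet, offset)
        offset += 4
    if flags1 & 0x80:                      # A bit: acknowledgment present
        if len(packet) < offset + 4:
            return None
        (ack,) = struct.unpack_from("!I", packet, offset)
        offset += 4

    return {
        "call_id": call_id,
        "payload_len": payload_len,
        "seq": seq,
        "ack": ack,
        "header_len": offset,
    }
```

A NAT box could then use the pair of addresses plus the call ID as its mapping key, much as it would use ports for TCP or UDP.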
OpenVPN will definitely sort out your problem though, or alternatively L2TP/IPSec (easier integration for Windows as it has a built-in client, but, depending on the server you’re using, harder to set up), which has NAT detection and NAT traversal (basically it switches to ESP/AH inside of UDP instead of ESP/AH directly on IP when it detects NAT, which is pretty cool).
No real new information, but someone might find it useful.
OK, so if Vodacom were to use a linux system with a recent kernel for the firewall/masquerading, they’d be able to track the GRE connections properly and I wouldn’t be having this problem. I’ve seen the connection numbers in the GRE headers with tcpdump, so that makes sense. Anyway, I realise this is a different issue from what you originally reported on in the blog.
Interesting that openwrt has this: http://wiki.openwrt.org/doc/howto/vpn.nat.pptp
Thanks for the info on L2TP/IPSec, sounds interesting.
We now have OpenVPN set up and working nicely. Really simple to set up, and it works from windows without much hassle. Most of the intended users run some form of linux though 🙂
Hi guys, I’ve just bought a vpnbook package and am trying to connect with my mobile broadband, using Vodacom as my network. When I connect it gives me an error, “fatal tls error”. Can somebody help me please?
It could potentially be related, if it’s sporadic. If you get it on every single https:// connection then it’s a different problem. Usually your browser will give additional information beyond just “fatal tls error”. Please make 100% sure of the following:
1. Your system time is accurate (at least to within a few minutes)
2. Your browser (or the application giving the error) is up to date.
Failing that please contact Vodacom support. They blew me off because I’m not using Windows but perhaps you’ll have more luck.