<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>jkroon &#187; Networking</title>
	<atom:link href="http://jkroon.blogs.uls.co.za/category/it/networking/feed" rel="self" type="application/rss+xml" />
	<link>http://jkroon.blogs.uls.co.za</link>
	<description>Ultimate Linux Solutions</description>
	<lastBuildDate>Wed, 25 Aug 2010 21:57:29 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Cell C following in the footsteps of Vodacom?</title>
		<link>http://jkroon.blogs.uls.co.za/it/security/cell-c-following-in-the-footsteps-of-vodacom</link>
		<comments>http://jkroon.blogs.uls.co.za/it/security/cell-c-following-in-the-footsteps-of-vodacom#comments</comments>
		<pubDate>Sun, 04 Jul 2010 18:52:48 +0000</pubDate>
		<dc:creator>Jaco Kroon</dc:creator>
				<category><![CDATA[Networking]]></category>
		<category><![CDATA[Security]]></category>

		<guid isPermaLink="false">http://jkroon.blogs.uls.co.za/?p=296</guid>
		<description><![CDATA[Most people that know me well will know that I really don&#8217;t like the way Vodacom runs their firewalls for their 3G consumers.  In fact, they&#8217;ve managed to make it onto my blog no less than 3 times now &#8211; and not once for anything they&#8217;ve done right.  And now Cell C have [...]]]></description>
			<content:encoded><![CDATA[<p>Most people that know me well will know that I really don&#8217;t like the way Vodacom runs their firewalls for their 3G consumers.  In fact, they&#8217;ve managed to make it onto my blog no less than 3 times now &#8211; and not once for anything they&#8217;ve done right.  And now Cell C have decided to join the crowd of braindead arseholes who can&#8217;t run firewalls.  I present to you the man-in-the-middle TCP connection reset.  As it stands right now I can&#8217;t ssh.  I can&#8217;t connect to my jabber server.  I can&#8217;t even browse.  At least, not using my Cell C internet connection.<br />
<span id="more-296"></span><br />
<b>UPDATE</b>:  Please note that Cell C has already contacted me regarding this.  See comment #1 below for more details.</p>
<p>Unfortunately it&#8217;s insanely hard to prove conclusively where the TCP resets are coming from.  Again, the only evidence I&#8217;ve got that it has to be Cell C is the fact that everything works flawlessly from everywhere else (SAIX ADSL, Mweb ADSL and Vodacom 3G).  The first thing I started noticing yesterday was ssh connections going something along these lines (serenity is my local machine, linux.delter.co.za is a relatively big mail server belonging to one of my clients):</p>

<div class="wp_syntax"><div class="code"><pre class="txt" style="font-family:monospace;">jkroon@serenity ~ $ ssh root@linux.delter.co.za 
ssh_exchange_identification: read: Connection reset by peer</pre></div></div>

<p>Now my employees know, if I can&#8217;t ssh and it&#8217;s your fault, you&#8217;re going to get it.  Firstly I will hunt you down, then I will do things which cannot be considered polite, and if your name is bigger than mine and my client believes that this implies you&#8217;re right and I&#8217;m wrong, I will ensure that I prove them wrong and make very sure that they understand that I don&#8217;t take these things lightly.  Well, not when it affects my work anyway.  I do understand if things break periodically, but at the moment I can&#8217;t even browse and in excess of 95 % of the connections I&#8217;m pushing out over my Cell C SIM are outright being reset.</p>
<p>So after seeing the above for approximately 3 out of 5 connections this morning whilst sitting in a data center in Johannesburg I just ran a tcpdump in a different shell on serenity to see what happens:</p>

<div class="wp_syntax"><div class="code"><pre class="txt" style="font-family:monospace;">08:53:56.835503 IP 196.35.70.139.ssh &gt; 10.213.51.133.47019:
    Flags [R.], seq 2902460094, ack 3806794395, win 199,
    options [nop,nop,TS val 26312228 ecr 4876698], length 0</pre></div></div>

<p>Now I KNOW the way I set up my servers.  And if my server is in fact generating that RST then there is something severely broken.  But I&#8217;m getting this from different servers.  So, having had one of the craziest weekends in a while, I decided to push this to the back of my mind and concentrate on more urgent matters.  It was only about 30 minutes ago, when I wanted to quickly check mail, browse a bit and just unwind a little, and found that I couldn&#8217;t actually browse, ssh to my servers for a quick checkup after the weekend&#8217;s events, or write an official complaint to a certain hosting company, that I decided enough is enough.  I got a jump box and ssh&#8217;ed via another route to linux.delter.co.za and (no surprises) it worked flawlessly.  Fire up pppd and add a route for linux.delter.co.za over that, fire up tcpdump on both ends and I get this, first on serenity (sorry for the horizontal scrolling, and also note that the time on my laptop is out by ~30 minutes due to ntp failing and the CMOS on this Lenovo being of the ultra crappy kind):</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
</pre></td><td class="code"><pre class="txt" style="font-family:monospace;">18:46:14.706636 IP 10.212.100.200.46247 &gt; 196.35.70.139.ssh: Flags [S], seq 8197450, win 5840, options [mss 1460,sackOK,TS val 187563 ecr 0,nop,wscale 7], length 0
18:46:15.326350 IP 196.35.70.139.ssh &gt; 10.212.100.200.46247: Flags [S.], seq 1497501202, ack 8197451, win 5792, options [mss 1460,sackOK,TS val 2837150 ecr 187563,nop,wscale 6], length 0
18:46:15.326445 IP 10.212.100.200.46247 &gt; 196.35.70.139.ssh: Flags [.], ack 1, win 46, options [nop,nop,TS val 187626 ecr 2837150], length 0
18:46:15.659155 IP 196.35.70.139.ssh &gt; 10.212.100.200.46247: Flags [P.], seq 1:21, ack 1, win 91, options [nop,nop,TS val 2837190 ecr 187626], length 20
18:46:15.659267 IP 10.212.100.200.46247 &gt; 196.35.70.139.ssh: Flags [.], ack 21, win 46, options [nop,nop,TS val 187659 ecr 2837190], length 0
18:46:15.659458 IP 10.212.100.200.46247 &gt; 196.35.70.139.ssh: Flags [P.], seq 1:22, ack 21, win 46, options [nop,nop,TS val 187659 ecr 2837190], length 21
18:46:15.969153 IP 196.35.70.139.ssh &gt; 10.212.100.200.46247: Flags [.], ack 22, win 91, options [nop,nop,TS val 2837221 ecr 187659], length 0
18:46:15.969221 IP 10.212.100.200.46247 &gt; 196.35.70.139.ssh: Flags [P.], seq 22:814, ack 21, win 46, options [nop,nop,TS val 187690 ecr 2837221], length 792
18:46:16.349157 IP 196.35.70.139.ssh &gt; 10.212.100.200.46247: Flags [P.], seq 21:805, ack 22, win 91, options [nop,nop,TS val 2837221 ecr 187659], length 784
18:46:16.386434 IP 10.212.100.200.46247 &gt; 196.35.70.139.ssh: Flags [.], ack 805, win 58, options [nop,nop,TS val 187732 ecr 2837221], length 0
18:46:16.599149 IP 196.35.70.139.ssh &gt; 10.212.100.200.46247: Flags [.], ack 814, win 116, options [nop,nop,TS val 2837284 ecr 187690], length 0
18:46:16.599227 IP 10.212.100.200.46247 &gt; 196.35.70.139.ssh: Flags [P.], seq 814:838, ack 805, win 58, options [nop,nop,TS val 187753 ecr 2837284], length 24
18:46:16.926374 IP 196.35.70.139.ssh &gt; 10.212.100.200.46247: Flags [.], ack 838, win 116, options [nop,nop,TS val 2837315 ecr 187753], length 0
18:46:16.986396 IP 196.35.70.139.ssh &gt; 10.212.100.200.46247: Flags [P.], seq 805:957, ack 838, win 116, options [nop,nop,TS val 2837315 ecr 187753], length 152
18:46:16.986458 IP 10.212.100.200.46247 &gt; 196.35.70.139.ssh: Flags [.], ack 957, win 71, options [nop,nop,TS val 187792 ecr 2837315], length 0
18:46:16.988362 IP 10.212.100.200.46247 &gt; 196.35.70.139.ssh: Flags [P.], seq 838:982, ack 957, win 71, options [nop,nop,TS val 187792 ecr 2837315], length 144
18:46:17.477898 IP 196.35.70.139.ssh &gt; 10.212.100.200.46247: Flags [.], ack 982, win 140, options [nop,nop,TS val 2837372 ecr 187792], length 0
18:46:17.798919 IP 196.35.70.139.ssh &gt; 10.212.100.200.46247: Flags [P.], seq 957:1677, ack 982, win 140, options [nop,nop,TS val 2837372 ecr 187792], length 720
18:46:17.801802 IP 10.212.100.200.46247 &gt; 196.35.70.139.ssh: Flags [P.], seq 982:998, ack 1677, win 83, options [nop,nop,TS val 187873 ecr 2837372], length 16
18:46:18.048892 IP 196.35.70.139.ssh &gt; 10.212.100.200.46247: Flags [R.], seq 1677, ack 982, win 0, length 0
18:46:18.088924 IP 196.35.70.139.ssh &gt; 10.212.100.200.46247: Flags [R.], seq 1677, ack 998, win 0, length 0</pre></td></tr></table></div>

<p>And on linux.delter.co.za:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
</pre></td><td class="code"><pre class="txt" style="font-family:monospace;">20:17:52.448213 IP 41.157.80.24.57790 &gt; linux.delter.co.za.ssh: S 8197450:8197450(0) win 5840 &lt;mss 1460,sackOK,timestamp 187563[|tcp]&gt;
20:17:52.448247 IP linux.delter.co.za.ssh &gt; 41.157.80.24.57790: S 1497501202:1497501202(0) ack 8197451 win 5792 &lt;mss 1460,sackOK,timestamp 2837150[|tcp]&gt;
20:17:52.840000 IP 41.157.80.24.57790 &gt; linux.delter.co.za.ssh: . ack 1 win 46 &lt;nop,nop,timestamp 187626 2837150&gt;
20:17:52.846789 IP linux.delter.co.za.ssh &gt; 41.157.80.24.57790: P 1:21(20) ack 1 win 91 &lt;nop,nop,timestamp 2837190 187626&gt;
20:17:53.079622 IP 41.157.80.24.57790 &gt; linux.delter.co.za.ssh: . ack 21 win 46 &lt;nop,nop,timestamp 187659 2837190&gt;
20:17:53.161323 IP 41.157.80.24.57790 &gt; linux.delter.co.za.ssh: P 1:22(21) ack 21 win 46 &lt;nop,nop,timestamp 187659 2837190&gt;
20:17:53.161345 IP linux.delter.co.za.ssh &gt; 41.157.80.24.57790: . ack 22 win 91 &lt;nop,nop,timestamp 2837221 187659&gt;
20:17:53.162056 IP linux.delter.co.za.ssh &gt; 41.157.80.24.57790: P 21:805(784) ack 22 win 91 &lt;nop,nop,timestamp 2837221 187659&gt;
20:17:53.750503 IP 41.157.80.24.57790 &gt; linux.delter.co.za.ssh: P 22:814(792) ack 21 win 46 &lt;nop,nop,timestamp 187690 2837221&gt;
20:17:53.784763 IP linux.delter.co.za.ssh &gt; 41.157.80.24.57790: . ack 814 win 116 &lt;nop,nop,timestamp 2837284 187690&gt;
20:17:53.839959 IP 41.157.80.24.57790 &gt; linux.delter.co.za.ssh: . ack 805 win 58 &lt;nop,nop,timestamp 187732 2837221&gt;
20:17:54.100058 IP 41.157.80.24.57790 &gt; linux.delter.co.za.ssh: P 814:838(24) ack 805 win 58 &lt;nop,nop,timestamp 187753 2837284&gt;
20:17:54.100072 IP linux.delter.co.za.ssh &gt; 41.157.80.24.57790: . ack 838 win 116 &lt;nop,nop,timestamp 2837315 187753&gt;
20:17:54.102989 IP linux.delter.co.za.ssh &gt; 41.157.80.24.57790: P 805:957(152) ack 838 win 116 &lt;nop,nop,timestamp 2837315 187753&gt;
20:17:54.479858 IP 41.157.80.24.57790 &gt; linux.delter.co.za.ssh: . ack 957 win 71 &lt;nop,nop,timestamp 187792 2837315&gt;
20:17:54.630280 IP 41.157.80.24.57790 &gt; linux.delter.co.za.ssh: P 838:982(144) ack 957 win 71 &lt;nop,nop,timestamp 187792 2837315&gt;
20:17:54.664784 IP linux.delter.co.za.ssh &gt; 41.157.80.24.57790: . ack 982 win 140 &lt;nop,nop,timestamp 2837372 187792&gt;
20:17:54.673005 IP linux.delter.co.za.ssh &gt; 41.157.80.24.57790: P 957:1677(720) ack 982 win 140 &lt;nop,nop,timestamp 2837372 187792&gt;
20:17:55.236939 IP 41.157.80.24.57790 &gt; linux.delter.co.za.ssh: R 982:982(0) ack 1677 win 0</pre></td></tr></table></div>

<p>Upon initial inspection I have to say, I don&#8217;t see the tell-tale signs of TCP splicing as I did with Vodacom.  There don&#8217;t appear to be any sequence number adjustments.  There is some NAT going on which isn&#8217;t desirable (and Vodacom moved away from using NAT once their user base started getting beyond a certain point because &#8220;it didn&#8217;t scale&#8221; according to one of their lead technicians).</p>
<p>When I say I can&#8217;t find signs of tampering I really mean it.  Looking at the above you&#8217;ll see there are 19 packets on linux.delter.co.za and 21 on serenity.  The first 18 of both these traces ARE IDENTICAL (other than the NAT&#8217;ed IP and port, and the order in which each side saw a few of them).  After this 18th packet the server side receives an RST packet directly after it sent the data for 957:1677, along with a correct ACK for 1677.  The client side receives this data and, not surprisingly, doesn&#8217;t actually respond with an RST but instead with an ACK (carrying 16 more bytes of data).  Directly after sending this ACK it receives two RST packets (identical apart from their ACK numbers), which, again, have not been sent by the server.</p>
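<p>The reasoning above boils down to a set difference: anything one end received that the other end never sent must have come from somewhere in between.  A minimal Python sketch of that comparison follows.  The tuples are transcribed by hand from the tail of the two traces, and the helper name <code>injected</code> is purely illustrative, not part of any tool:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">def injected(sent_by_peer, seen_locally):
    """Packets seen on the local wire that the peer never sent."""
    return [pkt for pkt in seen_locally if pkt not in sent_by_peer]

# (flags, seq, ack) after the 18th packet, copied from the traces above.
client_sent = [("P", "982:998", 1677)]
server_received = [("R", "982:982", 1677)]

server_sent = [("P", "957:1677", 982)]
client_received = [
    ("P", "957:1677", 982),
    ("R", "1677", 982),
    ("R", "1677", 998),
]

# RST that reached the server although the client never sent one.
print(injected(client_sent, server_received))
# RSTs that reached the client although the server never sent any.
print(injected(server_sent, client_received))</pre></div></div>

<p>Both calls come back non-empty, which is exactly the problem: each side is being told by a third party that the other side hung up.</p>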
<p>So I ask this &#8211; who is generating these RST packets?  Who can I have beaten with a blunt object?  I want to unwind &#8211; it&#8217;s been a bad weekend with this little cherry on top.</p>
]]></content:encoded>
			<wfw:commentRss>http://jkroon.blogs.uls.co.za/it/security/cell-c-following-in-the-footsteps-of-vodacom/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Capped, Uncapped and Unmetered</title>
		<link>http://jkroon.blogs.uls.co.za/it/networking/capped-uncapped-and-unmetered</link>
		<comments>http://jkroon.blogs.uls.co.za/it/networking/capped-uncapped-and-unmetered#comments</comments>
		<pubDate>Sun, 04 Jul 2010 10:13:27 +0000</pubDate>
		<dc:creator>Jaco Kroon</dc:creator>
				<category><![CDATA[Networking]]></category>

		<guid isPermaLink="false">http://jkroon.blogs.uls.co.za/?p=292</guid>
		<description><![CDATA[Recently we&#8217;ve seen an explosion of uncapped accounts entering the market.  We&#8217;ve also seen that they are typically horribly slow in comparison to capped accounts &#8211; and if one goes and reads most of the acceptable use policies it becomes clear that they are in fact not uncapped, but rather, severely shaped capped accounts.

Shaping [...]]]></description>
			<content:encoded><![CDATA[<p>Recently we&#8217;ve seen an explosion of uncapped accounts entering the market.  We&#8217;ve also seen that they are typically horribly slow in comparison to capped accounts &#8211; and if one goes and reads most of the acceptable use policies it becomes clear that they are in fact not uncapped, but rather, severely shaped capped accounts.<br />
<span id="more-292"></span><br />
Shaping implies that certain traffic classes will get priority over others, so for example HTTP traffic and SMTP to their local services will be OK&#8217;ish and mostly everything else will be rather bad.  This, dear providers, is NOT broadband.  This is abuse of the word uncapped and you should clearly mark your products for what they are.  If you have a &#8220;soft cap&#8221; of 30GB &#8211; that is still a cap.  If you have a floating cap (like IS&#8217;s newly released business class uncapped accounts) &#8211; state it (thanks IS, at least you don&#8217;t try to mislead here; a 20 GB floating cap over 10 days with the top 20 % of users throttled sounds reasonable).</p>
<p>Now, the price tag on the IS uncapped account mentioned above is slightly in excess of R2000 ex VAT.  For my office I&#8217;ll need two of those accounts (we run through approximately 100GB worth of traffic each month and I&#8217;m NOT willing to sacrifice on quality of the service).  For those doing the math already, R4400 vs R5000 &#8230; yea.  If those were the complete facts I might consider switching and doing some careful load balancing over the two accounts to try and stay out of the top 20 % as well as to remain under the floating cap.  However, the price difference isn&#8217;t severe enough for me to honestly consider it, and I&#8217;ve got another trick up my sleeve:  split routing.</p>
<p>Approximately three years ago I figured out how to separate local and international bandwidth at the client premises into two separate accounts.   This allows me to utilize cheap local-only ADSL accounts for local bandwidth, and normal blended ADSL accounts for my international traffic.  Seeing that our split is about 60 % local and 40 % international, this means that my bandwidth cost ends up being around R2500 to R3000 per month, ex VAT.  And I get this at the same quality that you&#8217;ve come to expect from SAIX&#8217;s ADSL accounts.  No frills, no fuss, good international latencies (usually at around 250 to 300 ms) and excellent local latencies of as low as 10ms (compared to around 25 to 30 on IS accounts).</p>
<p>Even three years ago this was beneficial, and I thought that this concept was going to be killed when the uncapped accounts started entering the market &#8230; yet the opposite has become true &#8211; I&#8217;ve now got even more inquiries asking WHY uncapped is so bad, and what alternatives there are.  And this is only from a &#8220;consumer DSL&#8221; perspective (ie, SME and home market).<br />
When one starts looking at data centres the costs associated with bandwidth start looking even worse.  No more el-cheapo ADSL (yes, trust me when I tell you ADSL bandwidth is EXTREMELY cheap).  Now you have to start getting things like metro-ethernet.  You need to start buying transit.  If you&#8217;re hosting you&#8217;re most likely paying per GB over a certain threshold (eg, the first 3GB for your server is included in the monthly fee and after that you&#8217;re paying per GB).  If you&#8217;re the hosting environment you&#8217;re most likely buying bandwidth in per-megabit chunks.<br />
No matter in which of these arenas you&#8217;re playing there is NO SUCH THING as uncapped bandwidth.  Either you&#8217;re being limited by an artificial cap such as you can use 3GB at any rate you please (which is also a lie as the upstream BW has a limit in terms of bits per second) or you&#8217;re being limited by the bits per second.</p>
<p>What uncapped really means is unmetered.  In other words:  We will allow you to consume bandwidth at an average of X bits per second, and we won&#8217;t actually (for billing purposes) measure how many bytes you push over the link.  This means immediately that you pay for capacity instead of per byte.  It&#8217;s also possible to buy such unmetered solutions in an oversubscribed manner.  For example, you can buy &#8220;gold&#8221; or &#8220;silver&#8221; transit from SAIX: &#8220;gold&#8221; means that you will have a contention ratio of 1:1 (meaning you will always be able to use your full capacity), while with silver you get a contention ratio of 3:1 &#8211; which means that, provided the other people aren&#8217;t consuming bandwidth, you can burst up to your pipe size, but you&#8217;re only guaranteed a third of it.  Either way &#8211; 4Mbps of this is likely to make you understand why uncapped ADSL is a bad idea.</p>
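<p>To put numbers to the contention ratios and to the idea that an unmetered link is still limited by its rate, here is a small Python sketch.  The function names, the 30-day month and the use of decimal gigabytes are my own assumptions for illustration; nothing here reflects any provider&#8217;s actual product terms:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">SECONDS_PER_DAY = 24 * 60 * 60

def guaranteed_mbps(pipe_mbps, contention_ratio):
    """Worst-case share of a pipe sold at a contention ratio of N:1."""
    return pipe_mbps / contention_ratio

def max_gb_per_month(mbps, days=30):
    """Upper bound on transfer at a sustained rate, in decimal GB."""
    bits = mbps * 1_000_000 * SECONDS_PER_DAY * days
    return bits / 8 / 1_000_000_000

gold = guaranteed_mbps(4, 1)     # 1:1, the full 4 Mbps is always yours
silver = guaranteed_mbps(4, 3)   # 3:1, only a third of 4 Mbps is guaranteed

print(gold, silver)
print(max_gb_per_month(gold), max_gb_per_month(silver))</pre></div></div>

<p>Even with nobody counting bytes, the rate multiplied by the length of the month puts a hard ceiling on what can be moved, which is the sense in which nothing is truly uncapped.</p>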
]]></content:encoded>
			<wfw:commentRss>http://jkroon.blogs.uls.co.za/it/networking/capped-uncapped-and-unmetered/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>The Great Wall of Vodacom &#8211; FAIL</title>
		<link>http://jkroon.blogs.uls.co.za/it/security/the-great-wall-of-vodacom-fail</link>
		<comments>http://jkroon.blogs.uls.co.za/it/security/the-great-wall-of-vodacom-fail#comments</comments>
		<pubDate>Tue, 25 May 2010 09:35:05 +0000</pubDate>
		<dc:creator>Jaco Kroon</dc:creator>
				<category><![CDATA[Networking]]></category>
		<category><![CDATA[Security]]></category>

		<guid isPermaLink="false">http://jkroon.blogs.uls.co.za/?p=254</guid>
		<description><![CDATA[Right, so Kevin (one of my staff) had the savvy to take a few tcpdump traces on both the client and the server side of a failed PPTP VPN connection over the weekend.  The result?  It seems the great firewall of Vodacom has yet again taken another victim.
I&#8217;m not sure whether this is [...]]]></description>
			<content:encoded><![CDATA[<p>Right, so Kevin (one of my staff) had the savvy to take a few tcpdump traces on both the client and the server side of a failed PPTP VPN connection over the weekend.  The result?  It seems the great firewall of Vodacom has yet again taken another victim.<span id="more-254"></span></p>
<p>I&#8217;m not sure whether this is a result of too little testing, total ignorance or just incompetence.  Either way, it would seem it&#8217;s a bit of a race condition, and hits something similar to what we in the office refer to as the &#8220;connection tracking bit bucket&#8221;.  Basically it seems that most connection tracking implementations (when combined with a stateful firewall such as that used by Vodacom &#8211; as per their own admission in their <a href="/it/security/vodacom-responds">last letter</a> to me) result in certain flows being prematurely marked as &#8220;invalid&#8221;.  In particular, in the example that Kevin has captured for me the server ends up being the first entity to send a GRE packet.  This then gets (or got, seeing that it&#8217;s fixed again) intercepted by the firewall, perceived as an inbound connection to the client, and the uni-directional flow gets marked as invalid.  When the client now sends GRE traffic to the server this gets allowed, but the return traffic still bites the &#8220;invalid&#8221; mark.  I can only speculate as to the exact state of things (seeing that Vodacom doesn&#8217;t reveal exactly what software they are using &#8211; probably proprietary anyway), which makes this difficult.  I will attempt to speculate as objectively as possible (not always easy).</p>
<p>Seeing that there are two entities involved in this dump, and I want to do a side-by-side comparison, some ASCII art is in order.  Essentially three columns are used: the sending agent indicates what is being sent, and if it was received by the destination I&#8217;ll mark that column with an ACK.  I&#8217;ll also add (R) to retransmits on the sending side.  The ISN mods still apply to the TCP connections, however, the data itself isn&#8217;t being tampered with in this case.  Note that in the GRE traffic case there are still ACK packets being sent by the server; these ACK packets, however, get lost (as indicated in the packet sequence).</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">Client                Direction  Server
SYN                        -&gt;    ACK
ACK                        &lt;-    SYN
PPTP (Start Req)           -&gt;    ACK
ACK                        &lt;-    PPTP (Start Resp)
                           &lt;-    GRE (PPP-LCP Conf Req)
GRE (PPP-LCP Conf Req) (R) -&gt;    ACK (goes lost)
GRE (PPP-LCP Conf Req) (R) -&gt;    ACK (goes lost)
... a few more of these ...
GRE (PPP-LCP Conf Req) (R) -&gt;    ACK (goes lost)
PPTP (Call Clear Req)      -&gt;    ACK</pre></div></div>

<p>Once the Call Clear Req is received the TCP/IP teardown happens, surprisingly without the flurry of injected RST packets I&#8217;ve grown accustomed to, just a single out-of-order delivery between one ACK and FIN/ACK packet.</p>
<p>What I would want (not sure what resolution they picked) is for them to either perform a routine inspection of the PPTP control traffic (specifically the Start Request and Start Reply packets) to determine the GRE traffic parameters (based on what I can see, just mark the fact that GRE is to be expected between the two given end points) and allow that traffic, or, stop this firewalling nonsense.  It&#8217;s only the Cellular &#8220;ISPs&#8221; performing actions such as these.  The arguments for providing this protection are sound.  But then it needs to be done sanely.  For the most part I&#8217;ll have to admit that the firewall works and doesn&#8217;t cause too many problems.</p>
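<p>To make the suspected failure and the requested fix concrete, here is a toy model in Python.  It is purely a sketch of my speculation: the class <code>ToyFirewall</code>, its methods and the <code>pptp_helper</code> switch are all made up for illustration and say nothing about how Vodacom&#8217;s actual firewall is built:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">class ToyFirewall:
    """Toy stateful filter: the client is inside, the server outside."""

    def __init__(self, pptp_helper=False):
        self.pptp_helper = pptp_helper
        self.expected_gre = set()  # (client, server) pairs announced by PPTP
        self.gre_state = {}        # (client, server) mapped to valid/invalid

    def pptp_control_established(self, client, server):
        # With the helper on, a completed Start Request/Reply exchange
        # tells the firewall to expect GRE between these end points.
        if self.pptp_helper:
            self.expected_gre.add((client, server))

    def gre(self, src, dst, from_inside):
        """Return True if the GRE packet is forwarded, False if dropped."""
        pair = (src, dst) if from_inside else (dst, src)
        if pair in self.expected_gre:
            return True
        if pair not in self.gre_state:
            # The first packet seen fixes the state of the flow:
            # unsolicited inbound traffic is marked invalid.
            self.gre_state[pair] = "valid" if from_inside else "invalid"
        if from_inside:
            return True  # outbound traffic is always let through
        return self.gre_state[pair] == "valid"


def replay(firewall):
    """Replay the packet order from the capture: the server speaks first."""
    firewall.pptp_control_established("client", "server")
    return [
        firewall.gre("server", "client", from_inside=False),
        firewall.gre("client", "server", from_inside=True),
        firewall.gre("server", "client", from_inside=False),
    ]

print(replay(ToyFirewall()))
print(replay(ToyFirewall(pptp_helper=True)))</pre></div></div>

<p>Without the helper, the server&#8217;s first GRE packet poisons the flow and every later reply is dropped even though the client&#8217;s own packets get out.  With the helper, all three packets pass.</p>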
<p>Seeing that the problem has been resolved by Vodacom, I&#8217;ll let it rest, for now.</p>
]]></content:encoded>
			<wfw:commentRss>http://jkroon.blogs.uls.co.za/it/security/the-great-wall-of-vodacom-fail/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>FQDN &#8211; ala Fully Qualified Domain Name</title>
		<link>http://jkroon.blogs.uls.co.za/it/networking/fqdn-ala-fully-qualified-domain-name</link>
		<comments>http://jkroon.blogs.uls.co.za/it/networking/fqdn-ala-fully-qualified-domain-name#comments</comments>
		<pubDate>Tue, 09 Feb 2010 12:43:31 +0000</pubDate>
		<dc:creator>Jaco Kroon</dc:creator>
				<category><![CDATA[Networking]]></category>

		<guid isPermaLink="false">http://jkroon.blogs.uls.co.za/?p=229</guid>
		<description><![CDATA[Right, so I conceptually know what this is, but how is it determined?  What&#8217;s the impact?

As far as I can tell most applications couldn&#8217;t really care about the fqdn, exceptions would be email, and apache likes having one.  The FQDN is basically the full name that a remote machine will use to recognize [...]]]></description>
			<content:encoded><![CDATA[<p>Right, so I conceptually know what this is, but how is it determined?  What&#8217;s the impact?<br />
<span id="more-229"></span><br />
As far as I can tell most applications couldn&#8217;t really care about the FQDN; exceptions would be email, and apache likes having one.  The FQDN is basically the full name that a remote machine will use to recognize your machine.  Eg, my VoIP server sits at 196.35.70.140 but users use voip.uls.co.za to resolve it.  If you do a reverse resolution of the IP then the name is r2d2.uls.co.za &#8211; and it&#8217;s this part that is important for us: the FQDN for that machine is r2d2.uls.co.za, its hostname is r2d2 and its DNS domain name is uls.co.za.</p>
<p>So how does Linux determine this?  Dead simple actually.</p>
<ol>
<li>It does a forward lookup of its hostname (r2d2 in this case), including any domain and search domains configured in /etc/resolv.conf, if nsswitch actually reaches DNS lookups.
<li>It does a reverse lookup of the obtained IP and this result is used as the FQDN.
</ol>
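<p>The two steps above can be sketched in a few lines of Python using the standard <code>socket</code> module.  The function name <code>fqdn_by_lookup</code> and its injectable <code>forward</code>/<code>reverse</code> parameters are my own for illustration; this mirrors the procedure as described here, and the resolver path actually taken on a given system may differ:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">import socket

def fqdn_by_lookup(hostname=None,
                   forward=socket.gethostbyname,
                   reverse=socket.gethostbyaddr):
    """Forward-resolve the short hostname, then reverse-resolve the IP."""
    name = hostname or socket.gethostname()
    ip = forward(name)          # step 1: hostname to IP (hosts file, DNS, ...)
    canonical = reverse(ip)[0]  # step 2: IP back to its primary name
    return canonical

if __name__ == "__main__":
    try:
        print(fqdn_by_lookup())
    except OSError as err:
        # Either lookup can fail on a machine without working resolution.
        print("lookup failed:", err)</pre></div></div>

<p>Passing the resolvers in as parameters makes it easy to try the logic with canned answers (say, r2d2 resolving to 196.35.70.140 and back to r2d2.uls.co.za) without touching real DNS.</p>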
<p>Complex?  Not really, until you start looking at everything that needs to work in order for this to be correct.  A common thing done by many a sysadmin is to modify this line in /etc/hosts:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">127.0.0.1   localhost</pre></div></div>

<p>To:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">127.0.0.1   localhost hostname</pre></div></div>

<p>Well, this will result in hostname -f giving you &#8220;localhost&#8221; as the FQDN.  That&#8217;s broken.  The correct line above (for r2d2) would be:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">127.0.0.1 r2d2.uls.co.za r2d2 localhost localhost.localdomain</pre></div></div>

<p>That is overly complete and localhost.localdomain can probably be left out.  There is another way though, because let&#8217;s face it, 127.0.0.1 isn&#8217;t our public IP.  So here we go, with the &#8220;correct&#8221; way:</p>
<ul>
<li>Set the domain/search lines in /etc/resolv.conf correctly.  Basically I&#8217;d recommend leaving &#8220;search&#8221; alone unless you want to search multiple domains for a &#8220;short&#8221; name, and only set up &#8220;domain&#8221;.
<li>Make sure that there is a properly configured PTR record for your public IP.
<li>Make sure that hostname.${domain} resolves to your, and only your, public IP.
</ul>
<p>Eg, in my /etc/resolv.conf I&#8217;ve got &#8220;domain uls.co.za&#8221; along with my nameservers (ok, 127.0.0.1 which points to a djb dnscache).  Then my hostname is set to &#8220;r2d2&#8221;.  With this configuration when I resolve r2d2 I get an IP of 196.35.70.140, and when I reverse-lookup that IP I get r2d2.uls.co.za, which is my FQDN.  Forward looking up r2d2.uls.co.za again resolves to that IP, which is exactly what I want.</p>
]]></content:encoded>
			<wfw:commentRss>http://jkroon.blogs.uls.co.za/it/networking/fqdn-ala-fully-qualified-domain-name/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Linux, 3G, bluetooth and DUN</title>
		<link>http://jkroon.blogs.uls.co.za/it/networking/linux-3g-bluetooth-and-dun</link>
		<comments>http://jkroon.blogs.uls.co.za/it/networking/linux-3g-bluetooth-and-dun#comments</comments>
		<pubDate>Wed, 20 Jan 2010 21:37:43 +0000</pubDate>
		<dc:creator>Jaco Kroon</dc:creator>
				<category><![CDATA[Networking]]></category>

		<guid isPermaLink="false">http://jkroon.blogs.uls.co.za/?p=207</guid>
		<description><![CDATA[So I got this working on my older Nokia 3120, then got an E52 and then had to go through all of this again.  So I decided that&#8217;s that, I need a reference.  Posted here because it&#8217;s least likely to go away, and easiest place to find it again.  Also, other people [...]]]></description>
			<content:encoded><![CDATA[<p>So I got this working on my older Nokia 3120, then got an E52 and then had to go through all of this again.  So I decided that&#8217;s that, I need a reference.  Posted here because it&#8217;s least likely to go away, and easiest place to find it again.  Also, other people may find this useful.<br />
<span id="more-207"></span><br />
First things first, you obviously need the Linux bluetooth stack.  Look for a package containing &#8220;bluez&#8221;.  On Gentoo Linux there are two sets: the first is bluez-utils and a bunch of other dependencies, and the second is &#8220;bluez&#8221; (which from what I can tell is the newer stuff).  Once you&#8217;ve got this installed you should probably try and start the bluetooth daemon to see if it starts, and also whether the drivers are loaded etc &#8230;</p>
<p>Finding the Bluetooth drivers for your chip is not something I&#8217;m going to discuss; lspci and lshw are your friends in this respect, and most distros will auto-load kernel modules based on available hardware.  If you can type &#8220;ip ad sh&#8221; and see a pan0 device you&#8217;re probably good to go.</p>
<p>At this point I suggest looking at /etc/bluetooth and editing as you see fit (starting with main.conf, probably only want to edit Name = &#8230; and even that&#8217;s a maybe).</p>
<p>To find out whether it all actually works as expected, run &#8220;hcitool scan&#8221;, which should output something like:</p>

<div class="wp_syntax"><div class="code"><pre class="txt" style="font-family:monospace;"># hcitool scan
Scanning ...
        34:7E:39:62:7B:65       Reboot
#</pre></div></div>

<p>Reboot is my new Nokia phone.  The MAC-addr like number on the left is called a bdaddr.  Get that string in your cut buffer &#8211; you&#8217;re going to need it a lot.</p>
<p>Now you need to &#8220;pair&#8221; your phone with your laptop.  This is the hardest part of the whole exercise and depends on which of the tools you&#8217;re using.  I had quite a lot of trouble doing this from the command line only and eventually just installed gnome-bluetooth and used the wizard to pair, then on my phone set my laptop to authorized so that it can connect without going through the whole pairing process every time.  On the laptop side I didn&#8217;t have to change or explicitly set anything.</p>
<p>The next step is to locate the bluetooth &#8220;endpoint&#8221; on the phone to use for dial-up networking.  This is done using sdptool:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;"># sdptool search --bdaddr 34:7E:39:62:7B:65 DUN
Searching for DUN on 34:7E:39:62:7B:65 ...
Service Name: Dial-Up Networking
Service RecHandle: 0x10047
Service Class ID List:
  &quot;Dialup Networking&quot; (0x1103)
Protocol Descriptor List:
  &quot;L2CAP&quot; (0x0100)
  &quot;RFCOMM&quot; (0x0003)
    Channel: 5
Language Base Attr List:
  code_ISO639: 0x454e
  encoding:    0x6a
  base_offset: 0x100
Profile Descriptor List:
  &quot;Dialup Networking&quot; (0x1103)
    Version: 0x0100
&nbsp;
#</pre></div></div>

<p>The important bit there is Channel: 5.  Now edit /etc/bluetooth/rfcomm.conf, in particular we want to create a serial device called rfcomm0, that always binds to our phone, on the correct channel.  The config file ends up looking like this:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">rfcomm0 {
    bind yes;
    device 34:7E:39:62:7B:65;
    channel 5;
    comment &quot;DUN on Reboot&quot;;
}</pre></div></div>
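
<p>If you have more than one phone to set up, the two manual steps above (reading the RFCOMM channel out of the sdptool output and writing the matching rfcomm.conf stanza) can be scripted.  What follows is only a minimal Python sketch: find_rfcomm_channel and rfcomm_stanza are hypothetical helper names, and it assumes the sdptool output looks like the sample above.</p>

```python
import re


def find_rfcomm_channel(sdptool_output):
    """Return the first RFCOMM channel number found in sdptool output, or None."""
    match = re.search(r"^\s*Channel:\s*(\d+)\s*$", sdptool_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def rfcomm_stanza(bdaddr, channel, comment, device="rfcomm0"):
    """Build an rfcomm.conf stanza in the same layout as the one shown above."""
    return (
        f"{device} {{\n"
        f"    bind yes;\n"
        f"    device {bdaddr};\n"
        f"    channel {channel};\n"
        f'    comment "{comment}";\n'
        f"}}\n"
    )


if __name__ == "__main__":
    # Trimmed copy of the sdptool output from this post.
    sample = """Service Name: Dial-Up Networking
Protocol Descriptor List:
  "L2CAP" (0x0100)
  "RFCOMM" (0x0003)
    Channel: 5
"""
    channel = find_rfcomm_channel(sample)
    print(rfcomm_stanza("34:7E:39:62:7B:65", channel, "DUN on Reboot"), end="")
```

<p>Feed it the real output of the sdptool search and only copy the result into /etc/bluetooth/rfcomm.conf once it looks sane.</p>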

<p>Now restart bluetooth again and you should have a rfcomm0 device node in /dev/.  If you do, try opening two terminals: in the one, issue &#8220;cat /dev/rfcomm0&#8221;, and in the other issue &#8220;echo ATZ > /dev/rfcomm0&#8221;.  The cat side should echo back your ATZ and hopefully not spit too much gunk at you.  From here on it&#8217;s pretty simple.  I use my chat perl script from my <a href="http://jkroon.blogs.uls.co.za/it/networking/3g-pins-prompts-and-pppd">3G, PINs, prompts and pppd</a> entry, along with this file in /etc/ppp/peers/:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">/dev/rfcomm0
linkname 3g-rfcomm0
defaultmetric 5000
&nbsp;
# Don't require the peer to authenticate (this _seems_ to get the connection going in a shorter time)
noauth
&nbsp;
# If you want more detailed logs, enable this (it doesn't generate that much
# noise, but assists a lot in finding problems):
#debug
&nbsp;
# Make this one second since the modem probably has to be re-initialized at least
# once.
holdoff 1
&nbsp;
# modem initialization
connect /usr/local/sbin/uls_3g_connect.pl
&nbsp;
# We probably want to use the DNS as advertized by the peer
usepeerdns
&nbsp;
# Use this link as the default gateway
defaultroute
&nbsp;
# Inform our ip-up script about what this is:
ipparam &quot;type=intl dnsroutes=yes routemetric=10&quot;
&nbsp;
# 3G doesn't like all kinds of compression ... wish they did - it's slow
# enough.
noccp
nobsdcomp
novj
&nbsp;
# Make the connection persistent, and not terminate if/when errors occur.
persist
maxfail 0</pre></div></div>

<p>There is no real difference between this peers file and the 3g one from the blog entry mentioned above.  This one just adds a linkname and an explicit reference to the rfcomm0 device.  I saved this file as bluetooth.  Now you should be able to run &#8220;pon bluetooth&#8221;, or more directly &#8220;pppd call bluetooth&#8221; or even &#8220;pppd call bluetooth debug nodetach&#8221; if you prefer.</p>
]]></content:encoded>
			<wfw:commentRss>http://jkroon.blogs.uls.co.za/it/networking/linux-3g-bluetooth-and-dun/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>ibdriver and the 2.6.31 kernel (iBurst Linux drivers)</title>
		<link>http://jkroon.blogs.uls.co.za/it/networking/ibdriver-and-the-2631-kernel-iburst-linux-drivers</link>
		<comments>http://jkroon.blogs.uls.co.za/it/networking/ibdriver-and-the-2631-kernel-iburst-linux-drivers#comments</comments>
		<pubDate>Sat, 31 Oct 2009 08:23:03 +0000</pubDate>
		<dc:creator>Jaco Kroon</dc:creator>
				<category><![CDATA[Networking]]></category>
		<category><![CDATA[drivers]]></category>
		<category><![CDATA[iBurst]]></category>
		<category><![CDATA[kernel]]></category>

		<guid isPermaLink="false">http://jkroon.blogs.uls.co.za/?p=195</guid>
		<description><![CDATA[So in the 2.6.31 kernel the older (deprecated) network API finally got removed &#8211; biting quite a number of people rather badly.  I can think of at least two projects that are problematic due to this:
1.  The ibdriver package &#8211; used for the iBurst usb and pcmcia devices.
2.  The dahdi 2.0.x drivers used for [...]]]></description>
			<content:encoded><![CDATA[<p>So in the 2.6.31 kernel the older (deprecated) network API finally got removed &#8211; biting quite a number of people rather badly.  I can think of at least two projects that are problematic due to this:</p>
<p>1.  The <a title="ib wireless broadband driver" href="http://sourceforge.net/projects/ibdriver/" target="_blank">ibdriver</a> package &#8211; used for the iBurst usb and pcmcia devices.<br />
2.  The dahdi 2.0.x drivers used for telephony in Asterisk.</p>
<p>The latter isn&#8217;t that serious a problem as I really need to move to dahdi-2.2.x anyway.  The ibdriver however caused me some embarrassment as I plugged in the usb device, downloaded the drivers and &#8230; it didn&#8217;t compile.  Oops.  So I decided it&#8217;s time to return to some of my older roots and just make the driver work &#8211; and that&#8217;s exactly what I did.</p>
<p><span id="more-195"></span>This morning, after reading a blog post yesterday about it taking a guy two hours (without posting patches) I set aside three, and figured, at a minimum I&#8217;ll remember why I don&#8217;t do kernel hacking for a living.  That happened, but it also only took me about 30 minutes to build a patch for running the iburst drivers against the 2.6.31 kernel.  The main changes are that you&#8217;re now to use netdev_priv(dev) for accessing private data instead of dev-&gt;priv, and that a bunch of the device operations which were previously part of the net_device struct are now in net_device_ops, with a few slight name changes.  The only other thing really is that one struct changed name in the pcmcia subsystem (config_info_t -&gt; socket_state_t).</p>
<p>With no further ado, you can find the patch <a href="http://jkroon.blogs.uls.co.za/wp-content/uploads/2009/10/ibdriver-134-linux-2628-2631.patch">here</a>.</p>
<p>Please note that whilst I&#8217;m busy running the code with this patch I can&#8217;t guarantee that there are no mistakes in it.  I just took the kernel headers and updated the ibdriver code according to what I could tell from these headers, so use at your own risk.</p>
]]></content:encoded>
			<wfw:commentRss>http://jkroon.blogs.uls.co.za/it/networking/ibdriver-and-the-2631-kernel-iburst-linux-drivers/feed</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>ARP Spoofing &#8211; a lost art?  Maybe not.</title>
		<link>http://jkroon.blogs.uls.co.za/it/security/arp-spoofing-a-lost-art-maybe-not</link>
		<comments>http://jkroon.blogs.uls.co.za/it/security/arp-spoofing-a-lost-art-maybe-not#comments</comments>
		<pubDate>Sun, 10 May 2009 10:09:21 +0000</pubDate>
		<dc:creator>Jaco Kroon</dc:creator>
				<category><![CDATA[Networking]]></category>
		<category><![CDATA[Security]]></category>

		<guid isPermaLink="false">http://jkroon.blogs.uls.co.za/?p=104</guid>
		<description><![CDATA[Just over a month back we had an incident where the default gateway on our servers would just sporadically stop responding.  We first observed this as our servers sporadically just ceasing to respond, and only once we realized that during these &#8220;outages&#8221; we could still log on to other servers and reach our servers via those [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: left;">Just over a month back we had an incident where the default gateway on our servers would just sporadically stop responding.  We first observed this as our servers sporadically just ceasing to respond, and only once we realized that during these &#8220;outages&#8221; we could still log on to other servers and reach our servers via those other servers (ie, we could access them from the local LAN but not from anywhere else) did we start pointing fingers at the gateway.<span id="more-104"></span></p>
<p style="text-align: left;">At the time we figured this was a misconfigured or faulty gateway (seeing as Internet Solutions is using HSRP we figured these outages were just the &#8220;fail over&#8221; time).  Quintin however was told that our gateways are misconfigured and that he should be using x.y.z.253 and NOT x.y.z.1 as we were told during the installation of these servers.  Since then the Windows server which he was working on was not affected again, and seeing as most of my Linux servers just keep running I didn&#8217;t think too much about this again until last weekend when I was actively working on one of them.  I re-reported the issue, making it clear that the explanations I was given previously were inadequate and not acceptable (I was given answers such as someone accidentally bumped a cable etc &#8230;).</p>
<p style="text-align: left;">After running mtr traces from my dsl into the DC to three of the servers (web/mail, voip and windows) since reporting it at 15:54 (first response at 16:48 &#8211; reasonable response time for ONCE, thanks IS) I realized two things:</p>
<ol style="text-align: left;">
<li>~ 30% round-trip packet loss on the final hop, and around 0.4 &#8211; 0.7 % loss on the hops before that (meaning, probably the DSL side of things).</li>
<li>The Windows server was not affected at all!  This made no sense whatsoever to me until I realized it was using a different gateway.</li>
</ol>
<p style="text-align: left;">Either way, I sent these traces off to IS @ 17:25 &#8211; showing even higher loss on the last hop than the above 30 % (67 % and 57 % respectively) &#8211; and later (18:40) passed on an update showing that my servers could still get responses from each other internally in spite of not being able to reach the outside world.  At 7:38 the Sunday I looked at the traces again and since the loss had decreased to 5 % and 6 % I figured they must&#8217;ve restored sanity and fixed the problem.  I was wrong, very very wrong.</p>
<p style="text-align: left;">One of my clients phoned me around 8:45ish and said I should look at my email &#8211; they thought I might want to look at it, as it looked like my admin site had been compromised.  Of course I immediately loaded the site, just to see a funny bar at the top-left and all my fonts looking rather funky.  A view of the page source showed a &lt;script&gt; tag embedded right at the beginning of my html document, even before the &lt;!DOCTYPE declaration!  If you&#8217;ve ever seen me fire up a shell and ssh into a server you know it can be done quickly &#8211; I believe I broke all known records that morning &#8230; the grep was fired off so quickly on the php scripts I barely had time to think about it &#8230; nothing.  mysqldump with the grep &#8230; nothing, on /etc &#8230; nothing, /home &#8230; nothing (this took a while), / &#8230; nothing (this one took a LONG time).</p>
<p style="text-align: left;">So off I went to GLUG &#8230; ah, gotta love GLUG &#8230; and this thread ensued:  http://www.linux.org.za/Lists-Archives/glug-tech-0905/msg00009.html</p>
<p style="text-align: left;">As per the thread my initial thought was a compromised proxy &#8230; however, seeing as I described the gateway problem first you can hopefully already guess that this was NOT the case.  The post at http://www.linux.org.za/Lists-Archives/glug-tech-0905/msg00013.html was the breakthrough.  Specifically, Quintin mentioned that he had issues loading certain pages, and then I tried to load pages from other servers in the DC.  This in particular should explain:</p>
<blockquote style="text-align: left;">
<pre>Come to think of it - there is some correlation between the servers
that's available via our gateway and those that aren't.  I can reproduce
this "page hack" on the web pages that sporadically goes awol, but not
on those that doesn't (In our particular little subnet).  I wonder
whether those two are not perhaps related.  ARP spoofing anyone?  I
suspect this issue is going to be handed off to IS... sorry for the IS
guys on the list, but there is some work coming your way.</pre>
</blockquote>
<p style="text-align: left;">After this realization it became much easier to look for what I feared:  A router compromise.  So I started sniffing for arp traffic (inside of screen, writing to a file), and it wasn&#8217;t long before I found sequences like this:</p>
<blockquote style="text-align: left;">
<p style="text-align: left;">12:57:07.871279 ARP, Reply x.y.z.1 is-at 00:11:43:dc:15:24 (oui Unknown), length 46<br />
12:57:07.884269 ARP, Reply x.y.z.1 is-at 00:11:43:dc:15:24 (oui Unknown), length 46<br />
12:57:07.889516 ARP, Reply x.y.z.1 is-at 00:11:43:dc:15:24 (oui Unknown), length 46<br />
12:57:08.433296 ARP, Reply x.y.z.1 is-at 00:00:0c:07:ac:0a (oui Cisco), length 46<br />
12:57:08.433350 ARP, Reply x.y.z.1 is-at 00:00:0c:07:ac:0a (oui Cisco), length 46</p></blockquote>
<p style="text-align: left;">That actually tells us three things, not just one:</p>
<ol style="text-align: left;">
<li>There is definitely somebody spoofing the router address, specifically, some device which thinks it&#8217;s a Dell router.</li>
<li>There is likely a loop on the physical network seeing as we&#8217;re receiving the same packet multiple times (It&#8217;s actually an IS engineer that pointed this one out for me).</li>
<li>IS is using HSRP for their routers (which I already knew, but the Cisco MAC address, which falls in the 00:00:0c:07:ac:xx range HSRP uses for its virtual MAC, confirms this).</li>
</ol>
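<p>The pattern in the trace above, one IP address being answered for by more than one MAC address, is easy to check for mechanically.  The following is a minimal Python sketch under a few assumptions: the regular expression only targets the two tcpdump output styles quoted in this post, and macs_per_ip and conflicting_ips are hypothetical names, not part of any existing tool.</p>

```python
import re
from collections import defaultdict

# Matches both tcpdump spellings seen in the traces in this post:
#   "ARP, Reply x.y.z.1 is-at 00:11:43:dc:15:24 ..." and "arp reply x.y.z.1 is-at ..."
ARP_REPLY = re.compile(
    r"arp,?\s+reply\s+(\S+)\s+is-at\s+([0-9a-f]{2}(?::[0-9a-f]{2}){5})",
    re.IGNORECASE,
)


def macs_per_ip(lines):
    """Map each IP seen in ARP replies to the set of MAC addresses claiming it."""
    seen = defaultdict(set)
    for line in lines:
        match = ARP_REPLY.search(line)
        if match:
            ip, mac = match.groups()
            seen[ip].add(mac.lower())
    return dict(seen)


def conflicting_ips(lines):
    """Return only the IPs that more than one MAC address answered for."""
    return {ip: macs for ip, macs in macs_per_ip(lines).items() if len(macs) > 1}


if __name__ == "__main__":
    trace = [
        "12:57:07.871279 ARP, Reply x.y.z.1 is-at 00:11:43:dc:15:24 (oui Unknown), length 46",
        "12:57:08.433296 ARP, Reply x.y.z.1 is-at 00:00:0c:07:ac:0a (oui Cisco), length 46",
    ]
    for ip, macs in conflicting_ips(trace).items():
        print(f"{ip} is claimed by: {', '.join(sorted(macs))}")
```

<p>Two MACs for one IP is not always an attack (a replaced network card or a fail-over will also trigger it), so treat a hit as a reason to look closer rather than as proof.</p>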
<p style="text-align: left;">At 19:50 I once more sent an email to IS, asking them once more to get serious with this since this was a security issue.  I also had to spell out what exactly was going on (not something that comes overly easy to me).  So for those that have been raising their eyebrows at the above &#8211; the best analogy that I could come up with is something along these lines:</p>
<p style="text-align: left;">A LAN is essentially like a room full of people, where each person represents a computer.  Just about none of these people generally know each other, and they cannot communicate with anybody outside of the room.  In order to communicate to the outside you&#8217;ve got to speak with a special person that stands in a door &#8211; you only have his name and you&#8217;ve got no way to confirm who&#8217;s standing in doors and who not &#8211; not even by asking them.  So if I want to send a message to somebody not in the room I basically call out:  Hey Mr Router &#8211; who are you? and then Mr Router is supposed to call back, hey you, here I am!  What&#8217;s happening above is that two people are calling back, so which one is the real Mr Router and which one is the impersonator?  You&#8217;ve got to make a choice and follow through.  If you pick the right one, your message goes where it&#8217;s supposed to, if you don&#8217;t, well, nasty things can happen, as in this story:</p>
<p style="text-align: left;">The computer that pretended to be the router took the message, looked for certain things in the message, modified the message and then passed it on to the real Mr Router &#8211; pretending to be us.  Very, very naughty.</p>
<p style="text-align: left;">Even after all of this non-circumstantial evidence and definite proof (well beyond speculation) I received this shortly before 21:00 the Sunday evening:</p>
<blockquote style="text-align: left;"><p>Network engineer found that there is a problem with the available uplink bandwidth from the switch and that this is entering planning for rectification. We will get our Netops engineers to assist, but will not be able to resolve it tonight.</p></blockquote>
<p style="text-align: left;">I thought I was going to kill someone.  My reply to this was simple, yet effective:</p>
<blockquote style="text-align: left;"><p>What does available uplink bandwidth have to do with a machine that&#8217;s spoofing ARP responses?</p>
<p>Yes, the uplink bandwidth is a problem, I&#8217;ve been saying that for a while now &#8230; but this issue is happening over a WEEKEND (A LONG WEEKEND I MAY ADD) when it&#8217;s dead quiet, so there is a better chance of hell freezing over than that it&#8217;s an uplink bandwidth issue.</p></blockquote>
<p style="text-align: left;">I also immediately called in to the GSC in order to try and speak with whatever engineer made the assessment and to try and hammer it into his head what was going on.  As they tried to connect me he fortunately called me.  This helped somewhat with my mood.</p>
<p style="text-align: left;">This is fortunately where things turned: within a few minutes of him calling me I managed to explain to him what was actually happening, and he took action and disabled the compromised host&#8217;s port, and all went back to normal by 21:30.  Confirmed by 21:34 with ARP traces that finally looked normal:</p>
<blockquote style="text-align: left;"><p>21:22:16.089287 arp reply x.y.z.1 is-at 00:00:0c:07:ac:0a (oui Cisco)<br />
21:22:51.888723 arp reply x.y.z.1 is-at 00:00:0c:07:ac:0a (oui Cisco)<br />
21:23:27.688410 arp reply x.y.z.1 is-at 00:00:0c:07:ac:0a (oui Cisco)<br />
21:24:14.599464 arp reply x.y.z.1 is-at 00:00:0c:07:ac:0a (oui Cisco)</p></blockquote>
<p style="text-align: left;">Much better, no duplicates, no more issues with ssh just going awol.  No more ping timeouts, no more infected HTTP responses.</p>
<p style="text-align: left;">The HTTP infections themselves were also pretty ingenious.  In particular, a typical HTTP response looks something like this:</p>
<blockquote>
<pre>HTTP/1.1 200 OK
Date: Sun, 03 May 2009 08:49:07 GMT
Server: Apache
Content-Length: 2698
Keep-Alive: timeout=15, max=98
Connection: Keep-Alive
Content-Type: text/html

&lt;!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd"&gt;
&lt;html&gt;</pre>
</blockquote>
<p>As you can see, there are a number of unneeded headers.  Now what would happen if we were to strip some of these out?  Eg, the Server, Keep-Alive, and Connection: headers?  Firstly, the lack of Keep-Alive would cause the connection to be closed, and a new one will be opened for subsequent requests (slight slowdown &#8230; whoopdy do).  The lack of the Server: header makes no difference at all.  But now we do need (due to the technical requirement of not disturbing the packet length) to adjust the Content-Length: header, and the opened up space gets padded with spaces.  So basically if we rip out X bytes from the headers we pre-pad the content (from &lt;!DOCTYPE onwards) with X spaces, which doesn&#8217;t change the meaning of the content, and we increase the Content-Length by X (being careful around magnitude boundaries, eg, going from 3 digit values to 4 digit values, in which case we need to consume one of our added spaces again).  Since we now have a bunch of spaces at the beginning of the content we&#8217;ve got space to add some stuff in there (this broke the HTML standard in my page&#8217;s case, but generally html devs don&#8217;t have those &lt;!DOCTYPE headers and by the time anybody notices this it&#8217;s already too late anyway).  In this case the &#8220;payload&#8221; was a &lt;script&gt; tag using a src attribute pointing to a .gif file which actually contains some javascript.  This then proceeded to exploit some Windows bug and actually compromise the machine for whatever nefarious purpose the hacker had in mind.</p>
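<p>The bookkeeping around the Content-Length value is the fiddly part of the above, so here is the arithmetic on its own as a minimal Python sketch.  It is an illustration of the description above and nothing more: rebalance is a hypothetical name, and it assumes the only things that change size are the stripped headers, the digits of the Content-Length value and the padding in front of the content.</p>

```python
def rebalance(content_length, freed_bytes):
    """Return (new_content_length, padding) keeping the response size unchanged.

    freed_bytes is what stripping headers released.  Those bytes must be
    used up exactly by the padding plus any extra digits the larger
    Content-Length value needs.  Returns None when no exact split exists.
    """
    old_digits = len(str(content_length))
    for extra_digits in range(freed_bytes + 1):
        padding = freed_bytes - extra_digits
        new_length = content_length + padding
        if len(str(new_length)) - old_digits == extra_digits:
            return new_length, padding
    return None


if __name__ == "__main__":
    # Content-Length from the sample response above, 60 bytes of headers removed.
    print(rebalance(2698, 60))
    # Crossing from 3 to 4 digits costs one byte of padding.
    print(rebalance(990, 20))
    # Right at the boundary there may be no exact fit at all.
    print(rebalance(995, 5))
```

<p>The useful takeaway for the defending side is that the total size of the response does not change, so comparing byte counts on the wire will not reveal this kind of tampering; comparing the received content against what the server actually sent will.</p>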
<p>There are more detailed explanations in the GLUG thread for some of these things, in particular you may also want to read these posts:</p>
<ul>
<li>http://www.linux.org.za/Lists-Archives/glug-tech-0905/msg00018.html</li>
<li>http://www.linux.org.za/Lists-Archives/glug-tech-0905/msg00022.html</li>
<li>http://www.linux.org.za/Lists-Archives/glug-tech-0905/msg00026.html</li>
</ul>
<p style="text-align: left;">I really need to learn how to write this kind of garbage in English and not my variant of techbabble.</p>
<p style="text-align: left;">Some further discussion regarding possible fixes ensued, and I made this post after discussion with various people, including the original person asking the question, I quote from http://www.linux.org.za/Lists-Archives/glug-tech-0905/msg00028.html:</p>
<blockquote>
<pre style="text-align: left;">&gt; So we have a neat and devilishly cunning way of getting content onto a browser
&gt; machine virtually anonymously. Ouch.
&gt;
&gt; What's the defense?

You don't want the answer to this.  In a word:  NONE.

I've been thinking really hard, and I can come up with only a handful of
"solutions":

1.  Hardcode the router's MAC in /etc/ethers.  This still doesn't
prevent the injection from happening with MAC spoofing (some host can
confuse the switch into overflowing it's MAC tables and thus effectively
sending all traffic everywhere, but getting the switch to only send it
to that host would be significantly harder, but not impossible, and
HSRVP could in this case actually provide a stepping stone for actually
making the exploit possible again.

2.  In this particular setup there are a few routers high in the range,
I could hard code my router to one of them seeing as only .1 was
targeted in this case.

3.  There is apparently some options on the CISCO switches which can
guess as to whom the attacker is and shut down that port, or at a
minimum detect and report it.  This will probably be quite effective in
the subnet.

4.  It's possible to configure CISCO switches to only allow
communication with the router port, but then you need to also configure
exceptions between servers that are allowed to communicate.

And all of those, without exception, only protects the local subnet.  It
DOES NOT prevent this attack from happening between other hops further
down the route.

As a friend of mine (IPv6 fanatic) would say:  This is a fundamental
flaw in IPv4.  We need IPv6 with build in security mechanisms and HMACs
which guarantees authenticity (Ok, we have it in IPv4 in IPSec, which is
basically the "security" features from IPv6 back-ported).  Not that I
understand how you can authenticate a packet unless you have a shared
key to run the HMAC with, which also implies that we're going to need
X509 certificates on a per-IP basis to be issued to each and every
machine on the Internet.  Which brings me to my next question:  How much
will Thawte and/or Verisign ask for such a certificate?  And how much
are they going to put into MS's pockets to prevent other parties from
entering the arena?

Also, considering that you now need to run two cryptographic hashes per
packet coming in off the wire, or going out on the wire, what
implications will this have on already loaded servers?

Just some things to ponder.

Jaco</pre>
</blockquote>
<p>The option referenced in 4 is called PVLAN (An extension of the VLAN aka virtual LAN concept).  Re option three, <a title="Local Loop" href="http://localloop.co.za/" target="_blank">Simeon Miteff</a> mentioned there is a tool called <a title="LBNL's Network Research Group" href="http://www-nrg.ee.lbl.gov/" target="_blank">arpwatch</a> which does this same thing on your Linux servers.  At a minimum this will allow one to detect this crap much quicker, and take appropriate action in a shorter time without having to go through the trouble of running sniffers and the like.</p>
]]></content:encoded>
			<wfw:commentRss>http://jkroon.blogs.uls.co.za/it/security/arp-spoofing-a-lost-art-maybe-not/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The correlation between network traffic, tea and lunch times</title>
		<link>http://jkroon.blogs.uls.co.za/it/networking/the-correlation-between-network-traffic-tea-and-lunch-times</link>
		<comments>http://jkroon.blogs.uls.co.za/it/networking/the-correlation-between-network-traffic-tea-and-lunch-times#comments</comments>
		<pubDate>Sat, 28 Mar 2009 10:56:02 +0000</pubDate>
		<dc:creator>Jaco Kroon</dc:creator>
				<category><![CDATA[Networking]]></category>

		<guid isPermaLink="false">http://jkroon.blogs.uls.co.za/?p=69</guid>
		<description><![CDATA[So I could for a long time now spot public holidays, weekends, and general stuff on network graphs, but never did I suspect I&#8217;d be able to determine lunch and tea times based on network traffic graphs!  Anyway, check this one out:
One can clearly see the 10:00 tea time, then lunch at 13:00 and tea [...]]]></description>
			<content:encoded><![CDATA[<p>So I could for a long time now spot public holidays, weekends, and general stuff on network graphs, but never did I suspect I&#8217;d be able to determine lunch and tea times based on network traffic graphs!  Anyway, check this one out:</p>
<div id="attachment_70" class="wp-caption alignnone" style="width: 510px"><img class="size-full wp-image-70" title="network_traffic" src="http://jkroon.blogs.uls.co.za/wp-content/uploads/2009/03/network_traffic.png" alt="VoIP Traffic Graph" width="500" height="135" /><p class="wp-caption-text">VoIP Traffic Graph</p></div>
<p>One can clearly see the 10:00 tea time, then lunch at 13:00 and tea again at 15:00 before things start quieting down for the weekend at around 16:30.  Surprisingly there is still a decent amount of activity for a Saturday <img src='http://jkroon.blogs.uls.co.za/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> .</p>
]]></content:encoded>
			<wfw:commentRss>http://jkroon.blogs.uls.co.za/it/networking/the-correlation-between-network-traffic-tea-and-lunch-times/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Broadband &#8230; really?</title>
		<link>http://jkroon.blogs.uls.co.za/it/voip/broadband-really</link>
		<comments>http://jkroon.blogs.uls.co.za/it/voip/broadband-really#comments</comments>
		<pubDate>Tue, 10 Mar 2009 19:10:08 +0000</pubDate>
		<dc:creator>Jaco Kroon</dc:creator>
				<category><![CDATA[Networking]]></category>
		<category><![CDATA[VoIP]]></category>

		<guid isPermaLink="false">http://jkroon.blogs.uls.co.za/?p=66</guid>
		<description><![CDATA[How does one define broadband?  Is it all defined around throughput?  Latency?  Between which endpoints?  Using ADSL currently one can expect a local latency of around 30ms to anywhere within SA, and depending on whether you&#8217;ve got a SAIX (Fibre) or IS (both Fibre and Satellite) you can expect anything between 300ms and 800ms to [...]]]></description>
			<content:encoded><![CDATA[<p>How does one define broadband?  Is it all defined around throughput?  Latency?  Between which endpoints?  Using ADSL currently one can expect a local latency of around 30ms to anywhere within SA, and depending on whether you&#8217;ve got a SAIX (Fibre) or IS (both Fibre and Satellite) you can expect anything between 300ms and 800ms to international destinations (obviously depending on the exact geographical location and the type/distance of the peerings being used).</p>
<p><span id="more-66"></span></p>
<p>What kind of speed is &#8220;fast enough&#8221;?  What is the trade-off between throughput and latency?</p>
<p>Personally, I could do with &gt;4Mbps links, but this isn&#8217;t essential for me.  Heck, I still use a 384 DSL line at home.  However, I obviously won&#8217;t be running a company off of it.  Nor am I a big leecher that tries to download a gazillion gigabytes every month.</p>
<p>Half the time, I find that people that complain about &#8220;slow&#8221; internet aren&#8217;t even saturating their link at all.  So why do they consider their links to be slow?  Simple, latency.  Let&#8217;s say you&#8217;re browsing to google.co.za (home page, excluding images, is 2909 bytes, that&#8217;s _two_ TCP segments).  At 4Mbps that&#8217;s a fraction of a second, then why does it feel like google.co.za takes a second to load?  Because it actually does:</p>
<ol>
<li>SYN packet gets sent.</li>
<li>~300ms later the SYN/ACK arrives.</li>
<li>ACK gets sent + request gets sent off.</li>
<li>~300ms later the response starts coming in.</li>
</ol>
<p>So yes, that&#8217;s around 600ms (on a low-latency round trip) before we even START receiving the data back from google.  Please note that local caches and other factors obviously influence this.</p>
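<p>To put numbers on how lopsided this is, here is the same sum as a minimal Python sketch.  It uses only the figures quoted above (2909 bytes, a 4Mbps link, a 300ms round trip) and ignores everything else, such as DNS lookups, TCP slow start and server think time, so treat it as a rough illustration; the function names are made up for the example.</p>

```python
def transfer_time_ms(size_bytes, link_bps):
    """Time to push size_bytes through a link of link_bps bits per second."""
    return size_bytes * 8 / link_bps * 1000


def time_to_first_byte_ms(rtt_ms):
    """One round trip for the TCP handshake plus one for request/response."""
    return 2 * rtt_ms


if __name__ == "__main__":
    page_bytes = 2909       # google.co.za home page size quoted above
    link_bps = 4_000_000    # 4Mbps link
    rtt_ms = 300            # international round trip quoted above

    wire = transfer_time_ms(page_bytes, link_bps)
    wait = time_to_first_byte_ms(rtt_ms)
    print(f"on the wire: {wire:.1f} ms, waiting on latency: {wait} ms")
```

<p>The transfer itself comes to roughly 6ms, so on a link like this a faster line buys next to nothing for small pages; only a shorter round trip does.</p>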
<p>Then, there is probably the single factor that I consider to be the most crucial piece of the puzzle at the moment:  packet loss!  I&#8217;ve recently started seeing round-trip packet loss of around 5 %!  Sounds low?  Well, it&#8217;s actually less than 5 % as the 5 % is round-trip, so it&#8217;s probably like 2.5 % per direction.  Either way, for VoIP we are trying to send 50 packets per direction per second, at 2 % that means we&#8217;re losing a fragment per second (20ms worth of audio).</p>
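<p>The packet loss figures can be checked the same way.  This minimal Python sketch assumes that loss is equal and independent in the two directions (an assumption made for the example; real links are often asymmetric), and the function names are hypothetical.</p>

```python
import math


def one_way_loss(round_trip_loss):
    """Per-direction loss, assuming equal and independent loss each way."""
    return 1 - math.sqrt(1 - round_trip_loss)


def packets_lost_per_second(loss, packets_per_second=50):
    """Expected packets lost per second in one direction of a call."""
    return loss * packets_per_second


if __name__ == "__main__":
    per_direction = one_way_loss(0.05)
    print(f"per-direction loss: {per_direction:.2%}")
    print(f"lost per second at 2% loss: {packets_lost_per_second(0.02):.1f} packets")
    print(f"audio per packet: {1000 / 50:.0f} ms")
```

<p>Under that assumption 5 % round-trip loss works out to roughly 2.5 % per direction, and at 50 packets per second every lost packet is a 20ms hole in the audio.</p>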
<p>As previously stated we can get away with relatively little bandwidth for VoIP, but there are a few things we do want:</p>
<ol>
<li>Lowish (&lt;100ms) latency.</li>
<li>Low jitter (The lower ping&#8217;s mdev value, the better, &lt;5ms works well).</li>
<li>NO PACKET LOSS.  We do NOT want to lose packets at all.</li>
</ol>
<p>It is mostly for the latter two reasons that we HIGHLY recommend dedicated internet links for VoIP purposes in SA (No QoS on the network).  However, it seems recently the major providers started competing to see who can provide the highest latency (iBurst is mostly winning here at around 150ms+ latency for local traffic) and packet loss (it&#8217;s hard to say who&#8217;s leading here &#8230; I saw iBurst at 40% the other weekend, and I&#8217;m seeing more and more DSL lines at 5%+), and let&#8217;s not even go into jitter &#8230;</p>
<p>No, internet in South Africa at this point in time is a mixture of disappointment and utter frustration.  I&#8217;ve been with my hands in my hair the last while, and I&#8217;ve no idea where to turn next.  Diginet?  If only it was as good as they claim, and if only it was affordable.</p>
]]></content:encoded>
			<wfw:commentRss>http://jkroon.blogs.uls.co.za/it/voip/broadband-really/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>VoIP Load Balancing over PPPoE links</title>
		<link>http://jkroon.blogs.uls.co.za/it/voip/voip-load-balancing-over-pppoe-links</link>
		<comments>http://jkroon.blogs.uls.co.za/it/voip/voip-load-balancing-over-pppoe-links#comments</comments>
		<pubDate>Sun, 15 Feb 2009 09:05:38 +0000</pubDate>
		<dc:creator>Jaco Kroon</dc:creator>
				<category><![CDATA[Networking]]></category>
		<category><![CDATA[VoIP]]></category>

		<guid isPermaLink="false">http://jkroon.blogs.uls.co.za/?p=62</guid>
		<description><![CDATA[I initially wanted to say over DSL, but then realized that that&#8217;s not quite appropriate since we&#8217;ve just actually completed the naïve approach making use of iBurst.  So on my way back home &#8230; I churned some ideas that I&#8217;d like to share (And log, before I forget them).  The ideas will build around the [...]]]></description>
			<content:encoded><![CDATA[<p>I initially wanted to say over DSL, but then realized that that&#8217;s not quite appropriate since we&#8217;ve just actually completed the naïve approach making use of iBurst.  So on my way back home &#8230; I churned some ideas that I&#8217;d like to share (And log, before I forget them).  The ideas will build around the IAX/2 protocol as it&#8217;s much, much simpler, but the concepts should apply equally to SIP.  Obviously the ideal is to just get a bigger pipe &#8230;</p>
<p><span id="more-62"></span></p>
<p><strong>The Naïve approach</strong></p>
<p>Yesterday I still thought this would work quite well.  I was wrong.  Essentially, on the server just add three additional IPs for a total of four IPs, so we now have $ip1, $ip2, $ip3 and $ip4, all in the same /24 subnet, and all assigned to the same physical NIC.  The routing table ends up looking like this (ignoring the rest of the NICs on the setup):</p>
<p>${subnet}/24 dev eth0 scope link src ${ip1}<br />
default via ${subnet_gw}</p>
<p>Now anybody familiar with how UDP and the sendto() function work will immediately spot the problem:  whatever we send back to the client, no matter on which IP we received the request, the response will always have ${ip1} as its source.  So whilst it&#8217;s perfectly legal to set up four iax accounts on the client pointing to the four different IPs on the server, this just lands you in a spot of trouble (set up the four peers with qualify=yes, route to each of the four server IPs over separate PPPoE connections, and you&#8217;ll notice that three of them end up being unreachable).</p>
<p>To clarify, let&#8217;s say the client machine has four pppoe connections with IPs $dsl1, $dsl2, $dsl3 and $dsl4.  We don&#8217;t put a default route on any of them (that goes via our normal gateway anyway), so our routing ends up something like this (please note there are some additional prohibit routes in order to prevent stuff going over our data link, but in the ideal case the below is sufficient):</p>
<p>${ip1} scope link dev ppp0<br />
${ip2} scope link dev ppp1<br />
${ip3} scope link dev ppp2<br />
${ip4} scope link dev ppp3<br />
192.168.0.0/24 scope link dev eth0<br />
default via 192.168.0.1</p>
<p>Now, when we try to POKE on $ip2 we&#8217;d want to receive an ACK response from $ip2, instead we get an ACK from $ip1, which then evokes an INVAL response back to $ip1 (from $dsl1 no less).</p>
<p>In a similar fashion call setup also fails, never mind actual voice streams.</p>
<p><strong>Making the naïve approach work</strong></p>
<p>The best way of making this work is probably to update asterisk to actually explicitly set the from address on outbound packets which are replies to requests (or related to already associated connections).  This doesn&#8217;t look too trivial and the asterisk devs are going to raise more than just a few eyebrows.  Probably not too hard either, we already need to store the peer&#8217;s IP somewhere &#8230; we could just add the local IP there as well.</p>
<p>You&#8217;d reckon I&#8217;d simply give up there.  But fortunately there is a relatively simple fix:  Add a specific route for $dsl[234] on the server side.  This works, but is a nasty, nasty hack imho.  Basically you need to perform some kind of DNS/registration tracking on the server side which knows how to keep track of the installed routes, when to remove them and when to add new ones.  It also needs to know which _source_ IP to use for each of these.  The simple, stupid way is a script like:</p>
<blockquote><p>#!/bin/bash</p>
<p># Server-side addresses; fill in your own.<br />
ip1=??<br />
ip2=??<br />
ip3=??<br />
ip4=??</p>
<p># check NAME DST SRC: (re)install the host route to DST with source SRC<br />
# whenever the address recorded for NAME has changed.<br />
function check()<br />
{<br />
local name=$1<br />
local dst=$2<br />
local src=$3<br />
local olddst=""</p>
<p>[ -r "/var/lib/route-check-${name}" ] &amp;&amp; olddst=$(&lt;"/var/lib/route-check-${name}")</p>
<p>if [ "${dst}" != "${olddst}" ]; then<br />
# Remove the stale route first, then add one for the new address.<br />
[ -n "${olddst}" ] &amp;&amp; /sbin/ip route del "${olddst}"<br />
[ -n "${dst}" ] &amp;&amp; /sbin/ip route add "${dst}" via 196.35.70.1 src "${src}"</p>
<p>echo "${dst}" &gt; "/var/lib/route-check-${name}"<br />
fi<br />
}</p>
<p># ${pppN_name} and ${pppN_dnsname} must be set to each link's label and dynamic DNS name.<br />
check "${ppp2_name}" "$(/usr/bin/dnsip "${ppp2_dnsname}" | sed -e 's/ //')" "$ip2"<br />
check "${ppp3_name}" "$(/usr/bin/dnsip "${ppp3_dnsname}" | sed -e 's/ //')" "$ip3"<br />
check "${ppp4_name}" "$(/usr/bin/dnsip "${ppp4_dnsname}" | sed -e 's/ //')" "$ip4"</p></blockquote>
<p>There is no need to explicitly monitor ppp1 as this will just make use of the default IP anyway.  Surprisingly this does actually work.</p>
<p><strong>Abusing policy based routing</strong></p>
<p>This was quite annoying actually.  I set up policy-based routing on the ppp devices anyway, such that any traffic that originates from a given IP will actually be sent out via the appropriate device.  This then made me think about the server case &#8230; what if we only bound asterisk to ${ip1} and then had a small relay agent run on each of the other IPs, one that literally consists of about 50 lines of C code that opens port 4569 on the IP and forwards whatever it receives from ${ip1} to the appropriate dynamic IP (since we have three IPs we&#8217;re listening on we can accommodate up to three additional links).  The downside here is that asterisk loses some source information regarding the IPs; also, I don&#8217;t think we can use this for multiple clients unless the agent actually understands at least a small amount of the IAX protocol (or uses a port other than the default 4569).  For a two-ip on the server case we&#8217;d thus have three channels: ${ppp1} -&gt; ${ip1}, ${ppp2} -&gt; ${ip2}, and lastly ${ip2} -&gt; ${ip1}.  This should work as far as I can tell.  Asterisk on the server will see the calls as coming from ${ppp1}, ${ip2}, ${ip3} and ${ip4} but I&#8217;m fine with that.  It does reduce our redirect options significantly though.</p>
<p>Each call is also limited to a dsl line, and if that line goes down mid-call the call is screwed.</p>
<p><strong>Extreme routing</strong></p>
<p>This idea stems from the first.  Basically we run a small packet capture program sniffing for packets coming into the system on port 4569 (and 5060 would actually work cleanly for this too as far as I can tell).  For each packet we look at {source, dest} and see if we have a route to source via our normal gateway and with <em>src</em> set to dest.  If not, update our routing table.  This will likely hammer the system quite badly though with continuous routing table updates, and result in an insanely large routing table unless the program also flushes routes from time to time that it hasn&#8217;t seen for longer than a maxexpiry period.</p>
<p>Thus, if a packet comes in from ${ppp2} to ${ip2}, ensure that we have the route ${ppp2} via ${gateway_ip} src ${ip2}.  This will effectively fix the redirect problem too, and at the same time keep latency down as there is not an additional process for packets to go through.  One could even possibly rather make use of ipsets in iptables (along with the SNAT target)&#8230; efficiency will need to be trialed though.</p>
<p>Scarily enough, this actually seems quite feasible to me.  Except for the resources that will be spent keeping track of the IP pairings I actually quite like this solution.  It&#8217;s relatively simple, doesn&#8217;t require assistance from asterisk, can happen realtime and without the need for third-party intervention (the less the system needs to rely on other systems the better).</p>
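<p>The bookkeeping for this could look something like the sketch below.  It only tracks the {source, dest} pairs and hands back the ip(8) commands that would be needed; the actual packet sniffing and the running of those commands are left out.  The class name, the 300 second default expiry and the example addresses are all assumptions for illustration:</p>

```python
import time


class RouteTracker:
    """Remembers which local IP each remote source last talked to."""

    def __init__(self, gateway, max_expiry=300.0):
        self.gateway = gateway
        self.max_expiry = max_expiry
        self.routes = {}  # remote source IP: (local dest IP, last seen)

    def observe(self, source, dest, now=None):
        """Return the ip(8) command needed for this packet, or None."""
        now = time.time() if now is None else now
        known = self.routes.get(source)
        self.routes[source] = (dest, now)
        if known is not None and known[0] == dest:
            return None  # route already in place, only refresh the timestamp
        return f"ip route replace {source} via {self.gateway} src {dest}"

    def expire(self, now=None):
        """Forget routes not seen for max_expiry seconds; return the commands."""
        now = time.time() if now is None else now
        stale = [s for s, (_, seen) in self.routes.items()
                 if now - seen > self.max_expiry]
        for source in stale:
            del self.routes[source]
        return [f"ip route del {s}" for s in stale]


# Hypothetical addresses from the documentation ranges.
tracker = RouteTracker("192.0.2.1", max_expiry=60)
print(tracker.observe("203.0.113.7", "192.0.2.12", now=0))
print(tracker.observe("203.0.113.7", "192.0.2.12", now=10))
print(tracker.expire(now=100))
```

<p>Because a repeat packet for a known pair only refreshes a timestamp, the routing table is touched only when a pairing actually changes, which should take most of the sting out of the hammering concern.</p>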
<p><strong>Truly load balancing the IAX data stream</strong></p>
<p>Then I came to the realization that we&#8217;re splitting up the calls, and packing them into partitions in order to get load balancing done.  Now if I tell you that the dialplan to get this done is fugly, please understand that I&#8217;m not trying to brag about getting it right.  It truly is FUGLY, and I really fail to see a way of cleaning this up.  It&#8217;s a hack making use of the GROUP functions to count the number of calls over the individual lines and balancing them out.  Nor does it solve for the inbound case!  And it still drops calls on the lines that die (and with the iBurst connections I&#8217;ve seen the last 24 hours or so &#8230; damn).</p>
<p>Ok, so the idea becomes to have a really, really dumb relay agent.  Doing something like ip-in-ip, but since the end points don&#8217;t give a rats about the original IPs (IAX/2 is very friendly towards both SNAT and DNAT &#8230; thanks be to Mark Spencer) we can do some really cool forwarding.</p>
<p>So on both servers we bind the IAX in asterisk to 127.0.0.1, we assign a secondary IP of 127.0.0.2 to the local loopback, and we bind a &#8220;proxy&#8221; on that, which also binds the four actual IPs.  Now we just need to know what the &#8220;pairing&#8221; IP is for each of our public facing IPs, so we need to register a set somehow.  So a client will let us know &#8220;these are my IPs, I&#8217;m going to be sending from all of them to your IPs, please round-robin between them when sending back&#8221;, after which it should be able to immediately send a register and get going.  At this point I can address another issue:  since we now round-robin the packets, and we know which set of peer IPs belong together, we can actually get rid of our additional IPs on the server!  I&#8217;d still keep two public-facing ones, one for the raw IAX/2 asterisk port (for efficiency&#8217;s sake for those clients that don&#8217;t need the load balancing), and the other for balancing the IAX/2, which then dynamically creates an alias on local loopback and transmits to the other IP (last I checked the Linux kernel could get quite confused about this &#8230;. but it should be possible to get something working).</p>
<p>The downside here is that we lose the ability to redirect our calls, but seeing as we&#8217;re load balancing we may not actually want those redirected streams as they mess with our ability to &#8220;trunk&#8221; calls.  Instead I&#8217;d say just charge the client a premium for the ability to load balance.</p>
<p>Even more scary with this is that if a link does go down we just gained the ability to (provided we notice the dead link quickly enough) not drop the call (we can now actually &#8220;jump&#8221; the voice data over to different links).  Let&#8217;s say we do have a 4-way setup, and we do lcp echo requests every 5 seconds, and 3 consecutive no-responses kills the link, then it&#8217;s going to imply a bi-directional packet loss of 25% for around 15 seconds.  This is NOT so hot, but in many cases better than dropping the call entirely.</p>
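<p>The 25% for 15 seconds figure falls out of plain round-robin arithmetic.  A quick sketch, assuming packets are spread evenly over the links and the usual 50 packets per second per direction (the function name is made up):</p>

```python
def dead_link_impact(links, echo_interval_s, missed_echoes, packets_per_second=50):
    # With plain round-robin, one dead link out of N eats 1/N of the packets
    # until the missed LCP echoes finally get the link declared dead.
    loss_fraction = 1 / links
    detection_s = echo_interval_s * missed_echoes
    lost_packets = loss_fraction * detection_s * packets_per_second
    return loss_fraction, detection_s, lost_packets


loss, window, lost = dead_link_impact(links=4, echo_interval_s=5, missed_echoes=3)
print(f"{loss:.0%} loss for about {window} s, roughly {lost:.0f} packets per direction")
```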
<p><strong>Just flippen update the protocol!</strong></p>
<p>Ok, by now I&#8217;m slightly fed up actually (tired).  Come to think of it though &#8211; what is there to prevent this being added directly into asterisk and the IAX/2 protocol?  Basically tell asterisk to perform the round-robin on a set of local interfaces (source IPs?) for a particular peer/user, and if the peer supplied us with multiple IPs in its registration, round-robin on that side too!  Really, consider something like this quickly:</p>
<p>[ulsvoip]<br />
type=friend<br />
host=voip.uls.co.za<br />
local_ifaces=ppp0,ppp1,ppp2,ppp3<br />
&#8230;</p>
<p>Also assume that voip.uls.co.za (for now) resolves to a single IP only.  Now, when we register we supply all active IPs for ppp0, ppp1, ppp2 and ppp3 (so if one or more of the devices is down they simply get ignored for that round, but if a device does change state we need to refresh our registration).  Now we have a set of IPs for local, and we simply alternate between those IPs as source.  Let the kernel handle the actual routing (just make sure your policy based routing is set up properly &#8211; not difficult to do).</p>
<p>On the server side it&#8217;s dead simple:  when the registration comes in it&#8217;ll contain multiple IPs, so any packets coming from any of those IPs are treated as coming from a single source.  When sending packets back, we simply rotate the destinations so as to effectively round-robin between the different links on its side.</p>
<p>And lastly, there is no reason why this can&#8217;t be done on both ends simultaneously, even with different numbers of links on both sides!  In other words, the example above, but voip.uls.co.za resolves to 4 or even 5 different IPs (Rather moot at this point since all IPs are assigned to the same interface anyway).</p>
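<p>In rough terms the server-side state per peer would be no more than the sketch below.  This is not asterisk code, just an illustration of the idea; the class name and the example addresses are hypothetical:</p>

```python
from itertools import cycle


class BalancedPeer:
    """One logical peer that registered with several source IPs."""

    def __init__(self, name, addresses):
        self.name = name
        self.addresses = set(addresses)
        # Fixed rotation order over the registered addresses.
        self._rotation = cycle(sorted(self.addresses))

    def owns(self, source_ip):
        """Packets from any registered IP count as coming from this peer."""
        return source_ip in self.addresses

    def next_destination(self):
        """Pick the address to send the next outbound packet to."""
        return next(self._rotation)


peer = BalancedPeer("ulsvoip", ["198.51.100.1", "198.51.100.2",
                                "198.51.100.3", "198.51.100.4"])
print(peer.owns("198.51.100.3"))
print([peer.next_destination() for _ in range(6)])
```

<p>A refreshed registration (a link going down or coming back) would simply replace the object with a new address set.</p>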
]]></content:encoded>
			<wfw:commentRss>http://jkroon.blogs.uls.co.za/it/voip/voip-load-balancing-over-pppoe-links/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
