Calculating VoIP Bandwidth requirements

I’ve racked my brain a few times now trying to calculate how big a pipe I need for various call concurrencies.  So I’ve decided it’s finally time to sit down and do the calculations for both SIP/RTP and IAX2.


SIP accounts make use of RTP packets to transmit the voice data.  The frequency and size of these packets vary with the codec in use, but hopefully there is enough information here for you to redo the calculations for other codecs.  For the sake of uniformity I’m going to assume the G.729 codec right through this article (the best compression/quality ratio: at 8Kbps there is simply nothing better that I’m aware of).

The general structure of an RTP packet is something like (pardon the ascii art):

[IP Header][UDP Header][RTP Header][Payload]

The IP header is generally a fixed 5 32-bit words (20 bytes), as most IP packets do not contain “options”.  That’s simply the way IPv4 was designed.  In a similar fashion the UDP header is 8 bytes because it was designed that way.  So for UDP/IP overhead we’re looking at 28 bytes per packet.

The RTP header (according to RFC 1889, since obsoleted by RFC 3550) is at least 12 bytes, but can be longer.  A lot of sniffing and staring at packet traces shows that 12 bytes is the norm, and I’ve very seldom seen anything different.

The G.729 codec can encode at either 10ms intervals or 20ms (or any multiple of 10ms really).  Generally though it’s 20ms, so the frequency is 50Hz (ie, 50 RTP packets per second).  The payload size of G.729 at 20ms intervals is 20 bytes (8Kbps => 1000 bytes/second => 20 bytes/packet).

In total we’re thus looking at 20 + 8 + 12 + 20 = 60 bytes per packet, 50 packets per second, or 3000 bytes per second, ie, 24Kbps.  Per active channel.
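For anyone wanting to redo this sum for another codec or packet interval, here is a minimal Python sketch of it.  The function name and its parameters are my own invention for illustration; the header sizes are the ones discussed above.

```python
IP_HEADER = 20   # bytes, IPv4 without options
UDP_HEADER = 8   # bytes
RTP_HEADER = 12  # bytes, the minimum (and usual) RTP header size


def rtp_packet_and_rate(codec_bps=8000, packet_ms=20):
    """Return (bytes per packet, bits per second) for one RTP stream."""
    packets_per_second = 1000 // packet_ms
    payload = codec_bps // 8 // packets_per_second
    packet = IP_HEADER + UDP_HEADER + RTP_HEADER + payload
    return packet, packet * packets_per_second * 8


# G.729 (8Kbps) at the usual 20ms packet interval
print(rtp_packet_and_rate())  # (60, 24000)
```

Note how dropping to 10ms packets halves the payload but not the headers, so `rtp_packet_and_rate(packet_ms=10)` comes out at 40Kbps rather than 24Kbps.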

So that’s it, hey?  Well, no.  If you look at, for example, DSL on ATM, each ATM cell carries 48 bytes of payload (ignoring the 5-byte header each cell also carries).  So for a 60-byte packet we’re consuming two cells, or 96 bytes per packet.  So in fact we’re looking at 96 x 50 = 4800 bytes per second, or 38.4Kbps.  Does this difference sound scary?  Yes, it does to me.  So in essence you have to round the base packet size up to the next “cell” boundary (eg, up to the next multiple of 48 on DSL) before multiplying by the number of packets.
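The rounding step is easy to get wrong by hand, so here is a small Python sketch of it.  The function names are mine, and like the text it only counts the 48 bytes of payload per cell, not any further ATM or PPP encapsulation.

```python
def cell_rounded_bytes(packet_bytes, cell_payload=48):
    """Round a packet size up to a whole number of cells."""
    cells = -(-packet_bytes // cell_payload)  # ceiling division
    return cells * cell_payload


def dsl_rate_bps(packet_bytes, packets_per_second=50):
    """Bits per second once every packet is padded to the cell boundary."""
    return cell_rounded_bytes(packet_bytes) * packets_per_second * 8


print(cell_rounded_bytes(60))  # 96
print(dsl_rate_bps(60))        # 38400
```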

In South Africa there are 3 ADSL line speeds commonly available:

  1. 384/128Kbps
  2. 512/256Kbps
  3. 1024/384Kbps (or 4096/512 Kbps if you’re lucky).

So, to calculate how many calls we can carry on each of those, we take the “up” speed (after the /) and divide by 38.4, which gives 3, 6 and 10 (or 13).  Experience, however, has shown that the 384 lines are not that good: Telkom’s contention ratios are just too high, and on a 384 line you’re pretty much guaranteed to exhaust your “uplink” (which results in a stuttering voice).  The 512 lines work pretty well, but again, we wouldn’t recommend pushing more than 4 actual calls … over-subscription is a BITCH, especially in the UP direction, where we suspect the contention ratios are higher than on the DOWN link.  Telfree claims that you can’t make more than 6 concurrent calls on a single 1Mbps line.  We’ve yet to perform a scientific test, so for the moment we’ll say that you should be able to make 8, but contention might start hammering you.  We’ve not had the luxury of testing on a 512 uplink yet, but based on the Telfree report 8 should be perfectly viable, and we’d reason even up to 10 (and maybe 11).  The discrepancy between our calculated figures and those reported by Telfree could also be due to codec choices, as we do not know which codecs they used during their testing.
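The theoretical figures (ie, before contention gets involved) are just an integer division.  A quick Python sketch, with a made-up function name and the 38.4Kbps per-call figure from above:

```python
DSL_BPS_PER_CALL = 38400  # one G.729 SIP/RTP call on ATM-based DSL


def theoretical_calls(uplink_kbps):
    """Upper bound on concurrent calls; real lines will do worse."""
    return uplink_kbps * 1000 // DSL_BPS_PER_CALL


for uplink in (128, 256, 384, 512):
    print(uplink, theoretical_calls(uplink))
```

Treat the result as a ceiling only; as described above, real-world contention pulls the usable number down.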


Whilst IAX2 is not always the best choice, we must say that the voice quality results we’ve seen have been phenomenal.  Typically we get significantly better quality with IAX2 than with SIP.  However, for an on-site system SIP provides some tangible benefits (the possibility of sending extension-specific commands to phones etc … also, some features are not available in IAX2, or we couldn’t make them work).  This isn’t really surprising, as IAX2 was designed as an Inter-Asterisk-eXchange protocol, not an end-point protocol.  Why bother?  Well, IAX2 has some bandwidth benefits, and with trunking the “cell” effect from above becomes less problematic.

For starters, let’s consider a single call.  As with SIP we have the IP/UDP overhead of 28 bytes.

Then we head off to the IAX2 specification (formerly an internet draft, now RFC 5456) to look at the IAX format.  This reveals that for single calls the per-packet IAX2 overhead is 12 bytes for “full” voice frames (sent every so often, whenever the timestamp for the voice stream is a multiple of 32768, ie 2^15).  For normal “mini frames” we’re looking at a 4-byte header, and this is the majority of frames that we will see on a single call.

This means that we’re looking at 32 bytes of overhead in total compared to the 40 on SIP.  Not a big saving.  Well, ok, for a single call we end up with 52 bytes per packet, or 20.8kbps vs 24kbps using SIP (no saving on DSL, since 52 bytes still takes two ATM cells, ie 96 bytes).  Without trunking this will be the per-call overhead.  Obviously the saving does show up on technologies that don’t have this much overhead in their cell sizes (ethernet has a minimum frame size of 64 bytes which afaik includes a 14-byte ethernet header, so even at 52 bytes we’re exceeding that).  I’m not sure about diginet.  Point being that on DSL we don’t gain anything on a single call, and even on ethernet we don’t really gain much.  On diginet we might already start seeing a difference.

Now, however, we add a second call.  Instead of creating a new IP packet we simply add the two payloads into the same packet (they’re headed to the same IP/port, so why not?).  This is done by using trunked frames, which have an 8-byte frame header and a 4-byte per-call header (6 bytes if you’re trunking timestamps, which you should be doing).  So the packet size ends up being: 28 + 8 + (6 + payload_size) x N, where payload_size is 20 for our purposes, and N is the number of calls.  So for carrying 2 G.729 compressed calls, we need 36 + 26 x 2 = 88 bytes, which fits in the 96 bytes of two cells, so on DSL we get *two* calls for the price of one.  At 88 bytes/frame we’re looking at a rate of 35.2kbps.  For three calls this increases from 88 bytes to 114 bytes, or 45.6kbps.  Thus per additional call we’re looking at an increase of 26 bytes, or 10.4kbps of additional bandwidth!
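The trunked packet size formula translates directly into Python.  This is only a sketch: the function names are mine, and it assumes trunk timestamps are on (6-byte per-call header) and one trunk frame every 20ms.

```python
IP_UDP = 28        # bytes of IP + UDP header
TRUNK_HEADER = 8   # bytes, IAX2 trunk frame header
PER_CALL = 6       # bytes per call entry, with trunk timestamps


def trunk_packet_bytes(calls, payload=20):
    """Size of one trunked packet carrying `calls` G.729 payloads."""
    return IP_UDP + TRUNK_HEADER + (PER_CALL + payload) * calls


def trunk_rate_bps(calls, payload=20, packets_per_second=50):
    return trunk_packet_bytes(calls, payload) * packets_per_second * 8


for n in (2, 3):
    print(n, trunk_packet_bytes(n), trunk_rate_bps(n))
```

For 2 calls this gives 88 bytes and 35200bps, and for 3 calls 114 bytes and 45600bps, matching the figures worked out by hand.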

There are some restrictions.  For example, creating trunked frames larger than 1240 bytes isn’t recommended, so we can’t really trunk more than 47 calls at a time, but that’s already 503kbps worth of bandwidth.  ULS also tends to use a 10ms trunk frequency instead of the default 20ms in order to reduce the amount of jitter introduced, which probably erodes the saving slightly (an additional 36 bytes of headers per 20ms, since we send 100 frames/second instead of 50 frames/second).  Assuming no such overhead the above formula holds.  If we’re using a 10ms trunk time the formula becomes, as an upper limit, (36 x 2) + (6 + payload_size) x N.  So at higher call volumes the two should be comparable.
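To see where the 47 and the 503kbps come from, here is a rough Python sketch.  It assumes the 1240-byte limit covers the 8-byte trunk header plus the per-call entries but not the IP/UDP headers (that reading is what yields 47); the names are made up for illustration.

```python
TRUNK_LIMIT = 1240   # recommended maximum trunk frame size in bytes
TRUNK_HEADER = 8     # bytes
PER_CALL = 6 + 20    # per-call header plus a 20ms G.729 payload


def max_calls_per_frame(limit=TRUNK_LIMIT):
    return (limit - TRUNK_HEADER) // PER_CALL


def bytes_per_20ms(calls, frames=1):
    """Bytes sent per 20ms when the calls are spread over `frames` packets."""
    return (28 + TRUNK_HEADER) * frames + PER_CALL * calls


n = max_calls_per_frame()
print(n, bytes_per_20ms(n) * 50 * 8)          # calls per frame, bits per second
print(bytes_per_20ms(2, frames=2) * 50 * 8)   # 2 calls at a 10ms trunk frequency
```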

SIP compared to trunked IAX/2, using 20ms G.729 with 20ms and 10ms trunkfreq:

N    SIP (raw)   SIP (DSL)    IAX/2 (20ms)   IAX/2 (10ms)
1    24kbps      38.4kbps     20.8kbps       -
2    48kbps      76.8kbps     35.2kbps       49.6kbps
3    72kbps      115.2kbps    45.6kbps       60kbps
4    96kbps      153.6kbps    56kbps         70.4kbps
8    192kbps     307.2kbps    97.6kbps       112kbps
16   384kbps     614.4kbps    180.8kbps      195.2kbps
32   768kbps     1228.8kbps   347.2kbps      361.6kbps
64   1536kbps    2457.6kbps   694.4kbps*     694.4kbps

* Note that the calculation is different as we need to deal with the trunksize of 1240, so effectively we’re going to be transmitting the same data as for a 10ms trunkfreq.
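The trunked rows of the table can be regenerated with the sketch below.  It is my own reconstruction of the arithmetic, not anything taken from Asterisk: it assumes 36 bytes of headers per trunk packet, 26 bytes per call, at most 47 calls per packet, and that each call contributes one payload per 20ms regardless of trunkfreq.  The single-call row is left out because it uses mini frames rather than trunking.

```python
def iax_trunk_kbps(calls, trunk_ms=20, payload=20, max_per_frame=47):
    """Estimated kbps for `calls` trunked G.729 calls."""
    sends = 20 // trunk_ms                            # transmissions per 20ms
    per_send = -(-calls // (max_per_frame * sends))   # packets per transmission
    frames = sends * per_send                         # packets per 20ms
    total_bytes = 36 * frames + (6 + payload) * calls
    return total_bytes * 50 * 8 / 1000


print(" N  SIP raw  SIP DSL  IAX 20ms  IAX 10ms")
for n in (2, 3, 4, 8, 16, 32, 64):
    print(f"{n:>2} {24 * n:>8.1f} {38.4 * n:>8.1f} "
          f"{iax_trunk_kbps(n):>9.1f} {iax_trunk_kbps(n, trunk_ms=10):>9.1f}")
```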

As can be seen, at higher concurrencies the difference between 10ms and 20ms becomes less significant, but even at lower concurrencies it’s not that major.  It basically reduces the effective call concurrency at a given bandwidth by about 1.

A little math gets us a formula for the number of calls given the available bandwidth (this doesn’t take the cell size into account, and all answers need to be verified against the above if cell size is an issue):

N = ((bw / 8 / f) - 36) / (s + 6)


bw = available bandwidth in bits per second, eg, 128000 for a 128kbps link.
f = frequency of transmissions (50)
s = payload size (20).

In the case of trunkfreq of 10ms just use 72 instead of 36.

So if you’ve got a 1Mbps (1000000bps) link you should be able to get 94 concurrent calls (20ms trunkfreq).  This however ignores the 1240 limit, so let’s rather assume a trunkfreq of 10ms in this case, which gives us 93 concurrent calls.  Even that overflows the 47 calls per frame, so at this stage we either need to tweak the trunkfreq further, or rely on the trunkmtu option to fragment (or on IP fragmentation).  So instead of 72 let’s rather just double up: every 10ms we transmit two trunked frames, so instead of 72 we use 144, and we end up with 90 concurrent calls on 1Mbps.  For SIP this would have been 41 calls.
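The formula and the 1Mbps figures can be checked with a few lines of Python.  Again a sketch only: `max_calls` and its parameter names are mine, and the result is rounded down since a fraction of a call isn’t much use.

```python
def max_calls(bw_bps, header_bytes=36, f=50, s=20):
    """N = ((bw / 8 / f) - header_bytes) / (s + 6), rounded down."""
    bytes_per_interval = bw_bps // (8 * f)
    return (bytes_per_interval - header_bytes) // (s + 6)


link = 1_000_000  # 1Mbps
print(max_calls(link))                    # one trunk frame per 20ms
print(max_calls(link, header_bytes=72))   # 10ms trunkfreq
print(max_calls(link, header_bytes=144))  # two trunk frames every 10ms
print(link // 24000)                      # SIP at 24Kbps per call, for comparison
```

This prints 94, 93, 90 and 41, the same numbers as in the text.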

It’s been shown that IAX can definitely handle significantly more calls than SIP on the same link.  Please note that these values may well vary with the actual networking conditions, and that they are guidelines only.  If you give your clients any figures, they need to be told that these are estimates only and cannot be guaranteed unless the underlying bandwidth can be guaranteed.
