Asterisk – massively speeding up those REGISTER requests

So recently I started bumping into an issue where I would see a buildup of traffic in the RX queue of asterisk’s SIP port 5060 (udp bound). After some digging through the code I quickly came to realize that asterisk only processes a single incoming SIP request (or response) at a time. So I cooked up a rather crude patch (that I for the shame of it won’t share here) in an attempt to figure out what was going wrong.

I set an arbitrary limit to log SIP requests taking longer than 100ms to process and it very quickly became apparent that REGISTER packets were the only ones triggering this. I dug deeper and logged the time taken for each and every SIP packet, and it was quite scary to learn that the vast majority of packets were dealt with in under 100us (yes, micro-seconds), so why were REGISTERs orders of magnitude slower?

A few questions were raised and some flames thrown in #asterisk, and the blame went everywhere except the total lack of parallel processing in chan_sip: everything from heavy network IO and reverse DNS lookups right through to the OS got blamed. As it turns out, the astdb should have taken it. I got one of my clients, who’s an absolute wizard with MS Excel, to graph the badness of the RX queue for me, and it became clear that things were worse between 22:00 and 02:00 (when our backups run), but nowhere near perfect at other times. We were at times losing requests simply because there wasn’t sufficient buffer space for the socket!
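
For anyone wanting to watch the same queue on their own box, here is a minimal sketch (it is not the tooling we used for the graph). It assumes a Linux kernel that exposes /proc/net/udp in the usual layout, where the second field is the local address as HEXIP:HEXPORT, the fifth field is tx_queue:rx_queue in hex, and the thirteenth field counts dropped packets. The function name udp_rx_stats is made up for illustration.

```shell
# Print the receive-queue size (bytes) and drop counter for every UDP socket
# bound to the given port. Expects /proc/net/udp formatted lines on stdin.
# Runs in a subshell, so its variables do not leak into the caller.
udp_rx_stats() (
    # /proc/net/udp shows ports as 4 upper-case hex digits (5060 -> 13C4).
    port_hex=$(printf '%04X' "$1")
    while read -r sl laddr raddr st queues tr retr uid timeout inode ref ptr drops rest; do
        case $laddr in
            *:"$port_hex")
                # queues looks like TXHEX:RXHEX; keep the part after the colon.
                rx_hex=${queues#*:}
                echo "rx_queue_bytes=$((0x$rx_hex)) drops=$drops"
                ;;
        esac
    done
)
```

Run it as `udp_rx_stats 5060 < /proc/net/udp` (IPv6 sockets are listed in /proc/net/udp6 instead), for example once a minute from cron, and graph the numbers. A drops value that keeps climbing means the socket buffer is overflowing.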

SIP REGISTER requests typically took between 150 and 250ms when the system was not under heavy IO load. We logged numerous cases of over 300ms, and up to 2.6s during backup runs. Needless to say, this explained a few strange things we’ve experienced over the last few months.

We tested the theory by moving astdb.sqlite3 onto a tmpfs-mounted filesystem, and immediately the problem went away, and processing of REGISTER packets dropped to within reasonable (<1ms) ranges. After a quick perusal of the contents of the astdb we decided that the only important information we keep in there is SIP and IAX/2 registration information, and that should the server crash (OS level) we’re willing to lose that data. Our SIP clients mostly use SRV records so they can easily enough fall back to one of our other servers, and our IAX/2 clients are typically configured to register to multiple servers on our end, as well as to dial out via multiple servers, so they should not be a problem either. Our biggest concern is still the dropped calls when this happens. WARNING: Utilizing the procedure outlined below may very well end up losing your astdb data if you’re not careful!

So, to speed up astdb in order to give reasonable performance to asterisk’s chan_sip, this is what we did (NOTE THE WARNINGS ABOVE):

Firstly, we created a folder /var/lib/asterisk/astdb. We then added the following line to /etc/fstab (broken up here for ease of reading; in the actual file it has to be on a single line, since fstab has no line continuation):

none /var/lib/asterisk/astdb tmpfs \
  noatime,nodev,nosuid,noexec,mode=0750,uid=asterisk,\
  gid=asterisk 0 0

Then we issued “mount /var/lib/asterisk/astdb”. In /etc/asterisk/asterisk.conf, you need to add this:

[directories]
astdbdir => /var/lib/asterisk/astdb

And then came the patient part: first we had to wait for asterisk to have no calls (i.e. very, very late at night), then we did this:

cd /var/lib/asterisk/astdb
/etc/init.d/asterisk stop
cp -a ../astdb.sqlite3 .
/etc/init.d/asterisk start

After this your asterisk should be running again, but using astdb.sqlite3 on the tmpfs instance, which should result in MUCH (orders of magnitude, at least 200x in our case, from ~150ms best case to ~750us worst case) faster REGISTER request processing.
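
To double-check that the database really ended up on tmpfs and not on the underlying disk, something like the following should do. It is a small sketch that assumes a stat implementation supporting `-f -c %T` (GNU coreutils does); the helper names fs_type and on_tmpfs are made up for illustration.

```shell
# Print the type of the filesystem that holds the given path (e.g. "tmpfs").
fs_type() {
    stat -f -c %T "$1"
}

# Succeed only if the given path lives on a tmpfs filesystem.
on_tmpfs() {
    [ "$(fs_type "$1")" = "tmpfs" ]
}
```

For example: `on_tmpfs /var/lib/asterisk/astdb/astdb.sqlite3 && echo "astdb is on tmpfs"`. If that prints nothing, the mount did not happen or asterisk.conf is still pointing at the old location.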

Now, if you want your sqlite database to survive reboots, then I suggest putting these commands before the asterisk start in your init script, and the inverse after asterisk has stopped:

# Before start:
cp -a /var/lib/asterisk/astdb.sqlite3 /var/lib/asterisk/astdb/

# After stop:
cp -a /var/lib/asterisk/astdb/astdb.sqlite3 /var/lib/asterisk/
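
A slightly more defensive take on the same two copies, as a sketch only: it skips the restore when there is no on-disk copy yet, and refuses to overwrite the on-disk copy with a missing or empty file (which is what you would have if the tmpfs was never populated). The variable and function names (ASTDB_DISK, ASTDB_RAM_DIR, astdb_restore, astdb_save) are made up for illustration and the paths are the ones used in this post.

```shell
# Locations used in this post; override them to suit your layout.
ASTDB_DISK=${ASTDB_DISK:-/var/lib/asterisk/astdb.sqlite3}
ASTDB_RAM_DIR=${ASTDB_RAM_DIR:-/var/lib/asterisk/astdb}

# Before start: seed the tmpfs from the on-disk copy, if there is one.
astdb_restore() {
    [ -f "$ASTDB_DISK" ] || return 0
    cp -a "$ASTDB_DISK" "$ASTDB_RAM_DIR/"
}

# After stop: write the tmpfs copy back to disk, but only if it exists and
# is non-empty. Copy to a temporary name first, then rename, so that a
# half-written file never replaces the previous good copy.
astdb_save() {
    [ -s "$ASTDB_RAM_DIR/astdb.sqlite3" ] || return 0
    cp -a "$ASTDB_RAM_DIR/astdb.sqlite3" "$ASTDB_DISK.new" &&
        mv -f "$ASTDB_DISK.new" "$ASTDB_DISK"
}
```

Call astdb_restore just before the asterisk start and astdb_save just after the stop in your init script.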

You may also want to periodically make a live copy of the database. I’m not sure what’s required from sqlite’s perspective to make this transactionally safe and to avoid ending up with a corrupted copy. The sqlite documentation is most likely the place to look. For our purposes starting with a clean DB after every boot is good enough.
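
One option, pointed out in the comments below, is the sqlite3 command-line tool’s `.backup` dot-command. A minimal sketch, assuming the sqlite3 CLI is installed and that the paths contain no single quotes; the function name astdb_snapshot is made up for illustration, and I have not tested this against a busy asterisk:

```shell
# Take a consistent copy of a live sqlite database using sqlite3's .backup.
# The copy is written under a temporary name and renamed into place, so the
# destination is either the old snapshot or a complete new one.
astdb_snapshot() {
    src=$1
    dest=$2
    tmp="$dest.tmp.$$"
    if sqlite3 "$src" ".backup '$tmp'"; then
        mv -f "$tmp" "$dest"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

From cron this could be called as `astdb_snapshot /var/lib/asterisk/astdb/astdb.sqlite3 /var/lib/asterisk/astdb.sqlite3`, which keeps the on-disk copy reasonably fresh without stopping asterisk.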

4 Responses to “Asterisk – massively speeding up those REGISTER requests”

  1. thomas says:

    nice article! does this help speed up sip registrations with mysql realtime setup as well?

  2. Jaco Kroon says:

    Thanks! I honestly don’t know, I’ve not used mysql realtime, but I suspect it should. As I understand it the registration data will be retrieved from mysql, but the actual registration data will be stored to astdb still (perhaps back to mysql too). So whilst I can’t vouch for the data going back to mysql (this would depend a LOT on the mysql configuration) the astdb portion should still benefit.

  3. seik0 says:

    Asked in #asterisk for experience on two asterisk instances on one machine and people pointed you as Experienced One =). But cannot find any other place to write to you.
    I need to run two asterisk in order to not depend on db-connections failure. Second asterisk will be very simple, processing (as proxy, mainly) 2 e1, so they shouldn’t interfere. But if you have experience in configure, I would be very happy to hear what I should know to run two asterisks.
    If you see my email – write please.

  4. Don Viszneki says:

    We’re seeing Asterisk consistently writing 200KB/s to disk (according to iotop’s “DISK WRITE” column and atop’s “wrdsk” column,) with bursts approaching 4MB/s. We have less than 60 SIP clients, and this morning we’re calling at a rate of about 145 calls per hour.

    This combined with a consumer-grade HDD and Linux MD RAID5, and we seem to have bound Asterisk performance by its disk i/o (that’s probably a first.)

    We’ve determined asterisk only had three files open for writing. Two were log files and showed little writing according to tail -f. The remaining one was astdb. This is what brought us to your blog 🙂

    Our dialplan is written by FreePBX and includes a lot of database operations.

    For those wondering: sqlite3 can backup a database with its “.backup” command, so that will keep your database integrity intact while doing a hot-copy.

    We’re going to give tmpfs a shot 🙂