The very first thing to note is that SIP was NOT designed to work with NAT. There are subsequent standards, hacks, workaround, kludges etc. to try and make it work but the original SIP designers somehow deemed it beneath them or put it in the too hard basket to bother coming up with a proper solution (there is not one instance of the string “NAT” in the whole SIP RFC).

“So what” you may be saying. Well if you’re bothering to read this it’s probably because you are either having or have had audio problems with your SIP VoIP phone. That’s purely down to this massive oversight by the SIP designers. If you look at Skype on the other hand their proprietary protocol has a much smaller incidence of audio problems. The Skype protocol designers went to great lengths to come up with a pragmatic design (of course it was in their interests since they were aiming to make a profit). They even went so far as to enable their traffic to be tunneled through HTTPS proxies so that calls would have a good chance of working behind a corporate firewalls; no chance of that with SIP even now. The SIP designers in their wisdom didn’t even bother to cope with the average home broadband connection.

If someone was to add up the cost in engineering man hours, user frustration and faulty VoIP calls as a consequence of the SIP standard it would be astronomical. As someone who has run a VoIP company in the past I’d estimate that somewhere between 30 to 50% of all support issues are due to one way audio or other NAT related problems.

Right that’s the rant out of the way now on to explain the technicalities of the problem particularly in relation to the sipsorcery service.

The first thing to look at is how a VoIP SIP call is supposed to work in an ideal scenario (which is the only one the SIP standard bothers to accommodate).

SIP Call - Ideal Scenario

SIP Call - Ideal Scenario

In the above diagram the end user SIP device and the SIP server are both on public IP addresses and everything is fine and dandy. To understand the diagram and subsequent ones the legend is:

  • The grey boxes on either side represent the Session Description Protocol (SDP) payloads that are carried in the SIP INVITE requests and responses,
  • The red circles over the grey boxes highlight the critical information within the SDP which is the IP address and port number that the sending device is going to be using for sending and receiving its RTP,
  • The blue lines represent a SIP transmission,
  • The green line represents an RTP stream,
  • A red line represents an RTP stream that could not be established,
  • Public or Private indicate the type of IP address the server or user agent are using.
  • Also as some of the diagrams used in this post get fairly wide and I haven’t spent the time to work out how to widen the columns in the blog software a larger version of the images is available here.

    In the ideal scenario both ends of the SIP call place a publicly accessible IP socket in their SDP and the device at each end of the call has no issues sending and receiving to and from the other’s socket and all is good.

    About the only time you come across the ideal scenario shown above is for SIP trunks between two VoIP Providers. The average residential and business internet connection uses a NAT and that changes the landscape for a SIP call subtly in appearance but dramatically in effect.

    NAT Scenario Basic

    NAT Scenario Basic

    The key point now is that the SIP Phone on the left is operating on a private IP address and that’s what it has placed in its SDP. The call proceeds the same as in the ideal scenario but when the SIP device at the other end, in this case a SIP Softswitch, attempts to send RTP to the phone it can’t because the SDP contains a private address which is not routable on the public internet.

    This diagram represents the classic one-way audio situation. The person on the IP Phone can’t hear the person on the other end of the call. The person on the server side can hear the person on the IP Phone since the phone is happily sending RTP to the server’s public SDP socket.

    You may want to take a break or grab a coffee at this point. If you thought it was hard understanding things so far it only gets worse!

    For SIP to get used in the real World it obviously had to overcome the NAT problem shown in the previous diagram (I probably shouldn’t say obviously as it doesn’t appear to have been that obvious when SIP was being devised). There are actually a number of different ways that NAT can be overcome with SIP but they fall into two categories:

  • The first category is where SIP devices on private IP addresses attempt to determine their public IP address and then use that in their SDP instead of their private address. STUN is one protocol designed for this purpose. Some devices let the user manually specify the public IP address and there are other mechanisms. It doesn’t matter so much how the SIP device gets its public IP address just that it places it into the SDP when making or answering a call,
  • The second category is where SIP Servers will attempt to cope with clients sending them SDP packets with private IP addresses normally be replacing the private address with the public address the packet came from.
  • The important thing to realise is that neither of these mechanisms is foolproof. And that’s worth repeating: there is no 100% foolproof mechanism that can guarantee a SIP call can cope with NAT. Although most of the time it can. The reason there isn’t a guaranteed mechanism is because of the nature of NATs and more specifically NATs using Port Address Translation (PAT). It’s explained a bit more further on but if a NAT translates the port on the outgoing RTP stream of SIP device then it means the port that was set in the SDP is now wrong and sending RTP to the requested socket will fail and result in one-way audio.

    Lets look at one of the mechanisms from the second category that a SIP Server can use to cope with a call from a private device.

    NAT Handling - Server Mangling

    NAT Handling - Server Mangling

    In this diagram the small unreadable text on the right is explaining how the SIP Server is configured to recognise private IP addresses in the SDP and replace them with the IP address the request was received on. The sipsorcery server does exactly that. The problem is that it’s not a particularly robust mechanism. If for example a SIP Proxy is in between the end device and the SIP Server doing the mangling then the public IP address of the Proxy will be placed into the SDP and the RTP will never reach the end device. Or as occasionally happens a faulty NAT will actually leave the source IP address of the packets it transmits as a private IP address giving the SIP Server the choice between a private SDP address and a private origination address (which are probably the same so no choice really).

    The other more common thing that breaks a SIP Server’s attempt at using the request origination address for RTP is the one alluded to previously, PAT.

    NAT Mangling - Broken by PAT

    NAT Mangling - Broken by PAT

    In the diagram above the SIP Server has correctly detected the phone’s public IP address and has attempted to send its RTP packets there. However because the NAT in front of the phone has performed a port translation on the phone’s RTP stream the NAT has no mapping for the socket the Server is attempting to send to and simply drops the packets. The result is again one-way audio.

    So mangling does help and is better than nothing it doesn’t always work depending on what type of NAT device is in front of your phone. Because the sipsorcery service only deals with SIP, and not RTP, mangling is the ONLY thing it can do. There is no other magic it can do to try and get the RTP streams connected up. The best advice for one-way audio and using sipsorcery is to try and set up your router to NOT do port translations on the range your phone uses for RTP.

    The next mechanism a SIP Server can use to cope with NAT is to forget about packet mangling and reflect back RTP to whichever socket it receives on.

    NAT - RTP Reflection

    NAT - RTP Reflection

    In the above diagram the SIP Server will start off sending RTP to whatever socket is specified in the call request’s SDP irrespective of whether it’s a private address or not. Then as soon as it receives an RTP packet from the other end it will assume that is the socket it should be sending its own RTP to and switch to that. This is the mechanism Asterisk uses when you set nat=yes on a SIP account. It does pose a potential security hole in that an attacker could monitor the SIP traffic and then try and get an RTP packet to the SIP Server before the genuine device and thus hijack the RTP stream. In practice there are easier ways to break into SIP systems so it’s unlikely an attacker would bother with that approach under normal circumstances.

    This reflection mechanism is better than mangling because it gets around any port translation the NAT in front of the end user’s phone may have done. As mentioned above the sipsorcery server cannot use this mechanism since it never sees any RTP however when the SIP Server at the destination end of the call is using this mechanism the sipsorcery server doesn’t need to do any NAT handling anyway. If you’re having one-way audio problems it’s not a bad idea to try and find a SIP Provider that has their servers configured to use the RTP reflection mechanism. It saves you having to fiddle with your router and in practical terms is going to cope with most NATs.

    Generally speaking where one end of the call is a SIP Server on a public IP address one-way audio problems should be resolvable. The cases where they are not usually involve either a faulty NAT (there are more around than you would think) or where the a phone is behind multiple NATs. The latter can occur when ISPs run transparent NATs on their network because they are short of IP addresses or for some other reason. In theory multiple NATs should also be coped with by the RTP Reflection handling but in practice as the number of NATs on the audio path increases above one the risk of audio problems seems to rise exponentially!

    The other common situation with sipsorcery users is where there is no SIP Server in the call and instead the call is between two end user devices. Most people think that this will be a simpler situation and their should be less chances of audio problems but that’s not the case and in fact it’s the opposite. Now instead of having one device on a private IP address their will generally be two.

    NAT - User Agent to User Agent

    NAT - User Agent to User Agent

    The above diagram illustrates a call between two sipsorcery users where each user’s phone is on a private network. In this case both user’s have NATs that are doing port translation and neither of the RTP streams get through so neither user hears anything. More common is that only one of the user’s NATs do port translation so one of them will get audio and the other won’t. In this situation the success of the call depends on both the sipsorcery server being able to mangle the SDP so it contains the public IP address and also that the NATs involved do NOT do port translation. If either of those conditions are not met then one or both the RTP streams and therefore audio streams will fail.

    I do have more diagrams and explanations for NAT scenarios around locally installed versions of a sipsorcery server which are even more complicated since now not even the server is on a public IP address. However I’ll leave them for the next post.

    The best advice I can give to anyone having consistent audio issues on VoIP calls is to google their router model and see if there are any other people having the same issue and if they were able to fix it. If that doesn’t yield anything try and borrow a different router from a friend and see if the audio is better with that. If it is I’d personally replace the router as the long term frustration of audio problems on calls far outweighs $100 or less on a new router.