Continuing on from my rant on P2PSIP, I have been doing a bit more reading of the various NAT-related standards and solutions.

The most popular NAT traversal mechanism I have come across in my time working with SIP is STUN. I thought I knew what STUN was, Simple Traversal of User Datagram Protocol (UDP) Through Network Address Translators (NATs) (RFC 3489, 47 pages), and have even implemented a rudimentary server for that protocol as part of the SIPSorcery project. However, during my reading and much to my surprise, I came across a different STUN protocol, Session Traversal Utilities for NAT (RFC 5389, 51 pages), which obsoleted the previous STUN “proposed standard” (most IETF proposed standards never get to the official standard stage). To my mind it’s crazy to introduce a new standard with exactly the same acronym as an earlier standard. Even adding a 2 to give STUN2 would at least give implementers and users a way to differentiate between the two; as it is, there is going to be a lot of confusion. I had a good understanding of the purpose of the original STUN standard: to allow an application to determine its public IP address and get an indication of the behaviour of the NAT it is operating behind. The purpose of the new STUN standard, which I will refer to as STUN2, is a lot less clear. From a quick reading it has added NAT keep-alives and connection checking, and it now also includes TCP NAT handling mechanisms. My initial thought when I saw the new STUN2 standard was: great, maybe a more robust solution has been found to cope with situations where STUN fails. Unfortunately that’s not the case. Instead it seems to me that there are no real enhancements in STUN2 and that it just handles a few more esoteric edge cases that are more of a solution looking for a problem.
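To show how little is involved in the “what is my public IP address” part, here is a minimal Python sketch of a STUN Binding Request and the parsing of its response. It uses the newer message layout (the one with the fixed magic cookie) and also accepts the plain MAPPED-ADDRESS attribute that older servers return. The function names are my own invention for illustration, it only handles IPv4, and the server you point it at is whatever STUN server you have access to; treat it as a sketch rather than a complete client.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442          # fixed value in the newer STUN header
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
ATTR_MAPPED_ADDRESS = 0x0001       # plain address (older servers)
ATTR_XOR_MAPPED_ADDRESS = 0x0020   # address XOR-ed with the magic cookie


def build_binding_request(transaction_id=None):
    """Return (packet, transaction_id) for a Binding Request with no attributes."""
    tid = transaction_id or os.urandom(12)
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + tid, tid


def parse_binding_response(packet, transaction_id):
    """Return the (ip, port) the server saw us as, or None if it is not there."""
    if len(packet) < 20:
        return None
    msg_type, length, cookie = struct.unpack("!HHI", packet[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    if packet[8:20] != transaction_id:
        return None
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", packet[offset:offset + 4])
        value = packet[offset + 4:offset + 4 + attr_len]
        is_address = attr_type in (ATTR_MAPPED_ADDRESS, ATTR_XOR_MAPPED_ADDRESS)
        if is_address and len(value) >= 8 and value[1] == 0x01:  # 0x01 = IPv4
            port, addr = struct.unpack("!HI", value[2:8])
            if attr_type == ATTR_XOR_MAPPED_ADDRESS:
                port ^= MAGIC_COOKIE >> 16
                addr ^= MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        offset += 4 + attr_len + (-attr_len % 4)  # values are padded to 4 bytes
    return None


def query_public_address(server_host, server_port=3478, timeout=2.0):
    """Ask a STUN server (host supplied by the caller) for our public endpoint."""
    packet, tid = build_binding_request()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server_host, server_port))
        data, _ = sock.recvfrom(2048)
    return parse_binding_response(data, tid)
```

Calling query_public_address with a reachable STUN server returns the IP address and port the server saw the request arrive from. Comparing that port with the local socket’s port is also a crude way of seeing whether the NAT preserves ports.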

The standard that is touted as the silver bullet for NAT and SIP, and that should therefore solve all one-way audio issues (aside from codec incompatibility ones), is the “expired proposed standard” Interactive Connectivity Establishment (ICE) (119 pages). ICE makes use of STUN2 and yet another proposed standard, Traversal Using Relays around NAT (TURN) (81 pages). In simple terms ICE states that an attempt will be made to establish a media connection using STUN2 and, if that fails, the media will be proxied via a TURN server. As I’ve blogged previously, proxying media is a BAD BAD solution: it limits the features that can be used on a session to the lowest common denominator between the user-agents and the proxy server rather than just the user-agents; it introduces latency and quality degradation into the media path; it introduces security concerns; and the list goes on. It’s worth noting again that as video begins to replace voice those factors will be exacerbated. The classic example of this is Skype. They have arguably the best widely deployed VoIP protocol on the internet and they also use an ICE-equivalent mechanism to deal with NAT. When a direct connection cannot be established between two Skype callers the media is proxied through a super-node. That works well for voice, but with video things are not so rosy. Anecdotal evidence (admittedly from a very small sample set of the people I know using Skype) has shown that Skype video calls almost always break or chop up after 5, 10 or 15 minutes.

The crazy thing about the whole situation is that the burgeoning explosion of standards to deal with NAT for SIP is all the result of one very, very big design failure: the lack of IPv4 addresses. Of course VoIP and SIP are not the only protocols that have to deal with NAT. FTP is another protocol that has real problems, and in fact there are very few application protocols that are not impacted by NAT in some way. The sequence of events has been: IPv4, with its huge design flaw, was adopted as the standard network protocol on the internet; to overcome the shortage of IPv4 addresses NAT was adopted so that ISPs could continue to sell internet connections; new application protocols (such as SIP) failed to fully accommodate NAT and were not able to work robustly with NATs; more application protocols (such as STUN) were introduced to help other application protocols overcome their NAT handling deficiencies; yet more application protocols (STUN2) were introduced to fix the earlier application protocols that failed to help the other application protocols with their NAT handling deficiencies. It’s like an inverted pyramid with the IPv4 design flaw at the bottom and the size and effort of the solutions to NAT and application protocol flaws growing wider and wider. It could just be me, but every time I think about it or see people on VoIP forums getting frustrated with one-way audio issues I struggle to comprehend how the situation has been allowed to reach this point. Sure, IPv6 is a massive expense in time and effort to implement, but it could actually be the silver bullet in this case.

Back to reality. After shaking my head at the thought of implementing more proposed standards in SIPSorcery to solve a few more edge cases but not really help that much, I went looking for some empirical data on how bad the one-way audio problem is for SIP. I found a NAT Tester Site that has managed to collate a survey of over 1360 NAT devices. If a SIP user-agent is used in conjunction with a STUN server and the NAT it is behind preserves the port (see the preserves column in the results table), then in theory one-way audio problems will not occur. Out of the 1360 devices there are 173 that are listed as not preserving the port; that’s 12.7% of devices. That means ICE, STUN2, TURN and whatever further standards are sure to follow are being written and implemented for roughly 1 in 8 devices! Being the lazy (make that pragmatic) programmer that I am, I’m very disinclined to implement new features that are only beneficial to such a small number of users.

My recommended solution to anyone experiencing persistent one-way audio issues is to forget about all the solutions except STUN (that’s STUNv1, Simple Traversal of User Datagram Protocol). If STUN doesn’t fix the problem for you then throw away your router and get a new one. Routers are cheap enough these days that the time spent stuffing around trying to work around a crap one is just not worth it. The ideal type of router for VoIP/SIP is a full cone, port-preserving one. At all costs avoid symmetric and/or non-port-preserving ones. Check the survey results for a matching one AND then do a web search for the router model and “one way audio” just to make sure.

If you’re a SIPSorcery user you don’t even need to use STUN with your SIP device. The SIPSorcery application server will automatically replace private IP addresses in SDP payloads with the IP address that the INVITE request or OK response was received from. STUN achieves exactly the same outcome, but instead of relying on a SIP server the SIP client uses a STUN server to replace the private IP address BEFORE the INVITE request or response is sent. The SIPSorcery application server also has an additional feature that lets the mangling (a commonly used term for this kind of string replacement) be controlled from the dialplan. The ma dial string parameter allows the SIPSorcery application server mangling to be turned off, which is very useful when the call is between two SIP user-agents on the same private network. That’s something you can’t do with STUN; instead you have to hope the router is clever enough to substitute the private IP addresses back in for the public IP addresses, and in my experience consumer routers are not very clever.
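To make the mangling a bit more concrete, here is a minimal Python sketch of the idea. It is not SIPSorcery’s actual implementation: the function names and the enabled flag (standing in for what the ma dial string parameter controls) are hypothetical, and it only looks at IPv4 connection (c=) lines holding an RFC 1918 private address.

```python
import ipaddress

# The RFC 1918 private IPv4 ranges that commonly leak into SDP from behind a NAT.
PRIVATE_NETS = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]
CONNECTION_PREFIX = "c=IN IP4 "


def is_private_ipv4(text):
    """True if text is a literal IPv4 address inside an RFC 1918 range."""
    try:
        addr = ipaddress.ip_address(text)
    except ValueError:
        return False  # a hostname or something else we should leave alone
    return any(addr in net for net in PRIVATE_NETS)


def mangle_sdp(sdp, source_ip, enabled=True):
    """Swap private addresses on c= lines for the IP the SIP message came from.

    enabled=False returns the SDP untouched, e.g. for a call between two
    user-agents on the same private network.
    """
    if not enabled:
        return sdp
    lines = []
    for line in sdp.splitlines():
        if line.startswith(CONNECTION_PREFIX):
            if is_private_ipv4(line[len(CONNECTION_PREFIX):].strip()):
                line = CONNECTION_PREFIX + source_ip
        lines.append(line)
    return "\r\n".join(lines) + "\r\n"
```

For example, an SDP body containing c=IN IP4 192.168.1.10 that arrives from 203.0.113.7 comes out with c=IN IP4 203.0.113.7, while a body that already carries a public address is passed through unchanged.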