As a consequence of being stuck inside on a miserable Dublin December day, and being away from my development machine, I ended up spending the last few hours looking over some of the proposals that are around for Peer-to-Peer SIP (P2PSIP). My interest is mainly academic, I don’t have any work planned for sipsorcery in the area, but it does partly derive from contemplating how Goozimo will go about it in order to compete with Skype. One thing’s for sure: they won’t be wasting too much time with the proposed P2PSIP enhancements that are floating around in the IETF space, which take a bad situation, the existing bloated SIP standard, and make it much, much worse!

What’s my problem with the P2PSIP efforts? They’re building a house on foundations of sand. The efforts rely on a bloated set of SIP standards coupled with hacks, and even hacks on top of hacks, to overcome the original SIP standard’s shortcomings in dealing with NAT.

The core SIP standard document is 269 pages long and deals with six request types: ACK, BYE, CANCEL, INVITE, OPTIONS and REGISTER. There are additional standards to deal with what is considered core SIP functionality: REFER (aka transfers, 23 pages), INFO (9 pages), NOTIFY (38 pages) and more. Then there are the enhancement standards to fix the things the original SIP standard stuffed up: rport (13 pages) and PRACK (14 pages). And only then do the extensions and options, including P2PSIP, come in. The size and complexity of the core SIP standard and the excess of add-on SIP standards translate into big problems for implementors. A classic example is the SIP stack in the most popular VoIP server around, Asterisk: it’s taken a massive monolithic 27,000+ line C file to implement and has had some serious difficulties with even the core features; the best example is the 3+ years it took to write the SIP TCP implementation.

The P2PSIP efforts are taking a bad situation, the existing SIP standards and their implementation difficulties, and building on top of it to make things worse. It’s not only servers like sipsorcery and Asterisk that implement SIP stacks but also the manufacturers of hundreds of SIP phones, ATAs and softphones. In order to make a SIP feature work, a majority of implementors must put in the effort to implement and test it. As the effort required continues to snowball two things are likely to happen: alternative standards will be developed (Skype’s proprietary protocol is an example), and SIP device manufacturers will decide it’s all too hard and restrict themselves to the core standard and/or cherry-pick SIP add-on standards, thus creating big interoperability problems.

The other problem is that even with the reams of SIP and associated standards documents the NAT problem has not been solved for even the simplest call scenario; a very good and succinct explanation of the problem can be found here. As a consequence a further set of standards has sprung up to help SIP (or more correctly the media streams being initiated by SIP) cope with NAT: STUN, TURN and ICE being the most popular. The paradox is that in the most prevalent type of SIP call on the internet today, the call between an end-user and a SIP Provider, these NAT mechanisms are often completely ignored and instead the SIP Provider will reflect the media stream back to the socket the client sends from: not a particularly secure mechanism but a pretty robust one, and certainly far superior to those offered by STUN, TURN and ICE. So all these NAT standards really do is add to the implementation effort for SIP device manufacturers, and while they help in some cases they don’t definitively solve the problem, ending up creating more confusion for poor users trying to ascertain why their calls have one-way or no audio. A side effect I have observed of the failure of the NAT coping mechanisms is that on public forums dealing with SIP services people suggest STUN settings for completely unrelated problems such as caller ID.
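
The reflection trick the SIP Providers use (often called symmetric RTP, or "comedia") is simple to sketch: ignore the address the client advertised in its SDP and instead latch onto whatever source socket its packets actually arrive from. A minimal sketch, simulated on localhost with made-up payloads rather than real RTP framing:

```python
import socket

# The provider's media socket and a "client behind NAT", both on localhost.
provider = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
provider.bind(("127.0.0.1", 0))

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.bind(("127.0.0.1", 0))

# The SDP would have advertised an unreachable private address like this one;
# a symmetric RTP server simply never uses it.
advertised_in_sdp = ("192.168.1.10", 8000)

# The client sends first, which is what makes the trick work: the provider
# learns the client's public (NAT-mapped) socket from the packet itself.
client.sendto(b"rtp-packet-1", provider.getsockname())
data, observed_source = provider.recvfrom(1024)

# Reflect the return media to the observed source, not the SDP address.
provider.sendto(b"rtp-reply", observed_source)
reply, _ = client.recvfrom(1024)
```

The obvious weakness, as noted above, is security: the provider trusts whichever socket sends first, which is why it is robust against NAT but not against spoofing.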

The foundations of sand consist firstly of a bloated set of SIP standards that are difficult and error-prone to implement, and secondly of a set of standards for dealing with NAT that are not required in the majority of cases and fail to work in a large number of the remaining ones.

Those are the foundations the P2PSIP efforts are proposing to build on. P2P networks are difficult to design; the biggest problem is how to bootstrap peers into the network. To overcome it most P2P designs are actually hybrid P2P networks that rely on a central server for a number of critical functions. Napster is the original example of the hybrid P2P network, and of its failure. Napster facilitated mp3 file sharing, largely illegal sharing as it turned out, between peers in the network, and because it relied on a central server to allow peers to join, the authorities were able to shut it down easily. The networks that followed on from Napster, the likes of Gnutella, operated without a central server. Without one, peer-to-peer networks are very difficult to shut down, which is why these types of networks are still around today and largely immune to the authorities.

The P2PSIP documents propose a new type of hybrid P2P network that relies on a central server for bootstrapping. The P2P network in P2PSIP is primarily a storage layer utilising a Distributed Hash Table (DHT) approach. The DHT replaces the function of a SIP Registrar and SIP Proxy in a traditional “client-server” SIP network (the inverted commas are because SIP is not a client-server protocol; user agents assume client or server roles for certain operations) and is used to store the contact details of peers that have joined the network. In theory it’s not a bad idea: SIP registrations are a big burden on traditional SIP networks and offloading them to a peer-to-peer network would seem to have merit. It comes down to a trade-off between the server load for SIP registrations and the complexity of implementing yet another new SIP standard in hundreds of SIP devices. If the P2PSIP proposals were restricted to SIP softphones, which have the advantage of operating on flexible general purpose hardware, then it would be a debate with merit; but the strength of SIP is its universality, and unless a new proposal is practical for IP phones, ATAs, mobile device softphones etc. it should not even be considered. Another point: is a DHT even the best way to scale the storage of SIP location information? A standard called ENUM already exists that utilises DNS, a proven scalable storage service, for location information. SIP user agents already need more sophisticated DNS stacks than normal in order to process the SRV records that yet another supplementary SIP standard, A DNS RR for specifying the location of services (DNS SRV), relies on; in this case at least it’s a standard that is already well supported by existing implementations.
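
The registrar-replacement idea boils down to hashing a SIP address-of-record onto a key space and making the peer "closest" to that key responsible for storing the contact binding. A minimal consistent-hashing sketch of that lookup, with made-up node names and AoR (real DHT designs like Chord add finger tables, replication and churn handling on top of this):

```python
import hashlib

def dht_key(value: str) -> int:
    """Map a string onto a 160-bit key space, Chord-style."""
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)

# Hypothetical peers already placed on the ring by hashing their identifiers.
nodes = sorted(dht_key(n) for n in ["node-a", "node-b", "node-c", "node-d"])

def responsible_node(key: int) -> int:
    """The successor node: the first node ID clockwise from the key."""
    for node_id in nodes:
        if node_id >= key:
            return node_id
    return nodes[0]  # wrap around the ring

# Instead of REGISTERing with a central server, the binding for an AoR is
# stored on (and later looked up from) whichever peer its hash maps to.
aor = "sip:alice@example.com"
store_at = responsible_node(dht_key(aor))
```

Every peer doing this lookup independently arrives at the same storage node, which is what lets the DHT stand in for a registrar, at the cost of every device having to implement it.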

Going back to the original question of how Goozimo will implement a peer-to-peer SIP mechanism in order to compete with Skype, my guess is that they won’t go near the current P2PSIP efforts with a barge pole. Scaling server side is something Google are experts at, so they’ll handle the SIP registrations using the existing mechanism. How they will deal with media and NAT is the big question. In fact it’s always the question when it comes to SIP and NAT. The paradox in this case is that the solution won’t be found by only considering SIP and NAT: NATs are already deployed everywhere and have to be considered a fixture, and as discussed above SIP is already complicated enough. Instead the media layer, in SIP’s case usually RTP, has to get smarter. The solution is not TURN, which involves proxying the media, or even a Skype-like mechanism that uses a more scalable approach with super nodes doing the proxying. Media streams are only getting larger with video and conferencing, and it’s not scalable to proxy them; nor is it desirable, as every extra hop the media goes through adds latency and potential degradation. The solution is to introduce a mechanism into the media-carrying protocols that makes them able to cope with NAT instead of ignoring it. It may mean the media protocol has to become aware of the signalling protocol, or at least some services offered by it, something which is undesirable from a design point of view given a clean separation of layers between protocols. However if it’s a means to fix the problem where all previous attempts have failed then violating a design principle is worth it.
Apart from the time it has taken to write this paragraph I have not put any thought into what such a solution would even look like. Perhaps some kind of broker service offered by the signalling layer, where the media protocol could send a single rendezvous packet to the signalling server so that the public media socket is known and can then be used in the call request. Perhaps the signalling and media protocols could be multiplexed over a single socket, although that would be a big change and I suspect a portion of NATs would fail to cope properly with a single private socket mapping to multiple public sockets. Fingers crossed the engineers at Goozimo will come up with not just a solution but a good solution, and then use Google’s muscle to push it on the industry and remedy the abysmal vision of the SIP designers.
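
To make the broker idea slightly more concrete, here is a minimal sketch of the single rendezvous packet: the media socket sends one packet to the signalling server, which records the source socket it observed and hands it back so it can be advertised in the call request. This is essentially what a STUN binding request does, except offered by the signalling server itself; the addresses and message format below are made up, and everything runs on localhost:

```python
import socket

# The signalling server's rendezvous socket (simulated on localhost).
broker = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
broker.bind(("127.0.0.1", 0))

# The caller's media socket sends one packet so its public mapping is learned.
media = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
media.bind(("127.0.0.1", 0))
media.sendto(b"rendezvous", broker.getsockname())

# The broker records the public source socket it observed...
_, public_media_socket = broker.recvfrom(1024)

# ...and echoes it back, so the caller can place it in the call request
# instead of its (possibly private, unreachable) local address.
broker.sendto(("%s:%d" % public_media_socket).encode(), public_media_socket)
learned, _ = media.recvfrom(1024)
```

The appeal over STUN is that the mapping is learned by the very server that will relay the call request, so the advertised socket is the one the NAT has actually opened towards it.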