WebRTC and SBCs, Part 3: Encryption

This is the third in a series of blog posts about WebRTC.  It describes the encryption schemes used in WebRTC.  WARNING – DISCUSSIONS OF ENCRYPTION ARE NOT FOR THE FAINT-HEARTED.

  • Part 1 outlined the reasons why we expect to see a WebRTC gateway co-located with an SBC.
  • Part 2 provided an example end-to-end WebRTC solution including an SBC.

Originally, the WebRTC specs mandated that DTLS-SRTP be used to secure WebRTC communications.  However, the SDES-SRTP scheme is much more common amongst telecoms equipment today, and I expect it will start to be used alongside DTLS-SRTP in WebRTC as well.

In SDES-SRTP, symmetric keys are exchanged by the signalling protocol.  These keys are used to encrypt and decrypt the media.  For the media to be secure, the signalling protocol needs to be fully encrypted and trusted, e.g. using TLS.
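To make this concrete, here is a minimal sketch of how an SDES key travels in the signalling, as an SDP a=crypto attribute.  It assumes the AES_CM_128_HMAC_SHA1_80 suite (a 16-byte master key followed by a 14-byte master salt, base64 encoded); the function names are hypothetical and chosen purely for illustration.

```python
import base64
import os


def make_crypto_line(tag: int = 1) -> str:
    """Build an SDES a=crypto attribute carrying a fresh random key and salt."""
    # 128-bit master key followed by a 112-bit master salt (30 bytes in total).
    key_and_salt = os.urandom(16 + 14)
    inline = base64.b64encode(key_and_salt).decode("ascii")
    return f"a=crypto:{tag} AES_CM_128_HMAC_SHA1_80 inline:{inline}"


def parse_crypto_line(line: str):
    """Return (tag, suite, master_key, master_salt) from an a=crypto line."""
    prefix = "a=crypto:"
    if not line.startswith(prefix):
        raise ValueError("not an a=crypto attribute")
    tag, suite, key_param = line[len(prefix):].split()[:3]
    method, _, key_info = key_param.partition(":")
    if method != "inline":
        raise ValueError("unsupported key method")
    # Optional lifetime and MKI fields follow the key, separated by '|'.
    raw = base64.b64decode(key_info.split("|")[0])
    if len(raw) != 30:
        raise ValueError("unexpected key length for this suite")
    return int(tag), suite, raw[:16], raw[16:]


if __name__ == "__main__":
    offer = make_crypto_line()
    print(offer)
    print(parse_crypto_line(offer)[:2])
```

Anyone who can read that line can decrypt the media, which is why the signalling channel carrying it has to be encrypted.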

In DTLS-SRTP, the TLS exchange (strictly DTLS, the datagram variant of TLS) occurs in the media plane instead.

However, DTLS-SRTP still places some security requirements on the signalling.

  • A fingerprint is required so the endpoints can use self-signed certificates, instead of paying to get their certificates signed by a trusted third party.  On their own, self-signed certificate exchanges are susceptible to Man-In-The-Middle (MITM) attacks, so each endpoint sends a fingerprint of its certificate in the signalling for the other side to check.
  • The fingerprint is computed from the endpoint’s certificate using a one-way hash function, so it identifies the certificate without being reversible.
  • To prevent a MITM attack, the signalling must also be integrity protected, else a device on the path could tamper with the fingerprint.  However, the fingerprint isn’t secret, so full encryption isn’t required.  For example, the lightweight SIP Identity mechanism could be used without requiring full-blown TLS.
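The fingerprint check in the list above can be sketched as follows.  This assumes SHA-256 as the hash and the usual colon-separated hex form of the SDP a=fingerprint attribute; the byte strings stand in for DER-encoded certificates and the function names are hypothetical.

```python
import hashlib


def sdp_fingerprint(cert_der: bytes) -> str:
    """Format the SHA-256 hash of a DER-encoded certificate as an SDP attribute."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    pairs = [digest[i:i + 2] for i in range(0, len(digest), 2)]
    return "a=fingerprint:sha-256 " + ":".join(pairs)


def certificate_matches(cert_der: bytes, signalled_line: str) -> bool:
    """Check a certificate seen in the media plane against the signalled fingerprint."""
    return sdp_fingerprint(cert_der).upper() == signalled_line.strip().upper()


if __name__ == "__main__":
    # Placeholder bytes, not a real certificate (assumption for illustration).
    genuine = b"placeholder-der-bytes-of-the-real-certificate"
    attacker = b"placeholder-der-bytes-of-a-substituted-certificate"
    signalled = sdp_fingerprint(genuine)
    print(certificate_matches(genuine, signalled))
    print(certificate_matches(attacker, signalled))
```

A man in the middle who substitutes their own certificate fails this check unless they can also rewrite the fingerprint in the signalling, which is exactly what the integrity protection is there to prevent.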

Once the media pinholes have been set up, the standard TLS exchange occurs in the media plane.

  • The endpoints exchange certificates – probably using RSA to authenticate one another’s identity.
  • The endpoints use Diffie-Hellman to generate symmetric keys.
  • The symmetric keys are used for the duration of the call to encrypt and decrypt the media.
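The Diffie-Hellman step can be illustrated with a toy sketch.  This is not the DTLS handshake itself: the prime is far too small for real use, the generator and the SHA-256 key derivation are assumptions made for illustration, and a real DTLS-SRTP stack derives its SRTP keys from the handshake inside the DTLS library.  It only shows how two endpoints arrive at the same symmetric key without ever sending it.

```python
import hashlib
import secrets

# Toy parameters (assumption): a 127-bit Mersenne prime and a small generator.
P = 2**127 - 1
G = 3


def keypair():
    """Return a (private, public) Diffie-Hellman pair."""
    private = secrets.randbelow(P - 3) + 2  # uniformly chosen in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def shared_key(own_private: int, peer_public: int) -> bytes:
    """Derive a 32-byte symmetric key from the shared Diffie-Hellman secret."""
    secret = pow(peer_public, own_private, P)
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()


if __name__ == "__main__":
    alice_private, alice_public = keypair()
    bob_private, bob_public = keypair()
    # Only the public values cross the network.
    alice_key = shared_key(alice_private, bob_public)
    bob_key = shared_key(bob_private, alice_public)
    print(alice_key == bob_key)
```

Because only the public values are exchanged, an entity that merely observes the exchange, in the signalling path or the media path, does not learn the key.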

The DTLS-SRTP scheme follows from the philosophy of WebRTC not to define any particular signalling protocol and for media to flow directly between the endpoints in the call.  When used over the internet this is quite neat, as it means the entities on the signalling path are never able to decrypt the media.  However, it has significant drawbacks when interoperating with existing telecoms networks, where the media stream has to be decrypted at the edge of the telecoms core network.

  • Operators have to supply media data when mandated by Lawful Intercept requests.
  • To provide other advanced call services (e.g. PSTN breakout, announcements, voicemail, or transcoding from Opus to some other codec), the media needs to be decrypted.

Considering these requirements, SDES-SRTP becomes much more attractive than DTLS-SRTP.

  • Exchanging certificates for every media session is extremely performance intensive due to the asymmetric RSA operations.  In traditional VoIP equipment (using SDES-SRTP) this expensive operation occurs on a much less frequent per-subscriber basis.  So supporting DTLS-SRTP requires many more WebRTC gateways than supporting SDES-SRTP, which makes it more expensive to deploy.
  • Existing equipment already supports SDES-SRTP, not DTLS-SRTP.  So interoperability is much easier with SDES-SRTP.

Overall, as WebRTC matures and is used to interwork with existing networks, we expect WebRTC will support both schemes.  This can already be seen today – e.g. Google Chrome supports both schemes.  Any best-of-breed gateway will need to support both schemes as well.  Then it’s up to the endpoint developers whether to use SDES-SRTP (with its lower performance requirements and greater interop) or DTLS-SRTP (with its greater security for the media plane).

Further parts to follow…