The missing link: VNF to SDN
NFV and SDN both play key roles in the ongoing evolution of telco networks toward flexible, programmable and composable services running on generic hardware platforms. The relationship between NFV and SDN in this picture is quite complex, but most experts would describe it in terms of a layered architecture in which SDN takes care of Layers 1 through 3 and NFV addresses services at Layer 4 and above. In a sense, virtualized network functions (VNFs) sit on top of a programmable network fabric provided by SDN. And by “SDN” in this context, we mean the end-to-end network fabric, comprising both the physical network functions that support wide area links and the virtualized network fabric that exists in data centers.
The ETSI NFV architecture does not clearly identify SDN controllers as a separate function. Instead, they are an implicit part of the virtual infrastructure manager (VIM) and virtual network elements in this architecture. The VIM is driven by the orchestrator and by VNF managers (VNFMs) in support of VNF deployment and life-cycle management. So when the orchestrator wants to deploy a new VNF, it instructs the VIM to program the SDN fabric to provide the required virtual network connectivity for that VNF. And when a VNF is scaled out under the control of a VNFM, the VNFM instructs the VIM to make the necessary changes to virtual network connectivity to accommodate the increased population of VNF component instances.
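The division of labour described above can be sketched as a small Python model. This is not the ETSI interface set: the class and method names (`Orchestrator.deploy`, `Vnfm.scale_out`, `Vim.create_instance`, `SdnController.attach`) are hypothetical, and the only point being illustrated is that every path to the SDN controller runs through the VIM, so the controller only ever hears about deployment and scaling.

```python
from dataclasses import dataclass, field


@dataclass
class SdnController:
    """Hypothetical SDN controller: records which instance is attached to which virtual network."""
    attachments: dict[str, set[str]] = field(default_factory=dict)

    def attach(self, instance: str, network: str) -> None:
        # Program the fabric so that `instance` has connectivity on `network`.
        self.attachments.setdefault(network, set()).add(instance)


@dataclass
class Vim:
    """Hypothetical VIM: the SDN controller is an internal detail, as in the ETSI picture."""
    sdn: SdnController = field(default_factory=SdnController)
    instances: dict[str, list[str]] = field(default_factory=dict)

    def create_instance(self, vnf: str, networks: list[str]) -> str:
        members = self.instances.setdefault(vnf, [])
        instance = f"{vnf}-{len(members)}"
        members.append(instance)
        for net in networks:
            self.sdn.attach(instance, net)
        return instance


class Orchestrator:
    def __init__(self, vim: Vim) -> None:
        self.vim = vim

    def deploy(self, vnf: str, networks: list[str]) -> str:
        # Initial deployment: the VIM programs connectivity for the first instance.
        return self.vim.create_instance(vnf, networks)


class Vnfm:
    def __init__(self, vim: Vim) -> None:
        self.vim = vim

    def scale_out(self, vnf: str, networks: list[str], count: int) -> list[str]:
        # Scale-out: each new component instance gets connectivity via the VIM.
        return [self.vim.create_instance(vnf, networks) for _ in range(count)]


vim = Vim()
Orchestrator(vim).deploy("vsbc", ["signalling", "media"])
Vnfm(vim).scale_out("vsbc", ["signalling", "media"], count=2)
print(sorted(vim.sdn.attachments["media"]))  # ['vsbc-0', 'vsbc-1', 'vsbc-2']
```

Note that no VNF holds a reference to the `SdnController`; that absence is exactly the missing link the rest of this article argues for.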
This architecture implies that all of the inputs to the SDN controller are associated with VNF deployment and life-cycle management. We believe that this misses an important trick: VNFs themselves can have good reasons to program the network. We have identified two clear use cases where our own VNFs could usefully interact directly with the SDN controller, and there are likely to be many more.
So let’s describe the use cases that we’ve identified.
The first has to do with DDoS attack protection provided by session border controllers (SBCs). SBCs such as our Perimeta product provide specialized security gateway functions at the edge of VoIP service providers’ networks. They recognize malicious traffic by matching incoming messages against the signatures of known attack vectors, blacklist the source IP addresses from which this traffic originates, and then discard all incoming packets from blacklisted addresses. This enables a well-designed SBC like Perimeta to handle its full rated load of legitimate signalling traffic while simultaneously filtering out DDoS attack traffic at multi-Gigabit line rates.
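A minimal sketch of the detect, blacklist and discard loop described above, assuming each packet arrives as a source address plus raw payload bytes. The two entries in `ATTACK_SIGNATURES` are illustrative placeholders and `BlacklistFilter` is a hypothetical name; this is not Perimeta’s implementation, which matches against a far richer signature set.

```python
import ipaddress
import re

# Illustrative attack signatures only; a production SBC matches a far larger, curated set.
ATTACK_SIGNATURES = [
    re.compile(rb"^OPTIONS sip:100@"),             # hypothetical scanner probe pattern
    re.compile(rb"User-Agent: friendly-scanner"),  # UA string often seen from SIP scanners
]


class BlacklistFilter:
    """Detect attack traffic by signature, blacklist its source, drop later packets from it."""

    def __init__(self) -> None:
        self.blacklist: set[str] = set()
        self.dropped = 0

    def handle(self, src: str, payload: bytes) -> bool:
        """Return True if the packet should go on to normal signalling processing."""
        addr = str(ipaddress.ip_address(src))  # normalize, e.g. compress IPv6 forms
        # Cheap check first: packets from blacklisted sources are discarded unexamined.
        if addr in self.blacklist:
            self.dropped += 1
            return False
        # Otherwise look for a known attack signature; a hit blacklists the source.
        if any(sig.search(payload) for sig in ATTACK_SIGNATURES):
            self.blacklist.add(addr)
            self.dropped += 1
            return False
        return True


sbc_filter = BlacklistFilter()
print(sbc_filter.handle("203.0.113.7", b"OPTIONS sip:100@198.51.100.1 SIP/2.0\r\n"))  # False
print(sbc_filter.handle("203.0.113.7", b"INVITE sip:alice@example.com SIP/2.0\r\n"))  # False
print(sbc_filter.handle("192.0.2.10", b"INVITE sip:bob@example.com SIP/2.0\r\n"))     # True
```

The set lookup runs before any signature matching, so once a source is blacklisted its packets cost only a hash probe. That per-packet lookup is still work done on general-purpose processors, which is the cost the next paragraphs propose moving into the network.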
So far, so good. But we could do better. Perimeta includes incredibly efficient software for filtering packets against an IP source address blacklist, but this software still consumes processing resources. First, filtering based on IP addresses could be performed more efficiently by specialized hardware, such as the hardware in the physical routers deployed at the data center edge and in the access network. Second, for Perimeta to do the blacklist filtering job, all of the malicious traffic has to be brought into the data center, and it could be argued that it would be better to filter this traffic out closer to its sources.
Suppose instead that Perimeta were to export an API for programming an external system to filter out packets from blacklisted source IP addresses. Perimeta would still do the work of identifying the sources of malicious traffic, but would then rely on the network to perform the filtering. This API would be consumed by an SDN controller, which would program suitable network elements with source address filters built from the IP address blacklist. DDoS traffic could then be dropped much closer to its sources, greatly improving the scalability of DDoS attack protection. And filtering could be performed in specialized hardware, with much greater efficiency than relying on x86 processors to do the job.
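Such an API might look something like the sketch below. Everything in it is an assumption for illustration: `SourceFilterApi`, `block_sources`/`unblock_sources`, `EdgeRouter.drop_acl`, the VNF identifier and the policy of installing the union of all requests on every edge router are hypothetical, not an existing SDN controller northbound interface. What it shows is the split of responsibilities: the SBC only reports attacker addresses, and the controller decides where the drop rules go.

```python
from dataclasses import dataclass, field
from typing import Protocol


class SourceFilterApi(Protocol):
    """Hypothetical API a VNF calls to ask the network to drop traffic by source address."""

    def block_sources(self, vnf_id: str, addresses: set[str]) -> None: ...

    def unblock_sources(self, vnf_id: str, addresses: set[str]) -> None: ...


@dataclass
class EdgeRouter:
    """Stand-in for a physical router whose hardware ACL drops listed source addresses."""
    name: str
    drop_acl: set[str] = field(default_factory=set)


@dataclass
class SdnController:
    """Hypothetical controller that implements SourceFilterApi for its edge routers."""
    edge_routers: list[EdgeRouter]
    requested: dict[str, set[str]] = field(default_factory=dict)

    def block_sources(self, vnf_id: str, addresses: set[str]) -> None:
        self.requested.setdefault(vnf_id, set()).update(addresses)
        self._sync()

    def unblock_sources(self, vnf_id: str, addresses: set[str]) -> None:
        self.requested.get(vnf_id, set()).difference_update(addresses)
        self._sync()

    def _sync(self) -> None:
        # Install the union of every VNF's requests on every edge router.
        wanted = set().union(*self.requested.values())
        for router in self.edge_routers:
            router.drop_acl = set(wanted)


class SbcBlacklistPublisher:
    """SBC side: detection stays in the VNF, filtering moves into the network."""

    def __init__(self, vnf_id: str, api: SourceFilterApi) -> None:
        self.vnf_id = vnf_id
        self.api = api

    def on_attacker_identified(self, src: str) -> None:
        self.api.block_sources(self.vnf_id, {src})


routers = [EdgeRouter("dc-edge-1"), EdgeRouter("access-edge-7")]
controller = SdnController(routers)
SbcBlacklistPublisher("sbc-1", controller).on_attacker_identified("203.0.113.7")
print([sorted(r.drop_acl) for r in routers])  # [['203.0.113.7'], ['203.0.113.7']]
```

A production design would also have to age out stale entries and fit within the limited ACL capacity of router hardware; the sketch ignores both.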
The second use case we’ve identified has to do with fault tolerance and high availability. The media relay component of a large-scale carrier-grade virtualized SBC (vSBC) may be in the media path for many thousands of concurrent voice or video sessions. The failure of such a component, if unprotected, would result in the loss of all these sessions, which is generally not acceptable. Therefore, vSBCs typically incorporate a high degree of fault tolerance, which is usually implemented by deploying the media relay function as an active/standby pair that shares a virtual IP address. If the active member of the pair fails, the standby member claims the virtual IP address, and all the RTP streams are quickly and automatically redirected over the LAN to the standby member.
This approach, which is very widely used today, suffers from two main drawbacks. Firstly, half of all the hardware resources assigned to the media relay function stand idle, just waiting to take over in the event of a failure. Secondly, the active and standby media relay functions have to be connected via a common L2 network segment. Some cloud networking architectures make this impossible, and it’s also hard to make a common L2 segment span two data centers, so geographic redundancy is much harder to achieve.
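A sketch of the conventional 1+1 scheme, assuming the standby takes over the virtual IP by announcing it on the shared LAN (for example with a gratuitous ARP, as VRRP-style implementations typically do). `MediaRelay`, `ActiveStandbyPair` and the addresses are hypothetical. The constructor check encodes the L2 constraint described above, and because every active member has its own dedicated standby, half of the relay capacity is idle by construction.

```python
from dataclasses import dataclass


@dataclass
class MediaRelay:
    name: str
    l2_segment: str  # the LAN the instance sits on
    healthy: bool = True


@dataclass
class ActiveStandbyPair:
    """Conventional 1+1 protection: whichever member is active owns the virtual IP."""
    vip: str
    active: MediaRelay
    standby: MediaRelay

    def __post_init__(self) -> None:
        # The new owner typically announces the VIP on the LAN (e.g. gratuitous ARP),
        # which only reaches hosts on the same L2 segment.
        if self.active.l2_segment != self.standby.l2_segment:
            raise ValueError("active and standby must share an L2 segment")

    def on_active_failure(self) -> MediaRelay:
        # The standby claims the VIP; RTP already sent to the VIP now reaches it.
        self.active.healthy = False
        self.active, self.standby = self.standby, self.active
        return self.active


pair = ActiveStandbyPair("198.51.100.20", MediaRelay("relay-a", "lan-1"), MediaRelay("relay-b", "lan-1"))
print(pair.on_active_failure().name)  # relay-b

try:  # geographic redundancy: the two data centers do not share a LAN
    ActiveStandbyPair("198.51.100.21", MediaRelay("relay-c", "dc-east"), MediaRelay("relay-d", "dc-west"))
except ValueError as exc:
    print(exc)  # active and standby must share an L2 segment
```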
If the vSBC control plane were able to interact with the underlying network via an SDN controller, it would become possible to overcome both of these limitations. In the event of the failure of a given media relay function instance, the SBC control plane could instruct the SDN controller to redirect all the RTP packets destined for the IP address of the failed instance to a different IP address associated with a backup media relay function instance. This would make it easy to protect against media relay function failure with an N+k pool, in which a small number (k) of standby instances protect a much larger number (N) of active instances, a far more efficient use of hardware resources than the pairwise approach. It would also eliminate the requirement that active and standby instances be connected to the same L2 network segment, allowing the vSBC to be deployed in L3-centric network architectures and supporting the live failover of RTP sessions between geographically separated data centers.
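A minimal sketch of N+k protection driven through an SDN controller. `SdnRedirector.redirect`, `MediaRelayPool` and all addresses are hypothetical, and the sketch assumes the spare can take over the failed instance’s session state; how that state reaches the spare is outside its scope. The `idle_fraction` helper makes the efficiency argument concrete: with illustrative values N = 20 and k = 2, the reserve is k/(N+k) = 2/22, about 9%, against 50% for a 1+1 pair.

```python
from dataclasses import dataclass, field


@dataclass
class SdnRedirector:
    """Hypothetical SDN controller API: steer traffic addressed to one IP towards another."""
    redirects: dict[str, str] = field(default_factory=dict)

    def redirect(self, original_ip: str, backup_ip: str) -> None:
        # A real controller would program the fabric (possibly across data centers)
        # so that packets addressed to original_ip are delivered to backup_ip.
        self.redirects[original_ip] = backup_ip


@dataclass
class MediaRelayPool:
    """N+k protection: k spare relays cover failures among N active relays."""
    sdn: SdnRedirector
    active: set[str]
    spares: list[str]

    def on_relay_failure(self, failed_ip: str) -> str:
        if failed_ip not in self.active:
            raise KeyError(failed_ip)
        if not self.spares:
            raise RuntimeError("no spare media relay available")
        backup_ip = self.spares.pop(0)
        # Endpoints keep sending RTP to failed_ip; the network steers it to backup_ip.
        # Assumes the spare already has (or can recover) the affected sessions' state.
        self.sdn.redirect(failed_ip, backup_ip)
        self.active.discard(failed_ip)
        self.active.add(backup_ip)
        return backup_ip


def idle_fraction(n_active: int, k_spare: int) -> float:
    """Share of media relay capacity held in reserve."""
    return k_spare / (n_active + k_spare)


sdn = SdnRedirector()
pool = MediaRelayPool(sdn, active={f"10.0.1.{i}" for i in range(1, 21)}, spares=["10.0.9.1", "10.0.9.2"])
print(pool.on_relay_failure("10.0.1.5"))  # 10.0.9.1
print(sdn.redirects)                      # {'10.0.1.5': '10.0.9.1'}
print(f"{idle_fraction(1, 1):.0%} vs {idle_fraction(20, 2):.1%}")  # 50% vs 9.1%
```

Because the redirect is expressed at L3 rather than by re-announcing a shared address on a LAN, nothing in this model requires the spare to sit on the failed relay’s L2 segment.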
There must surely be many more use cases like this in application realms outside SBC and IMS, but so far we have seen no evidence that the SDN controller community is waking up and recognizing this opportunity to leverage the power of software defined networks to maximize the value of NFV. Maybe that’s because the architecture diagrams we are all working to don’t show even a dotted line between VNFs and the SDN controller. We think it’s time to recognize that missing link.
Martin Taylor is chief technical officer of Metaswitch Networks. He joined the company in 2004, and headed up product management prior to becoming CTO. Previous roles have included founding CTO at CopperCom, a pioneer in Voice over DSL, where he led the ATM Forum standards initiative in Loop Emulation; VP of Network Architecture at Madge Networks, where he led the company’s successful strategy in Token Ring switching; and business general manager at GEC-Marconi, where he introduced key innovations in Passive Optical Networking. Martin has a degree in Engineering from the University of Cambridge. In January 2014, Martin was recognized by Light Reading as one of the top five industry “movers and shakers” in Network Functions Virtualization.