Tackling the NFV Packet Performance Challenge

Network Functions Virtualization (NFV) is all about putting cloud technology to work to support software-based appliances that deliver network service capabilities.  Network functions come in two flavors: those that implement control plane functionality only (for example an IMS core or a Policy and Charging Rules Function) and those that implement user plane functionality, requiring them to handle user payloads such as voice, streaming video or Web pages.

Experience with virtualization of control plane functions has shown that existing IT-oriented cloud technology copes remarkably well.  Control plane functions are typically quite compute-intensive and not exceptionally network-intensive.  From the point of view of the virtualization infrastructure, they resemble conventional IT workloads, so it’s no great surprise that we don’t see any serious problems with their performance.

User plane functions are a great deal more challenging, because they are often extremely network-intensive.  A packet data gateway in a mobile network does relatively little work on each packet, but it has to shift an enormous number of them.  Experience has shown that the performance of these network functions can be severely constrained by limitations in the virtualization infrastructure.

Metaswitch’s Perimeta SBC is a good example of a network function that operates in both the control plane and the user plane.  It handles SIP signaling at the secure border of VoIP and IMS networks, and it also relays and processes RTP media streams that support voice or video calls. Perimeta’s media handling function is particularly demanding on the underlying virtualization infrastructure because it’s dealing exclusively with small packets – and lots of them.  A mid-sized session border controller handling 10,000 concurrent voice calls will be passing a million packets per second.  The data path between the physical network interface at each compute node and the virtual machine that’s running Perimeta’s media relay function needs to be able to pass two million packets per second to handle this load, since each packet has to make its way from the wire to the VM and back again.
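To see where these numbers come from, here is a quick back-of-the-envelope calculation.  It assumes a 20 ms RTP packetization interval (50 packets per second per stream), which is typical for voice; the exact rate depends on the negotiated codec and packetization time.

```python
# Back-of-the-envelope packet rates for a media relay handling voice calls.
# Assumes 20 ms RTP packetization (50 packets/sec per stream), typical for
# voice; actual rates depend on the negotiated codec and ptime.

PACKETS_PER_SEC_PER_STREAM = 50   # 20 ms packetization interval
STREAMS_PER_CALL = 2              # one RTP stream in each direction
CALLS = 10_000

relayed_pps = CALLS * STREAMS_PER_CALL * PACKETS_PER_SEC_PER_STREAM
print(f"Packets relayed per second: {relayed_pps:,}")    # 1,000,000

# Each relayed packet crosses the NIC-to-VM data path twice: once inbound
# from the wire to the VM, and once outbound from the VM back to the wire.
datapath_pps = 2 * relayed_pps
print(f"NIC<->VM data path load: {datapath_pps:,} pps")  # 2,000,000
```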

The Perimeta SBC is engineered to make extremely good use of standard x86 hardware, and it achieves outstanding performance when running on bare metal – i.e. not virtualized.  But when we deploy it in a standard virtualized environment – for example, on a KVM hypervisor in conjunction with Open vSwitch – we typically see a big drop-off in performance.  This is because the data path between the physical network and the virtual machines, which is provided by the Open vSwitch software, is relatively inefficient.

We should not be too surprised by this.  Mainstream cloud and virtualization technology has grown up around IT workloads, which are not particularly network-intensive.  If we throw a workload that expects to exchange millions of packets per second with the network into this kind of environment, we should expect to be disappointed by the outcome.

The good news is that we’re seeing a range of emerging solutions, both hardware-based and software-based, for this problem, and we’ve proven that both classes of solution can get us to where we need to be for virtualized SBCs to compare very favorably with current products based on proprietary hardware.

The hardware-based solution we’ve tried is based on Single Root I/O Virtualization (SR-IOV).  This is an Ethernet NIC technology that presents multiple virtual hardware interfaces – known as virtual functions – toward the software.  Each VM running on a compute node can be configured to bind directly to one or more of the NIC’s virtual functions, providing a data path between the physical network and the VM that bypasses the vSwitch and hypervisor software completely.  Not surprisingly, this approach can deliver performance that is comparable to running the network function software directly on bare metal.
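To make this concrete, the sketch below enables a handful of virtual functions on a Linux compute node using the kernel’s standard sysfs interface.  The NIC name eth0 and the VF count are placeholders for illustration; actually binding each VF into a VM (for example via PCI passthrough in libvirt) is a separate step not shown here.

```python
# Minimal sketch: enabling SR-IOV virtual functions (VFs) on a Linux host
# via the kernel's sysfs interface.  Requires root and an SR-IOV-capable
# NIC; "eth0" is a placeholder for the physical function's interface name.
import os
from pathlib import Path

PF = "eth0"  # the physical function (PF); substitute your NIC's name
dev = Path(f"/sys/class/net/{PF}/device")

total_vfs = int((dev / "sriov_totalvfs").read_text())
print(f"{PF} supports up to {total_vfs} virtual functions")

# Carve out four VFs.  Each appears as its own PCI device that can be
# passed through to a VM, bypassing the vSwitch and hypervisor data path.
(dev / "sriov_numvfs").write_text("4")

# The kernel links each VF under the PF as virtfn0, virtfn1, ...
for link in sorted(dev.glob("virtfn*")):
    print(link.name, "->", os.path.basename(os.readlink(link)))
```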

The software-based solution we’ve tried is based on a commercial accelerated vSwitch solution from 6WIND.  This implements a highly efficient software-based data path between a conventional Ethernet NIC installed in the compute node and the VMs, using shared memory to minimize the number of packet copy operations – or even eliminate packet copying altogether.  Our tests show that this approach can deliver performance levels approaching 90 percent of that achieved with SR-IOV.  However, the accelerated vSwitch software itself consumes substantial processing power, so the overall efficiency of the solution is somewhat lower than SR-IOV.
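To illustrate the shared-memory idea, here is a deliberately simplified sketch.  It is not 6WIND’s implementation – a production data path uses hugepage-backed memory and lock-free rings written in C – but it shows why handing the consumer a slot index through shared memory avoids copying the packet payload itself.

```python
# Toy illustration of a shared-memory packet hand-off: producer and consumer
# map the same buffer, so transferring a packet means passing a slot index,
# not copying the payload.  Real accelerated vSwitches do this between the
# vSwitch process and guest VMs with lock-free rings; this is just the idea.
from multiprocessing import shared_memory

SLOT_SIZE = 2048   # room for one packet per slot
NUM_SLOTS = 256    # ring capacity

shm = shared_memory.SharedMemory(create=True, size=SLOT_SIZE * NUM_SLOTS)

def write_packet(slot: int, payload: bytes) -> int:
    """Producer: place a packet directly into shared memory, return its slot."""
    off = slot * SLOT_SIZE
    shm.buf[off:off + len(payload)] = payload
    return slot  # only this small index needs to cross to the consumer

def read_packet(slot: int, length: int) -> memoryview:
    """Consumer: a zero-copy view of the packet; no further copy is made."""
    off = slot * SLOT_SIZE
    return shm.buf[off:off + length]

pkt = b"\x45\x00" + b"\x00" * 58   # a dummy 60-byte "packet"
view = read_packet(write_packet(0, pkt), len(pkt))
assert bytes(view) == pkt          # bytes() copies for the check only

view.release()
shm.close()
shm.unlink()
```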

Naturally, each solution has its advantages and disadvantages, and deciding which approach to take requires an understanding of the trade-offs between them.  For example, with SR-IOV it isn’t currently possible to take advantage of overlay-based network virtualization, which is commonly used in large-scale virtualization environments.  On the other hand, while accelerated vSwitch solutions do support overlay-based network virtualization, they are less efficient overall, and they may also introduce security risks, since the shared memory they rely on can be accessible from untrusted VMs on the same compute node.

Both approaches share a further shortcoming at present: cloud environments such as OpenStack support neither SR-IOV nor accelerated vSwitches, although work is under way in the OpenStack community to address this.  It’s also worth noting that the Open vSwitch open source project has recently embraced data plane acceleration based on Intel’s Data Plane Development Kit (DPDK), and we can expect to see substantial performance improvements in the next release of Open vSwitch, due in the first quarter of 2015.

Finally, it’s worth putting this performance challenge into context.  While new techniques such as SR-IOV and accelerated vSwitches may be necessary to achieve the highest possible performance with user plane network functions such as session border control, respectable performance is already being achieved with standard and widely available virtualization technologies.  For example, Metaswitch has several production deployments of Perimeta SBC running on VMware, achieving up to 1,300 concurrent media sessions on just two virtual CPUs using the standard vSwitch, with the ability to scale out simply by instantiating more VMs running the Perimeta media relay function.  This level of performance is sufficiently cost-effective to be highly attractive to network operators that want to move quickly to take advantage of NFV.