Container Mania Grips the Internet of Talk: Going cloud native for VoLTE with Kubernetes
I know many of you have been itching to hear more about my adventures with Pokémon Go, following this wildly unsuccessful attempt to garner more page visits by lashing a highly targeted piece (QoS for the Internet of Things) with a universally broad and trending general interest topic. Having made it through the end of the 2016 summer holidays, when most people of sound mind had stopped playing, I hung in there until mid-November, ultimately bidding adieu to Pikachu and hanging up my training pants having reached the moderately respectable level of 23. Knowing that status would forever be immortalized somewhere in the cloud, I somewhat unceremoniously and to the surprise of my kids, who had simultaneously mocked me while fraudulently assuming my persona when it suited them, deleted the app. What life lessons did I learn, in all that time spent catching critters, that I can now impart in a public forum like this? Absolutely nothing. Seriously. I wouldn’t recommend it to anyone. It really was a colossal waste of time.
Which is actually a shame because, as I imparted in the aforementioned post, the story of how Pokémon Go was born was actually quite interesting. While I touched on the game itself, however, I didn’t really spend any time talking about where or how it was hosted. Not surprisingly, for an app originating from the parent company of Google, it was the Google Cloud Platform. And fortunately, at the very same instant Pokémon Go was wasting a sizable chunk of my life, it was busy proving an important point: Containers are cool.
Codifying Container Characteristics
As I have a penchant for pointing out in these posts, however, while they are cool, they are not exactly new. If you wanted to be pedantic, you could point all the way back to 1979 and the introduction of the Unix chroot system call1 as the birthplace of process isolation. Fast-forward two decades and BSD introduced the aptly named “jails” partitioning in 2000, followed by Solaris in 2004, with the first system separation technique referred to as a “container.” Even the self-contained execution environments we know today as LinuX Containers (LXC) are nearly a decade old, having been released in 2008. It was this foundation that Docker used, starting in 2013, to build its container empire, making the lowly LXC more portable and flexible to use and therefore a more attractive alternative to the now mighty virtual machine (VM).
Anyone who hasn’t had their DevOps head in the sand for the last decade will know that virtual machines and containers differ greatly in their partitioning methodologies. Simply put, a VM hypervisor (or virtual machine monitor VMM) carves up hardware resources while a container separates at the operating-system level. Being one to shun convention, I chose not to employ the oft-used Docker illustration here to detail the deployment differences diagrammatically. The do-over was based on a couple of factors. OK – three, if you count the fact that I like doing diagrams. First, I believed it to be a bit too melodramatic for my liking as it shows the Guest OS boxes being four times as large as any others -- four times larger than the host operating system box! Maybe it was because it included the abstraction layer that represents the hardware -- the VM itself? Maybe it was making a bold statement that the additional guest operating systems each VM must instantiate to host the application represent major overhead. Personally I don’t subscribe to such histogram-attic hysteria so mine are the same size as the others, but I added in the VM H/W layer.
Secondly, I wasn’t keen on the broad distinction between the hypervisor and the host operating system. With the exception of (the admittedly wildly popular) KVM, which is a type 2 host hypervisor, the others (VMware’s ESXi, Microsoft Hyper-V and Xen2) are type 1 bare metal hypervisors, meaning there is no host OS. I will concede that those hypervisors are big, lumbering OS-like beasts, still, so I’ve ultimately made them the same size as the container alternative… if I were representing the container engine horizontally… which I’m not. I like it this way as I believe it better represents the fact that the containers operate within a single instance of the host OS and the engine (i.e. Docker) actually acts on the (Linux) kernel resource isolation features, such as namespaces and cgroups. Ultimately, the applications run “on” a VM or “in” a container. One final efficiency note: If the applications are identical, then, in container land, they can share a common set of dependencies (e.g. binaries and libraries), whereas each VM instance must have its own copy.
My totally confusing interpretation of virtual machines vs. containers.
Now don’t get me wrong, I’m not anti-VM. Indeed, while they are old hat for IT heads, we telecom types felt like we’d struck gold when we started packing commodity servers with a variety of virtualized network functions and orchestrating their lifecycles with practically off-the-shelf toolsets. Whilst we indeed had pay dirt under foot, however, calculations were increasingly suggesting that the VMs, favored by many in the NFV field for their flexibility when hosting legacy (read: non-cloud native) network functions, were not providing the level of capital savings needed to justify this radical operational upheaval. This was not only because of the (guest) OS overhead but also because their unwieldy spin-up times demanded a higher number of instances than we originally expected -- VMs that were otherwise sitting idle in case of demand up-ticks or catastrophic failures.
Comparing Containers to the Competition
There are various statistics floating around that pit container start-up times against those of a virtual machine. Those that employ OpenStack (Nova) to orchestrate the instantiation of each, as one might expect in real world scenarios, clock a container booting up in between 1.5 and 3.5 seconds. Remove the cloud OS from the equation and rely solely on command line interfaces (CLIs) and containers can be ready in the sub-second range. In the best-case scenarios, virtual machines (in all cases, using the popular open source type 2 KVM hypervisor) take nearly twice that time. We are talking three seconds at the low end, all the way up to 11.5 seconds on the high end of the scale. Actually, there were some who peg VM boot-up times in the 30-45 second range, but those same guys also had containers ready in sub-50 ms, so I’m taking the approach of ignoring those high-end and low-end figures. That said, I will take the second-lowest and second-highest numbers for my benchmark, here, as they are statistics documented in an independent white paper.3
Container vs. VM with OpenStack -- average boot time performance comparison.3
Other pertinent statistics include reboot and shutdown times. Measurements for the latter vary from 2.4 to 3.5 seconds (containers) and 3.5 to 64 seconds (VM), which seems incredibly out-of-whack until one considers that most sources record soft reboot times for VMs in the two-minute (124 second) range, likely due to the delay in gracefully shutting down processes before restarting. Still, even with containers clocking in at between 2.5 and 4 seconds for a soft reboot, we can look at that number in isolation or take into account the redundancy that will immediately kick in and say “so what?” Multiply that number out to tens of thousands -- even hundreds of thousands -- of virtual machines, however, and those figures quickly compound to a considerable amount of stranded capacity or (temporarily) unavailable resiliency.
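To put rough numbers on that compounding effect, here is a back-of-the-envelope sketch in Python. The per-instance soft reboot times are the figures quoted above. The fleet size of 100,000 is purely an assumption for illustration, as is the simplification that every instance reboots exactly once.

```python
# Back-of-the-envelope sketch: aggregate unavailable capacity if every
# instance in a fleet goes through one soft reboot.

CONTAINER_REBOOT_S = (2.5, 4.0)  # soft reboot range quoted above (seconds)
VM_REBOOT_S = 124.0              # typical VM soft reboot quoted above (seconds)
FLEET_SIZE = 100_000             # ASSUMPTION: hypothetical fleet size


def stranded_hours(reboot_seconds: float, instances: int) -> float:
    """Total instance-hours of unavailable capacity for one reboot each."""
    return reboot_seconds * instances / 3600.0


if __name__ == "__main__":
    low, high = (stranded_hours(s, FLEET_SIZE) for s in CONTAINER_REBOOT_S)
    vm = stranded_hours(VM_REBOOT_S, FLEET_SIZE)
    print(f"containers: {low:.0f} to {high:.0f} instance-hours")
    print(f"VMs:        {vm:.0f} instance-hours")
```

Under those assumptions the container fleet strands somewhere around 69 to 111 instance-hours, while the VM fleet strands well over 3,000. Real fleets will not reboot in lockstep, so treat this as a sense of scale rather than a forecast.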
So, containers are looking increasingly attractive for their availability, but what about their throughput? From an NFV perspective, we can look at this a couple of ways: a data plane-centric virtualized network function, and one that is dedicated to a control plane operation, which can be divided again into two categories -- network control (à la SDN) and service control (aka signaling). That good ol’ American research and scientific development company Bell Labs, now owned by that good ol’ Finnish company Nokia, released a white paper in January 2017 that detailed the findings of their testing of a decomposed (data and control plane) virtualized eNodeB. Using this virtualization philosophy (which I outlined in a recent blog post), Bell Labs performed measurements on a control plane eNB element hosted initially on a KVM hypervisor-based VM and then within a Docker container. The setup essentially employs an eNB Agent handling Radio Net Flow (RNF) protocol calls (also developed by Bell Labs, but when they were owned by that good ol’ French company Alcatel) from an OpenDaylight (ODL) software defined network (SDN) controller all hosted, along with its supporting MongoDB, on a common virtualization infrastructure. The RNF protocol employs the Stream Control Transmission Protocol (SCTP) to prevent gnarly head-of-line (HOL) blocking in such time-sensitive signaling functions, a point I’m highlighting here simply because SCTP is interesting in its own right.
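For anyone wondering what head-of-line blocking actually looks like, here is a toy Python model. It is not an SCTP implementation and has nothing to do with the Bell Labs testbed: the `Msg` type, the stream numbers and the timings are all invented for illustration. It simply compares one ordered byte stream (TCP-like) with independently ordered streams (the multi-streaming idea in SCTP) when a single message is lost and has to be retransmitted.

```python
from typing import NamedTuple


class Msg(NamedTuple):
    seq: int        # global send order
    stream: int     # logical stream id (hypothetical, e.g. one per session)
    arrival: float  # time the message finally reaches the receiver (ms)


def delivery_times(msgs, per_stream):
    """Return {seq: time the message is handed to the application}.

    In-order delivery means a message waits for every earlier message in
    its ordering domain: the whole connection if per_stream is False,
    or only its own stream if per_stream is True.
    """
    latest = {}  # ordering domain -> latest arrival seen so far
    out = {}
    for m in sorted(msgs, key=lambda m: m.seq):
        domain = m.stream if per_stream else 0
        latest[domain] = max(latest.get(domain, 0.0), m.arrival)
        out[m.seq] = latest[domain]
    return out


# Message 3 is "lost" and only arrives after a 200 ms retransmission.
TRAFFIC = [
    Msg(1, 0, 1.0), Msg(2, 1, 2.0), Msg(3, 0, 203.0),
    Msg(4, 1, 4.0), Msg(5, 2, 5.0), Msg(6, 0, 6.0),
]

if __name__ == "__main__":
    print("single ordered stream:", delivery_times(TRAFFIC, per_stream=False))
    print("independent streams:  ", delivery_times(TRAFFIC, per_stream=True))
```

With a single ordered stream, messages 4, 5 and 6 all sit behind the retransmitted message 3. With independent streams, only message 6 (which shares a stream with message 3) has to wait, which is why this matters for time-sensitive signaling.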
While the primary purpose of the test was to compare the processing performance, the authors noted that their setup took “several minutes” to instantiate on a VM, but “less than one minute” in the container-based implementation. With anywhere between one and 30 eNodeBs running within the virtualized infrastructure, the Docker-based architecture came out on top for all three measured packet behaviors, namely interarrival time, average waiting time and service rate.4
Virtualization of the radio access network: performance analysis.4
As you might expect, having read this far, the paper blames the heavyweight nature of the hypervisor for the poor showing of the virtual machine implementation.5 As the service rate chart indicates, the container has “virtually zero performance overhead” when supporting a small number of eNB instances, while the VM testbed displays the complete opposite behavior owing to the large overhead and slow resource allocation.
I should note, for those with concerns regarding container security, that you can instantiate containers within virtual machines, thereby enabling your prickly cybersecurity teams to employ exactly the same precautions they typically apply when locking down their infrastructure. Pet peeve warning: With that additional deployment option in mind, you may hear native containers (those not running in a VM) referred to as “bare metal containers,” which is nomenclature I fundamentally disagree with, given that the very nature of a container is that it runs on an intervening operating system -- the antipode of bare metal. But I get the point... reluctantly… I guess.
Now for a quick word from our sponsor
Around the same time Docker was first publicly released (March/April 2013), Metaswitch introduced a signaling-centric cloud-native IP multimedia subsystem (IMS) core implementation to the open source community. Project Clearwater was built specifically to enable network operators to realize the cost and performance benefits of containers over bare metal or even virtual machines for this critical control plane infrastructure. Many didn’t recognize it at the time, but this was genuinely prescient -- especially when you consider that the ETSI NFV White Paper #1 had only been out six months and most in the telecom world were still reeling from the prospect of ditching their hardware security blankets and porting even the most basic network functions to virtual machines. Indeed, while Metaswitch has been recognized as spearheading NFV, we were actually leading what is only now being acknowledged by the industry at large as the only approach to making NFV viable: communications software built specifically for the cloud. NFV 2.0.
Clearwater Core loves containers.
A Complete Cluster
Like the previous example of a decomposed eNodeB featuring decoupled control and data plane elements, network functions that have historically integrated signaling and media components -- such as the Session Border Controller (SBC), by way of a totally random example (ahem) -- must also undergo similar treatment as they move from proprietary hardware to highly orchestrated SDN and NFV environments; as central offices are re-architected as data centers; and as operators generally insist that long-imagined, decomposed, hierarchical NGN models (like IMS) are extended into “the box.”
Metaswitch has, once again, been commanding the industry in this area, delivering custom silicon-like performance from standard commercial off-the-shelf (COTS) x86 server platforms. As mentioned, an SBC has a signaling element along with the media component, which should be decomposed in any network infrastructure worth talking about, thereby enabling these disparate functions to be deployed at different points in the network (i.e. core vs. edge) while having unique initial load capacity metrics and being able to scale completely independently. As a security device that must provide guaranteed protection against service-crippling DDoS flooding attacks, an SBC must handle signaling traffic differently from an IMS SIP proxy. While we initially achieved the required signaling and media processing goals through a combination of some kernel-level smarts and data plane acceleration techniques, like single root I/O virtualization (SR-IOV), containerization meant eliminating the dependency on the specialized host kernel module, which is no longer accessible. This was achieved by utilizing the Data Plane Development Kit (DPDK), which (as we know from this blog post) performs all of its acceleration magic firmly in the user space. Employing SR-IOV from a container means connecting it to a (NIC) virtual function (VF), something that was proven in late 2015 with some early hacks.6
Commercializing those kludges, however, demanded their addition to container cluster orchestration automation toolsets. There are really only two that are viable in this space: Docker with its Swarm, and Google with Kubernetes. Those who love Apache Mesos have now scrolled to the bottom of the page to leave me vitriol-laden comments, which are always welcome, regardless. A cluster of containers is, as the name suggests, a collection of independent containers spread across two or more distinct host platforms, each with its own autonomous container engine. The containers would be Docker, while the container engine, resident in each host machine, could be either Docker or the Google Container Engine, which is somewhat confusingly known by the acronym GKE.7 It is the role of the Cluster Manager to treat those highly distributed containers as a single resource by allowing each independent engine to be aware of all others, dynamically internetworking (via a virtual overlay network of your choice) plus load balancing traffic and workloads. This could all be done manually, of course… but it’s not for the faint-hearted.
A container cluster.
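To make the "single resource" idea a little more concrete, here is a deliberately tiny Python sketch of a cluster manager. It is not how Kubernetes or Swarm schedule anything: the `Host` and `ClusterManager` classes, the spread-by-free-slots rule and the container names are all hypothetical, chosen only to show many hosts being presented as one pool that also absorbs a host failure.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    capacity: int  # max containers this host's engine will run
    containers: list = field(default_factory=list)

    @property
    def free(self) -> int:
        return self.capacity - len(self.containers)


class ClusterManager:
    """Toy manager: presents many hosts as one pool of container slots."""

    def __init__(self, hosts):
        self.hosts = list(hosts)

    def schedule(self, container: str) -> str:
        # Simple spread strategy: pick the host with the most free slots.
        target = max(self.hosts, key=lambda h: h.free)
        if target.free <= 0:
            raise RuntimeError("cluster is full")
        target.containers.append(container)
        return target.name

    def fail_host(self, name: str) -> dict:
        # Drop the failed host, then reschedule its containers on survivors.
        failed = next(h for h in self.hosts if h.name == name)
        self.hosts.remove(failed)
        return {c: self.schedule(c) for c in failed.containers}


if __name__ == "__main__":
    cluster = ClusterManager([Host("a", 2), Host("b", 2), Host("c", 2)])
    for i in range(1, 5):
        print(f"sbc-{i} ->", cluster.schedule(f"sbc-{i}"))
    print("after host a fails:", cluster.fail_host("a"))
```

The caller never names a host when scheduling, which is the whole point. A real orchestrator layers networking, health checks and load balancing on top of this basic placement loop.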
Owing in no small part to its openness, there has been an almost herd-like industry-wide adoption of Kubernetes, in the three years since its initial release. Even CoreOS is ditching Fleet in favor of it, come February 2018, touting the fact that it has become the de facto standard for cluster orchestration. Although Swarm’s tight coupling with the Docker container engine can simplify implementation, it also has an air of dependence that is generally frowned upon. While this contrary level of abstraction complicates Kubernetes, its incorporation into OpenStack (and consequently VMware’s vSphere integrated OpenStack offering) can provide an adequate layer of abstraction. Perhaps more importantly, Kubernetes can run on all public cloud environments (i.e. the Google Cloud Platform, of course, plus Amazon Web Services (AWS) and Microsoft Azure), meaning that the instantiation of applications that run within either public or private clouds or across a hybrid public/private cloud infrastructure can be somewhat simplified.
Recognizing this, our friends at Intel were quick to develop the required SR-IOV plugin for Kubernetes, enabling us to deploy a commercially viable SBC within a container, thereby rounding out the portfolio of products required to deliver a complete cloud-native Voice over LTE (VoLTE) offering.
While demanding a complete service infrastructure overhaul, VoLTE is table stakes for mobile network operators. Whereas the majority are looking to network functions virtualization to deliver a viable, cost-effective and resilient service offering, it is now clear that initial approaches to NFV, built on classic virtual machine constructs, cannot meet these requirements. Although virtual machines mask the inadequacies of legacy communications software components, which can be ported from proprietary hardware with little effort, we have seen that these hypervisor-based environments have extremely high overheads and lack the speed to protect against infrastructure failures without costly redundancy that practically mirrors today’s non-virtualized implementations.
Metaswitch is the first company to deliver a complete VoLTE offering built using cloud-native software methodologies that can be deployed in public, private or hybrid cloud environments using lightweight containers. As we now know, virtualizing at the operating-system level, rather than at the hardware level as with hypervisor approaches, dramatically reduces overheads, thereby enabling our individual virtualized network functions to instantiate immediately, providing both capacity on-demand and resiliency only when required. Featuring highly automated commercial container cluster orchestration, the resulting solution represents millions of dollars in capital and operational cost savings over traditional NFV practices while dramatically increasing overall service agility.
Those previous 200 words might have read like marketing-esque bowel movements to you, but I’ve already been able to repurpose them twice, where such highly scripted diarrhea was warranted, so I’m a happy boy right now. This writing stuff is hard, you know! More importantly, I can defend the drivel with hard evidence of our expertise in this area. This short video shows how, at a ridiculously low cost, we can spin up an entire one-million-subscriber VoLTE infrastructure on a private cloud using Kubernetes, while employing AWS to provide 100 percent service redundancy from practically a standing start. Take a look -- it’s pretty cool.
Developing cloud-native communications software with a philosophy of deploying modular, decoupled components in lightweight containers also enabled Metaswitch to be the first company to deliver on the promise of NFV Microservices -- highly granular and reusable components that can support critical features across multiple virtualized network functions.
Containers Are Cool
Which takes me right back to the beginning of this short missive. While Metaswitch was the first to prove what Kubernetes can do in the mobile network services infrastructure arena, the developers of Pokémon Go already knew its potential, having somewhat accidentally proven Kubernetes’ ability to dynamically orchestrate GKE-powered containers at a “planetary-scale.”8 I say “accidentally” because, back in early 2016, no one could have predicted the stratospheric success of the game on its release. Within 15 minutes of its release in Australia and New Zealand on July 5, 2016, player traffic surged past Niantic’s “worst case” traffic estimates by a factor of 50.
Pokémon Go 2016-2017 daily usage statistics.9
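To get a feel for what a factor-of-50 surge asks of an orchestrator, here is a small Python sketch based on the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler (desired replicas = ceil(current replicas × current metric / target metric)). Nothing here says Niantic relied on that autoscaler. The starting replica count, the load units and the replica cap are hypothetical values picked only for illustration.

```python
import math


def desired_replicas(current_replicas: int, current_load: float,
                     target_load: float, max_replicas: int) -> int:
    """Proportional scaling rule, clamped between 1 and max_replicas."""
    wanted = math.ceil(current_replicas * current_load / target_load)
    return max(1, min(wanted, max_replicas))


if __name__ == "__main__":
    # ASSUMPTION: 20 replicas sized for the planned load (normalized to 1.0).
    print(desired_replicas(20, current_load=50.0, target_load=1.0, max_replicas=5000))
    # The same surge with a cap that was set for the "worst case" plan.
    print(desired_replicas(20, current_load=50.0, target_load=1.0, max_replicas=500))
    # Scaling back once the hype dies down.
    print(desired_replicas(20, current_load=0.5, target_load=1.0, max_replicas=500))
```

The second call shows the practical catch: the arithmetic is trivial, but any ceiling sized for the expected worst case gets hit long before a 50x surge is satisfied.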
Although Kubernetes fared admirably through that ridiculous surge, between mid-July and mid-August, the need to dynamically scale up and scale back compute resources has not diminished. Let us not forget that, while the hype has long since died off, Pokémon Go remains the most popular (and the highest-revenue-generating) mobile app by far, currently wiping the floor with Slither and Clash of Clans. While not necessarily cool, those 7.5 million daily active users (DAU) are hardcore, as is evident from the steady traffic usage and the traffic spikes during the Halloween event and on the release of 80 new Pokémon to the virtual world. During that latter event in February, with a mere one-third of the DAU, data usage roared past that of the initial launch period, but once again Kubernetes cranked on, thereby proving, beyond a shadow of a doubt, that containers are indeed cool. But maybe that’s just my container mania talking.
Learn more about how Metaswitch is powering the Internet of Talk at www.metaswitch.com/IoTalk
2. Even though it’s built on Linux.
5. Note: The non-linearity of the VM graph was due to the extra time required to instantiate VMs.
6. https://software.intel.com/en-us/articles/single-root-inputoutput-virtualization-sr-iov-with-linux-containers, for example.
7. It’s actually to prevent acronym glare, in that there is already a Google Compute Engine (GCE).
Simon is the Director of Technical Marketing and a man of few words.