From cats to crustaceans: The role of microservices in autotomizing NFV

The exact origin of an analogy is often hard to ascertain -- especially when it is one which is born, buried, rediscovered, revived, reimagined and repurposed over time. I’m therefore happy to muddy the waters further by taking the current “cats to cattle” comparison and adding my own ill-conceived spin.

[Image: from-cats-to-cattle.jpg]

With no direct referenceable source that I can identify, it was Bill Baker (a Distinguished Engineer at Microsoft) who supposedly uttered this original equivalence, a variation of which is now wildly working its way around the telecommunications industry speaker circuit. While the chain of custody is not entirely clear, it was first employed sometime in the early part of the decade, in reference to a proposal for horizontally scaling (out) database servers versus vertically scaling them up.1

He is quoted as saying the latter requires servers to be treated like precious family pets (“you named them and when they get sick, you nurse them back to health”) while in the modern compute arena they should be treated like livestock -- “numbered rather than named and when they get sick…” you take them behind the barn and... well… you get the idea.2

I get confused

While the original premise was around sharding databases,3 it was Randy Bias who was credited with mainstreaming the concept by extending it to entire cloud infrastructures. OK, it was actually the Swiss atom smashers at CERN who popularized it, with a 2013 shout-out in The Register,4 but with full credit to Randy... eventually. Confused yet? Read the footnotes. It won’t help, but why should your head be spinning any less than mine?

Yes -- I thought it would be fun to dissect this analogy in my blog post. Imagine how disappointed I was, therefore, to find that Randy had already done so in a post of his own about a year ago.5 In that article, he chronicles how he commercialized the “meme” for cloud consumption. Perhaps more interestingly, he describes how the metaphor was distorted in a pretty significant way by people who are quite key to the pets/cattle premise.

Some other people get confused

The simple principle that we should treat compute resources in a disposable manner was complicated by the Kubernetes community when they introduced the concept of PetSets in version 1.3. Rather than having anything to do with the individual relevance of container instances, PetSets purely referred to the statefulness of an application. Although I continuously bow to the technological expertise of any group of individuals driving this industry, it seemed like a strange segue. Having been involved in one of the early examples of containerizable communications applications -- Project Clearwater6 -- where the need for (distributed) state maintenance is paramount, I thought there seemed to be little correlation between the software implementation of that IP Multimedia Subsystem (IMS) Core and the statefulness of a specific host container. This might be why PetSets became StatefulSets in version 1.5. We move on.

We attain some semblance of clarity

Clearwater was the first (and remains one of the only) virtualized network functions (VNFs) in the network functions virtualization (NFV) arena built, from scratch, for the cloud. As such, Clearwater epitomizes the cats/cattle comparison and has been a gold-standard VNF implementation for proving NFV infrastructure (NFVI) proposals since ETSI ISG PoC No. 1 in 2014.7

Given the propensity of core network equipment vendors to short-circuit the otherwise monumental task of adapting to the demands of a virtualized network infrastructure, most took the highly inefficient approach of simply peeling existing codebases from their proprietary roots and porting them onto an isolated virtual machine (VM). Although this is a comparatively quick process, there are many fundamental flaws to this tactic.

Without question, when we first started talking NFV way back in 2012, it was natural to think VMs. The first time the carrier virtualization proposition went sideways, however, was with the realization that simply replacing proprietary hardware with commercial off-the-shelf (COTS) x86-based server platforms was not enough to make the heady jump into data-center-centric operational models. Automation, in the form of VNF lifecycle management and software-defined networking (SDN), took the driver’s seat. In the management and orchestration (MANO) fervor that ensued in the years following that realization (which was admittedly a boon for interop communities, partner managers and PR Newswire), the question of how and where the actual VNFs were deployed took a backseat.

Practically screaming above the din of industry interop announcements, the network operator community was eventually able to make the findings of their latest business case analysis known. With all the automation-will in the world, suppliers still needed to address the inefficiencies of the virtual machines and the complacency around simply porting code.

The exorbitant overheads of hardware virtualization are dramatically compounded by the fact that we must essentially replicate the redundancy practices of the physical realm, scaling up not only the active elements but also a commensurate number of hot standby instances. This is due to these ported VNFs' inherent inability to leverage web-scale distributed resiliency techniques, along with the lethargy VMs exhibit in terms of spin-up and reboot times. Even in the virtual world, as their very label indicates, these standby instances stand idly by while still consuming budget-breaking compute capacity.

Simply put, the excessive care and attention that operators must extend to these crudely ported VM-based VNFs represents an archetypal pet proposition, underscored by the reality that we have to be aware of and concerned about their well-being and solidified by the fact that we’ve given them the names “active” and “standby.” How cute.

I bring up Pokémon… again

Although centered around Pokémon rather than pets, previous posts have detailed the benefits of containers over virtual machines and the virtues of building cloud native.8 On the evolutionary path from cats to cattle, however, replacing an amorphous blob of a VM-hosted application with a container-based counterpart affords only a modicum of value. The more intricate and bloated the individual processes instantiated in containers are, the more we must care -- not only about them, but about the people who build and supply them.

In the biological sense, the term “autotomize” (as witnessed in the title of this post) refers to the ability of certain animals to deliberately shed parts of their own bodies, usually to escape from a predator. It’s a trait that is reasonably widespread amongst reptiles, salamanders and various invertebrates (like crabs) in the wild. It’s also the name of a fifth-generation Pokémon, but I swore not to go down that road again. Ever.

Applying a word I just found on the Internet

It is the similarities between a crab’s ability to avoid being irrevocably ensnared and the benefits microservices bring to NFV, however, that led me to the conclusion that we should be treating individual communications infrastructure components not like cattle, even, but like crustaceans.

[Infographic: A microservices approach to NFV affords many advantages over just being cloud native]

But, in all seriousness, this is pretty cool

That’s the potential power of delivering virtualized network functions as a collection of individual processes built and deployed on a microservices platform. In an industry driving toward granular disaggregation, adopting microservices methodologies across the entire spectrum of network functions -- from data plane to control and application servers -- is increasingly becoming known as network functions disaggregation (NFD).

Employing a microservices development toolchain can have a measurable effect on how -- and how fast -- VNF code is developed, deployed and enhanced. Indeed, adopting a microservices approach to software engineering is the only real way to truly embrace much-lauded DevOps philosophies. With the microservices platform extending a common, open development framework, while also abstracting the complexities of the underlying infrastructure and providing critical lifecycle management and connectivity services, developers can operate in a genuinely agile manner. Feature sets can be decomposed into smaller, stand-alone elements that instantiate, scale and quiesce independently -- generally without anyone keeping track or otherwise caring, particularly.
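To make that decomposition a little more concrete, here’s a minimal sketch in Rust (the language this post gets to shortly) of a feature set split into stand-alone elements that instantiate and quiesce independently of one another. The element names (“registrar,” “billing”) and the channel-per-element design are purely illustrative assumptions, not a description of any real VNF.

```rust
use std::sync::mpsc;
use std::thread;

// A hypothetical decomposed "feature": each element runs as its own
// worker, spun up and quiesced independently of its peers.
fn spawn_element(name: &'static str, rx: mpsc::Receiver<String>) -> thread::JoinHandle<u32> {
    thread::spawn(move || {
        let mut handled = 0;
        // The element quiesces on its own when its channel closes;
        // nothing else in the system needs to track it.
        for msg in rx {
            println!("[{name}] handling {msg}");
            handled += 1;
        }
        handled
    })
}

fn main() {
    let (reg_tx, reg_rx) = mpsc::channel();
    let (bill_tx, bill_rx) = mpsc::channel();

    // Instantiate two stand-alone elements of a larger feature set.
    let registrar = spawn_element("registrar", reg_rx);
    let billing = spawn_element("billing", bill_rx);

    reg_tx.send("REGISTER alice".to_string()).unwrap();
    bill_tx.send("CDR record".to_string()).unwrap();

    // Dropping a sender quiesces its element, independently of the others.
    drop(reg_tx);
    drop(bill_tx);

    assert_eq!(registrar.join().unwrap(), 1);
    assert_eq!(billing.join().unwrap(), 1);
}
```

The point of the sketch is simply that no central authority tracks these elements: each one’s lifecycle is its own business, which is the “nobody keeping track or otherwise caring” property in miniature.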

Not only can smaller, specialized teams develop such microservices in relative isolation, but this approach also opens up the opportunity for independent software vendors (ISVs) to build distinct components based on their areas of expertise. Of course, this also means that if another company develops a better version of a given feature, the original can be ejected and the new one embedded without affecting any other operational attribute of the VNF.
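As a toy illustration of that swappability, the Rust sketch below hides a component behind a trait -- standing in for an open interface -- so one vendor’s implementation can be ejected and another’s embedded without the rest of the function noticing. All the names here (Transcoder, VendorA, VendorB, media_path) are hypothetical.

```rust
// The VNF depends only on this trait (the "open interface"),
// never on a concrete vendor implementation.
trait Transcoder {
    fn transcode(&self, frame: &str) -> String;
}

struct VendorA;
impl Transcoder for VendorA {
    fn transcode(&self, frame: &str) -> String {
        format!("A:{frame}")
    }
}

// A "better version" of the same feature from another supplier.
struct VendorB;
impl Transcoder for VendorB {
    fn transcode(&self, frame: &str) -> String {
        format!("B:{frame}")
    }
}

// The rest of the VNF is oblivious to which implementation is plugged in.
fn media_path(codec: &dyn Transcoder, frame: &str) -> String {
    codec.transcode(frame)
}

fn main() {
    assert_eq!(media_path(&VendorA, "rtp"), "A:rtp");
    // Swap the component; no other operational attribute changes.
    assert_eq!(media_path(&VendorB, "rtp"), "B:rtp");
}
```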

[Diagram: A microservices approach to building virtualized network functions]

Nullifying nonsensical nomenclature

I was going to steer clear of how we identify these VNF microservices in NFV parlance, but that would be completely out of character, so here we go: Is a microservice a VNF or a VNF component (VNFC)? The obvious answer is that it’s a VNFC. However, the ETSI specifications define a VNFC as having proprietary interfaces and APIs. As soon as those interfaces are open or standards-based, the component becomes a VNF in its own right. If we are to take the specifications literally, therefore, a microservice is a VNF, which is a crying shame, given how applicable the term “component” is to a microservice. But I’m not going to continue to lose sleep over it. I made a promise to my doctor. Maybe we’ve just moved beyond the original labels. Anyway, owing to this confusion, I will continue to simply refer to the respective elements as “microservice” (part) and “VNF” (whole).

It should come as little surprise that Metaswitch is in the driving seat, once again, when it comes to turning the industry toward microservices. It’s a little-known secret that not only is the aforementioned Project Clearwater IMS core cloud native, but it was also actually built using early microservices methodologies -- lessons learned from the web-scale giants. We therefore have a half-decade head start on others in this area. In order to realize the promise of fully autotomized NFV, we recognized very early on that there must be general agreement on how microservices are written. A common language. Open development interfaces. A solid framework and toolkits. An extensible platform.

A little Rust adds character

As our company tends to do, we looked at this from the perspective of a blank slate. How would we achieve the goals of a microservices-based development methodology without deferring to current conventional wisdom or taking legacy into account? We started at the very foundation of a software application: the programming language. Yes, engineering teams make these decisions on a reasonably regular basis, but our technical leadership didn’t just look at the familiar options. We wound up at Rust.

Rust is a relative newcomer to the programming language arena, but even when put head-to-head against other contemporary alternatives (like Go) it has many unique, compelling qualities. Having not programmed anything since I hunched in front of a portable TV precariously balanced atop a Dragon 329 trying to get obscene words to scroll down the screen, I know nothing about the merits of one language versus another. Fortunately, we have lots of people here who do, so I asked them.

Rust, it turns out, has some pretty unique attributes. Its ownership-based memory management pretty much eliminates the potential for memory leaks and corruption, which even I know represent all-too-common and catastrophic bugs. In the same vein, Rust eradicates the data-race bugs that typically lurk in multi-threaded software -- which most code is, these days, as growing CPU core counts, rather than Moore’s law, now drive compute power evolution. Contrary to its name, Rust is also blazingly fast. That’s not to say you can’t make other languages run fast, but they don’t earn the emotive prefix: most don’t compile to native machine code that the operating system runs directly, instead depending on a bunch of low-level runtime code. A problem waiting to happen. In a departure from archetypal languages, Rust also does away with CPU-killing garbage collection and runs really lean from a memory-usage perspective -- efficiencies that lend themselves perfectly to microservices, of course.
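A minimal, stdlib-only sketch of what those colleagues told me: Rust’s compiler insists that state shared across threads be wrapped in thread-safe types (Arc plus Mutex) before multiple threads may touch it, ruling out data races before the program ever runs -- and with no garbage collector involved at any point.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Shared, mutable state must be explicitly wrapped for thread safety.
    // Handing a bare `&mut` reference to several threads would not compile.
    let counter = Arc::new(Mutex::new(0u32));
    let mut handles = Vec::new();

    for _ in 0..8 {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            // The lock is released automatically when the guard goes out
            // of scope; ownership, not a garbage collector, frees memory.
            *counter.lock().unwrap() += 1;
        }));
    }

    for h in handles {
        h.join().unwrap();
    }

    // Every increment is accounted for: no data race is possible here.
    assert_eq!(*counter.lock().unwrap(), 8);
}
```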

Software, not sushi

The moral of this story is that we should all contract crabs. And, by that, I mean network operators and suppliers must embrace network functions disaggregation by adopting a microservices approach to developing and deploying all varieties of network functions -- from data plane switching and routing to middle boxes, application layer gateways and application servers. I recognize that this isn’t an easy step, but we are currently living in an industry where we are continually making such difficult but important strides.

Successfully doing so requires vendors that are willing to wager on the benefits of microservices methodologies and a community prepared to collaborate in advancing the appropriate open interfaces and environments, much like the Acumos Project, a Linux Foundation initiative AT&T introduced for AI just this week.10 Prescient, Dredgie. Like them, I believe the migration from cats to cattle to crustaceans is central to successfully aligning network operators with their agile web-scale competitors. It’s time for us to shed some legacy and rethink our development and deployment approaches. It’s time to autotomize.

 

Download the white paper: The application of cloud native design principles to network functions virtualization

 

1. As far as I can ascertain, it wasn’t Bill’s presentation. It was actually a presentation by Glenn Berry, who quoted him.
2. I will reference the DevOps handbook here, although it cited The Register4, which doesn’t contain the quote and (pre-correction) cites CERN and then Randy Bias as the source.
3. https://en.wikipedia.org/wiki/Shard_(database_architecture) (Generally speaking, it was proposed that the benefits of distributing databases outweighed the complexity of implementing them and that doing so brought them in line with the modern view toward how we treat even the host compute components, themselves.)
4. https://www.theregister.co.uk/2013/03/18/servers_pets_or_cattle_cern/
5. http://cloudscaling.com/blog/cloud-computing/the-history-of-pets-vs-cattle/
6. www.projectclearwater.org
7. http://www.etsi.org/technologies-clusters/technologies/nfv/nfv-poc
8. https://www.metaswitch.com/the-switch/container-mania-grips-the-internet-of-talk-going-cloud-native-for-volte-with-kubernetes
9. http://www.nostalgianerd.com/the-dragon-32 It had a real keyboard, y’know.
10. https://www.sdxcentral.com/articles/news/att-tackles-artificial-intelligence-with-open-source-acumos-project/2017/10/

 

Footnote: My boss (in his infinite boss-like wisdom) also pointed out (post-publishing, as I let him claim plausible deniability of my public commentaries) that crabs move sideways, thereby further tying the extended analogy to sideways scalability. This is, of course, why they pay him the big bucks.