The Service Communication Proxy: 5G Caught Up in a Service Mesh

There’s no escaping 5G, that’s for sure. Like every other high-tech marketeer, I worry constantly about over-selling a technological evolution or innovation. The thought of saturating the industry narrative with a concept still in its infancy is cause for countless sleepless nights. I’m totally kidding, of course. I don’t care. It’s a dog-eat-dog world and we Mad Ave wannabes are here to sell the dream - even if that dream is somewhat of a nightmare, once we start interpreting it. And that genuinely is the case, when it comes to the Service Based Architecture.

Outside the bumper bandwidth upgrade and the relentless pursuit of low latency, one of the more interesting facets of 5G is that it defines a core network infrastructure which undeniably demands a strict adherence to cloud native design principles. Much smarter people have come before me with a crystal clear definition of what exactly this means, so just type “cloud native” into our search bar and enjoy. In specific terms, this philosophy resulted in the definition of the 5G Service Based Architecture (SBA).

The goal of the SBA was to provide a Service-Oriented Architecture (SOA) for hosting distinct control plane components from differing vendors with disparate development cycles that could easily interwork and interact to deliver a complete 5G subsystem or service offering. SOA is a coarse-grained disaggregation concept that, for logical reference, sits between historical monolithic systems and a single network function element developed using modern fine-grained microservices methodologies.

Where do Service Oriented Architectures fit?

This SOA approach replaces those proprietary, monolithic products with reusable network functions which closely interact and interwork to deliver a distinct service. It’s the essence of extending DevOps philosophies beyond the supplier and into the area of carrier network operations. The precise technical differentiators are nuanced and oft debated but the fundamental requirement of an SOA is an enterprise/organizational-level broker and orchestrator, an element which is not required when developing a microservices-based offering. To understand why this is significant, we must unpack the fundamental issues around the decoupling of previously integrated service elements.

Built on a single proprietary hardware device using one lump of code, legacy systems certainly made life simpler. Running in a common environment, moving data between subsystems required only an in-memory read/write. Even if services comprised multiple such components, they were easily connected with hard wiring or fixed addresses. Using standardized protocols aided interop but additional physical middleboxes were often required to append network services or provide topology awareness. Even products employing Network Functions Virtualization (NFV) remained as one lump of code in a single Virtual Machine (VM) with static connections.

Simplicity comes – both literally and figuratively - at a price and operators have sacrificed much, in terms of deployment flexibility and distinctiveness of their service offerings. From stranded overcapacity to wasteful redundancy, the overheads associated with a monolithic modus operandi break every modern business model. Not only that, we must deal with long waterfall release mentalities from vendors, as internal factions battle for fixes and features, or (worse) synchronize simultaneous upgrades across multiple suppliers. Modern cloud native design and DevOps principles solve these dilemmas.

For SOAs to achieve equivalence, however, we must add elements which automate the interconnection of stand-alone network functions. We do this by implementing a service mesh.

The concept of a service mesh falls under the (albeit extremely broad) software defined networking (SDN) umbrella. In a cloud environment, where otherwise distinct network functions may be indiscriminately instantiated almost anywhere in a network infrastructure, a service mesh decouples the issue of network connectivity from an individual application. This is a fundamental requirement when deploying a service comprising distributed elements dependent on topological continuity and awareness. Again - to be clear - this is a problem that must also be solved at the micro-level (as in microservices), but those do not mandate an enterprise-wide solution. Plus, microservices are still often controlled by one supervising entity, ensuring interoperability.  

So, we have a myriad of network functions systematically spinning up, then scaling up or down to support the ebbs and flows of traffic loads. The first problem we must therefore solve is one of discovery. Specifically, how does a single instance of a network function announce its availability to initiate an east-west connection in order to participate in the delivery of a larger application or service?

This is the sort of problem network architects currently mitigate by employing a Load Balancer (LB), a device we might typically find on north-south interfaces between clients and servers, rather than on east-west, intra-datacenter, connections. Fronting a network function domain with a Load Balancer will indeed alleviate the problem of discovery: We would assign the static IP address of the front-end LB, supporting one network function, into the configuration of all others, thereby allowing the source application to remain blissfully unaware of the state, availability and individual IP address of the resources it employs.

A poor solution to the problem of discovery and traffic routing in SOA

Aside from the fact we would need to manually configure that IP address into an architecture that is supposed to be dynamic in nature, this might look like an appropriate solution at first glance. However, our example here has only four network functions where - in reality - many more would be required to deliver a 5G (or any other) service. The number of manually deployed and provisioned Load Balancers required to support east-west traffic flows quickly gets out of hand.

What might also be immediately obvious is that we have introduced a pretty significant single point of failure into our solution. In essence, it doesn’t matter how many instances of a network function (NF) are up and running: if we lose that Load Balancer, we’ve lost the entire service. Lastly, we can’t ignore the additional latency inflicted by that extra bi-directional hop. This might be negligible for today’s web applications but 5G was born, in no small part, with the promise of low latency in support of massive IoT sensor grids, so even the smallest amount of indiscriminate delay is problematic.

Instead of a Load Balancer, a Service Oriented Architecture depends on a central discovery and registration function. In the 3GPP’s 5G specifications, this was defined as the Network Function (NF) Repository Function (NRF).

The NRF meets the requirement of our macro SOA controller in the 5G SBA

As you might expect, the 3GPP adopts SOA nomenclature when talking about the relationship between elements in their distributed system. That means I’m going to start referring to NFs as either consumers or producers. Typically, a producer is a component that sends traffic to multiple consumers (think Database) but consumers can be - or become - producers and vice versa. Don’t say I didn’t warn you!

At a high level, the NRF allows producers (NF-C, in the example above) to automatically advertise their availability to consumers (NF-A), allowing the direct exchange of data between the two. Conversely, if a producer fails, the consumer can discover that state change and traffic can be directed elsewhere. This does put the onus on the consumer to perform load balancing across multiple instances of a producer but a simple round robin would suffice.
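To make the register/discover/round-robin flow concrete, here is a minimal Python sketch. It is a toy, in-memory illustration only: the `ToyNRF` and `Consumer` classes, their method names and the IP addresses are all hypothetical, and none of it reflects the actual 3GPP NRF service interfaces.

```python
from itertools import cycle


class ToyNRF:
    """Hypothetical in-memory stand-in for an NRF: maps an NF type to live instances."""

    def __init__(self):
        self._instances = {}  # nf_type -> list of addresses, in registration order

    def register(self, nf_type, address):
        # A producer instance advertises its availability.
        self._instances.setdefault(nf_type, []).append(address)

    def deregister(self, nf_type, address):
        # Called when an instance fails or is scaled in.
        known = self._instances.get(nf_type, [])
        if address in known:
            known.remove(address)

    def discover(self, nf_type):
        # Return a copy so callers cannot mutate the registry.
        return list(self._instances.get(nf_type, []))


class Consumer:
    """A consumer NF that round-robins across the producer instances it discovered."""

    def __init__(self, nrf, producer_type):
        self._nrf = nrf
        self._producer_type = producer_type
        self.refresh()

    def refresh(self):
        # Re-run discovery, e.g. after learning that a producer changed state.
        self._targets = self._nrf.discover(self._producer_type)
        self._rotation = cycle(self._targets)

    def next_target(self):
        if not self._targets:
            raise LookupError(f"no live instances of {self._producer_type}")
        return next(self._rotation)


nrf = ToyNRF()
nrf.register("NF-C", "10.0.0.1")  # made-up addresses
nrf.register("NF-C", "10.0.0.2")

nf_a = Consumer(nrf, "NF-C")
picks = [nf_a.next_target() for _ in range(3)]  # rotates through both instances

nrf.deregister("NF-C", "10.0.0.1")  # one producer instance fails
nf_a.refresh()
after_failure = nf_a.next_target()  # only the surviving instance is left
```

Note that the consumer carries the load-balancing logic itself, which is exactly the burden the paragraph above describes.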

The NRF, however, operates at the macro-level of our SBA. Thinking and acting in terms of a global control plane architecture, comprising logically large NFs, the NRF still fails to adequately address the issues of granular scalability at a meso-level. Not only must the SBA support a service mesh comprising multiple instances of an NF within one of many individual data centers, it must consider the possibility that an individual NF is itself composed of distinct components.

Appearing in the second specification iteration of 5G, or Release 16 in 3GPP parlance, the Service Communication Proxy (SCP) was defined within TS 23.501 - System architecture for the 5G System. It’s the sort of function that is born of necessity. The SCP is not required to make a 5G SBA work but is required to make it work in a highly distributed multi-access edge compute cloud environment. The Service Communication Proxy provides a single point of entry for a cluster of network functions, once they have been successfully discovered by the NRF. This allows the SCP to become the delegated discovery point in a data center, offloading the NRF from the numerous distributed service meshes that would ultimately make up a network operator’s infrastructure.

A service mesh implementation employing the Service Communication Proxy (SCP)

Not only does the SCP perform delegated discovery in the form of an internal service registrar and controller, it implements an individual Service Agent for each NF, allowing for indirect communications between each 5G core component in the SBA. Many service mesh implementations refer to the Service Agent as a sidecar and it can serve a multitude of purposes and solve numerous problems. Yes – the name derives from the motorcycle sidecar, in that each is attached to one NF and each NF has its own. The Service Agent performs critical tasks that are peripheral to the primary role the NF was designed to perform. These include interworking, service segmentation, service-centric access control and load balancing. This network abstraction layer allows application developers to be infrastructure-agnostic, ignoring the complexities of the underlying service mesh communications methodologies and topologies while retaining a close relationship with an agent that literally lives and dies with the NF it serves.

Supported by the Service Mesh Controller, the Service Agent can implement global access control lists that prevent unauthorized communications between network functions. This is like a firewall except with far simpler and globally significant rulesets. Rather than individual IP blocklists or allowlists, for example, the Service Agent can implement a rule that simply says “NF-A <-> NF-C” and is applicable regardless of the IP network address ranges those functions employ. Outside the prevention of unauthorized communications, the Service Agent can also implement cryptographic verification using techniques such as mutual transport layer security (mTLS). Once again, the operator can maintain complete control over the distribution of keys and certificates with the application developer completely detached from the process.
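As a rough illustration of how a rule like “NF-A <-> NF-C” differs from an IP-based ruleset, the Python sketch below checks permissions purely on service identity. The `make_policy` and `is_allowed` helpers are hypothetical names invented for this example, and a real Service Agent would establish those identities through something like mTLS certificates rather than plain strings.

```python
def make_policy(*pairs):
    """Build a set of unordered NF-name pairs that are allowed to communicate."""
    return {frozenset(pair) for pair in pairs}


def is_allowed(policy, source_nf, dest_nf):
    # The check uses service identity only; IP addresses never enter into it.
    return frozenset((source_nf, dest_nf)) in policy


# One globally significant rule: NF-A <-> NF-C
policy = make_policy(("NF-A", "NF-C"))

a_to_c = is_allowed(policy, "NF-A", "NF-C")  # covered by the rule
c_to_a = is_allowed(policy, "NF-C", "NF-A")  # same rule, opposite direction
a_to_b = is_allowed(policy, "NF-A", "NF-B")  # no rule covers this pair
```

Because the rule names services rather than addresses, it keeps working when instances are re-addressed, moved or scaled.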

The SCP can route messages within and even between multiple intra-data center service mesh boundaries and can route messages between data centers with information provided by the NRF. The 3GPP has also defined a deployment option where the Service Agent is augmented by a Service Router, which performs name-based routing based on the longest suffix of fully qualified domain names (FQDN). The Name-Based Routing Protocol (NBRP) employs a path-based distance-vector algorithm not unlike BGP and is therefore an ideal candidate for the Path Computation Element (PCE). Without a doubt my favorite internetworking aid, you can read all about my unhealthy obsession with the PCE in the appropriately titled “Pixie Dust and Unicorn Stuff - The Magic Behind SDN” and “Ooh! Ooh! Ooh! BGP, MPLS, Segment Routing and PCE!” blog posts, to name just a couple.
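The longest-suffix lookup at the heart of name-based routing can be sketched in a few lines of Python. This only illustrates the matching step, not NBRP itself, and the `longest_suffix_match` function, the domain names and the router names are all made up for the example.

```python
def longest_suffix_match(fqdn, routes):
    """Return the next hop whose route name is the longest label-wise suffix of fqdn."""
    labels = fqdn.lower().rstrip(".").split(".")
    # Try the full name first, then drop leading labels one at a time,
    # so the first hit is always the longest matching suffix.
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in routes:
            return routes[candidate]
    return None  # no route covers this name


# Hypothetical routing table: suffix -> next-hop Service Router
routes = {
    "operator.example": "router-core",
    "edge1.operator.example": "router-edge1",
}

hop_edge = longest_suffix_match("nf-c.edge1.operator.example", routes)
hop_core = longest_suffix_match("nf-b.edge2.operator.example", routes)
hop_none = longest_suffix_match("nf-x.other.example", routes)
```

Here the first lookup lands on the more specific edge route, the second falls back to the operator-wide route, and the third finds no match at all.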

But complete Service Mesh implementations do more than registration, discovery, interworking and routing. They are also an integral part of an automation strategy, replacing per-NF configuration scripts with a centralized configuration file that can be called on by all similar functions. This dramatically reduces the potential for errors and makes system-wide updates and upgrades much easier.

While it’s easy to get caught up in the hype, those genuinely working towards 5G standards are not only building bigger pipes, they are revolutionizing the way mobile and fixed-line networks will be architected and installed. Before you say it – yes – I know this is the second (or third) revolution in less than a decade, for an industry that previously only revolutionized itself every quarter century, or so. It’s clear now that those were merely stepping stones along the way. We innovate, fail fast, learn and move on. 5G is more than a marketing label to be indiscriminately bandied around or unsystematically slapped on stand-alone products and devices. The definition of the SCP is more proof that 5G is an entirely new way of thinking about developing and deploying complete end-to-end infrastructures. If all that sounds like the exact same insincere sycophancy I condoned in my opening paragraph, may I also direct you back to my original retort.