5G Core Automation with Kubernetes

Attaining the capital and operational cost reductions required to make the deployment of 5G networks viable necessitates a high degree of operational automation. This can only be achieved if network functions have been architected from the ground up for deployment within managed containers inside cloud infrastructures. Considering the granular, distributed scalability requirements of 5G elements and the high degree of reliability demanded of operator services, this is a significant undertaking.


Scaling and resiliency of container-centric web services are nothing new, and the mechanisms used by web-scale service providers can be employed to support non-real-time 5G components in the Service Based Architecture. However, this holds only if a vendor makes the tough decision to write its code completely from scratch using microservice methodologies and frameworks. Even then, extending such horizontal scaling and resiliency techniques to real-time packet processing platforms, such as a user plane function or access gateway function (UPF/AGF), is considerably more complicated.

Solving the inherent issues requires significant innovations in the implementation of sharded graph databases for maintaining PDU session state. This is on top of the advancements in data plane packet processing needed to meet the aggressive price/performance requirements of most 5G business models. Only after these problems have been solved can we overlay an automation and orchestration strategy.

Metaswitch Fusion Core leverages the Kubernetes container orchestration system to automate the lifecycle of its microservices. The Kubernetes Horizontal Pod Autoscaler can spin up new container instances based on any number of metrics. The system is implemented as a closed control loop, in which all applicable elements ensure that the correct number of hosted microservices is instantiated and maintained at the required level. Together with the Kubernetes Metrics Server, a cluster-wide aggregator of resource usage data, we employ the Prometheus monitoring system to collate service-specific data from UPF instances. With this information, we can dynamically scale up, scale down and protect our 5G core against failures.
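To make the closed control loop more concrete, here is a minimal Python sketch of the replica calculation that the Kubernetes documentation describes for the Horizontal Pod Autoscaler: the desired replica count is the current count scaled by the ratio of the observed metric to its target, rounded up. Using the average session count per UPF as the metric is an assumption made for illustration, and the function and parameter names are hypothetical. This is not Fusion Core code.

```python
import math


def desired_replicas(current_replicas, current_avg, target_avg, tolerance=0.1):
    """Sketch of the documented HPA rule:
    desired = ceil(current_replicas * current_avg / target_avg).

    current_avg is the observed per-instance average of the metric
    (assumed here to be sessions per UPF); target_avg is the threshold.
    """
    if current_replicas < 1 or target_avg <= 0:
        raise ValueError("need at least one replica and a positive target")
    ratio = current_avg / target_avg
    # The HPA skips scaling when the ratio is close enough to 1.0
    # (Kubernetes uses a default tolerance of 0.1).
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Multiply before dividing so exact cases stay exact in floating point.
    return max(1, math.ceil(current_replicas * current_avg / target_avg))
```

For example, one instance averaging 150 sessions against a target of 120 yields two instances, and three instances averaging 40 sessions each shrink back to one. Note that with the default tolerance, a load only slightly above the target would not trigger a scale-up in this sketch.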

We demonstrate all of that in this short video of a set-up on which we have configured a small scaling threshold of 120 sessions. Starting with a single UPF and one hundred sessions, our traffic generator quickly exceeds the 120-session count. As it does, a new UPF is instantiated automatically.

[Video: 5G automation demonstration]

Naturally, a UPF instance will also quiesce and spin down as load decreases. Before a process is completely killed, however, there is a predetermined timeframe in which the system waits to ensure load does not quickly increase again. Much like adding and removing forwarding table entries in a router, this prevents flapping. In the video, we continue to increase the number of sessions to 300. Once we hit that, there are a total of three UPF instances with sessions evenly load balanced across them.
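The wait before scaling down can be sketched as a stabilization window, similar in spirit to the scale-down stabilization the Kubernetes Horizontal Pod Autoscaler documents: the controller remembers recent recommendations and acts on the highest one still inside the window. The class name, the window length and the timestamps below are hypothetical and chosen only to illustrate the idea. The actual timeframe used in the demonstration is not stated here.

```python
from collections import deque


class ScaleDownStabilizer:
    """Keeps recent replica recommendations and returns the highest one
    still inside the window, so scale-down is delayed but scale-up is not."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._history = deque()  # (timestamp, recommended replica count)

    def recommend(self, now, raw_recommendation):
        self._history.append((now, raw_recommendation))
        # Forget recommendations that have aged out of the window.
        while self._history[0][0] < now - self.window_seconds:
            self._history.popleft()
        # Acting on the maximum means a lower count only takes effect once
        # every higher recommendation has left the window.
        return max(count for _, count in self._history)
```

With a 60-second window, a recommendation of three instances at time 0 keeps the count at three even if the raw recommendation drops to two at time 30. The count falls to two only after the earlier recommendation has aged out, while a higher recommendation is applied immediately.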

Using the Helm package manager, we then kill one of the processes. The failure is detected and, in under three seconds, Kubernetes has deployed a new UPF and traffic is flowing as before.
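The recovery behavior follows from the desired-state model that Kubernetes controllers use: the controller repeatedly compares the number of running instances with the desired number and starts replacements for any that are missing. The following Python sketch simulates that reconciliation in memory. The class, method and instance names are hypothetical, and the sketch makes no calls to a real cluster.

```python
import itertools


class UpfReconciler:
    """In-memory simulation of desired-state reconciliation."""

    def __init__(self, desired):
        self.desired = desired           # how many UPF instances should run
        self.running = set()             # names of instances currently running
        self._ids = itertools.count(1)   # source of unique instance numbers

    def fail(self, name):
        """Simulate an instance being killed."""
        self.running.discard(name)

    def reconcile(self):
        """Start replacements until the running count matches the desired
        count; return the names of the instances started in this pass."""
        started = []
        while len(self.running) < self.desired:
            name = f"upf-{next(self._ids)}"
            self.running.add(name)
            started.append(name)
        return started
```

With a desired count of three, the first reconcile pass starts three instances. After one of them fails, the next pass starts exactly one replacement, and a further pass with nothing missing starts none.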

When looking towards 5G for supporting private mobile networks, fixed wireless access or enhanced mobile broadband, Metaswitch Fusion Core is the faster way forward. To learn more, visit our Fusion Core product page.