Route Telemetry for IBCN and Beyond
To support the insatiable desire for immediate access to real-time information, we must not only dramatically increase the amount of distributed switching and routing capacity; we must also rethink how the core switching and routing elements themselves are designed and delivered. While these devices have grown in capacity over the last few decades, little has changed in the way they are fundamentally built. The resulting architecture must be more dynamic and resilient than ever before but without increasing cost and complexity -- goals that are typically at odds with each other. From data center fabrics to carrier access, interconnect and core infrastructures, the demand for continuous access to more data, and the distributed nature of its storage, is driving the need to adopt switching and routing elements that are built on Composable Networking (CN) constructs.
The concept of composability is nothing new: Modern server/OS/application deployments are built on this methodology, reducing capital and operational expenditures, nurturing competitive environments while increasing the velocity of innovation in the compute arena. Fundamentally, modern compute architectures could not be where they are today without it. In contrast, switching and routing infrastructures are lagging a decade or more behind, with commercial offerings that still predominantly feature tightly coupled hardware, operating systems and control plane applications. If all that sounds like a blatant marketing plea for embracing composable networking, then I’ve made the right career choice.
While they are distinct concepts, intent-based networking (IBN) complements composable networking and vice versa. Much like network functions virtualization (NFV) and software defined networking (SDN) in their early days, the two are on almost identical industry trajectories, so it’s worth combining them into a single principle: intent-based composable networking (IBCN). To better explain this, I have spent countless hours developing the following infographic:
IBCN infographic. You’re welcome.
Intent-based networking demands an unprecedented amount of real-time analytics data in order to guarantee that new configurations, when pushed into the switching and routing devices, meet operator intents across all network states, including during link, node or site outages. Telemetry can also close the loop within this system, feeding network operation and performance information related to the routing protocol (namely the border gateway protocol, or BGP) back into the IBN algorithms, enabling them to preempt problems or fine-tune provisioning information. This can also help secure infrastructure by anticipating vulnerabilities in the network, helping to head off zero-day exploits.
Beyond facilitating core IBN objectives like policy and SLA compliance, through load balancing and traffic engineering, detailed routing telemetry information can also be employed (indirectly) for root cause analysis of network failures. However, conventional techniques for garnering analytics information are not enough to help isolate and identify issues with the speed and granularity we need. While the simple network management protocol (SNMP) is the stalwart of network monitoring -- with an incredible lineage harking back thirty years to 1988 (RFC 1067)1 -- it’s no match for the demands of today’s highly distributed and dynamic infrastructures. As a poll-based protocol, SNMP carries a high processing overhead and delivers a low data rate. The same goes for proprietary (custom) alternatives built around CLI scraping, which must also be kept low-frequency; otherwise the network devices themselves are quickly overrun with requests, causing their packet processing performance to degrade dramatically.
Because of this, the level of granularity achieved using polling techniques is nowhere near enough to feed IBN engines or deliver sufficient data to predict or analyze failures proactively. On the other hand, passive devices, such as probes, require a significant amount of additional resources and management, in the form of dedicated deployments of additional hardware or software instances. They also lead to the generation of way too much irrelevant or redundant data, which must then be handled by the network itself, then sorted out by collectors or the analytics application.
In the case of BGP, there are route server options that peer with every other network device solely for the purpose of receiving BGP prefix information. These are often front-ended by a Web server that acts as a Looking Glass (LG), letting external users and other Internet service providers (ISPs) request reachability information from inside an infrastructure they would not normally have access to. While an appropriate starting point for ISPs to troubleshoot connectivity problems, route and LG servers can ultimately provide only limited information about the operation of a BGP infrastructure, as they hold only a subset of the potential data. By its nature, a BGP speaker advertises only its best path for each prefix to its neighbors, so that’s the only data a route or LG server can hold, expose or utilize.
First proposed (as an IETF draft) in 2008 and finally ratified (eight years later) within RFC 7854 in 2016, the BGP Monitoring Protocol (BMP) mitigates the issues with current route monitoring techniques, providing not only a view into all the updates an individual router is receiving but also exporting the entire unaltered contents of its routing information base (RIB). By seeing the exact information the BGP algorithm used to make its best path decisions, BMP provides external applications the most comprehensive insight possible into the real-time (and historical) operation of a network. BMP implementations comprise an agent (the client), which resides in the routing component, and a collector (the server). The Streaming Network Analytics System (SNAS) project2 (formerly OpenBMP) is an open source BMP collector that people generally use to kick BMP’s tires.
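To make the difference in visibility concrete, here is a toy sketch rather than anything resembling a real BGP implementation. The Path class, the peer names and the prefixes are all hypothetical, and best-path selection is reduced to "shortest AS path wins", whereas real BGP applies a much longer list of tie-breakers. It simply counts what a route server would be told versus what a BMP export of the received routes could contain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    prefix: str      # destination prefix
    neighbor: str    # hypothetical name of the peer the route came from
    as_path: tuple   # AS numbers the route has traversed


def best_paths(adj_rib_in):
    """Pick one path per prefix.

    Simplification (an assumption for this sketch): the shortest AS path
    wins. Real BGP evaluates local preference, origin, MED and more.
    """
    best = {}
    for path in adj_rib_in:
        current = best.get(path.prefix)
        if current is None or len(path.as_path) < len(current.as_path):
            best[path.prefix] = path
    return best


# Hypothetical routes one router has received from three neighbors.
ADJ_RIB_IN = [
    Path("203.0.113.0/24", "peer-a", (64500, 64510)),
    Path("203.0.113.0/24", "peer-b", (64501, 64511, 64510)),
    Path("203.0.113.0/24", "peer-c", (64502, 64512, 64513, 64510)),
    Path("198.51.100.0/24", "peer-a", (64500, 64520)),
]

# A route or LG server only ever hears the winner for each prefix.
route_server_view = list(best_paths(ADJ_RIB_IN).values())
# BMP can export every path the router received, winners and losers alike.
bmp_view = list(ADJ_RIB_IN)

print(len(route_server_view), "paths visible via the route server")
print(len(bmp_view), "paths visible via BMP")
```

The two losing paths for 203.0.113.0/24 are exactly the data an analytics engine would need to ask "what happens if peer-a goes away?", and they never reach the route server.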
If it’s not already glaringly obvious, BMP proactively streams BGP details to the collector, rather than relying on a server to poll the devices. Applications requiring this level of granular routing information can then subscribe to the data, which is normalized by the collector and stored using a logical structure that is easy to consume. In the case of SNAS, Kafka3 is employed as the message-oriented middleware (MOM) between the collector subsystem (the producer) and the database (the consumer). This is the same MOM employed in PNDA,4 the big data analytics platform also being driven by the Linux Foundation. SNAS is an optional plugin in that project, which is being touted as the mother of all monitoring solutions.
While BMP is connection-oriented in that it operates over TCP, that is true only at the transport layer (layer 4). At the application level the protocol is strictly one-way: no BMP messages are ever sent from the collector to the agent in the monitored routers. There are seven message types defined within the BMP standard:
A fancy table of the BMP message types.
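For anyone who prefers code to tables, the sketch below lists the seven message types with the type codes RFC 7854 assigns them, and decodes the 6-byte common header (version, message length, message type) that starts every BMP message. The names BmpMessageType and parse_common_header are made up for this illustration and are not taken from any particular collector.

```python
import struct
from enum import IntEnum

BMP_VERSION = 3
# Common header: 1-byte version, 4-byte message length, 1-byte message type,
# all in network byte order with no padding (6 bytes in total).
COMMON_HEADER = struct.Struct("!BIB")


class BmpMessageType(IntEnum):
    ROUTE_MONITORING = 0
    STATISTICS_REPORT = 1
    PEER_DOWN_NOTIFICATION = 2
    PEER_UP_NOTIFICATION = 3
    INITIATION = 4
    TERMINATION = 5
    ROUTE_MIRRORING = 6


def parse_common_header(data: bytes):
    """Return (total message length, message type) from the first 6 bytes."""
    if len(data) < COMMON_HEADER.size:
        raise ValueError("a BMP common header needs at least 6 bytes")
    version, length, msg_type = COMMON_HEADER.unpack_from(data)
    if version != BMP_VERSION:
        raise ValueError(f"unsupported BMP version {version}")
    # BmpMessageType() raises ValueError for codes outside 0..6.
    return length, BmpMessageType(msg_type)


# A bare header announcing an Initiation message with no TLVs after it.
sample = COMMON_HEADER.pack(BMP_VERSION, COMMON_HEADER.size, 4)
length, msg_type = parse_common_header(sample)
print(length, msg_type.name)
```

The length field counts the header itself, which is why the smallest legal value is 6.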
A BMP agent (the router) can stream BMP messages to one or more collectors. Once the TCP session is established, a BMP Initiation message is sent, followed by a Peer Up Notification for each of the monitored BGP adjacencies. Route Monitoring messages are then used to dump the contents of the Adj-RIB-In, followed by incremental updates using the same message type.
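The start-up sequence described above can be sketched from the collector's side as a framing loop over the TCP byte stream. This is a minimal illustration under stated assumptions: split_messages and frame are hypothetical helpers, and the message bodies are zero-filled placeholders standing in for real per-peer headers and BGP PDUs, so only the common header (version, length, type) is meaningful here.

```python
import struct

HEADER = struct.Struct("!BIB")  # version, total length, message type
NAMES = {
    0: "Route Monitoring",
    3: "Peer Up Notification",
    4: "Initiation",
}


def split_messages(buffer: bytes):
    """Split buffered TCP bytes into complete BMP messages.

    Returns (messages, leftover): messages is a list of (type, body) tuples
    and leftover holds a trailing partial message still awaiting more bytes.
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        version, length, msg_type = HEADER.unpack_from(buffer, offset)
        if version != 3 or length < HEADER.size:
            raise ValueError("not a well-formed BMP version 3 message")
        if len(buffer) - offset < length:
            break  # the rest of this message has not arrived yet
        messages.append((msg_type, buffer[offset + HEADER.size:offset + length]))
        offset += length
    return messages, buffer[offset:]


def frame(msg_type: int, body: bytes = b"") -> bytes:
    """Build a message with placeholder content (illustration only)."""
    return HEADER.pack(3, HEADER.size + len(body), msg_type) + body


# What an agent sends after the TCP session comes up: Initiation, a Peer Up
# per monitored adjacency, then Route Monitoring messages for the RIB dump.
stream = frame(4) + frame(3, bytes(42)) + frame(0, bytes(42))

# Simulate the stream arriving in two TCP segments split mid-message.
first, leftover = split_messages(stream[:60])
second, remainder = split_messages(leftover + stream[60:])

for msg_type, body in first + second:
    print(NAMES[msg_type], len(body), "body bytes")
```

Because nothing ever flows back to the router, the collector side needs no request logic at all; it only has to read, frame and hand messages on.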
So, our increasing number of routing devices can now efficiently stream granular routing data to a multitude of monitoring and analytics platforms. In the future, these would include the real-time verification and validation engines running within intent-based networking servers, but the data can be put to effective use right now, employing Grafana to visualize it or building your own simple (or not-so-simple) front end.
BGP analytics from BMP: A home-grown Web UI and one using Grafana, courtesy of snas.io.
The Metaswitch Composable Network Protocol (CNP) suite -- already an integral part of emerging intent-based networking propositions -- includes BMP for streaming route telemetry information. You can see it in action in this short introductory video or learn more about our entire portfolio of CNPs here.
Simon is the Director of Technical Marketing and a man of few words.