The next best action: Is automatic problem resolution and network performance repair really possible?

Ron Groulx, Sales Engineer, discusses the theory and reality of self-diagnosing and self-healing networks. Network performance monitoring has come a long way.

By Ron Groulx, 18 November 2016

Imagine that you could deliver a brand new application to the business over a complex network without worrying about how well it is performing, or what impact it might have on existing apps that share the same network infrastructure. We’ve all heard of the concept of ‘self-healing networks’: networks that, with today’s technology, can readily recover from a complete disconnect or outage, for example through intelligently architected device and link redundancy. But ‘network performance’ from a user’s perspective is an entirely different and much more complicated matter.

So back to the question: is it really possible for complex networks to automatically resolve performance problems and repair themselves? The key to making this possible is a clear understanding of how problems are identified on the network itself using the data on the wire. The challenge with today’s technology is that there exist several different approaches to network performance problem isolation, each with its own visibility perspective and disadvantages.

In the early days of computer networking, performance visibility was often obtained by polling traffic counters (exposed through MIBs) on network devices. This resulted in the development of a standard protocol, SNMP, which is still widely in use today, predominantly for infrastructure-level (L1/L2) problem isolation. The disadvantage of SNMP is its lack of insight into problems at the higher layers within the data payload (L3-L7). Most often, network engineers could isolate a faulty interface or cable, heavily utilized links, and even basic network congestion. However, isolating problems with network services and applications required different tools that look deeper into the datagram.
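As a rough illustration of the counter-polling approach described above, the sketch below turns two successive readings of an interface octet counter into a link utilization percentage. The function name, the sample values, and the 32-bit counter width are assumptions chosen for the example; no real SNMP library is involved.

```python
def link_utilization(prev_octets, curr_octets, interval_s, if_speed_bps, counter_bits=32):
    """Percent utilization of a link between two polls of an octet counter."""
    modulus = 2 ** counter_bits
    # Counters wrap back to zero; modular subtraction copes with a single wrap.
    delta_octets = (curr_octets - prev_octets) % modulus
    bits = delta_octets * 8
    return 100.0 * bits / (interval_s * if_speed_bps)


# Hypothetical sample: two polls taken 60 seconds apart on a 100 Mbit/s link.
util = link_utilization(
    prev_octets=1_000_000,
    curr_octets=376_000_000,
    interval_s=60,
    if_speed_bps=100_000_000,
)
print(f"{util:.1f}% utilized")
```

A figure like this can flag a busy link, but it says nothing about which applications generated the traffic, which is the limitation noted above.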

The next step in the evolution began with Cisco’s NetFlow technology. In the 2000s, it became the basis for an IETF standard named IPFIX. The idea was to “push” flow information from network devices to a collector, with each flow originally identified by the so-called 5-tuple:

  • Source IP address
  • Destination IP address
  • Source port number
  • Destination port number
  • Protocol

This allowed better visibility than SNMP, in that you could now see L3/L4 usage statistics. However, several challenges emerged when trying to isolate performance issues at the application layers (L4-L7). Even when extensions such as Flexible NetFlow are added to provide more visibility into the applications, the challenge intensifies: the volume of flow data pushed over the wire grows, and the collectors must grow dramatically to keep up. In large enterprise networks it can be like drinking from a fire hose, and it ultimately becomes very expensive and resource-intensive.
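To make the 5-tuple idea concrete, here is a minimal sketch of how a collector might group observations into flows. The `FlowKey` type, the `aggregate_flows` function, and the sample addresses are hypothetical and do not reflect the NetFlow or IPFIX record formats.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowKey(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # IANA protocol number, e.g. 6 = TCP, 17 = UDP


def aggregate_flows(packets):
    """Group (FlowKey, byte_count) observations into per-flow counters."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for key, size in packets:
        flows[key]["packets"] += 1
        flows[key]["bytes"] += size
    return dict(flows)


# Hypothetical traffic: two packets of one HTTPS conversation and one DNS query.
web = FlowKey("10.0.0.5", "192.0.2.10", 51000, 443, 6)
dns = FlowKey("10.0.0.5", "192.0.2.53", 40000, 53, 17)
observed = [(web, 1500), (web, 400), (dns, 80)]

flows = aggregate_flows(observed)
for key, stats in flows.items():
    print(key, stats)
```

Each record shows who talked to whom and how much, but carries nothing about what happened inside the conversation, which is why application-level isolation remains difficult with flow data alone.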

At this point, there remain three effective approaches to obtaining visibility into the network application layers:

  1. Application software Agents
  2. Simulated Data
  3. Wire Data (a.k.a. Packet Analysis)

Because it is important to understand the true nature of network performance from the perspective of the end user’s experience, gathering data from software agents installed on endpoints can help rule out issues on the devices or servers involved in the transactions. From this perspective alone, however, there is little to no granularity on issues that originate in the network itself and, furthermore, no way to find the root cause end-to-end.

Simulated data is great for capacity planning and for measuring performance across a set of transactions run at a given time. But for a true picture of network performance, we need to base our view on the actual transactions sent by the end users over the entire duration, and have the ability to isolate issues caused by the network or the application.

This is where Wire Data fits in to complete the overall picture. By passively capturing a copy of the packets on the wire (using SPAN ports or TAPs), full visibility of network application performance can be obtained, provided the underlying solution is positioned properly to do so. Unlike the aforementioned approaches, wire data is not limited to a subset of information and has full L2-L7 visibility. Legacy packet analysis solutions (a.k.a. sniffers) faced several challenges in the early days that limited their usefulness to only the most skilled network gurus. However, advancements in technology have brought application-based analytics that make wire data easily available to the business. Tactical, analytic approaches have replaced “capture-all” techniques, whose data storage costs climbed with bandwidth, and have significantly reduced the cost of wire data solutions.
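One way to picture how packet timestamps help separate network delay from application delay is the simplified sketch below. It assumes the capture point sits close to the client, that a single TCP connection carries one request, and that the handshake round trip is a fair estimate of network delay; the function name and sample timestamps are hypothetical.

```python
def split_response_time(syn_ts, syn_ack_ts, request_ts, first_response_ts):
    """Estimate network and server components of a response time, in seconds.

    All arguments are capture timestamps in seconds, taken near the client.
    """
    # The handshake round trip approximates pure network delay.
    network_rtt = syn_ack_ts - syn_ts
    # Time from the request to the first response byte includes one round trip.
    total = first_response_ts - request_ts
    # Whatever remains is attributed to the server; clamp at zero for noisy data.
    server_time = max(total - network_rtt, 0.0)
    return network_rtt, server_time


# Hypothetical timestamps for one transaction.
rtt, server = split_response_time(
    syn_ts=0.000, syn_ack_ts=0.040, request_ts=0.045, first_response_ts=0.335
)
print(f"network: {rtt * 1000:.0f} ms, server: {server * 1000:.0f} ms")
```

In this made-up transaction most of the delay lands on the server side, the kind of distinction that counter polling or flow records alone cannot make.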

So again: is it really possible for complex networks to automatically resolve performance problems and repair themselves? We can still only say: maybe. In the end, with the advancement of wire data and complementary application (endpoint) analytics working together in an ecosystem targeted at business-level objectives, we are indeed a step closer to the realization of automatic problem resolution and network performance repair.

Ron Groulx, Sales Engineer, Canada, Corvil