Your Network Observability Platform Sees Everything, But Learns From Nobody Else
The article discusses the limitations of current network observability platforms like ThousandEyes and Kentik, which provide excellent visibility but lack the ability to synthesize resolution intelligence across enterprises experiencing the same infrastructure event.
Why it matters
Improving the ability to synthesize and share resolution intelligence across enterprises facing the same network infrastructure issues could significantly reduce mean time to resolution (MTTR) and improve overall network reliability.
Key Points
- Current network observability platforms can detect and troubleshoot infrastructure issues, but each enterprise has to resolve the same event from scratch
- Network infrastructure is inherently shared, so when something fails, the issue affects multiple enterprises simultaneously
- Existing approaches like ISP postmortems, community forums, and federated learning fail to address the need for real-time, scalable synthesis of resolution intelligence
Details
The article explains that while tools like ThousandEyes and Kentik provide exceptional visibility into network performance and issues, they do not address the fundamental problem of shared infrastructure events. When a network degradation occurs, such as a BGP route leak or a CDN PoP failure, every enterprise running these observability platforms detects the same issue and works through the same troubleshooting process independently. The resolution intelligence one enterprise gains is not shared with the others facing the same problem, so their mean time to resolution (MTTR) stays longer than that of an enterprise that has already encountered and solved the same event. The article also explores why existing approaches such as ISP postmortems, community forums, and federated learning fail to address this "shared event problem" in a scalable, real-time manner.