NetworkMD: Topology Inference and Failure Diagnosis in the Last Mile

Yun Mao, Hani Jamjoom, Shu Tao, and Jonathan Smith
ACM Internet Measurement Conference (IMC)
San Diego, CA, October 2007
Abstract. Health monitoring, automated failure localization, and diagnosis have all become critical to service providers of large distribution networks (e.g., digital cable and fiber-to-the-home), due to the increases in scale and complexity of their offered services. Existing automated failure diagnosis solutions typically assume complete knowledge of network topology, which in practice is rarely available. The solution presented in this paper, Network Management and Diagnosis (NetworkMD), is an automated failure diagnosis system that can infer failure groups from historical failure data and, optionally, geographical information. The inferred failure groups mirror missing topologies, and can be used to localize failures, diagnose root causes of problems, and detect misconfiguration in known topologies. NetworkMD uses an unsupervised learning algorithm based on non-negative matrix factorization (NMF) to infer failure groups. Using cable networks as the primary example, we demonstrate the effectiveness of NetworkMD in both simulated settings and a real environment, using data collected from a commercial network serving hundreds of thousands of customers via thousands of intermediate network devices.
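To make the NMF idea concrete, here is a minimal sketch of how failure groups might be read off a device-by-episode failure matrix. It is not the authors' implementation: the toy data, the function names (`nmf`, `failure_groups`), the choice of Lee-Seung multiplicative updates, and the rule of assigning each device to its strongest factor are all assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def nmf(v, k, iters=500, seed=0, eps=1e-9):
    """Factor a non-negative m x n matrix v into w (m x k) and h (k x n)
    using Lee-Seung multiplicative updates for squared error."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(m)]
    h = [[rng.uniform(0.1, 1.0) for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n)]
             for a in range(k)]
        # w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(m)]
    return w, h


def failure_groups(w):
    """Assign each device (row of w) to the factor with the largest weight."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]


# Hypothetical toy data: rows are devices, columns are failure episodes,
# and 1 means the device was observed failing during that episode.
failures = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]

w, h = nmf(failures, k=2)
groups = failure_groups(w)
print(groups)  # one group label per device
```

Devices that tend to fail together load on the same factor, so the group labels act as a stand-in for a shared upstream element when the real topology is unknown. The group labels themselves are arbitrary; only which devices share a label is meaningful.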
author = {Yun Mao and Hani Jamjoom and Shu Tao and Jonathan Smith},
title = {{NetworkMD: Topology Inference and Failure Diagnosis in the Last Mile}},
booktitle = {ACM Internet Measurement Conference (IMC)},
address = {San Diego, CA},
month = {October},
year = {2007}