Rethinking Cloud-Scale Telemetry from an Approximation-First Perspective

Talk
Alan Zaoxing Liu
Time: 11.07.2025 11:00 to 12:00

Telemetry systems are widely used to collect data from distributed endpoints, analyze it jointly to gain valuable insights, and store it for historical analytics. With growing volumes of data to collect and a growing need for real-time analytics, such as security detection and performance analysis, telemetry costs are rising across the stack. As a result, simply collecting all data, transmitting it for analysis, and storing it exactly has become prohibitively expensive. Rather than relying on exact telemetry or on myopic solutions applied in isolation, we take a holistic bird’s-eye view of the telemetry stack and treat approximation techniques such as sketches as first-class primitives. We will present early results on how this paradigm unlocks orders-of-magnitude cost reductions (100x) without significant deployment effort in large-scale cloud networks.