PhD Proposal: Rearchitecting cloud telemetry through approximation-first techniques
Modern cloud-native infrastructures and applications generate enormous volumes of telemetry data that are critical for performance monitoring, fault diagnosis, and resource optimization. However, as these systems scale to millions of components, such as containers and microservices, collecting, storing, and analyzing this high-volume telemetry data becomes prohibitively expensive, driving up operational costs and limiting the scalability and real-time responsiveness of cloud telemetry systems.
This proposal explores an approximation-first approach to rethinking telemetry architecture, in which relaxing exact accuracy enables significant gains in efficiency and responsiveness without compromising practical utility. Building on this principle, my proposed research focuses on three aspects. First, I will describe PromSketch, an approximate intermediate query caching system that reduces time series query latency and cloud billing costs. Second, I will discuss NetMigrate, which achieves an elastic and scalable key-value storage system by leveraging approximate membership indexing within network switches. Finally, I will propose a new cloud telemetry architecture that leverages on-device approximation to reduce telemetry data ingestion costs.