Storage-Compression-Querying Tradeoffs in Dataset Versioning

Talk

Amol Deshpande

Talk Series:

Dept. Colloquium

Time:

10.08.2021 11:00 to 12:00

Location:

IRB 0318

URL:

https://talks.cs.umd.edu/talks/2944

Also on Zoom: https://umd.zoom.us/j/96718034173?pwd=clNJRks5SzNUcGVxYmxkcVJGNDB4dz09

Our ability to collect data continues to grow at an exponential rate; combine this with the abundance of local compute and storage capacities, increasingly decentralized teams of data analysts, and the almost-innate fear of ever deleting anything, and the result is a proliferation of many thousands or millions of versions of almost-similar datasets in most enterprises. This not only leads to increased storage and network costs, but also quickly grows unmanageable due to the difficulty in maintaining sufficient context like dataset provenance. Data compression is typically not sufficient by itself to address these challenges, in part because we often need to retrieve or query specific datasets or portions thereof, and in part because the data is usually stored in distributed cloud-based (semi-)structured data management systems.

In this talk, I will discuss our work over the last decade on systematically understanding the storage/retrieval/query tradeoffs in this context, and describe how different use cases, computing environments, and data types lead to different solutions. I will also discuss how we can enable new types of introspective analyses of data evolution and data processing pipelines, and future research directions.
