Parameterized and Promised Streaming: Making Big Data More Accessible
Big networks are constantly growing in both size and relevance: from social networks such as Google+, Facebook, and Twitter, to brain networks, gene regulatory networks, and health/disease networks. The traditional approach to analyzing such big datasets is to use powerful supercomputers (clusters), ideally large enough to store the data in main memory. This approach has two downsides: many potential users of big data lack such powerful computational resources (e.g., point-of-sale Bitcoin blockchain analysis), and it can be difficult to solve unexpected problems within such a large infrastructure (e.g., image analysis after the Boston Marathon bombing). Our main goal here is to enable the processing of huge datasets on computational devices with a limited amount of fast memory, connected to a relatively slow external data source. In this talk, we introduce two new approaches, parameterized streaming and promised streaming, for optimization problems (often NP-hard) over dynamic graph streams, in which edges can be both inserted and deleted. We emphasize that the approaches introduced in this talk are also relevant to MapReduce and other distributed models, and can be easily implemented in a variety of popular traditional models as well.
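To make the dynamic graph-stream model concrete, here is a minimal sketch (not from the talk itself): the algorithm reads a sequence of edge insertions and deletions in a single pass and keeps only a small summary in memory. The token format and the choice of summary (vertex degrees and an edge counter) are illustrative assumptions; actual streaming algorithms for NP-hard problems would use sublinear-space sketches instead.

```python
# Sketch of the dynamic (insert/delete) graph-stream model.
# Each stream token is ('+', u, v) for an edge insertion or
# ('-', u, v) for a deletion; the processor never stores the
# full edge list, only a small running summary.
from collections import defaultdict

def process_stream(stream):
    """Single pass over the stream; returns (edge count, max degree)."""
    degree = defaultdict(int)  # per-vertex degrees: the "summary"
    edges = 0                  # running edge count
    for op, u, v in stream:
        delta = 1 if op == '+' else -1
        degree[u] += delta
        degree[v] += delta
        edges += delta
    max_deg = max(degree.values(), default=0)
    return edges, max_deg

# Example: insert three edges, then delete one of them.
stream = [('+', 1, 2), ('+', 2, 3), ('+', 1, 3), ('-', 2, 3)]
print(process_stream(stream))  # -> (2, 2)
```

Note that even this degree table uses memory linear in the number of vertices; the point of the streaming approaches discussed here is to get by with far less, while still answering optimization questions about the final graph.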