Alireza Farhadi
11.18.2021 12:00 to 14:00

IRB 5137

With the rise of big data, there is a growing need to solve optimization tasks on massive datasets. Running optimization tasks on large-scale datasets can be quite challenging, as classical algorithms often make unrealistic assumptions about the data. For example, classical algorithms often assume that the computing machine has enough memory to store the entire input. However, in the era of big data, we often face datasets that are too large to fit into the computer's memory.

In this thesis, we study fundamental problems in computer science such as maximum matching, sorting, edit distance, and longest common subsequence. We consider two popular models for dealing with massive inputs. The first is the streaming model, in which the input is represented as a sequence of items and the algorithm is only allowed to store a sub-linear portion of the input. In this thesis, we present single-pass approximation algorithms for maximum matching, longest common subsequence, and edit distance in the streaming model.

The other model that we consider is the external memory model. In this model, the input is stored on external memory (such as cloud storage or a hard drive), and the size of the input can be significantly larger than the main memory of the computing machine. In this thesis, we present a tight conditional lower bound on the complexity of external memory sorting of integers.
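As a simple illustration of the streaming constraint described above (not one of the algorithms from the thesis), the classical single-pass greedy algorithm for matching keeps an edge only if both of its endpoints are still unmatched. It stores only the matching itself, which is sub-linear in the number of edges, and the result is a maximal matching, a well-known 2-approximation of the maximum matching:

```python
def greedy_streaming_matching(edge_stream):
    """Single-pass greedy matching over a stream of edges (u, v).

    Keep an edge iff both endpoints are currently unmatched. The
    stored matching is maximal, so it has at least half the size of
    a maximum matching, while memory usage is O(n) for n vertices,
    sub-linear in the number of streamed edges.
    """
    matched = set()   # vertices already covered by a kept edge
    matching = []     # the edges we commit to keeping
    for u, v in edge_stream:
        if u not in matched and v not in matched:
            matched.add(u)
            matched.add(v)
            matching.append((u, v))
    return matching
```

For example, on the path 1-2-3-4 streamed as edges (1,2), (2,3), (3,4), the algorithm keeps (1,2) and (3,4), which here happens to be a maximum matching.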

Examining Committee:

Chair: Dr. MohammadTaghi Hajiaghayi
Dean's Representative: Dr. Lawrence A. Ausubel
Members: Dr. John Dickerson
         Dr. David Mount
         Dr. Christos Papadimitriou
         Dr. Aviad Rubinstein
         Dr. Elaine Shi