## List of Projects## TITLE: Classical and Quantum Error Correction
Error correction is what ensures that the audio in your phone calls remains sharp, your hard drives do not deteriorate too quickly, and signals can be reliably transmitted to remote satellites. Over multiple decades, and with the explosion of the information age, an enormous variety of error-correction schemes were developed. Recently, a radically new type of error correction was introduced, one that can protect the quantum information that is stored in a quantum computer or that is communicated over a quantum network. We created the Error Correction Zoo (see here) to categorize and to organize known classical and quantum error-correction schemes. It is inspired by other similar zoos, including the complexity zoo, the quantum algorithm zoo, the quantum protocol zoo, and the qubit zoo. In this summer project, students will learn both classical and quantum error correction and contribute entries to the zoo's repository. The project can be scaled. On the easy end, there are tables and other readymade info to enter. On the hard end, the students can learn and write about advanced topics such as topological codes (see topological) and approximate quantum error correction (see approximate). ## TITLE: Improving Machine Translation for Wikipedia
Machine Translation (MT) uses deep learning to automatically convert text from one language to another, and offers the promise for people to seamlessly access knowledge in any language. However, current MT systems produce translations that are often worse than human translations. To bridge this gap, post-editing, i.e., manual review and correction of machine generated translations, is often used to combine the speed of MT, and the skills and knowledge of bilingual humans. This process is used in Wikipedia: as of September 2022 more than a million of Wikipedia pages have been produced using initial machine translations that are reviewed, edited, and improved by online editors with the support of the Content Translation tool (see here and here). The goal of this project is to use translations of Wikipedia pages as a test bed for understanding and evaluating the use of popular Machine Translation Services (e.g., Google Translate, Meta AI) in a real word translation context. We will develop data processing pipelines that quantify and characterize the editing patterns across different languages and translation services. We will then develop machine learning models that automatically estimate how good a machine translation is and potentially methods that can automatically correct errors in them. ## TITLE: Parallel Algorithms for High-Dimensional Clustering
A clustering algorithm takes a dataset (e.g., points, or nodes in a graph) and groups similar objects together into clusters. This has many applications, e.g., clustering image or text embeddings, which can be used to recommend similar images or find related documents. Existing clustering algorithms for high-dimensional point data often do not scale to very large datasets or produce low-quality results, and thus have poor utility in practice. In this summer project, students will learn to design and implement new parallel algorithms for clustering and gain experience with algorithm engineering. On the easy end, students will gain experience evaluating existing parallel clustering implementations on large datasets and benchmarking parallel programs. On the hard end, students will learn about and implement a variety of state-of-the-art clustering methods. We will evaluate the methods on both small and easily visualizable datasets such as embeddings of OpenImages (see here), as well as large billion-scale pointsets like those in the BigANN benchmark (see here). ## TITLE: Induced Graph Ramsey Theory
The first theorem of Ramsey Theory is the following: For all 2-colorings of the edges of K_6 (the complete graph on 6 vertices) there exists a monochromatic K_3. More genereally, for all m there exists n such that for all 2-colorings of the edges of K_n there is a monochromatic K_m. Lets mix this up! What if you don't use K_n but use some graph G? What if you want (say) mono P_m (the path on m vertices) rather than K_m. (You would want the mono P_m as an induced graph so there are m vertices such that the graph restricted to those vertices is P_m, not K_m or anything else). The induced Graph Ramsey theorem: For all graphs H there is a graph G such that for all 2-colorings of the edges of G there is an INDUCED mono H. The proof of this is rather difficult. There are many directions to go with this. a) Can we get easier proofs for particular H? b) Can we look at the asymetric case: you want either an induced RED H_1 or and induced BLUE H_2 c) Can we get upper and lower bounds on the size of G? d) What if we restrict type of G (Folkman's Theorem) For the project we will learn lots of Ramsey Theory and tackle these questions. For background on Ramsey theory see my course slides: here For some papers on the induced Ramsey theory see: here ## TITLE: Fair Decision Making
Machine learning systems have become prominent in many applications in everyday life, such as healthcare, finance, hiring, and education. These systems are intended to improve upon human decision-making by finding patterns in massive amounts of data, beyond what can be intuited by humans. However, it has been demonstrated that these systems learn and propagate similar biases present in human decision-making. This project aims to develop general theory and techniques on fairness in AI. This project is envisioned to go beyond designing “fair classifiers” that satisfy a static fairness notion in a single moment in time, and designs AI systems that make decisions over a period of time with the goal of ensuring overall long-term fair outcomes at the completion of a process. The use of data-driven AI solutions can allow the detection of patterns missed by humans, to empower targeted intervention and fair resource allocation over the course of an extended period of time. The research from this project will contribute to reducing bias in fair decision-making in general applications of machine learning. ## TITLE: Exploring the Hilbert Geometry
Normally we think of the distance between two points as the Euclidean distance. But there are other metrics that have been useful in fields such as Genetics, Probability, and Physics (see here). In this project we will be examining the Hilbert Metric from the perspective of computational geometry. We will explore questions involving this metric, such as how to construct Delaunay Triangulations and other fundamental structures. The project will include writing a software package to help us visualize these structures. In this project you will learn computational geometry and combine computational geometry with programming. ## TITLE: Using Markov Decision Processes to Mitigate Climate Risk
In India, the majority of farmers are classified as small or marginal, and their livelihoods are vulnerable to climate risk. Giving farmers help in planning crops well (e.g., delaying planting by a couple of days) can make a huge difference to their expected income. We aim to build an optimization-based decision support tool that provides crop planning advice to farmers. We will develop a multi-agent Markov decision process approach for this low-resource sector, and discuss key evaluation metrics for the design and implementation phases of our project. This may involve the topics specified in the prerequisites. This work is in conjunction with Kheyti, a nonprofit in India. |