Combinatorics and Algorithms for Real Problems

List of Projects

TITLE: Classical and Quantum Error Correction

Victor Albert

Phillipe Faist

Prerequisites Linear Algebra, Quantum Mechanics, Quantum Computing, Quantum Error Correction (Two of the three Quantum Prerequisites may suffice)

Description

Error correction is what ensures that the audio in your phone calls remains sharp, your hard drives do not deteriorate too quickly, and signals can be reliably transmitted to remote satellites.

Over multiple decades, and with the explosion of the information age, an enormous variety of error-correction schemes were developed. Recently, a radically new type of error correction was introduced, one that can protect the quantum information that is stored in a quantum computer or that is communicated over a quantum network.

We created the Error Correction Zoo (see here) to categorize and to organize known classical and quantum error-correction schemes. It is inspired by other similar zoos, including the complexity zoo, the quantum algorithm zoo, the quantum protocol zoo, and the qubit zoo.

In this summer project, students will learn both classical and quantum error correction and contribute entries to the zoo's repository. The project can be scaled. On the easy end, there are tables and other readymade info to enter. On the hard end, the students can learn and write about advanced topics such as topological codes (see topological) and approximate quantum error correction (see approximate).

TITLE: Improving Machine Translation for Wikipedia

Prerequisite Competence in python, some knowledge of deep learning. Programming skills in Pytorch preferred.

Marine Carpuat

Eleftheria Briakou

Description

Machine Translation (MT) uses deep learning to automatically convert text from one language to another, and offers the promise for people to seamlessly access knowledge in any language. However, current MT systems produce translations that are often worse than human translations. To bridge this gap, post-editing, i.e., manual review and correction of machine generated translations, is often used to combine the speed of MT, and the skills and knowledge of bilingual humans. This process is used in Wikipedia: as of September 2022 more than a million of Wikipedia pages have been produced using initial machine translations that are reviewed, edited, and improved by online editors with the support of the Content Translation tool (see here and here).

The goal of this project is to use translations of Wikipedia pages as a test bed for understanding and evaluating the use of popular Machine Translation Services (e.g., Google Translate, Meta AI) in a real word translation context. We will develop data processing pipelines that quantify and characterize the editing patterns across different languages and translation services. We will then develop machine learning models that automatically estimate how good a machine translation is and potentially methods that can automatically correct errors in them.

TITLE: Parallel Algorithms for High-Dimensional Clustering

Laxman Dhulipala

Prerequisites Algorithms, experience with systems programming e.g., in C/C++, and interest in writing fast and highly scalable code on multicore CPUs (No need to have taken a parallel algorithms course)

Description

A clustering algorithm takes a dataset (e.g., points, or nodes in a graph) and groups similar objects together into clusters. This has many applications, e.g., clustering image or text embeddings, which can be used to recommend similar images or find related documents. Existing clustering algorithms for high-dimensional point data often do not scale to very large datasets or produce low-quality results, and thus have poor utility in practice. In this summer project, students will learn to design and implement new parallel algorithms for clustering and gain experience with algorithm engineering.

On the easy end, students will gain experience evaluating existing parallel clustering implementations on large datasets and benchmarking parallel programs. On the hard end, students will learn about and implement a variety of state-of-the-art clustering methods. We will evaluate the methods on both small and easily visualizable datasets such as embeddings of OpenImages (see here), as well as large billion-scale pointsets like those in the BigANN benchmark (see here).

TITLE: Fair Decision Making

Furong Huang

Prerequisite: foundational knowledge on machine learning, fairness in machine learning, probability theory, linear algebra, calculus is required, basic knowledge on reinforcement learning is preferred.

Description

Machine learning systems have become prominent in many applications in everyday life, such as healthcare, finance, hiring, and education. These systems are intended to improve upon human decision-making by finding patterns in massive amounts of data, beyond what can be intuited by humans. However, it has been demonstrated that these systems learn and propagate similar biases present in human decision-making. This project aims to develop general theory and techniques on fairness in AI. This project is envisioned to go beyond designing “fair classifiers” that satisfy a static fairness notion in a single moment in time, and designs AI systems that make decisions over a period of time with the goal of ensuring overall long-term fair outcomes at the completion of a process. The use of data-driven AI solutions can allow the detection of patterns missed by humans, to empower targeted intervention and fair resource allocation over the course of an extended period of time. The research from this project will contribute to reducing bias in fair decision-making in general applications of machine learning.

TITLE: Exploring the Hilbert Geometry

David Mount,

AugusIte Gezalyan

Prerequisites Knowledge of discrete structures/introduction to proofs, algorithms, and programming skills.

Description

Normally we think of the distance between two points as the Euclidean distance. But there are other metrics that have been useful in fields such as Genetics, Probability, and Physics (see here). In this project we will be examining the Hilbert Metric from the perspective of computational geometry. We will explore questions involving this metric, such as how to construct Delaunay Triangulations and other fundamental structures. The project will include writing a software package to help us visualize these structures. In this project you will learn computational geometry and combine computational geometry with programming.

TITLE: Using Markov Decision Processes to Mitigate Climate Risk

Aviva Prins

Prerequisites Discrete Math, Probability, Algorithms, Machine Learning. Knowledge of Numerical Analysis, Mechanism Design, Reinforcement learning, or fairness preferred but not required. (This is a long list, so okay if you don't know all of them.)

Description

In India, the majority of farmers are classified as small or marginal, and their livelihoods are vulnerable to climate risk. Giving farmers help in planning crops well (e.g., delaying planting by a couple of days) can make a huge difference to their expected income. We aim to build an optimization-based decision support tool that provides crop planning advice to farmers.

We will develop a multi-agent Markov decision process approach for this low-resource sector, and discuss key evaluation metrics for the design and implementation phases of our project. This may involve the topics specified in the prerequisites.

This work is in conjunction with Kheyti, a nonprofit in India.