Combinatorics and Algorithms for Real Problems

Summer 2018



Explainable AI

Celeste Paul

Prerequisite: A background in artificial intelligence, data science, OR machine learning and an interest in human-computer interaction. Linear algebra also helpful.

With very little knowledge of machine learning, frameworks such as TensorFlow have made it easier than ever to quickly build, train, and deploy a neural network. However, this ease of use comes at a cost and sometimes errors in creating and training classifiers or constructing and tuning networks can result in mismatching of machine and human intelligence. Rookie mistakes combined with the opaque nature of modern neural networks can make it difficult to identify, let alone debug, problems in a machine intelligence model.

Our focus for the summer is on the developer: what are ways that we can provide insight at different phases of creating a neural network to improve the explainability and reliability of machine intelligence? This summer, we are focused on two main problem areas: 1) the construction and training of classifiers; and, 2) the design and tuning of a network.

Side Channels in the Cloud

Dana Dachman-Soled

According to the International Data Corporation (IDC), world wide spending on public cloud services will reach $127 billion by 2018. But public clouds give rise to unique security challenges, even when the cloud provider itself is trusted. In the course of this project we will explore these security challenges, focusing on the leakage of information during real-life computational tasks. Specifically, we will examine to what extent side-channel attacks can expose data when victim and spy processes are co-resident on the same physical machine.

Diversity in Matching Markets

John Dickerson

Prerequisites: basic understanding of supervised machine learning, some experience with a programming language

How should a firm allocate its limited interviewing resources to select the optimal cohort of new employees from a large set of job applicants? How should that firm allocate cheap but noisy resume screenings and expensive but in-depth in-person interviews? We view this problem through the lens of combinatorial pure exploration (CPE) in the multi-armed bandit setting, where a central learning agent performs costly exploration of a set of arms before selecting a final subset with some combinatorial structure. In our case, that combinatorial structure will incorporate elements of diversity, where including multiple elements of the same type in the final set results in diminishing marginal gain. We will apply our algorithms to real data from university admissions and the hiring processes from major tech firms.

Finding Colorings of the Plane Using Optimization Techniques

Thomas Goldstein, Clyde Kruskal

The following are known: For every 2-coloring of the plane there are two points an inch apart that are the same color. For every 3-coloring of the plane there are three points an inch apart that are the same color. There is a 7-coloring of the plane where all pairs of points that are in inch apart are different colors.

What happens for 4,5, and 6 colors? It is an open problem.

A region of the plane is c-colorable if you can color it with c colors and have no two points an inch apart be the same color. Tight bounds are known for coloring some types of regions, including circles and rectangles, with two and three colors. Little is known for more than three colors.

We look at the following: given a number c and a region of the plane, find if it is c-colorable. If it is then find the c-coloring.

While this is traditionally a pure math problem we will use optimization tools from applied math to color regions of the plane. We may also look at other conditions on colorings.

Modeling signals with deep neural networks

Thomas Goldstein

Prerequisite: Linear Algebra, Python. Helpful but not required: Maching Learning Toolboxes like PyTorch or TensorFlow.

Deep neural networks are a powerful tool for modeling images and text, but recent work has focused on methods for modeling signals. In this REU, we'll use methods from machine learning to model, analyze, and classify1-dimensional signals and time series data. One particular application of interest is modeling brain waves using data obtained via Magnetoencephalography (MEG) for the purpose of identifying seizures. We'll also investigate signals obtained from wireless communications, and time series data extracted from medical images and videos.

Cryptanalysis for Lattice-Based Cryptography

Jonathan Katz

Prerequisites: Mathematical maturity and programming ability in C/C++. Knowledge of cryptography is useful but not essential.

Next-generation cryptosystems may be based on newer assumptions related to the hardness of solving certain problems on lattices. This project will explore implementations of known algorithms from the literature for solving these problems in an effort to understand their concrete hardness.

Large Ind Sets OR Data Clustering

Samir Khuller

1) Implimenting graph approximation algorithms for a storage allocation problem.

This project entails implementing several heuristics for approximaion problems on graphs. Students need to be good coders and have a good understanding of graphs and their representations.

2) Clustering for fairness.

Given some data and some sensitive attributes like race, how can we find a fair clustering? In other words, each cluster should be well represetened racially. How can we design clustering algorithms to take this into account.

Trial and Error for Graph Properties

Aarthi Sundaram

Prerquisite: Graph Theory

Algorithmic graph theory aims to characterize graphs based on their properties. One of the ways this is done is by characterizing or testing when an unknown graph G, accessed by querying an oracle, possesses a particular property. Generally, the queries either ask if a particular edge is present in the graph or the set of neighbours for a particular vertex. In this project, for a property P, we consider more general queries in the form of certificates for P (  trials). The oracle then reveals either that this certificate is indeed present in the unknown graph or provides some information about some edge in the certificate that is absent in G (  errors). This is essentially the graph property framework of the trial and error model described in Ivanyos, et. al. (IKQ+14 Ref Below) For a particular kind of oracle in this framework, it is known that a property P can be efficiently checked if there is an efficient algorithm for the associated P^{cap} problem which asks: Given a graph H and a set of edges E' subseteq E(H), is there a certificate for P that contains an edge from E’. This raises the question of characterizing which properties P have polynomial time algorithms for the associated P^{cap} problem, which would be goal of this project. This question would also hold interest independent of the trial and error model as a means of finding a finer classification of poly-time solvable monotone graph properties.

(IKQ+14) Gábor Ivanyos, Raghav Kulkarni, Youming Qiao, Miklos Santha, and Aarthi Sundaram. On the complexity of trial and error for constraint satisfaction problems. In Automata, Languages, and Programming - 41st International Colloquium, ICALP 2014, Copenhagen, Denmark, July 8-11, 2014, Proceedings, Part I, Lecture Notes in Computer Science, pages 663–675. Springer, 2014.” See here