Combinatorics and Algorithms for Real Problems

Projects are listed by Type:

Algorithms, Machine Learning-AI

Algorithms Projects

TITLE: I/O-efficient Parallel Algorithms in Theory and Practice

Mentor: Laxman Dhulipala

Prerequisites Knowledge of Data Structures and Algorithms. Programming skills in C

Description

In parallel algorithm design we are interested in designing algorithms that obtain good speedup over the fastest sequential algorithms as we increase the number of cores used. The two primary measures of interest are the work of the algorithm, i.e., the total number of basic operations performed, and the depth, i.e., the longest chain of sequential dependencies in the algorithm. Due to the increasing size of modern data, designing algorithms that also minimize the number of I/O-complexity (transfers to/from main memory). while having high parallelism are of interest. This project will explore theoretical and practical parallel algorithms that have provably low I/O-complexity.

Some directions we may consider include designing data structures for storing data over a very large number of external memory devices (e.g., SSDs), and algorithms to exploit data stored across a large number of external memory devices. In this project you will learn about parallel algorithms, parallel programming, and designing I/O-efficient algorithms.

TITLE: Phylogenies, errors, and algorithms, oh my!

Mentors: Erin Molloy

Prerequisites (1) Familiarity with discrete math, algorithms, probability, and machine learning, (2) Proficiency in at least one programming language (ideally Python or C), (3) Knowledge of command line (UNIX)

Description

Evolutionary relationships (e.g., among different species or cells in a tumor) are important for many biological studies. However, evolution cannot be directly observed in most practical applications and thus must be reconstructed from genomic data (DNA). Reconstruction often involves complex computational pipelines, with errors from earlier steps in the pipeline propagating downstream and even impacting the accuracy of the reconstructed evolutionary tree, called a phylogeny.

This project will broadly focus on the issue of error propagation and phylogeny reconstruction from an algorithmic perspective. Specific projects could entail (1) quantifying the theoretical and/or practical impact of errors on combinatorial algorithms for evolutionary tree reconstruction, (2) developing phylogeny reconstruction algorithms that are more robust to erroneous or uncertain inputs, or (3) developing machine learning algorithms for improved error or uncertainty detection and evaluating their utility for phylogeny reconstruction.

Machine Learning and AI Projects

TITLE: Machine Learning for Autonomous Driving: Theory and Practice

Mentors: Ming Lin

Prerequisites Basic knowledge about Machine Learning and Neural Networks. Experience with Game Engines preferred but not required.

Description

Recent advances in self-driving platforms, from food delivery robots to autonomous vehicles (AV), highlight the urgent need for strong safety standards. AVs to operate in a mixed autonomy, where there will be human-driven and self-driving cars operating side by side on the road for the foreseeable future need the capability to anticipate human-driven vehicle behaviors. Human behaviour is different and less predictable than a self-driving car, thereby making it especially critical to capture large amounts of human driving data under diverse pre-crash scenarios for machine learning algorithms to learn from human demonstration.

This project aims to bridge this gap between theory and practice of Machine Learning applied to Autonomous Driving. This project will enable machine learning algorithms for autonomous driving to learn to drive more safely from collections of human driving data and behaviors in high-risk situations. This work will significantly enhancing our understanding of driving dynamics and vehicle safety.

TITLE: Making a Multimedia Quiz Show that Stumps AI

Mentors: Jordan Boyd-Graber

Prerequisites

Having a background/interest in human question answering (i.e., writing exams, playing trivia games) would be helpful in addition to knowledge/interest in computation graph frameworks like Pytorch, audio/visual data processing, building web interfaces, and web scraping.

Description

We have previously built mechanisms to measure human vs. computer calibration, see here and to measure how adversarial datasets are here.

We would like to extend these approaches to visual datasets: how well calibrated are visual LLMs, and what is the best way to form human–computer teams to jointly answer multimodal questions to best take advantage of human and computer skills.

This will require:

  • Building datasets of challenging visual questions (and answers)

  • Building an interface to allow for collaboration (and competition) between humans and computers

  • Collecting those responses from humans and computers

TITLE: Alternative Neural Nets for Navigating Games, Puzzles, and the Physical World

Mentor: Sarah Miller

Prereqisites Familiarity with machine learning, discrete math, algorithms, and programming; Candidates familiar with either formalization/proof development (i.e., Lean) or quantum computing may tackle respective aspects of the work, but neither is required.

Description

Project description: Building and training neural networks (NN) to reason both efficiently and flexibly in novel circumstances has been an enduring challenge. Using combinatorial game theory, we will build and train novel types of networks to solve puzzles and play games, aiming to capture increasingly general capabilities. We will test this framework on larger and more complex classes of puzzles and games towards ambiguous real-world predictions and decisions.

After testing this formal framework on interesting sums of puzzles and games, we can proceed with any of the following, depending on the students’ experience and inclination: (a) applications to physical systems, (b) formalization in a programming language, preferably Lean, or (c) exploring enhanced or accelerated computation using quantum resources. Option (a) does not require a physics prerequisite, and neither the formal methods of (b) nor the quantum computing of (c) will be necessary to make progress on this nascent approach to neural architectures and machine learning more generally.

We will compare these novel alternatives to more standard state-of-the-art methods, such as the latest innovations on transformer-based models and/or popular reinforcement learning approaches that have been persistently plagued by hallucinations and/or failures to transfer across complex environments.