Dean's Fellows Lecture Series

All talks will be held in the A.V. Williams Grad student lounge (room 1152) on Wednesdays at 3:20 pm. (That is, talks will start somewhat less than halfway through coffee hour.) If there is not enough space, we will move to A.V. Williams 3258. Talks will finish by 3:50 pm.

Past Talks

Sept. 5, 2007 Polyvios Pratikakis
Locksmith: Finding data races with static analysis

Sept. 12, 2007 Azarakhsh Malekian
Energy Efficient monitoring in sensor networks

Sept. 19, 2007 Jagan Sankaranarayanan
STEWARD: A Spatio-Textual Document Search Engine

Sept. 26, 2007 Tsz-Chiu Au
Accident or Intention: That is the Question (in the Noisy Iterated Prisoner's Dilemma)

Oct. 3, 2007 Justin Domke
Why it Might be a Good Idea to Solve the Egomotion Problem Probabilistically

Oct. 10, 2007 Lidan Wang
Predictive Modeling-based Data Collection in Wireless Sensor Networks

Oct. 17, 2007 Thuan Huynh
Improving Neural Network Encodings

Oct. 24, 2007 Maria Martinez
Tractable Methods for Predicting an Opponent's Most Probable Actions

Oct. 31, 2007 Adam Phillippy
To catch a pathogen: Computational DNA signature discovery and

Nov. 14, 2007 Aleks Aris
Visualizing & Exploring Networks using Semantic Substrates

Abstracts

Sept. 5, 2007 Polyvios Pratikakis
Locksmith: Finding data races with static analysis.

Locksmith is a static analysis tool that finds data races in multithreaded C programs. To do that, it implements a static version of the Lockset algorithm. A race occurs whenever two threads access a memory location without any synchronization, and one of the accesses is a write. Locksmith analyzes multithreaded programs that use locks (or mutexes) to protect accesses to shared locations, and tries to discover which, if any, locks protect each shared location. Whenever an access to a shared memory location is not protected by any lock, it potentially is a race.

Sept. 12, 2007 Azarakhsh Malekian

In this talk we look at a set of problems related to efficient energy management for monitoring applications in wireless sensor networks. We also look at some generalizations of a basic problem called Set k-Cover. Informally the problem can be described as follows: we are given a set of sensors, and a set of regions to be monitored. Each region can be monitored by a subset of sensors. The goal is to partition the sensors into k sets (or time-slots), so that by activating the set of sensors in a time-slot, we can maximize coverage of the regions. By activating each sensor in only one of the k time slots, we decrease its battery consumption and extend its battery life significantly (by a factor of k). This problem is known to be NP-hard.Our goal is to develop improved algorithms for this problem.We also consider generalizations of this problem in different directions. First, we allow each sensor to be active in "a" different sets (time-slots). Also we look at other variants of coverage such as requiring that the regions be covered in each time slot, or a certain number of regions be monitored in each time slot. In the Set k-Cover formulation, there is no requirement that a region be monitored at all, or in any number of time slots. We develop a randomized rounding algorithm for this problem. We also consider extensions where each sensor can monitor only a bounded number of regions, and not all the regions adjacent to it. We develop the first approximation algorithms for this problem.

Sept. 19, 2007 Jagan Sankaranarayanan
STEWARD: A Spatio-Textual Document Search Engine

The workings of a search engine, called STEWARD ("Spatio-Textual Extraction on the Web Aiding Retrieval of Documents") are described. STEWARD is the prototype of a spatio-textual search engine for extracting, querying, and visualizing textual references to geographic locations in unstructured text documents. This is in contrast to search engines like Google that rank documents based on how many other documents link to it. In the case of STEWARD, the query string contains a geographical entity, and we wish to find other documents that are related to it by spatial proximity. For example, a document containing ``Los Angeles'' is deemed relevant to a query string containing ``Hollywood'', even though the query string ``Hollywood'' might not even be mentioned in the document. Such a query cannot be effectively processed by existing search engines because they do not have the notion that both ``Los Angeles'' and ``Hollywood'' are textual references to geographical locations and that they are close to each other on a map. Methods for retrieving and processing web documents, extracting and disambiguating georeferences, and identifying geographic focus are described. A demo of STEWARD is presented that exhibits some of its functionalities.

Joint work with Michael D. Lieberman and Hanan Samet of UMD CS, and Jon Sperling of HUD PD&R.

Sept. 26, 2007 Tsz-Chiu Au
Accident or Intention: That is the Question (in the Noisy Iterated Prisoner's Dilemma)

The performance of multi-agent systems often depends on the robustness of interaction among agents. But errors can occur during the interaction, and that could break the premises the agents make about their interaction. This problem is compounded by the fact that the agents are self-interested and do not completely trust each other; agents can no longer trust each other because of the mistakes that the other agents make, albeit the mistakes are not intentional but accidental. How to cope with such mistakes is a critical factor in the maintenance of cooperation among agents.

To study how to prevent or alleviate conflicts in multi-agent environments, we focus on a game called the Iterated Prisoner's Dilemma (IPD), which is well-known as an abstract model for studying cooperative behavior between two self-interested parties. An important variant of the IPD is called the Noisy IPD, in which there is a nonzero probability that a "cooperate" action will accidentally be changed into a "defect" action and vice versa. Tit-For-Tat and other strategies that do quite well in the ordinary (non-noisy) IPD can do quite badly in the Noisy IPD.

This talk presents a technique called symbolic noise detection, for detecting whether anomalies in player's behavior are deliberate or accidental. The basic idea is to use the other player's deterministic behavior to tell whether an action has been affected by noise. This technique is effective because many IPD players often exhibit a clear deterministic pattern of behavior during interaction. To evaluate this technique, we wrote several programs to participate in the 2005 Iterated Prisoner.s Dilemma competition. Our program did quite well in the competition---out of the 165 contestants in this category, most of our programs ranked among top ten.

Oct. 3, 2007 Justin Domke
Why it Might be a Good Idea to Solve the Egomotion Problem Probabilistically

(This talk will not assume any prior knowledge of computer vision.)

In the egomotion problem, one takes a set of images as input, and tries to compute the 3-D rigid motion between the views as output. In this talk, I will review the traditional approach, is based on the idea of correspondences, or point matches between the different images. The uncertainty in these point matches is fundamental to solving the problem accurately, which leads to the suggestion that one compute full probability distributions of correspondence. Given a set of correspondence distributions, one can easily compute measure of how well a particular rigid motion fits to a pair of images. Experiments using this measure through a "brute force search" through all possible motions suggest that it is highly accurate, but slow. Finally, I will discuss some of our recent work on methods that efficiently manipulate the distribution over the motions to avoid this search phase.

Oct. 10, 2007 Lidan Wang
Predictive Modeling-based Data Collection in Wireless Sensor Networks

We address the problem of designing practical, energy-efficient protocols for continuously collecting data from a wireless sensor network using predictive modeling. Prior work has suggested several approaches to capture, and exploit, the spatial and temporal correlations present in a typical sensor network to minimize energy consumption during data collection. While they have been shown to be effective in reducing the data collection cost, these approaches use simplistic correlation models that are unable to capture all spatial correlations in the network. Further, many idiosyncrasies of WSNs, in particular the broadcast nature of communication, are typically ignored, but can significantly impact the performance of the data collection protocols in practice. Our proposed approach is based on approximating the joint probability distribution over the sensors using undirected graphical models, ideally suited to exploit both the spatial correlations in a sensor network and the broadcast nature of communication.We present algorithms for optimally using such a model for data collection in a sensor network under different communication models; we also develop algorithms for identifying an appropriate model to use for a given sensor network. We present an extensive experimental study over synthetic and real-world datasets showing that our proposed approach significantly reduces the cost of data collection.

Oct. 17, 2007 Thuan Huynh
Improving Neural Network Encodings

Abstract: Error backpropagation is the most widely used supervised learning method for neural networks and has achieved success in many classification, regression and prediction applications. The network learns a mapping between the input and output units, while the hidden units and the weights between them and other units contain the network's internal representation of the input. This work addresses the problem that the neural network's internal representations learned by error backpropagation are very difficult for people to interpret. Our approach is to add a regularization term to the error function used in training to create a sparser encoding at the hidden layer in neural networks. This term helps to increase the sum of squared/inverted squared Euclidean distances between hidden activation vectors. The method was applied to public datasets from the UCI machine learning repository. We show that the final encodings are easier to distinguish and provide a good basis for learning symbolic rules from the network. We also applied it to the phoneme sequence recognition problem and got some promising initial results.

Oct. 24, 2007 Maria Martinez
Tractable Methods for Predicting an Opponent's Most Probable Actions

There are numerous multi-agent applications where it would be useful, if not necessary, to have a model of the behavior of the other agents that are involved. They may be opponents, in the sense that they are rivals with conflicting goals, or simply a group in which the decisions made by agents are important and in some way affect each other's behavior, such as in negotiation environments. One formalism designed for this purpose is the SOMA (Stochastic Opponent Modeling Agents) language. In this language, opponent models may be expressed on a domain-independent basis for a wide variety of applications ranging from automated game playing programs to strategic adversarial reasoning problems. SOMA provides the features that are necessary in order to reason about opponent agents' behavior. It basically consists of inference rules that combine logical and probabilistic agent reasoning in a mathematically sound manner.. Within this language, we can express rules about the probability that a given person or group will act in a certain way in a given situation. The main goal is to compute the k most probable sets of actions that these entities can take; this computation involves solving linear programs that have exponential size with respect to the number of possible actions. In this work, we present techniques (both exact and heuristic) that can be used to tackle this problem.

Oct. 31, 2007 Adam Phillippy
To catch a pathogen: Computational DNA signature discovery and applications to biodefense.

Modern health and security concerns have raised interest in the real- time detection and identification of pathogenic microbes. With the availability of hundreds of complete bacterial and viral genome sequences, it is now possible to use computational methods to identify distinguishing DNA sequences in all of these species (termed signature sequences), and to use these signatures as the basis for diagnostic assays to detect and genotype microbes in both environmental and clinical samples. The success of such assays critically depends on the methods used to identify signatures that properly differentiate between the target genomes and the sample background.

I will present Insignia, a new computational system that identifies DNA signatures of any length in bacterial and viral genomes. Insignia uses highly efficient string matching algorithms to compare sequenced bacterial and viral genomes against each other and to additional background genomes including plants, animals, and human. These comparisons are stored in a database and used to rapidly compute signatures for any particular target species. We have validated 50 Insignia-designed assays on a panel of 46 strains of Vibrio cholerae, and our results show that the signatures are both sensitive and specific.

Nov 14, 2007 Aleks Aris
Visualizing & Exploring Networks using Semantic Substrates
Visualizing and exploring network data has been a challenging problem for HCI (Human-Computer Interaction) Information Visualization researchers due to the complexity of representing networks (graphs). Research on this area has concentrated on improving the visual organization of nodes and links according to graph drawing aesthetics criteria, such as minimizing link crossings and the longest link length. Semantic substrates offer a different approach by which node locations represent node attributes. Users produce (or are assisted to produce) semantic substrates for a given dataset according to its characteristics and the questions, needs, and tasks of users. Then, the substrate is used to layout nodes of the network. Link visibility filters are provided to isolate various parts of the dataset and find meaningful relationships. Datasets, such as legal precedent (with court decisions citing one another) and food-web (predator-prey relationships) have been explored using NVSS, a network visualization software to test and refine the semantic substrate approach. A case study approach is used for evaluation, during which insights and hypothesis are generated by users. The techniques are potentially applicable to other datasets such as social networks, business networks, and email communication. Semantic substrates promise a more controlled, focused, and understandable way of exploring networks.








For any issues with this page, please contact Justin Domke at (his last name)@cs.umd.edu