# UMD Computer Science researchers to present 15 papers at major AI event in New York

The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20) will be held February 7-12, 2020 at the Hilton New York Midtown, New York, USA.

One of the oldest and most prestigious artificial intelligence (AI) conferences will take place next week in New York City.  The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI) will bring together researchers and practitioners in AI, machine learning (ML), and related fields for a week of paper presentations, invited speakers, workshops, and tutorials.  Additionally, collocated conferences such as the ACM/SIGAI Conference on AI, Ethics, and Society (AIES) will attract a diverse set of participants to discuss the impact of AI, and AI hype, on society.

UMDCS researchers will present papers at both AAAI and AIES. As contributed in the past they will continue to serve  a variety of administrative roles, ranging from senior program committee members (e.g., Rama Chellappa, Aravind Srinivasan, John Dickerson, Dinesh Manocha) to conference organizers (e.g., John Dickerson, co-organizer Annual SIGAI/AAAI job fair).  Many UMD researchers---PhD students, undergraduates, and professors alike---will be in attendance.

Assistant Professor John Dickerson will also lead a 3.5-hour-long tutorial on Optimization and Learning Approaches to Resource Allocation for Social Good, joint with researchers from Washington University in St Louis, Northwestern/MIT, and Harvard.

Papers at AAAI:

Ali Shafahi, Mahyar Najibi, Zheng Xu, John Dickerson, Larry S. Davis, Tom Goldstein

Standard adversarial attacks change the predicted class label of a selected image by adding specially tailored small perturbations to its pixels. In contrast, a universal perturbation is an update that can be added to any image in a broad class of images, while still changing the predicted class label. We study the efficient generation of universal adversarial perturbations, and also efficient methods for hardening networks to these attacks. We propose a simple optimization-based universal attack that reduces the top-1 accuracy of various network architectures on ImageNet to less than 20%, while learning the universal perturbation 13X faster than the standard method.  To defend against these perturbations, we propose universal adversarial training, which models the problem of robust classifier generation as a two-player min-max game, and produces robust models with only 2X the cost of natural training. We also propose a simultaneous stochastic gradient method that is almost free of extra computation, which allows us to do universal adversarial training on ImageNet.

Balancing the Tradeoff between Profit and Fairness in Rideshare Platforms During High-Demand Hours (link)

Vedant Nanda, Pan Xu, Karthik Abinav Sankararaman, John P. Dickerson, Aravind Srinivasan

Rideshare platforms, when assigning requests to drivers, tend to maximize profit for the system and/or minimize waiting time for riders. Such platforms can exacerbate biases that drivers may have over certain types of requests. We consider the case of peak hours when the demand for rides is more than the supply of drivers. Drivers are well aware of their advantage during the peak hours and can choose to be selective about which rides to accept. Moreover, if in such a scenario, the assignment of requests to drivers (by the platform) is made only to maximize profit and/or minimize wait time for riders, requests of a certain type (e.g. from a non-popular pickup location, or to a non-popular drop-off location) might never be assigned to a driver. Such a system can be highly unfair to riders. However, increasing fairness might come at a cost of the overall profit made by the rideshare platform. To balance these conflicting goals, we present a flexible, non-adaptive algorithm, ALG, that allows the platform designer to control the profit and fairness of the system via parameters α and β respectively. We model the matching problem as an online bipartite matching where the set of drivers is offline and requests arrive online. Upon the arrival of a request, we use ALG to assign it to a driver (the driver might then choose to accept or reject it) or reject the request. We formalize the measures of profit and fairness in our setting and show that by using ALG, the competitive ratios for profit and fairness measures would be no worse than α/e and β/e respectively. Extensive experimental results on both real-world and synthetic datasets confirm the validity of our theoretical lower bounds. Additionally, they show that ALG under some choice of (α,β) can beat two natural heuristics, Greedy and Uniform, on both fairness and profit.

Detecting Human-Object Interactions via Functional Generalization (link)

Ankan Bansal, Sai Saketh Rambhatla, Abhinav Shrivastava, Rama Chellappa

We present an approach for detecting human-object interactions (HOIs) in images, based on the idea that humans interact with functionally similar objects in a similar manner. The proposed model is simple and efficiently uses the data, visual features of the human, relative spatial orientation of the human and the object, and the knowledge that functionally similar objects take part in similar interactions with humans. We provide extensive experimental validation for our approach and demonstrate state-of-the-art results for HOI detection. On the HICO-Det dataset our method achieves a gain of over 2.5% absolute points in mean average precision (mAP) over state-of-the-art. We also show that our approach leads to significant performance gains for zero-shot HOI detection in the seen object setting. We further demonstrate that using a generic object detector, our model can generalize to interactions involving previously unseen objects.

Generate, Segment and Refine: Towards Generic Manipulation Segmentation (link)

Peng Zhou, Bor-Chun Chen, Xintong Han, Mahyar Najibi, Abhinav Shrivastava, Ser Nam Lim, Larry S. Davis

Detecting manipulated images has become a significant emerging challenge. The advent of image sharing platforms and the easy availability of advanced photo editing software have resulted in a large quantities of manipulated images being shared on the internet. While the intent behind such manipulations varies widely, concerns on the spread of fake news and misinformation is growing. Current state of the art methods for detecting these manipulated images suffers from the lack of training data due to the laborious labeling process. We address this problem in this paper, for which we introduce a manipulated image generation process that creates true positives using currently available datasets. Drawing from traditional work on image blending, we propose a novel generator for creating such examples. In addition, we also propose to further create examples that force the algorithm to focus on boundary artifacts during training. Strong experimental results validate our proposal.

NeoNav: Improving the Generalization of Visual Navigation via Generating Next Expected Observations (link)

Qiaoyun Wu; Dinesh Manocha; Jun Wang; Kai Xu

We propose improving the cross-target and cross-scene generalization of visual navigation through learning an agent that is guided by conceiving the next observations it expects to see. This is achieved by learning a variational Bayesian model, called NeoNav, which generates the next expected observations (NEO) conditioned on the current observations of the agent and the target view. Our generative model is learned through optimizing a variational objective encompassing two key designs. First, the latent distribution is conditioned on current observations and the target view, leading to a modelbased, target-driven navigation. Second, the latent space is modeled with a Mixture of Gaussians conditioned on the current observation and the next best action. Our use of mixtureof-posteriors prior effectively alleviates the issue of overregularized latent space, thus significantly boosting the model generalization for new targets and in novel scenes. Moreover, the NEO generation models the forward dynamics of agentenvironment interaction, which improves the quality of approximate inference and hence benefits data efficiency. We have conducted extensive evaluations on both real-world and synthetic benchmarks, and show that our model consistently outperforms the state-of-the-art models in terms of success rate, data efficiency, and generalization.

3D Shape Completion with Multi-view Consistent Inference (link)

Tao Hu, Zhizhong Han, Matthias Zwicker

3D shape completion is important to enable machines to perceive the complete geometry of objects from partial observations. To address this problem, view-based methods have been presented. These methods represent shapes as multiple depth images, which can be back-projected to yield corresponding 3D point clouds, and they perform shape completion by learning to complete each depth image using neural networks. While view-based methods lead to state-of-the-art results, they currently do not enforce geometric consistency among the completed views during the inference stage. To resolve this issue, we propose a multi-view consistent inference technique for 3D shape completion, which we express as an energy minimization problem including a data term and a regularization term. We formulate the regularization term as a consistency loss that encourages geometric consistency among multiple views, while the data term guarantees that the optimized views do not drift away too much from a learned shape descriptor. Experimental results demonstrate that our method completes shapes more accurately than previous techniques.

STEP: Spatial Temporal Graph Convolutional Networks for Emotion Perception from Gaits (link)

Uttaran Bhattacharya, Trisha Mittal , Rohan Chandra, Tanmay Randhavane, Aniket Bera, Dinesh Manocha

We present a novel classifier network called STEP, to classify perceived human emotion from gaits, based on a Spatial Temporal Graph Convolutional Network (ST-GCN) architecture. Given an RGB video of an individual walking, our formulation implicitly exploits the gait features to classify the emotional state of the human into one of four emotions: happy, sad, angry, or neutral. We use hundreds of annotated real-world gait videos and augment them with thousands of annotated synthetic gaits generated using a novel generative network called STEP-Gen, built on an STGCN based Conditional Variational Autoencoder (CVAE). We incorporate a novel push-pull regularization loss in the CVAE formulation of STEP-Gen to generate realistic gaits and improve the classification accuracy of STEP. We also release a novel dataset (E-Gait), which consists of 2, 177 human gaits annotated with perceived emotions along with thousands of synthetic gaits. In practice, STEP can learn the affective features and exhibits classification accuracy of 89% on E-Gait, which is 14−30% more accurate over prior
methods.

M3ER: Multiplicative Multimodal Emotion Recognition using Facial, Textual, and Speech Cues (link)

Trisha Mittal, Uttaran Bhattacharya, Rohan Chandra, Aniket Bera, Dinesh Manocha

We present M3ER, a learning-based method for emotion recognition from multiple input modalities. Our approach combines cues from multiple co-occurring modalities (such
as face, text, and speech) and also is more robust than other methods to sensor noise in any of the individual modalities. M3ER models a novel, data-driven multiplicative fusion method to combine the modalities, which learn to emphasize the more reliable cues and suppress others on a persample basis. By introducing a check step which uses Canonical Correlational Analysis to differentiate between ineffective and effective modalities, M3ER is robust to sensor noise. M3ER also generates proxy features in place of the ineffectual modalities. We demonstrate the efficiency of our network through experimentation on two benchmark datasets, IEMOCAP and CMU-MOSEI. We report a mean accuracy of 82.7% on IEMOCAP and 89.0% on CMU-MOSEI, which, collectively, is an improvement of about 5% over prior work

Micah Goldblum, Liam Fowl, Soheil Feizi, Tom Goldstein

Knowledge distillation is effective for producing small, high-performance neural networks for classification, but these small networks are vulnerable to adversarial attacks. This paper studies how adversarial robustness transfers from teacher to student during knowledge distillation. We find that a large amount of robustness may be inherited by the student even when distilled on only clean images. Second, we introduce Adversarially Robust Distillation (ARD) for distilling robustness onto student networks. In addition to producing small models with high test accuracy like conventional distillation, ARD also passes the superior robustness of large networks onto the student. In our experiments, we find that ARD student models decisively outperform adversarially trained networks of identical architecture in terms of robust accuracy, surpassing state-of-the-art methods on standard robustness benchmarks. Finally, we adapt recent fast adversarial training methods to ARD for accelerated robust distillation.

Maximum Likelihood Latent Space Embedding of Logistic Random Dot Product Graphs (link)

Luke O'Connor, Muriel Medard, Soheil Feizi

A latent space model for a family of random graphs assigns real-valued vectors to nodes of the graph such that edge probabilities are determined by latent positions. Latent space models provide a natural statistical framework for graph visualizing and clustering. A latent space model of particular interest is the Random Dot Product Graph (RDPG), which can be fit using an efficient spectral method; however, this method is based on a heuristic that can fail, even in simple cases. Here, we consider a closely related latent space model, the Logistic RDPG, which uses a logistic link function to map from latent positions to edge likelihoods. Over this model, we show that asymptotically exact maximum likelihood inference of latent position vectors can be achieved using an efficient spectral method. Our method involves computing top eigenvectors of a normalized adjacency matrix and scaling eigenvectors using a regression step. The novel regression scaling step is an essential part of the proposed method. In simulations, we show that our proposed method is more accurate and more robust than common practices. We also show the effectiveness of our approach over standard real networks of the karate club and political blogs.

Alexander Levine, Soheil Feizi

Recently, techniques have been developed to provably guarantee the robustness of a classifier to adversarial perturbations of bounded L_1 and L_2 magnitudes by using randomized smoothing: the robust classification is a consensus of base classifications on randomly noised samples where the noise is additive. In this paper, we extend this technique to the L_0 threat model. We propose an efficient and certifiably robust defense against sparse adversarial attacks by randomly ablating input features, rather than using additive noise. Experimentally, on MNIST, we can certify the classifications of over 50% of images to be robust to any distortion of at most 8 pixels. This is comparable to the observed empirical robustness of unprotected classifiers on MNIST to modern L_0 attacks, demonstrating the tightness of the proposed robustness certificate. We also evaluate our certificate on ImageNet and CIFAR-10. Our certificates represent an improvement on those provided in a concurrent work (Lee et al. 2019) which uses random noise rather than ablation (median certificates of 8 pixels versus 4 pixels on MNIST; 16 pixels versus 1 pixel on ImageNet.) Additionally, we empirically demonstrate that our classifier is highly robust to modern sparse adversarial attacks on MNIST. Our classifications are robust, in median, to adversarial perturbations of up to 31 pixels, compared to 22 pixels reported as the state-of-the-art defense, at the cost of a slight decrease (around 2.3%) in the classification accuracy. Code is available at this https URL.

Exploiting Cross-Lingual Subword Similarities in Low-Resource Document Classification (link)

Mozhi Zhang, Yoshinari Fujinuma, and Jordan Boyd-Graber

Text classification must sometimes be applied in a low-resource language with no labeled training data. However, training data may be available in a related language. We investigate whether character-level knowledge transfer from a related language helps text classification. We present a cross-lingual document classification framework (CACO) that exploits cross-lingual subword similarity by jointly training a character-based embedder and a word-based classifier. The embedder derives vector representations for input words from their written forms, and the classifier makes predictions based on the word vectors. We use a joint character representation for both the source language and the target language, which allows the embedder to generalize knowledge about source language words to target language words with similar forms. We propose a multi-task objective that can further improve the model if additional cross-lingual or monolingual resources are available. Experiments confirm that character-level knowledge transfer is more data-efficient than word-level transfer between related languages.

Controlling Neural Machine Translation Formality with Synthetic Supervision (link)

Xing Niu and Marine Carpuat

This work aims to produce translations that convey source language content at a formality level that is appropriate for a particular audience. Framing this problem as a neural sequence-to-sequence task ideally requires training triplets consisting of a bilingual sentence pair labeled with target language formality. However, in practice, available training examples are limited to English sentence pairs of different styles, and bilingual parallel sentences of unknown formality. We introduce a novel training scheme for multi-task models that automatically generates synthetic training triplets by inferring the missing element on the fly, thus enabling end-to-end training. Comprehensive automatic and human assessments show that our best model outperforms existing models by producing translations that better match desired formality levels while preserving the source meaning.

Papers at AIES:

Human Comprehension of Fairness in Machine Learning (link)
Debjani Saha, Candice Schumann, Duncan C. McElfresh, John P. Dickerson, Michelle L. Mazurek, Michael Carl Tschantz

Bias in machine learning has manifested injustice in several areas, such as medicine, hiring, and criminal justice. In response, computer scientists have developed myriad definitions of fairness to correct this bias in fielded algorithms. While some definitions are based on established legal and ethical norms, others are largely mathematical. It is unclear whether the general public agrees with these fairness definitions, and perhaps more importantly, whether they understand these definitions. We take initial steps toward bridging this gap between ML researchers and the public, by addressing the question: does a non-technical audience understand a basic definition of ML fairness? We develop a metric to measure comprehension of one such definition--demographic parity. We validate this metric using online surveys, and study the relationship between comprehension and sentiment, demographics, and the application at hand.

Artificial Artificial Intelligence: Measuring Influence of AI 'Assessments' on Moral Decision-Making (link)
Lok Chan, Kenzie Doyle, Duncan McElfresh, Vincent Conitzer, John P. Dickerson, Jana Schaich Borg, Walter Sinnott-Armstrong

Given AI's growing role in modeling and improving decision-making, how and when to present users with feedback is an urgent topic to address. We empirically examined the effect of feedback from false AI on moral decision-making about donor kidney allocation. We found some evidence that judgments about whether a patient should receive a kidney can be influenced by feedback about participants' own decision-making perceived to be given by AI, even if the feedback is entirely random. We also discovered different effects between assessments presented as being from human experts and assessments presented as being from AI.

Balancing the Tradeoff between Profit and Fairness in Rideshare Platforms During High-Demand Hours (link)
Vedant Nanda, Pan Xu, Karthik Abinav Sankararaman, John P. Dickerson, Aravind Srinivasan

This paper will be presented as a long-form oral at AIES, but published as a full paper at the co-located AAAI conference.

Rideshare platforms, when assigning requests to drivers, tend to maximize profit for the system and/or minimize waiting time for riders. Such platforms can exacerbate biases that drivers may have over certain types of requests. We consider the case of peak hours when the demand for rides is more than the supply of drivers. Drivers are well aware of their advantage during the peak hours and can choose to be selective about which rides to accept. Moreover, if in such a scenario, the assignment of requests to drivers (by the platform) is made only to maximize profit and/or minimize wait time for riders, requests of a certain type (e.g. from a non-popular pickup location, or to a non-popular drop-off location) might never be assigned to a driver. Such a system can be highly unfair to riders. However, increasing fairness might come at a cost of the overall profit made by the rideshare platform. To balance these conflicting goals, we present a flexible, non-adaptive algorithm, \lpalg, that allows the platform designer to control the profit and fairness of the system via parameters α and β respectively. We model the matching problem as an online bipartite matching where the set of drivers is offline and requests arrive online. Upon the arrival of a request, we use \lpalg to assign it to a driver (the driver might then choose to accept or reject it) or reject the request. We formalize the measures of profit and fairness in our setting and show that by using \lpalg, the competitive ratios for profit and fairness measures would be no worse than α/e and β/e respectively. Extensive experimental results on both real-world and synthetic datasets confirm the validity of our theoretical lower bounds. Additionally, they show that $\lpalg$ under some choice of (α,β) can beat two natural heuristics, Greedy and Uniform, on \emph{both} fairness and profit.

The Department welcomes comments, suggestions and corrections.  Send email to editor [-at-] cs [dot] umd [dot] edu.