Google Core AI

Real-time Detection of Household Objects
Developers: Abhinav Shrivastava, Tomasz Malisiewicz, Abhinav Gupta, Alexei Efros
Customer:    QoLT ERC (Martial Hebert, Takeo Kanade)


Problem Statement

The NSF Quality of Life Technology (QoLT) ERC investigates technologies to transform the lives of people with reduced functional capabilities. One important QoLT research thrust is to develop tools that understand users’ environments from visual input. The most basic module is the ability to recognize common objects that are typically used in activities of daily living (ADL), e.g., bottles, chairs, and trash cans. However, existing object recognition systems (mostly based on SIFT) generally perform well only on objects that are highly textured (e.g., book covers, paintings, etc.) or have distinctive colors that don't otherwise appear in the environment. Since most common household objects fit neither category, using current techniques for this task has proved highly problematic.

Working closely with QoLT investigators, we propose to develop an app that can detect and recognize common household objects (without strong texture or color cues) in real time. We envision using this app in the context of QoLT in a few different modes. For example, in a user-supervised mode, the user would select images of the objects, and these images would be used to learn the individual models used for detection. We also envision using this app with the sensor systems developed in QoLT, in particular wearable cameras that observe the environment from the user’s point of view. This capability will mesh well with QoLT's planned concept for assistive systems (e.g., “where is object X?”) as well as our concept for enhancing the user’s vision.

Underlying Technology

Our app showcases technology that is based on our recently published work [1][2]. The conceptually simple but surprisingly powerful method combines the effectiveness of a discriminative object detector with the explicit correspondence offered by a nearest-neighbor approach. The method is based on training a separate linear SVM classifier for every exemplar in the training set. Each of these Exemplar-SVMs is thus defined by a single positive instance and millions of negatives. While each detector is quite specific to its exemplar, we empirically observe that an ensemble of such Exemplar-SVMs offers surprisingly good generalization.
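
The following is a minimal sketch of the idea in the paragraph above, not the released implementation. It trains one linear SVM per exemplar (a single positive against all negatives) and lets the ensemble vote by taking the highest-scoring exemplar, which also tells us which training instance the detection corresponds to. Everything here is an assumption made for illustration: the function names, the 2-D toy feature vectors, the plain subgradient-descent solver, and the fixed threshold of 0. Feature extraction, hard-negative mining, and score calibration are omitted.

```python
# Toy Exemplar-SVM ensemble (illustrative sketch; all names are hypothetical).

def score(model, x):
    """Linear SVM score w.x + b for one feature vector."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_exemplar_svm(exemplar, negatives, lam=0.01, lr=0.01, epochs=300):
    """Train a linear SVM with ONE positive (the exemplar) and many negatives,
    using subgradient descent on the L2-regularized hinge loss."""
    dim = len(exemplar)
    w, b = [0.0] * dim, 0.0
    data = [(exemplar, 1.0)] + [(x, -1.0) for x in negatives]
    for _ in range(epochs):
        for x, y in data:
            # Margin violation is checked before this step's update.
            violated = y * (sum(wi * xi for wi, xi in zip(w, x)) + b) < 1.0
            for i in range(dim):
                w[i] -= lr * lam * w[i]      # regularizer shrinks w
                if violated:
                    w[i] += lr * y * x[i]    # hinge-loss subgradient step
            if violated:
                b += lr * y
    return w, b


def train_ensemble(positives, background):
    """Train one SVM per exemplar. Negatives for an exemplar are the
    background vectors plus the exemplars of every other category."""
    ensemble = []
    for label, exemplars in positives.items():
        others = [x for other, xs in positives.items() if other != label for x in xs]
        for idx, exemplar in enumerate(exemplars):
            model = train_exemplar_svm(exemplar, background + others)
            ensemble.append((label, idx, model))
    return ensemble


def detect(ensemble, x, threshold=0.0):
    """Return (label, exemplar_index, score) of the best-scoring exemplar,
    or None if no exemplar scores above the threshold."""
    label, idx, model = max(ensemble, key=lambda entry: score(entry[2], x))
    best = score(model, x)
    return (label, idx, best) if best > threshold else None


# Hypothetical 2-D "features": two exemplars per category plus background clutter.
TOY_POSITIVES = {
    "bottle": [(3.0, 0.0), (3.0, 1.0)],
    "chair": [(0.0, 3.0), (1.0, 3.0)],
}
TOY_BACKGROUND = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5),
                  (-0.5, 0.0), (0.0, -0.5), (0.5, 0.5)]

if __name__ == "__main__":
    toy_ensemble = train_ensemble(TOY_POSITIVES, TOY_BACKGROUND)
    for query in [(2.9, 0.1), (0.1, 2.9), (0.1, 0.1)]:
        print(query, detect(toy_ensemble, query))
```

Because each detection carries the index of the exemplar that fired, any annotation attached to that training instance could be transferred to the detection, which is the kind of explicit correspondence a nearest-neighbor approach offers.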

For this project, we developed fast implementations of the method in both C/C++ and Matlab. Both interface with a wearable ego-centric imaging system and can detect tens of object categories in real time.

Associated Projects
  • Data-driven Visual Similarity for Cross-domain Image Matching [2]
    • Website
    • Published in ACM Transactions on Graphics (journal) (presented at SIGGRAPH Asia, 2011)

  • Exemplar-SVMs for Object Detection and Beyond [1]

Customer Transfer and Demos

We released the code for the underlying technology (exemplar-based object detection [1] and cross-domain image matching [2]) in July 2011 and January 2012, respectively. The system was then implemented on an ego-centric wearable camera (hardware was provided by QoLT). After initial proof-of-concept testing, the real-time implementation (C/C++) was developed and released internally to QoLT. We plan to release it publicly once QoLT completes its testing of the code in real scenarios.

Demo videos of our system in various scenarios are shown below.

  • Generic Scenario (2x) (Legacy Version)
  • Office Scenario (1x)
  • Office Scenario (1x)
  • Kitchen Scenario (1x)

A live demo of our system, implemented on a wearable ego-centric sensor system, was shown at a QoLT Coordination meeting on August 29, 2011 and to the Advisory Board during the annual NSF Site Visit on April 24-26, 2012.

Software Release

Source code for the basic infrastructure used in this work (Exemplar-SVM infrastructure for large-scale training using a cluster, fast detection, etc.) is available for download:

Exemplar-SVM tarball
Exemplar-SVM zipfile

You can also navigate directly to the Exemplar-SVM GitHub project page, which has download instructions, a wiki, and additional starter guides. The C/C++ version will be released after internal testing by QoLT.

[1] Tomasz Malisiewicz, Abhinav Gupta, Alexei A. Efros. Ensemble of Exemplar-SVMs for Object Detection and Beyond. In ICCV, 2011.
[2] Abhinav Shrivastava, Tomasz Malisiewicz, Abhinav Gupta, Alexei A. Efros. Data-driven Visual Similarity for Cross-domain Image Matching. ACM Transactions on Graphics (SIGGRAPH Asia), 2011.