CMSC828J Advanced Topics in Information Processing: Approaches to Representing and Recognizing Objects

General Information

Class Time: Wed 3:00-6:00
Room: CSI 3120
Course Info: See below
Text: Readings available in handouts and on the web. See below.

Personnel

Instructor
Name: David Jacobs
Email: djacobs at cs dot umd dot edu
Office: AVW 4421
Office hours: Wed 1:00-2:00 or by appointment (or just drop by).

Announcements

Review session for the final will be at 3pm on Wed., May 13, in our usual classroom.

Review sheet for Final here.

Midterm review will be 1:00-2:30 on Monday, 3/9, in AV Williams Room 3165.

Practice Midterm: here.

Due on 3/11: a one-paragraph description of your proposed class project. Include a description of the algorithms you will implement and the data set you will use.

Description

One of the most basic problems in vision is to use images to recognize that a particular object or event that we’ve never seen before belongs to a particular class of objects or events. To do this, we must have a rich notion of what an object is, one that can capture what members of a class have in common. For example, chairs vary tremendously in their shape and material properties. How do we look at a chair we’ve never seen before and identify it as a chair? Accounting for this variation in recognition is largely an unsolved problem. In this course we will survey a number of approaches to representing and recognizing objects. We will draw inspiration from work in philosophy, psychology, linguistics, and mathematics. However, our primary focus will be more concrete: to learn the algorithms and analytic tools that have been applied to visual object classification.

The class will alternate between lectures teaching the basic mathematical and algorithmic techniques of these methods and discussions of vision research papers that apply these techniques. It will be essential for students to have a solid understanding of basic math, such as linear algebra, probability and statistics, and calculus. It will also be useful to have some knowledge of computer vision, image processing, functional analysis, stochastic processes, or geometry. In general, the more math a student knows, the easier the course will be.

Requirements

Here is my current plan for the workload of the class. 

1) Reports. There are 11 classes scheduled in which we will discuss research papers. Before each of these classes, students must turn in a one-page report discussing a preassigned question about the reading. Late reports will not be accepted, since the goal of these reports is to get you to think about the papers before we discuss them. However, students need not turn in a report for a class in which they are giving a presentation. In addition, students may skip one paper. Consequently, each student will be required to complete this assignment for 9 classes. 15% of grade

2) Presentation. Students will give group presentations later in the semester, in which they synthesize material from a number of papers. We will settle on the exact format of these presentations in class. 15% of grade

3) Midterm and Final.  These will be based on material from the lectures.  50% of grade

4) Project.  Students will implement some recognition algorithms and test them on an appropriate data set.  Each student must implement and compare at least two algorithms.  These will generally be algorithms studied in class, but students are free to implement other algorithms, or to devise their own.  Each student should discuss their proposed work with me.  20% of grade

5) Class Participation.  Everyone should read papers before class and contribute to discussion of them.  Extra credit.

Note: visitors or auditors are welcome.  However, if you are attending a class in which we will discuss papers, you should complete a report on one of these papers (see requirement 1).

Class Schedule

None of this schedule is written in stone.  Feel free to suggest other papers or topics you’d like to discuss. 

Date

Topic

Background Reading

 1. 1/28

 Introduction

S. Laurence and E. Margolis, "Concepts and Cognitive Science", in Concepts, edited by E. Margolis and S. Laurence, MIT Press, 1999.

Concepts.  The Stanford Encyclopedia of Philosophy.

L. Wittgenstein, Philosophical Investigations, sections 65-78.

Template Matching

Search in pose space (gradient descent, Hough Transform, chamfer matching...), correspondence space (interpretation trees, alignment) and their relationship

 

Many topics are discussed at CVOnline (for example, interpretation tree search). You can also check the online edition of Ballard and Brown (for example, the Hough Transform).

 

T. Cass "Polynomial Time Geometric Matching for Object Recognition," IJCV 1997.

 

The Hough Transform is described in Forsyth and Ponce 15.1 and 18.3. 

 

Interpretation tree search is described in Trucco and Verri 10.2.
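To make search in pose space concrete, here is a minimal Python sketch of chamfer matching restricted to 2D integer translations. It assumes edge maps are given as lists of (x, y) points; the function names `chamfer_score` and `best_translation` are hypothetical, and a brute-force nearest-neighbor search stands in for the distance transform a practical implementation would use.

```python
import math
from itertools import product

def chamfer_score(template, image_edges, dx, dy):
    """Mean distance from each translated template point to its nearest image edge point."""
    total = 0.0
    for (x, y) in template:
        tx, ty = x + dx, y + dy
        # Brute-force nearest neighbour; a distance transform would make this O(1) per point.
        total += min(math.hypot(tx - ex, ty - ey) for (ex, ey) in image_edges)
    return total / len(template)

def best_translation(template, image_edges, search_range):
    """Exhaustively search a grid of integer translations (a tiny pose space)."""
    return min(product(search_range, search_range),
               key=lambda t: chamfer_score(template, image_edges, t[0], t[1]))

# Usage: an L-shaped template placed in the "image" at offset (3, 5), plus one clutter point.
template = [(0, 0), (0, 1), (0, 2), (1, 0), (2, 0)]
image = [(x + 3, y + 5) for (x, y) in template] + [(9, 9)]
print(best_translation(template, image, range(-2, 8)))
```

Exhaustive search is only feasible here because the pose space is a small 2D grid; richer transformations (rotation, scale, affine) are what motivate the smarter pose-space and correspondence-space searches listed above.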

 2. 2/4

 3D Geometry

Affine and projective transforms.  Affine invariants.  Cross ratio. Aspect graphs.

 Introduction to Projective Geometry, C. R. Wylie, McGraw-Hill Book Co., 1970.

 

Y. Lamdan, J. T. Schwartz, and H. J. Wolfson. Affine invariant model-based object recognition. IEEE Journal of Robotics and Automation, 6:578-589, 1990.

 

I. Weiss. Geometric Invariants and Object Recognition. Intl. J. Computer Vision, 10:207-231, 1993.

 

J. Burns, R. Weiss, and E. Riseman, "The Non-Existence of General-Case View-Invariants", in Geometric Invariance for Computer Vision, edited by J. Mundy and A. Zisserman, MIT Press, 1992.

 

J. Mundy and A. Zisserman, Appendix – Projective Geometry for Machine Vision, in Geometric Invariance for Computer Vision, edited by J. Mundy and A. Zisserman, MIT Press, 1992. (On reserve)
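The cross ratio is the classic projective invariant of four collinear points. The Python sketch below uses one common ordering convention (others permute the points and give related values) and checks numerically that the cross ratio survives a 1D projective map x -> (p x + q)/(r x + s); the helper names are hypothetical.

```python
def cross_ratio(a, b, c, d):
    """Cross ratio (A,B;C,D) = ((c-a)(d-b)) / ((c-b)(d-a)) for points on a line."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))

def projective_1d(x, p, q, r, s):
    """1D projective transformation x -> (p*x + q) / (r*x + s), assuming p*s - q*r != 0."""
    return (p * x + q) / (r * x + s)

points = [0.0, 1.0, 3.0, 7.0]
before = cross_ratio(*points)
after = cross_ratio(*(projective_1d(x, 2.0, 1.0, 0.5, 3.0) for x in points))
print(before, after)  # equal up to floating-point rounding
```

By contrast, a simple ratio of lengths such as (c - a)/(b - a) is preserved by affine maps but not by this projective map, which is why the cross ratio needs four points.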

Discussion: Visual Classification

 

E. Rosch, C. Mervis, W. Gray, D. Johnson, and P. Boyes-Braem, "Basic Objects in Natural Categories", Cognitive Psychology, 8:382-439. Available from me.

 

Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94(2):115-147. Available through PsycARTICLES via the University Library.

Question: Which of these papers (if either) do you think is more relevant to computer vision?  Why?

 

 3. 2/11

Linear Subspaces

PCA.  LDA.  Linear combinations of views.

Duda, Hart and Stork, pp. 114-121. 

 

Shimon Ullman and Ronen Basri, Recognition by Linear Combinations of Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(10): 992-1006, 1991. Available at: http://www.wisdom.weizmann.ac.il/~ronen/publications.html
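PCA underlies the eigenface-style methods read this week. As a dependency-free sketch (an assumption for illustration; in practice one would call an SVD or eigensolver routine), the Python code below centers the data, forms the sample covariance, and finds the leading principal direction by power iteration. The function name is hypothetical.

```python
import math

def leading_principal_component(data, iters=200):
    """Return the unit eigenvector of the sample covariance with the largest eigenvalue."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    # Sample covariance matrix C = X^T X / (n - 1) of the centered data X.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d  # starting vector; assumed not orthogonal to the top eigenvector
    for _ in range(iters):
        # Repeated multiplication by C amplifies the component along the top eigenvector.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Points scattered mostly along the direction (1, 1).
pts = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9), (4.0, 4.0)]
print(leading_principal_component(pts))  # approximately [0.71, 0.70]
```

Projecting each centered sample onto the top k such directions gives the low-dimensional coordinates used for recognition; LDA differs in choosing directions that separate labeled classes rather than directions of maximum variance.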

Discussion: Linear Subspaces in Vision

 

Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection
PN Belhumeur, JP Hespanha, DJ Kriegman

 

"Face Recognition Based on Fitting a 3D Morphable Model", by Blanz and Vetter.

 

Question: Both of these methods are demonstrated on faces.  What other classes of objects (if any) are they appropriate to?  For example, could they be used effectively to identify animal species?  Chairs?  Motorcycles?  Justify your answer.

Turk, M. & Pentland, A. (1991). Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3, 71-86

 

Moghaddam, Jebara and Pentland, “Bayesian Face Recognition.” MERL TR 2000-42.

 

Discriminant analysis of principal components for face recognition
W Zhao, A Krishnaswamy, R Chellappa, D. Swets, J. Weng.

 

T.F. Cootes and C.J. Taylor, "Statistical models of appearance for medical image analysis and computer vision", Proc. SPIE Medical Imaging 2001. 

 

Lohmann, G.P. 1983.  Eigenshape analysis of microfossils: a general morphometric procedure for describing changes in shape.  Mathematical Geology 15:659-672.

 4. 2/18

Lighting

Image normalization: normalized correlation, direction of gradient, histogram equalization.

3D modeling: Lambertian reflectance, spherical harmonics.

Shashua. On photometric issues to feature-based object recognition. Int. J. Computer Vision, 21:99-122, 1997.

 

"Lambertian Reflectance and Linear Subspaces," IEEE Trans. on Pattern Analysis and Machine Intelligence, 25(2):218-233, (2003). R. Basri and D. Jacobs. Available at: http://www.wisdom.weizmann.ac.il/~ronen/publications.html

 

"In Search of Illumination Invariants," IEEE Conference on Computer Vision and Pattern Recognition, pp. 254-261, (June 2000). H. Chen, P. Belhumeur, and D. Jacobs.
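Normalized correlation, listed above under image normalization, compares two patches after subtracting each patch's mean and dividing by its spread, so the score is unchanged by a gain-and-offset lighting change I -> aI + b with a > 0. A minimal Python sketch, assuming for simplicity that patches are flat lists of pixel intensities:

```python
import math

def normalized_correlation(p, q):
    """Zero-mean normalized cross-correlation of two equal-length intensity lists."""
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    dp = [x - mp for x in p]  # removing the mean cancels any additive offset b
    dq = [y - mq for y in q]
    num = sum(a * b for a, b in zip(dp, dq))
    # Dividing by the norms cancels any positive gain a.
    den = math.sqrt(sum(a * a for a in dp) * sum(b * b for b in dq))
    return num / den  # undefined (division by zero) for a constant patch

patch = [10, 20, 30, 25, 5, 40]
brighter = [1.5 * x + 30 for x in patch]  # gain 1.5, offset 30
print(normalized_correlation(patch, brighter))  # 1.0 up to rounding
```

This invariance only covers a global affine change in intensity; effects such as attached shadows or a moving light source change the pattern of intensities, which is what the 3D modeling approaches above address.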

 Discussion:

Kumar, Savvides and Xie.  Correlation pattern recognition for face recognition.

Wang, Li and Wang.  Face recognition under varying lighting conditions using self quotient image.

Zhang and Samaras.  Face recognition from a single training image under arbitrary unknown lighting using spherical harmonics.

Question: Consider the following question for (at least) two of the above papers. Which of these methods do you think performs face recognition best in the presence of lighting variation?  Which do you think has the most potential for improved performance?  Why?

 

 5. 2/25

Shape Space and Image Manifolds:

 

Kendall's shape space. Image manifold approach of Trouvé and Younes.

 D. G. Kendall. A survey of the statistical theory of shape. Statistical Science, 4(2):87-120, 1989.

 

Shape and Shape Theory, by Kendall, Barden, Carne and Le

 

Metamorphoses Through Lie Group Action, by Trouvé and Younes, Foundations of Computational Mathematics, 2004.
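For planar landmark configurations, Kendall's shape space can be handled neatly with complex numbers: after removing translation (centering) and scale (unit norm), the full Procrustes distance, which also optimizes over rotation and scaling, is sqrt(1 - |<z, w>|^2), where <z, w> is the Hermitian inner product of the two pre-shapes. A small Python sketch under those assumptions; the function names and example shapes are hypothetical.

```python
import cmath
import math

def preshape(points):
    """Map (x, y) landmarks to a centered, unit-norm complex vector (Kendall pre-shape)."""
    z = [complex(x, y) for x, y in points]
    mean = sum(z) / len(z)
    z = [v - mean for v in z]
    norm = math.sqrt(sum(abs(v) ** 2 for v in z))
    return [v / norm for v in z]

def full_procrustes_distance(a, b):
    """Distance between shapes after optimally matching translation, scale and rotation."""
    z, w = preshape(a), preshape(b)
    inner = sum(zi.conjugate() * wi for zi, wi in zip(z, w))
    return math.sqrt(max(0.0, 1.0 - abs(inner) ** 2))  # clamp tiny negative rounding

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
# The same square rotated by 30 degrees, scaled by 2 and shifted by (5, 3).
rot = cmath.exp(1j * math.pi / 6)
moved = [2 * rot * complex(x, y) + (5 + 3j) for x, y in square]
moved = [(v.real, v.imag) for v in moved]
trapezoid = [(0, 0), (2, 0), (1, 1), (0, 1)]
print(full_procrustes_distance(square, moved))      # ~0.0
print(full_procrustes_distance(square, trapezoid))  # clearly > 0
```

Because rotation acts on a centered planar configuration as multiplication by a unit complex number, the optimal rotation never has to be computed explicitly; taking the modulus of the inner product removes it.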

Discussion: Statistics in the tangent space

 

Veeraraghavan, Roy-Chowdhury, and Chellappa.  Matching Shape Sequences in Video with Applications in Human Movement Analysis.

Durrleman, Pennec, Trouvé, Thompson and Ayache. Inferring Brain Variability from Diffeomorphic Deformations of Currents: an Integrative Approach

 

Question: Both these papers do shape analysis in the tangent space to an image manifold. What are the advantages and limitations of such an approach?

 F. James Rohlf.  Shape Statistics: Procrustes Superimpositions and Tangent Spaces

 

 6. 3/4

Deformable Objects:

Elastic matching and edit distances. Using a finite basis to represent deformations (thin-plate splines). Earth mover's distance.

Morphometric Tools for Landmark Data, by Bookstein

Principal Warps: Thin-Plate Splines and the Decomposition of Deformations, by F. Bookstein, PAMI 1989, Vol 11, No 6.

 

Statistical Shape Analysis by I. L. Dryden and Kanti V. Mardia

Geometric Morphometrics: Ten Years of Progress Following the ‘Revolution’, by Dean C. Adams, F. James Rohlf, and Dennis E. Slice.
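The earth mover's distance listed above has a simple closed form in one dimension: for two histograms with equal total mass on the same unit-spaced bins, it equals the sum of absolute differences between their cumulative sums. The Python sketch below covers only that 1D special case (an assumption; the general case requires solving a transportation problem), and `emd_1d` is a hypothetical name.

```python
from itertools import accumulate

def emd_1d(h1, h2):
    """Earth mover's distance between equal-mass histograms on unit-spaced bins."""
    if abs(sum(h1) - sum(h2)) > 1e-9:
        raise ValueError("histograms must have equal total mass")
    # Each |CDF difference| is the mass that must flow across the boundary after that bin.
    return sum(abs(a - b) for a, b in zip(accumulate(h1), accumulate(h2)))

print(emd_1d([1, 0, 0], [0, 1, 0]))      # 1: move one unit one bin
print(emd_1d([1, 0, 0], [0, 0, 1]))      # 2: move one unit two bins
print(emd_1d([0.5, 0, 0.5], [0, 1, 0]))  # 1.0: two half-units each move one bin
```

Unlike a bin-by-bin comparison, which would rate both of the first two cases as equally different, EMD grows with how far the mass has to move.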

Discussion: Modeling deformations

 

D'Arcy Thompson, On Growth and Form, Dover Books, 1992, Chapters 1 and 17. Available from me.

 

Serge Belongie, Jitendra Malik and Jan Puzicha. Shape Matching and Object Recognition Using Shape Contexts. PAMI, 24(4):509-522, April 2002.

 

Zhu and Yuille.  FORMS: A Flexible Object Recognition and Modelling System.

Question: Which computer vision paper do you think does a better job of instantiating the research program laid out by D'Arcy Thompson? Why?

 

 7. 3/11

 Midterm

 

 Discussion of midterm

 

 8. 3/25

 Nonlinear Manifolds

Saul and Roweis: Think Globally, Fit Locally, Unsupervised Learning of Nonlinear Manifolds (U. Penn. Tech Report CIS-02-18).

 

Joshua B. Tenenbaum, Vin de Silva, John C. Langford, "A Global Geometric Framework for Nonlinear Dimensionality Reduction", Science.

 

Sam T. Roweis, Lawrence K. Saul, Nonlinear Dimensionality Reduction by Locally Linear Embedding, Science.

Discussion: Nonlinear manifolds 

 

Saul and Roweis: Think Globally, Fit Locally, Unsupervised Learning of Nonlinear Manifolds (U. Penn. Tech Report CIS-02-18).

 

He, Yan, Hu, Niyogi, and Zhang. Face Recognition Using Laplacianfaces.

Weinberger and Saul.  Unsupervised Learning of Image Manifolds by Semidefinite Programming.

 

Question: Read Saul and Roweis, and one of the other two papers. In what way does the second paper improve on Saul and Roweis? Is this improvement significant? Why?

 

 9. 4/1

Feature Descriptors:

 

Gabor Jets.  SIFT.  Histogram of gradients.  MSER.  Harris Corner Detection.

David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.

 

M. Lades, J.C. Vorbruggen, J. Buhmann, J. Lange, C. von der Malsburg, R.P. Wurtz, W. Konen. Distortion Invariant Object Recognition in the Dynamic Link Architecture. IEEE Transactions on Computers 1992, 42(3):300-311.

Histograms of Oriented Gradients for Human Detection, by N. Dalal, B. Triggs. CVPR 2005

Discussion: Using features

 

Scalable Recognition with a Vocabulary Tree, by D. Nister and H. Stewenius, CVPR 2006.

 

V. Ferrari, L. Fevrier, F. Jurie, and C. Schmid.  Groups of Adjacent Contour Segments for Object Detection  

E. Nowak, F. Jurie, and B. Triggs. Sampling Strategies for Bag of Features Image Classification.

Question: Based on these three papers, what do you think are the strengths and limitations of Bag of Features approaches to recognition?  How much further can this approach be pushed?

 

 10. 4/8

Graphical Models

 

Hidden Markov Models.  Belief Nets.

Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition."
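The forward algorithm from Rabiner's tutorial computes the likelihood of an observation sequence under an HMM by summing over all hidden state paths with dynamic programming, in time linear in the sequence length rather than exponential. A minimal Python sketch; the two-state model and its probabilities are made-up illustrative values, not taken from any reading.

```python
from itertools import product

def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) via alpha_t(j) = sum_i alpha_{t-1}(i) * trans[i][j] * emit[j][o_t]."""
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o] for j in range(n)]
    return sum(alpha)

# Hypothetical 2-state model with 2 observation symbols (0 and 1).
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]

print(forward_likelihood([0, 1, 1], start, trans, emit))
# Sanity check: the likelihoods of all length-3 sequences should sum to 1.
print(sum(forward_likelihood(list(s), start, trans, emit) for s in product([0, 1], repeat=3)))
```

The recursion is exactly where the Markov and conditional-independence assumptions enter: each step only needs the previous alpha vector, not the full history.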

Discussion: Graphical models in vision 

 

J. Yamato, J. Ohya, and K. Ishii, “Recognizing Human Action in Time-Sequential Images Using Hidden Markov Model,” CVPR ’92, pages 379-385.

 

Crandall, Felzenszwalb and Huttenlocher.  Object Recognition by Combining Appearance and Geometry

 

Question: Both of the approaches described in these papers rely on making assumptions about conditional independence. For what types of recognition problems do you think these assumptions will be appropriate? When will they be inappropriate?

 

 11. 4/15

Linear Separators:

Naive Bayes, Perceptrons, SVMs.

 C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition

"Support-Vector Networks," Machine Learning 20, 273-297, 1995, Cortes and Vapnik.

Pattern Classification, Duda, Hart and Stork.

AdaBoost Tutorial by Freund & Schapire.

Additive Logistic Regression: a Statistical View of Boosting.
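Of the linear separators above, the perceptron has the shortest training rule: whenever an example is misclassified, add y·x to the weight vector. The Python sketch below folds the bias in as a constant feature (an implementation choice) and stops after an epoch with no mistakes, which the perceptron convergence theorem guarantees will eventually happen for linearly separable data. The function names and toy data are hypothetical.

```python
def train_perceptron(samples, labels, max_epochs=100):
    """Classic perceptron; samples are feature tuples, labels are +1 or -1."""
    w = [0.0] * (len(samples[0]) + 1)  # last weight acts as the bias
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            xb = list(x) + [1.0]
            if y * sum(wi * xi for wi, xi in zip(w, xb)) <= 0:
                # Misclassified (or on the boundary): nudge w toward the correct side.
                w = [wi + y * xi for wi, xi in zip(w, xb)]
                mistakes += 1
        if mistakes == 0:
            break
    return w

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])) > 0 else -1

# Toy linearly separable data: label +1 when x0 + x1 > 3.
X = [(0, 0), (1, 1), (2, 0), (3, 2), (4, 1), (2, 3)]
Y = [-1, -1, -1, 1, 1, 1]
w = train_perceptron(X, Y)
print([predict(w, x) for x in X])
```

The perceptron stops at any separating hyperplane; an SVM instead picks the one with maximum margin, which is the main conceptual step between the first and last items in the list above.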

Discussion: Linear classifiers in vision

 

P. Viola and M. Jones. Robust real-time object detection. Technical Report 2001/01, Compaq CRL, February 2001

 

Kumar, Belhumeur and Nayar.  FaceTracer: a search engine for large collections of images with faces.

 

O. Boiman, E. Shechtman and M. Irani. In Defense of Nearest-Neighbor Based Image Classification.

 

Question: Most machine learning techniques are not developed specifically for computer vision.  What are the main challenges that you see addressed in these papers when it comes to adapting these learning methods for computer vision applications?

 

 12. 4/22

Student Presentations

 Most approaches to recognition that we have looked at in class are 2D, involving image comparison, pattern recognition, or machine learning that is not explicitly based on an understanding of 3D, including 3D effects of geometry, lighting, and shape.

PRO: Future research must incorporate explicit 3D knowledge to be effective.  Thuan Huynh, Phil Huynh, Daozheng Chen

CON: Explicit 3D understanding is not necessary to future progress in object recognition and classification.  Fatemeh Mir Rashed, Koyel Mukherjee, Jayant Kumar

Discussion: Parts

 

Y. Jin and S. Geman. Context and hierarchy in a probabilistic image model.

 

P. Felzenszwalb, D. McAllester and D. Ramanan. A Discriminatively Trained, Multiscale, Deformable Part Model.

 

Y. Amit and A. Trouvé. POP: Patchwork of parts models for object recognition

 

Question: Which of these methods offers the most promise for handling objects with parts?  How do you think this compares with other methods for handling parts that we've discussed in class?

 

 13. 4/29

Student Presentations

 Suppose we have a ten year research goal to build a mobile robot that can recognize a complex class of household objects, such as a chair.  Choose one of the recognition approaches discussed in class, and argue that this is the best approach to try to extend in order to reach this goal.

TEAM 1: Sima Taheri, Radu Dondera, Kate McBryan

TEAM 2: Jacob Devlin, Jun-Cheng Chen, Douglas Summers-Stay

Unsupervised Learning and Discovery (thanks to Kristen Grauman's class)

 

Discovering Objects and Their Location in Images, by J. Sivic, B. Russell, A. Efros, A. Zisserman, and W. Freeman, ICCV 2005.

 

Unsupervised Discovery of Action Classes, by Y. Wang, H. Jiang, M. Drew, Z-N. Li and G. Mori, CVPR 2006.

 

Detecting Irregularities in Images and in Video, by O. Boiman, M. Irani, ICCV 2005.

 

Question: Based on today's readings, but also on all you've learned this semester, what is a visual class?

 

 14. 5/6

Student Presentations

 Perhaps the most popular current approach to classification involves using local descriptors, such as SIFT. To advance the state of the art in this area, it is most important that we:

1) Develop better descriptors. Mohammed Eslami, Marco Adelfio, John Karvounis
2) Develop better ways of using these descriptors. Anne Jorstad, Nitesh Shroff, Joao Soares

Conclusions

 Exam 5/16

 Exam is in the regular classroom, 1:30-3:30.