CMSC828J Advanced Topics in Information Processing: Approaches to Representing and Recognizing Objects

General Information

Class Time: Tue, Thu 11:00-12:15
Room: CSI 2107
Course Info: See below
Text: Readings available on reserve in the CS library and on the web. See below.

Personnel

Instructor: David Jacobs
Email: djacobs at cs dot umd dot edu
Office: AVW 4421
Office hours: Tue 2:00-3:00, by appointment, or just drop by.

Announcements

MIDTERM: The take-home midterm will be handed out in class on 3/28 and will be due a week later, at the start of class on 4/4.

There will be a review for the final on Friday, May 12 at 2:00 in AVW 4424.

Practice Final

Description

One of the most basic problems in vision is to use images to recognize that a particular object or event we've never seen before belongs to a particular class of objects or events.  To do this we must have a rich notion of what an object class is, one that captures what its members have in common.  For example, chairs vary tremendously in their shape and material properties.  How do we look at a chair we've never seen before and identify it as a chair?  Accounting for this variation in recognition is largely an unsolved problem.  In this course we will survey a number of approaches to representing and recognizing objects.  We will draw inspiration from work in philosophy, psychology, linguistics, and mathematics.  However, our primary focus will be more concrete: to learn the algorithms and analytic tools that have been applied in visual object classification.

First, we will study approaches based on the idea that objects can be described by a set of necessary and sufficient image properties.  This has taken the form of invariant representations.  We will study the advantages and limitations of geometric and photometric invariants.  Second, we will consider methods that attempt to represent the images of classes of objects using subspaces.  This includes approaches based on PCA, linear combinations, and manifold representations of classes.  Third, we will consider classification approaches based on prototypes and exemplars.  This includes methods of template matching, as well as more powerful, non-rigid matching approaches inspired by morphometrics.  Fourth, we will look at approaches that build representations of objects based on their parts.  Fifth, we will look at approaches to building generative models of classes, including the use of hidden Markov models.  Finally, we will consider the idea of building classifiers directly, without explicit representations of the class, using methods such as support vector machines, Winnow, naïve Bayes, and AdaBoost.

The class will alternate between lectures teaching the basic mathematical and algorithmic techniques of these methods, and discussion of vision research papers that apply these techniques.  It will be essential for students to have a solid understanding of basic topics in math, such as linear algebra, probability and statistics, and calculus.  It will also be useful to have some knowledge of computer vision, image processing, functional analysis, stochastic processes, or geometry.  In general, the more math a student knows, the easier the course will be.

Requirements

Here is my current plan for the workload of the class. 

1) Reports.  There are about 16 classes scheduled for the presentation of papers.  Prior to each of these classes, students must turn in a one-page summary and critique of one of the papers to be discussed.  Late reports will not be accepted, since the goal of these reports is to get you to think about the papers before we discuss them.  However, students need not turn in a report for the class in which they are presenting, and may also skip two other classes.  15% of grade.

2) Presentation.  Students will be assigned in pairs to present a set of papers and to lead discussion for one class.  This will be a substantial part of the grade.  Students will be given a topic area and some ideas for papers in this area.  They will select about two key papers and about four additional important papers.  They will then synthesize the ideas in these papers and present a review of the topic.  Presentations should be well prepared.  Students should arrange to meet with me before their presentation to go over it.  15% of grade.

3) Midterm and Final.  These will be based on material from the lectures.  50% of grade

4) Project.  Students will choose one:  20% of grade

     a) Write a detailed paper, approximately five pages in length, proposing research that extends or adapts one of the approaches discussed in class.  You may choose to base this on the papers you have presented.

     b) Programming project: students will implement a technique, possibly one discussed in class, and apply it to some real data.  This is not meant to be a research project, but something closer to an extended problem set.  However, I will be flexible about the nature of any independent project, and students are encouraged to explore new research ideas as well.

Projects should be discussed with me.

5) Class Participation.  Everyone should read papers before class and contribute to discussion of them.  Extra credit.

Note: visitors or auditors are welcome.  However, if you are attending a class in which we will discuss papers, you should complete a report on one of these papers (see requirement 1).

Class Schedule

The schedule below is probably overly ambitious, so expect that we won't get to a couple of these classes.  Pairs of students will lead the discussion of papers, as indicated.  There will probably not be enough students to lead all these classes, so I will lead discussion for any extras.

None of this schedule is written in stone.  Feel free to suggest other papers or topics you’d like to discuss. 

Each entry below gives the class number and date, the presenters, the topic, and background reading.

1. 1/26 (Jacobs). Introduction.

2. 1/31 (Jacobs). Paper presentation. Students must review (a) or (b).

a. Women, Fire and Dangerous Things by Lakoff, Chapters 1 and 2.  On reserve.

b. S. Laurence and E. Margolis, ``Concepts and Cognitive Science'', in Concepts edited by E. Margolis and S. Laurence, MIT Press, 1999.  On reserve.

c. L. Wittgenstein, Philosophical Investigations, sections 65-78.  On reserve.

 

3. 2/2 (Jacobs). Lecture: Affine and projective geometry and invariants.

Introduction to Projective Geometry, C.R. Wylie, McGraw-Hill Book Co.,  1970. (on reserve).

Y. Lamdan, J. T. Schwartz, and H. J. Wolfson. Affine invariant model-based object recognition. IEEE Journal of Robotics and Automation, 6:578--589, 1990

I. Weiss. Geometric Invariants and Object Recognition. Intl. J. Computer Vision, 10:207--231, 1993

J. Burns, R. Weiss, and E. Riseman, ``The Non-Existence of General-Case View-Invariants,'' in Geometric Invariance for Computer Vision, edited by J. Mundy and A. Zisserman, MIT Press, 1992. (On reserve)

Moses, Y. and Ullman, S. (1992). ``Limitations of non model-based recognition schemes.'' In Sandini, G., editor, Proc. 2nd European Conf. on Computer Vision, Lecture Notes in Computer Science, volume 588, pages 820--828. Springer Verlag.

J. Mundy and A. Zisserman, Appendix – Projective Geometry for Machine Vision, in Geometric Invariance for Computer Vision, edited by J. Mundy and A. Zisserman, MIT Press, 1992. (On reserve)

H. Chen, P. Belhumeur, and D. Jacobs, ``In Search of Illumination Invariants,'' IEEE Conference on Computer Vision and Pattern Recognition, pp. 254--261, June 2000.

4. 2/7 (Jacobs). Lecture: Geometric invariance (conclusion) and photometric invariance.

5. 2/9 (Jacobs). Presentation, David Jacobs: Classification with invariants.

a. D. Lowe: Three-Dimensional Object Recognition from Single Two-Dimensional Images.  Artificial Intelligence, 1987.

b. Biederman, I. (1987). Recognition--by--components: A theory of human image understanding. Psychological Review, 94(2):115--147.  On reserve.  Also available through PsycARTICLES through University Library.

Jepson, A.,  W. Richards, and D. Knill, Modal structure and reliable inference, in ``Perception as Bayesian Inference," eds. D. Knill and W. Richards, Cambridge Univ. Press, 1996, pp. 63-92.
Description: This is Chapter 2 from ``Perception as Bayesian Inference." 

Biederman has many papers on this topic, including:

Biederman, I., & Gerhardstein, P. (1993). Recognizing depth-rotated objects: Evidence and conditions for three-dimensional viewpoint invariance. Journal of Experimental Psychology: Human Perception and Performance, 19, 1162-1182.

Biederman, I., & Gerhardstein, P. (1995). Viewpoint-dependent mechanisms in visual object recognition: Reply to Tarr and Bulthoff. Journal of Experimental Psychology: Human Perception and Performance, 21(6), 1506-1514.

D. Jacobs.  ``What Makes Viewpoint Invariant Properties Perceptually Salient,'' Journal of the Optical Society of America A, in press. 

6. 2/14 (Jacobs). Lecture: Linear subspaces: geometry, PCA, LDA.

Duda, Hart and Stork, pp. 114-117.  On reserve in library.

Shimon Ullman and Ronen Basri, Recognition by Linear Combinations of Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(10): 992-1006, 1991. Available at: http://www.wisdom.weizmann.ac.il/~ronen/publications.html

 D. Jacobs "Matching 3-D Models to 2-D Images," the International Journal of Computer Vision, (21)1/2:123--153, January, 1997.  On reserve.

Turk, M. & Pentland, A. (1991). Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3, 71-86

 

7. 2/16 (Jacobs). Lecture: Linear subspaces: photometry.

A. Shashua. On photometric issues in feature-based object recognition. Int. J. Computer Vision, 21:99--122, 1997.

R. Basri and D. Jacobs, ``Lambertian Reflectance and Linear Subspaces,'' IEEE Trans. on Pattern Analysis and Machine Intelligence, 25(2):218-233, 2003. Available at: http://www.wisdom.weizmann.ac.il/~ronen/publications.html

 

8. 2/21 (Jacobs). Presentation.

Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection
PN Belhumeur, JP Hespanha, DJ Kriegman

 

Moghaddam, Jebara and Pentland, “Bayesian Face Recognition.” MERL TR 2000-42.

 

Discriminant analysis of principal components for face recognition
W Zhao, A Krishnaswamy, R Chellappa, D. Swets, J. Weng.

 

 Turk, M. & Pentland, A. (1991). Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3, 71-86.

9. 2/23 (Jacobs). Presentations: Linear representations of classes.

 

A starting point might be these two papers:

T.F. Cootes and C.J. Taylor, "Statistical models of appearance for medical image analysis and computer vision", Proc. SPIE Medical Imaging 2001.  Presentation. (from two years ago)

 “Face Recognition Based on 3D Shape Estimation from Single Images”, by Blanz and Vetter.  Presentation. (from two years ago).

``Statistical models of appearance for computer vision'' by Cootes and  Taylor

Lohmann, G.P. 1983.  Eigenshape analysis of microfossils: a general morphometric procedure for describing changes in shape.  Mathematical Geology 15:659-672.

10. 2/28 (Student group). Presentations: Non-linear subspaces.

 

All students should write a summary of the following paper:

 

Saul and Roweis: Think Globally, Fit Locally, Unsupervised Learning of Nonlinear Manifolds (U. Penn. Tech Report CIS-02-18).

 

Students should also read the following two papers:

Joshua B. Tenenbaum, Vin de Silva, John C. Langford, ``A Global Geometric Framework for Nonlinear Dimensionality Reduction,'' Science.

Sam T. Roweis, Lawrence K. Saul, Nonlinear Dimensionality Reduction by Locally Linear Embedding, Science.

H. Sebastian Seung and Daniel D. Lee, The Manifold Ways of Perception, Science.

H. Murase and S.K. Nayar. Visual Learning and Recognition of 3D Objects from Appearance. International Journal of Computer Vision, vol. 14, no. 1, Jan 1995, pp 5-24.

Ronen Basri, Dan Roth, and David Jacobs, Clustering Appearances of 3D Objects, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Santa Barbara: 414-420, 1998 

Cutzu, F., and S. Edelman, Representation of object similarity in human vision: psychophysics and a computational model, Vision Research 38:2227-2257, 1998

11. 3/2 (Jacobs). Presentation: Prototypes and natural categories.

Posner, M.I., & Keele, S.W. (1968). On the genesis of abstract ideas. Journal of Experimental Psychology, 77, 353-363.   Available at: http://step.psy.cmu.edu/scripts/Memory/PosnerKeele1971.html

E. Rosch, C. Mervis, W. Gray, D. Johnson, and P. Boyes-Braem, ``Basic Objects in Natural Categories'', Cognitive Psychology, 8:382--439.  On reserve.

S. Ullman, High-level Vision, MIT Press, 1996, Chapter 6.

Ronen Basri, Recognition by Prototypes, International Journal of Computer Vision, 19(2): 147-168, 1996.

 

12. 3/7 (Zhu). Presentations: The psychology of similarity and view-based recognition.

H. Bulthoff, S. Edelman, and M. Tarr, ``How Are Three-Dimensional Objects Represented in the Brain?'' MIT AI Memo #1479.

Sinha, P., Balas, B.J., Ostrovsky, Y., & Russell, R. ``Face recognition by humans: twenty results all computer vision researchers should know about.''

Check out Mike Tarr's class page for many relevant references.

Poggio, T. and Edelman, S., A network that learns to recognize three-dimensional objects. Nature, 343:263-266.

Liu Z, Knill D C, and Kersten D. Object classification for human and ideal observers. Vision Research, 35:549--568, 1995

E. Goldmeier, Similarity in Visually Perceived Forms, International Universities Press, Psychological Issues, Volume VIII, Number 1, Monograph 29.  Chapters: 1-5.  On reserve.

 

13. 3/9 (Jacobs). Lecture: Template matching.

Search in pose space (gradient descent, the Hough transform, chamfer matching, ...), search in correspondence space (interpretation trees, alignment), and their relationship.

 T. Cass "Polynomial Time Geometric Matching for Object Recognition," IJCV 1997, on reserve in library.

 

The Hough Transform is described in Forsyth and Ponce 15.1 and 18.3. 

 

Interpretation tree search is described in Trucco and Verri 10.2.
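To make the pose-space idea from this lecture concrete, here is a minimal Hough transform sketch for line detection; the point set and the discretization parameters are invented for illustration:

```python
import numpy as np

# Four points on the line y = 2x + 1, plus one outlier.
pts = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 2.0)]

# Discretized pose space over line parameters (theta, rho), where a line is
# x*cos(theta) + y*sin(theta) = rho.  Each point votes for every line
# through it; collinear points concentrate their votes in one bin.
thetas = np.linspace(0.0, np.pi, 180, endpoint=False)
rho_max, n_rho = 10.0, 100
acc = np.zeros((len(thetas), n_rho), dtype=int)
for x, y in pts:
    for ti, t in enumerate(thetas):
        rho = x * np.cos(t) + y * np.sin(t)
        ri = int(round((rho + rho_max) / (2 * rho_max) * (n_rho - 1)))
        if 0 <= ri < n_rho:
            acc[ti, ri] += 1

# The accumulator peak is the best-supported line hypothesis.
ti, ri = np.unravel_index(np.argmax(acc), acc.shape)
print("peak votes:", acc[ti, ri])
```

Note the design trade-off: finer bins localize the line better but spread the votes of noisy points across neighboring cells.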

14. 3/14 (Jacobs). Presentations, mixed with lecture: Shape and nature. Students must review:

D'Arcy Thompson, On Growth and Form, Dover Books, 1992, Chapters 1 and 17.  On reserve.

 

Morphometric Tools for Landmark Data, by Bookstein.

Principal Warps: Thin-Plate Splines and the Decomposition of Deformations, by F. Bookstein, PAMI 1989, Vol 11, No 6.

D. G. Kendall. A survey of the statistical theory of shape. Statistical Science, 4(2):87-120, 1989.

Shape and Shape Theory, by Kendall, Barden, Carne and Le

Statistical Shape Analysis by I. L. Dryden and Kanti V. Mardia

Geometric Morphometrics: Ten Years of Progress Following the ‘Revolution’, by Dean C. Adams, F. James Rohlf, and Dennis E. Slice.

15. 3/16 (Jacobs). Lecture: Morphometrics.

 

16. 3/28 (Jacobs). Presentations: Morphometrics and recognition.

Serge Belongie, Jitendra Malik, and Jan Puzicha, Shape Matching and Object Recognition Using Shape Contexts, PAMI, 24(4):509-522, April 2002.

A. Rangarajan, H. Chui, and F. Bookstein, "The softassign procrustes matching algorithm," in Proceedings of ICIAP '97, Lecture Notes in Computer Science, vol. 1310, pp. 29--42, Springer-Verlag, 1997.

 

Dynamic programming based matching

 

17. 3/30 (Jacobs). Lecture: Introduction to feature detection, corner detection, and scale selection.

Corner detection is described in computer vision texts.  See, for example, Trucco and Verri Sections 4.3 and 8.4.1

For work on scale space, see: "Detecting salient blob-like image structures and their scales with a scale-space primal sketch: A method for focus-of-attention," by Tony Lindeberg, IJCV 1993.

18. 4/4 (Ani, Mudit). Feature descriptors. Students should review Lowe's paper.

 

C. Schmid & R. Mohr (1997), Local Grayvalue Invariants for Image Retrieval, IEEE Trans. on Pattern Analysis and Machine Intelligence, 19(5), 530-535.

 

David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.

Also see many recent papers. Some pointers can be found in "A Performance Evaluation of Local Descriptors," by Mikolajczyk and Schmid, PAMI, October 2005, pp. 1615-1630.

 

M. Lades, J.C. Vorbruggen, J. Buhmann, J. Lange, C. von der Malsburg, R.P. Wurtz, W. Konen. Distortion Invariant Object Recognition in the Dynamic Link Architecture. IEEE Transactions on Computers 1992, 42(3):300-311.

 

19. 4/6 (Yi, Mei). Neuroscience and features. Please review the paper by Tanaka, and read the others.

These, and many other interesting papers, are available at a UCSD class web page.

 

K. Tanaka, “Inferotemporal Cortex and Object Vision,”

 

M. Riesenhuber and T. Poggio, “Hierarchical Models of Object Recognition in Cortex”

 

Ullman "Visual Features of Intermediate Complexity and their use in classification."

 

J. Johnson and B. Olshausen, “Timecourse of neural signatures of object recognition”.

 

20. 4/11 (Ryan, Kenny). Skeletons.

K. Siddiqi, A. Shokoufandeh, S. J. Dickinson & S. W. Zucker, Shock Graphs and Shape Matching. International Journal of Computer Vision, 35(1), 13-32, 1999.

 

21. 4/13 (Abhinav, Carlos). Constellation methods and statistical part models.

Papers by Perona, Zisserman, and Forsyth.

 

Pedro F. Felzenszwalb and Daniel P. Huttenlocher.
Pictorial Structures for Object Recognition.  

 

22. 4/18 (Jacobs). Lecture: Hidden Markov models.

Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition."

R. Dugad and U. Desai, Technical Report: SPANN-96.1, IIT Bombay.  A Tutorial on Hidden Markov Models.  Available at: http://www.ling.gu.se/~leifg/stat02/doc/newhmmtut.pdf
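As a preview of the lecture material, here is a minimal sketch of the forward algorithm for computing the probability of an observation sequence under an HMM; the two-state model and all of its numbers are invented for illustration:

```python
import numpy as np

# A tiny invented HMM: two hidden states, two observation symbols.
pi = np.array([0.6, 0.4])              # initial state probabilities
A = np.array([[0.7, 0.3],              # A[i, j] = P(next state j | state i)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],              # B[i, k] = P(observe k | state i)
              [0.2, 0.8]])

def forward(obs):
    """P(obs | model), summing over all hidden state paths."""
    alpha = pi * B[:, obs[0]]          # alpha_1(i) = pi_i * b_i(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate, then weight by emission
    return float(alpha.sum())

p = forward([0, 1, 0])
print("P(sequence) =", p)
```

The recursion turns an exponential sum over state paths into time linear in the sequence length, which is what makes HMM-based classification practical.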

23. 4/20 (Narayanan). Presentations: HMMs for classification.

 

J. Yamato, J. Ohya, and K. Ishii, “Recognizing Human Action in Time-Sequential Images Using Hidden Markov Model,” CVPR ’92, pages 379-385.

 

Amit Kale, Aravind Sundaresan, A. N. Rajagopalan, Naresh P. Cuntoor, Amit K. Roy-Chowdhury, Volker Krüger and Rama Chellappa, "Identification of humans using gait", IEEE Transactions on Image Processing, vol. 13, no. 9, September 2004, 1163-1173.

 

24 & 25. 4/25 & 4/27 (Jacobs). Lecture: Linear separators, naive Bayes, perceptrons, SVMs, boosting, Winnow.

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition

``Learning Quickly When Irrelevant Attributes Abound: A New
Linear-Threshold Algorithm,'' Machine Learning 2: 285--318, 1988,
N. Littlestone.

``Support-Vector Networks,'' Machine Learning 20, 273--297, 1995, Cortes and Vapnik.

Pattern Classification, Duda, Hart and Stork.
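For a small taste of these methods, here is a sketch of the simplest linear-separator learner, the perceptron, on an invented, linearly separable toy set:

```python
import numpy as np

# Invented 2-D toy data, separable by the line x1 + x2 = 0.
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

# Perceptron rule: whenever an example is misclassified, move the weight
# vector toward it.  Converges if the data are linearly separable.
w, b = np.zeros(2), 0.0
for epoch in range(100):
    mistakes = 0
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:   # wrong side of (or on) the boundary
            w += yi * xi
            b += yi
            mistakes += 1
    if mistakes == 0:                # a full clean pass: done
        break

print(w, b, np.sign(X @ w + b))
```

SVMs refine this picture by choosing, among all separating hyperplanes, the one with maximum margin; Winnow replaces the additive updates with multiplicative ones.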

26. 5/2 (Xu, Arun). Presentations: AdaBoost-based detectors. Students should review:

P. Viola and M. Jones. Robust real-time object detection. Technical Report 2001/01, Compaq CRL, February 2001

 

H. Schneiderman and T. Kanade "Object Detection Using the Statistics of Parts"  International Journal of Computer Vision. 

M-H. Yang, D. Roth and N. Ahuja A SNoW based Face Detector NIPS-12, Dec, 1999

AdaBoost tutorial by Freund & Schapire.

Additive Logistic Regression: A Statistical View of Boosting.
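To accompany these readings, here is a bare-bones sketch of the AdaBoost loop with decision stumps on invented 1-D data; it illustrates only the boosting idea, not the detectors in the papers above:

```python
import numpy as np

# Invented 1-D data: no single threshold separates + + - - + +,
# but a weighted vote of a few stumps can.
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1, 1, -1, -1, 1, 1])

def stump(theta, s):
    """Weak learner: predict s for x > theta, -s otherwise."""
    return lambda x: np.where(x > theta, s, -s)

cands = [stump(t, s) for t in np.arange(-0.5, 6.0) for s in (1, -1)]

w = np.ones(len(X)) / len(X)           # example weights
alphas, hyps = [], []
for _ in range(5):                     # boosting rounds
    errs = [w[h(X) != y].sum() for h in cands]
    k = int(np.argmin(errs))           # stump with lowest weighted error
    err = max(errs[k], 1e-12)
    a = 0.5 * np.log((1 - err) / err)
    alphas.append(a)
    hyps.append(cands[k])
    w *= np.exp(-a * y * cands[k](X))  # upweight the examples it got wrong
    w /= w.sum()

def F(x):                              # final weighted-majority classifier
    return np.sign(sum(a * h(x) for a, h in zip(alphas, hyps)))

print(F(X))
```

The detectors above pair a loop like this with image-specific weak learners, such as thresholded Haar-like filter responses.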

27. 5/4 (Lin, Phil). Presentations: Context. Students should review:

Antonio Torralba et al., "Context-based vision system for place and object recognition," ICCV 2003.

Students should also read:

Kobus Barnard, Pinar Duygulu, Nando de Freitas, David Forsyth, David Blei, and Michael I. Jordan, "Matching Words and Pictures", Journal of Machine Learning Research, Vol 3, pp 1107-1135. 2003

 Antonio Torralba and Pawan Sinha, "Statistical context priming for object detection," ICCV 2001.

http://web.mit.edu/torralba/www/iccv2001.pdf

Antonio Torralba, "Contextual priming for object detection," IJCV 2003.
http://www.ai.mit.edu/people/torralba/IJCVobj.pdf

Kevin Murphy et al., "Using the forest to see the trees: a graphical model relating features, objects, and scenes," NIPS 2003.
http://web.mit.edu/torralba/www/nips2003.pdf

Henderson, J.M. and Hollingworth, A. (1999) High-level scene perception. Annu. Rev. Psychol. 50, 243–271
http://www.psychology.uiowa.edu/faculty/hollingworth/documents/AnnualReview.pdf
Chun, M. M. (2000). Contextual cuing of visual attention. Trends in Cognitive Science, 4(5), 170–178.
http://cvcl.mit.edu/IAP05/Chun_2004.pdf

28. 5/9 (Haibin Ling). Guest lecture on deformation-insensitive image matching.

29. 5/11 (Jacobs). Final class: sum up everything and discuss the final.

Final Exam: Saturday, May 13, 8:00, in the normal classroom.