Review for Final
CMSC 426 – Spring 2004
Perspective: The key to perspective projection is that the scene point, the image point, and the focal point are all collinear, while at the same time the image point is always in the image plane. Using these facts, you can find the image point given all other information (projection). Or you can find the line where the scene point must lie, given an image point and camera information. Or you can project this line into another camera to get the epipolar line. Or you can intersect two lines, produced by seeing a point in two cameras, to perform stereo reconstruction. Or you can find the line a scene point lies on, and intersect this line with a new image plane, to do rectification.
Orthographic: On the one hand, this is like perspective, with the depth of the points approximated to be all the same. However, this leads us to represent projection with matrix operations. The key issues here are understanding how to represent a point as a vector, or a set of points as a matrix, how to represent the motion and scaled orthographic projection of points as another matrix, and how to produce an image by multiplying these matrices. This gives the image formation equation I = SP. Then we can solve for I (projection), for S (pose determination), for P (structure from known motion), or for S and P (structure from motion) up to an affine ambiguity.
1) If using perspective projection, how can you tell whether a line in the world will project to a single point in the image?
2) Consider a camera in the standard position for perspective, with the image plane the z=1 plane, and the focal point at the origin. Consider a square with corners at: (0,0,2), (1,0,2), (0,0,3), (1,0,3). Give its image with perspective projection. What about with scaled orthographic projection? Prove that parallel lines in the world don’t necessarily project to parallel image lines. Suppose we extend the two parallel lines going through (0,0,2) to (0,0,3) and through (1,0,2) to (1,0,3) so that they are infinite. What will be their vanishing point?
3) Suppose there is a 2D rotation, translation, and scaling that moves the point (1,0) to (3,3), and that also moves the point (2,0) to (4,4). What is the matrix that encodes this transformation?
4) Suppose we have a camera with a focal point at (7,2,5) and an image plane of z = 10. We see a point at (12,9,10). What is the equation for a line in the world that contains this point? Suppose we have another camera with a focal point of (17, 3, 2), and an image plane of z = 6. What is the epipolar line in this image plane where the image of the point must appear? Pick a point that is in this line. If we see the point appear there, what is its 3D position?
5) Start with points, P, that are the corners of a cube. Construct a motion matrix,S, that will rotate these points by pi/8 about the x axis and then pi/4 about the y axis, translate the points by (1, -1), and project them into the image, scaling by 1/2 (use matlab to help if you want). Compute I = S*P.
3 1 1 1 3
1 3 0 2 2
4 4 1 3 5
2 2 2 2 1
3 1 1 1 3
1 3 0 2 2
2 2 2 2 1
A second theme of the class has been segmenting images into meaningful pieces. For example, we worried about finding the boundaries between two parts of the image using edge detection. We used SNAKEs (intelligent scissors) to link pixels with strong gradients into contours. And we used K-means, Hough, and RANSAC to find meaningful groups of features or pixels in the image.
Edge detection involves smoothing to get rid of noise, taking derivatives to measure change, and finding local maxima. To understand edge detection you must understand how to use correlation to smooth an image, you must understand gradients, and you must understand that an edge is a local maxima in the magnitude of a gradient in the direction perpendicular to the gradient. To understand SNAKEs, you must also understand how gradient magnitude measures the strength of a possible edge.
1) Compute the convolution of the 1D image: (0,0,1,1,2,2,3,3,4,4,4,4) with the convolution kernel: (1/4,1/2,1/4). Treat the boundary in any reasonable way.
2) Consider the following 1D image:
(0 0 …. 0 3 3 3 3 3 3 3 3 3 3 3 3 3 3 0 0 0 10 10 10 0 0 0 0 …)
Where would you expect a 1D edge detector to find edges?
The 1D edge detector has a parameter that allows it to ignore weak edges. As we increase this parameter, which edges disappear first?
If we smooth the image before detecting edges, how would you expect this to affect the position and number of the edges?
3) Suppose we have points: (19,3) (13,6), (18,11) (5,3), (5,9) (2, 1). Simulate the steps of k-means clustering if we start with centers at (9,9) and (11,11).
4) Here’s a RANSAC problem for recognition. You could make up a similar problem to apply RANSAC to line finding. Consider a model consisting of three points, with x,y coordinates given by the columns of:
0 20 10
0 0 10
and an image consisting of the points:
4 22 12 16 10 3
0 1 12 19 18 20
Simulate one step of RANSAC with bounded error of 3 pixels to recognize the model, allowing for 2D translation, rotation, and scaling. That is, match enough features of the model and image to determine 2D rotation, translation, and scaling. Project the remaining model features into the image using this transformation. Where do they appear? Look for matching image features within a distance of 3 pixels. Do you find them? For this example, how many different poses can be generated by matching model features to image features? How many involve correct matches between model points and their corresponding image points, assuming that each model point appears in the image.
5) Explain how to use the Hough transform to detect circles of radius 20 pixels in an image.
Finally, we talked about matching, in many different guises. We used matching to do texture synthesis, to find correspondences for stereo, to find correspondences for structure-from-motion, and to recognize objects. Mostly we talked about matching one portion of an image to another using the sum of squared differences. You should certainly understand this means of comparing two image regions. We also talked about the Chamfer distance as a way of comparing two sets of edges. Then we talked about many ways of finding matches efficiently. In texture synthesis and stereo we used brute force matching (that is, we tried all possibilities and picked the best one). In stereo, where we had the epipolar constraint, we could use a shortest path algorithm to do matching. For motion, we talked about multi-scale matching to speed things up, and about using image gradients to compute linear equations that described possible matches (optical flow). In recognition, we used the distance transform and RANSAC to speed up matching.
1) I is an image described by the equation I(x,y) = x*x + y*y. J is the same image translated 1 pixel in the positive x direction. Suppose I and J are consecutive images in a motion sequence. Write the optical flow equation in its general form. Using this equation, give the equation for a line in the second image on which we expect to find the point that was at (5,5) in the first image. Perform the same computation for a point at (5,0). Combine these two equations, and show that this yields the translation of the image.
2) Compute the distance transform of the following image (1 means an edge is present):
0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 1 0 0 0 0
0 0 0 1 0 0 0 0
0 0 0 1 0 1 0 0
0 0 1 0 0 1 0 0
Use this to find the best match to the model:
1 1 0 0
0 1 0 0
0 1 0 0
0 1 0 0
How many operations are needed to do this comparison, not counting the operations needed to build the distance transform? Write Matlab code to do this comparison using imfilter.
3) Solve the matching problem in (2) using a multiscale approach and SSD instead of the distance transform. Reduce each image in half by smoothing and subsampling (smooth with an averaging filter, to make things simpler). Find the best match by trying all possibilities. Then go back to the original image and try possibilities near that.