I'd like to make a few comments on problem 1 in the advanced part of problem set 3, the implementation of E-M.

1) Least Squares vs. Total Least Squares.

There are two ways to fit a line to points, called Least Squares and Total Least Squares. Total Least Squares is generally much better, because it minimizes the perpendicular distance from the points to the line. Least Squares is a little simpler, but it only minimizes the distance in the y direction between the points and the line. It works well for horizontal lines and increasingly poorly as lines become more vertical. The problem set doesn't specify which method to use, and it seems that with this data either method will work. Therefore, we will accept solutions using least squares, but give extra credit to solutions using total least squares. (If you implement least squares and think it's as good as total least squares, try doing the problem after swapping the x and y coordinates of all the points in the test set.)

Weiss' tutorial gives a correct set of equations for least squares. The book gives an INCORRECT equation for least squares (but a correct equation for total least squares). So if you use least squares, look at Weiss; if you use total least squares, look at the book. Since the book is a bit sketchy, I'm including more complete notes below on how total least squares works.

2) Sigma.

The problem set implies that you should fix sigma^2 at .1, i.e., fix sigma at about .3. If you do this, things should work. However, it is preferable to update sigma at each iteration by estimating it from the data, as was done in the tutorial code. Either approach is acceptable as long as you can get it to work.

3) Initializing your lines.

Of course, you should not cheat by initializing lines using prior knowledge about the right answer. But it is fine to initialize lines that are at least near all the points, since you do know that much. So, for example, it makes sense to initialize lines with an arbitrary, random slope, but with a random y-intercept that places each line somewhere near the complete set of points.

Notes on Total Least Squares

I will use matlab notation, since it's easier to type. I am just slightly expanding on what the book says in the top half of page 335.

You want to find a line, described by (a,b,c), that minimizes the sum of squared distances to all the points. If your line is described as ax + by + c = 0, with a^2 + b^2 = 1, then the distance from (x,y) to the line is abs(ax + by + c). So you want to find (a,b,c) to minimize

  sum( (ax + by + c).^2 )

subject to the constraint a^2 + b^2 = 1. Constrained minimization problems like this can be solved with Lagrange multipliers. That is, we introduce a dummy variable L and minimize

  sum( (ax + by + c).^2 ) + L*(a^2 + b^2 - 1).

At the constrained minimum, all partial derivatives of this expression vanish. Taking partial derivatives with respect to a, b, c, and L, we find that the following four equations must be satisfied:

(1)  a^2 + b^2 - 1 = 0                         (partial with respect to L)
(2)  sum(2x.*(ax + by + c)) + 2aL = 0          (partial with respect to a)
(3)  sum(2y.*(ax + by + c)) + 2bL = 0          (partial with respect to b)
(4)  sum(2*(ax + by + c)) = 0                  (partial with respect to c)

The last three equations correspond to the matrix equation given in the book, top of page 335. From (4) we get c = -sum(ax + by)/n (where there are n points), as the book states. Then, substituting this expression for c into equations (2) and (3), we get the second matrix equation on page 335.
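To make this concrete, here is a minimal matlab sketch of the unweighted fit (the function and variable names are my own, not anything from the problem set). It centers the points, builds the 2x2 scatter matrix that comes from equations (2) and (3) once c has been substituted in, and takes the eigenvector with the smallest eigenvalue as (a,b):

  % Total least squares line fit (a sketch, not the required solution).
  % x and y are column vectors of point coordinates; returns (a,b,c)
  % with a^2 + b^2 = 1 describing the line ax + by + c = 0.
  function [a, b, c] = fit_tls(x, y)
    mx = mean(x);  my = mean(y);          % centroid of the points
    dx = x - mx;   dy = y - my;           % centered coordinates
    S  = [sum(dx.^2)   sum(dx.*dy); ...   % 2x2 scatter matrix from
          sum(dx.*dy)  sum(dy.^2)];       % equations (2) and (3)
    [V, D] = eig(S);                      % eigenvectors/eigenvalues of S
    [junk, i] = min(diag(D));             % smallest eigenvalue minimizes
    a = V(1, i);  b = V(2, i);            % the perpendicular error
    c = -(a*mx + b*my);                   % from (4): c = -sum(ax + by)/n
  end

Since eig returns unit-length eigenvectors for a symmetric matrix, the constraint a^2 + b^2 = 1 is satisfied automatically.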
To use this approach to find a weighted total least squares line, we start with the minimization of

  sum( w.*(ax + by + c).^2 )

subject to the constraint a^2 + b^2 = 1, and then follow the same steps as above. For example, equation (4) becomes:

(4w)  sum(2*w.*(ax + by + c)) = 0              (partial with respect to c)

Note, for those of you familiar with SVD or PCA, the derivation above leads to PCA in 2D. That is, another way to look at this is to compute the mean x and y values and subtract them from each point. Then run SVD on the resulting 2xn matrix (call it P), which is equivalent to finding the eigenvectors of PP^T. This is the eigenvector equation in the book.
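As a rough sketch of the weighted version (again with names of my own choosing), the only changes are that the centroid and the scatter matrix pick up the weights. In your E-M code, the weights would typically be the posterior probabilities computed in the E step for the line being refit:

  % Weighted total least squares line fit (a sketch under the same
  % assumptions as above).  w is a column vector of nonnegative weights,
  % one per point.
  function [a, b, c] = fit_wtls(x, y, w)
    sw = sum(w);
    mx = sum(w.*x)/sw;  my = sum(w.*y)/sw;    % weighted centroid
    dx = x - mx;        dy = y - my;          % centered coordinates
    S  = [sum(w.*dx.^2)   sum(w.*dx.*dy); ... % weighted 2x2 scatter matrix
          sum(w.*dx.*dy)  sum(w.*dy.^2)];
    [V, D] = eig(S);
    [junk, i] = min(diag(D));                 % smallest eigenvalue again
    a = V(1, i);  b = V(2, i);
    c = -(a*mx + b*my);                       % from equation (4w)
  end

With all weights equal to 1, this reduces to the unweighted fit above.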