I'd like to make a few comments on problem 1 in the advanced part of problem set 3, the implementation of E-M.

1) Least Squares vs. Total Least Squares.

There are two ways to fit a line to points, called Least Squares and Total Least Squares. Total Least Squares is generally much better, because it minimizes the perpendicular distance from the points to the line. Least Squares is a little simpler, but it only minimizes the distance in the y direction between the points and the line. It works well for horizontal lines and increasingly poorly as lines become more vertical. The problem set doesn't specify which method to use, and it seems that with this data either method will work. Therefore, we will accept solutions using least squares, but give extra credit to solutions using total least squares. (If you implement least squares and think it's as good as total least squares, try doing the problem after swapping the x and y coordinates of all the points in the test set.)

Weiss' tutorial gives a correct set of equations for least squares. The book gives an INCORRECT equation for least squares (but a correct equation for total least squares). So if you use least squares, look at Weiss; if you use total least squares, look at the book. Since the book is a bit sketchy, I'm including more complete notes below on how total least squares works.

2) Sigma.

The problem set implies that you should fix sigma^2 at .1, i.e., fix sigma at about .3. If you do this, things should work. However, it is preferable to update sigma at each iteration by estimating it from the data, as was done in the tutorial code. Either approach is acceptable as long as you can get it to work.

3) Initializing your lines.

Of course, you should not cheat by initializing lines using prior knowledge about the right answer. But it is fine to initialize lines that are at least near all the points, since you do know that much. So, for example, it makes sense to initialize lines with an arbitrary, random slope, but with a random y-intercept that places each line somewhere near the complete set of points.

Notes on Total Least Squares

I will use matlab notation, since it's easier to type. I am just slightly expanding on what the book says in the top half of page 335.

You want to find a line, described by (a,b,c), that minimizes the sum of squared distances to all the points. If your line is described as ax + by + c = 0, with a^2 + b^2 = 1, then the distance from (x,y) to the line is abs(ax + by + c). So you want to find (a,b,c) to minimize

  sum( (ax + by + c).^2 )

subject to the constraint a^2 + b^2 = 1. Constrained minimization problems like this can be solved with Lagrange multipliers. That is, we introduce a dummy variable L and minimize

  sum( (ax + by + c).^2 ) + L*(a^2 + b^2 - 1).

At the constrained minimum, all partial derivatives of this expression vanish. Taking partial derivatives with respect to a, b, c, and L, we find that the following four equations must be satisfied:

(1)  a^2 + b^2 - 1 = 0                         (partial with respect to L)
(2)  sum(2x.*(ax + by + c)) + 2aL = 0          (partial with respect to a)
(3)  sum(2y.*(ax + by + c)) + 2bL = 0          (partial with respect to b)
(4)  sum(2*(ax + by + c)) = 0                  (partial with respect to c)

The last three equations correspond to the matrix equation given in the book, top of page 335. From (4) we get c = -sum(ax + by)/n (where there are n points), as the book states. Then, substituting this expression for c into equations (2) and (3), we get the second matrix equation on page 335.
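To make this concrete, here is a minimal matlab sketch of the unweighted fit (the function and variable names are my own, not anything from the problem set). It centers the points, builds the 2x2 scatter matrix that comes from equations (2) and (3) once c has been substituted in, and takes the eigenvector with the smallest eigenvalue as (a,b):

  % Total least squares line fit (a sketch, not the required solution).
  % x and y are column vectors of point coordinates; returns (a,b,c)
  % with a^2 + b^2 = 1 describing the line ax + by + c = 0.
  function [a, b, c] = fit_tls(x, y)
    mx = mean(x);  my = mean(y);          % centroid of the points
    dx = x - mx;   dy = y - my;           % centered coordinates
    S  = [sum(dx.^2)   sum(dx.*dy); ...   % 2x2 scatter matrix from
          sum(dx.*dy)  sum(dy.^2)];       % equations (2) and (3)
    [V, D] = eig(S);                      % eigenvectors/eigenvalues of S
    [junk, i] = min(diag(D));             % smallest eigenvalue minimizes
    a = V(1, i);  b = V(2, i);            % the perpendicular error
    c = -(a*mx + b*my);                   % from (4): c = -sum(ax + by)/n
  end

Since eig returns unit-length eigenvectors for a symmetric matrix, the constraint a^2 + b^2 = 1 is satisfied automatically.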
To use this approach to find a weighted total least squares line, we start with the minimization of

  sum( w.*(ax + by + c).^2 )

subject to the constraint a^2 + b^2 = 1, and then follow the same steps as above. For example, equation (4) becomes:

(4w)  sum(2*w.*(ax + by + c)) = 0              (partial with respect to c)

Note, for those of you familiar with SVD or PCA, the derivation above leads to PCA in 2D. That is, another way to look at this is to compute the mean x and y values and subtract them from each point. Then run SVD on the resulting 2xn matrix (call it P), which is equivalent to finding the eigenvectors of PP^T. This is the eigenvector equation in the book.
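As a rough sketch of the weighted version (again with names of my own choosing), the only changes are that the centroid and the scatter matrix pick up the weights. In your E-M code, the weights would typically be the posterior probabilities computed in the E step for the line being refit:

  % Weighted total least squares line fit (a sketch under the same
  % assumptions as above).  w is a column vector of nonnegative weights,
  % one per point.
  function [a, b, c] = fit_wtls(x, y, w)
    sw = sum(w);
    mx = sum(w.*x)/sw;  my = sum(w.*y)/sw;    % weighted centroid
    dx = x - mx;        dy = y - my;          % centered coordinates
    S  = [sum(w.*dx.^2)   sum(w.*dx.*dy); ... % weighted 2x2 scatter matrix
          sum(w.*dx.*dy)  sum(w.*dy.^2)];
    [V, D] = eig(S);
    [junk, i] = min(diag(D));                 % smallest eigenvalue again
    a = V(1, i);  b = V(2, i);
    c = -(a*mx + b*my);                       % from equation (4w)
  end

With all weights equal to 1, this reduces to the unweighted fit above.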