Consider a simple example:
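We have two cameras: a left camera with focal point at (-10,0,0) and a right camera with focal point at (0,0,0). Both have focal length 1 and image planes in the z=1 plane. Image coordinates are measured relative to each camera’s own focal point.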
The world contains a 40x40 square in the z=100 plane, with its lower left corner at (0,0,100).
The background is in the z=200 plane, with vertical stripes. For example, one stripe has sides x=-5, x=5, with z=200.
In the left image the square has corners at (.1,0), (.5,0), (.1,.4), (.5,.4). In the right image, it’s at (0,0), (.4,0), (0,.4), (.4,.4). The baseline is 10 and the disparity is .1, so with focal length 1 the distance is 10/.1 = 100.
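The numbers above can be reproduced with a few lines of Python. This is only a sketch under the assumptions of the example (pinhole cameras looking along +z, focal length 1); the function name project and the variable names are made up for illustration.

```python
def project(point, camera_x, focal_length=1.0):
    """Pinhole projection of a world point for a camera whose focal point
    is at (camera_x, 0, 0) and which looks along +z."""
    X, Y, Z = point
    return (focal_length * (X - camera_x) / Z, focal_length * Y / Z)

LEFT_X, RIGHT_X = -10.0, 0.0        # focal points from the example
BASELINE = RIGHT_X - LEFT_X         # 10

# Corners of the 40x40 square in the z=100 plane.
square = [(0, 0, 100), (40, 0, 100), (0, 40, 100), (40, 40, 100)]
left = [project(p, LEFT_X) for p in square]
right = [project(p, RIGHT_X) for p in square]

# Disparity: horizontal shift of the same corner between the two images.
disparity = left[0][0] - right[0][0]
# Depth from disparity, with focal length 1: depth = baseline / disparity.
depth = BASELINE / disparity

print("left corners: ", left)
print("right corners:", right)
print("disparity:", disparity, "depth:", depth)
```

Any of the four corners could be used for the disparity, since they all lie at the same depth.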
In the left image, the stripe is bounded by the lines x = .025 and x = .075. In the right image, it’s bounded by x = -.025 and x = .025. So in the right image, the stripe is partly blocked by the square (whose left edge is at x = 0), while in the left image it’s fully to the left of the square (whose left edge is at x = .1). For the stripe, the disparity is .05, so the distance is 10/.05 = 200.
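The stripe's extent in each image, and whether it overlaps the square, can be checked the same way. The sketch below compares only x-intervals, so an overlap means the stripe is hidden in the rows the square covers (y between 0 and .4). The helper names project_x and overlap are illustrative assumptions.

```python
def project_x(X, Z, camera_x, focal_length=1.0):
    """Image x-coordinate of a world point for a camera at (camera_x, 0, 0)."""
    return focal_length * (X - camera_x) / Z

def overlap(a, b):
    """Length of the overlap between two intervals (0 if they are disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

cameras = {"left": -10.0, "right": 0.0}
results = {}
for name, cx in cameras.items():
    stripe = (project_x(-5, 200, cx), project_x(5, 200, cx))   # stripe sides
    square = (project_x(0, 100, cx), project_x(40, 100, cx))   # square sides
    results[name] = (stripe, square, overlap(stripe, square))
    print(name, "stripe", stripe, "square", square, "overlap", results[name][2])

# Disparity of the stripe's left side, and the depth it implies (baseline 10).
stripe_disparity = results["left"][0][0] - results["right"][0][0]
print("stripe depth:", 10.0 / stripe_disparity)
```

A nonzero overlap for one camera and zero for the other is the occlusion effect: part of the background is visible in only one of the two images.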
Notice that a line segment with ends at (-10,0,200) and (0,0,100) projects in the left image to (0,0), (.1,0) and in the right image to (-.05,0), (0,0). The segment is shorter in the right image due to foreshortening.
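As a quick check of the foreshortening, the projected length of the segment can be computed for each camera. Again this is a minimal sketch with an illustrative helper name (project_x), assuming focal length 1.

```python
def project_x(X, Z, camera_x, focal_length=1.0):
    """Image x-coordinate of a world point for a camera at (camera_x, 0, 0)."""
    return focal_length * (X - camera_x) / Z

# (X, Z) of the segment's endpoints; Y = 0 for both, so only x matters.
segment = [(-10, 200), (0, 100)]

lengths = {}
for name, cx in {"left": -10.0, "right": 0.0}.items():
    x0, x1 = (project_x(X, Z, cx) for X, Z in segment)
    lengths[name] = abs(x1 - x0)
    print(name, "endpoints", (x0, x1), "length", lengths[name])
```

Because the two endpoints lie at different depths, they have different disparities, which is why the projected length changes between the images.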