Least errors
September 2018
Many optimisation problems end up becoming problems of minimising distances as measured with respect to some chosen target. An elementary example suffices to capture the heart of the concept. Say we have a set of five numbers: {1, 5, 40, 60, 100}. If we measure the distance from a to b as |b - a|, then for what number b can we achieve the least total distance to all five numbers?

With distances defined that way, the minimiser is actually the median, 40. But if we instead measure distance as the squared difference (b - a)^2, which is the measure this piece works toward below, the answer is the arithmetic mean: sum up all of the values and divide by their count.

(1 + 5 + 40 + 60 + 100) / 5 = 206 / 5 = 41.2
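Both claims are easy to check numerically by brute force (a sketch; the candidate grid and the step size of 0.1 are arbitrary choices):

```python
values = [1, 5, 40, 60, 100]

# Candidate positions for b: 0.0, 0.1, ..., 100.0.
candidates = [i / 10 for i in range(1001)]

# Minimising the summed absolute distances picks the median ...
best_abs = min(candidates, key=lambda b: sum(abs(b - a) for a in values))

# ... while minimising the summed squared distances picks the mean.
best_sq = min(candidates, key=lambda b: sum((b - a) ** 2 for a in values))

print(best_abs)  # 40.0, the median
print(best_sq)   # 41.2, the arithmetic mean
```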

This is of course the same concept as the center of mass in physics and the expected value in statistics. To see this:

(1 + 5 + 40 + 60 + 100) / 5
= (1 + 5 + 40 + 60 + 100) * (1/5)
= 1 * (1/5) + 5 * (1/5) + 40 * (1/5) + 60 * (1/5) + 100 * (1/5)

Here the weighting is uniform exactly because we're interested in the arithmetic mean of a set of real-valued numbers. In statistics and probability, the numbers are usually the realisations of a random variable, and the weights are the probabilities of each particular value being realised.

If we use the above set as the sample space of a discrete random variable X and denote a particular realisation as x, we can denote its expectation value as

E[X] = Σx * p(x)

where p(x) is the probability that a particular value x is realised. This is of course the same thing as the arithmetic mean above, but represented more abstractly. If the random variable were instead continuous, the sum would become an integral from minus infinity to infinity, and p(x) a probability density.
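The weighted-sum form translates directly into code. A sketch, reusing the set above with uniform probabilities; p could equally be any other probability mass function over the sample space:

```python
values = [1, 5, 40, 60, 100]

# Uniform probability mass function: each value is realised with p = 1/5.
p = {x: 1 / len(values) for x in values}

# E[X] = sum of x * p(x) over the sample space.
expectation = sum(x * p[x] for x in values)

# approximately 41.2, the arithmetic mean again (up to float rounding)
```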

There is a certain inherent difficulty in summed distances that requires a workaround. Notice that earlier we defined distance as the absolute value of the difference between b and a. What if we had discarded the absolute value sign? Let's use a simpler set to see this more clearly, say {1, 3}, and measure from b = 2. If we sum the signed distances of 1 and 3 from 2, what do we get?

Σ (2 - a) = (2 - 3) + (2 - 1) = (-1) + (1) = 0

We get zero. This is in fact a central property of the arithmetic mean: it is, by definition, the point at which the signed distances in the two directions exactly cancel out. It's more helpful to think about it in terms of the center of mass. What is the center of mass? It is the point in space where the mass in every possible direction is perfectly balanced, so the contributions cancel each other out. The same is true of the arithmetic mean.
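The cancellation is easy to verify for the original five-number set (a sketch; any tiny residue is just floating-point rounding):

```python
values = [1, 5, 40, 60, 100]
mean = sum(values) / len(values)  # 41.2

# Signed distances from the mean: the positive and negative sides balance.
total = sum(mean - a for a in values)

print(abs(total) < 1e-9)  # True: the signed distances cancel out
```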

So one workaround for this is the absolute value sign, but that is often cumbersome from a mathematical point of view, not least because it is not differentiable at zero. A more common workaround is to square each distance: squares are always non-negative, so we always obtain a non-negative measure of total distance, and it equals zero only when every individual distance is zero.
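With squared distances the cancellation problem disappears, and the arithmetic mean is the minimiser. A quick check (a sketch; the 0.5 offsets are arbitrary probe points on either side):

```python
values = [1, 5, 40, 60, 100]
mean = sum(values) / len(values)

def sse(b):
    """Sum of squared distances from b to every value in the set."""
    return sum((b - a) ** 2 for a in values)

# The sum of squared errors is strictly smaller at the mean
# than at nearby candidates on either side.
print(sse(mean) < sse(mean - 0.5))  # True
print(sse(mean) < sse(mean + 0.5))  # True
```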