Proof that the conceptual and computational formulas for the Sum of Squared Deviates are equivalent?
The Sum of Squared Deviates has a conceptual formula of: $SS = \sum (X_i - M_x)^2$ (where $M_x$ is the sample mean)
The corresponding computational formula is: $$SS = \sum X_i^2 - \frac{\left(\sum X_i\right)^2}{N}$$
How are they algebraically equivalent? I can't figure out how to prove that one converts into the other.
Also, what is the point of having a "computational formula" if the mathematical result is the same?
Expand the square and collect terms in the 'conceptual' formula to get the 'computational' one. All sums are taken over $i = 1$ to $n$.
$$\begin{aligned} \sum (X_i - \bar X)^2 &= \sum X_i^2 - \sum 2\bar X X_i + \sum \bar X^2 \\ &= \sum X_i^2 - 2\bar X \sum X_i + n\bar X^2 \\ &= \sum X_i^2 - 2\left( \sum X_i \right)^2/n + \left( \sum X_i \right)^2/n \\ &= \sum X_i^2 - \left( \sum X_i \right)^2/n, \end{aligned}$$where we have used $\bar X = \left(\sum X_i \right)/n$ in a few places.
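As a quick numerical sanity check of the algebra (not a substitute for the proof), here is a minimal Python sketch. The function names `ss_conceptual` and `ss_computational` and the data values are illustrative choices, not part of the question. It uses `fractions.Fraction` so that both formulas are evaluated exactly, with no floating-point rounding.

```python
from fractions import Fraction


def ss_conceptual(xs):
    """Sum of squared deviations from the mean: sum((x - mean)^2)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def ss_computational(xs):
    """Computational form: sum(x^2) - (sum x)^2 / n."""
    n = len(xs)
    return sum(x * x for x in xs) - sum(xs) ** 2 / n


# Illustrative data, stored as exact rationals.
data = [Fraction(v) for v in (3, 7, 8, 12, 15)]
print(ss_conceptual(data), ss_computational(data))  # 86 86
```

Here the mean is 9, the deviations are $-6, -2, -1, 3, 6$, and both routes give $SS = 491 - 45^2/5 = 86$.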
There are several advantages to the computational formula:
(a) It requires fewer operations. In particular, it needs only one subtraction, compared with $n$ subtractions in the conceptual formula.
(b) The computational formula requires less memory. Suppose we are reading in a large number $n$ of observations. The conceptual formula requires a first 'pass' to find $\bar X$ and to store the $n$ data values $X_i$; then a second pass is required to find the deviations $(X_i - \bar X)$, square them, and sum them. By contrast, the computational formula needs only one pass and three memories: one to count to $n$, a second to accumulate $\sum X_i$, and a third to accumulate $\sum X_i^2$. The values in the three memories are then enough to evaluate the computational formula.
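The one-pass scheme in (b) might be sketched in Python as follows, assuming the data arrive as an iterable that can be read only once (a generator stands in for that here). The names `ss_one_pass` and `ss_two_pass` are hypothetical; the three local variables in `ss_one_pass` play the role of the three memories.

```python
def ss_one_pass(stream):
    """Computational formula: one pass, three accumulators, no stored data.

    Assumes the stream yields at least one value.
    """
    n = 0            # memory 1: count of observations
    total = 0.0      # memory 2: running sum of x
    total_sq = 0.0   # memory 3: running sum of x^2
    for x in stream:
        n += 1
        total += x
        total_sq += x * x
    return total_sq - total * total / n


def ss_two_pass(values):
    """Conceptual formula: the data must be stored for a second pass."""
    values = list(values)                # first pass keeps every value
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)  # second pass


# A generator can be consumed only once, so the one-pass version fits it directly.
readings = (x for x in [3, 7, 8, 12, 15])
print(ss_one_pass(readings))  # 86.0
```

The trade-off is that `ss_two_pass` needs memory proportional to $n$, while `ss_one_pass` uses a fixed amount regardless of how many observations arrive.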
(c) The sample variance is $s^2 = SS/(n-1).$ Suppose we have two samples (of sizes $m$ and $n$) from the same population, we know the sample mean and variance of each, and we want the mean and variance of the combined sample of $m + n$ observations. The grand total is $m\bar X_A + n\bar X_B$, so the combined mean is $\bar X_C = (m\bar X_A + n\bar X_B)/(m + n).$ Similarly, since $SS_A = (m-1)s_A^2$ and $SS_B = (n-1)s_B^2$, we can solve the two computational formulas for $SSQ_A = \sum_{i=1}^m X_i^2 = (m-1)s_A^2 + m\bar X_A^2$ and $SSQ_B = \sum_{i = m+1}^{m+n} X_i^2 = (n-1)s_B^2 + n\bar X_B^2$. Then $SSQ_C = SSQ_A + SSQ_B$, the computational formula gives $SS_C = SSQ_C - (m\bar X_A + n\bar X_B)^2/(m+n)$, and finally $s_C^2 = SS_C/(m + n - 1).$ Taking the samples to be of sizes $n$ and $1$, respectively, we get formulas for continuously updating the sample mean and variance as each new observation is obtained.
For example, if sample A of size 10 has mean 198.4 and variance 178.93, and sample B of size 20 has mean 201.1 and variance 66.83, then you can show (within rounding error) that the combined sample of size 30 has mean 200.2 and variance 100.99.
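A minimal Python sketch of the pooling in (c), assuming each sample is summarized by its size, mean, and sample variance (divisor size minus 1). The helper names `combine` and `add_observation` are hypothetical. The first call reproduces the example numbers above; the loop then updates a running summary one value at a time by combining it with a "sample" of size 1.

```python
def combine(m, mean_a, var_a, n, mean_b, var_b):
    """Pool two samples given as (size, mean, sample variance with divisor size - 1)."""
    total_a = m * mean_a
    total_b = n * mean_b
    # Solve SS = SSQ - total^2 / size for SSQ, using SS = (size - 1) * variance.
    ssq_a = (m - 1) * var_a + total_a ** 2 / m
    ssq_b = (n - 1) * var_b + total_b ** 2 / n
    size = m + n
    total = total_a + total_b
    # Computational formula applied to the pooled sums.
    ss = ssq_a + ssq_b - total ** 2 / size
    return size, total / size, ss / (size - 1)


def add_observation(n, mean, var, x):
    """Update a running (size, mean, variance) summary with one new value x."""
    if n == 0:
        # The variance of a single value is undefined; 0.0 is a placeholder
        # that combine() ignores because it is multiplied by (1 - 1).
        return 1, x, 0.0
    # Treat x as a second sample of size 1 (its variance term is also multiplied by 0).
    return combine(n, mean, var, 1, x, 0.0)


# Example from the text: sample A has size 10, sample B has size 20.
size, mean, var = combine(10, 198.4, 178.93, 20, 201.1, 66.83)
print(size, round(mean, 2), round(var, 2))  # 30 200.2 100.99

# Streaming update over a small illustrative data set.
state = (0, 0.0, 0.0)
for x in [3, 7, 8, 12, 15]:
    state = add_observation(*state, x)
print(state)  # approximately (5, 9.0, 21.5)
```

Because this sketch uses the computational formula in floating point, it inherits the cancellation issue described in the next paragraph when the mean is large relative to the spread.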
One caution about using the computational formula: do not round anything until the final answer is obtained. Its (single) subtraction may take the difference of two huge numbers to get a relatively small result. If you round the two huge numbers before subtracting, you might 'round away' the essence of the computation.
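The same effect appears in floating-point arithmetic, where every intermediate result is rounded automatically. In this minimal Python sketch (the data values are illustrative), three numbers with a tiny spread around $10^9$ have an exact $SS$ of 2, but in double precision $\sum X_i^2$ and $\left(\sum X_i\right)^2/n$ each lose their low-order digits before the subtraction.

```python
def ss_computational(xs):
    """sum(x^2) - (sum x)^2 / n, evaluated in double-precision floats."""
    n = len(xs)
    total = sum(xs)
    return sum(x * x for x in xs) - total * total / n


def ss_conceptual(xs):
    """sum((x - mean)^2), evaluated in double-precision floats."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


# Three values with a tiny spread around a huge offset; the exact SS is 2.
data = [1e9 + 1, 1e9 + 2, 1e9 + 3]
print(ss_conceptual(data))     # 2.0
print(ss_computational(data))  # 0.0: the difference was rounded away
```

Here each square is near $10^{18}$, where adjacent doubles are 128 apart, so the "+1", "+4", and "+9" parts of the squares are lost and both terms round to the same value.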