The Cross-Product is the Dot Product
David Wallace Croft
20050601
Abstract
This tutorial shows the relationship between the Pearson product-moment correlation coefficient from statistics and the dot product from linear algebra.
Motivation
The standard deviation seems a bit odd when you compare it to the more intuitive average absolute deviation. Worse, because each deviation is squared, the standard deviation stretches to reach out to highly deviant scores, ones you might consider throwing out of your data anyway. One wonders why the standard deviation formula is used at all when it seems somewhat arbitrary.
If you record sample data for two variables such as x and y and then compute the Pearson product-moment correlation coefficient, it will always fall between -1 and +1.
This bit of magic occurs because the correlation coefficient is the average cross-product of z scores, and the z scores are scaled by those stretchy standard deviations. This is a clue that the standard deviation is not as arbitrary as it first seems.
Bear in mind that the cross-product of z scores is not the same as the cross product of two three-dimensional vectors. The cross-product of z scores comes from statistics, as used in the correlation coefficient, while the other cross product comes from linear algebra, as used in computer graphics and physics.
It turns out, however, that the cross-product of statistics is the dot product of linear algebra.
The formula for the standard deviation looks almost like the length of a multidimensional vector centered at the mean. Could the standard deviation be a way of converting a data vector into a normalized unit vector? Yes: the standard deviation is the length of the centered data vector divided by the square root of the number of samples n, and when you compute the correlation coefficient, the two square roots of n multiply together and cancel the n in its denominator. You then have the projection of one n-dimensional unit vector onto another. This is the cosine of the angle between them, and it always falls between -1 and +1.
Proof

Mean
${m}_{x}=\left(\underset{i=1}{\overset{n}{\Sigma}}{x}_{i}\right)/n$

Variance
${\sigma}_{x}^{2}=\left(\underset{i=1}{\overset{n}{\Sigma}}{[{x}_{i}-{m}_{x}]}^{2}\right)/n$

Standard Deviation
${\sigma}_{x}=\sqrt{{\sigma}_{x}^{2}}$
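The mean, variance, and standard deviation above all divide by n, so they are the population versions rather than the n - 1 sample versions. As a minimal Python sketch, assuming the data arrive as a plain list of floats, the definitions can be computed directly and checked against the standard library's `statistics.pvariance` and `statistics.pstdev`, which also divide by n. The helper names `mean`, `variance`, and `std_dev` and the sample data are illustrative only.

```python
import math
import statistics


# Hypothetical helper names; the data below are made up for illustration.
def mean(xs):
    """m_x: sum of the samples divided by n."""
    return sum(xs) / len(xs)


def variance(xs):
    """sigma_x^2: average squared deviation from the mean (divides by n)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def std_dev(xs):
    """sigma_x: square root of the variance."""
    return math.sqrt(variance(xs))


xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(mean(xs), variance(xs), std_dev(xs))  # 5.0 4.0 2.0
# pvariance/pstdev are the population (divide-by-n) versions.
print(math.isclose(variance(xs), statistics.pvariance(xs)))  # True
print(math.isclose(std_dev(xs), statistics.pstdev(xs)))      # True
```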

Z Score
${z}_{\mathrm{x,i}}=({x}_{i}-{m}_{x})/{\sigma}_{x}$
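A sketch of the z score in Python, assuming the population σ (divide by n) from the variance definition and assuming not every value is equal, since otherwise σ is zero. `z_scores` is a hypothetical helper name. With σ computed over n, the z scores always have mean 0 and mean square 1, so the z-score vector has length √n, the same √n that shows up when the standard deviation is written in terms of the norm.

```python
import math


def z_scores(xs):
    """z_{x,i} = (x_i - m_x) / sigma_x, using the population sigma (divide by n).

    Hypothetical helper; assumes not all values are equal (sigma != 0).
    """
    n = len(xs)
    m = sum(xs) / n
    sigma = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return [(x - m) / sigma for x in xs]


zs = z_scores([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(zs)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
# Mean of the z scores is 0; mean of their squares is 1.
print(sum(zs) / len(zs), sum(z * z for z in zs) / len(zs))  # 0.0 1.0
```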

Correlation Coefficient
$r=\frac{\underset{i=1}{\overset{n}{\Sigma}}[{z}_{\mathrm{x,i}}*{z}_{\mathrm{y,i}}]}{n}$
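The correlation coefficient as an average cross-product of z scores translates almost word for word into code. This sketch repeats a hypothetical `z_scores` helper so it stands alone; `pearson_r` is likewise an illustrative name, not a library function. Perfectly linear data give +1 or -1 depending on the sign of the slope; results are rounded for display because floating-point arithmetic can leave a tiny error in the last digit.

```python
import math


def z_scores(xs):
    """Population z scores: (x_i - m_x) / sigma_x with sigma_x computed over n."""
    n = len(xs)
    m = sum(xs) / n
    sigma = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return [(x - m) / sigma for x in xs]


def pearson_r(xs, ys):
    """r: the average cross-product of paired z scores (hypothetical helper)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
print(round(pearson_r(xs, [2 * x + 1 for x in xs]), 12))     # 1.0
print(round(pearson_r(xs, [-3 * x for x in xs]), 12))        # -1.0
print(round(pearson_r(xs, [2.0, 1.0, 4.0, 3.0, 5.0]), 12))   # 0.8
```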

Average Cross-Product
$r=\frac{\underset{i=1}{\overset{n}{\Sigma}}\left[\left(\frac{{x}_{i}-{m}_{x}}{{\sigma}_{x}}\right)*\left(\frac{{y}_{i}-{m}_{y}}{{\sigma}_{y}}\right)\right]}{n}$

Move the constants
$r=\frac{1}{{\sigma}_{x}*{\sigma}_{y}*n}*\underset{i=1}{\overset{n}{\Sigma}}\left[({x}_{i}-{m}_{x})*({y}_{i}-{m}_{y})\right]$

Dot Product
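$r=\frac{1}{{\sigma}_{x}*{\sigma}_{y}*n}*\left[(x-{m}_{x})\cdot(y-{m}_{y})\right]$
Here $x=({x}_{1},\ldots,{x}_{n})$ and $y=({y}_{1},\ldots,{y}_{n})$ are the n-dimensional data vectors, and subtracting ${m}_{x}$ or ${m}_{y}$ subtracts the mean from every component.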
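Moving the constants out and rewriting the sum as a dot product changes only the notation, not the arithmetic. A small Python check, with hypothetical helpers `dot`, `centered`, and `pop_std` and made-up data, computes the sum of deviation products both ways and then recovers r from the dot-product form.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def centered(xs):
    """The vector x - m_x: subtract the mean from every component."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]


def pop_std(xs):
    """Population standard deviation (divides by n)."""
    cx = centered(xs)
    return math.sqrt(dot(cx, cx) / len(xs))


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# The summation form and the dot-product form are the same arithmetic.
cross_sum = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
cross_dot = dot(centered(xs), centered(ys))
print(cross_sum, cross_dot)  # 8.0 8.0

# r = (x - m_x) . (y - m_y) / (sigma_x * sigma_y * n)
r = cross_dot / (pop_std(xs) * pop_std(ys) * n)
print(round(r, 12))  # 0.8
```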

Standard Deviation in Terms of Norm
${\sigma}_{x}=\sqrt{\frac{\left(\underset{i=1}{\overset{n}{\Sigma}}{[{x}_{i}-{m}_{x}]}^{2}\right)}{n}}=\frac{\Vert x-{m}_{x}\Vert}{\sqrt{n}}$
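The norm form of the standard deviation can be checked numerically. This sketch assumes a hypothetical `norm` helper (plain Euclidean length) and compares ‖x - m_x‖ / √n with `statistics.pstdev` from the Python standard library on made-up data.

```python
import math
import statistics


def norm(v):
    """Euclidean length ||v|| of a vector (hypothetical helper)."""
    return math.sqrt(sum(a * a for a in v))


xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(xs)
m = sum(xs) / n
centered = [x - m for x in xs]  # the vector x - m_x

# sigma_x = ||x - m_x|| / sqrt(n)
sigma_from_norm = norm(centered) / math.sqrt(n)
print(sigma_from_norm, statistics.pstdev(xs))  # 2.0 2.0
```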

The n cancels out
$r=\frac{\sqrt{n}*\sqrt{n}}{\Vert x-{m}_{x}\Vert *\Vert y-{m}_{y}\Vert *n}*\left[(x-{m}_{x})\cdot(y-{m}_{y})\right]$

Projection of Unit Vectors
$r=\frac{(x-{m}_{x})}{\Vert x-{m}_{x}\Vert}\cdot\frac{(y-{m}_{y})}{\Vert y-{m}_{y}\Vert}=\mathrm{cos}\left(\theta \right)$
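Once both centered vectors are scaled to unit length, r is just their dot product, and n no longer appears anywhere. A sketch using hypothetical helpers `unit` and `centered`, with made-up data, computes cos(θ) that way and converts it to an angle with `math.acos`. It assumes neither centered vector is all zeros, since a zero vector has no direction.

```python
import math


def unit(v):
    """Scale v to length 1 (assumes v is not the zero vector)."""
    length = math.sqrt(sum(a * a for a in v))
    return [a / length for a in v]


def centered(xs):
    """The vector x - m_x."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

ux, uy = unit(centered(xs)), unit(centered(ys))
cos_theta = sum(a * b for a, b in zip(ux, uy))  # projection of one unit vector onto the other
# Clamp before acos so rounding error can never push the argument outside [-1, 1].
theta = math.acos(max(-1.0, min(1.0, cos_theta)))
print(round(cos_theta, 12), round(math.degrees(theta), 2))  # 0.8 36.87
```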

Proportionate Reduction in Error
${r}^{2}={\mathrm{cos}}^{2}\left(\theta \right)=1-{\mathrm{sin}}^{2}\left(\theta \right)$

Perpendicular to Projection
$\mathrm{sin}\left(\theta \right)$
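The last two lines can be read in terms of least-squares regression, assuming a simple linear regression of y on x with an intercept. Projecting the centered y vector onto the centered x vector gives the fitted values; the residual left over is perpendicular to x - m_x, and its length divided by ‖y - m_y‖ is sin(θ). So the fraction of squared error remaining after the fit is sin²(θ), and the proportionate reduction in error is r². The sketch below checks this on made-up data; `centered` and `dot` are hypothetical helper names.

```python
import math


def centered(xs):
    """The vector x - m_x."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
cx, cy = centered(xs), centered(ys)

# Least-squares slope: project cy onto cx.
slope = dot(cx, cy) / dot(cx, cx)
residual = [b - slope * a for a, b in zip(cx, cy)]

sse = dot(residual, residual)  # squared error left after the fit
sst = dot(cy, cy)              # squared error when predicting with the mean alone
r = dot(cx, cy) / math.sqrt(dot(cx, cx) * dot(cy, cy))

print(math.isclose(dot(residual, cx), 0.0, abs_tol=1e-12))  # True: residual is perpendicular to x - m_x
print(round(1 - sse / sst, 12), round(r * r, 12))          # 0.64 0.64
print(round(math.sqrt(sse / sst), 12))                     # 0.6, which is sin(theta)
```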