What is meant by "dot product between random variables?"
I was having a discussion with a colleague today about correlation coefficients, and I was told that the correlation coefficient between two random variables $X$ and $Y$ is proportional to the dot product of the two random variables.
I asked him what he meant by this, and he said that you can view random variables as vectors. I don't think I agree with that, but I don't have sufficient background to argue my point, so now I want to revisit this.
How can a random variable be viewed as a vector? What is meant by the dot product between two random variables -- is this actually formal terminology or something loosely used?
Answers
The space $L^0(\Omega)$ of all random variables on a fixed sample space $\Omega$ is a vector space: the (outcome-wise) sum of two random variables is a random variable, and a scalar multiple of a random variable is again a random variable. So in that sense, random variables can be viewed as "vectors" because they are the elements of a vector space.
By "dot product" they likely mean the $L^2$ inner product, defined by $\langle X, Y \rangle = E[XY]$. This obeys the same basic algebraic properties as the ordinary Euclidean dot product: it is bilinear (with respect to the addition and scalar multiplication described above), symmetric, and positive definite (provided random variables that are equal almost surely are identified, since $E[X^2] = 0$ only forces $X = 0$ almost surely). Strictly speaking, this inner product doesn't necessarily live on $L^0(\Omega)$, but rather on the vector subspace $L^2(\Omega) \subset L^0(\Omega)$ consisting of random variables with finite second moment.
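To connect this back to the correlation coefficient in the question, here is a short worked identity. It assumes $X$ and $Y$ have finite, nonzero variance, and it introduces the norm $\|Z\| = \sqrt{\langle Z, Z \rangle}$, which is notation added here for illustration:

$$\rho(X,Y) = \frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}} = \frac{E\big[(X - E[X])(Y - E[Y])\big]}{\sqrt{E\big[(X - E[X])^2\big]\,E\big[(Y - E[Y])^2\big]}} = \frac{\langle X - E[X],\, Y - E[Y] \rangle}{\|X - E[X]\|\,\|Y - E[Y]\|}$$

Read this way, the correlation is the inner product of the centered variables divided by their lengths, which is the same formula as the cosine of the angle between two Euclidean vectors.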
For two jointly distributed discrete random variables, the expectation of their product is a weighted dot product of their value vectors. The weights are the outcome probabilities; since they are all positive, the diagonal weight matrix is positive definite:
$$ \mathbf{E}[XY] = \sum_{i=1}^n p_i x_i y_i = (x_1,\ldots,x_n) \begin{pmatrix} p_1 & \cdots & 0\\ \vdots & \ddots & \vdots \\ 0 & \cdots & p_n \end{pmatrix} (y_1,\ldots,y_n)^T$$
Here, $(X,Y)$ has $n$ possible realizations $(x_i, y_i)$ with probabilities $p_i$, $i=1,\ldots,n$.
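The weighted dot product above can be checked numerically. The following is a minimal sketch in plain Python; the three-outcome distribution (`xs`, `ys`, `ps`) and the helper names are made up for illustration and do not come from the answer.

```python
import math

# Hypothetical joint distribution: outcome i yields (xs[i], ys[i])
# with probability ps[i]. The numbers are illustrative only.
xs = [1.0, 2.0, 3.0]
ys = [4.0, 0.0, -2.0]
ps = [0.5, 0.25, 0.25]


def weighted_dot(x, y, p):
    """Return x^T diag(p) y, i.e. the sum of p_i * x_i * y_i."""
    return sum(pi * xi * yi for pi, xi, yi in zip(p, x, y))


def centered(values, p):
    """Subtract the expectation (the p-weighted mean) from every value."""
    mean = sum(pi * v for pi, v in zip(p, values))
    return [v - mean for v in values]


def correlation(x, y, p):
    """Correlation as a normalized weighted dot product of centered values."""
    xc, yc = centered(x, p), centered(y, p)
    return weighted_dot(xc, yc, p) / math.sqrt(
        weighted_dot(xc, xc, p) * weighted_dot(yc, yc, p)
    )


e_xy = weighted_dot(xs, ys, ps)   # E[XY]
rho = correlation(xs, ys, ps)     # correlation of X and Y
print(e_xy, rho)
```

Centering with the same weights and dividing by the two weighted norms turns the expectation $\mathbf{E}[XY]$ into the correlation coefficient, so the "dot product" picture carries over unchanged.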
Suppose you have a collection of $n$ samples of (in general dependent) random variables $X$ and $Y$: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.
Then we can view this collection of $n$ samples as a pair of vectors in $\mathbb{R}^n$: $(x_1, x_2, \ldots, x_n)$ and $(y_1, y_2, \ldots, y_n)$.
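Then what your colleague is saying is that we can view the (sample) correlation between $X$ and $Y$ as a normalized inner product between these two vectors: after subtracting each vector's mean from its entries, it is the dot product divided by the product of the two vectors' lengths, i.e. the cosine of the angle between them.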
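As a rough illustration of the "normalized inner product" reading, the sketch below centers two sample vectors and takes the cosine of the angle between them. The sample values and function names are invented for the example.

```python
import math


def dot(u, v):
    """Ordinary Euclidean dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def center(v):
    """Subtract the sample mean from every entry."""
    mean = sum(v) / len(v)
    return [a - mean for a in v]


def cosine(u, v):
    """Cosine of the angle between u and v (both assumed nonzero)."""
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))


def sample_correlation(x, y):
    """Sample correlation as the cosine between the centered vectors."""
    return cosine(center(x), center(y))


# Illustrative samples: y is an exact positive multiple of x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]

r_linear = sample_correlation(x, y)                  # vectors point the same way
r_flipped = sample_correlation(x, [-b for b in y])   # vectors point opposite ways
print(r_linear, r_flipped)
```

Parallel centered vectors give a correlation of 1 and anti-parallel ones give -1, matching the geometric intuition of an angle of 0 or 180 degrees.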
A multivariate random variable can be considered as a random vector.
But correlation between two such random vectors (or more precisely, cross-correlation) would typically produce a matrix rather than a scalar value.
My guess is that you may have been discussing two univariate random variables, say $X$ and $Y$, and calculating the sample correlation between them. If the sample size is $n$ then you could regard the two samples as random vectors $\mathbf{X}=(X_1,X_2,\ldots,X_n)$ and $\mathbf{Y}=(Y_1,Y_2,\ldots,Y_n)$. The sample correlation coefficient would then be $$\frac{\sum\limits_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum\limits_{i=1}^n (x_i-\bar{x})^2 \sum\limits_{i=1}^n (y_i-\bar{y})^2}}$$ but, since $\bar{x} = \frac{1}{n}\,\mathbf X \cdot \mathbf 1_n$ and $\sum\limits_{i=1}^n (x_i-\bar{x})(y_i-\bar{y}) = \sum\limits_{i=1}^n x_i y_i - n\bar{x}\bar{y}$, you could calculate this using dot products and scalar arithmetic with the vector $\mathbf{1}_n$ of $n$ ones, with $$\frac{\mathbf X \cdot \mathbf Y - \frac{1}{n}(\mathbf X \cdot \mathbf 1_n)(\mathbf Y \cdot \mathbf 1_n) }{\sqrt{\left(\mathbf X \cdot \mathbf X - \frac{1}{n}(\mathbf X \cdot \mathbf 1_n)^2\right)\left(\mathbf Y \cdot \mathbf Y - \frac{1}{n}(\mathbf Y \cdot \mathbf 1_n)^2\right)}}$$
If you know that the expected values of $X$ and $Y$ are zero, then you can use $$\frac{\sum\limits_{i=1}^n x_i y_i}{\sqrt{\sum\limits_{i=1}^n x_i^2 \sum\limits_{i=1}^n y_i^2}} \text{ or }\frac{\mathbf X \cdot \mathbf Y }{\sqrt{(\mathbf X \cdot \mathbf X)(\mathbf Y \cdot \mathbf Y )}}$$ and in this sense, stretching things a little, the correlation is proportional to the dot product $\mathbf X \cdot \mathbf Y$, which is $n$ times the sample covariance when the means are known to be zero.
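A quick numerical check of the dot-product form may help. This sketch relies on the identity $\sum_i (x_i-\bar{x})(y_i-\bar{y}) = \mathbf X \cdot \mathbf Y - \frac{1}{n}(\mathbf X \cdot \mathbf 1_n)(\mathbf Y \cdot \mathbf 1_n)$ and compares the result with the textbook formula. The data and function names are hypothetical.

```python
import math


def dot(u, v):
    """Ordinary Euclidean dot product."""
    return sum(a * b for a, b in zip(u, v))


def corr_textbook(x, y):
    """Sample correlation from deviations about the sample means."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corr_dot_products(x, y):
    """Same quantity using only dot products with the all-ones vector."""
    n = len(x)
    ones = [1.0] * n
    sxy = dot(x, y) - dot(x, ones) * dot(y, ones) / n
    sxx = dot(x, x) - dot(x, ones) ** 2 / n
    syy = dot(y, y) - dot(y, ones) ** 2 / n
    return sxy / math.sqrt(sxx * syy)


def corr_zero_mean(x, y):
    """Shortcut that is only valid when both means are taken to be zero."""
    return dot(x, y) / math.sqrt(dot(x, x) * dot(y, y))


# Illustrative data with nonzero sample means.
x = [2.0, -1.0, 0.5, 3.0, -2.5]
y = [1.0, 0.0, -1.5, 2.0, -0.5]

r_textbook = corr_textbook(x, y)
r_dots = corr_dot_products(x, y)

# Centering the data first makes the zero-mean shortcut applicable.
xc = [a - sum(x) / len(x) for a in x]
yc = [b - sum(y) / len(y) for b in y]
r_shortcut = corr_zero_mean(xc, yc)
print(r_textbook, r_dots, r_shortcut)
```

All three values agree up to floating-point rounding. Applying `corr_zero_mean` to the uncentered `x` and `y` would generally give a different number, which is why the shortcut needs the zero-mean assumption.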