The meaning behind $(X^TX)^{-1}$
In linear algebra, we learn that the inverse of a matrix "undoes" the linear transformation. What exactly is the meaning of $(X^TX)^{-1}$?
We know $X^TX$ is a square matrix whose diagonal elements are the sums of squares of the columns of $X$. So what are we doing when we take its inverse? I have always used this property in my calculations, but I would like to understand more of the meaning behind it.
12 Answers
When $X$ is a real matrix, the elements of $(X^TX)^{-1}$ also provide a measure of the extent of linear dependence among the columns of $X$.
If $X^TX$ is invertible, then the columns of $X$ must be linearly independent, but sometimes the columns are "almost" dependent, in a sense made clear below.
Denote the $i$th column of $X$ by $x_i$, and let $\hat{x}_i$ denote the projection of $x_i$ onto the space spanned by $\{x_j : j \neq i\}$. Define $\epsilon_i = x_i - \hat{x}_i$. Note that if any $\|\epsilon_i\|$ is "small", this indicates strong linear dependence among the columns of $X$.
One can prove that the $(i,j)$th element of $(X^TX)^{-1}$ is $\dfrac{\epsilon_i^T\epsilon_j}{\|\epsilon_i\|^2\|\epsilon_j\|^2}.$
In particular, taking $j = i$ above, the $i$th diagonal element of $(X^TX)^{-1}$ is $\dfrac{1}{\|\epsilon_i\|^2}$. So if the $i$th column of $X$ is almost a linear combination of the other columns, this shows up as a very large value in the $i$th diagonal element of $(X^TX)^{-1}$.
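A quick numerical check of this identity, as a sketch: compute each residual $\epsilon_i$ by regressing column $i$ on the other columns with least squares, and compare $1/\|\epsilon_i\|^2$ against the corresponding diagonal entry of $(X^TX)^{-1}$ (the matrix size and random data here are arbitrary choices for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))  # arbitrary tall matrix with independent columns

G_inv = np.linalg.inv(X.T @ X)

for i in range(X.shape[1]):
    # Project column i onto the span of the remaining columns via least squares.
    others = np.delete(X, i, axis=1)
    coef, *_ = np.linalg.lstsq(others, X[:, i], rcond=None)
    eps = X[:, i] - others @ coef  # residual epsilon_i

    # The i-th diagonal entry of (X^T X)^{-1} should equal 1 / ||eps_i||^2.
    assert np.isclose(G_inv[i, i], 1.0 / (eps @ eps))
```

If you make one column nearly equal to a combination of the others, `eps` shrinks and the corresponding diagonal entry of `G_inv` blows up, which is exactly the near-collinearity diagnostic described above.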
Probably the main intuition comes from the fact that for the OLS model you have $$ \operatorname{Var}(\hat{\beta}) = \sigma^2_{\epsilon}(X'X)^{-1}, $$ namely, you can view $(X'X)^{-1}$ as a matrix that, in a sense, measures the stability of your model.