Matrix differentiation comes up frequently in statistical study and research, and many complicated formulas can be derived simply with it. Unfortunately, it seems that statistics students are often not taught it, so problems that rely on it become hard to solve. For example, OLS is easy with a single independent variable but becomes much more complex with more than one, because you have to write out a separate equation for each variable. With matrix differentiation, not only is the computation simple, but the reasoning also mirrors the single-variable method. Matrices are a useful tool in statistics. I have studied several sources on the topic and reorganized the material according to my own understanding.
Definition
The derivative of a scalar \(f\) with respect to a matrix \(X\) is defined entrywise as \[\frac{\partial{f}}{\partial{X}}=\left[\frac{\partial{f}}{\partial{X_{ij}}}\right]\] so it has the same shape as \(X\).
For a scalar \(x\), \[df=f'(x)dx\]
For a vector \(x\), the total differential is \[df=\sum_{i=1}^n{\frac{\partial{f}}{\partial{x_i}}dx_i}=\frac{\partial{f}}{\partial{x}}^Tdx\]
For an \(m\times{n}\) matrix \(X\), it is \[df=\sum_{i=1}^m{\sum_{j=1}^n{\frac{\partial{f}}{\partial{X_{ij}}}dX_{ij}}}=tr\left(\frac{\partial{f}}{\partial{X}}^TdX\right)\]
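The link between the entrywise definition and the trace form can be checked numerically. The following is a minimal sketch in NumPy, not part of the original material: the helper `numerical_gradient` and the test function \(f(X)=a^TXb\) are illustrative choices. For this \(f\), \(df=a^TdXb=tr((ab^T)^TdX)\), so the derivative should come out as \(ab^T\).

```python
import numpy as np

def numerical_gradient(f, X, eps=1e-6):
    """Entrywise central differences: G[i, j] approximates df/dX_ij."""
    G = np.zeros_like(X, dtype=float)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X, dtype=float)
            E[i, j] = eps
            G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
a = rng.standard_normal(3)
b = rng.standard_normal(4)
X = rng.standard_normal((3, 4))

# Illustrative scalar function of a 3x4 matrix: f(X) = a^T X b
f = lambda M: a @ M @ b

G = numerical_gradient(f, X)                 # matrix of partial derivatives
dX = 1e-5 * rng.standard_normal((3, 4))      # a small perturbation of X

df_actual = f(X + dX) - f(X)                 # change in f
df_trace = np.trace(G.T @ dX)                # tr((df/dX)^T dX)

print(np.allclose(G, np.outer(a, b), atol=1e-6))
print(df_actual, df_trace)
```

Because this \(f\) is linear in \(X\), the two values of \(df\) agree up to rounding; for a nonlinear \(f\) they would agree only to first order in \(dX\).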
Basic formulas
- \(d(X\pm{Y})=dX\pm{dY}\)
- \(d(XY)=(dX)Y+XdY\)
- \(d(X^T)=(dX)^T\)
- \(dtr(X)=tr(dX)\)
- \(dX^{-1}=-X^{-1}dXX^{-1}\)
- \(d|X|=tr(X^{*}dX)\), where \(X^*\) denotes the adjugate (classical adjoint) matrix of \(X\)
- \(d|X|=|X|tr(X^{-1}dX)\), when \(X\) is invertible.
- \(d(X\odot{Y})=dX\odot{Y}+X\odot{dY}\)
- \(d\sigma(X)=\sigma'(X)\odot{dX}\), where \(\sigma(X)=[\sigma(X_{ij})]\) is applied elementwise
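Two of these rules are easy to get wrong by hand, so a quick numerical check can help. The sketch below, which assumes NumPy and uses an arbitrary well-conditioned test matrix, compares the actual change in \(X^{-1}\) and \(|X|\) under a small perturbation \(dX\) with the first-order predictions \(-X^{-1}dXX^{-1}\) and \(|X|tr(X^{-1}dX)\).

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative test matrix: a small random perturbation of 4*I,
# chosen only so that X is comfortably invertible.
X = 4 * np.eye(4) + 0.3 * rng.standard_normal((4, 4))
dX = 1e-6 * rng.standard_normal((4, 4))

X_inv = np.linalg.inv(X)

# Differential of the inverse: d(X^{-1}) = -X^{-1} dX X^{-1}
d_inv_actual = np.linalg.inv(X + dX) - X_inv
d_inv_formula = -X_inv @ dX @ X_inv

# Differential of the determinant: d|X| = |X| tr(X^{-1} dX)
d_det_actual = np.linalg.det(X + dX) - np.linalg.det(X)
d_det_formula = np.linalg.det(X) * np.trace(X_inv @ dX)

print(np.max(np.abs(d_inv_actual - d_inv_formula)))
print(abs(d_det_actual - d_det_formula))
```

Both differences should be tiny compared with the differentials themselves, since the formulas are exact only to first order in \(dX\).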
Tips
- \(f=tr(f)\) if \(f\) is a scalar.
- \(tr(A^T)=tr(A)\)
- \(tr(A\pm{B})=tr(A)\pm{tr(B)}\)
- \(tr(AB)=tr(BA)\)
- \(tr(A^T(B\odot{C}))=tr((A\odot{B})^TC)\), where \(A\), \(B\), and \(C\) have the same shape
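The trace identities can be checked the same way. The sketch below assumes NumPy and random test matrices of compatible shapes; the variable names are illustrative. It compares both sides of the cyclic identity \(tr(AB)=tr(BA)\) and of the Hadamard-product identity.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 3))   # A @ B is 3x3, B @ A is 4x4
C = rng.standard_normal((3, 4))   # same shape as A, for the Hadamard identity
D = rng.standard_normal((3, 4))

# tr(AB) = tr(BA), even though AB and BA have different sizes
cyclic_lhs = np.trace(A @ B)
cyclic_rhs = np.trace(B @ A)

# tr(A^T (C ⊙ D)) = tr((A ⊙ C)^T D); `*` is the elementwise product in NumPy
hadamard_lhs = np.trace(A.T @ (C * D))
hadamard_rhs = np.trace((A * C).T @ D)

print(cyclic_lhs, cyclic_rhs)
print(hadamard_lhs, hadamard_rhs)
```

Both sides of the Hadamard identity equal \(\sum_{i,j}A_{ij}C_{ij}D_{ij}\), which is why the elementwise product can be moved across the trace.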
In the next article I will work through some common examples as exercises. The derivative of a matrix-valued \(f\) will be introduced later.