Last time, we introduced the topic of Matrix Derivation. But we only discussed the situation when \(f\) is a scalar. How about matrix \(f\)? The vectorization method is used in the derivation of matrix. Let’s start.
Defination
How can we define the matrix to matrix derivative? We need to review the previous definition when \(f\) is only a scalar.
Vector \(f\) and Vector \(x\)
For vector \(x\), total differential formula is \[df=\sum_{i=1}^n{\frac{\partial{f}}{\partial{x_i}}dx_i}=\frac{\partial{f^T}}{\partial{x}}dx\]
Thus, we can define the derivative between vector \(f(p\times1)\) and vector \(x\): \[\frac{\partial{f}}{\partial{x}}= \begin{bmatrix} \frac{\partial{f_1}}{\partial{x_1}}&\frac{\partial{f_2}}{\partial{x_1}}&\cdots&\frac{\partial{f_p}}{\partial{x_1}}\\ \frac{\partial{f_1}}{\partial{x_2}}&\frac{\partial{f_2}}{\partial{x_2}}&\cdots&\frac{\partial{f_p}}{\partial{x_2}}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial{f_1}}{\partial{x_m}}&\frac{\partial{f_2}}{\partial{x_m}}&\cdots&\frac{\partial{f_p}}{\partial{x_m}}\\ \end{bmatrix}(m\times{p})\]
Vectorization of a matrix
We define the vectorization of a matrix (column first):
\[vec(X)=[X_{11},\cdots,X_{m1},X_{12},\cdots,X_{m2},\cdots,X_{1n},\cdots,X_{mn}]^T(mn\times1)\]
Matrix \(f\) and Matrix \(x\)
\[\frac{\partial{F_{p\times{q}}}}{\partial{X_{m\times{n}}}}=\frac{\partial{vec(F_{pq})}}{\partial{vec(X_{mn})}}(mn\times{pq})\] \[vec(dF)=\frac{\partial{F^T}}{\partial{X}}vec(dX)\]
Connection
Hessian matrix
To distinguish the two definitions of derivative of scalar \(f\) to matrix \(X\), we define the \(\frac{\partial{f}}{\partial{X}}=[\frac{\partial{f}}{\partial{X_{ij}}}]\) as \(\nabla_{X}f\).
The second derivative of scalar to matrix is also called Hessian matrix.
\[\nabla_{X}^2f=\frac{\partial^2{f}}{\partial{X}^2}=\frac{\partial{\nabla_Xf}}{\partial{X}}(mn\times{mn})\]
Jacobian matrix
\[\frac{\partial{f}}{\partial{x}}= \begin{bmatrix} \frac{\partial{f_1}}{\partial{x_1}}&\frac{\partial{f_1}}{\partial{x_2}}&\cdots&\frac{\partial{f_1}}{\partial{x_m}}\\ \frac{\partial{f_2}}{\partial{x_1}}&\frac{\partial{f_2}}{\partial{x_2}}&\cdots&\frac{\partial{f_2}}{\partial{x_m}}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial{f_p}}{\partial{x_1}}&\frac{\partial{f_p}}{\partial{x_2}}&\cdots&\frac{\partial{f_p}}{\partial{x_m}}\\ \end{bmatrix}(p\times{m})\]
The derivatives of two layouts are transposed to each other.