Matrix Derivation (2)

2020-12-29
1 min read

Last time, we introduced the topic of matrix derivatives, but we only discussed the case where \(f\) is a scalar. What if \(f\) is a matrix? The vectorization trick is the key to defining matrix-to-matrix derivatives. Let’s start.

Definition

How can we define the matrix-to-matrix derivative? Let’s first recall the definition for the case where \(f\) is a scalar, and build up from there.

Vector \(f\) and Vector \(x\)

For a scalar \(f\) and a vector \(x\) (\(m\times1\)), the total differential formula is \[df=\sum_{i=1}^m{\frac{\partial{f}}{\partial{x_i}}dx_i}=\frac{\partial{f}}{\partial{x}}^Tdx\]
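As a quick sanity check, here is a minimal NumPy sketch (not from the original post) that verifies \(df\approx\frac{\partial{f}}{\partial{x}}^Tdx\) for the example \(f(x)=x^TAx\), whose gradient in this layout is \((A+A^T)x\); the matrix \(A\) and the perturbation size are arbitrary choices.

```python
import numpy as np

# Check df ~= (df/dx)^T dx for f(x) = x^T A x, whose gradient is (A + A^T) x.
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

f = lambda v: v @ A @ v
grad = (A + A.T) @ x                      # df/dx, a column vector like x

dx = 1e-6 * rng.standard_normal(n)        # small perturbation of x
df = f(x + dx) - f(x)                     # actual change in f
print(np.allclose(df, grad @ dx))         # first-order approximation holds
```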

Thus, we can define the derivative of a vector \(f\) (\(p\times1\)) with respect to a vector \(x\) (\(m\times1\)): \[\frac{\partial{f}}{\partial{x}}= \begin{bmatrix} \frac{\partial{f_1}}{\partial{x_1}}&\frac{\partial{f_2}}{\partial{x_1}}&\cdots&\frac{\partial{f_p}}{\partial{x_1}}\\ \frac{\partial{f_1}}{\partial{x_2}}&\frac{\partial{f_2}}{\partial{x_2}}&\cdots&\frac{\partial{f_p}}{\partial{x_2}}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial{f_1}}{\partial{x_m}}&\frac{\partial{f_2}}{\partial{x_m}}&\cdots&\frac{\partial{f_p}}{\partial{x_m}}\\ \end{bmatrix}(m\times{p})\]
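A small sketch of this layout, again only illustrative: for the linear map \(f(x)=Mx\) with \(M\) of size \(p\times m\), the derivative in this layout is \(M^T\) (\(m\times p\)), and \(df=\frac{\partial{f}}{\partial{x}}^Tdx\).

```python
import numpy as np

# For f(x) = M x, the m x p derivative in this layout is M^T,
# and df = (df/dx)^T dx.
rng = np.random.default_rng(1)
m, p = 5, 3
M = rng.standard_normal((p, m))
x = rng.standard_normal(m)

dfdx = M.T                                 # m x p: rows follow x_i, columns follow f_j
dx = 1e-6 * rng.standard_normal(m)
df = M @ (x + dx) - M @ x
print(np.allclose(df, dfdx.T @ dx))        # (df/dx)^T dx reproduces df
```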

Vectorization of a matrix

We define the vectorization of a matrix (column-first, i.e. stacking its columns):

\[vec(X)=[X_{11},\cdots,X_{m1},X_{12},\cdots,X_{m2},\cdots,X_{1n},\cdots,X_{mn}]^T(mn\times1)\]
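In NumPy, this column-first vectorization is simply a Fortran-order flatten; a tiny illustration (the example matrix is arbitrary):

```python
import numpy as np

# Column-first vectorization = flatten in Fortran (column-major) order.
X = np.arange(6).reshape(2, 3)       # a 2x3 example matrix
print(X)
print(X.flatten(order="F"))          # [X11, X21, X12, X22, X13, X23]
```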

Matrix \(F\) and Matrix \(X\)

With vectorization, the derivative of a matrix \(F\) (\(p\times{q}\)) with respect to a matrix \(X\) (\(m\times{n}\)) is defined as the vector-to-vector derivative of their vectorizations: \[\frac{\partial{F}}{\partial{X}}=\frac{\partial{vec(F)}}{\partial{vec(X)}}(mn\times{pq})\] and the total differential formula becomes \[vec(dF)=\frac{\partial{F}}{\partial{X}}^Tvec(dX)\]
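To see this definition in action, here is a hedged NumPy sketch for \(F(X)=AXB\). Using the standard identity \(vec(AXB)=(B^T\otimes A)\,vec(X)\), the derivative in this layout is \(\frac{\partial{F}}{\partial{X}}=B\otimes A^T\); the particular sizes and the numerical check are my own illustration, not part of the original post.

```python
import numpy as np

# Check vec(dF) = (dF/dX)^T vec(dX) for F(X) = A X B,
# where dF/dX = kron(B, A^T) in this (mn x pq) layout.
rng = np.random.default_rng(2)
p, m, n, q = 2, 3, 4, 5
A = rng.standard_normal((p, m))
B = rng.standard_normal((n, q))
X = rng.standard_normal((m, n))

vec = lambda M: M.flatten(order="F")       # column-first vectorization
dFdX = np.kron(B, A.T)                     # mn x pq

dX = 1e-6 * rng.standard_normal((m, n))
dF = A @ (X + dX) @ B - A @ X @ B
print(np.allclose(vec(dF), dFdX.T @ vec(dX)))
```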

Connection

Hessian matrix

To distinguish the two definitions of the derivative of a scalar \(f\) with respect to a matrix \(X\), we write the element-wise form as \(\nabla_{X}f=[\frac{\partial{f}}{\partial{X_{ij}}}]\) (an \(m\times{n}\) matrix, the same shape as \(X\)), so that \(\frac{\partial{f}}{\partial{X}}=vec(\nabla_{X}f)\) under the vectorized definition above.

The second derivative of a scalar with respect to a matrix is also called the Hessian matrix.

\[\nabla_{X}^2f=\frac{\partial^2{f}}{\partial{X}^2}=\frac{\partial{\nabla_Xf}}{\partial{X}}(mn\times{mn})\]
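A minimal sketch of this definition, under my own choice of test function: for \(f(X)=tr(X^TX)\) we have \(\nabla_{X}f=2X\), so the Hessian should be \(2I_{mn}\). The finite-difference construction below is only illustrative.

```python
import numpy as np

# Build the mn x mn Hessian of f(X) = tr(X^T X) by finite differences of
# the gradient grad(X) = 2X; the result should be 2 * I (and symmetric).
rng = np.random.default_rng(3)
m, n = 3, 2
X = rng.standard_normal((m, n))

vec = lambda M: M.flatten(order="F")
grad = lambda M: 2 * M                     # nabla_X f for f(X) = tr(X^T X)

eps = 1e-6
H = np.zeros((m * n, m * n))
for k in range(m * n):
    dX = np.zeros(m * n)
    dX[k] = eps                            # perturb one entry of vec(X)
    H[k, :] = vec(grad(X + dX.reshape(m, n, order="F")) - grad(X)) / eps
print(np.allclose(H, 2 * np.eye(m * n)))
```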

Jacobian matrix

The Jacobian matrix of a vector \(f\) (\(p\times1\)) with respect to a vector \(x\) (\(m\times1\)) uses the other (numerator) layout: \[\frac{\partial{f}}{\partial{x}}= \begin{bmatrix} \frac{\partial{f_1}}{\partial{x_1}}&\frac{\partial{f_1}}{\partial{x_2}}&\cdots&\frac{\partial{f_1}}{\partial{x_m}}\\ \frac{\partial{f_2}}{\partial{x_1}}&\frac{\partial{f_2}}{\partial{x_2}}&\cdots&\frac{\partial{f_2}}{\partial{x_m}}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial{f_p}}{\partial{x_1}}&\frac{\partial{f_p}}{\partial{x_2}}&\cdots&\frac{\partial{f_p}}{\partial{x_m}}\\ \end{bmatrix}(p\times{m})\]

The derivatives in the two layouts are transposes of each other: the Jacobian here is the transpose of the \(m\times{p}\) derivative defined earlier.
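A last one-liner to make the transpose relation concrete, reusing the \(f(x)=Mx\) example from above (again just a sketch):

```python
import numpy as np

# For f(x) = M x: Jacobian (numerator layout) is M itself (p x m),
# while the layout used earlier in this post gives M^T (m x p).
rng = np.random.default_rng(4)
p, m = 3, 5
M = rng.standard_normal((p, m))

jacobian = M            # numerator layout, rows indexed by f_i
gradient_layout = M.T   # layout used earlier in this post
print(np.array_equal(jacobian, gradient_layout.T))   # transposes of each other
```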