I am confused by this line:

$\nabla_{W^{(k)}}J = gh^{(k-1)T}+\lambda\nabla_{W^{(k)}}{\Omega(\theta)}$

This equation is derived from the one below, where $g$ denotes the gradient $\nabla_{a^{(k)}}J$:

$a^{(k)}= b^{(k)}+W^{(k)}h^{(k-1)}$

But shouldn't the gradient of $W^{(k)}h^{(k-1)}$ with respect to $W^{(k)}$ simply be $h^{(k-1)}$?

Why does the transpose appear in $h^{(k-1)T}$?
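To make the shapes concrete, here is a minimal numerical sketch I tried (the quadratic loss and all variable names are my own choice, not from the book). It compares the outer product $g\,h^{(k-1)T}$ against a finite-difference gradient of $J$ with respect to each entry of $W^{(k)}$:

```python
import numpy as np

# Hypothetical tiny layer: a = b + W @ h, with loss J = 0.5 * ||a||^2,
# chosen so that the gradient of J w.r.t. a is simply g = a.
rng = np.random.default_rng(0)
m, n = 3, 4                       # a: shape (m,), h: shape (n,), W: shape (m, n)
W = rng.standard_normal((m, n))
b = rng.standard_normal(m)
h = rng.standard_normal(n)

def loss(W):
    a = b + W @ h
    return 0.5 * a @ a

# The book's formula (ignoring the regularizer): grad = g h^T, an outer
# product with the same (m, n) shape as W.
a = b + W @ h
g = a                             # dJ/da for this particular quadratic loss
analytic = np.outer(g, h)

# Central finite differences on each entry W[i, j].
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(m):
    for j in range(n):
        E = np.zeros_like(W)
        E[i, j] = eps
        numeric[i, j] = (loss(W + E) - loss(W - E)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```

So numerically the outer product $g h^{(k-1)T}$ does match, and the transpose seems to be what makes the result a matrix of the same shape as $W^{(k)}$, but I don't see how to derive that from the scalar rule "the derivative of $Wh$ with respect to $W$ is $h$".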

A matrix calculus problem in backpropagation, encountered while studying *Deep Learning*