I am confused by this line:
$\nabla_{W^{(k)}}J = gh^{(k-1)T}+\lambda\nabla_{W^{(k)}}{\Omega(\theta)}$
This equation is derived from the equation below:
$a^{(k)}= b^{(k)}+W^{(k)}h^{(k-1)}$
But the gradient of $W^{(k)}h^{(k-1)}$ with respect to $W^{(k)}$ should be $h^{(k-1)}$, right?
Why is there a transpose $T$ here?
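
One way to check where the transpose comes from is to write the product component-wise. This is only a sketch under some assumptions not stated above: $g$ is taken to be $\nabla_{a^{(k)}}J$ (a column vector), the layer superscripts are dropped so that $a = b + Wh$ with $h = h^{(k-1)}$, and $m, n$ index the rows and columns of $W$.

$$a_i = b_i + \sum_j W_{ij}h_j \quad\Longrightarrow\quad \frac{\partial a_i}{\partial W_{mn}} = \delta_{im}\,h_n$$

$$\frac{\partial J}{\partial W_{mn}} = \sum_i \frac{\partial J}{\partial a_i}\frac{\partial a_i}{\partial W_{mn}} + \lambda\frac{\partial \Omega}{\partial W_{mn}} = g_m h_n + \lambda\frac{\partial \Omega}{\partial W_{mn}} = \left(gh^{T}\right)_{mn} + \lambda\frac{\partial \Omega}{\partial W_{mn}}$$

Read this way, the derivative of the vector $Wh$ with respect to the matrix $W$ is not $h$ itself but a three-index object $\delta_{im}h_n$. Contracting it with $g$ leaves the outer product $gh^{T}$, which has the same shape as $W$. A product like $gh$ without the transpose would not even be defined for two column vectors.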