Very helpful advice in this post! It’s the little changes that make the biggest difference. Thanks for sharing!
For multiple linear regression, why is the intercept term only considered for the "first" feature? I'd expected to see one intercept per feature: y = w0 + w1 * x1 + w2 + w3 * x2 + ... + w(2n-2) + w(2n-1) * xn. I'm sure this doesn't even make sense mathematically, but from a conceptual perspective, can you explain why this isn't the case?
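One way to look at this: the per-feature constants w0, w2, ... are not multiplied by anything, so they simply add up to a single number, and the model cannot tell them apart. The intercept therefore belongs to the model as a whole rather than to the first feature. The sketch below illustrates this with NumPy on synthetic data; the variable names, the true coefficients (3, 2, -1), and the noiseless setup are assumptions chosen for illustration, not something taken from the post.

```python
import numpy as np

# Synthetic, noiseless data: y = 3 + 2*x1 - 1*x2 (values chosen for illustration).
rng = np.random.default_rng(0)
n_samples = 50
x1 = rng.normal(size=n_samples)
x2 = rng.normal(size=n_samples)
y = 3.0 + 2.0 * x1 - 1.0 * x2

ones = np.ones(n_samples)

# Usual design matrix: one intercept column shared by all features.
X_single = np.column_stack([ones, x1, x2])

# "One intercept per feature": the column of ones appears twice.
X_double = np.column_stack([ones, x1, ones, x2])

# The duplicated column adds no information, so the rank does not increase.
rank_single = np.linalg.matrix_rank(X_single)
rank_double = np.linalg.matrix_rank(X_double)

# lstsq still returns a solution for the rank-deficient case (the minimum-norm one),
# but only the SUM of the two intercepts is determined by the data.
w_single, *_ = np.linalg.lstsq(X_single, y, rcond=None)
w_double, *_ = np.linalg.lstsq(X_double, y, rcond=None)
combined_intercept = w_double[0] + w_double[2]

print("rank with one intercept: ", rank_single)
print("rank with two intercepts:", rank_double)
print("single intercept:        ", w_single[0])
print("sum of two intercepts:   ", combined_intercept)
```

Both fits make the same predictions. Any split of the constant between the two intercept columns works equally well, which is why the extra intercepts are dropped and a single w0 is kept.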