As per step 3 of the pseudocode for gradient descent, we update all the weights simultaneously. If we instead updated a single weight at a time, performed the step, and only then updated the next weight, might we reach a smaller cost? Am I correct?
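To make the distinction concrete, here is a minimal sketch of the two update orders on a toy quadratic cost; all names and values are illustrative, not taken from the course code. The one-weight-at-a-time scheme is usually called coordinate descent.

```python
# Contrast simultaneous updates (standard gradient descent) with
# one-weight-at-a-time updates (coordinate descent) on a toy
# separable cost f(w) = sum_i (w_i - t_i)^2.

def grad(w, t):
    # Partial derivative of f with respect to each weight.
    return [2 * (wi - ti) for wi, ti in zip(w, t)]

def simultaneous_step(w, t, lr):
    # Standard gradient descent: compute the full gradient first,
    # then update every weight using that same gradient.
    g = grad(w, t)
    return [wi - lr * gi for wi, gi in zip(w, g)]

def coordinate_step(w, t, lr):
    # Update one weight at a time, recomputing that weight's
    # partial derivative after each single-coordinate change.
    w = list(w)
    for i in range(len(w)):
        w[i] -= lr * 2 * (w[i] - t[i])
    return w

target = [1.0, -2.0, 0.5]
w_sim = [0.0, 0.0, 0.0]
w_coord = [0.0, 0.0, 0.0]
for _ in range(100):
    w_sim = simultaneous_step(w_sim, target, 0.1)
    w_coord = coordinate_step(w_coord, target, 0.1)
```

On this cost the coordinates do not interact, so both schemes converge to the same minimum; when the weights are coupled the two orders can take different paths per iteration. Both are valid optimizers, but neither is guaranteed to reach a lower cost than the other in general.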
In the second example, when Ytrue = 0, the cost function is shown as:
but isn't it supposed to be −log(0.2) instead of −log(0.8) ?
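For reference, writing out the binary cross-entropy cost (assuming the example's predicted probability is $\hat{y} = 0.2$, which is what the numbers in the question suggest):

```latex
L(\hat{y}, y) = -\bigl[\, y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \,\bigr]
```

Setting $y = 0$ leaves only the second term, $L = -\log(1 - \hat{y})$, which for $\hat{y} = 0.2$ gives $-\log(0.8)$; the $-\log(\hat{y})$ term applies in the $y = 1$ case.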
Mapping countries to integers using a dictionary:
Wouldn't it be more efficient to use data['Country'] = data['Country'].apply(lambda s: mymap.get(s) if s in mymap else s)?
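One small simplification: `dict.get` already accepts a default value, so the conditional in that lambda is unnecessary. A pure-Python sketch (the dictionary and country names are illustrative, not from the course data):

```python
# mymap.get(s, s) returns the mapped integer when s is a known
# country, and s itself otherwise -- no membership test needed.
mymap = {"France": 0, "Spain": 1, "Germany": 2}

countries = ["France", "Spain", "Atlantis"]  # "Atlantis" has no mapping
mapped = [mymap.get(s, s) for s in countries]
# mapped == [0, 1, "Atlantis"]
```

Applied to the pandas column, that becomes `data['Country'].apply(lambda s: mymap.get(s, s))`. Note that pandas' `Series.map(mymap)` is another common option, but with a plain dict it leaves unmatched values as NaN rather than passing them through.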