I have to stick myself on to this question. Even widen it as in at the example code in lines 8 to 12, seems to be a light of the answer but with no enough clearness yet.
# standardize the data to make sure each feature contributes equally# to the distance
The example of (Stochastic) Gradient Descent does not make sense with a single Datapoint.
The Datapoint x1=2, x2=-3 and y=1 can be considered as representing a point in a
3 dimentional space with x1 on X-axis , x2 on Y-axis, and y on Z-axis.
In 3-D Coordinate Geometry, the equation of the plane is represented as ax+by+cz+d=0
which in your ML parlance, you represent it by :
y = w1x1 + w2x2 + b
Our goal, you say, is to find the weights w1, w2 and b so that the Plane (defined by above equation)
passes closest to the given Data Points (in this case only one datapoint is given).
Now, our basic knowledge of geometry tells us that, ...