The example of (Stochastic) Gradient Descent does not make sense with a single Datapoint.
The Datapoint x1=2, x2=-3 and y=1 can be considered as representing a point in a
3 dimentional space with x1 on X-axis , x2 on Y-axis, and y on Z-axis.
In 3-D Coordinate Geometry, the equation of the plane is represented as ax+by+cz+d=0
which in your ML parlance, you represent it by :
y = w1x1 + w2x2 + b
Our goal, you say, is to find the weights w1, w2 and b so that the Plane (defined by above equation)
passes closest to the given Data Points (in this case only one datapoint is given).
Now, our basic knowledge of geometry tells us that, ...