Part of course:
Linear Regression Tutorial with Example
- Linear regression
- Simple linear regression
- Multiple linear regression
- Cost functions
- Optimization using Gradient Descent
Linear Regression is a simple machine learning model for regression problems, i.e., problems where the target variable is a real-valued number.
Let's start with an example. Suppose we have a dataset with information about the area of a house (in square feet) and its price (in thousands of dollars), and our task is to build a machine learning model that can predict the price given the area. Here is what our dataset looks like:
If we plot our data, we might get something similar to the following:
In simple linear regression, we establish a relationship between the target variable and a single input variable by fitting a straight line, known as the regression line.
In general, a line can be represented by the linear equation y = m * X + b, where y is the dependent variable, X is the independent variable, m is the slope, and b is the intercept.
In machine learning, we rewrite this equation as y(x) = w0 + w1 * x, where the w's are the parameters (weights) of the model, x is the input, and y is the target variable. Different values of w0 and w1 give us different lines, as shown below.
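The prediction equation translates directly into code. In this minimal sketch, the intercept and slope values (and the house area) are made-up illustrative numbers, not learned from data:

```python
# Simple linear regression prediction: y(x) = w0 + w1 * x
# w0 (intercept) and w1 (slope) are hypothetical, hand-picked values.
def predict(x, w0, w1):
    return w0 + w1 * x

# Predicted price (in thousands of dollars) for a 1500 sq ft house,
# assuming an intercept of 50 and a slope of 0.1.
price = predict(1500, w0=50.0, w1=0.1)
print(price)  # 200.0
```

Changing w0 shifts the line up or down; changing w1 tilts it.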
The above equation can be used when we have one input variable (also called a feature). In general, however, we deal with datasets that have multiple input variables. The case where we have more than one feature is known as multiple linear regression, or simply, linear regression. We can generalize our previous equation for simple linear regression to multiple linear regression: y(x) = w0 + w1 * x1 + w2 * x2 + ... + wn * xn.
In the case of multiple linear regression, our prediction is no longer a line in 2-dimensional space but a hyperplane in n-dimensional space. For example, in 3D our plot would look as follows:
Different values of the weights (w0, w1, w2, ..., wn) give us different models, and our task is to find the weights that give us the best fit. One question you may have is: how can we determine how well a particular line fits our data? Or, given two lines, how do we determine which one is better? For this, we introduce a cost function, which measures, for a particular value of the w's, how close the predicted y's are to the corresponding true values ytrue. That is, how well a particular set of weights predicts the target values.
For linear regression, we use the mean squared error cost function: the average, over the data points (xi, yi), of the squared error between the predicted value y(xi) and the target value ytrue. In symbols, J(w) = (1/N) * sum over i of (y(xi) - yi)^2.
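The mean squared error takes only a few lines to compute. The predictions and targets below are hypothetical numbers chosen to make the arithmetic easy to follow:

```python
# Mean squared error: the average of the squared residuals between
# predictions y(x_i) and the true targets.
def mse(y_pred, y_true):
    n = len(y_true)
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / n

# Hypothetical predictions and targets for three data points.
# Each residual is +/-10, so each squared error is 100.
predictions = [210.0, 310.0, 400.0]
targets = [200.0, 300.0, 410.0]
print(mse(predictions, targets))  # 100.0
```

Squaring the residuals makes over- and under-predictions count equally and penalizes large errors more heavily.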
The cost function assigns a cost based on the distance between the true target and the predicted target (shown in the graph as lines between the sample points and the regression line), also known as the residual. The residuals are visualized below:
If a particular line is far from the points, the residuals will be large, and so will the cost function. If a line is close to the points, the residuals will be small, and the cost function will be small too.
Each value of the weight vector w gives us a corresponding cost J(w). We want to find the weights for which the cost reaches its global minimum. We can visualize this as follows:
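To make this concrete, we can evaluate the cost for a few candidate weight settings and keep the one with the lowest J(w). The dataset and the candidate weights below are hypothetical:

```python
# Evaluate the MSE cost J(w0, w1) for candidate weight pairs on a
# hypothetical dataset of (area, price) points that lie exactly on
# the line y = 50 + 0.1 * x.
data = [(1000, 150.0), (1500, 200.0), (2000, 250.0)]

def cost(w0, w1):
    n = len(data)
    return sum((w0 + w1 * x - y) ** 2 for x, y in data) / n

# Three hand-picked candidate (w0, w1) pairs.
candidates = [(0.0, 0.1), (50.0, 0.1), (100.0, 0.05)]
best = min(candidates, key=lambda w: cost(*w))
print(best, cost(*best))  # (50.0, 0.1) 0.0
```

Here the second candidate fits the data perfectly, so its cost is zero; in practice we search the continuous weight space rather than a handful of guesses.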
Note: Above we used the word "global" because the cost function for linear regression is convex (i.e., shaped like a bowl). It has a single minimum, and it increases smoothly in all directions around it.
Given the linear regression model and the cost function, we can use Gradient Descent (covered in the next article) to find a good set of values for the weight vector. This process of finding the best model among the many possible models is called optimization.
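As a preview of the next article, here is a minimal gradient-descent sketch for simple linear regression. The dataset, learning rate, and iteration count are all hypothetical choices, not prescribed values:

```python
# Gradient descent on the MSE cost for y(x) = w0 + w1 * x.
# Hypothetical (x, y_true) points lying on the line y = 1 + x.
data = [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]

w0, w1 = 0.0, 0.0   # start from arbitrary weights
lr = 0.05           # hypothetical learning rate
n = len(data)

for _ in range(2000):
    # Partial derivatives of J(w) = (1/n) * sum((w0 + w1*x - y)^2)
    grad_w0 = (2 / n) * sum(w0 + w1 * x - y for x, y in data)
    grad_w1 = (2 / n) * sum((w0 + w1 * x - y) * x for x, y in data)
    # Step downhill against the gradient.
    w0 -= lr * grad_w0
    w1 -= lr * grad_w1

print(round(w0, 2), round(w1, 2))  # close to 1.0 and 1.0
```

Because the cost surface is convex, repeatedly stepping against the gradient converges toward the single global minimum regardless of where the weights start.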