Linear Regression Process

This section walks through a step-by-step calculation of linear regression using a worked example.

Suppose we are given the following data.

| x | y  |
|---|----|
| 1 | 3  |
| 2 | 1  |
| 3 | 5  |
| 4 | 2  |
| 5 | 6  |
| 6 | 9  |
| 7 | 10 |
| 8 | 4  |
| 9 | 8  |

Plot all these data points on the graph.

[Figure: scatter plot of the nine data points]

Now, our goal is to draw the regression line: the straight line that fits these points with the least total error. No single straight line can pass through all of them.
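To make "least error" concrete, here is a minimal Python sketch. It assumes the error is measured as the sum of squared vertical distances between each point and the line, which is the usual least-squares criterion; the text does not name the criterion explicitly. The function name `sum_squared_error` and the two candidate lines are hypothetical, chosen only for illustration.

```python
# Data from the table above.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [3, 1, 5, 2, 6, 9, 10, 4, 8]


def sum_squared_error(m, c):
    """Sum of squared vertical distances between the points and y = m*x + c."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))


# Two arbitrary candidate lines (assumptions for illustration only).
flat_line_error = sum_squared_error(0.0, 5.0)   # horizontal line y = 5
diagonal_error = sum_squared_error(1.0, 0.0)    # the line y = x

print("error for y = 5:", flat_line_error)
print("error for y = x:", diagonal_error)
```

The regression line is the choice of m and c that makes this quantity as small as possible; the steps that follow compute that choice directly instead of searching for it.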

Step-1: Calculate the mean of the x values and the mean of the y values.

Note: The least-squares regression line always passes through the point (x̄, ȳ), that is, through the means of x and y.

Mean of x = 45/9 = 5.0

Mean of y = 48/9 ≈ 5.33

Now mark the point where these two means intersect, (x̄, ȳ). The regression line must pass through this point.

[Figure: scatter plot of the data with the mean point marked in blue]

Here, the blue point marks the mean of the x values and the mean of the y values.
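Step 1 can be reproduced with a few lines of Python. This is a small sketch using only the data from the table; the variable names are illustrative.

```python
# Data from the table above.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [3, 1, 5, 2, 6, 9, 10, 4, 8]

# The mean is the sum of the values divided by how many there are.
x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)

print(f"mean of x = {x_mean:.2f}")
print(f"mean of y = {y_mean:.2f}")
```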

Step-2: Calculate the value of m 

The equation of a line is y=mx+c, where m is the slope and c is the y-intercept. The slope is given by

m=\frac{\sum (x-\bar{x})(y-\bar{y})}{\sum(x-\bar{x})^{2}}

where (x-x̄) is the signed distance of each point from the mean of x (x̄=5),

and (y-ȳ) is the signed distance of each point from the mean of y.
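The slope calculation can be sketched in Python: multiply the x and y deviations point by point, add up those products, and divide by the sum of the squared x deviations. The names `s_xy` and `s_xx` are illustrative shorthand for those two sums, not notation used in the text.

```python
# Data from the table above.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [3, 1, 5, 2, 6, 9, 10, 4, 8]

x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)

# Numerator: sum over the points of (x - x_mean) * (y - y_mean).
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))

# Denominator: sum over the points of (x - x_mean) squared.
s_xx = sum((x - x_mean) ** 2 for x in xs)

m = s_xy / s_xx
print(f"sum of products = {s_xy:.2f}, sum of squares = {s_xx:.2f}, m = {m:.4f}")
```

Note that the products are formed per point before summing. Summing the x deviations and the y deviations separately and then multiplying would give zero, because deviations from a mean always sum to zero.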

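| x | y  | x-x̄ | y-ȳ   | (x-x̄)(y-ȳ) | (x-x̄)^2 |
|---|----|-----|-------|-------------|----------|
| 1 | 3  | -4  | -2.33 | 9.33        | 16       |
| 2 | 1  | -3  | -4.33 | 13.00       | 9        |
| 3 | 5  | -2  | -0.33 | 0.67        | 4        |
| 4 | 2  | -1  | -3.33 | 3.33        | 1        |
| 5 | 6  | 0   | 0.67  | 0.00        | 0        |
| 6 | 9  | 1   | 3.67  | 3.67        | 1        |
| 7 | 10 | 2   | 4.67  | 9.33        | 4        |
| 8 | 4  | 3   | -1.33 | -4.00       | 9        |
| 9 | 8  | 4   | 2.67  | 10.67       | 16       |

Then, with x̄ = 5 and ȳ = 48/9 ≈ 5.33,

\sum (x-\bar{x})(y-\bar{y})=46

\sum (x-\bar{x})^{2}=60

Therefore,

m=\frac{46}{60}\approx 0.77

Step-3: Calculate the value of c

Equation of line, y=mx+c.

The regression line passes through the mean point, so we can substitute

x=5, y=5.33 and m=46/60

Therefore,

5.33=5*(46/60)+c=3.83+c

c=1.50

Step-4: Find the equation of Regression Line using Equation of a line

From all the above-calculated values, we can conclude that our regression line intersects the y-axis at 1.50 and also passes through the point (5, 5.33).

Therefore, the equation of the Regression line would be 

y=0.77x+1.50.

Step-5: Check how well the independent variable x explains the dependent variable y. To do this, first predict the value of y for each x.

For m=0.77,

c=1.50,

y_{p}=0.77x+1.50,

Then calculate R² for the given data. Here R² measures the goodness of fit. The predicted values below use the unrounded slope 46/60.

| x | y  | y-ȳ   | yp   | yp-ȳ  | (y-ȳ)^2 | (yp-ȳ)^2 |
|---|----|-------|------|-------|---------|----------|
| 1 | 3  | -2.33 | 2.27 | -3.07 | 5.44    | 9.40     |
| 2 | 1  | -4.33 | 3.03 | -2.30 | 18.78   | 5.29     |
| 3 | 5  | -0.33 | 3.80 | -1.53 | 0.11    | 2.35     |
| 4 | 2  | -3.33 | 4.57 | -0.77 | 11.11   | 0.59     |
| 5 | 6  | 0.67  | 5.33 | 0.00  | 0.44    | 0.00     |
| 6 | 9  | 3.67  | 6.10 | 0.77  | 13.44   | 0.59     |
| 7 | 10 | 4.67  | 6.87 | 1.53  | 21.78   | 2.35     |
| 8 | 4  | -1.33 | 7.63 | 2.30  | 1.78    | 5.29     |
| 9 | 8  | 2.67  | 8.40 | 3.07  | 7.11    | 9.40     |

R^{2}=\frac{\sum (y_{p}-\bar{y})^{2}}{\sum (y-\bar{y})^{2}}

\sum (y_{p}-\bar{y})^{2}\approx 35.27

\sum (y-\bar{y})^{2}=80

R² = 35.27/80 ≈ 0.44

R² lies between 0 and 1. A value close to 0 means the regression line explains almost none of the variation in y; a value close to 1 means the predicted values track the observed values closely. Here R² ≈ 0.44, so the line accounts for about 44% of the variation in y: x and y are moderately related, with considerable scatter left unexplained. The higher the value of R², the smaller the error of the fit relative to the total variation in y.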
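The whole calculation, from the slope through R², can be sketched in Python. This is a minimal illustration, assuming R² is the sum of squared deviations of the predicted values from ȳ divided by the sum of squared deviations of the observed values from ȳ, as in the table above. All variable names are illustrative.

```python
# Data from the table above.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [3, 1, 5, 2, 6, 9, 10, 4, 8]

x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)

# Slope: sum of deviation products over sum of squared x deviations.
m = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
     / sum((x - x_mean) ** 2 for x in xs))

# Intercept: the line passes through (x_mean, y_mean), so c = y_mean - m * x_mean.
c = y_mean - m * x_mean

# Predicted y for each observed x.
predicted = [m * x + c for x in xs]

# Explained variation (predictions around the mean) and total variation
# (observations around the mean).
ss_explained = sum((yp - y_mean) ** 2 for yp in predicted)
ss_total = sum((y - y_mean) ** 2 for y in ys)

r_squared = ss_explained / ss_total
print(f"m = {m:.4f}, c = {c:.4f}, R^2 = {r_squared:.4f}")
```

Keeping `m` unrounded until the final print avoids the small drift that appears when a two-decimal slope is reused in later steps.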

[Figure: scatter plot of the data with the regression line drawn in green]

Here, the green line represents the regression line.