Sunday, 1 July 2018

Regression Unplugged I

Simple Linear Regression

In this and the following related articles, we will go through the types of regression algorithms, their calculations and applications, and how to apply them to real-life problems, along with some basic ways to carry them out.

First, we focus on calculating simple linear regression and what it actually means for prediction.

Very few people realise that this is a fundamental method of prediction. A simple linear equation is taught in class VII or earlier in school, but unfortunately not many teachers explain that it is a key concept around which the entire prediction game revolves, and the basis of the intuition behind all regression predictions.


Simple linear regression is defined as a powerful mathematical relationship between two variables that enables us to 'predict' the best approximation of one variable based on the other.

You would have come across the following equation and graph in school:

Y = mX + c

Y = dependent variable, X = independent variable, m = slope, c = intercept

Thanks to the wonderful image from Lumen below, the equation above is easy to interpret. For a set of points that lie close to a straight line, the regression line is the simplest and one of the most dependable ways to predict Y.

For any given point, the Y value can be calculated from X, provided we have calculated the slope and the intercept. So how do we do that?
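As a minimal sketch of that idea in R: once a slope and an intercept are known, predicting Y is a single line of arithmetic. The values of m and c0 below are made-up placeholders for illustration, not estimates from any data set, and predict_y is a hypothetical helper name.

```r
# Hypothetical slope and intercept, chosen only for illustration
m  <- 2   # slope: change in Y for a one-unit change in X
c0 <- 5   # intercept: value of Y when X is 0 (named c0 because c() is an R function)

# Hypothetical helper: apply Y = m * X + c to one or more X values
predict_y <- function(x, slope = m, intercept = c0) {
  slope * x + intercept
}

predict_y(3)            # 2 * 3 + 5 = 11
predict_y(c(0, 1, 2))   # 5 7 9
```

The rest of the post is about finding the slope and intercept from real data instead of picking them by hand.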

Logic of Least Squares:

Now we know what we want to do: find the best-fit line, the one that comes closest to all the points in my custom data set.

The data set is as follows:

x = {11,10,9,8,7,6,5,4,3,2,1}
y = {3757,3683,3678, 3515,3416,3437,3300,3179,3088,3004,3049}

The least squares regression line is the line that makes the vertical distances from the data points to the line as small as possible; more precisely, it minimises the sum of the squared vertical distances. In the graph, the shaded area between each point and the line should be as small as possible, and the algorithm achieves that mathematically. The crux is that the closer the line gets to the points, the better our prediction accuracy.
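To make the "as small as possible" idea concrete, here is a rough sketch in R that compares the sum of squared vertical distances for the least squares line against two other candidate lines. The helper name sse and the two alternative lines are my own assumptions, picked by eye purely for comparison.

```r
# Data set from the article
x <- c(11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)
y <- c(3757, 3683, 3678, 3515, 3416, 3437, 3300, 3179, 3088, 3004, 3049)

# Hypothetical helper: sum of squared vertical distances
# between the points and the line y = intercept + slope * x
sse <- function(intercept, slope) {
  residuals <- y - (intercept + slope * x)
  sum(residuals^2)
}

# Least squares line, fitted with R's built-in lm()
fit <- lm(y ~ x)
sse_best <- sse(coef(fit)[["(Intercept)"]], coef(fit)[["x"]])

# Two arbitrary alternative lines, for comparison only
sse_steeper <- sse(2800, 95)
sse_flatter <- sse(3000, 60)

# The least squares line has the smallest value of the three
c(best = sse_best, steeper = sse_steeper, flatter = sse_flatter)
```

Whatever other line you try, its sum of squared distances will not be smaller than that of the least squares line; that is exactly what "best fit" means here.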

I plotted the following regression line in R. Once we have it, we are in the best position to predict the next value of Y based on X.

How to do it manually

The following table shows how to calculate a linear regression using the least squares method and, yes, by hand.

X mean and Y mean are the averages of x and y, respectively, over all observations.
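The hand calculation can be sketched step by step in R, using the standard least squares formulas: slope = sum((x - X mean) * (y - Y mean)) / sum((x - X mean)^2), and intercept = Y mean - slope * X mean. The variable names below are my own; only the data comes from the article.

```r
# Data set from the article
x <- c(11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)
y <- c(3757, 3683, 3678, 3515, 3416, 3437, 3300, 3179, 3088, 3004, 3049)

# Step 1: averages of x and y
x_mean <- mean(x)   # 6
y_mean <- mean(y)   # about 3373.27

# Step 2: deviation of every observation from its mean
dx <- x - x_mean
dy <- y - y_mean

# Step 3: the two column totals needed for the slope
sxy <- sum(dx * dy)   # 8814
sxx <- sum(dx^2)      # 110

# Step 4: slope and intercept of the least squares line
slope     <- sxy / sxx                  # about 80.13
intercept <- y_mean - slope * x_mean    # about 2892.51

c(slope = slope, intercept = intercept)
```

Each step corresponds to one column or total you would fill in when doing the table by hand.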

How to do it in Excel

In the Data tab, under Data Analysis, you will find the Regression tool (it is part of the Analysis ToolPak add-in). Enter the X and Y ranges and you will get the following results.

That is a lot to digest. For now, let's look only at the cells highlighted in yellow, which are the slope and the intercept; we will discuss the rest in the next write-up.

Keep R squared in mind: it is one of the most crucial measures for a data scientist.

How to do it in R

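This is the smallest code possible in R, as there is a built-in function for regression.
# Build the data set, then fit Y as a linear function of X
df <- data.frame(X = c(11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1),
                 Y = c(3757, 3683, 3678, 3515, 3416, 3437, 3300, 3179, 3088, 3004, 3049))
reg <- lm(formula = Y ~ X, data = df)
summary(reg)  # print the coefficients, R squared and the other fit statistics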
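As a small follow-up sketch, the fitted model can be asked for its slope, intercept and R squared, and used to predict the next value of Y. Predicting at X = 12 is my own choice of "next value"; it is simply the next step after the largest X in the data.

```r
# Same data set and model as in the article
df <- data.frame(X = c(11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1),
                 Y = c(3757, 3683, 3678, 3515, 3416, 3437, 3300, 3179, 3088, 3004, 3049))
reg <- lm(Y ~ X, data = df)

# Intercept (about 2892.51) and slope (about 80.13)
coef(reg)

# R squared: share of the variation in Y explained by the line
r_squared <- summary(reg)$r.squared

# Predict Y for a new X value (X = 12 is an illustrative assumption)
next_y <- predict(reg, newdata = data.frame(X = 12))
next_y   # about 3854
```

The slope and intercept here match the ones from the hand calculation, which is a handy sanity check.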


Again, there is a lot to explain here; keep an eye out for the next post in the Regression Unplugged series.


Now for probably the favourite part of the blog: how do we apply this technique?

Start with the very basic ones: weight from height, salary from experience, runs from balls faced, game score prediction.

Moderate ones: predicting your wealth based on your investments.

Recommended: the FIFA World Cup is going on these days, so once regression is clear, try predicting the winners from historic data, e.g. how many players from which league succeed at the World Cup, and by that logic whom we should include in our fantasy set-up. I am working on a similar model and will share the details once it is completed.

Next in Line 

Well, it is not that simple mathematically. We need to understand other factors and assumptions as well; I'll write about them in Regression Unplugged II.

Let me know your thoughts in the comments, please.

To be continued ..

