# Butterfly Predictions

## Logistic Regression

It's been a while since my last post. I went through a lot of World Cup prediction models and ended up building a logistic regression model of my own, which predicted correctly around 75% of the time. Data collection was a challenge and I included a lot of manually created/transformed variables; I will publish in detail how I trained and scored that model.

Moving on in our regular learning series, we have Logistic Regression. As always, my focus will be on understanding the mathematics behind it and how we can use it in real life.

### Introduction

In the linear regression model, the dependent variable y is considered continuous, whereas in logistic regression it is categorical.
Linear regression uses the general linear equation

Y = b0 + b1x1 + E

where Y is the continuous dependent variable, the independent variables Xi are continuous, and E is the error term.

The graph above shows a basic comparison between the linear and logistic forms of regression: linear regression produces a continuous value for Y, whereas logistic regression produces a probability for Y with the help of the sigmoid function, and the output is always categorical, 1 or 0 (simple logistic regression).
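The contrast can be checked numerically. Below is a minimal sketch (with made-up illustrative coefficients, not fitted ones): the linear predictor can take any value, while the sigmoid squashes it into the range (0, 1) so it can be read as a probability.

```python
import math

def linear(x, b0=-2.0, b1=0.5):
    """Linear predictor b0 + b1*x (coefficients here are illustrative only)."""
    return b0 + b1 * x

def sigmoid(y):
    """Sigmoid squashes any real number into (0, 1)."""
    return 1 / (1 + math.exp(-y))

for x in (-10, 0, 10):
    y = linear(x)
    p = sigmoid(y)
    # y is unbounded; p always stays strictly between 0 and 1
    print(x, y, round(p, 4))
```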

### Math behind and How do we predict in Logistic Regression

Now we all know that we need to predict the y values in regression as:

Y = b0 + b1x + E ____ Eq 1

For a distribution that is non-linear, we use the sigmoid function to calculate the probability:

P = 1 / (1 + e^(-Y)) ____ Eq 2

Solve it for Y (rearrange to e^(-Y) = (1 - P) / P and take logs) and you'll get

Y = ln(p / (1 - p))

p / (1 - p) is called the odds.

What are odds?

Odds: probability of the event occurring / probability of the event not occurring, i.e. P / (1 - P)
Odds Ratio: if P1 is for the current event and P0 is for the previous event, (P1 / (1 - P1)) / (P0 / (1 - P0))
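These two definitions are easy to sanity-check in code. A minimal sketch, using hypothetical probability values rather than anything from the data below:

```python
def odds(p):
    """Odds = P(event occurring) / P(event not occurring)."""
    return p / (1 - p)

def odds_ratio(p1, p0):
    """Ratio of the odds at probability p1 to the odds at probability p0."""
    return odds(p1) / odds(p0)

p_win = 0.6                        # hypothetical probability of winning
print(round(odds(p_win), 2))       # 0.6 / 0.4 = 1.5
print(round(odds_ratio(0.6, 0.5), 2))
```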

Substituting in Eq 1, we get

ln(p / (1 - p)) = b0 + b1x ____ Eq 3

The plan is to get the probability and then decide an optimum cutoff (based on understanding of the problem), above which we'll define the target to be 1 and below which we'll call it 0 (only 2 classes as of now).

Something similar is explained, with a lot more clarity, in the graphs by Udemy.
For every x value we calculate the probability and, based on that, decide whether to mark it 1 or 0; to calculate the probability we use the sigmoid function above. Logistic regression does not use OLS (Ordinary Least Squares) for parameter estimation. Instead, it uses maximum likelihood estimation (MLE).

To explain further, consider a scenario where a football team scores in the first half and we want to see what the probability is of that team winning.

| Goals | 1 | 2 | 2 | 3 | 4 | 3 | 1 | 2 | 1 | 2 | 2 | 3 | 4 | 5 | 0 | 1 |
|-------|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Win   | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 |
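The post fits this model in R, but the MLE idea can be sketched directly: maximise the Bernoulli log-likelihood of the 16 observations. The snippet below uses plain gradient ascent for illustration (R's `glm` actually uses iteratively reweighted least squares, so this is a rough cross-check, not the same algorithm).

```python
import math

goals = [1, 2, 2, 3, 4, 3, 1, 2, 1, 2, 2, 3, 4, 5, 0, 1]
win   = [0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0]

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Maximum likelihood estimation by plain gradient ascent on the
# Bernoulli log-likelihood; the gradient for logistic regression is
# sum over observations of (y - p) and (y - p) * x.
b0, b1, lr = 0.0, 0.0, 0.01
for _ in range(20000):
    g0 = sum(y - sigmoid(b0 + b1 * x) for x, y in zip(goals, win))
    g1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in zip(goals, win))
    b0 += lr * g0
    b1 += lr * g1

# b1 should come out positive: more first-half goals, higher win probability
print(round(b0, 3), round(b1, 3))
```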

From R:

The summary of the logistic regression in R gives:

```
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.08333    0.19754  -0.422  0.67953
LR$Goals      0.25926    0.07603   3.410  0.00423 **
```

The output indicates that goals scored in the first half hold good significance.
The intercept (b0/c) is -0.08333 and the Goals coefficient (b1/m) is 0.25926.
Going forward, these coefficients are entered in the logistic regression equation to estimate the odds (equivalently, the probability) of the team winning:

From Eq 2,

P = 1 / (1 + e^(-Y))

or P = 1 / (1 + e^(-(b0 + b1x)))

Now we know b0 and b1 from our regression coefficients, we can calculate the probabilities.

| Goals in first half | 1    | 2    | 3    | 4    | 5    |
|---------------------|------|------|------|------|------|
| P(winning)          | 0.54 | 0.60 | 0.66 | 0.71 | 0.76 |

And we decide on an optimum cutoff of 0.60, assuming that a team scoring 2 or more goals in the first half has a very good chance of winning, and we declare it the winner. Of course there are numerous other factors involved, but this is one way of looking at it.
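Plugging the fitted coefficients from the R summary into Eq 2 reproduces the probability table (values may differ from the table in the last decimal place purely due to rounding), and the 0.60 cutoff then turns each probability into a win/lose call:

```python
import math

b0, b1 = -0.08333, 0.25926   # coefficients from the R summary above

def p_win(goals):
    """Eq 2 with the fitted coefficients: P = 1 / (1 + e^-(b0 + b1*x))."""
    return 1 / (1 + math.exp(-(b0 + b1 * goals)))

cutoff = 0.60
for g in range(1, 6):
    p = p_win(g)
    label = 1 if p >= cutoff else 0   # declare a winner above the cutoff
    print(g, round(p, 2), label)
```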

### Performance of Logistic Regression Model

#### AIC (Akaike Information Criteria) and Deviance –

Analogous to adjusted R² in multiple linear regression, in logistic regression we have AIC.
AIC is a measure of fit that penalizes the model for the number of model coefficients, which helps prevent you from including irrelevant predictor variables. Therefore, we always prefer the model with the minimum AIC value.

```
Summary of Model
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.08333    0.19754  -0.422  0.67953
LR$Goals      0.25926    0.07603   3.410  0.00423 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 0.1560847)
Null deviance: 4.0000  on 15  degrees of freedom
Residual deviance: 2.1852  on 14  degrees of freedom
  (400 observations deleted due to missingness)
AIC: 19.552
```
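The general formula behind that AIC line is 2k − 2·ln(L), where k is the number of estimated parameters and L is the maximised likelihood. A minimal sketch with hypothetical numbers (not an attempt to reproduce the 19.552 above, which depends on the model family):

```python
def aic(log_likelihood, k):
    """AIC = 2k - 2*ln(L): fit rewarded, parameter count penalised."""
    return 2 * k - 2 * log_likelihood

# Two hypothetical models: the second fits slightly better (higher
# log-likelihood) but uses three extra coefficients, so AIC prefers the first.
print(aic(-8.0, 2))   # 20.0
print(aic(-7.5, 5))   # 25.0
```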

#### Deviance

It is a measure of goodness of fit of a generalized linear model: the higher the deviance, the poorer the model fit.

The summary of the model says:

Null deviance: 4.0000 on 15 degrees of freedom
Residual deviance: 2.1852 on 14 degrees of freedom

Null deviance indicates how well the response is predicted by a model with nothing but an intercept. The lower the value, the better the model.

Residual deviance indicates how well the response is predicted by the model after adding the independent variables. The lower the value, the better the model.

A residual deviance lower than the null deviance indicates that the model became better when it included the extra variables.
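For a binary response, deviance is −2 times the Bernoulli log-likelihood of the fitted probabilities. A sketch with hypothetical outcomes and fitted probabilities (illustrative values only, not the R output above): the intercept-only model's deviance plays the role of the null deviance, and a model whose probabilities track the outcomes gets a lower deviance.

```python
import math

def binomial_deviance(y, p):
    """Deviance for binary outcomes: -2 * sum of Bernoulli log-likelihoods."""
    return -2 * sum(
        yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
        for yi, pi in zip(y, p)
    )

y      = [0, 1, 1, 0]             # hypothetical observed outcomes
p_fit  = [0.2, 0.8, 0.7, 0.3]     # fitted probabilities from some model
p_null = [0.5, 0.5, 0.5, 0.5]     # intercept-only model: constant probability

print(round(binomial_deviance(y, p_fit), 3))
print(round(binomial_deviance(y, p_null), 3))  # the "null deviance" is larger
```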

#### Confusion Matrix

It is just a comparison of the actual and predicted values of Y, and it allows us to measure accuracy and spot overfitting. We'll have an additional write-up on the confusion matrix, but in brief, accuracy is calculated as:

Accuracy = (True Positives + True Negatives) / (True Positives + True Negatives + False Positives + False Negatives)
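The accuracy formula above in code, with hypothetical confusion-matrix counts:

```python
def accuracy(tp, tn, fp, fn):
    """Correct predictions (TP + TN) over all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts: 85 of 100 predictions were correct
print(accuracy(tp=40, tn=45, fp=5, fn=10))  # 0.85
```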

#### Let me know what you think in the comments 👇
