# Butterfly Predictions

## Classification III : Bayes' Theorem

Bayes' theorem is arguably one of the most important concepts for making probability predictions from an existing data set, and it is the reason I got started with butterfly predictions.

Bayes' theorem gives the probability of an event based on prior knowledge of conditions that might be related to the event.

P(A|B) = P(B|A) * P(A) / P(B)

Mathematics

Now let's take a use case. I have two email accounts: one with Gmail, which I'll call G, and one with Outlook, which I'll call O.

The goal is to find the probability that the next Outlook mail is spam.

Details

- G receives 20 mails per day
- O receives 10 mails per day
- Out of all mails, 5% are spam; of all spam, 50% comes from G and 50% from O

Solution:

1. I receive 30 mails a day, so any new mail has the following source probabilities:

P(G) = 20/30 ≈ 0.67
P(O) = 10/30 ≈ 0.33

2. The probability that any given mail is spam is

P(S) = 5% = 5/100 = 0.05

3. Since 50% of spam comes from Gmail, the probability that a spam mail's source is Gmail is 50% (of total spam), which implies

P(G|S) = 50% = 0.5

and similarly for Outlook:

P(O|S) = 50% = 0.5

4. Now the probability that a mail is spam, given that it came from Outlook, is written as

P(S|O) = ?

Applying Bayes' theorem:

P(Spam|Outlook) = P(Outlook|Spam) * P(Spam) / P(Outlook)

P(S|O) = P(O|S) * P(S) / P(O) = 0.5 * 0.05 / 0.33 ≈ 0.075 (7.5%)

Conclusion: Out of every 100 mails coming from Outlook, you can expect 7-8 to be spam.

P(Spam|Gmail) = P(Gmail|Spam) * P(Spam) / P(Gmail)

P(S|G) = P(G|S) * P(S) / P(G) = 0.5 * 0.05 / 0.67 ≈ 0.0375 (3.75%)

Conclusion: Out of every 100 mails coming from Gmail, you can expect nearly 4 to be spam.
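The arithmetic above can be checked in a few lines of base R (the variable names are my own):

```r
# Priors for a mail's source and the overall spam rate
p_g <- 20 / 30        # P(G): mail comes from Gmail
p_o <- 10 / 30        # P(O): mail comes from Outlook
p_s <- 0.05           # P(S): any mail is spam

# Conditionals: share of spam coming from each account
p_g_given_s <- 0.5    # P(G|S)
p_o_given_s <- 0.5    # P(O|S)

# Bayes' theorem: P(S|source) = P(source|S) * P(S) / P(source)
p_s_given_o <- p_o_given_s * p_s / p_o
p_s_given_g <- p_g_given_s * p_s / p_g

p_s_given_o   # 0.075  -> 7.5% of Outlook mail is spam
p_s_given_g   # 0.0375 -> 3.75% of Gmail mail is spam
```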

## Part II : Naive Bayes in Machine Learning

Naive Bayes classifiers are simple probabilistic classifiers based on applying Bayes' theorem with naive independence assumptions between the features.

Abstractly, given n features (independent variables) f1, f2, f3, ..., fn and an outcome class C, and writing F for the feature vector, the posterior probability is:

P(C|F) = P(F|C) * P(C) / P(F)

In Bayesian terms:

Posterior = Prior * Likelihood / Evidence
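Under the naive independence assumption the likelihood factorizes as P(F|C) = P(f1|C) * P(f2|C) * ... * P(fn|C), so the posterior can be computed by multiplying per-feature conditionals and normalizing by the evidence. A minimal sketch in R, with purely illustrative numbers for two features and two classes:

```r
# Illustrative priors and per-feature conditionals
# for two classes ("yes"/"no") and two features f1, f2
prior      <- c(yes = 0.5, no = 0.5)
p_f1_given <- c(yes = 0.6, no = 0.3)   # P(f1 | class)
p_f2_given <- c(yes = 0.7, no = 0.4)   # P(f2 | class)

# Prior * likelihood, with the likelihood factorized per feature
post <- prior * p_f1_given * p_f2_given

# Normalize by the evidence (sum over classes) to get the posterior
post <- post / sum(post)
post   # yes ~ 0.78, no ~ 0.22
```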

How to apply the Theorem: Example

We have a data set that records car sales with respect to a few attributes. Using a Naive Bayes classifier, we would like to predict, based on these features, the chances of selling a car. The following R code shows in a few simple lines how we can achieve that.

### R Code: Naive Bayes Classifier

```r
library(e1071)
library(MASS)
library(caTools)
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
library(xlsx)
```

#### Loading the data; the DA frame drops the serial-number column (SNo), which is not required for classification

```r
Data <- read.xlsx(file = "NBDS.xlsx", sheetName = "Sheet2", header = TRUE)
DA <- Data[, 2:5]
DA
##     Color   Type   Origin Sold
## 1     Red Sports Domestic    1
## 2     Red Sports Domestic    0
## 3     Red Sports Domestic    1
## 4  Yellow Sports Domestic    0
## 5  Yellow Sports Imported    1
## 6  Yellow    SUV Imported    0
## 7  Yellow    SUV Imported    1
## 8  Yellow    SUV Domestic    0
## 9     Red    SUV Imported    0
## 10    Red Sports Imported    1
```

#### Splitting the data into training and test sets with a 75% split ratio; although this is a very small set, the split is required for validation

```r
DA$Sold <- as.factor(DA$Sold)
set.seed(123)
split = sample.split(DA$Sold, SplitRatio = 0.75)
train = subset(DA, split == TRUE)
test = subset(DA, split == FALSE)
str(DA)
## 'data.frame':    10 obs. of  4 variables:
##  $ Color : Factor w/ 2 levels "Red","Yellow": 1 1 1 2 2 2 2 2 1 1
##  $ Type  : Factor w/ 2 levels "Sports","SUV": 1 1 1 1 1 2 2 2 2 1
##  $ Origin: Factor w/ 2 levels "Domestic","Imported": 1 1 1 1 2 2 2 1 2 2
##  $ Sold  : Factor w/ 2 levels "0","1": 2 1 2 1 2 1 2 1 1 2
```

#### Creation of the Model

```r
NB_Model <- naiveBayes(x = train[, 1:3], y = train$Sold)
NB_Model
##
## Naive Bayes Classifier for Discrete Predictors
##
## Call:
## naiveBayes.default(x = train[, 1:3], y = train$Sold)
##
## A-priori probabilities:
## train$Sold
##   0   1
## 0.5 0.5
##
## Conditional probabilities:
##           Color
## train$Sold Red Yellow
##          0 0.5    0.5
##          1 0.5    0.5
##
##           Type
## train$Sold Sports  SUV
##          0   0.50 0.50
##          1   0.75 0.25
##
##           Origin
## train$Sold Domestic Imported
##          0     0.75     0.25
##          1     0.50     0.50
```
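As a sanity check, the printed a-priori and conditional probabilities can be combined by hand to score a new car, say a Red, Domestic SUV. This mirrors what `naiveBayes` does internally for discrete predictors (assuming no Laplace smoothing, which is the default):

```r
# Values read off the model output above
prior <- c(`0` = 0.5,  `1` = 0.5)    # A-priori probabilities
p_red <- c(`0` = 0.5,  `1` = 0.5)    # P(Color = Red      | Sold)
p_suv <- c(`0` = 0.5,  `1` = 0.25)   # P(Type  = SUV      | Sold)
p_dom <- c(`0` = 0.75, `1` = 0.5)    # P(Origin = Domestic | Sold)

# Prior * likelihood for each class, then normalize
post <- prior * p_red * p_suv * p_dom
post_norm <- post / sum(post)
post_norm   # 0: 0.75, 1: 0.25 -> the model predicts "not sold"
```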

#### Scoring and Confusion Matrix

```r
pred <- predict(NB_Model, newdata = DA)
cm <- confusionMatrix(data = pred, reference = DA$Sold)
cm
## Confusion Matrix and Statistics
##
##           Reference
## Prediction 0 1
##          0 5 3
##          1 0 2
##
##                Accuracy : 0.7
##                  95% CI : (0.3475, 0.9333)
##     No Information Rate : 0.5
##     P-Value [Acc > NIR] : 0.1719
##
##                   Kappa : 0.4
##  Mcnemar's Test P-Value : 0.2482
##
##             Sensitivity : 1.000
##             Specificity : 0.400
##          Pos Pred Value : 0.625
##          Neg Pred Value : 1.000
##              Prevalence : 0.500
##          Detection Rate : 0.500
##    Detection Prevalence : 0.800
##       Balanced Accuracy : 0.700
##
##        'Positive' Class : 0
##
```
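The reported 70% accuracy follows directly from the confusion matrix: (5 + 2) correct predictions out of 10:

```r
# Confusion matrix as printed above: rows = Prediction, cols = Reference
cm_counts <- matrix(c(5, 0, 3, 2), nrow = 2,
                    dimnames = list(Prediction = c("0", "1"),
                                    Reference  = c("0", "1")))

# Accuracy = correct predictions (diagonal) / total observations
accuracy <- sum(diag(cm_counts)) / sum(cm_counts)
accuracy   # 0.7
```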

Summary: We get 70% accuracy on our set; given how small and scattered the data is, that is fine for such a sample. In upcoming articles we will look at how to measure accuracy and achieve better results with classification models.

Let me know in the comments section if you need additional details.