Saturday, 18 August 2018

Recommendation System III


R Code for Market Basket Analysis : Explained

Market Basket Analysis


Recommendation using Market Basket Analysis (association rules) is explained in the article above. To understand it further, we will go through the code in R and see how to implement the algorithm.


Use case: a grocery store's sales are recorded in a sheet, and we want to place the items that sell well together next to each other, so that sales increase and slow-selling items also get their share of visibility.

Here is the source sheet.
Data Set

The code is explained in detail below, as R Markdown output.



Step 1: Load the required libraries

library(ggplot2)
library(arules)
## Loading required package: Matrix
## 
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
## 
##     abbreviate, write
library(arulesViz)
## Loading required package: grid
library(datasets)
library(rattle)
## Rattle: A free graphical interface for data science with R.
## Version 5.2.0 Copyright (c) 2006-2018 Togaware Pty Ltd.
## Type 'rattle()' to shake, rattle, and roll your data.

Step 2: Load the source file

This file contains the groceries data in spreadsheet (CSV) format. For the arules package we need to convert the columnar data to the transactions format.
GR<-read.csv("groceries.csv",header = FALSE)
GRT<-read.transactions("groceries.csv",format="basket",sep=",")
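To make the "basket" format more concrete, here is a minimal sketch in base R. The three lines in basket_lines are made-up examples, not rows from groceries.csv, and the variable names are only illustrative. It assumes, as format="basket" suggests, that each line of the file is one transaction with its items separated by commas.

```r
# Hypothetical lines as they might appear in a basket-format CSV file:
# one transaction per line, items separated by commas, no header row.
basket_lines <- c(
  "whole milk,yogurt,rolls/buns",
  "other vegetables,whole milk",
  "rolls/buns"
)

# Split each line on the separator to get one character vector per transaction.
baskets <- strsplit(basket_lines, ",", fixed = TRUE)

# Transactions have different lengths, which is why a rectangular
# data frame is an awkward fit for this kind of data.
basket_sizes <- lengths(baskets)
print(basket_sizes)
```

This is only meant to show the shape of the data. In the actual analysis read.transactions() does the parsing and returns a transactions object.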
Basic EDA: we look at the top 20 items by frequency to find the best-selling items in the set.
Much more EDA could be done before customizing the rules for a product. For now, since whole milk has the highest frequency, we will analyse which items can be placed with whole milk.
itemFrequencyPlot(GRT,topN=20,type="absolute")  
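As a rough illustration of what the frequency plot counts, the sketch below computes item frequencies by hand in base R. The baskets list is invented toy data, not the groceries file. With type="absolute" the plot shows counts like item_counts, and with type="relative" it would show shares like item_share.

```r
# Toy baskets (made up for illustration only).
baskets <- list(
  c("whole milk", "yogurt", "rolls/buns"),
  c("other vegetables", "whole milk"),
  c("rolls/buns"),
  c("whole milk", "other vegetables", "yogurt")
)

# Count in how many baskets each item appears (each item once per basket).
item_counts <- sort(table(unlist(lapply(baskets, unique))), decreasing = TRUE)

# Relative frequency: the share of baskets that contain the item.
item_share <- item_counts / length(baskets)

print(item_counts)
print(item_share)
```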

Step 3: Creating first set of rules (All Items)

rules_ap <- apriori(GRT, parameter = list( support=0.001, confidence = 0.8))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target   ext
##      10  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 9 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.02s].
## writing ... [410 rule(s)] done [0.00s].
## creating S4 object  ... done [0.01s].
summary(rules_ap)                    
## set of 410 rules
## 
## rule length distribution (lhs + rhs):sizes
##   3   4   5   6 
##  29 229 140  12 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.000   4.000   4.000   4.329   5.000   6.000 
## 
## summary of quality measures:
##     support           confidence          lift            count      
##  Min.   :0.001017   Min.   :0.8000   Min.   : 3.131   Min.   :10.00  
##  1st Qu.:0.001017   1st Qu.:0.8333   1st Qu.: 3.312   1st Qu.:10.00  
##  Median :0.001220   Median :0.8462   Median : 3.588   Median :12.00  
##  Mean   :0.001247   Mean   :0.8663   Mean   : 3.951   Mean   :12.27  
##  3rd Qu.:0.001322   3rd Qu.:0.9091   3rd Qu.: 4.341   3rd Qu.:13.00  
##  Max.   :0.003152   Max.   :1.0000   Max.   :11.235   Max.   :31.00  
## 
## mining info:
##  data ntransactions support confidence
##   GRT          9835   0.001        0.8
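The summary above reports support, confidence and lift for each rule. As a hedged sketch of what these three measures mean, the base R code below computes them by hand for one rule, {yogurt} => {whole milk}, on a small made-up set of baskets. The data and the helper contains_all() are hypothetical and not part of arules.

```r
# Toy baskets (made up for illustration only).
baskets <- list(
  c("whole milk", "yogurt"),
  c("whole milk", "other vegetables"),
  c("whole milk", "yogurt", "other vegetables"),
  c("rolls/buns"),
  c("yogurt")
)
n <- length(baskets)

# Hypothetical helper: TRUE for every basket that contains all given items.
contains_all <- function(items) {
  vapply(baskets, function(b) all(items %in% b), logical(1))
}

lhs <- "yogurt"
rhs <- "whole milk"

# support: share of all baskets that contain both sides of the rule
support_rule <- sum(contains_all(c(lhs, rhs))) / n
# confidence: of the baskets containing the lhs, the share that also contain the rhs
confidence <- support_rule / (sum(contains_all(lhs)) / n)
# lift: confidence relative to how common the rhs is overall
lift <- confidence / (sum(contains_all(rhs)) / n)

print(c(support = support_rule, confidence = confidence, lift = lift))
```

A lift above 1 means the two sides appear together more often than they would if they were independent. In the run above, support=0.001 on 9835 transactions corresponds to roughly 10 transactions, which matches the smallest count of 10 in the summary.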

Step 4: Refining the rules for one product (whole milk)

Now we will see which items can be sold with whole milk.
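# Allow only whole milk on the left-hand side; any item may appear on the right
rules_milk <- apriori(GRT, parameter = list(supp = 0.001, conf = 0.15, minlen = 2),
                      appearance = list(default = "rhs", lhs = "whole milk"),
                      control = list(verbose = FALSE))
# Sort so that the highest-confidence rules come first
rules_milk <- sort(rules_milk, decreasing = TRUE, by = "confidence")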
inspect(rules_milk[1:5])
##     lhs             rhs                support    confidence lift    
## [1] {whole milk} => {other vegetables} 0.07483477 0.2928770  1.513634
## [2] {whole milk} => {rolls/buns}       0.05663447 0.2216474  1.205032
## [3] {whole milk} => {yogurt}           0.05602440 0.2192598  1.571735
## [4] {whole milk} => {root vegetables}  0.04890696 0.1914047  1.756031
## [5] {whole milk} => {tropical fruit}   0.04229792 0.1655392  1.577595
##     count
## [1] 736  
## [2] 557  
## [3] 551  
## [4] 481  
## [5] 416
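The columns of this output are related to each other, and a quick check in base R makes that visible. The sketch below takes the numbers of the first rule, {whole milk} => {other vegetables}, from the output above and the transaction count from the apriori log. The variable names are only illustrative.

```r
# Values copied from the first rule in the output above.
n_transactions <- 9835        # from the apriori log
count_rule     <- 736
support_rule   <- 0.07483477
confidence     <- 0.2928770
lift           <- 1.513634

# support = count / number of transactions
support_check <- count_rule / n_transactions

# confidence = support(rule) / support(lhs), so the support of {whole milk} is implied
support_milk <- support_rule / confidence

# lift = confidence / support(rhs), so the support of {other vegetables} is implied
support_veg <- confidence / lift

print(round(c(support_check, support_milk, support_veg), 4))
```

Read this way, about a quarter of all baskets contain whole milk, and other vegetables show up roughly 1.5 times as often in those baskets as they do in baskets overall.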

Step 5 : Analysis and plotting top rules

plot(rules_milk,method="graph")


Conclusion :

Place the products linked to whole milk (for example other vegetables, rolls/buns and yogurt) near it and watch for an increase in sales.

Please let me know in the comments section if you need a separate R script.


