Recommendation Systems (R)

Association mining is a common technique for finding associations among many variables. It is often used to make product recommendations by identifying products that are frequently bought together, and it is popular with grocery stores, e-commerce websites, and anyone else with a large transaction database. A familiar everyday example: when you order something on Amazon, the site suggests what else you might want to buy.

Based on the concept of strong rules, Rakesh Agrawal, Tomasz Imieliński and Arun Swami introduced association rules for discovering regularities between products in large-scale transaction data recorded by point-of-sale (POS) systems in supermarkets.

Association mining in R

Michael Hahsler has authored and maintains two very useful R packages relating to association rule mining: the arules package and the arulesViz package.

library(tidyverse)
library(arulesViz)
library(arules)
library(kableExtra)

Data and EDA

A public dataset of purchase records from a grocery store is used here. Each record is one transaction (a basket of items), and the first few look like this.

data <- read.transactions('..\\data\\groceries.csv', format = 'basket', sep=',')
inspect(head(data))

##     items                      
## [1] {citrus fruit,             
##      margarine,                
##      ready soups,              
##      semi-finished bread}      
## [2] {coffee,                   
##      tropical fruit,           
##      yogurt}                   
## [3] {whole milk}               
## [4] {cream cheese,             
##      meat spreads,             
##      pip fruit,                
##      yogurt}                   
## [5] {condensed milk,           
##      long life bakery product, 
##      other vegetables,         
##      whole milk}               
## [6] {abrasive cleaner,         
##      butter,                   
##      rice,                     
##      whole milk,               
##      yogurt}
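To make the 'basket' format concrete, here is a minimal base R sketch. It assumes a small, made-up three-line file (not the real groceries.csv) in which each line is one transaction and items are separated by commas, which is the layout that `format = 'basket'` with `sep=','` expects. The names `basket_lines`, `path` and `baskets` are illustrative only.

```r
# Hypothetical three-line basket file: one transaction per line,
# items separated by commas (made-up content, not groceries.csv).
basket_lines <- c(
  "citrus fruit,margarine,ready soups",
  "coffee,tropical fruit,yogurt",
  "whole milk"
)
path <- tempfile(fileext = ".csv")
writeLines(basket_lines, path)

# Base R view of the same structure: a list with one character
# vector of item names per transaction.
baskets <- strsplit(readLines(path), ",", fixed = TRUE)
basket_sizes <- lengths(baskets)

print(baskets)
print(basket_sizes)
```

Each element of `baskets` plays the role of one `{...}` row in the `inspect()` output.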

I want to find the most frequently bought items.

itemFrequencyPlot(data, topN = 20, type = "absolute")

Next, I want to find the items that are frequently bought together.

freq.items <- eclat(data, parameter = list(supp = 0.01, maxlen = 15))

## Eclat
## 
## parameter specification:
##  tidLists support minlen maxlen            target  ext
##     FALSE    0.01      1     15 frequent itemsets TRUE
## 
## algorithmic control:
##  sparse sort verbose
##       7   -2    TRUE
## 
## Absolute minimum support count: 98 
## 
## create itemset ... 
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.01s].
## sorting and recoding items ... [88 item(s)] done [0.00s].
## creating sparse bit matrix ... [88 row(s), 9835 column(s)] done [0.00s].
## writing  ... [333 set(s)] done [0.01s].
## Creating S4 object  ... done [0.00s].

inspect(head(freq.items))

##     items                           support    count
## [1] {hard cheese, whole milk}       0.01006609  99  
## [2] {butter milk, whole milk}       0.01159126 114  
## [3] {butter milk, other vegetables} 0.01037112 102  
## [4] {ham, whole milk}               0.01148958 113  
## [5] {sliced cheese, whole milk}     0.01077783 106  
## [6] {oil, whole milk}               0.01128622 111
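The idea of a frequent itemset can be illustrated with a brute-force count in base R. This is only a sketch on made-up toy transactions: it enumerates every pair of items and keeps those seen in at least `min_count` baskets. It is not the Eclat algorithm itself, which reaches the same kind of result far more efficiently, and all names and data below are hypothetical.

```r
# Toy transactions (hypothetical data, not the groceries file).
transactions <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("eggs", "milk"),
  c("bread", "butter"),
  c("bread", "eggs", "milk")
)

# Enumerate all candidate item pairs.
items <- sort(unique(unlist(transactions)))
pairs <- combn(items, 2, simplify = FALSE)

# For each pair, count the baskets that contain both items.
pair_count <- vapply(pairs, function(p) {
  sum(vapply(transactions, function(basket) all(p %in% basket), logical(1)))
}, integer(1))
names(pair_count) <- vapply(pairs, paste, character(1), collapse = ", ")

# Keep pairs that appear in at least min_count baskets
# (an assumed cutoff, analogous to a minimum support).
min_count <- 2L
frequent_pairs <- pair_count[pair_count >= min_count]

# Show the support (share of all baskets) of each frequent pair.
print(frequent_pairs / length(transactions))
```

Enumerating every candidate grows combinatorially with the number of items, which is why dedicated algorithms are used on real transaction data.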

Product recommendation rules

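Three measures describe an association rule A => B: support, confidence, and lift. The apriori call below sets minimum cutoffs for support and confidence, which control how many rules are generated; lift is reported for each rule and is useful for ranking.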

Support is an indication of how frequently the item set appears in the data set.
$$ Support = \frac{Number\, of\, transactions\, with\, both\, A\, and\, B}{Total\, number\, of\, transactions} = P\left(A \cap B\right) $$

Confidence is an indication of how often the rule has been found to be true.
$$ Confidence = \frac{Number\, of\, transactions\, with\, both\, A\, and\, B}{Total\, number\, of\, transactions\, with\, A} = \frac{P\left(A \cap B\right)}{P\left(A\right)} $$

Lift is the factor by which the co-occurrence of A and B exceeds what would be expected if A and B were independent. The expected confidence is simply P(B), so a lift above 1 means A and B occur together more often than independence would predict, and the higher the lift, the stronger the association.
$$ Lift = \frac{Confidence}{Expected\, Confidence} = \frac{P\left(A \cap B\right)}{P\left(A\right) \cdot P\left(B\right)} $$
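The three definitions can be checked by hand on a few made-up transactions. The sketch below uses base R only; the toy data and the helper `count_with` are hypothetical and stand in for what arules computes internally.

```r
# Toy transactions (hypothetical data, not the groceries file).
transactions <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("eggs", "milk"),
  c("bread", "butter"),
  c("bread", "eggs", "milk")
)
n <- length(transactions)

# Count the transactions that contain every item of `itemset`.
count_with <- function(itemset) {
  sum(vapply(transactions, function(basket) all(itemset %in% basket), logical(1)))
}

# Rule under study: {bread} => {milk}
A <- "bread"
B <- "milk"
n_A  <- count_with(A)        # transactions with A
n_B  <- count_with(B)        # transactions with B
n_AB <- count_with(c(A, B))  # transactions with both A and B

support    <- n_AB / n                # P(A and B)
confidence <- n_AB / n_A              # P(A and B) / P(A)
lift       <- confidence / (n_B / n)  # confidence / P(B)

print(c(support = support, confidence = confidence, lift = lift))
```

In this toy data the lift comes out slightly below 1: milk is in 4 of 5 baskets overall but only 3 of the 4 bread baskets, so buying bread does not raise the chance of buying milk.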

rules <- apriori(data, parameter = list(support = 0.0015, confidence = 0.9))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.9    0.1    1 none FALSE            TRUE       5  0.0015      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 14 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.01s].
## sorting and recoding items ... [153 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.01s].
## writing ... [7 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

rules_optim <- rules[]
inspect(rules_optim)

##     lhs                         rhs                    support confidence    coverage      lift count
## [1] {liquor,                                                                                         
##      red/blush wine}         => {bottled beer}     0.001931876  0.9047619 0.002135231 11.235269    19
## [2] {flour,                                                                                          
##      root vegetables,                                                                                
##      whipped/sour cream}     => {whole milk}       0.001728521  1.0000000 0.001728521  3.913649    17
## [3] {cream cheese,                                                                                   
##      other vegetables,                                                                               
##      sugar}                  => {whole milk}       0.001525165  0.9375000 0.001626843  3.669046    15
## [4] {butter,                                                                                         
##      pip fruit,                                                                                      
##      whipped/sour cream}     => {whole milk}       0.001830198  0.9000000 0.002033554  3.522284    18
## [5] {domestic eggs,                                                                                  
##      tropical fruit,                                                                                 
##      whipped/sour cream}     => {whole milk}       0.001830198  0.9000000 0.002033554  3.522284    18
## [6] {fruit/vegetable juice,                                                                          
##      tropical fruit,                                                                                 
##      whipped/sour cream}     => {other vegetables} 0.001931876  0.9047619 0.002135231  4.675950    19
## [7] {root vegetables,                                                                                
##      sausage,                                                                                        
##      tropical fruit,                                                                                 
##      yogurt}                 => {whole milk}       0.001525165  0.9375000 0.001626843  3.669046    15

lhs is the “left-hand side” and rhs is the “right-hand side” of a rule. The first row, for example, reads: if a customer buys liquor and red/blush wine (the lhs column), the customer will also buy bottled beer (the rhs column). Its confidence of about 0.90 means this held in roughly 90% of the transactions that contained both lhs items.
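As a sanity check, the measures printed for the first rule can be reproduced from its counts. The sketch below takes 9835 transactions from the apriori log and 19 from the count column; the lhs count of 21 is not printed directly and is inferred here from the coverage column (0.002135231 × 9835), so treat it as a derived assumption.

```r
n_total <- 9835  # transactions, from the apriori log
n_both  <- 19    # count column of rule [1]: baskets with lhs and rhs
n_lhs   <- 21    # inferred from coverage: 0.002135231 * 9835

support    <- n_both / n_total  # share of all baskets with lhs and rhs
coverage   <- n_lhs / n_total   # share of all baskets with the lhs
confidence <- n_both / n_lhs    # share of lhs baskets that also hold the rhs

# Lift is confidence divided by the overall share of rhs baskets,
# so the printed lift implies that share.
lift_printed <- 11.235269
p_rhs <- confidence / lift_printed

print(round(c(support = support, coverage = coverage,
              confidence = confidence, p_rhs = p_rhs), 6))
```

The implied share of baskets with bottled beer is about 8%, so customers who buy liquor and red/blush wine buy bottled beer roughly 11 times as often as customers overall.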

The rules can be sorted in descending order of confidence, support, and lift.

inspect(sort(rules_optim, by = "confidence", decreasing = TRUE))

##     lhs                         rhs                    support confidence    coverage      lift count
## [1] {flour,                                                                                          
##      root vegetables,                                                                                
##      whipped/sour cream}     => {whole milk}       0.001728521  1.0000000 0.001728521  3.913649    17
## [2] {cream cheese,                                                                                   
##      other vegetables,                                                                               
##      sugar}                  => {whole milk}       0.001525165  0.9375000 0.001626843  3.669046    15
## [3] {root vegetables,                                                                                
##      sausage,                                                                                        
##      tropical fruit,                                                                                 
##      yogurt}                 => {whole milk}       0.001525165  0.9375000 0.001626843  3.669046    15
## [4] {liquor,                                                                                         
##      red/blush wine}         => {bottled beer}     0.001931876  0.9047619 0.002135231 11.235269    19
## [5] {fruit/vegetable juice,                                                                          
##      tropical fruit,                                                                                 
##      whipped/sour cream}     => {other vegetables} 0.001931876  0.9047619 0.002135231  4.675950    19
## [6] {butter,                                                                                         
##      pip fruit,                                                                                      
##      whipped/sour cream}     => {whole milk}       0.001830198  0.9000000 0.002033554  3.522284    18
## [7] {domestic eggs,                                                                                  
##      tropical fruit,                                                                                 
##      whipped/sour cream}     => {whole milk}       0.001830198  0.9000000 0.002033554  3.522284    18

inspect(sort(rules_optim, by = "support", decreasing = TRUE))

##     lhs                         rhs                    support confidence    coverage      lift count
## [1] {liquor,                                                                                         
##      red/blush wine}         => {bottled beer}     0.001931876  0.9047619 0.002135231 11.235269    19
## [2] {fruit/vegetable juice,                                                                          
##      tropical fruit,                                                                                 
##      whipped/sour cream}     => {other vegetables} 0.001931876  0.9047619 0.002135231  4.675950    19
## [3] {butter,                                                                                         
##      pip fruit,                                                                                      
##      whipped/sour cream}     => {whole milk}       0.001830198  0.9000000 0.002033554  3.522284    18
## [4] {domestic eggs,                                                                                  
##      tropical fruit,                                                                                 
##      whipped/sour cream}     => {whole milk}       0.001830198  0.9000000 0.002033554  3.522284    18
## [5] {flour,                                                                                          
##      root vegetables,                                                                                
##      whipped/sour cream}     => {whole milk}       0.001728521  1.0000000 0.001728521  3.913649    17
## [6] {cream cheese,                                                                                   
##      other vegetables,                                                                               
##      sugar}                  => {whole milk}       0.001525165  0.9375000 0.001626843  3.669046    15
## [7] {root vegetables,                                                                                
##      sausage,                                                                                        
##      tropical fruit,                                                                                 
##      yogurt}                 => {whole milk}       0.001525165  0.9375000 0.001626843  3.669046    15

inspect(sort(rules_optim, by = "lift", decreasing = TRUE))

##     lhs                         rhs                    support confidence    coverage      lift count
## [1] {liquor,                                                                                         
##      red/blush wine}         => {bottled beer}     0.001931876  0.9047619 0.002135231 11.235269    19
## [2] {fruit/vegetable juice,                                                                          
##      tropical fruit,                                                                                 
##      whipped/sour cream}     => {other vegetables} 0.001931876  0.9047619 0.002135231  4.675950    19
## [3] {flour,                                                                                          
##      root vegetables,                                                                                
##      whipped/sour cream}     => {whole milk}       0.001728521  1.0000000 0.001728521  3.913649    17
## [4] {cream cheese,                                                                                   
##      other vegetables,                                                                               
##      sugar}                  => {whole milk}       0.001525165  0.9375000 0.001626843  3.669046    15
## [5] {root vegetables,                                                                                
##      sausage,                                                                                        
##      tropical fruit,                                                                                 
##      yogurt}                 => {whole milk}       0.001525165  0.9375000 0.001626843  3.669046    15
## [6] {butter,                                                                                         
##      pip fruit,                                                                                      
##      whipped/sour cream}     => {whole milk}       0.001830198  0.9000000 0.002033554  3.522284    18
## [7] {domestic eggs,                                                                                  
##      tropical fruit,                                                                                 
##      whipped/sour cream}     => {whole milk}       0.001830198  0.9000000 0.002033554  3.522284    18

plot(rules_optim[1:7], method = "graph")

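Changing the support and confidence cutoffs changes which rules are generated: lower cutoffs yield more rules, many of them weak, while higher cutoffs yield fewer but stronger ones. Tuning these two cutoffs is how we can get better recommendations.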
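One way to see the effect of stricter cutoffs without rerunning apriori is to filter the seven rules printed earlier. The sketch below copies their rhs, support, confidence and lift values into a plain data frame; the cutoffs `min_confidence` and `min_lift` are arbitrary illustrative choices, not recommended settings. On the actual rules object, a comparable filter would usually be written with `subset()` from arules, for example `subset(rules_optim, lift >= 4)`.

```r
# Measures of the seven rules, copied from the inspect() output.
rules_df <- data.frame(
  rhs = c("bottled beer", "whole milk", "whole milk", "whole milk",
          "whole milk", "other vegetables", "whole milk"),
  support = c(0.001931876, 0.001728521, 0.001525165, 0.001830198,
              0.001830198, 0.001931876, 0.001525165),
  confidence = c(0.9047619, 1.0000000, 0.9375000, 0.9000000,
                 0.9000000, 0.9047619, 0.9375000),
  lift = c(11.235269, 3.913649, 3.669046, 3.522284,
           3.522284, 4.675950, 3.669046)
)

# Hypothetical stricter cutoffs, chosen only for illustration.
min_confidence <- 0.93
min_lift <- 4

strict_rules    <- rules_df[rules_df$confidence >= min_confidence, ]
high_lift_rules <- rules_df[rules_df$lift >= min_lift, ]

print(strict_rules)
print(high_lift_rules)
```

The two filters keep different rules: the confidence cutoff favors the whole milk rules, while the lift cutoff keeps only the bottled beer and other vegetables rules, whose consequents are less common purchases.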