How to get items for both LHS and RHS for only specific columns in arules?
It seems that one can't constrain lhs and rhs at once (I didn't know this either before playing with your data). But you can use subset. EDIT: I was wrong; you can also constrain lhs and rhs at once, see below for another solution. I keep Solution 1 because in some cases it might be useful to compute a bigger set of rules and then filter it afterwards with subset.
Solution 1:
library(arules)
rules_sales <- apriori(sales,
                       parameter = list(support = 0.001, confidence = 0.5, minlen = 2, maxlen = 2),
                       appearance = list(lhs = c("HouseOwnerFlag=0", "HouseOwnerFlag=1"),
                                         default = "rhs"))
rules_subset <- subset(rules_sales, (rhs %in% paste0("Product=", unique(sales$Product))))
inspect(rules_subset)
gives:
lhs rhs support confidence lift
1 {HouseOwnerFlag=0} => {Product=SV DVD Movies E100 Yellow} 0.05 0.5 10
2 {HouseOwnerFlag=0} => {Product=Fabrikam Refrigerator 4.6CuFt E2800 Grey} 0.05 0.5 5
3 {HouseOwnerFlag=1} => {Product=Contoso SLR Camera M144 Gold} 0.10 0.5 5
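To try Solution 1 without the original data, here is a minimal sketch on a small made-up data frame; the columns and values (including the extra Region column) are hypothetical stand-ins for sales. Instead of building the full list of product labels with paste0(), it uses arules' partial-match operator %pin%, which keeps the rules whose rhs contains an item label containing "Product=".

```r
library(arules)

# Hypothetical toy data standing in for `sales`: 6 rows repeated 10 times
# (60 transactions), so a support of 0.1 means at least 6 transactions.
sales <- data.frame(
  HouseOwnerFlag = factor(rep(c("0", "0", "1", "1", "0", "1"), times = 10)),
  Product        = factor(rep(c("A", "B", "A", "C", "A", "C"), times = 10)),
  Region         = factor(rep(c("N", "S", "N", "S", "N", "S"), times = 10))
)

# Solution 1: constrain only the lhs; every other item may appear in the rhs
rules_all <- apriori(sales,
                     parameter = list(support = 0.1, confidence = 0.5,
                                      minlen = 2, maxlen = 2),
                     appearance = list(lhs = c("HouseOwnerFlag=0", "HouseOwnerFlag=1"),
                                       default = "rhs"),
                     control = list(verbose = FALSE))

# Keep only the rules whose rhs is a Product item (partial match on the label)
rules_subset <- subset(rules_all, subset = rhs %pin% "Product=")
inspect(rules_subset)
```

On this toy data, rules_all also contains rules with a Region item on the rhs; the subset step drops them.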
But you should be careful about your low support:
Warning in apriori(sales, parameter = list(support = 0.001, confidence = 0.5, :
You chose a very low absolute support count of 0. You might run out of memory! Increase minimum support.
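The absolute support count in this warning is the relative support multiplied by the number of transactions. As a rough sketch of the arithmetic, assuming your data has 20 transactions (an inference from the smallest support of 0.05 = 1/20 in the output above, not something stated in the question), 0.001 corresponds to far less than one transaction. Choosing the support from a minimum number of transactions avoids this.

```r
# Assumption: 20 transactions, inferred from the output above (0.05 = 1/20);
# for the real data use nrow(sales) instead.
n_transactions <- 20
min_support <- 0.001

# Relative support times number of transactions, in whole transactions
abs_count <- floor(min_support * n_transactions)
print(abs_count)  # [1] 0

# Derive the support from the minimum number of transactions you want instead
min_count <- 2
support_for_count <- min_count / n_transactions
print(support_for_count)  # [1] 0.1
```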
Solution 2:
I was misled by the meaning of the parameter default. Listing items under lhs and rhs only tells apriori that each item assigned to lhs may appear only in the lhs, and each item assigned to rhs only in the rhs. The parameter default is set to "both" unless you change it, so all other items not listed in lhs/rhs can still appear on either side (explanation of the appearance parameter as implemented in the R package: http://www.inside-r.org/node/86290; I realised that it must be possible when reading the manual of the original C implementation: http://www.borgelt.net/doc/apriori/apriori.html#appearin). You have to set default="none";
then you can constrain lhs and rhs without using subset afterwards.
rules_sales <- apriori(sales,
                       parameter = list(support = 0.001, confidence = 0.5, minlen = 2, maxlen = 2),
                       appearance = list(lhs = c("HouseOwnerFlag=0", "HouseOwnerFlag=1"),
                                         rhs = paste0("Product=", unique(sales$Product)),
                                         default = "none"))
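The effect of default can be checked on a small made-up data frame. This is a minimal sketch, assuming hypothetical toy data in place of sales, with an extra Region column that is listed in neither lhs nor rhs: with default = "both", Region items can still show up in the rules; with default = "none", only the listed items are used.

```r
library(arules)

# Hypothetical toy data standing in for `sales` (60 transactions)
sales <- data.frame(
  HouseOwnerFlag = factor(rep(c("0", "0", "1", "1", "0", "1"), times = 10)),
  Product        = factor(rep(c("A", "B", "A", "C", "A", "C"), times = 10)),
  Region         = factor(rep(c("N", "S", "N", "S", "N", "S"), times = 10))
)

lhs_items <- c("HouseOwnerFlag=0", "HouseOwnerFlag=1")
rhs_items <- paste0("Product=", unique(sales$Product))
params <- list(support = 0.1, confidence = 0.5, minlen = 2, maxlen = 2)

# default = "both": unlisted items such as Region=N may appear on either
# side, e.g. {Region=N} => {Product=A}
rules_both <- apriori(sales, parameter = params,
                      appearance = list(lhs = lhs_items, rhs = rhs_items,
                                        default = "both"),
                      control = list(verbose = FALSE))

# default = "none": only the listed items are used, each on its own side
rules_none <- apriori(sales, parameter = params,
                      appearance = list(lhs = lhs_items, rhs = rhs_items,
                                        default = "none"),
                      control = list(verbose = FALSE))
inspect(rules_none)
```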
I am very late to the party, but as I am also playing with the package now, let me add my thoughts in case they are helpful to someone.
The rules included in the output are the ones that satisfy the support and confidence thresholds. So, if you don't get any rules in the format you expect, try relaxing these constraints: lower the support or the confidence. With maxlen=2, as in the calls above, the lhs can contain only one item, so you could restrict this part to the terms you want to appear (Product) in order to speed up rule generation. I haven't tried it on your specific dataset, but I think this is general advice that should work in all cases.
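A minimal sketch of relaxing a threshold, again on hypothetical toy data standing in for the question's data: counting the rules for a few confidence values shows how a lower threshold admits rules that were filtered out before. The variable names here are illustrative only.

```r
library(arules)

# Hypothetical toy data (60 transactions)
sales <- data.frame(
  HouseOwnerFlag = factor(rep(c("0", "0", "1", "1", "0", "1"), times = 10)),
  Product        = factor(rep(c("A", "B", "A", "C", "A", "C"), times = 10)),
  Region         = factor(rep(c("N", "S", "N", "S", "N", "S"), times = 10))
)

app <- list(lhs = c("HouseOwnerFlag=0", "HouseOwnerFlag=1"),
            rhs = paste0("Product=", unique(sales$Product)),
            default = "none")

# Mine the same constrained rules for decreasing confidence thresholds
confidences <- c(0.5, 0.3)
n_rules <- sapply(confidences, function(conf) {
  rules <- apriori(sales,
                   parameter = list(support = 0.1, confidence = conf,
                                    minlen = 2, maxlen = 2),
                   appearance = app,
                   control = list(verbose = FALSE))
  length(rules)
})
print(n_rules)  # [1] 2 4
```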