How does the Amazon recommendation system work? – Analyze the algorithm and make a prototype that visualizes the algorithm

I find recommendation systems interesting.

I decided to learn how the Amazon recommendation system works in theory and then implement a demo site that visualizes the algorithm. My implementation is not identical to Amazon’s version, but it follows the same principle. My main focus is to illustrate and visualize the concept. I used an item-to-item matrix table for simplicity.

Reference reading materials:
(1) http://www.cs.umd.edu/~samir/498/Amazon-Recommendations.pdf
(2) Amazon Recommendation Patent
(3) http://stackoverflow.com/questions/2323768/how-does-the-amazon-recommendation-feature-work
(4) http://maya.cs.depaul.edu/mobasher/papers/ewmf04-web/node1.html
(5) http://blog.echen.me/2011/02/15/an-overview-of-item-to-item-collaborative-filtering-with-amazons-recommendation-system/

I recommend reading (1). It is only 5 pages long and easy to read. Amazon’s algorithm is based on item-to-item collaborative filtering. In short: Amazon developed its own recommendation system based on items rather than users, because there are fewer items than users (scalability). The items visited by each user are stored in an item-to-item matrix table. The recommendations are calculated by applying the cosine similarity function to the vectors from the matrix table.
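As a rough illustration of that last step, the sketch below ranks the neighbors of an item by the cosine similarity between rows of an item-to-item count matrix. The matrix values, the item names and the helper names `cosine_similarity` and `recommend` are made up for this example; this is not Amazon's code or the demo site's code, just one way the idea could look.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def recommend(matrix, items, current, m):
    """Return up to m items whose row vectors are most similar to the current item's row."""
    current_vec = matrix[items.index(current)]
    scored = [
        (cosine_similarity(current_vec, matrix[i]), name)
        for i, name in enumerate(items)
        if name != current
    ]
    # Drop items with no similarity at all, then sort with the highest score first.
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:m]]

# Hypothetical co-view counts: matrix[i][j] = how often items i and j were viewed together.
items = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 1, 0],  # A
    [2, 0, 2, 0],  # B
    [1, 2, 0, 1],  # C
    [0, 0, 1, 0],  # D
]
print(recommend(matrix, items, "A", m=2))
```

Note that this compares whole row vectors, so the item with the highest raw count next to A is not necessarily ranked first.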

I recommend reading (2) for technical insights and a design overview of how Amazon has implemented it.

For the demo-site I wrote a short abstract. The demo-site can be seen here: http://jory.dk/AreaRecommendation/
The demo project can be downloaded here

Abstract: 
How does the Amazon recommendation work?

This is about visualizing the item-to-item collaborative filtering mechanism using an item-to-item matrix table.
The item-to-item matrix, the vectors and the calculated data values are displayed.

There are n different items and the item recommendation can display up to m items.

Several item-to-item neighborhood functions are implemented:
a simple max count of seen neighbor items, the Cosine Similarity and the Jaccard Index.
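The three neighborhood functions could be sketched roughly as below. The exact definitions used by the demo site are not given here, so these are assumptions: `max_count` reads the raw count from the current item's vector, and `jaccard_index` treats each vector as the set of positions with a non-zero count. All function names are hypothetical.

```python
import math

def max_count(current_vec, candidate_index):
    """Score a candidate by the raw count stored in the current item's vector."""
    return current_vec[candidate_index]

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def jaccard_index(u, v):
    """Shared non-zero positions divided by all non-zero positions of both vectors."""
    set_u = {i for i, a in enumerate(u) if a > 0}
    set_v = {i for i, b in enumerate(v) if b > 0}
    union = set_u | set_v
    if not union:
        return 0.0
    return len(set_u & set_v) / len(union)

a = [1, 1, 1, 0, 0, 0, 0]
b = [0, 1, 1, 0, 1, 0, 0]
print(max_count([0, 2, 1, 0], 1))
print(cosine_similarity(a, b))
print(jaccard_index(a, b))
```

Max count only looks at one cell of the matrix, while the cosine similarity and the Jaccard index compare the whole pattern of two vectors.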

A tracker records the items each user visits and saves them to a matrix table.
To keep it simple, only the relation between the previously viewed item and the currently viewed item is tracked in this example.
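A minimal sketch of such a tracker is shown below. The class name `Tracker`, its fields and the user name are hypothetical. It also assumes the relation is stored symmetrically (both cells are incremented), which is a guess that happens to reproduce the value 2 the author mentions in the comments for the viewing sequence b, c, b.

```python
from collections import defaultdict

class Tracker:
    """Records the relation between the previously viewed and the currently viewed item."""

    def __init__(self):
        # counts[a][b] = number of times a and b were viewed back to back.
        self.counts = defaultdict(lambda: defaultdict(int))
        # last_viewed[user] = the item that user looked at most recently.
        self.last_viewed = {}

    def view(self, user, item):
        previous = self.last_viewed.get(user)
        if previous is not None and previous != item:
            # Assumption: the relation is symmetric, so both cells are updated.
            self.counts[previous][item] += 1
            self.counts[item][previous] += 1
        self.last_viewed[user] = item

tracker = Tracker()
for item in ["B", "C", "B"]:
    tracker.view("user1", item)
print(tracker.counts["B"]["C"])  # 2
```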

Design

The demo-site has two pages: Home and item page. The item page shows specific information regarding the viewed item and the recommendation is displayed here.

There are 5 different components:

Multiple view – shows multiple components, all the items are displayed here
User view – shows specific information about the current user in the session
Item view – shows detailed information about the current item
Recommendation view – shows recommended items based on the current item
Data view – visualizes the data structure used by the recommendation algorithm

Interactions

This shows how the tracker collects data from the users into the matrix table. (The illustrated tracking method is a simplified version. You could also iterate over the already viewed items when a user views a new item, to save all the item-to-item relations for the viewed items.)
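The alternative in the parentheses could look roughly like the sketch below: each new view is related to every distinct item the user has already viewed, not only to the previous one. The function name `track_full_history` and the symmetric update are assumptions made for this example.

```python
from collections import defaultdict

def track_full_history(counts, viewed, item):
    """Relate the new item to every distinct item the user has already viewed."""
    for earlier in set(viewed):
        if earlier != item:
            # Assumption: the relation is stored symmetrically.
            counts[earlier][item] += 1
            counts[item][earlier] += 1
    viewed.append(item)

counts = defaultdict(lambda: defaultdict(int))
viewed = []
for item in ["A", "B", "C"]:
    track_full_history(counts, viewed, item)

# A and C are now related even though B was viewed in between.
print(counts["A"]["C"])  # 1
```

This records more relations per visit, at the cost of more writes to the matrix for users with a long viewing history.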

When a new visitor, user3, views item A,
the recommendation system finds that the closest matches are items B and C.

The general idea of the Amazon recommendation engine is to locate item vectors whose pattern is similar to the vector of the currently viewed item.

e.g. A vector with pattern [1,1,1,0,0,0,0] is more similar to vector [0,1,1,0,1,0,0] than to vector [0,0,0,1,0,1,1].
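The example vectors can be checked with a few lines of code. This sketch uses the cosine similarity as the measure of "similar"; the function and variable names are made up for the illustration.

```python
import math

def cosine(u, v):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

current = [1, 1, 1, 0, 0, 0, 0]
overlapping = [0, 1, 1, 0, 1, 0, 0]
disjoint = [0, 0, 0, 1, 0, 1, 1]

# Two shared positions and three ones in each vector: 2 / (sqrt(3) * sqrt(3)) = 2/3.
print(round(cosine(current, overlapping), 3))  # 0.667
# No shared positions, so the dot product and the similarity are both 0.
print(cosine(current, disjoint))  # 0.0
```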

Written by kunuk Nykjaer

March 4, 2012 at 6:43 pm

Posted in Algorithm, Csharp, Data Mining, Visualization

Tagged with amazon, item to item collaboration filtering, recommendation engine, recommendation system

13 Responses


- Hi. This is some cool stuff.
  
  I wish you had a more detailed explanation though:)
  
  Dee
  
  August 17, 2014 at 8:50 am
  
Very good.

Thomas Packer

October 18, 2012 at 11:09 pm

Hi. I just saw your blog. I loved the explanation part.
Just one thing: Why do we need to calculate the similarity matrix? Can’t we just select the items with the maximum value from the item-to-item matrix? For example, in your matrix, if the user is looking at product A, then B and C would be recommended because those products have the maximum value.

adivvy

September 24, 2013 at 10:31 pm

- Here we are tracking data and behavior from the users. Then based on that we calculate some information we can apply to new visitors.
  
  It depends on what you want to achieve. There are different algorithms you can apply.
  This is just to show the general concept. I don’t think one algorithm is more correct than another. It depends on what you want (but usually it is about earning more by giving people what you think they want).
  
  The similarity calculation is based on the idea that some people have identical interests, and by tracking their behavior you get a DNA or some unique identifier for that group. Then when a new visitor arrives you try to identify which group that visitor belongs to and display the information which would be interesting for the group.
  
  By taking the maximum value, how do you know you are displaying the most interesting item for the visitor?
  Maybe it will, or not. You will have to investigate how the max-value strategy works compared to the other options.
  
  I am not sure, but I think taking the max value is about taking the most interesting item regardless of which group the visitor might belong to, and is not so much about identifying similar behavior. I could be wrong. I have just played with the concept lightly and implemented misc. algorithms. For a more in-depth answer you might ask on a recommendation system forum.
  
  The concept is, as I said, to collect data and behavior. Then, based on that, implement a strategy which you think will give the visitors the items they are interested in by tracking the visitors’ behavior.
  
  kunuk Nykjaer
  
  September 29, 2013 at 1:15 pm
  
  - What is the cost function of your model?
    
    adivvy
    
    March 23, 2014 at 10:16 pm
Thank you so much, this is great.

I have one question though: in the item to item matrix column B we have [1,0,2,0], I’m not sure how you calculated it? We have only 1 C

med

March 17, 2015 at 10:17 am

- You can try it yourself at http://jory.dk/AreaRecommendation by clicking the Clear Viewed Items button to simulate a new user. When you then go to product b, then c, then b again, the item-to-item matrix entry for b and c will have the value 2.
  
  kunuk Nykjaer
  
  March 17, 2015 at 3:55 pm
  
Just wanted to say thank you for writing this. I found this article as well as the links provided very useful for a project I did earlier this year.

Matthew

October 4, 2015 at 10:18 am

Very nice article and demo.. thanks for posting this..

Santosh M K

May 14, 2017 at 9:10 am
