After integrating several versions of the Mahout and Myrrix recommenders at fairly large scale, I was interested in solving three problems that they did not directly address:
1) realtime queries for recs using data not yet incorporated into the training set. Myrrix allows this, but Mahout's Hadoop MapReduce version does not.
2) cross-recommendations from two or more action types (say purchase and detail-view)
3) blending metadata and user preference data to return recs (for example category & user preferences => recs)
Using Solr + Mahout provided an amazingly flexible and performant way to do this. Ted wrote about his experience with this basic approach in his recent book. Take user preferences, run them through RowSimilarityJob, and you get an item-by-item similarity matrix. This is the core of an item-based cooccurrence recommender. If you take the similarity matrix and convert it into a list of tokens per row, you have something Solr can index. If you then use a user’s history as a query on the indexed data, you get an ordered list of recommendations.
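As a rough illustration of the idea above, here is a minimal Python sketch that turns similarity-matrix rows into token lists and turns a user's history into a query string. It is not the code from the project: the input layout (a dict of item ID to similar-item scores), the field name similar_items, the top_n cutoff, and both function names are assumptions made for the example. It also assumes item IDs contain no characters that need escaping in Lucene query syntax.

```python
from typing import Dict, List


def similarity_rows_to_docs(
    similarity: Dict[str, Dict[str, float]], top_n: int = 50
) -> List[dict]:
    """Convert each similarity-matrix row into one indexable document.

    `similarity` maps an item ID to {similar item ID: score}. This layout
    is hypothetical; the real RowSimilarityJob output would need parsing
    into this shape first.
    """
    docs = []
    for item_id, row in similarity.items():
        # Rank the row by descending score (ties broken by ID), skipping
        # the item itself, and keep only the strongest top_n entries.
        ranked = sorted(
            ((other, score) for other, score in row.items() if other != item_id),
            key=lambda pair: (-pair[1], pair[0]),
        )
        tokens = [other for other, _ in ranked[:top_n]]
        # "similar_items" is an assumed field name: a space-separated list
        # of item IDs that a search engine can tokenize and index.
        docs.append({"id": item_id, "similar_items": " ".join(tokens)})
    return docs


def history_to_query(history: List[str], field: str = "similar_items") -> str:
    """Build a Lucene-style query string from the items a user has liked."""
    return f"{field}:({' '.join(history)})"


similarity = {
    "v1": {"v2": 0.9, "v3": 0.4},
    "v2": {"v1": 0.9},
    "v3": {"v1": 0.4, "v2": 0.1},
}
docs = similarity_rows_to_docs(similarity, top_n=2)
query = history_to_query(["v1", "v2"])
```

Each document lists the items most similar to it, so a query made of the user's history matches documents whose similar-items field overlaps that history, and the search engine's relevance ranking orders the results.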
When I set out to do #1 and #3, the need for CF data AND metadata was the first problem, so I mined the web for video reviews and video metadata. Logging the actions of users who visit the site will then provide data for #1 and #2.
The demo site is https://guide.finderbots.com and instructions are at the end of this for anyone who would like to test it out. As a crude user test, there is a procedure we ask you to follow to help gather data on the quality of recommendations. It’s running out of my closet over Comcast, so if it’s down I may have tripped over a cord. Sorry, try again later.
There are a bunch of different methods for making recs illustrated on the site. One method that illustrates blending metadata uses preference data from you, plus metadata to bias and filter recs. Imagine that you have trained the system with your preferences by making some video picks. Now imagine you’d like to get recommendations for Comedies from Netflix based on your previous video preferences. This is done with a single Solr query on indexed video fields that hold genre, similar videos (from the similarity matrix), and sources. The query finds videos similar to the ones you have liked, with the genre “Comedy” boosted by some amount, and returns only those that have at least one source = “Netflix”.
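A blended query of this kind might look roughly like the Python sketch below, which only builds the request parameters and does not contact a server. The field names similar_videos, genre, and sources, the boost value, the row count, and the function name are all assumptions for illustration; the site's actual schema and query may differ. It uses the standard Solr q, fq, and rows parameters, marks the similarity clause as required with "+", and leaves the boosted genre clause optional so that it only affects ranking.

```python
from typing import Dict, List, Union
from urllib.parse import urlencode


def blended_query_params(
    liked_ids: List[str],
    genre: str,
    source: str,
    genre_boost: float = 2.0,
    rows: int = 24,
) -> Dict[str, Union[str, int]]:
    """Build Solr request parameters for a metadata-blended rec query.

    All field names here are hypothetical. The IDs and values are assumed
    to contain no characters that need escaping in Lucene query syntax.
    """
    # Required clause: videos whose similar-videos field overlaps the
    # user's liked videos. Optional clause: a boost for the chosen genre.
    q = f'+similar_videos:({" ".join(liked_ids)}) genre:"{genre}"^{genre_boost}'
    # Filter query: restricts results to the source without affecting scores.
    fq = f'sources:"{source}"'
    return {"q": q, "fq": fq, "rows": rows}


params = blended_query_params(["v1", "v2"], genre="Comedy", source="Netflix")
query_string = urlencode(params)  # ready to append to a /select URL
```

Putting the source restriction in fq rather than q keeps it a pure filter, so it narrows the result set while the ranking still comes from the similarity match and the genre boost.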
I’ll be doing some blog posts covering the specifics of how each rec type is done, the site and DB architecture, and Solr setup.
The project uses the Solr recommender prep code here: https://github.com/pferrel/solr-recommender
BTW I plan to publish obfuscated usage data in the GitHub repo.
begin form letter =======================================
Please use a recently updated browser (latest Firefox, Chrome, or Safari, and nothing older than IE10). The site doesn’t yet check browser compatibility but relies rather heavily on HTML5 and CSS3.
1) go to https://guide.finderbots.com/users/sign_up to create an account
2) go to https://guide.finderbots.com/trainers to ‘train’ the recommender: hit thumbs up on videos you like. There are 20 pages of training videos. You can leave at any time, but if you can go through them all it would be appreciated.
3) go to https://guide.finderbots.com/guides/recommend to immediately get personalized recs from your training data. If you completed the trainer, check the top line of recs and count how many are videos you liked or would like to see. Scroll right or left to see a total of 24 in four batches of 6. If you could report to me the total you thought were good recs, it would be greatly appreciated.
4) browse videos by various criteria here: https://guide.finderbots.com/guides These are not recommendations; they are simply a catalog.
5) control how you browse videos by clicking the gears icon. You can set all videos to be from one or more sources here. If you choose Netflix alone (don’t forget to uncheck ‘all’) then recs and browsed videos will all be available on Netflix.