Hi, I'm trying to compute item similarities.
I collect the items that users visit while shopping and then build one input file as the UNION of:
user_id, item_id, weight (weight is 1.0, 1.6 or 1.9, depending on the user action type and the data source)
-item_id, item_id, 1.0 (one row per item from the items dictionary)
I also provide a usersFile containing those pseudo-users, i.e. user_id = -item_id.
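To make the input layout concrete, here is a minimal Python sketch that builds both files as lists of lines. The function names and the in-memory inputs are hypothetical, and it assumes --usersFile takes one user ID per line. It also assumes every item_id is positive and that no real user_id is negative, otherwise the -item_id pseudo-users would collide with real users.

```python
from typing import Iterable, List, Tuple


def build_preferences(
    visits: Iterable[Tuple[int, int, float]],
    item_ids: Iterable[int],
) -> List[str]:
    """Return CSV lines "user_id,item_id,weight" (hypothetical helper)."""
    # Real user visits, weight already chosen per action type / data source.
    lines = [f"{user},{item},{weight}" for user, item, weight in visits]
    # One pseudo-user per dictionary item: user id = -item_id, weight 1.0.
    # Assumes item ids are > 0, so -item_id never equals a real user id.
    lines += [f"{-item},{item},1.0" for item in item_ids]
    return lines


def build_users_file(item_ids: Iterable[int]) -> List[str]:
    """Return one pseudo-user id (-item_id) per line for --usersFile."""
    return [str(-item) for item in item_ids]


if __name__ == "__main__":
    visits = [(1, 10, 1.0), (1, 11, 1.6), (2, 10, 1.9)]
    items = [10, 11]
    print("\n".join(build_preferences(visits, items)))
    print("\n".join(build_users_file(items)))
```

Run as a script, it prints the five preference lines followed by the two pseudo-user IDs -10 and -11.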
The idea is to get item similarity: if a user visits item "A", I want to show them items "B", "C", "xxx" based on the preferences of other users.
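As a toy illustration of that idea only, the Python sketch below ranks items by plain co-occurrence counts: how many users visited them together with the given item. This is a deliberately simpler technique than what the job is configured for (recommenditembased scores item pairs with the measure passed in --similarityClassname, here log-likelihood, not raw counts). The function name co_visited and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def co_visited(visits: Dict[int, Set[str]], item: str, top_n: int = 3) -> List[str]:
    """Rank other items by how many users visited them together with `item`."""
    counts: Counter = Counter()
    for items in visits.values():
        if item in items:
            # Count every other item this user also visited.
            counts.update(other for other in items if other != item)
    # Ties between equal counts come back in first-seen order.
    return [other for other, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    visits = {1: {"A", "B", "C"}, 2: {"A", "B"}, 3: {"A", "xxx"}, 4: {"C", "D"}}
    print(co_visited(visits, "A"))
```

Here "B" ranks first because two users visited it together with "A"; "C" and "xxx" follow with one co-visit each.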
The problem is that the last (?) MapReduce job returns 0 rows. Here are my settings:
sudo -u oozie mahout recommenditembased \
--input visited_items_with_inverted_items \
--output result \
--similarityClassname SIMILARITY_LOGLIKELIHOOD \
--usersFile inverted_items \
--numRecommendations 500 \
--booleanData false \
--maxPrefsPerUser 100 \
--maxSimilaritiesPerItem 500 \
--minPrefsPerUser 0 \
--maxPrefsPerUserInItemSimilarity 30 \
--threshold 0.91 \
--tempDir temp
Some counters (I don't understand what they mean):
14/07/20 22:43:08 INFO mapred.JobClient: org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducer$Counters
14/07/20 22:43:08 INFO mapred.JobClient: USERS=7528530
14/07/20 22:43:43 INFO mapred.JobClient: org.apache.mahout.cf.taste.hadoop.preparation.ToItemVectorsMapper$Elements
14/07/20 22:43:43 INFO mapred.JobClient: USER_RATINGS_NEGLECTED=1,798,738
14/07/20 22:43:43 INFO mapred.JobClient: USER_RATINGS_USED=12,429,693
14/07/20 22:44:24 INFO mapred.JobClient: org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
14/07/20 22:44:24 INFO mapred.JobClient: ROWS=3312879
14/07/20 22:45:18 INFO mapred.JobClient: org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
14/07/20 22:45:18 INFO mapred.JobClient: COOCCURRENCES=35882374
14/07/20 22:45:18 INFO mapred.JobClient: PRUNED_COOCCURRENCES=0
14/07/20 22:46:00 INFO mapred.JobClient: Map input records=3312879
14/07/20 22:46:00 INFO mapred.JobClient: Map output records=17570268
14/07/20 22:46:00 INFO mapred.JobClient: Reduce input records=5221907
14/07/20 22:46:00 INFO mapred.JobClient: Reduce output records=3312879
14/07/20 22:46:34 INFO mapred.JobClient: Reduce input records=3312879
14/07/20 22:46:34 INFO mapred.JobClient: Reduce output records=3312879
14/07/20 22:46:34 INFO mapred.JobClient: Reduce input records=3312879
14/07/20 22:46:34 INFO mapred.JobClient: Reduce output records=3312879
14/07/20 22:47:06 INFO mapred.JobClient: Map input records=7528530
14/07/20 22:47:06 INFO mapred.JobClient: Map output records=3313251
14/07/20 22:47:06 INFO mapred.JobClient: Reduce input records=3313251
14/07/20 22:47:06 INFO mapred.JobClient: Reduce output records=3313251
14/07/20 22:47:40 INFO mapred.JobClient: Map input records=6626130
14/07/20 22:47:40 INFO mapred.JobClient: Map output records=6626130
14/07/20 22:47:40 INFO mapred.JobClient: Reduce input records=6626130
14/07/20 22:47:40 INFO mapred.JobClient: Reduce output records=3312879
14/07/20 22:48:26 INFO mapred.JobClient: Map input records=3312879
14/07/20 22:48:26 INFO mapred.JobClient: Map output records=3313251
14/07/20 22:48:26 INFO mapred.JobClient: Reduce input records=3313251
14/07/20 22:48:26 INFO mapred.JobClient: Reduce output records=0
Why 0?