
Vectorizing data in mapreduce mode

Hi everyone,

My Pig script generates the following output; the results are stored in files part-m-00000 through part-m-00004.

-bash-4.1$ hadoop dfs -ls /scratch/ItemIds

Found 7 items
-rw-r--r-- 1 userid supergroup 0 2013-12-23 11:13 /scratch/ItemIds/_SUCCESS
drwxr-xr-x - userid supergroup 0 2013-12-23 11:12 /scratch/ItemIds/_logs
-rw-r--r-- 1 userid supergroup 276019 2013-12-23 11:12 /scratch/ItemIds/part-m-00000
-rw-r--r-- 1 userid supergroup 272188 2013-12-23 11:12 /scratch/ItemIds/part-m-00001
-rw-r--r-- 1 userid supergroup 252597 2013-12-23 11:12 /scratch/ItemIds/part-m-00002
-rw-r--r-- 1 userid supergroup 236508 2013-12-23 11:12 /scratch/ItemIds/part-m-00003
-rw-r--r-- 1 userid supergroup 270658 2013-12-23 11:12 /scratch/ItemIds/part-m-00004

The output is stored as tab-separated values:

userid1 itemid1 itemid2 itemid3 ......
userid2 itemid1 itemid2 itemid3 ......
......

I have the following questions:

1. Is there a Mahout utility that I can point at /scratch/ItemIds to generate one file out of these 5 part files?

2. What is the recommended way of parsing this tab-separated file in MapReduce mode? I want to vectorize this data and would like to do so in parallel. I already know how to vectorize the data correctly and how to run K-means on it.
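For question 2, here is a minimal sketch of what a vectorizing mapper could look like. The class name, the CARDINALITY bound, and the simple "1.0 at index itemid" encoding are all assumptions for illustration; you would substitute your own vectorization logic.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.mahout.math.NamedVector;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

// Hypothetical mapper: one input line = "userid<TAB>itemid1<TAB>itemid2..."
// Emits <userid, VectorWritable> pairs; with SequenceFileOutputFormat the
// output can then be fed to Mahout's K-means.
public class ItemVectorizerMapper
    extends Mapper<LongWritable, Text, Text, VectorWritable> {

  // Assumed upper bound on the item-id space; adjust to your data.
  static final int CARDINALITY = 1000000;

  // Pure parsing step, kept separate so it can be tested without a cluster.
  static int[] parseItemIds(String line) {
    String[] fields = line.split("\t");
    int[] ids = new int[fields.length - 1];
    for (int i = 1; i < fields.length; i++) {
      ids[i - 1] = Integer.parseInt(fields[i]);
    }
    return ids;
  }

  @Override
  protected void map(LongWritable offset, Text value, Context ctx)
      throws IOException, InterruptedException {
    String line = value.toString();
    String userId = line.split("\t", 2)[0];
    Vector vec = new RandomAccessSparseVector(CARDINALITY);
    for (int id : parseItemIds(line)) {
      vec.set(id, 1.0); // assumption: item ids are ints usable as indices
    }
    ctx.write(new Text(userId),
        new VectorWritable(new NamedVector(vec, userId)));
  }
}
```

Because each mapper processes its own input split, all 5 part files would be vectorized in parallel without any merging step.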

I have been using the following command to run my clustering algorithm on dummy data; now I want to ingest real data.

hadoop jar /apps/analytics/myanalytics.jar myanalytics.SimpleKMeansClustering \
    -libjars /apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar /:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT-job.jar:/apps/mahout/trunk/math/target/mahout-math-0.9-SNAPSHOT.jar

However, I am not sure: if I write the code to vectorize the data in my SimpleKMeansClustering class, will the above command run it in MapReduce mode?
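For what it's worth, my understanding is that the command only runs the vectorization in MapReduce mode if the class actually submits it as a Job; code that just runs in main() executes locally on the client. A hedged sketch of a driver that does submit a map-only vectorization job (the class names and the stub mapper are placeholders, not anything from Mahout):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.math.VectorWritable;

// Hypothetical driver: submitting via ToolRunner also makes -libjars work,
// since GenericOptionsParser consumes that option before run() sees args.
public class VectorizeDriver extends Configured implements Tool {

  // Stub mapper: replace the body with your vectorization logic.
  public static class VectorizeMapper
      extends Mapper<LongWritable, Text, Text, VectorWritable> {
    @Override
    protected void map(LongWritable k, Text v, Context ctx)
        throws IOException, InterruptedException {
      // ... build a Mahout Vector from the tab-separated line, then:
      // ctx.write(new Text(userId), new VectorWritable(vec));
    }
  }

  @Override
  public int run(String[] args) throws Exception {
    Job job = Job.getInstance(getConf(), "vectorize-itemids");
    job.setJarByClass(VectorizeDriver.class);
    job.setMapperClass(VectorizeMapper.class);
    job.setNumReduceTasks(0); // map-only: no shuffle needed
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(VectorWritable.class);
    TextInputFormat.addInputPath(job, new Path(args[0]));  // e.g. /scratch/ItemIds
    SequenceFileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new VectorizeDriver(), args));
  }
}
```

Input paths accept directories, so pointing the job at /scratch/ItemIds would pick up all 5 part files without merging them first.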
