Channel: Apache Timeline

Configuring Pig to store results all in one file

Hi,

I am aware there are several threads on this topic already :), but the
suggestions there did not seem to work for my script.

My output folder contains many parts:
part-m-00000 part-m-00001 part-m-00002 part-m-00003 part-m-00004
part-m-00005 part-m-00006 part-m-00007 part-m-00008 part-m-00009
part-m-00010 part-m-00011 part-m-00012 part-m-00013 part-m-00014
part-m-00015 part-m-00016 part-m-00017 part-m-00018 part-m-00019
part-m-00020 part-m-00021 part-m-00022 part-m-00023 part-m-00024
part-m-00025 part-m-00026 part-m-00027 part-m-00028 part-m-00029
part-m-00030 _temporary

I am reading from one local file and executing in local mode, so I would
expect to get only one part-m-00000 as my output. Any idea why I get more
than one part?

I pasted my script below:
register /home/kereno/pigmix.jar

page_views = load
'/home/kereno/more/pig-0.13.0-RC1/conversion_pig_scripts/page_views' using
org.apache.pig.test.pigmix.udf.PigPerformanceLoader() as (user, action,
timespent, query_term, ip_addr, timestamp, estimated_revenue, page_info,
page_links);

page_views_flattened = foreach page_views generate user, action, timespent,
query_term, ip_addr, timestamp, estimated_revenue,
((map[]) page_info) as page_info, (bag{tuple(map[])})page_links as
page_links;

store page_views_flattened into 'parsed/ADM-format/page_views' using
org.apache.pig.builtin.PigStorage_for_AQL('\t');

Thanks,
Keren
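
One possible explanation, offered as a sketch rather than a confirmed
diagnosis: the script above is only load / foreach / store, so it would
normally compile to a map-only job, and each map task writes its own
part-m-NNNNN file. A large input file is divided into several input splits
even in local mode, which would account for the 31 parts. Assuming that is
what is happening here, adding a step that forces a single reduce task
should produce one part-r-00000 instead. The alias page_views_sorted and
the choice of user as the sort key are illustrative assumptions, not part
of the original script.

```pig
-- Assumption: reordering the rows by 'user' is acceptable; any field would do.
-- ORDER BY adds a reduce phase, and PARALLEL 1 requests a single reducer,
-- so the store should write one part-r-00000 rather than many part-m-* files.
page_views_sorted = order page_views_flattened by user parallel 1;

store page_views_sorted into 'parsed/ADM-format/page_views' using
org.apache.pig.builtin.PigStorage_for_AQL('\t');
```

This would replace the store statement in the script above. The cost is an
extra sort job; if the rows must stay in their original order, concatenating
the part files after the job finishes is an alternative.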
