Hi,
According to Pig's documentation on union, two relations which have the
same schema (the same length, and types that can be implicitly cast) can
be concatenated (see http://pig.apache.org/docs/r0.11.1/basic.html#union).
However, when I try with:
A = load '1.txt' using PigStorage(' ') as (x:int, y:chararray,
z:chararray);
B = load '1_ext.txt' using PigStorage(' ') as (a:int, b:chararray,
c:chararray);
C = union A, B;
describe C;
DUMP C;
store C into '/home/kereno/Documents/pig-0.11.1/workspace/res';
with:
~/Documents/pig-0.11.1/workspace 130$ more 1.txt 1_ext.txt
::::::::::::::
1.txt
::::::::::::::
1 a aleph
2 b bet
3 g gimel
::::::::::::::
1_ext.txt
::::::::::::::
0 a alpha
0 b beta
0 g gimel
I get as a result:
~/Documents/pig-0.11.1/workspace 0$ more res/part-m-0000*
::::::::::::::
res/part-m-00000
::::::::::::::
0 a alpha
0 b beta
0 g gimel
::::::::::::::
res/part-m-00001
::::::::::::::
1 a aleph
2 b bet
3 g gimel
Whereas I was expecting something like
0 a alpha
0 b beta
0 g gimel
1 a aleph
2 b bet
3 g gimel
[all together, in a single file]
I understand that two files would be generated for non-matching schemas,
but why are two generated for a union with matching schemas?
Thanks,
Keren