Hi,
I am using Pig 0.20.0 and have two large data sets (log files), each around
2-3 TB in compressed format. I need to join them on a composite key of
userId and timestamp.
UserId is common to both, but since the log files come from two completely
different systems, their timestamps may differ. So the join logic has to
work like this:
Take the timestamp from one log, say timestampA, then consider all records
from the other data set where the userId matches and the timestamp of B
(say timestampB) satisfies: (timestampA - x) <= timestampB <= (timestampA +
x), where x is something like 5 minutes.
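In Pig Latin this kind of band join is usually expressed as an equi-join on
userId alone, followed by a FILTER on the timestamp window. A minimal sketch
(relation names, field names, and millisecond units are all assumptions, not
from my actual scripts):

```pig
-- Hypothetical schemas; adjust to the real log layouts.
logA = LOAD 'logA' AS (userId:chararray, timestampA:long);
logB = LOAD 'logB' AS (userId:chararray, timestampB:long);

-- Equi-join on userId only.
joined = JOIN logA BY userId, logB BY userId;

-- Keep only pairs within x = 5 minutes (300000 ms, assuming ms timestamps).
within = FILTER joined BY
    (logB::timestampB >= logA::timestampA - 300000L) AND
    (logB::timestampB <= logA::timestampA + 300000L);
```

The non-equi condition never enters the JOIN itself; it is applied afterwards,
so the join output can be large for users with many records before the filter
prunes it.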
*Questions:*
1. So far I have used an equi-join, but I am not sure how to do a non-equi
join.
2. Is there any way to optimize the join operation? If I do a secondary
sort on userId + timestamp in both datasets, will it help?
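To make question 2 concrete, one way to get per-user sorted-by-timestamp
processing in Pig is a COGROUP on userId with an in-bag ORDER inside a
nested FOREACH. A hedged sketch only (all relation and field names are
assumptions; the FLATTEN pair produces the per-user cross product, which
would still need the timestamp-window filter afterwards):

```pig
-- Group both logs by userId so each user's records land together.
grouped = COGROUP logA BY userId, logB BY userId;

paired = FOREACH grouped {
    -- Sort each user's bag by timestamp (the "secondary sort").
    a = ORDER logA BY timestampA;
    b = ORDER logB BY timestampB;
    -- Cross the two sorted bags per user; filter on the window downstream.
    GENERATE FLATTEN(a), FLATTEN(b);
};
```

Whether this beats a plain JOIN + FILTER presumably depends on how skewed
the userId distribution is, so I would appreciate guidance here.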
Thanks
Amit