An implementation of ROLLUP H2IRG on APACHE PIG

View the Project on GitHub bigfootproject/pig/tree/pig-rollup

Experiments' Results

This figure shows the overall comparison across six experiments. The metric is the runtime (in seconds) of each experiment.

We observe that, on the same input data set, each experiment that used our ROLLUP outperforms the corresponding experiment that used the current ROLLUP.
With Hybrid IRG+IRG, the runtime of the map phase is much smaller than that of the current ROLLUP because, in our ROLLUP, the mapper outputs only two records.
Due to the overhead of converting Unix time to ISO time, the map-phase runtime of the experiment that ran on the rdns data set was very large.
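
To illustrate the kind of per-record work behind the Unix time to ISO time overhead, here is a minimal Python sketch. The page does not say how the conversion was done in the experiments, so the function name `unix_to_iso` and the choice of UTC are assumptions made only for illustration.

```python
from datetime import datetime, timezone


def unix_to_iso(unix_seconds: int) -> str:
    """Convert seconds since the Unix epoch to an ISO 8601 string in UTC.

    Hypothetical helper: the experiments ran inside Pig, not Python.
    """
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).isoformat()


# The conversion runs once per input record, so its cost grows
# linearly with the size of the data set.
timestamps = [0, 86400]
iso_values = [unix_to_iso(ts) for ts in timestamps]
```

Because this work is repeated for every record read by the mapper, even a cheap conversion can dominate the map phase on a large input.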


This figure shows the overall comparison across two experiments. The metric is the runtime (in seconds) of each experiment.
The input data set is uniform_ish_32_30. Our implementation outperforms the current one in PIG because the number of records we output in the map phase is less than the number of records the default ROLLUP outputs in the map phase.
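
The difference in map-phase output can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the default ROLLUP emits one record per aggregation level (d + 1 records for d dimensions), and it models our mapper as emitting exactly two records per input tuple. The function names, the `pivot` parameter, and the exact keys chosen are hypothetical and are not taken from the PIG code.

```python
from typing import List, Tuple

Record = Tuple[Tuple[str, ...], int]


def default_rollup_map(dims: Tuple[str, ...], value: int) -> List[Record]:
    """Assumed behaviour of the default ROLLUP: one record per level.

    For d dimensions this emits d + 1 records, from the full key
    down to the empty key (the grand total).
    """
    return [(dims[:level], value) for level in range(len(dims), -1, -1)]


def two_record_map(dims: Tuple[str, ...], value: int, pivot: int) -> List[Record]:
    """Hypothetical mapper that emits only two records per input tuple.

    One record carries the full key and one carries a shorter prefix
    whose length is set by the assumed `pivot` parameter. The remaining
    levels would be computed on the reducer side.
    """
    return [(dims, value), (dims[:pivot], value)]


# One input tuple with three dimensions (year, month, day).
dims = ("2014", "03", "15")
default_out = default_rollup_map(dims, 7)
two_out = two_record_map(dims, 7, pivot=1)

print(len(default_out), "records from the assumed default mapper")
print(len(two_out), "records from the two-record mapper")
```

Under these assumptions, the gap between the two mappers widens as the number of dimensions grows, since the first count depends on d while the second stays fixed at two.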