Abstract
The data placement strategy greatly affects the efficiency of MapReduce. The current strategy considers only the map phase and therefore optimizes only the map time, but the shuffle phase it ignores can significantly increase the total running time of many jobs. We propose a new data placement strategy, named OPTAS, which optimizes both the map and shuffle phases to reduce their total time. However, the huge search space makes it difficult to find an optimal data placement instance (DPI) quickly. To address this problem, we propose an algorithm that prunes most of the search space and finds an optimal result quickly. The search space is first segmented in ascending order of potential map time. Within each segment, we propose an efficient method to construct a local optimal DPI with the minimal total time of the map and shuffle phases. To find the global optimal DPI, we scan the local optimal DPIs in order. We prove that the global optimal DPI is the first local optimal DPI at which the total time stops decreasing, which further prunes the search space. In practice, with this pruning strategy at most fourteen local optimal DPIs are scanned out of tens of thousands of segments. Extensive experiments with real trace data verify not only the theoretical analysis of our pruning strategy and construction method but also the optimality of OPTAS. The best improvement observed in our experiments exceeds 40% compared with the existing strategy used by MapReduce.
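The scan over segments described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify how a segment is represented or how its local optimal DPI is constructed, so `scan_segments`, `local_optimum`, and the toy data are hypothetical names and values chosen for illustration. The sketch also assumes one reading of the stopping rule: segments arrive in ascending order of potential map time, and the scan ends at the first segment whose local optimum does not improve on the best total time seen so far.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

S = TypeVar("S")  # a segment of the search space (hypothetical representation)
D = TypeVar("D")  # a data placement instance, DPI (hypothetical representation)


def scan_segments(
    segments: Iterable[S],
    local_optimum: Callable[[S], Tuple[D, float]],
) -> Tuple[Optional[D], float, int]:
    """Scan segments in the given order and stop once the total time stops decreasing.

    `segments` is assumed to be sorted by ascending potential map time.
    `local_optimum` is an assumed callback returning (DPI, map + shuffle time)
    for one segment. Returns (best DPI, its total time, segments evaluated).
    """
    best_dpi: Optional[D] = None
    best_time = float("inf")
    evaluated = 0
    for segment in segments:
        dpi, total_time = local_optimum(segment)
        evaluated += 1
        if total_time >= best_time:
            # Total time no longer decreases: skip all remaining segments.
            break
        best_dpi, best_time = dpi, total_time
    return best_dpi, best_time, evaluated


# Toy data (made up): total time per segment falls, then rises.
toy_times = [10.0, 8.0, 7.0, 7.5, 9.0, 12.0]


def toy_local_optimum(index: int) -> Tuple[str, float]:
    """Stand-in for the per-segment construction method, which is not shown here."""
    return f"dpi-{index}", toy_times[index]


if __name__ == "__main__":
    print(scan_segments(range(len(toy_times)), toy_local_optimum))
```

With the toy data, the scan evaluates four of the six segments and keeps the third. Early termination is only safe under the property the paper reports proving; on arbitrary data, a later segment could still hold a lower total time.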
Original language | English |
---|---|
Title of host publication | 2013 International Conference on Parallel and Distributed Systems |
Publisher | IEEE Computer Society |
Pages | 315-322 |
Number of pages | 8 |
ISBN (Electronic) | 9781479920815 |
DOIs | |
Publication status | Published - 2013 |
Externally published | Yes |
Event | 19th IEEE International Conference on Parallel and Distributed Systems - Seoul, Korea, Republic of; Duration: 15 Dec 2013 → 18 Dec 2013; Conference number: 19 |
Publication series
Name | ICPADS |
---|---|
Publisher | IEEE |
ISSN (Print) | 1521-9097 |
Conference
Conference | 19th IEEE International Conference on Parallel and Distributed Systems |
---|---|
Abbreviated title | ICPADS 2013 |
Country/Territory | Korea, Republic of |
City | Seoul |
Period | 15/12/13 → 18/12/13 |