OPTAS: Optimal Data Placement in MapReduce

Changjian Wang, Yongrui Qin, Zhen Huang, Yuxing Peng, Dongsheng Li, Huiba Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Citations (Scopus)


The data placement strategy greatly affects the efficiency of MapReduce. The current strategy takes only the map phase into account and optimizes only the map time, but the ignored shuffle phase can significantly increase the total running time of many jobs. We propose a new data placement strategy, named OPTAS, which optimizes both the map and shuffle phases to reduce their total time. However, the huge search space makes it difficult to find an optimal data placement instance (DPI) quickly. To address this problem, we propose an algorithm that prunes most of the search space and finds an optimal result quickly. The search space is first segmented in ascending order of potential map time. Within each segment, we propose an efficient method to construct a local optimal DPI with the minimal total time of the map and shuffle phases. To find the global optimal DPI, we scan the local optimal DPIs in order. We have proven that the global optimal DPI can be found as the first local optimal DPI whose total time stops decreasing, which prunes the search space further. In practice, we find that with this pruning strategy at most fourteen local optimal DPIs are scanned out of tens of thousands of segments. Extensive experiments with real trace data verify not only the theoretical analysis of our pruning strategy and construction method but also the optimality of OPTAS. The best improvement obtained in our experiments exceeds 40% compared with the existing strategy used by MapReduce.
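The ordered scan with early stopping described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `scan_local_optima`, the `build_local_optimum` callback, and the toy timings are all hypothetical, and the sketch assumes one reading of the stopping rule, namely that the scan returns the last local optimal DPI seen before the total time stops decreasing. It also takes for granted the property the paper states it proves, that no later segment can beat that point.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

Segment = TypeVar("Segment")
DPI = TypeVar("DPI")


def scan_local_optima(
    segments: Iterable[Segment],
    build_local_optimum: Callable[[Segment], Tuple[DPI, float]],
) -> Tuple[Optional[Tuple[DPI, float]], int]:
    """Scan segments in ascending order of potential map time.

    `build_local_optimum` is a hypothetical callback that returns
    (local optimal DPI, total map + shuffle time) for one segment.
    Returns the best (DPI, total_time) found and the number of
    segments that were actually examined.
    """
    best: Optional[Tuple[DPI, float]] = None
    scanned = 0
    for segment in segments:
        candidate = build_local_optimum(segment)
        scanned += 1
        # Assumed stopping rule: once the total time stops decreasing,
        # the previously seen local optimum is taken as the global one.
        if best is not None and candidate[1] >= best[1]:
            break
        best = candidate
    return best, scanned


# Toy usage with made-up total times for five segments.
toy_times = [10.0, 8.0, 7.0, 9.0, 12.0]
result, examined = scan_local_optima(
    range(len(toy_times)),
    lambda s: (f"dpi-{s}", toy_times[s]),
)
print(result, examined)
```

In the toy run the scan stops at the fourth segment, so the fifth is never constructed. That is the pruning effect the abstract describes, here on invented numbers rather than real trace data.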

Original language: English
Title of host publication: 2013 International Conference on Parallel and Distributed Systems
Publisher: IEEE Computer Society
Number of pages: 8
ISBN (Electronic): 9781479920815
Publication status: Published - 2013
Externally published: Yes
Event: 19th IEEE International Conference on Parallel and Distributed Systems - Seoul, Korea, Republic of
Duration: 15 Dec 2013 to 18 Dec 2013
Conference number: 19

Publication series

ISSN (Print): 1521-9097


Conference: 19th IEEE International Conference on Parallel and Distributed Systems
Abbreviated title: ICPADS 2013
Country/Territory: Korea, Republic of

