OPTAS: Optimal Data Placement in MapReduce

Changjian Wang, Yongrui Qin, Zhen Huang, Yuxing Peng, Dongsheng Li, Huiba Li

Research output: Chapter in Book/Report/Conference proceedingConference contribution

4 Citations (Scopus)

Abstract

The data placement strategy greatly affects the efficiency of MapReduce. The current strategy only takes the map phase into account to optimize the map time. But the ignored shuffle phase may increase the total running time significantly in many jobs. We propose a new data placement strategy, named OPTAS, which optimizes both the map and shuffle phases to reduce their total time. However, the huge search space makes it difficult to find out an optimal data placement instance (DPI) rapidly. To address this problem, an algorithm is proposed which can prune most of the search space and find out an optimal result quickly. The search space firstly is segmented in ascending order according to the potential map time. Within each segment, we propose an efficient method to construct a local optimal DPI with the minimal total time of both the map and shuffle phases. To find the global optimal DPI, we scan the local optimal DPIs in order. We have proven that the global optimal DPI can be found as the first local optimal DPI whose total time stops decreasing, thus further pruning the search space. In practice, we find that at most fourteen local optimal DPIs are scanned in tens of thousands of segments with the pruning strategy. Extensive experiments with real trace data verify not only the theoretic analysis of our pruning strategy and construction method but also the optimality of OPTAS. The best improvements obtained in our experiments can be over 40% compared with the existing strategy used by MapReduce.

Original languageEnglish
Title of host publication 2013 International Conference on Parallel and Distributed Systems
PublisherIEEE Computer Society
Pages315-322
Number of pages8
ISBN (Electronic)9781479920815
DOIs
Publication statusPublished - 2013
Externally publishedYes
Event19th IEEE International Conference on Parallel and Distributed Systems - Seoul, Korea, Republic of
Duration: 15 Dec 201318 Dec 2013
Conference number: 19

Publication series

NameICPADS
PublisherIEEE
ISSN (Print)1521-9097

Conference

Conference19th IEEE International Conference on Parallel and Distributed Systems
Abbreviated titleICPADS 2013
CountryKorea, Republic of
CitySeoul
Period15/12/1318/12/13

Fingerprint

Experiments

Cite this

Wang, C., Qin, Y., Huang, Z., Peng, Y., Li, D., & Li, H. (2013). OPTAS: Optimal Data Placement in MapReduce. In 2013 International Conference on Parallel and Distributed Systems (pp. 315-322). [6808189] (ICPADS). IEEE Computer Society. https://doi.org/10.1109/ICPADS.2013.52
Wang, Changjian ; Qin, Yongrui ; Huang, Zhen ; Peng, Yuxing ; Li, Dongsheng ; Li, Huiba. / OPTAS : Optimal Data Placement in MapReduce. 2013 International Conference on Parallel and Distributed Systems. IEEE Computer Society, 2013. pp. 315-322 (ICPADS).
@inproceedings{4baeb57470404665ac49e3c344bed8ef,
title = "OPTAS: Optimal Data Placement in MapReduce",
abstract = "The data placement strategy greatly affects the efficiency of MapReduce. The current strategy only takes the map phase into account to optimize the map time. But the ignored shuffle phase may increase the total running time significantly in many jobs. We propose a new data placement strategy, named OPTAS, which optimizes both the map and shuffle phases to reduce their total time. However, the huge search space makes it difficult to find out an optimal data placement instance (DPI) rapidly. To address this problem, an algorithm is proposed which can prune most of the search space and find out an optimal result quickly. The search space firstly is segmented in ascending order according to the potential map time. Within each segment, we propose an efficient method to construct a local optimal DPI with the minimal total time of both the map and shuffle phases. To find the global optimal DPI, we scan the local optimal DPIs in order. We have proven that the global optimal DPI can be found as the first local optimal DPI whose total time stops decreasing, thus further pruning the search space. In practice, we find that at most fourteen local optimal DPIs are scanned in tens of thousands of segments with the pruning strategy. Extensive experiments with real trace data verify not only the theoretic analysis of our pruning strategy and construction method but also the optimality of OPTAS. The best improvements obtained in our experiments can be over 40{\%} compared with the existing strategy used by MapReduce.",
keywords = "Data placement, MapReduce, OPTAS",
author = "Changjian Wang and Yongrui Qin and Zhen Huang and Yuxing Peng and Dongsheng Li and Huiba Li",
year = "2013",
doi = "10.1109/ICPADS.2013.52",
language = "English",
series = "ICPADS",
publisher = "IEEE Computer Society",
pages = "315--322",
booktitle = "2013 International Conference on Parallel and Distributed Systems",
address = "United States",

}

Wang, C, Qin, Y, Huang, Z, Peng, Y, Li, D & Li, H 2013, OPTAS: Optimal Data Placement in MapReduce. in 2013 International Conference on Parallel and Distributed Systems., 6808189, ICPADS, IEEE Computer Society, pp. 315-322, 19th IEEE International Conference on Parallel and Distributed Systems, Seoul, Korea, Republic of, 15/12/13. https://doi.org/10.1109/ICPADS.2013.52

OPTAS : Optimal Data Placement in MapReduce. / Wang, Changjian; Qin, Yongrui; Huang, Zhen; Peng, Yuxing; Li, Dongsheng; Li, Huiba.

2013 International Conference on Parallel and Distributed Systems. IEEE Computer Society, 2013. p. 315-322 6808189 (ICPADS).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

TY - GEN

T1 - OPTAS

T2 - Optimal Data Placement in MapReduce

AU - Wang, Changjian

AU - Qin, Yongrui

AU - Huang, Zhen

AU - Peng, Yuxing

AU - Li, Dongsheng

AU - Li, Huiba

PY - 2013

Y1 - 2013

N2 - The data placement strategy greatly affects the efficiency of MapReduce. The current strategy only takes the map phase into account to optimize the map time. But the ignored shuffle phase may increase the total running time significantly in many jobs. We propose a new data placement strategy, named OPTAS, which optimizes both the map and shuffle phases to reduce their total time. However, the huge search space makes it difficult to find out an optimal data placement instance (DPI) rapidly. To address this problem, an algorithm is proposed which can prune most of the search space and find out an optimal result quickly. The search space firstly is segmented in ascending order according to the potential map time. Within each segment, we propose an efficient method to construct a local optimal DPI with the minimal total time of both the map and shuffle phases. To find the global optimal DPI, we scan the local optimal DPIs in order. We have proven that the global optimal DPI can be found as the first local optimal DPI whose total time stops decreasing, thus further pruning the search space. In practice, we find that at most fourteen local optimal DPIs are scanned in tens of thousands of segments with the pruning strategy. Extensive experiments with real trace data verify not only the theoretic analysis of our pruning strategy and construction method but also the optimality of OPTAS. The best improvements obtained in our experiments can be over 40% compared with the existing strategy used by MapReduce.

AB - The data placement strategy greatly affects the efficiency of MapReduce. The current strategy only takes the map phase into account to optimize the map time. But the ignored shuffle phase may increase the total running time significantly in many jobs. We propose a new data placement strategy, named OPTAS, which optimizes both the map and shuffle phases to reduce their total time. However, the huge search space makes it difficult to find out an optimal data placement instance (DPI) rapidly. To address this problem, an algorithm is proposed which can prune most of the search space and find out an optimal result quickly. The search space firstly is segmented in ascending order according to the potential map time. Within each segment, we propose an efficient method to construct a local optimal DPI with the minimal total time of both the map and shuffle phases. To find the global optimal DPI, we scan the local optimal DPIs in order. We have proven that the global optimal DPI can be found as the first local optimal DPI whose total time stops decreasing, thus further pruning the search space. In practice, we find that at most fourteen local optimal DPIs are scanned in tens of thousands of segments with the pruning strategy. Extensive experiments with real trace data verify not only the theoretic analysis of our pruning strategy and construction method but also the optimality of OPTAS. The best improvements obtained in our experiments can be over 40% compared with the existing strategy used by MapReduce.

KW - Data placement

KW - MapReduce

KW - OPTAS

UR - http://www.scopus.com/inward/record.url?scp=84900855361&partnerID=8YFLogxK

U2 - 10.1109/ICPADS.2013.52

DO - 10.1109/ICPADS.2013.52

M3 - Conference contribution

AN - SCOPUS:84900855361

T3 - ICPADS

SP - 315

EP - 322

BT - 2013 International Conference on Parallel and Distributed Systems

PB - IEEE Computer Society

ER -

Wang C, Qin Y, Huang Z, Peng Y, Li D, Li H. OPTAS: Optimal Data Placement in MapReduce. In 2013 International Conference on Parallel and Distributed Systems. IEEE Computer Society. 2013. p. 315-322. 6808189. (ICPADS). https://doi.org/10.1109/ICPADS.2013.52