MapReduce is an efficient framework for the parallel processing of distributed big data in cluster environments. In such clusters, task failures can degrade application performance. Although MapReduce automatically reschedules failed tasks, re-execution starts from scratch and therefore prolongs job completion time. Checkpointing is a valuable technique for avoiding the re-execution of finished work in MapReduce; however, an incorrectly chosen checkpoint interval can still degrade application performance and increase job completion time. In this paper, a checkpoint interval is proposed to avoid re-executing entire tasks after task failures and to reduce job completion time. The proposed interval is based on five parameters: the expected job completion time without checkpointing, checkpoint overhead time, rework time, downtime, and restart time. Experiments show that the proposed checkpoint interval incurs less checkpointing overhead and reduces completion time when failures occur.
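The paper's exact interval formula is not reproduced in this abstract. As a point of reference, a widely used baseline that balances checkpoint overhead against expected rework is Young's first-order approximation; the sketch below is that standard formula, not the paper's proposed method, and the parameter names are illustrative.

```python
import math

def young_checkpoint_interval(checkpoint_overhead_s: float, mtbf_s: float) -> float:
    """Young's first-order approximation of the optimal checkpoint interval.

    NOTE: this is a classic baseline (Young, 1974), not the interval
    derivation proposed in the paper; it trades the per-checkpoint
    overhead against the expected rework lost at a failure.
    """
    return math.sqrt(2.0 * checkpoint_overhead_s * mtbf_s)

# Example: 30 s checkpoint overhead, mean time between failures of 4 hours.
interval = young_checkpoint_interval(30.0, 4 * 3600)
print(round(interval))  # → 930 (seconds between checkpoints)
```

Intuition for the trade-off: checkpointing more often than this wastes time writing state that is rarely needed, while checkpointing less often increases the rework (and hence downtime and restart cost) paid after each failure.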
2018 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC)
Keywords: Checkpointing, Fault tolerance, Fault-tolerant systems, Task analysis, Computational modeling, Big data, Rework, Downtime, Parallel processing, Real-time computing, Distributed computing, Computer science, Google