TideWatch: Fingerprinting the Cyclicality of Big Data Workloads

Dan Williams, Shuai Zheng, Xiangliang Zhang and Hani Jamjoom

IEEE INFOCOM

Toronto, Canada, April 2014

Abstract. Intrinsic to "big data" processing workloads (e.g., iterative MapReduce and Pregel) are cyclical resource utilization patterns that are highly synchronized both across different resource types (e.g., CPU, memory, network) and across the workers in a cluster. In Infrastructure as a Service settings, cloud providers do not exploit this characteristic to better manage VMs because they view VMs as black boxes. We present TideWatch, a system that automatically identifies cyclicality and similarity in running VMs. TideWatch predicts period lengths of most VMs in Hadoop workloads within 9% of actual iteration boundaries and successfully classifies up to 95% of running VMs as participating in the appropriate Hadoop cluster. Furthermore, we show how TideWatch can be used to improve the timing of VM migrations, reducing both migration time and network impact by over 50% when compared to a random approach.
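To make the idea of a "period length" concrete, here is a minimal Python sketch of one common way to estimate the period of a single VM's resource-utilization trace: autocorrelation. This is an assumed, swapped-in technique for illustration, not necessarily the method TideWatch uses. The function names `autocorrelation` and `estimate_period`, the `max_lag` parameter, and the sample CPU trace are all hypothetical.

```python
# Hypothetical sketch: estimate the period (in samples) of one VM's
# resource-utilization trace with autocorrelation. This is an assumed
# stand-in technique, not necessarily TideWatch's own algorithm.

def autocorrelation(series, lag):
    """Normalized (biased) autocorrelation of `series` at `lag`.

    Dividing by the full-series variance, rather than by the number of
    overlapping terms, makes longer lags score slightly lower, so for a
    periodic trace the fundamental period beats its multiples (2P, 3P, ...).
    """
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0  # a flat trace has no cycle to find
    cov = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return cov / var


def estimate_period(series, max_lag=None):
    """Return the estimated period in samples, or None if no cycle is found."""
    n = len(series)
    if max_lag is None:
        max_lag = n // 2  # require at least two repetitions of the cycle
    acf = [autocorrelation(series, lag) for lag in range(max_lag + 1)]
    # Small lags correlate strongly only because neighboring samples are
    # similar; skip them by starting at the first negative autocorrelation.
    start = next((lag for lag in range(1, max_lag + 1) if acf[lag] < 0), None)
    if start is None:
        return None
    # The lag with the strongest self-similarity is the period estimate.
    return max(range(start, max_lag + 1), key=lambda lag: acf[lag])


# Hypothetical CPU trace: each "iteration" is 6 busy samples then 4 idle ones.
cpu = ([90.0] * 6 + [10.0] * 4) * 12
print(estimate_period(cpu))  # 10
```

Running the same estimate on a VM's CPU, memory, and network traces and comparing the results is one simple way to see the cross-resource synchronization described above; the sketch makes no claim to reproduce the accuracy figures reported for TideWatch.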

Keywords. Hadoop, Resource Measurement, Cyclicality

Link. /publications/jamjoom-tidewatch-infocom-2014.pdf

BibTeX.

@inproceedings{jamjoom-tidewatch-infocom-2014,
author = {Dan Williams and Shuai Zheng and Xiangliang Zhang and Hani Jamjoom},
title = {{TideWatch: Fingerprinting the Cyclicality of Big Data Workloads}},
booktitle = {IEEE INFOCOM},
address = {Toronto, Canada},
month = {April},
year = {2014}
}