Dan Williams, Shuai Zheng, Xiangliang Zhang and Hani Jamjoom
IEEE INFOCOM
Toronto, Canada, April 2014
Abstract. Intrinsic to "big data" processing workloads (e.g.,
iterative MapReduce, Pregel, etc.) are cyclical
resource utilization patterns that are highly
synchronized across different resource types (e.g.,
CPU, memory, network) as well as the workers in a
cluster. In Infrastructure as a Service settings,
cloud providers do not exploit this characteristic
to better manage VMs because they view VMs as black
boxes. We present TideWatch, a system that
automatically identifies cyclicality and similarity
in running VMs. TideWatch predicts period lengths of
most VMs in Hadoop workloads within 9% of actual
iteration boundaries and successfully classifies
up to 95% of running VMs as participating in the
appropriate Hadoop cluster. Furthermore, we show how
TideWatch can be used to improve the timing of VM
migrations, reducing both migration time and network
impact by over 50% when compared to a random
approach.
Keywords. Hadoop, Resource Measurement, Cyclicality
Bibtex.
@inproceedings{jamjoom-tidewatch-infocom-2014,
author = {Dan Williams and Shuai Zheng and Xiangliang Zhang and Hani Jamjoom},
title = {{TideWatch: Fingerprinting the Cyclicality of Big Data Workloads}},
booktitle = {IEEE INFOCOM},
address = {Toronto, Canada},
month = {April},
year = {2014}
}