Victor Heorhiadi, Shriram Rajagopalan, Hani Jamjoom, Michael K. Reiter and Vyas Sekar
IEEE Conference on Distributed Computing Systems (ICDCS)
Nara, Japan, June 2016
Abstract. Modern Internet applications are being disaggregated into a microservice-based architecture, with services being updated and deployed hundreds of times a day. The accelerated software life cycle and the heterogeneity of language runtimes in a single application necessitate a new approach for testing the resiliency of these applications in production infrastructures. We present Gremlin, a framework for systematically testing the failure-handling capabilities of microservices. Gremlin is based on the observation that microservices are loosely coupled and thus rely on standard message exchange patterns over the network. Gremlin's centralized control plane allows the operator to easily design tests, while the generic data plane manipulates inter-service messages at the network layer to execute these tests. We show how to use Gremlin to express common failure scenarios and how developers of an enterprise application were able to discover previously unknown bugs in their failure-handling code without modifying the application.
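The abstract splits Gremlin into a control plane, where the operator specifies tests, and a data plane that manipulates messages exchanged between services. The Python sketch below illustrates that division of labor under loudly stated assumptions: the rule format, the names `FaultRule`, `ControlPlane`, and `data_plane`, the service names, and the two actions (abort and delay) are hypothetical illustrations, not Gremlin's actual interface. Messages are modeled in-process instead of being intercepted on the network, and a forwarded message is simply assumed to succeed.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical rule format (not Gremlin's API): what to do to messages
# sent from one service to another.
@dataclass(frozen=True)
class FaultRule:
    source: str            # calling service
    destination: str       # called service
    action: str            # "abort" or "delay"
    abort_status: int = 503
    delay_ms: int = 0


@dataclass
class Message:
    source: str
    destination: str
    body: str


@dataclass
class Outcome:
    status: int            # status seen by the caller
    added_delay_ms: int    # artificial latency injected
    forwarded: bool        # whether the message reached the callee


@dataclass
class ControlPlane:
    """Hypothetical central store of the operator's test rules."""
    rules: List[FaultRule] = field(default_factory=list)

    def add_rule(self, rule: FaultRule) -> None:
        self.rules.append(rule)

    def rules_for(self, source: str, destination: str) -> List[FaultRule]:
        return [r for r in self.rules
                if r.source == source and r.destination == destination]


def data_plane(msg: Message, control: ControlPlane) -> Outcome:
    """Apply matching rules to one message; the services themselves are untouched."""
    delay = 0
    for rule in control.rules_for(msg.source, msg.destination):
        if rule.action == "abort":
            # Drop the message and return an error status to the caller.
            return Outcome(rule.abort_status, delay, False)
        if rule.action == "delay":
            delay += rule.delay_ms
    # Assumption of this toy model: forwarded messages succeed with status 200.
    return Outcome(200, delay, True)


if __name__ == "__main__":
    cp = ControlPlane()
    cp.add_rule(FaultRule("frontend", "reviews", "delay", delay_ms=2000))
    cp.add_rule(FaultRule("reviews", "ratings", "abort", abort_status=503))
    print(data_plane(Message("frontend", "reviews", "GET /reviews"), cp))
    print(data_plane(Message("reviews", "ratings", "GET /ratings"), cp))
```

Keeping the rules in one place while the interception logic stays generic mirrors the abstract's point that tests can be designed centrally and executed without modifying the application.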
Keywords. Failure Injection, Testing, Microservices, DevOps, Cloud
BibTeX.
@inproceedings{jamjoom-gremlin-icdcs-2016,
author = {Victor Heorhiadi and Shriram Rajagopalan and Hani Jamjoom and Michael K. Reiter and Vyas Sekar},
title = {{Gremlin: Systematic Resilience Testing of Microservices}},
booktitle = {IEEE Conference on Distributed Computing Systems (ICDCS)},
address = {Nara, Japan},
month = {June},
year = {2016}
}