Pico Replication: A High Availability Framework for Middleboxes

Shriram Rajagopalan, Dan Williams, and Hani Jamjoom
ACM Symposium on Cloud Computing (SoCC)
Santa Clara, California, October 2013
Abstract. Middleboxes are being rearchitected to be service-oriented, composable, extensible, and elastic. Yet system-level support for high availability (HA) continues to introduce significant performance overhead. In this paper, we propose Pico Replication (PR), a system-level framework for middleboxes that exploits their flow-centric structure to achieve low-overhead, fully customizable HA. Unlike generic (virtual machine level) techniques, PR operates at the flow level. Individual flows can be checkpointed at very high frequencies while the middlebox continues to process other flows. Furthermore, each flow can have its own checkpoint frequency, output buffer, and backup target, enabling rich and diverse policies that balance performance and utilization on a per-flow basis. PR leverages OpenFlow to provide near-instant flow-level failure recovery by dynamically rerouting a flow's packets to its replication target. We have implemented PR and a flow-based HA policy. In controlled experiments, PR sustains checkpoint frequencies of 1000 Hz, an order of magnitude improvement over current VM replication solutions. As a result, PR drastically reduces the end-to-end latency overhead from 280% to 15.5% and the throughput overhead from 99.5% to 3.2%.
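The per-flow policy idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class names, the flow key layout, the `"primary"` and `"backup-*"` node names, and the use of a packet counter as stand-in middlebox state are all assumptions made for illustration. The routing table is a plain dictionary standing in for OpenFlow rules. Each flow carries its own checkpoint frequency and backup target, output is buffered until the next checkpoint commits, and failover redirects only the affected flow.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical flow key: (src_ip, src_port, dst_ip, dst_port).
FlowId = Tuple[str, int, str, int]


@dataclass
class FlowPolicy:
    checkpoint_hz: float  # how often this flow's state is checkpointed
    backup_target: str    # hypothetical name of the replica holding checkpoints


@dataclass
class FlowState:
    policy: FlowPolicy
    packets_seen: int = 0          # stand-in for real per-flow middlebox state
    last_checkpoint: float = 0.0   # time of the last committed checkpoint
    buffered_output: List[bytes] = field(default_factory=list)


class PicoReplicatorSketch:
    """Illustrative model of flow-level checkpointing, not the PR system."""

    def __init__(self) -> None:
        self.flows: Dict[FlowId, FlowState] = {}
        # backup target -> flow -> last checkpointed state
        self.backups: Dict[str, Dict[FlowId, int]] = {}
        # flow -> node currently receiving its packets (stand-in for OpenFlow rules)
        self.routes: Dict[FlowId, str] = {}

    def add_flow(self, flow: FlowId, policy: FlowPolicy, now: float = 0.0) -> None:
        self.flows[flow] = FlowState(policy=policy, last_checkpoint=now)
        self.routes[flow] = "primary"

    def process(self, flow: FlowId, payload: bytes, now: float) -> List[bytes]:
        """Process one packet and return the outputs that are safe to release."""
        st = self.flows[flow]
        st.packets_seen += 1
        st.buffered_output.append(payload)  # hold output until state is replicated
        if now - st.last_checkpoint >= 1.0 / st.policy.checkpoint_hz:
            return self._checkpoint(flow, now)
        return []

    def _checkpoint(self, flow: FlowId, now: float) -> List[bytes]:
        st = self.flows[flow]
        # Copy this flow's state to its own backup target; other flows are untouched.
        self.backups.setdefault(st.policy.backup_target, {})[flow] = st.packets_seen
        st.last_checkpoint = now
        released, st.buffered_output = st.buffered_output, []
        return released

    def fail_over(self, flow: FlowId) -> int:
        """Reroute one flow to its backup and return the state it resumes from."""
        st = self.flows.pop(flow)  # unreleased buffered output is discarded
        target = st.policy.backup_target
        self.routes[flow] = target
        return self.backups.get(target, {}).get(flow, 0)
```

In this sketch a latency-sensitive flow can be given a 1000 Hz policy while a bulk flow uses 10 Hz, so output for the first is held for at most about a millisecond while the second incurs far fewer checkpoints. The timestamps are passed in explicitly to keep the example deterministic.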
Keywords. High Availability, Network Function Virtualization, Middleboxes, Software Defined Networking
author = {Shriram Rajagopalan and Dan Williams and Hani Jamjoom},
title = {{Pico Replication: A High Availability Framework for Middleboxes}},
booktitle = {ACM Symposium on Cloud Computing (SoCC)},
address = {Santa Clara, California},
month = {Oct},
year = {2013}