Here is an overview of current and past projects that I am leading (or have led) at IBM Research. You can also find a summary of my Ph.D. work here.

___big data systems' optimizations

This project focuses on improving the efficiency and scalability of big data systems. Our initial focus has been on fingerprinting big data workloads in large-scale deployments. We have also proposed new optimizations to graph processing systems. Now, we are looking at using these systems in the Connected Health domain, focusing on the intersection of Internet of Things (IoT) and healthcare.

publications & patents

Hieroglyph: Locally-Sufficient Graph Processing via Compute-Sync-Merge

, and
ACM SIGMETRICS
Champaign-Urbana, USA,

Version Traveler: Fast and Memory-Efficient Version Switching in Graph Processing Systems

, , and
USENIX Annual Technical Conference
Denver, CO,

TideWatch: Fingerprinting the Cyclicality of Big Data Workloads

, , and
IEEE INFOCOM
Toronto, Canada,

Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing

, , , , and
ACM EuroSys
Prague, Czech Republic,

To 4,000 Compute Nodes and Beyond: Network-aware Vertex Placement in Large-scale Graph Processing Systems

, and
ACM SIGCOMM Posters and Demos
Hong Kong, China,

___rethinking middleboxes in enterprise clouds

This project focuses on rearchitecting middleboxes to improve their elasticity and high availability. We have created a new abstraction---called Split/Merge---that enables the migration of stateful flows across middleboxes. We have also implemented system support for pico replication of flow state, allowing significant improvements when compared to existing approaches. We are now looking into supporting middleboxes and network function virtualization inside Platform as a Service (PaaS) clouds, exploring new ways to use middleboxes to implement DevOps features in microservice-based applications.

publications

Opportunities and Challenges in Adopting Microservice Architectures for Enterprise Workloads

, , and
USENIX Annual Technical Conference (Practitioner Talk)
Denver, CO,

Gremlin: Systematic Resilience Testing of Microservices

, , , and
IEEE International Conference on Distributed Computing Systems (ICDCS)
Nara, Japan,

Stateless Network Functions

, , , and
ACM SIGCOMM Workshop on Hot Topics in Middleboxes and Network Function Virtualization (HotMiddlebox). Also appeared as a poster in USENIX NSDI.
London, UK,

App-Bisect: Autonomous Healing for Microservice-Based Apps

and
USENIX Workshop on Hot Topics in Cloud Computing (HotCloud)
Santa Clara, CA,

Don’t Call Them Middleboxes, Call Them Middlepipes

, and
ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking (HotSDN)
Chicago, IL,

Pico Replication: A High Availability Framework for Middleboxes

, and
ACM Symposium on Cloud Computing (SoCC)
Santa Clara, California,

Cementing High Availability in OpenFlow with RuleBricks

and
ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking (HotSDN)
Hong Kong, China,

Escape Capsule: Explicit State is Robust and Scalable

, , and
USENIX Workshop on Hot Topics in Operating Systems (HotOS XIV)
Santa Ana Pueblo, New Mexico,

Split/Merge: System Support for Elastic Execution in Virtual Middleboxes

, , and
USENIX Symposium on Networked Systems Design and Implementation (NSDI)
Lombard, Illinois,

Fault Tolerance Solution For Stateful Applications

, and
US Patent US9110864
, (Granted)

awards

Publication Achievement Award

IBM Watson Research,

___cloud workload discovery & optimization

Moving enterprise workloads to the cloud is not easy. It is filled with practical and systems challenges across the entire lifecycle. In this research project, we look at a wide spectrum of problems ranging from workload discovery, to placement, to migration, focusing on different enterprise and big data workloads. This project was awarded an 'Outstanding Research Accomplishment' for demonstrating over $100M in revenue impact.

publications & patents

CRONets: Cloud-Routed Overlay Networks

, , , , and
IEEE International Conference on Distributed Computing Systems (ICDCS)
Nara, Japan,

QoX: Quality of Service and Consumption in the Cloud

, and
USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '16)
Denver, CO,

What to Discover Before Migrating to the Cloud

, , , , and
IFIP/IEEE International Symposium on Integrated Network Management (IM)
Ghent, Belgium,

Virtual Machine Migration in an Over-committed Cloud

, , and
IEEE/IFIP Network Operations and Management Symposium (NOMS)
Maui, Hawaii,

Overdriver: Handling Memory Overload in an Oversubscribed Cloud

, , and
ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE)
Newport Beach, CA,

Application-aware Virtual Machine Migration in Data Centers

, , , , and
IEEE INFOCOM Mini-conference
Shanghai, China,

Network-aware virtual machine migration in datacenters

, , and
US Patent US8423646
, (Granted)

awards

Outstanding Technical Achievement Award

IBM Watson Research,
Analytics for Logical Dependency Mapping (ALDM) is a lightweight solution for deep discovery of infrastructure and application topologies. ALDM has been, and continues to be, used across many client engagements as part of migrating enterprise applications across data centers. The project received the "Outstanding Accomplishments" award for demonstrating a combined revenue/savings impact of more than $100M.

Technical Accomplishment

IBM Watson Research,

___superclouds

Cloud computing is often compared to the power utility model as part of a trend towards the commoditization of computing resources. However, today’s cloud providers do not simply supply raw computing resources as a commodity, but also act as distributors, dictating cloud services that are not compatible across providers. We propose a new cloud service distribution layer, called a supercloud, that is completely decoupled from the cloud provider. Superclouds are entire clouds within and across clouds. To transform today's clouds into superclouds, we have created a nested virtualization layer called the Xen-Blanket that allows running your own hypervisor on top of other Xen-based clouds (e.g., Amazon EC2). We have also explored device virtualization abstractions that enable new ways of wiring applications in a multi-cloud deployment.

publications

Enabling Efficient Hypervisor-as-a-Service Clouds with Ephemeral Virtualization

, , , , , and
The 12th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE'16)
Atlanta, GA,

Software Defining System Devices with the ‘Banana’ Double-Split Driver Model

, and
USENIX Workshop on Hot Topics in Cloud Computing (HotCloud)
Philadelphia, PA,

Plug into the Supercloud

, and
IEEE Internet Computing Special Issue on Virtualization
Vol. 17, No. 2,

VirtualWire: System Support for Live Migrating Virtual Networks Across Clouds

International Workshop on Virtualization Technologies in Distributed Computing (Invited Talk)
New York, NY,

The Xen-Blanket: Virtualize Once, Run Everywhere

, and
ACM EuroSys
Bern, Switzerland,

Unshackle the Cloud!

, , , , and
USENIX Workshop on Hot Topics in Cloud Computing (HotCloud)
Portland, OR,

Method and apparatus to replicate stateful virtual machines between clouds

and
US Patent US9256463B2
, (Granted)

awards

Outstanding Technical Achievement Award

Project Lead, Superclouds
IBM Watson Research,
Superclouds are entire clouds within and across clouds. We have created a nested virtualization layer called the Xen-Blanket that allows running your own hypervisor on top of other Xen-based clouds (e.g., Amazon EC2). We have also explored device virtualization abstractions that enable new ways of wiring applications in a multi-cloud deployment.

___deep cloud

This project focused on offering Blue Gene/P capabilities as a cloud service. The work builds on research in Grid computing and places it in a highly consumable service model. Grid computing, which in theory shares many aspects of cloud computing, has not gained wide-scale adoption, primarily because it optimizes for system use rather than user expectations. In contrast, cloud computing has focused on meeting user expectations of the underlying system resources: offering users what they need, when they need it, and charging them for what they use. This research has solved a number of challenges in making HPC systems more consumable. We have also studied different HPC applications from this cloud perspective. This project was awarded a 'Research Accomplishment' for demonstrating over $10M revenue impact.

publications & patents

To 4,000 Compute Nodes and Beyond: Network-aware Vertex Placement in Large-scale Graph Processing Systems

, and
ACM SIGCOMM Posters and Demos
Hong Kong, China,

Enabling High-Performance Computing as a Service

, , , , , , , , , and
IEEE Computer
Vol. 45, No. 10,

Scheduling Batch and Heterogeneous Jobs with Runtime Elasticity in a Parallel Processing Environment

, and
IEEE International Parallel and Distributed Processing Symposium Workshops, the 21st International Heterogeneity in Computing Workshop (HCW)
Shanghai, China,

Analysis and Modeling of Social Influence in High Performance Computing Workloads

, , , and
17th International European Conference on Parallel and Distributed Computing (Euro-Par)
Bordeaux, France,

On the Design of a Deep Computing Service Cloud

, , , and
INFORMS 2010 Service Science Conference
Taipei, Taiwan,

Time-Of-Use Pricing Policies for Offering Cloud Computing as a Service

, , , and
IEEE Service Operations and Logistics, and Informatics (SOLI)
Qingdao, China,

A Service Composition Framework for Market-Oriented High Performance Computing Cloud

, , and
ACM International Symposium on High Performance Distributed Computing (HPDC)
Chicago, Illinois,

Strategic Placement Of Jobs For Spatial Elasticity In A High Performance Computing Environment

, and
US Patent US9311146
, (Granted)

System and method for dynamic rescheduling of multiple varying resources with user social mapping

, , , , and
US Patent US8479212
, (Granted)

Dynamic pricing of a resource

, , , , , , and
US Patent US8458011
, (Granted)

awards

Outstanding Technical Achievement Award

Principal Investigator, Deep Cloud
IBM Watson Research,
Deep Cloud is a framework for offering Blue Gene/P as a Service. It was developed as part of the KAUST IBM Center for Deep Computing Research. This project was awarded a ‘Research Accomplishment’ for demonstrating over $10M revenue impact.

Eminence and Excellence Award

IBM Watson Research,

First Place

Demonstrated how Deep Cloud can be effectively used to support ensemble applications on a geographically distributed federation of supercomputing systems

___cyano

Leveraging the collective knowledge of a large user base can yield enormous intellectual property (e.g., Wikipedia, YouTube, etc.). Fundamental to these successes are technologies that allow users to easily co-create knowledge. In the case of business processes and best practices, such technologies did not exist for orchestration, enrichment, and analytics. Cyano is a flexible social-networking-based platform that enables community-based process co-creation. It was fully developed and deployed by the Research team. Cyano is built on top of a scalable vector-based semantic engine and an adaptive recommendation system. With a user base of over 12,000 active IT practitioners, Cyano captured over 550 IBM IT best practices. This project was awarded a 'Research Accomplishment' for demonstrating over $5M cost savings.

publications & patents

Crowdsourcing and Service Delivery

, , , , and
IBM Systems Journal
Vol. 53, No. 6,

iPoG: Fast Interactive Proximity Querying on Graphs

, and
ACM Conference on Information and Knowledge Management (CIKM)
Hong Kong, China,

Social Computing and Governance in an Enterprise Service for Managing Business Processes

, , and
World Conference on Services
Bangalore, India,

Rule-Based Problem Classification in IT Service Management

, and
IEEE International Conference on Cloud Computing
Bangalore, India,

Measuring Proximity on Graphs with Side Information

, and
IEEE International Conference on Data Mining (ICDM)
Pisa, Italy,

SOAR: SOcially Aware Routing for Request Matching in Enterprise Environments

and
IEEE International Conference on Services Computing (SCC)
Honolulu, Hawaii,

SCOOP: Automated Social Recommendation in Enterprise Process

, and
IEEE International Conference on Services Computing (SCC)
Honolulu, Hawaii,

Efficient calculation of node proximity on graphs with side information

, and
US Patent US8346766
, (Granted)

Method for dispatching service requests

, , , , , , , and
US Patent US8385534
, (Granted)

System and method for constructing flexible ordering to improve productivity and efficiency in process flows

, , , and
US Patent 8,036,865
, (Granted)

Apparatus and method for identifying process elements using request-response pairs, a process graph and noise reduction in the graph

, , and
US Patent 7,761,398
, (Granted)

awards

Technical Accomplishment

IBM Watson Research,

Outstanding Innovation Award

Principal Investigator and Development Lead, The Cyano Process Wiki
IBM Watson Research,
Cyano is a social-networking-based process co-creation platform. The award was given for demonstrating how Cyano has accelerated the capture and enrichment of over 550 IBM IT best practices, with a community of over 12,000 subject matter experts. This project was awarded a ‘Research Accomplishment’ for demonstrating over $5M cost savings.

___i3

Integrated Infrastructure Intelligence (i3) is a mashup and analytics framework for managing disparate infrastructure components. Developed as part of a First-of-a-Kind project with a large cable provider, i3 was used to monitor over 400,000 devices. A key component of i3 is an inference algorithm that uses historical failure patterns to discover hidden topologies, even in the presence of noisy and incomplete data. i3 automatically identifies shared failure risk. When a failure is detected, i3 tracks impacted users using an adaptive monitoring technique. This project was awarded a 'Research Accomplishment' for demonstrating over $10M revenue impact.

publications & patents

NetworkMD: Topology Inference and Failure Diagnosis in the Last Mile

, , and
ACM Internet Measurement Conference (IMC)
San Diego, CA,

Service Assurance Process Re-Engineering Using Location-aware Infrastructure Intelligence

, , and
IFIP/IEEE International Symposium on Integrated Network Management (IM)
Munich, Germany,

System and method for monitoring large-scale distribution networks by data sampling

, , , and
China Patent CN101237356
, (Granted)

Method and apparatus for component association inference, failure diagnosis and misconfiguration detection based on historical failure data

, , and
US Patent 7,937,347
, (Granted)

awards

Outstanding Technical Achievement Award

Principal Investigator and Development Lead, Integrated Infrastructure Intelligence
IBM Watson Research,