The State of Latency, Containers & Microservices

Methodology

Purpose

Turbonomic conducted this survey to better understand the role, measurement, and mitigation of latency in the modern datacenter. The investigation extended into the implementation of containers and microservices, as these architectures introduce new latency challenges. Our hope is that the results will instigate a data-driven conversation across the broader virtual and cloud community.

Sample

The data in this report were collected through a survey conducted from July 22, 2015 to August 3, 2015. The 554 survey respondents came from across the Enterprise IT and data center landscape. All respondents were 18 years of age or older. To capture the range of characteristics in the sample, respondents were identified demographically by business and environment attributes such as role, business type, hosts in production, and virtual machines in production. The sample represents organizations spanning SMB to large enterprise, with various roles and responsibilities within those organizations.

Procedure

This survey recruited participants from an internal email database. Participants were given an opportunity to win a pass to a major industry conference by entering their email address upon completing the survey, and were offered the option to participate in a one-on-one interview afterward. While the survey achieved a significant sample size, the distribution skews heavily toward Operations Management as a role, though it was well distributed across business types. Data were collected electronically through an online survey designed internally by a team that included microservices subject matter experts, namely Turbonomic engineers.

Survey Flow

The data in this survey report were collected within a twelve-day period. Progression through the twenty-nine survey questions depended on a respondent's level of interest in containers and microservices and the current status of adoption. All respondents were asked the same questions about latency and its role, measurement, and mitigation within their organization. If a respondent was investigating containers or microservices as a current or future application delivery architecture, they were then asked about expected deployment time, perceived challenges and benefits, industry influence, and whether they run these architectures in production or non-production environments.

Citing this Survey

We welcome your use of the results in this survey as you share insights with members of the broader IT community. Please reference Turbonomic and include our homepage URL, turbonomic.com, as you do so. A downloadable version of the complete dataset is available at github.com/turbonomic/turbonomicsurvey. Thank you.

Executive Summary

In July 2015, Turbonomic conducted an industry survey titled How Are You Fighting Latency? The survey aimed to explore trends among three related themes:

  1. Latency, the methods organizations used to mitigate latency, and the designation of latency-critical workloads across industry verticals.
  2. Container adoption penetration, emerging reasons for container adoption, and emerging barriers to container adoption.
  3. Microservices adoption penetration, emerging reasons for microservices adoption, and emerging barriers to microservices adoption.

Several high-level findings included the following:

Latency Findings

  • 91% of participants agreed or strongly agreed that the minimization of latency is important to their company, with 23% of participants identifying a clear majority of hosted workloads (60-100% of workloads) as Latency-Critical.
  • Although most participants are in agreement that the minimization of latency is important to their company, a full 32% of participants either do not measure latency or do not know if their company measures latency.
  • There exists a disconnect between the tactics organizations currently leverage to minimize latency, and those which they deem would be most effective at minimizing latency.

Container Findings

  • Just 4% of participants have actually deployed containers.
  • The greatest challenge in managing latency within container architectures is managing storage latency.
  • The top reason for implementing containers is to accelerate the application development lifecycle.
  • Docker remains far and away the top container standard, followed by CoreOS and LXC.

Microservices Findings

  • 11% of participants have microservices-based applications in their production environment, nearly three times the adoption rate of containers, suggesting predominantly VM-based microservices deployments.
  • Of those participants without microservices currently in production, 34% will deploy microservices within the next two years and 66% have no intention to deploy microservices.

The following analysis addresses these findings and their implications, and enables organizations to benchmark where they fall within the participant mix.

  • 46% of respondents are in an Operations Management role
  • Industry verticals were distributed with majority concentrations in Professional Services, MSP, Manufacturing, and Financial Services
  • 89% of participants operate a private or hybrid cloud environment
  • 53% of participants have an annual IT budget of less than $500,000 USD
  • 61% of respondents have an environment with at least 25 hosts and 500 VMs
  • 3% of respondents have environments with more than 1,000 hosts and 20,000 VMs

Why Latency?

Latency is the time interval between a stimulus and a response or, more generally, the time delay between the cause and the effect of some physical change in the system being observed. In computing, latency is a physical constraint determined by the distance between networked components, their physical transmission limits, and the manner in which software interacts with the infrastructure on which it runs.

Although your purview may lead you to associate latency with a given entity, storage or network for example, latency is truly the sum of all operations – including overhead inherent in application code – required to transmit the encoded impulses that constitute a service. Even as advances in compute, storage, and network architectures have reduced latency from minutes to seconds and milliseconds, so too have advances in business raised the expectations and reliance upon increasingly fast application response times.
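Because latency accumulates across every hop and layer, practitioners typically measure it end to end and report tail percentiles rather than averages. The sketch below is a hypothetical illustration (the function names and sample workload are ours, not from the survey): it times an arbitrary operation and computes a nearest-rank p95 over repeated samples.

```python
import math
import time

def measure_latency_ms(operation, *args):
    """Time a single end-to-end call and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = operation(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

def percentile(samples, p):
    """Nearest-rank percentile; tail latency (p95/p99) matters more than the mean."""
    ranked = sorted(samples)
    index = max(0, math.ceil(p / 100 * len(ranked)) - 1)
    return ranked[index]

# Sample the same operation repeatedly, then report the tail.
samples = [measure_latency_ms(sum, range(10_000))[1] for _ in range(100)]
p95_ms = percentile(samples, 95)
```

Percentiles are the conventional choice here because a handful of slow outliers, invisible in a mean, are exactly what violates an SLA.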

Amid this shift in business expectations, a shift in software architectures – toward virtualization, containerization, and distributed computing – has introduced a new wave of complexities that threaten the millisecond scale on which we operate, and a new wave of challenges for the humans tasked with assuring this scale.

Our survey sought to understand the prevalence and measurement of latency-critical applications in today's data centers, the methods used to assure SLAs, and the perceived efficacy of these methods.

Latency: Measurement & Verticals

  • 32% of participants either don't measure latency or don't know if their organization measures latency
  • 23% of participants classify a clear majority of workloads as latency-critical
  • Top 3 industry verticals for latency-critical workloads were Managed Service Providers, Financial Services & Insurance, and Healthcare

Latency: Mitigation Tactics

  • Respondents are predominantly focused on controlling and mitigating latency within the LAN/WAN edge as opposed to endpoints.
  • There is a greater focus on mitigating latency across IT domains as opposed to within specific domains.
  • 58% of respondents use infrastructure monitoring software and manual troubleshooting to mitigate application latency
  • 49% of respondents run virtualized workloads on dedicated clusters to mitigate application latency
  • For efficacy, infrastructure monitoring software and dedicated clustering ranked just 6th and 4th of the 13 tactics, respectively

Latency: Measurement, Mitigation, & Verticals

Latency: MSPs, Financials, & Healthcare

The charts on the previous pages illustrate three noteworthy survey findings. Survey Question 11, "How does your company measure network latency?" was compared against an uncharted question (Question 10) in which 90.7% (n = 356) of respondents Agreed or Strongly Agreed that the avoidance and minimization of latency is important to their company. When this proportion is considered in light of Question 11, a full 32.3% of participants either do not measure latency or do not know if they measure latency. Given the reported importance of latency mitigation in Question 10, it is concluded that participants have either overstated the importance of latency mitigation, or that a sizable portion of respondents take a reactive approach to latency mitigation.

In Question 7, "Approximately what proportion of your production workloads are Latency-Critical (i.e. business needs cannot tolerate high latency levels)?" 22.8% of respondents (n = 356) identified a clear majority (60%-100%) of their workloads as being Latency-Critical. Of these (n = 81), 51.8% fell within the Managed Service Provider, Financial Services & Insurance, and Healthcare verticals.

The scenarios and applications contributing to this effect are rather intuitive. Service Providers, whose customers must already contend with the speed of the Internet, cannot afford additional latency within their local networks that impacts service delivery. Financial Services, whose environments often consist of banking and trading applications reliant on ultra-low latency, are a logical fit. And as Meaningful Use becomes a healthcare imperative, the need for immediate access to Electronic Medical Records necessitates latency-critical classification in those environments.

Combatting Latency Across Domains

IT culture is inextricably bound to the architecture it supports. The days of monolithic client-server architecture were marked by appropriately monolithic teams, each fixed on its domain and its domain alone. As IT has largely transitioned to an era marked by virtualization, cloud, and mobility, the data suggest that IT organizations have also adjusted culturally – to work across silos, reflective of their interdependence.

Our survey asked participants to rate their agreement, on a 4-point scale where 1 = Strongly Disagree and 4 = Strongly Agree, with the following statements:

  • I focus on minimizing application latency within my specific IT domain (e.g. only compute, only storage, only network)
  • I focus on minimizing application latency across various IT domains (e.g. across compute, storage, and network)

For all participants, a majority focused on minimizing application latency across various IT domains (i.e. compute, storage, and network) (83.1%, x̄ = 3.49) as opposed to within their specific domain (71.8%, x̄ = 3.15). When scoped to the 81 participants with a clear majority of Latency-Critical workloads, the mix shifted to 100% (x̄ = 4.00) and 76.2% (x̄ = 3.29), respectively.
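The agreement percentages and means above come from a 4-point Likert tally. As a sketch of the arithmetic only, the counts below are hypothetical, chosen merely to reproduce the reported 83.1% agreement and x̄ = 3.49 for the cross-domain statement; the actual response distribution is not published in this report.

```python
# Hypothetical response counts on the survey's 4-point scale
# (1 = Strongly Disagree ... 4 = Strongly Agree); illustrative only.
counts = {1: 16, 2: 44, 3: 46, 4: 250}

n = sum(counts.values())                                   # 356 respondents
mean = sum(score * c for score, c in counts.items()) / n   # the report's x-bar
agree_share = (counts[3] + counts[4]) / n                  # "Agree or Strongly Agree"

print(f"agreement = {agree_share:.1%}, x-bar = {mean:.2f}")
```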

At a high level, these data support the notion that as the imperative of low latency increases, so does the imperative of cross-functional latency mitigation. While additional surveying would be required to explore precisely what this means, it is clear that the old pattern of passing problems down the stack no longer suffices. Our hypothesis is that, in addition to cross-domain cooperation, more and more IT professionals have become generalists of necessity: it is no longer sufficient to specialize in one limited area. IT professionals must expand their skill sets to span domains and keep pace with architectural change. Again, this hypothesis requires additional surveying.

A Mismatch of Tactics

A primary objective of this survey was to identify and rank the methods organizations use to mitigate latency. Our findings were rather pronounced across methods; the most notable, however, was that the most effective tactics are not practiced by a majority of organizations. This finding is most likely explained by budget, as the tactics deemed most effective also tend to be the most expensive.

Survey Question 14 asked Which tactics does your organization use to mitigate application latency (check all that apply)? and offered the following selections:

  • Running Latency-Critical applications on dedicated physical infrastructure (i.e. non-virtualized infrastructure)
  • Use of infrastructure monitoring software (i.e. Dell Foglight, Solarwinds, VMware vRealize Operations) and manual troubleshooting

  • Use of application performance management/monitoring software (i.e. AppDynamics, New Relic, DynaTrace, AppNeta) and manual troubleshooting
  • Use of workflow scripting or load balancing to balance resource utilization
  • Running workloads on dedicated clusters
  • Virtualizing but not mixing diverse workloads
  • Implementation of low-latency network components
  • Implementation of Software-Defined Networking (SDN) technologies
  • Implementation of Network Function Virtualization (NFV) technologies
  • Implementation of All-Flash storage arrays
  • Implementation of Hybrid (auto-tiering Flash & Hard Disk) storage arrays
  • Implementation of Fibre Channel storage connectivity
  • Other (Please Specify)

Survey Question 15 asked For each of the practiced tactics you selected in Question 14, please rate their effectiveness at mitigating application latency within your organization.

The top tactics used to mitigate application latency (N=356):

  1. Infrastructure Monitoring Software (58.4%)
  2. Dedicated Clustering (49.2%)
  3. Hybrid Storage (43.3%)
  4. FC Connectivity (42.1%)
  5. Low-Latency Networking (40.2%)

The most effective tactics used to mitigate application latency:

  1. All-Flash Arrays (x̄ = 3.27)
  2. FC Connectivity (x̄ = 3.27)
  3. Hybrid Arrays (x̄ = 3.21)
  4. Dedicated Clustering (x̄ = 3.19)
  5. Low-Latency Networking (x̄ = 3.19)

When scoped to participants with a clear majority of Latency-Critical workloads, the results were thus:

The top tactics used to mitigate application latency (n=81):

  1. Infrastructure Monitoring Software (66.7%)
  2. Dedicated Clustering (65.4%)
  3. FC Connectivity (54.3%)
  4. Hybrid Storage (45.7%)
  5. Low-Latency Networking (40.7%)

The most effective tactics used to mitigate application latency:

  1. Low-Latency Networking (x̄ = 3.44)
  2. All-Flash Arrays (x̄ = 3.42)
  3. Hybrid Arrays (x̄ = 3.40)
  4. FC Connectivity (x̄ = 3.39)
  5. Dedicated Clustering (x̄ = 3.38)

Of particular note is that for both groups, the full participant population and the scoped sample, usage of Infrastructure Monitoring Software was the number one tactic used to mitigate application latency. In both groups, however, it failed to rank among the top five tactics for efficacy. Notable write-ins for Question 14 included the following:

  • Automated Workload Migration
  • Citrix XenApp + High Latency WAN Links
  • Hyperconverged Infrastructure (Nutanix)
  • 10 Gb LAN/VLAN + 40 Gb Storage Links
  • Nothing

Why Containers?

Containers are not new. In fact, their conceptual lineage dates back to the 1970s and chroot jail in UNIX, wherein an application's processes and dependencies were isolated from the rest of the system.

In 2013, the debut of Docker catapulted containers back into the spotlight as a prospective, and seemingly inevitable, replacement for their heavier cousins, virtual machines. Since Docker made its first headline, numerous production-ready alternatives have surfaced for consideration.

Similar to how virtualization abstracts the operating system away from the hardware, containerization abstracts the application away from the operating system. This concept unlocks a world of new methods for quickly developing, deploying, and delivering applications. Containers are orders of magnitude smaller than VMs, bear a fraction of the memory overhead, and can be provisioned in a matter of seconds instead of minutes.

Our purpose in exploring containers as part of this survey was to anticipate and investigate three related phenomena: (1) the continued adoption of containers in both pre-production and production, (2) the driving forces behind this adoption, and (3) the emergence of new latency-related challenges in containerized environments.

Containers: Adopters

  • 25% of respondents have already deployed or are investigating the deployment of containers
  • 84% of respondents who are investigating the deployment of containers plan to deploy in 2016 or later
  • Docker and CoreOS are the most popular container standards, with 59% and 24% adoption shares, respectively
  • 64% of respondents have deployed or will deploy containers into their production environment
  • 58% of respondents are interested in containers in order to Implement DevOps
  • 36% of respondents report Storage Latency as the greatest latency challenge in their container environment, though 34% report not having any challenges

Why Microservices?

Microservices is an application architecture most succinctly defined as loosely-coupled service-oriented architecture (SOA) with bounded contexts. The term was first coined in 2005 by Dr. Peter Rodgers, CEO of 1060 Research, at Cloud Computing Expo during his presentation on Service-Oriented Development on NetKernel. Specifically, Rodgers coined the term "Micro-Web-Services."

Microservices, as opposed to monoliths, relate application components as a graph of independent, service-invoked functions rather than a series of persistent and dependent tiers. Microservices deliver numerous benefits, including modularity, full encapsulation, and isolated persistence. Operationally, they enable continuous development and continuous integration, as well as independent testing, deployment, and scaling. When invoked, each microservice communicates with its siblings laterally over the network using language-agnostic APIs. When a component fails or requires an update, that component (rather than an entire tier) is cloned or updated.
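To make "lateral communication over language-agnostic APIs" concrete, here is a minimal, hypothetical sketch using only Python's standard library: a tiny "pricing" service exposes a JSON-over-HTTP endpoint, and a sibling component calls it knowing only the URL and the payload contract, not the service's implementation language. The service name and fields are invented for illustration; real deployments add service discovery, retries, and timeouts.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

class PricingHandler(BaseHTTPRequestHandler):
    """A minimal 'pricing' microservice exposing one JSON endpoint."""
    def do_GET(self):
        body = json.dumps({"sku": "A100", "price_cents": 1999}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

def start_pricing_service():
    server = HTTPServer(("127.0.0.1", 0), PricingHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

# A sibling "checkout" component calls the pricing service laterally over
# the network, coupled only to its URL and JSON contract.
server = start_pricing_service()
url = f"http://127.0.0.1:{server.server_port}/price/A100"
quote = json.loads(urlopen(url).read())
server.shutdown()
```

The latency criticism of microservices follows directly from this pattern: every such lateral call adds a network round trip that an in-process function call within a monolith would not.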

Our survey questioned respondents about their experience with microservices for two reasons. First, the rise of containers in production was hypothesized to be conducive to and/or correlated with the implementation of microservices architecture. Second, one of the greatest criticisms of microservices has been their introduction of network latency.

Critics argue that microservices trade one type of complexity, code complexity, for another, operational complexity. These viewpoints notwithstanding, our survey sought to understand microservices as they exist in the wild, at least as they do in 2015.

  • 11% of respondents have deployed microservices in production
  • 66% of respondents currently have no plans to deploy microservices
  • 51% of respondents are interested in microservices in order to Implement DevOps
  • 41% of respondents report Storage Latency as the greatest latency challenge in their microservices environment, though 31% report not having any challenges

Containers: Still Early, and Not For Everyone

Our findings are consistent with prevailing commentary that, despite a great deal of hype and publicity surrounding containers, very few organizations are actually using them. Question 16 asked participants, "Is your organization investigating using containers (i.e. Docker, CoreOS, LXC) for future deployment?" Of all respondents (n = 354), just 25% answered either 'Yes' (21%) or 'We already use them' (4%). An overwhelming 75% of respondents said 'No.' While the promise and specific use cases of containers are well known, documented, and widely accepted, the data suggest that, at least for now, most organizations find their virtualized or bare-metal deployments sufficient.

When scoped to the subset of respondents already using containers in their production environment, 42.9% (n = 14) have IT budgets of $5 million or greater and 50% have 250 or more hosts. Across the overall participant population, by contrast, just 16% have budgets of $5 million or greater and 23% have 250 or more hosts. The container-adopting sample suggests that organizations with greater spending power are more prone to experiment with next-generation technologies like containerization. This data and analysis must be interpreted with caution, however, as the sample of container adopters is small.

Although at this time a majority of respondents are dismissive of containers, a similar survey in the future is likely to produce different results, especially since 84% of container contemplators plan to deploy in 2016 or beyond.

The momentum of central players like Docker and CoreOS, as well as the significant investments made by industry leaders VMware (Project Photon), Red Hat (Project Atomic), Microsoft (Hyper-V/Windows Server Containers), and others, indicates that container technology is widely regarded as the future. It simply remains in its infancy.

Docker First, and Foremost

59% of container contemplators (n = 71, those who responded 'Yes' to Q16) plan to adopt Docker, and 71% of container adopters have already chosen it. Clearly, the benefits of first-mover advantage are playing out for the container leader. A particular challenge for every standard is differentiation in a category with so little room to differentiate: containers, by their nature, are simple constructs. CoreOS' Rocket has positioned itself as a purer, more secure alternative to Docker. Management frameworks such as Docker Hub are one means of differentiation; however, a burgeoning landscape of third-party tools dilutes much of this advantage.

"lmctfy" was the only notable mention of non-Docker/CoreOS/LXC container standards under consideration.

Microservices: Mostly VM-Based, For Now

Just 11% of respondents (n = 348) are currently running microservices in production. Given that only 4% of respondents are currently using containers, it can be concluded that a majority of existing microservices deployments are running as virtual machines (VMs).

Similar to the trends observed with containers, a disproportionately high share (32.4%) of respondents that have deployed microservices come from companies with IT budgets of $5 million or greater. Microservices, it is said, trade code complexity for operational complexity. A portion of this operational complexity is that discrete services are assigned to many small development teams who develop, refine, and own each service. It follows logically that the organizations most capable of operating in this fashion are those with the budgets to support the requisite developer headcount.

Storage Latency: Emerging Challenge?

While the primary driver for adopting both containers and microservices is implementing DevOps, we were surprised to find that the top challenge in both container and microservices environments was fighting storage latency. The top criticism of these architectures is the east-west network latency they generate, and we expected our survey to support this sentiment. Given the low rates of adoption for both, however, we believe that future surveying, once these architectures are more mature, will yield different challenge rankings.