Best Practices for Cloud-Era Application Performance Management

The Expanding Role of the Cloud in Enterprise IT

While on-premise deployments may still be the default for enterprise applications today, most everyone agrees that SaaS and Cloud are the future. The question from the CXO is no longer, "Should we consider putting this application in the cloud?" It is, "Why wouldn't we?" In fact, a recent IBM survey shows that the cloud's strategic importance to business decision-makers, such as CEOs, CMOs, CFOs, HR directors, and procurement executives, is poised to double from 34 percent to 72 percent, vaulting over their IT counterparts at 58 percent.

The reasons for this shift are many:

  • Better, faster, cheaper. Cloud-based applications and services offer a superior feature set, greater scalability, reduced complexity and cost, and easier management vs. deploying and managing applications on premise.
  • Strategic impact. As businesses face more competition and mounting pressure to innovate and accelerate time to market, there's little room for IT to squander time and money building and managing infrastructure services that are readily available via the Cloud.
  • User adoption. IT has a huge role in driving the broad adoption of cloud services in the enterprise, but most of their business users have already embraced it for their personal use. As such, they expect all the same benefits from the apps and services they use at work.
  • Proven value. Enterprise-class cloud applications like Office 365, SharePoint, Azure, Gmail, AWS, Salesforce.com, DropBox and WorkDay are proving themselves in the largest enterprises with superior features, security, performance and ROI. Given the choice between on-premise and cloud-based alternatives, most users will choose Cloud.
  • Mission-critical. One of the fastest-growing areas of cloud-based apps is in what Gartner refers to as the "cloud office" apps segment: Email, Communication & Collaboration, and Document Creation & Storage. These apps and services are fundamental to the operation of any organization. If they go down, the entire organization takes a huge productivity hit. IT organizations must deliver, manage and ensure availability of mission-critical cloud services.
  • The Future. The combination of strong market demand and healthy competition will help drive significant adoption of cloud-based apps over the next five years. Even Microsoft, a company synonymous with the use of personal computers within the business, is now leading their product cycles in the cloud, with features being delivered in on-premise versions months or even years after they are available from the cloud.

The Cloud's role in the enterprise is expanding quickly. Savvy IT teams are already embracing this change, but it's still early days and there are a lot of unknowns and potential pitfalls. The question is how IT can deliver on the high expectations of their users in an era of Cloud-based IT.

The Rainbow of Cloud

In modern computing, a "cloud," like its meteorological counterpart, is kind of a fluffy term that can be difficult to precisely define. Born out of datacenter co-location, hosters, and application service providers (ASPs), cloud computing is simply the delivery of hardware (infrastructure), application sub-components (platform), or end-user applications as a service, meaning that customers effectively "rent" these facilities on an incremental basis. Originally conceived as a way to move from a pay-up-front capital expense structure to a more economical, metered operational expense structure, cloud computing has evolved and matured. Many organizations now look to cloud solutions to provide best-in-class features as well as cost advantages.

Models of Cloud Computing

Software as a Service (SaaS) – in which applications are hosted by a vendor or service provider and made available to customers over a network, typically the Internet. Popular examples include Office 365, Google Apps, Salesforce.com, Workday, Expensify and Zendesk.

SaaS is becoming an increasingly prevalent delivery model as underlying technologies that support Web services and service-oriented architecture (SOA) mature and new developmental approaches, such as Ajax, become popular. In addition, broadband service has become increasingly available to support user access from more areas around the world.

Platform as a Service (PaaS) - provides a computing platform and a solution stack as a service. In PaaS, the consumer creates software or services using tools and/or libraries from the provider. The consumer also controls software deployment and configuration settings. The provider provides the networks, servers, storage, and other services that are required to host the consumer's application. Examples include Google App Engine, Microsoft Azure Services and Salesforce.com's Force.com.

PaaS is used primarily by organizations building and managing software that want to leverage 3rd party components (such as database) in an "as-a-service" model. There are several advantages. Operating system features can be changed and upgraded frequently. Geographically distributed development teams can work together on software development projects. Services can be obtained from diverse sources that cross international boundaries. PaaS also offers mechanisms for service management, such as workflow management, discovery, reservation, etc.

Infrastructure as a Service (IaaS) - (or hardware as a service) is a model in which an organization (the customer) outsources the equipment used to support operations, including servers, storage and networking. The service provider owns the equipment and is responsible for housing, running and maintaining it. The client typically pays on a per-use basis and is responsible for managing what runs on the hardware; e.g., the OS, software, related updates, etc. There are a plethora of IaaS providers out there but the best known are Amazon Web Services (AWS), IBM SoftLayer, Microsoft Azure and Rackspace.

Your New Role in a Cloud-Enabled IT Environment

As Cloud becomes reality in the enterprise, IT professionals must adapt to a new set of expectations and requirements – driven by the following:

IT's Shift from Owner/Operator to Consumer Coordinator

Instead of acquiring and operating on-site infrastructure and applications for the enterprise, IT professionals will be expected to coordinate business services for employees and end-users. Their role will be to ensure their "customers" are getting the performance levels they need to speed communication, increase collaboration and accelerate individual and organizational productivity. While many IT folks have already embraced this role to some degree, the requirement becomes much more pronounced as a greater share of their IT infrastructure and services go the way of the Cloud.

The Growing Power of the Crowd

To assure service levels in a SaaS/cloud environment, IT will have to be able to monitor and troubleshoot infrastructure they cannot touch – the end-to-end service delivery chain from their premises, through the various ISPs, to the application provider and back. This is the only way to effectively detect, isolate, and resolve issues affecting cloud application performance before they negatively impact users and their organizations. As such, IT professionals need to embrace the concept of the "Crowd."

SaaS and Cloud applications are by definition shared by a global community of customers. So it stands to reason that monitoring of these services could and should be done in a shared manner as well. There are already examples of the Crowd monitoring the cloud in informal ways through Twitter. But that's not enough – especially for mission-critical apps and services. Technologies exist today that enable IT organizations to monitor and aggregate data across all users of a SaaS service. The greater the number of monitoring points, the more accurately IT can detect and isolate specific problem spots that degrade service levels and user experience. This concept will be key to delivering on service level expectations and you can expect IT professionals to find interesting new ways to put it to work for their organizations.

Monitoring Cloud App Performance – Common Misconceptions

As organizations make the move to the cloud, some IT teams fall prey to a few common misconceptions – grounded in a general belief that once they move to the cloud, IT no longer owns direct responsibility for service levels. These can put them on a path to protracted outages and frustrated users.

The fact is that if your users can't access a cloud-based service, they are not going to call the service provider. They are going to call the IT help desk (maybe you) directly and the IT team will be expected to fix whatever problem exists, ASAP. Users don't care whether or not the problem is located in infrastructure owned and operated by their IT department, the ISP, or the cloud service provider. If they aren't having a good experience, IT will take the heat.

With that in mind, here are four common misconceptions to watch out for:

  • "I don't need to monitor. I have a guaranteed SLA from the provider." A SaaS service provider is likely able to run their datacenters with higher availability than most IT organizations, but they are not 100%. Guarantees are great, but if you aren't monitoring your SaaS service, how do you know that your SLA is actually being met? In addition, service level guarantees only cover outages that the provider can control, i.e. their own networks, servers, and applications - not your infrastructure nor the ISPs that connect you. You're on your own to monitor and manage those.
  • "I don't need my own monitoring tools. I use the service provider dashboard." Service health dashboards only cover the service provider's infrastructure, not the end-to-end service. They provide generic information that may or may not be relevant to your users and may not be up to date. Remember, they are built to be general status communication tools, not real-time monitoring solutions.
  • "I didn't monitor my hosted application. Why monitor it now?" Consuming apps from the cloud is not the same as consuming managed/hosted services. Managed Service Providers (MSPs) and ISPs are often running dedicated infrastructure for you and monitoring those services on your behalf. Those services often extend to provide monitoring and management of your on-premise infrastructure as well. While there are MSPs offering value-added services around your cloud application, you still have to monitor the solution yourself. On the other end of the spectrum, web monitoring solutions often either run generic protocol tests or run from the providers' locations rather than within your own network. None of these solutions can provide active, end-to-end monitoring of service performance and user experience from behind your firewall to the service provider and back.
  • "I don't need to monitor. My users tell me when they are having problems." This may be okay for less critical applications, but for most organizations, communication and collaboration apps, like email, are mission critical. If the service is down, so is your company. So what happens when the users report a problem? Where do you start to look? Do you immediately get put on hold with the service provider's support line? The problem is likely not even on their end. Speed to resolution is key. You want to be notified before users are impacted and when an issue is identified you want to isolate it and get it resolved as quickly as possible.

Moving to the cloud doesn't mean your monitoring and management responsibilities go away, but it does fundamentally change the rules of engagement. You have to be able to monitor and troubleshoot infrastructure you cannot touch – the end-to-end service delivery chain from your premises, through the various ISPs, to the application provider and back. By doing this, you have the ability to quickly detect, isolate, and resolve issues affecting cloud application performance before they negatively impact your users and your organization.
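
As a rough illustration of the first misconception, the sketch below shows how an IT team might check a provider's SLA against its own probe results rather than taking the guarantee on faith. It is a minimal sketch, not a description of any particular product: the Probe record, the one-minute probe interval, the sample data, and the 99.9 percent target are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """One synthetic check of a cloud service (hypothetical record shape)."""
    timestamp: int  # seconds since the start of the measurement window
    ok: bool        # True if the service responded successfully


def measured_availability(probes):
    """Return the percentage of probes that succeeded."""
    if not probes:
        raise ValueError("no probes recorded")
    successes = sum(1 for p in probes if p.ok)
    return 100.0 * successes / len(probes)


def sla_met(probes, sla_percent):
    """Compare measured availability with the provider's stated SLA."""
    return measured_availability(probes) >= sla_percent


# Hypothetical sample: 1,000 one-minute probes, three of which failed.
failed_minutes = {10, 500, 900}
probes = [Probe(timestamp=i * 60, ok=(i not in failed_minutes)) for i in range(1000)]

availability = measured_availability(probes)
print(f"{availability:.2f}% measured, 99.9% SLA met: {sla_met(probes, 99.9)}")
```

Because the probes run from your own network, a shortfall here does not by itself prove the provider missed its SLA; it only tells you that users at that location were affected and that the delivery chain needs a closer look.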

Using the Crowd to Manage Your Cloud

The Crowd. We talk about it a lot because we believe it holds the key to effectively monitoring and managing cloud based applications and platforms.

The way IT consumes the cloud is changing. The use of IT as a Service is continuing to grow, but use of "black box" Software as a Service (SaaS) and Platform as a Service (PaaS) is growing even faster. With these offerings IT can achieve significant capital and operational cost benefits through use of turn-key solutions that require no datacenter management. However, in realizing these benefits they can also lose the deep application performance visibility they've had with their traditional on-premise apps.

IT needs to view the SaaS and PaaS offerings they consume as distributed, global services that rely on an interconnected service delivery chain of Application Service Provider, Internet Service Provider, and their own local infrastructure to enable them to work correctly. No single monitoring point will ever be able to reliably tell you how well the service is operating end-to-end, much less help pinpoint problems in that service delivery chain outside your firewall. Even for organizations with many locations from which they can monitor, the number of potential service delivery paths will vastly outnumber their points of visibility.

And while a single monitoring point isn't up to the task, not monitoring at all and hoping you don't have problems isn't a strategy either. IT is still ultimately responsible for the application performance and user experience. But how do you do that? What if you could access data from a vast array of other customer locations globally, in addition to your own locations? In aggregate that data would show performance trends and service delivery problem points, be they at the service provider, in the Internet fabric, or in your own network.
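
To make the idea concrete, here is a minimal sketch of how aggregated crowd data could narrow a problem down to the service provider, an ISP, or your own network. The Sample record, the organization and ISP names, and the 50 percent failure threshold are hypothetical assumptions chosen for illustration; a real system would need far richer data and more careful statistics.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One health check reported by a monitoring point (hypothetical shape)."""
    org: str   # organization that owns the monitoring point
    isp: str   # ISP the monitoring point connects through
    ok: bool   # True if the cloud service responded normally


def failure_rate(samples):
    """Fraction of samples that failed; 0.0 for an empty list."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if not s.ok) / len(samples)


def localize(samples, my_org, threshold=0.5):
    """Guess where a problem sits by comparing my results with the crowd's."""
    mine = [s for s in samples if s.org == my_org]
    others = [s for s in samples if s.org != my_org]

    if failure_rate(mine) < threshold:
        return "healthy"
    # Most of the crowd is failing too: the provider is the likely culprit.
    if failure_rate(others) >= threshold:
        return "service provider"
    # Other customers on my ISP are failing while the rest are fine.
    my_isps = {s.isp for s in mine if not s.ok}
    same_isp = [s for s in others if s.isp in my_isps]
    if same_isp and failure_rate(same_isp) >= threshold:
        return "isp"
    # Only my own monitoring points are failing.
    return "local network"


samples = [
    Sample("acme", "isp-a", False),
    Sample("acme", "isp-a", False),
    Sample("other-1", "isp-a", False),
    Sample("other-2", "isp-a", False),
    Sample("other-3", "isp-b", True),
    Sample("other-4", "isp-b", True),
    Sample("other-5", "isp-c", True),
    Sample("other-6", "isp-c", True),
]
print(localize(samples, my_org="acme"))
```

In this sample the organization's own checks fail, most of the crowd is healthy, and the other customers on the same ISP are failing as well, so the sketch points at the ISP. A single monitoring point could not draw that distinction.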

That global network of SaaS and PaaS customers? That's the crowd, and IT teams must find ways to harness and expose data from that crowd to regain the visibility into application performance they lost when going to the cloud. The potential of the crowd for IT is huge and goes well beyond performance monitoring.

With easy exchange of performance and configuration data by the crowd, IT teams have the ability to take much of the guesswork out of their management of cloud-based apps. Looking for best practices in deploying Workday? Don't wait for an analyst whitepaper. Query and analyze real data from the crowd of other Workday customers. Wondering what the impact will be for a change in your ADFS configuration? Run a hypothetical analysis against the data. IT decisions can be made more quickly and with greater confidence.

We are in the early stages of a fundamental change in the roles and responsibilities of business IT, brought about by the irresistible benefits of cloud based apps and services. This transformation will be as profound as the changes brought on by desktop computing, server virtualization, or tablets. There are numerous challenges along the way, but as new tools and practices evolve, IT teams will find themselves able to support and accelerate business objectives in ways unimaginable before the cloud.

Legacy Monitoring Solutions Leave Users Hanging in the Cloud

Given that IT organizations continue to need to monitor apps, even in the cloud, the question becomes "how." The first tools most IT organizations will (and should) look to are the ones they currently use for their existing on-premise, legacy apps and infrastructure. However, it may not be that simple.

Recently we conducted a study asking IT teams about their current and planned use of cloud apps and services within their organizations. The full results are published here, but one particular point that stood out is that fewer than 20% of the respondents felt that their existing tools were doing a good job managing their cloud-based apps. The rest were at best ambivalent about their existing tools, with more than 20% feeling that their existing tools just aren't up to the task.

Why is this? The Systems Management software market is mature and solutions from Microsoft, HP, CA, and BMC have been on the market for years. A look at their portfolios shows a wide range of sophisticated tools to manage everything from software distribution, to monitoring, to IT workflow and help desk activities. Surely these tools should be able to effectively manage cloud-based apps and services.

As it turns out, they can't. Here's why.

Infrastructure Ownership and Access

Prior to the cloud, IT's management responsibilities didn't extend much beyond the walls of the building, or for larger organizations, beyond the periphery of their corporate area network. And the systems management tools were built and optimized with that basic assumption. As an administrator I managed MY servers in MY datacenter on MY network. I had the luxury of having direct access to network, storage, and compute nodes that produced ample amounts of log files or SNMP messages. All I needed were tools that could tap into those data feeds, alert me when something happens that I care about, and maybe correlate logs from multiple systems so I could search for and identify trends.

But with cloud all that has changed and in fact, much has gone away. Yes, if I run my apps on an IaaS provider like AWS or Azure, I can still access app and even OS logs, but below the OS I'm blind. I can't directly access hardware or any of the network nodes.

If I use SaaS apps I don't even get this. They're completely black box. There are no log files to access, no SNMP messages to listen to, and most likely not even a management API to interface with.

If your tools rely on these mechanisms, you're stuck.

The Convergence of Application Management and Network Operations

For many organizations, the IT team is segregated into 3 basic camps: desktop management, application management, and infrastructure operations. The infrastructure team manages everything from the OS/VM down; the servers and network. They also tend to be the ones who manage security and Active Directory.

The apps team gets their machines from the ops team and from that point on they install, manage, and monitor their apps. If users experience a problem with the app, the apps team uses their tools to look for reported errors from the application. If those errors point to network issues, they contact the ops team, who use their own tools (which provide a lot of low level information about the network but don't really know anything about the applications running on the network) to try and hunt down the problem. For traditional on-premise apps, the user device, access network, and application likely all reside on the same network, so between the apps tools and the ops tools, the teams can usually find and fix whatever issue exists quickly.

With cloud-based apps everything changes. IT moves out of the role of infrastructure owner/operator into the role of service consumer/coordinator. It's the application as a service that must be monitored and maintained, and that service is built on a complex web of networks, servers, and other services, most of which fall outside the organization's firewall.

An application admin has to have insight into the health of the application service itself as well as the networks and services (like Active Directory Federation Services) that are key to the delivery of that application service. That wall between the apps and ops teams no longer works.

We see this a lot. An IT team will get stuck trying to find a problem affecting SaaS app performance. Neither the application admins nor the network admins have a full view of the service delivery chain, so they go back and forth, pointing fingers and guessing haphazardly at the root cause.

We call this "chasing ghosts." You don't have time to chase ghosts.

The Agility Mismatch

Have you ever stood up and deployed one of these traditional systems management solutions? The fact that there is a healthy industry of consultants and system integrators with official certifications and ISO-9000 compliant project plans tells you something. It's not for the faint of heart.

It's not that these tools are poorly engineered or designed to be arbitrarily complex. They evolved into their complexity as the sophistication and complexity of on-premise enterprise applications and infrastructure management exploded over the past decade. It's that same complexity explosion (and the drag it has put on IT agility) that is driving many organizations to the cloud. In fact, in the earlier referenced survey, nearly 50% of respondents indicated that agility was a key driver for their move to the cloud, while less than 40% pointed to costs.

So, if your goal is agility, you need to make sure your IT management tools are as agile as the apps and services you plan to manage with them. If new apps and services come into your portfolio and are updated by the providers on a weekly basis, a management tool stack that updates every 12 months doesn't do you much good.

It's time to look beyond the tools that evolved out of complexity to ones that are born in the cloud.

With the changes in the role of IT and the need to manage and monitor infrastructure outside the four walls of the enterprise, IT management tools must change also. The "Consumerization of IT" has become a cliché, but you need only spend a few minutes in an app like Expensify or the admin consoles for Office 365, AWS, or Google to see that they are becoming much more like Twitter and Facebook and less like SAP. We are all now conditioned to expect user interfaces that are simple, intelligent, and friendly.

Yet most IT systems management software solutions still require a lot of heavy lifting to deploy and use. We know these legacy solutions can't provide much visibility into the performance and availability of cloud-based apps. Beyond that, though, a lot of them still subject IT to an amazing amount of effort and complexity just to deploy and manage the management software itself! IT teams won't put up with systems management and monitoring tools that require a team of consultants to stand up, especially when they are trying to simplify things by moving apps to the cloud. Instead, management tool providers must make their solutions easier to deploy and manage – exactly like the cloud-based business apps and services their customers run their businesses on.

Who, What and When – User and Workload Readiness

Cloud-based applications are a lot easier to use and deploy, but they still involve some complexity, especially when it involves mission-critical services like email, collaboration and communication. Here are a few things organizations should consider in their planning:

  • What is the total cost of ownership (TCO) of your existing on-premise environment vs. the cloud-based alternative? What are the projected cost savings in Year 1? Year 3? Year 5?
  • What is the current utilization of your on-premise service? How much waste can be eliminated by moving to an on-demand, cloud-based service?
  • Are your users ready? How do they use current features of the on-premise service?
  • Which ones are best suited to take advantage of the advanced features that are likely available in a cloud-based service?
  • How can you segment those users to assist in your migration planning?
  • How can you ensure the best possible outcomes for all involved?

It's important to consider these questions before starting your migration to determine where to start your implementation and to ensure early success.
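
The first of these questions comes down to simple arithmetic once you have your own figures. The sketch below is only an illustration of that calculation: every dollar amount is a hypothetical placeholder, and the model deliberately ignores factors such as discounting, price changes, and staff time that a real TCO analysis would include.

```python
def cumulative_savings(on_prem_annual, cloud_annual, migration_cost, years):
    """Savings after `years`, net of the one-time migration cost."""
    return (on_prem_annual - cloud_annual) * years - migration_cost


# Hypothetical placeholder figures; substitute your own.
ON_PREM_ANNUAL = 500_000   # yearly cost of the existing on-premise service
CLOUD_ANNUAL = 350_000     # yearly subscription cost of the cloud service
MIGRATION_COST = 200_000   # one-time cost of moving users and data

for years in (1, 3, 5):
    savings = cumulative_savings(ON_PREM_ANNUAL, CLOUD_ANNUAL, MIGRATION_COST, years)
    print(f"Year {years}: {savings:+,}")
```

With these placeholder numbers the migration is a net cost in Year 1 and pays back afterward, which is why it helps to look at Year 3 and Year 5 as well as Year 1.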

Performance Monitoring in the Cloud Era – It Takes a Village

As we've discussed above, the cloud fundamentally changes the requirements for monitoring performance. Instead of a dedicated app service completely controlled inside the four walls, IT teams now have to manage and maintain app service levels for users where everything is shared - the datacenter it runs in; the servers, storage, databases; the ISP networks delivering the apps. Almost all of the service delivery chain, except for a little bit behind the company firewall, is shared by the organization and lots of other companies using the app. There's no "one butt to kick" if there are problems in that shared infrastructure. Rather, there are many – multiple app and network providers – so finding the right one to kick is itself a big challenge, which can result in lengthy outages and poor user experience.

With all these stakeholders, application service level management is no longer about "me." It's about "we." And because cloud apps and services are inherently global, distributed, and shared, so too should be the task of monitoring. By recognizing this and adopting tools and practices optimized to these characteristics, IT can not only address the challenges of managing cloud-based apps but actually open up opportunities to increase its value and relevance to the business.

Three technology trends are fundamental to this transformation: Crowd Sourcing, Real-time Collaboration, and Big Data Analytics.

Crowd Sourcing

A tremendous amount of the software being used in enterprises today is a result of crowd sourcing. This is another aspect of the "Power of the Crowd" we discussed earlier. By enabling hundreds or even thousands of developers to simultaneously work on a piece of software, crowd sourcing, like Open Source, speeds development and improves quality. Waze, Wikipedia, Kickstarter, and even CAPTCHA and Duolingo are examples of the power of crowds in action.

The common theme in all of the above is the ability to quickly solve problems by dividing them up and distributing them to a large, unmanaged, group of individuals. So, why not empower IT teams to do the same? As a user of Salesforce.com, for example, I'm inherently part of the global user community. It would be great if I could easily harness the power of that community, both passively (e.g. obtaining real user performance data from other customers) and actively (e.g. leveraging the crowd to globally test a DNS entry). This would allow me to scale my IT operational capacity as limitlessly as the cloud scales my compute and storage capacity.

Real-time Collaboration

To be effective though, this collaboration with the crowd needs to be done in real-time. Facebook, Twitter, and Instagram have conditioned us to expect instantaneous many-to-many communication in our personal lives, and this is pervading computing in the enterprise as well. Microsoft's Delve, which blends Office, SharePoint, and Yammer, is a great example of where things are heading.

However, real-time collaboration isn't just about tagging and text messaging. IT, in particular, needs real-time data collaboration with the crowd as well. In financial markets, stock exchanges provide this type of real-time data for securities, and investors make decisions (perhaps too readily) based on this data. In many ways, IT is becoming a cloud app broker for the business, and like their financial counterparts, IT can better serve its customers when it has this type of real-time crowd data access.

Big Data Analytics

Big data analytics and business intelligence solutions have become essential tools for many financial, sales, and marketing organizations. Collecting the data is important, but the real value comes from deriving actionable intelligence from it. IT will increasingly need to employ these types of solutions to help it detect, pinpoint, and resolve service delivery issues based on the real-time crowd data it collects. Splunk has been a leader in this for traditional on-premise applications, but more solutions like this are needed in the cloud.

Monitor Early and Monitor Often

"Vote early and vote often." Back in the 1920's and 30's, when neither election technology nor oversight were as effective as they are today, and the likes of Al Capone were at work gaming the system, this phrase wasn't a joke. It was a best practice!

What does this have to do with cloud computing? All too often we see IT teams taking a "buy it and hope it works" strategy when it comes to adopting cloud based apps. They migrate their entire user base to the cloud on faith, assuming that they can worry about performance and availability issues later, if ever. After all, everybody in the company accesses the internet today without issues so your cloud apps should work just fine, right?

Well…maybe…unless of course they don't.

In fact, if you think about it, your users frequently experience "outages" ranging from individual websites to the internet as a whole. As an admin, though, you haven't dealt with most of them because, frankly, it isn't your job to ensure Bob in Marketing can access YouTube. Now, however, you do own availability for your enterprise cloud apps. Wouldn't you rather find out if those will have problems before you've gone past the point of no return and have everybody, including your five bosses, screaming at you because they can't access their email, docs, or the CRM system?

Contrary to what you may have thought, performance and availability monitoring isn't something you roll out at the tail end of your migration to the cloud, but something you get in place before you start migrating users, so you can work out all the kinks in the service delivery chain between your points of access and the cloud apps before you have users depending on them. This is something you probably already do when you roll out a new on-premise application, and it's even more valuable with cloud apps.

Here's an example. One of our customers is a large electronics manufacturer with over 30 worldwide locations. As they recently looked to migrate to SharePoint Online they were conscious that user experience might vary considerably across their locations due to differences in local infrastructure, ISP networks, and the Microsoft Office 365 datacenters serving each location.

The IT team determined that the best way to ensure success at full deployment was to begin performance monitoring at the same time they began their SharePoint Online pilot, before any actual production users were on the service. During this time they were able to methodically test from each location, controlling network configurations and load to establish baseline performance expectations for each location and across locations for their full global deployment. They were also able to diagnose and troubleshoot configuration problems, not just with the SharePoint Online service itself but with their supporting network infrastructure, including DNS and Active Directory Federation Services (ADFS).

Finally, they were able to confirm their organizational readiness to move forward with a production roll-out. They had demonstrated and measured expected availability and performance during the pilot, and reports generated by CloudReady allowed them to communicate these results effectively to their business and IT decision makers responsible for the SharePoint Online deployment.

Armed with these baseline availability and performance expectations measured by Exoprise CloudReady, the IT team is now able to effectively monitor and detect anomalies in SharePoint Online service delivery as they proceed with their production roll-out. They know what performance they should be seeing at each location and are alerted when/if performance deviates from those established thresholds.
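
The general pattern of per-location baselines can be sketched in a few lines. This is not how CloudReady works internally, which the text does not describe; it is a minimal sketch that assumes a simple rule of flagging any response time more than three standard deviations above the pilot average. The location names and latency figures are hypothetical.

```python
from statistics import mean, stdev


def build_baseline(pilot_ms):
    """Map each location to the (mean, standard deviation) of its pilot latencies."""
    return {loc: (mean(values), stdev(values)) for loc, values in pilot_ms.items()}


def is_anomalous(baseline, location, latency_ms, k=3.0):
    """Flag a measurement more than k standard deviations above the baseline mean."""
    avg, sd = baseline[location]
    return latency_ms > avg + k * sd


# Hypothetical pilot measurements (page load time in milliseconds).
pilot_ms = {
    "Boston": [200, 210, 190, 205, 195],
    "Singapore": [480, 500, 520, 510, 490],
}
baseline = build_baseline(pilot_ms)

# The same 450 ms reading is alarming in one location and normal in another.
for location in ("Boston", "Singapore"):
    print(location, is_anomalous(baseline, location, 450))
```

The point of the example is that a single global threshold would either miss the Boston slowdown or raise constant false alarms for Singapore, which is why the baselines are established per location during the pilot.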

They aren't simply hoping users are happy with their SharePoint Online experience. By leveraging performance monitoring from the very beginning the IT team has gone a long way to guaranteeing they are.

Are you CloudReady?

Cloud based apps and services will continue to transform enterprise IT every bit as much as the introduction of the PC, the internet, virtualization, and smart mobile devices. For most enterprises, software and infrastructure as a service (SaaS and IaaS) free IT departments from the never-ending and seemingly impossible task of building and maintaining the infrastructure required to run their business apps, supporting the ever increasing business demands for mobility, device support, disaster recovery, security and reliability. The Cloud service provider deals with all of that complexity, and with their scale and sophistication, they generally do it better and at lower cost than any individual IT team can. This is a huge win for IT which can focus instead on assembling and managing application portfolios that maximize the productivity of their users.

These shifts bring some big changes to your roles and responsibilities, but they don't change everything. You still own availability and performance for apps in the cloud and you still need to manage service levels and user experience. Service level dashboards are a piece of the puzzle, but dashboards alone fall short of the level of visibility and proactive notification that enterprises need to make the Cloud everything it can be. Most legacy systems management and monitoring tools have significant limitations in their ability to manage cloud-based apps and services. If your apps are moving to the cloud, then you also need to look to new tools that are fully optimized to monitor and manage cloud app performance – leveraging the inherent shared, distributed nature of cloud services – before, during and after migration to the cloud.

About Exoprise

Exoprise empowers IT teams with solutions that enable effective adoption and management of mission-critical, cloud based applications and services with its CloudReady application performance management solution. CloudReady provides real-time performance visibility from behind the firewall to the cloud and back. This synthetic monitoring technology also leverages network path diagnostics and crowd sourced data analytics to pinpoint problems and ensure the best possible cloud service performance. Exoprise helps customers get to the cloud faster and ensure success once they are there.

For a free trial of Exoprise, visit www.exoprise.com/freetrial