Operations Management Suite (OMS): Other OMS Solutions

In this chapter, we will cover multiple OMS solutions. These solutions are not closely connected but, in some cases, complement one another in delivering a more complete monitoring solution. The chapter starts with Active Directory (AD) Replication Status and then moves on to Containers, Capacity and Performance, Apache on Linux, and MySQL or MariaDB on Linux.

Even if these solutions are not directly connected, you will notice that with more OMS solutions enabled, more data is collected, providing a more complete picture of your environment. For example, in troubleshooting scenarios, you can correlate additional types of log data and more easily find the root cause by looking across many components and services.

AD Replication Status

Active Directory Domain Services is a key service in many organizations, and most IT environments have critical business services that depend on it. To make Active Directory highly available, each domain controller stores its own copy of the database. Changes to these databases are replicated between all domain controllers so that each one has the latest information. If replication between domain controllers fails, the result can be downtime and widespread identity and authentication issues for the organization.

To make it easier for IT administrators to monitor the replication between domain controllers, OMS includes the AD Replication Status solution. The AD Replication Status solution is the cloud-based version of the on-premises AD Replication Tool, a well-respected tool in the AD community, built by Microsoft Technical Support.

Both the AD Replication Status solution and the AD Replication Tool provide an overview of the replication status of all domain controllers in a forest, prioritize errors, and provide guidance on how to correct them. Both also provide an early warning if there are partitions in Active Directory that have not been replicated for some time, or if a partition has reached the tombstone lifetime limit. The tombstone lifetime used by the solution is read from Active Directory, meaning that custom tombstone settings can be used with this solution.

The AD Replication Status solution uses the AdvisorAssessment program to collect data. The AdvisorAssessment is also used by multiple assessment solutions in OMS (see the chapter on "Assessment Solutions" in this book). Configuration (what to collect) is read from a configuration file and the data is then sent to OMS for analysis. The configuration files are of type EXECPKG (for example, SQLExecutionPackage.execpkg and BaselinePackage.execpkg). For the AD Replication Status solution, a configuration file named ADReplicationPackage.execpkg is used.

As with other solutions, you enable the AD Replication Status solution in the OMS Solution Gallery, as shown in Figure 1, and no additional configuration is required. By default, it runs on all domain controllers. If you do not have any domain controllers with an OMS agent, or would like to use a member server for other reasons, you can set the following registry key on any server that has the OMS agent installed:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\HealthService\Parameters\Management Groups\<ManagementGroupName>\Solutions\ADReplication

FIGURE 1. AD REPLICATION STATUS SOLUTION

By default, the solution collects data every 5 days (7200 minutes). Once data is collected, the AD Replication Status dashboard is populated with information, as shown in Figure 2.

FIGURE 2. AD REPLICATION STATUS SOLUTION DASHBOARD

The dashboard is divided into four columns. The first two columns from the left show destination servers and source servers with errors. Showing the servers both as destination and as source helps in troubleshooting, because we can quickly see which servers and which direction of AD replication traffic are having problems. In Figure 3, server west-id-001 seems to have a problem with incoming replication traffic. West-id-001 has 13 errors as a destination server, and the other servers have a total of 13 errors as source servers. We can quickly see that west-id-001 is most likely the root cause of all the replication errors.

FIGURE 3. AD REPLICATION ERRORS

Clicking one of the servers takes us to the Log Search page, where we can drill into the details of each error, as shown in Figure 4. We can also see that the computer submitting the monitoring data is west-mgm-01, which is a member server and not a domain controller. The Computer field is the server submitting the information, whereas the SourceServer and DestinationServer fields contain the source and destination domain controllers in AD replication.

FIGURE 4. DETAILS FOR AD REPLICATION ERROR

Figure 4 also shows Lastsyncresult, which in error scenarios contains a non-zero value representing the error event ID. Each replication error also contains a help link with more information about the error. The linked page explains the error, its cause, and how to resolve it. Figure 5 shows details for an event, including a help link.

Figure 6 shows the target help page of the help link.

FIGURE 5. DETAILS FOR AD REPLICATION ERROR INCLUDING HELP LINK

FIGURE 6. EXAMPLE ON REPLICATION ERROR KNOWLEDGE ARTICLE

The last column shown in Figure 2 is Tombstone Lifetime. By default, the tombstone lifetime is 180 days. If a domain controller fails to replicate a directory partition within the tombstone lifetime, replication stops for that domain controller and partition. This results in complex manual cleanup tasks for an Active Directory administrator.

This column shows the partitions that have been replicated and their replication age relative to the tombstone lifetime. Even though the AD Replication Status solution only runs every five days, this gives an administrator time to see and react to replication errors before they reach the point beyond which recovery steps become more complex.
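To make the relationship between replication age and tombstone lifetime concrete, the following shell sketch expresses the number of days since the last successful replication as a percentage of the tombstone lifetime. The function names and the 50 and 75 percent thresholds are illustrative assumptions, not values used by the solution.

```shell
# Sketch only: function names and thresholds are hypothetical.

# tsl_percent DAYS_SINCE_LAST_REPLICATION [TOMBSTONE_LIFETIME_DAYS]
# Prints the replication age as a whole-number percentage of the
# tombstone lifetime (rounded down).
tsl_percent() {
  local age_days="$1"
  local tsl_days="${2:-180}"   # 180 days is the default cited in the text
  echo $(( age_days * 100 / tsl_days ))
}

# tsl_status DAYS_SINCE_LAST_REPLICATION [TOMBSTONE_LIFETIME_DAYS]
# Prints a coarse label; the 50% and 75% cut-offs are arbitrary examples.
tsl_status() {
  local pct
  pct=$(tsl_percent "$@")
  if [ "$pct" -ge 100 ]; then
    echo "expired"
  elif [ "$pct" -ge 75 ]; then
    echo "critical"
  elif [ "$pct" -ge 50 ]; then
    echo "warning"
  else
    echo "ok"
  fi
}
```

For example, `tsl_percent 45 180` reports that a partition last replicated 45 days ago has used a quarter of a 180-day tombstone lifetime, while `tsl_status 60 60` shows how the same age is already past the limit in a forest with a 60-day custom setting.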

Containers

Containers, and Docker in particular, are an area of great interest and widespread adoption. The Containers solution provides monitoring functionality for Docker environments, including:

  • Visualizing near real-time information about Container status and resource consumption, images, and log events
  • Visualizing real time performance information
  • Surfacing audit information from Docker host
  • Troubleshooting issues related to Docker containers, images, and hosts

When looking at the capabilities of the solution more closely, it becomes clear there are no advanced analytics in the background. Instead, the power is in a centralized dashboard for the entire Docker environment, which places container host stats and errors alongside container (and source container image) details in a single view, making identification of the root cause much easier.

For example, if you were only comparing the resource consumption of multiple container hosts, it might not be obvious why one host shows much higher CPU utilization than the others. Coupled with container details, such as the number of running containers on each host and container resource consumption, issues of unequal container density or a problem container image become easier to spot.

For the latest information about supported Linux Operating Systems, Docker, ACS Mesosphere DC/OS, and Windows Operating System, please see GitHub https://github.com/Microsoft/OMS-docker

You enable the Containers solution from the Solution Gallery in the OMS portal, shown in Figure 7. The solution collects data every 3 minutes and should quickly be populated with data if OMS agents are already installed on Container hosts. If you are planning to collect data from Container hosts that already have the OMS agent installed but no Container service (Docker is not present), then you need to reinstall the OMS agent after the Container service is installed.

Tip: If running Containers on CoreOS, which does not allow installation of agents on the host, you can download and install a containerized version of the OMS agent. The OMS agent in the container will listen to other containers and send data to OMS. For more information see GitHub https://github.com/Microsoft/OMS-Agent-for-Linux/blob/master/docs/OMS-Agent-for-Linux.md

FIGURE 7. CONTAINERS SOLUTION IN OMS

For Container hosts on Windows servers, you also need to run the following PowerShell script, shown in Figure 8, to enable the OMS agent to connect to the Docker TCP socket so that the agent can collect monitoring data.

FIGURE 8. SCRIPT TO CONNECT OMS AGENT TO DOCKER TCP SOCKET

Port 2375 is the default Docker REST API port. The OMS collection rules are configured to connect to 127.0.0.1:2375.

Once data is collected, you will see the Containers dashboard light up, as shown in Figure 9. The dashboard is divided into 9 different columns, as described in Table 1 and shown in the figures that follow.

Column name | Description | Figure
Information | | 9
Container Events | Shows information about the status of containers, including host and image | 9
Container Errors | Shows errors from the Container service log | 9
Containers Status | Shows running containers and container hosts | 10
Containers Image Inventory | Shows the number of images in the environment | 10
Containers CPU Performance | Shows CPU performance data per container | 10
Containers Memory Performance | Shows memory performance data per container | 11
Computer Performance | Shows performance data for container host computers | 11
Sample Queries | Useful queries to drill into collected Containers data. The queries can easily be modified to fit your environment. | 11

TABLE 1. COLUMNS IN OMS CONTAINERS DASHBOARD

FIGURE 9. PART OF OMS CONTAINERS DASHBOARD

FIGURE 10. PART OF OMS CONTAINERS DASHBOARD

FIGURE 11. PART OF OMS CONTAINERS DASHBOARD

If you want to search the collected data, use the Container-related data types shown in Table 2 below.

Data Type | Description
Type=Perf | Performance data. This is not a Container-specific data type, but performance data is useful when operating or troubleshooting container hosts.
Type=ContainerInventory | Inventory data. This data type is useful when you need information about containers, for example, which ones are running and details about them.
Type=ContainerImageInventory | Image inventory data. This data type is useful when you need information about an image, for example, its size.
Type=ContainerLog | Container logs. Data used to find information about specific log entries, for example, an error.
Type=ContainerServiceLog | Logs for the container service. If you need to find information about stop, start, or delete commands on the Docker service, then this is the data type to use.

TABLE 2. CONTAINERS DATA TYPES

The OMS Containers solution now supports monitoring of the most popular container orchestrators and schedulers, including Swarm, Kubernetes, and Mesosphere DC/OS, all three of which are supported in Azure Container Service. This provides visibility into the inventory of containers on hosts, including the images running in them, and a detailed audit of the commands executed. It provides a centralized view of CPU, memory, storage, and network usage and performance information for the different types of containers in your environment, including Docker and Windows.

For Swarm, the OMS Agent for Linux can be run as a global service, so that one agent container runs on every node in the swarm.
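A minimal sketch of how such a global service might be created follows. The image name microsoft/oms, the published ports, and the WSID and KEY environment variables are assumptions recalled from the OMS-docker documentation of the time, so verify them against the repository before use. The DOCKER variable is a hypothetical convenience that lets you preview the command instead of running it.

```shell
# Sketch only: image name, ports, and variable names are assumptions;
# check them against the OMS-docker repository before use.

# create_oms_swarm_service WORKSPACE_ID PRIMARY_KEY
# Creates the OMS agent as a global Swarm service (one task per node).
create_oms_swarm_service() {
  local workspace_id="$1"
  local primary_key="$2"
  # DOCKER is a hypothetical override: set DOCKER=echo to print the
  # command line instead of executing it.
  ${DOCKER:-docker} service create \
    --name omsagent \
    --mode global \
    --mount type=bind,source=/var/run/docker.sock,destination=/var/run/docker.sock \
    -e WSID="$workspace_id" \
    -e KEY="$primary_key" \
    -p 25225:25225 \
    -p 25224:25224/udp \
    --restart-condition=on-failure \
    microsoft/oms
}
```

Running `DOCKER=echo create_oms_swarm_service <workspace-id> <primary-key>` prints the full command for review. The `--mode global` flag is what causes Swarm to schedule one agent task on every node, and the bind mount of /var/run/docker.sock gives the agent access to the local Docker daemon.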

For Kubernetes, the same result is achieved with DaemonSets, which Kubernetes uses to run a single instance of a container on every host in the cluster.

These options greatly simplify full visibility into root cause of issues in container clusters.

Step-by-Step: For detailed setup instructions, see "Monitor an Azure Container Service cluster with OMS" at https://docs.microsoft.com/en-us/azure/container-service/kubernetes/container-service-kubernetes-oms

A few minutes after setup, you should be able to see data flowing to your OMS dashboard.

Capacity and Performance (Hyper-V)

The Capacity and Performance solution (currently in Preview) can be used to collect capacity and performance data from your Hyper-V servers. The solution gives you deep insight into your Hyper-V environment, including CPU, storage, and memory for all Hyper-V hosts in one dashboard. The solution can also be used to get an overview of multiple Hyper-V environments that are not technically connected in any way.

The solution is enabled from the solution gallery and does not require any extra configuration in OMS or on the OMS agent side. The OMS agent must be installed on Hyper-V hosts running Windows Server 2012 or higher.

The solution focuses on hosts, virtual machines, and storage, as shown in Figures 12, 13, and 14. Figure 14 also shows links to recommended search queries that can be used to drill into the collected data. With the default dashboard, you can see the following:

  • Host CPU and Memory Utilization. Shows hosts with the highest and lowest CPU and memory utilization.
  • VM CPU and Memory Utilization. Shows virtual machines with the highest and lowest CPU and memory utilization.
  • VM Total Disk IOPS. Shows the virtual machines with the highest and lowest disk utilization, which is a good basis for reallocating virtual machines or troubleshooting disk performance issues.
  • Cluster Shared Volumes, including total throughput, IOPS, and latency. Gives a good overview of connected storage, useful in troubleshooting and capacity planning scenarios.
  • Host Density. Shows the number of virtual machines per host, which is a good basis for reallocating virtual machines between hosts.

Each tile in the dashboard is linked to the Log Search page, where you can drill deeper into the collected information. As all Hyper-V environments are different and performance requirements vary, there is no analysis of the collected data. Instead, each administrator can use this solution as a single pane of glass for a broad set of performance metrics, making problem performance areas easier to spot, as shown in Figures 12 and 13.

FIGURE 12. HYPER-V CAPACITY AND PERFORMANCE DASHBOARD - PART ONE

FIGURE 13. HYPER-V CAPACITY AND PERFORMANCE DASHBOARD - PART TWO

FIGURE 14. HYPER-V CAPACITY AND PERFORMANCE DASHBOARD - PART THREE

Other than enabling the Capacity and Performance solution in your OMS Workspace and installing the OMS agent on each Hyper-V host, no additional configuration is required.

VMware monitoring with OMS: If you want to monitor VMware with OMS, you can find more info in "VMware Monitoring (Preview) solution in Log Analytics" at https://docs.microsoft.com/en-us/azure/log-analytics/log-analytics-vmware

Apache on Linux

If Apache is detected during installation of the OMS agent, a special performance provider will be installed. This provider is named apache-cimprov, shown in Figure 15. If the OMS agent is already installed when you install Apache, simply re-run the OMS agent installation script to update the agent.

FIGURE 15. LINUX OMS AGENT INSTALLATION INCLUDING APACHE PROVIDERS

The Apache provider (apache-cimprov) depends on a module that must be loaded into the Apache HTTP Server to access performance data. The following command loads that module. The module can be unloaded by using "-u" instead of "-c" in the command.

sudo /opt/microsoft/apache-cimprov/bin/apache_config.sh -c

Before any performance data is collected, the Apache performance counters have to be added to Linux Performance Counters (see Figure 16), using the Performance Counter feature described in "Chapter 8: Log Management & Performance Data".

FIGURE 16. ADDING APACHE PERFORMANCE COUNTERS

Figure 17 shows collected Apache performance counters in the Log Search interface.

FIGURE 17. COLLECTED PERFORMANCE DATA FROM APACHE SERVER

Note: Only Apache performance data is collected. If you need to collect other Apache log data, for example, /var/log/apache2/error.log, you can use the custom logfile feature of OMS, as described in "Chapter 17: Custom OMS Solutions".

MySQL or MariaDB on Linux

If MySQL or MariaDB is detected during installation of the OMS agent on a supported Linux OS, a performance monitoring provider is installed automatically. This provider connects to the local installation of MySQL or MariaDB and collects performance data. To access MySQL performance data, an account must be configured for use by the OMS agent.

If the OMS agent is already installed when you install MySQL or MariaDB, simply re-run the OMS agent installation script to update the agent.

The username and password need to be specified in a file named mysql-auth, located in the /var/opt/microsoft/mysql-cimprov/auth/omsagent folder. Figure 18 shows an example of the configuration file.

FIGURE 18. THE MYSQL-AUTH FILE

  • The first part of the configuration specifies the port that the MySQL instance is listening on. 0 means that it is the default instance.
  • mysqloms is the user name, followed by the password. You can also configure whether the mysql-auth file should be overwritten when the MySQL OMI Provider is upgraded. If the server has multiple MySQL instances running, you can enter multiple lines with different configurations.
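As a rough illustration of the layout described above, the shell sketch below prints an entry in the form port=bind-address, username, Base64-encoded password, followed by an AutoUpdate line. This layout is an assumption based on the MySQL OMI provider documentation rather than on Figure 18 itself, and the function names are hypothetical, so compare the output with your own mysql-auth file before relying on it.

```shell
# Sketch only: the file layout is assumed from the MySQL OMI provider
# documentation, and the function names are hypothetical.

# mysql_auth_entry PORT BIND_ADDRESS USERNAME PASSWORD
# Prints one instance line: PORT=BIND_ADDRESS, USERNAME, BASE64(PASSWORD)
mysql_auth_entry() {
  local port="$1"
  local bind_address="$2"
  local username="$3"
  local password="$4"
  local encoded
  # printf '%s' avoids encoding a trailing newline; tr removes any line
  # wrapping that base64 adds for long input.
  encoded=$(printf '%s' "$password" | base64 | tr -d '\n')
  printf '%s=%s, %s, %s\n' "$port" "$bind_address" "$username" "$encoded"
}

# mysql_auth_file AUTOUPDATE PORT BIND_ADDRESS USERNAME PASSWORD
# Prints a single-instance file body: one entry plus the AutoUpdate flag.
mysql_auth_file() {
  local auto_update="$1"
  shift
  mysql_auth_entry "$@"
  printf 'AutoUpdate=%s\n' "$auto_update"
}
```

Because the password is passed as a single quoted argument and then encoded, characters that are awkward on a command line (spaces, quotes, dollar signs) survive intact. For a server with several MySQL instances, call `mysql_auth_entry` once per port and redirect the combined output into the mysql-auth file.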

It is also possible to set the credentials with the following command. The command can have problems with complex passwords; in those scenarios, the mysql-auth file is a better option.

sudo su omsagent -c '/opt/microsoft/mysql-cimprov/bin/mycimprovauth default 127.0.0.1 mysqloms SecretPassword123'

After the file has been created or the command has been run, the OMI daemon must be restarted with the following command:

sudo /opt/omi/bin/service_control restart

For additional information about required permissions, see GitHub, https://github.com/Microsoft/OMS-Agent-for-Linux/blob/master/docs/OMS-Agent-for-Linux.md

Before any performance data is collected, even if the user name and provider are in place, performance counters for MySQL must be configured. Figure 19 shows how MySQL performance counters are configured to be collected.

FIGURE 19. CONFIGURATION OF MYSQL PERFORMANCE COUNTERS

Figure 20 shows collected MySQL performance data.

FIGURE 20. COLLECTED MYSQL PERFORMANCE DATA

For additional information about which performance counters are collected for MySQL, see GitHub https://github.com/Microsoft/OMS-Agent-for-Linux/blob/master/docs/OMS-Agent-for-Linux.md

Summary

In this chapter, we discussed a number of OMS solutions that address a variety of monitoring scenarios, going beyond the Windows monitoring and management Microsoft is known for. We started with a look at the AD Replication Status solution to pinpoint the root cause of AD replication issues. Next, we explored monitoring for Docker containers and host clusters with the most popular container management platforms, including Swarm and Kubernetes.

Then, we explored monitoring the LAMP stack (Linux, Apache, MySQL and PHP) with Apache on Linux monitoring, as well as data collection from MySQL and its free open source cousin, MariaDB.

In recent years, Microsoft has been investing heavily in Linux and open source not only from a management perspective but through direct support and integration with new solutions like SQL Server and .NET Core running on Linux. Clearly, the OMS team is following this trend. Expect continued investment from Microsoft in supporting monitoring and management of non-Windows workloads in OMS.