Infrastructure-as-a-Service Product Line Architecture

Introduction

The goal of the Infrastructure-as-a-Service (IaaS) product line architecture (PLA) is to help organizations develop and implement private cloud infrastructures quickly while reducing complexity and risk. The IaaS PLA provides a reference architecture that combines Microsoft software, consolidated guidance, and validated configurations with partner technologies such as compute, network, and storage architectures, in addition to value-added software features.

The private cloud model provides much of the efficiency and agility of cloud computing, with the increased control and customization that are achieved through dedicated private resources. By implementing private cloud configurations that align to the IaaS PLA, Microsoft and its hardware partners can help provide organizations the control and the flexibility that are required to reap the potential benefits of the private cloud.

The IaaS PLA utilizes the core capabilities of the Windows Server operating system, Hyper-V, and System Center to deliver a private cloud infrastructure as a service offering. These are also the key software features and components that are used for every reference implementation.

Scope

The scope of this document is to provide customers with the necessary guidance to develop solutions for a Microsoft private cloud infrastructure in accordance with the IaaS PLA patterns that are identified for use with the Windows Server operating system. This document provides specific guidance for developing fabric architectures (compute, network, storage, and virtualization layers) for an overall private cloud solution. Guidance is provided for the development of an accompanying fabric management architecture that uses System Center.

Microsoft Services

This PLA document was developed by Microsoft Services. Microsoft Services comprises a global team of architects, engineers, consultants, and support professionals who are dedicated to helping customers maximize the value of their investment in Microsoft software. Microsoft Services supports customers in over 82 countries, helping them plan, deploy, support, and optimize Microsoft technologies. Microsoft Services works closely with Microsoft Partners by sharing its technological expertise, solutions, and product knowledge. For more information about the solutions that Microsoft Services offers, or to learn how to engage with Microsoft Services and Microsoft Partners, please visit the Microsoft Services website.

IaaS Product Line Architecture Overview

The IaaS PLA is focused on deploying virtualization fabric and fabric management technologies in Windows Server and System Center to support private cloud scenarios. This PLA includes reference architectures, best practices, and processes for streamlining deployment of these platforms to support private cloud scenarios.

This part of the IaaS PLA focuses on delivering core foundational virtualization fabric infrastructure guidance that aligns to the defined architectural patterns within this and other Windows Server private cloud programs. The resulting Hyper-V infrastructure in Windows Server can be leveraged to host advanced workloads. The accompanying Fabric Management Architecture Guide contains fabric management scenarios that use System Center components. Scenarios that are relevant to this release include:

  • Resilient infrastructure: Maximize the availability of IT infrastructure through cost-effective redundant systems that prevent downtime, whether planned or unplanned.
  • Centralized IT: Create pooled resources with a highly virtualized infrastructure that support maintaining individual tenant rights and service levels.
  • Consolidation and migration: Remove legacy systems and move workloads to a scalable high-performance infrastructure.
  • Preparation for the cloud: Create the foundational infrastructure to begin the transition to a private cloud solution.

IaaS Reference Architectures

Microsoft Private Cloud programs have two main solutions as shown in Figure 1. This document focuses on the open solutions model, which can be used to service the enterprise and hosting service provider audiences.

Figure 1 Branches of the Microsoft Private Cloud

Small- or medium-sized enterprises should plan a reference architecture that defines the requirements that are necessary to design, build, and deliver virtualization and private cloud solutions, including hosting service provider implementations.

Figure 2 shows examples of these reference architectures.

Figure 2 Examples of reference architectures

Each reference architecture combines concise guidance with validated configurations for the compute, network, storage, and virtualization layers. Each architecture presents multiple design patterns to enable the architecture, and each design pattern describes the minimum requirements for each solution.

Product Line Architecture Fabric Design Patterns

As previously described, Windows Server utilizes innovative hardware capabilities and enables, on commodity hardware, scenarios and capabilities that were previously considered advanced. These capabilities have been summarized into initial design patterns for the IaaS PLA. The identified patterns include the following infrastructures:

  • Software-defined infrastructure
  • Non-converged infrastructure
  • Converged infrastructure

Each design pattern guide outlines the high-level architecture, provides an overview of the scenario, identifies technical requirements, outlines all dependencies, and provides guidelines as to how the architectural guidance applies to each deployment pattern. Each pattern also includes an array of fabric constructs in the categories of compute, network, storage, and virtualization. This guide outlines each pattern with an overview and a summary of how the pattern leverages each feature area.

The following features are common across each of the design patterns:

Required Features:

  • Dedicated fabric management hosts
  • 10 gigabit Ethernet (GbE) or higher network connectivity
  • Redundant paths for all storage networking components (such as redundant serial attached SCSI (SAS) paths, and Multipath I/O (MPIO) for Fibre Channel and SMB Multichannel where appropriate)
  • SMI-S or SMP–compliant management interfaces for storage components
  • Remote direct memory access (RDMA) network connectivity (RoCE, InfiniBand, or iWARP)
  • Shared storage

Optional Features:

  • Addition of single root I/O virtualization (SR-IOV) network interface cards
  • Addition of a certified Hyper-V extensible virtual switch extension

The following table outlines the Windows Server features and technologies that are common to all patterns:

  • Increased Virtual Processor to Logical Processor ratio: Removes the previous limits of 8:1 processor ratios for server workloads and 12:1 processor ratios for client workloads.
  • Increased virtual memory and Dynamic Memory: Supports up to 1 TB of memory inside virtual machines.
  • Virtual machine guest clustering enhancements: Supports virtual machine guest clusters by using a shared virtual hard disk, iSCSI connections, or the Hyper-V Fibre Channel adapter to connect virtual machines to shared storage.
  • Hyper-V virtual switch: A virtual Ethernet switch that allows filtering, capturing, and forwarding extensions to be added by non-Microsoft vendors to support additional virtual-switch functionality on the Hyper-V platform.
  • Cluster-aware updating: Provides the ability to apply updates to running failover clusters through coordinated patching of individual failover-cluster nodes.
  • Live migration enhancements: Supports migrating virtual machines without shared storage ("shared nothing" live migration) and faster live migrations by using memory compression or the SMB 3.0 protocol.
  • Support for SR-IOV: Provides the ability to assign a network adapter that supports single-root I/O virtualization (SR-IOV) directly to a virtual machine.
  • Support for 4K physical disks: Supports native 4K disk drives on hosts.
  • Diskless network boot with iSCSI Target Server: Provides the network-boot capability on commodity hardware by using an iSCSI boot–capable network adapter or a software boot loader.
  • Virtual machine generation enhancements: Windows Server introduces Generation 2 virtual machines, which support new functionality such as UEFI firmware, PXE boot, and Secure Boot.
  • Virtual machine storage enhancements (VHDX format): Supports the VHDX format for disks that are up to 64 TB in size and for shared virtual hard disks.
  • Windows NIC Teaming: Supports switch-independent and switch-dependent load distribution by using physical and virtual network connections.
  • Data Center Bridging: Provides hardware support for converged fabrics, which allows bandwidth allocation and priority flow control.

Table 1 Windows Server features and key scenarios applicable to all patterns
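
As a minimal sketch of how a few of these common features are enabled, the following Windows PowerShell example creates a switch-independent NIC team, binds a Hyper-V virtual switch to it, and provisions a Generation 2 virtual machine with a dynamically expanding VHDX disk. The team, switch, path, and virtual machine names are illustrative assumptions, not prescribed values.

    # Create a switch-independent NIC team from two physical adapters (adapter names are assumptions)
    New-NetLbfoTeam -Name "HostTeam" -TeamMembers "NIC1","NIC2" `
        -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic

    # Bind a Hyper-V virtual switch to the team interface
    New-VMSwitch -Name "TenantSwitch" -NetAdapterName "HostTeam" -AllowManagementOS $true

    # Create a dynamically expanding VHDX (the VHDX format supports disks up to 64 TB)
    New-VHD -Path "D:\VMs\VM01\VM01.vhdx" -SizeBytes 500GB -Dynamic

    # Create a Generation 2 virtual machine that uses the VHDX and the virtual switch
    New-VM -Name "VM01" -Generation 2 -MemoryStartupBytes 4GB `
        -VHDPath "D:\VMs\VM01\VM01.vhdx" -SwitchName "TenantSwitch"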

Windows Hardware Certification

In IaaS PLA implementations, it is mandatory that each architecture solution pass the following validation requirements:

  • Windows hardware certification
  • Failover-clustering validation
  • Clustered RAID controller validation (if a non-Microsoft clustered RAID controller is used)

These rule sets are described in the following subsections.

Windows Hardware Certification

Hardware solutions must receive validation through the Microsoft "Certified for Windows Server" program before they can be presented in the Windows Server Catalog. The catalog contains all servers, storage, and other hardware devices that are certified for use with Windows Server 2012 and Hyper-V.

The Certified for Windows Server logo demonstrates that a server system meets the high technical bar set by Microsoft for security, reliability, and manageability, and that it includes any required hardware components to support all of the roles, features, and interfaces that Windows Server supports.

The logo program and the support policy for failover-clustering solutions require that all the individual components that make up a cluster configuration earn the appropriate "Certified for" or "Supported on" Windows Server designations before they are listed in their device-specific categories in the Windows Server Catalog.

For more information, open the Windows Server Catalog. Under Hardware Testing Status, click Certified for Windows Server. The two primary entry points for starting the logo-certification process are Windows Hardware Certification Kit (HCK) downloads and the Windows Dev Center Hardware and Desktop Dashboard.

Validation requirements include failover-clustering validation and clustered RAID controller validation, as described in the following subsections.

Failover-Clustering Validation

For Windows Server, failover clustering can be validated by using the Cluster Validation Tool to confirm network and shared storage connectivity between the nodes of the cluster. The tool runs a set of focused tests on the servers that are to be used as nodes in a cluster, or that are already members of a given cluster. This failover-cluster validation process tests the underlying hardware and software directly and individually to obtain an accurate assessment of whether the failover cluster can support a given configuration.

Cluster validation is used to identify hardware or configuration issues before the cluster enters production. This helps make sure that a solution is truly dependable.

In addition, cluster validation can be performed as a diagnostic tool on configured failover clusters. Failover clusters must be tested and they must pass the failover-cluster validation to receive customer support from Microsoft Customer Support Services (CSS).
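
The following sketch shows how this validation is typically run from Windows PowerShell before the cluster is created; the node names, cluster name, and address are placeholder assumptions.

    # Run the full set of cluster validation tests against the candidate nodes
    Test-Cluster -Node "Node01","Node02"

    # If validation passes, create the failover cluster
    New-Cluster -Name "HVCluster01" -Node "Node01","Node02" -StaticAddress 192.168.1.50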

Clustered RAID Controller Validation

Clustered RAID controllers are a relatively new type of storage interface card that can be used in shared storage and cluster scenarios. Clustered RAID controllers that are set up across the configured servers provide shared storage, and the clustered RAID controller solution must pass the clustered RAID controller validation.

This validation requirement applies only if the solution includes a clustered RAID controller.

Windows Licensing

IaaS PLA architectures use the Windows Server Standard or Windows Server Datacenter edition.

The packaging and licensing for Windows Server have been updated to simplify purchasing and reduce management requirements. The Windows Server Standard and Datacenter editions are differentiated only by virtualization rights: two virtual instances for the Standard edition, and an unlimited number of virtual instances for the Datacenter edition.

For more information about Windows Server licensing, see the Windows Server Datasheet or Windows Server 2012: How to Buy.

For information about licensing in virtual environments, see Microsoft Volume Licensing Brief: Licensing Microsoft Server Products in Virtual Environments.

Software Defined Infrastructure Pattern Overview

The Software Defined Infrastructure pattern (previously referred to as Continuous Availability over Server Message Block (SMB) Storage) supports deployments in Windows Server that use Hyper-V and Failover Clustering. Continuous availability and transparent failover are delivered over a Scale-Out File Server cluster infrastructure, and SMB shared storage is provided by using a converged hardware configuration and native capabilities in the Windows Server operating system. This pattern has three variations:

  • Variation A: SMB Direct using shared serial attached SCSI (SAS) and Storage Spaces
  • Variation B: SMB Direct using storage area network (SAN)
  • Variation C: SMB 3.0-enabled storage

Note SMB Direct is based on SMB 3.0, and it supports the use of network adapters that have remote direct memory access (RDMA) capability.

Variation A uses SMB Direct using shared SAS and Storage Spaces to provide storage capabilities over direct-attached storage technologies. This pattern combines a Scale-Out File Server cluster infrastructure with SMB Direct to provide back-end storage that has similar characteristics to traditional SAN infrastructures and supports Hyper-V and SQL Server workloads.

Figure 3 outlines a conceptual view of Variation A.

Figure 3 Conceptual view of variation A

Variation B uses SMB Direct with SAN-based storage, which provides the advanced storage capabilities that are found in storage area network (SAN) infrastructures. SAN-based storage solutions typically provide additional features beyond what can be provided natively through the Windows Server operating system by using shared direct-attached "Just a Bunch of Drives" (JBOD) storage technologies. Although this variation is generally more expensive, its primary trade-off favors capability and manageability over cost.

Variation B is similar to Variation A. It utilizes a Scale-Out File Server cluster infrastructure with SMB Direct; however, the back-end storage infrastructure is a SAN-based storage array. In this variation, innovative storage capabilities that are typically associated with SAN infrastructures can be utilized in conjunction with RDMA and SMB connectivity for Hyper-V workloads.

Figure 4 outlines a conceptual view of Variation B.

Figure 4 Conceptual view of variation B

In Variation C, instead of using Scale-Out File Server clusters and SMB Direct, SMB 3.0-enabled storage devices are used to provide basic storage capabilities, and Hyper-V workloads utilize the SMB shared resources directly. This configuration might not provide advanced storage capabilities, but it provides an affordable storage option for Hyper-V workloads.

Figure 5 outlines a conceptual view of Variation C.

Figure 5 Conceptual view of variation C

Although the following list of requirements is not comprehensive, the Software Defined Infrastructure pattern requires the following features:

  • All the common features listed earlier
  • Dedicated hosts for a Scale-Out File Server cluster (for Variations A and B)
  • Shared SAS JBOD storage array (required for Variation A)
  • SMB 3.0-enabled storage array (Variation C only)

Table 2 outlines Windows Server features and technologies that are utilized in this architectural design pattern in addition to the common features and capabilities mentioned earlier.

  • Quality of Service (QoS) minimum bandwidth: Assigns a certain amount of bandwidth to a given type of traffic and helps make sure that each type of network traffic receives at least its assigned bandwidth when the network is congested.
  • Storage Quality of Service (QoS): Provides storage performance isolation in a multitenant environment and mechanisms to notify you when the storage I/O performance does not meet defined thresholds.
  • Shared virtual hard disks: Shared virtual hard disks (.vhdx files) can be used as shared storage for multiple virtual machines that are configured as a guest failover cluster. This avoids the need to use iSCSI in this scenario.
  • Storage Spaces: Enables cost-effective, highly available, scalable, and flexible storage solutions that make optimal use of capacity in virtualized or physical deployments.
  • Storage Spaces tiering: Enables the creation of virtual disks that are composed of two tiers of storage: a solid-state drive tier for frequently accessed data and a hard disk drive tier for less-frequently accessed data. Storage Spaces transparently moves data between the two tiers based on how frequently data is accessed.
  • Hyper-V over SMB: Supports the use of SMB 3.0 file shares as storage locations for running virtual machines by using low-latency RDMA network connectivity.
  • SMB Direct: Provides low-latency SMB 3.0 connectivity when using remote direct memory access (RDMA) adapters.
  • Data deduplication: Finds and removes duplication within data without compromising its fidelity or integrity.
  • SMB Multichannel: Allows file servers to use multiple network connections simultaneously, which provides increased throughput and network fault tolerance.
  • Virtual Receive Side Scaling (vRSS): Allows virtual machines to support higher networking traffic loads by distributing the processing across multiple cores on the Hyper-V host and virtual machine.

Table 2 Windows Server features and key scenarios
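
To illustrate how several of these features fit together in Variation A, the following hedged sketch pools shared SAS disks with Storage Spaces, defines SSD and HDD tiers, creates a mirrored, tiered virtual disk, and caps the IOPS of a virtual machine disk with Storage QoS. The pool, tier, size, and virtual machine names are assumptions for illustration, and the exact storage subsystem name can vary by server.

    # Pool all shared SAS disks that are eligible for pooling
    $disks = Get-PhysicalDisk -CanPool $true
    New-StoragePool -FriendlyName "Pool01" -StorageSubSystemFriendlyName "Storage Spaces*" -PhysicalDisks $disks

    # Define an SSD tier and an HDD tier inside the pool
    $ssdTier = New-StorageTier -StoragePoolFriendlyName "Pool01" -FriendlyName "SSDTier" -MediaType SSD
    $hddTier = New-StorageTier -StoragePoolFriendlyName "Pool01" -FriendlyName "HDDTier" -MediaType HDD

    # Create a mirrored, tiered virtual disk (storage space) from the two tiers
    New-VirtualDisk -StoragePoolFriendlyName "Pool01" -FriendlyName "VMData01" `
        -StorageTiers $ssdTier,$hddTier -StorageTierSizes 200GB,2TB -ResiliencySettingName Mirror

    # Apply Storage QoS to a virtual machine disk (maximum of 1000 normalized IOPS)
    Set-VMHardDiskDrive -VMName "VM01" -ControllerType SCSI -ControllerNumber 0 `
        -ControllerLocation 0 -MaximumIOPS 1000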

Key drivers that would encourage customers to select the Software Defined Infrastructure pattern include lower cost of ownership and flexibility with shared SAS JBOD storage solutions (Variation A only). Decision points for this design pattern over others focus primarily on the storage aspects of the solution in combination with the innovative networking capabilities of SMB Multichannel and RDMA.

Non-Converged Infrastructure Pattern Overview

The non-converged infrastructure pattern uses Hyper-V and Failover Clustering in a standard deployment with non-converged storage (traditional SAN) and a network infrastructure. The storage network and network paths are isolated by using dedicated I/O adapters. Failover and scalability are achieved on the storage network through Multipath I/O (MPIO). The TCP/IP network uses NIC Teaming.

In this pattern, Fibre Channel or iSCSI is expected to be the primary connectivity to a shared storage network. High-speed 10 gigabit Ethernet (GbE) adapters are common for advanced configurations of TCP/IP traffic.

Figure 6 provides an overview of the non-converged infrastructure pattern.

Figure 6 Non-converged design pattern

The non-converged infrastructure pattern has two connectivity variations:

  • Variation A: Fibre Channel
  • Variation B: iSCSI

Figure 7 outlines a conceptual view of this pattern.

Figure 7 Non-converged design pattern variations

Although the following list of requirements is not comprehensive, this design pattern uses the following features:

  • All the common components listed earlier
  • Fibre Channel, iSCSI, or SMB 3.0-enabled SAN-based storage
  • Storage-array support for offloaded data transfer (ODX) (optional)

Table 3 outlines the Windows Server features that are utilized in this architectural design pattern in addition to the common features and capabilities outlined earlier.

  • Virtual machine guest clustering enhancements (iSCSI, Virtual Fibre Channel, or shared virtual hard disks): Supports virtual machine guest clusters by using iSCSI connections or by using the Hyper-V Fibre Channel adapter to connect to shared storage. Alternatively, the shared virtual hard disk feature can be used regardless of the shared storage protocol that is used at the host level.
  • Offloaded data transfer (ODX): Support for storage-level transfers that use ODX technology (SAN feature).
  • Diskless network boot with iSCSI Target Server: Provides the network-boot capability on commodity hardware by using an iSCSI boot–capable network adapter or a software boot loader (such as iPXE or netBoot/i).

Table 3 Windows Server features and key scenarios
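
As a hedged sketch of Variation B connectivity, the following example enables Multipath I/O for iSCSI and connects a Hyper-V host to an iSCSI target portal. The portal address and the use of a persistent, multipath connection are placeholder assumptions and should be adjusted to the storage vendor's guidance.

    # Install and configure Multipath I/O for iSCSI devices
    Install-WindowsFeature -Name Multipath-IO
    Enable-MSDSMAutomaticClaim -BusType iSCSI

    # Register the target portal, then connect to each discovered target over multiple paths
    New-IscsiTargetPortal -TargetPortalAddress "10.0.10.20"
    Get-IscsiTarget | ForEach-Object {
        Connect-IscsiTarget -NodeAddress $_.NodeAddress -IsMultipathEnabled $true -IsPersistent $true
    }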

Key drivers that would encourage customers to select this design pattern include current capital and intellectual investments in SAN and transformation scenarios that include using an existing infrastructure for upgrading to a newer platform. Decision points for this design pattern include storage investments, familiarity, and flexibility of hardware.

Converged Infrastructure Pattern Overview

In this context, a "converged infrastructure" refers to sharing a network topology between traditional network and storage traffic. This typically implies Ethernet network devices and network controllers that have particular features to provide segregation, quality of service (performance), and scalability. The result is a network fabric that features less physical complexity, greater agility, and lower costs than those that are associated with traditional Fibre Channel-based storage networks.

This topology supports many storage designs, including traditional SANs, SMB 3.0-enabled SANs, and Windows-based Scale-Out File Server clusters. In a converged infrastructure, all storage connectivity is network-based, and it uses a single medium (such as copper); SFP+ adapters are commonly used.

Servers for a converged infrastructure pattern typically include converged blade systems and rack-mount servers, which also are prevalent in other design patterns. The key differentiators in this pattern are how the servers connect to storage and the advanced networking features that are provided by converged network adapters (CNA). High-density blade systems are common; they feature advanced hardware options that present physical or virtual network adapters, supporting a variety of protocols, to the Hyper-V host.

Figure 8 depicts a configuration for the converged infrastructure. Note the following:

  • Host storage adapters can be physical or virtual, and they must support iSCSI or Fibre Channel over Ethernet (FCoE), and optionally SMB Direct.
  • Many storage devices are supported, including traditional SANs and SMB Direct–capable storage.

Figure 8 Converged infrastructure pattern

Although the following list of requirements is not comprehensive, the converged infrastructure pattern uses the following features:

  • All the common components listed earlier
  • Fibre Channel, iSCSI, or SMB 3.02-enabled SAN-based storage
  • Storage-array support for ODX (optional)

Table 4 outlines Windows Server features that are utilized in the converged infrastructure pattern in addition to the common features and capabilities outlined earlier.

  • Virtual machine guest clustering enhancements (iSCSI, Virtual Fibre Channel, or shared virtual hard disks): Supports virtual machine guest clusters by using iSCSI connections or by using the Hyper-V Fibre Channel adapter to connect to shared storage. Alternatively, the shared virtual hard disk feature can be used regardless of the shared storage protocol that is used on the host level.
  • Offloaded data transfer (ODX): Support for storage-level transfers that use ODX technology (SAN feature).

Table 4 Windows Server features and key scenarios
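
The following sketch illustrates the kind of converged-fabric configuration this pattern relies on: Data Center Bridging tags and reserves bandwidth for SMB traffic, and a Hyper-V virtual switch with weight-based minimum bandwidth carves host traffic classes out of the converged adapter. The adapter name, priority, percentages, and weights are illustrative assumptions.

    # Enable Data Center Bridging and reserve bandwidth for SMB traffic (tagged with priority 3)
    Install-WindowsFeature -Name Data-Center-Bridging
    New-NetQosPolicy -Name "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
    New-NetQosTrafficClass -Name "SMB" -Priority 3 -BandwidthPercentage 50 -Algorithm ETS
    Enable-NetQosFlowControl -Priority 3
    Enable-NetAdapterQos -Name "CNA1"

    # Create a converged virtual switch with weight-based minimum bandwidth for host traffic
    New-VMSwitch -Name "ConvergedSwitch" -NetAdapterName "CNA1" -MinimumBandwidthMode Weight
    Add-VMNetworkAdapter -ManagementOS -Name "LiveMigration" -SwitchName "ConvergedSwitch"
    Set-VMNetworkAdapter -ManagementOS -Name "LiveMigration" -MinimumBandwidthWeight 20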

Hybrid Infrastructure Pattern Overview

The Hybrid Infrastructure pattern includes reference architectures, best practices, and processes for extending a private cloud infrastructure to Microsoft Azure or a Microsoft service-provider partner for hybrid cloud scenarios such as:

  • Extending the data center fabric to the cloud
  • Extending fabric management to the cloud
  • Hybrid deployment of Microsoft applications

The overall Microsoft "Cloud OS" strategy supports this architecture and approach. For more information about this strategy, see:

The key attribute of the Cloud OS vision is the hybrid infrastructure, in which you have the option to leverage an on-premises infrastructure, a Microsoft Azure infrastructure, or a Microsoft hosting partner infrastructure. Your IT organization is a consumer and a provider of services, enabling workload and application development teams to make sourcing selections for services from all three of the possible infrastructures or create solutions that span them.

The following diagram illustrates the infrastructure level, the cloud service catalog space, and examples of application scenarios and service-sourcing selections (for example, a workload team determining whether it will use virtual machines that are provisioned on-premises, in Microsoft Azure, or by a Microsoft hosting partner).

Figure 9 Hybrid IT infrastructure

With a hybrid infrastructure in place, IT consumers focus on the service catalog instead of the infrastructure. Historically, full supporting stacks were designed from the hardware up through the operating system and application stack. Workloads in a hybrid environment instead draw from the service catalog that is provided by IT, which consists of services that are delivered by the hybrid infrastructure.

As an example, all three hybrid infrastructure pattern choices provide virtual machines; but in each case, those virtual machines have different attributes and costs. The consumer will have the choice of which one or which combination to utilize. Some virtual machines might be very low-cost but have limited features available, while others might be higher-cost but support more capability.

The hybrid infrastructure pattern enables customers to utilize private, public, and service provider clouds, each of which utilize the same product and architecture foundations.

Storage Architecture

Drive Architectures

The type of hard drives in the host server, or in the storage array that is used by the file servers, has a significant impact on the overall performance of the storage architecture. The critical performance factors for hard drives are:

  • The interface architecture (for example, SAS or SATA)
  • The rotational speed of the drive (for example, 10K or 15K RPM), or the use of a solid-state drive (SSD) that has no moving parts
  • The Read and Write speed
  • The average latency in milliseconds (ms)

Additional factors, such as the cache on the drive and support for advanced features such as Native Command Queuing (NCQ), TRIM (SATA only), Tagged Command Queuing, and UNMAP (SAS and Fibre Channel), can improve performance and durability.

As with the storage connectivity, high I/O operations per second (IOPS) and low latency are more critical than maximum sustained throughput when it comes to sizing and guest performance on the server running Hyper-V. This translates into selecting drives that have the highest rotational speed and lowest latency possible, and choosing when to use SSDs for extreme performance.

Serial ATA (SATA)

Serial ATA (SATA) drives are a low-cost and relatively high-performance option for storage. SATA drives are available primarily in the 3 Gbps and 6 Gbps standards (SATA II and SATA III), with a rotational speed of 7,200 RPM and average latency of around four milliseconds. Typically, SATA drives are not designed to enterprise-level standards of reliability, although new technologies in Windows Server such as the Resilient File System (ReFS) can help make SATA drives a viable option for single server scenarios. However, SAS disks are required for all cluster and high availability scenarios that use Storage Spaces.

SAS

Serial attached SCSI (SAS) drives are typically more expensive than SATA drives, but they can provide higher throughput and, more importantly, lower latency. SAS drives typically have a rotational speed of 10K or 15K RPM, an average latency of 2 ms to 3 ms, and 6 Gbps interfaces.

There are also SAS SSDs. Unlike SATA drives, SAS drives have dual interface ports that are required for using clustered storage spaces. (Details are provided in subsequent sections.) The SCSI Trade Association has a range of information about SAS drives. In addition, you can find several white papers and solutions on the LSI website.

The majority of SAN arrays today use SAS drives, but a few higher-end arrays also use Fibre Channel, SAS, and SATA drives. For example, SAS drives are used in the Software Defined Infrastructure pattern in conjunction with a JBOD storage enclosure, which enables the Storage Spaces feature. Aside from the enclosure requirements that will be outlined later, the following requirements exist for SAS drives when they are used in this configuration:

  • Drives must provide port association. Windows depends on drive enclosures to provide SES-3 capabilities such as drive-slot identification and visual drive indications (commonly implemented as drive LEDs). Windows matches a drive in an enclosure with SES-3 identification capabilities through the port address of the drive. The computer hosts can be separate from drive enclosures or integrated into drive enclosures.
  • Multiport drives must provide symmetric access. Drives must provide the same performance for data-access commands and the same behavior for persistent reservation commands that arrive on different ports as they provide when those commands arrive on the same port.
  • Drives must provide persistent reservations. Windows can use physical disks to form a storage pool. From the storage pool, Windows can define virtual disks, called storage spaces. A failover cluster can create high availability for a pool of physical disks, the storage spaces that they define, and the data that they contain. In addition to the standard Windows Hardware Compatibility Test qualification, physical disks should pass through the Microsoft Cluster Configuration Validation Wizard.

In addition to the drives, the following enclosure requirements exist:

  • Drive enclosures must provide drive-identification services. Drive enclosures must provide numerical (for example, a drive bay number) and visual (for example, a failure LED or a drive-of-interest LED) drive-identification services. Enclosures must provide this service through SCSI Enclosure Service (SES-3) commands. Windows depends on proper behavior for the following enclosure services. Windows correlates enclosure services to drives through protocol-specific information and their vital product data page 83h inquiry association type 1.
  • Drive enclosures must provide direct access to the drives that they house. Enclosures must not abstract the drives that they house (for example, form into a logical RAID disk). If they are present, integrated switches must provide discovery of and access to all of the drives in the enclosure, without requiring additional physical host connections. If possible, multiple host connections must provide discovery of and access to the same set of drives.

Hardware vendors should pay specific attention to these storage drive and enclosure requirements for SAS configurations when they are used in conjunction with the Storage Spaces feature in Windows Server.
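
A quick way to confirm that drives and enclosures meet these expectations is to inspect them from Windows PowerShell before building a pool; this is a diagnostic sketch, and the properties shown in the comments are what to look for rather than prescriptive output.

    # List physical disks that are eligible for pooling; shared SAS drives should appear with BusType SAS
    Get-PhysicalDisk -CanPool $true | Format-Table FriendlyName, BusType, MediaType, CanPool

    # Verify that the JBOD enclosures are visible and expose SES-3 services (slot and LED identification)
    Get-StorageEnclosure | Format-Table FriendlyName, NumberOfSlots, HealthStatus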

Nearline SAS (NL-SAS)

Nearline SAS (NL-SAS) drives deliver the larger capacity benefits of enterprise SATA drives with a fully capable SAS interface. The physical properties of the drive are identical to those of traditional SATA drives, with a rotational speed of 7,200 RPM and average latency of around four milliseconds. However, exposing the SATA drive through a SAS interface provides all the enterprise features that come with an SAS drive, including multiple host support, concurrent data channels, redundant paths to the disk array (required for clustered storage spaces), and enterprise command queuing. The result is SAS drives that are capable of much larger capacities and available at significantly lower costs.

It is important to consider that although NL-SAS drives provide greater capacity through the SAS interface, they have the same latency, reliability, and performance limitations as traditional enterprise SATA drives, including the higher drive failure rates of SATA drives compared to native SAS drives.

As a result, NL-SAS drives can be used in cluster and high availability scenarios that use Storage Spaces, although using SSD drives to implement storage tiers is highly recommended to improve storage performance.

Fibre Channel

Fibre Channel is traditionally used in SAN arrays and it provides high speed (same as SAS), low latency, and enterprise-level reliability. Fibre Channel is usually more expensive than SATA or SAS drives. Fibre Channel typically has performance characteristics that are similar to those of SAS drives, but it uses a different interface. The choice of Fibre Channel or SAS drives is usually determined by the choice of storage array or drive tray. In many cases, SSDs can also be used in SAN arrays that use Fibre Channel interfaces. Most arrays also support SATA devices, sometimes with an adapter. Disk and array vendors have largely transitioned to SAS drives.

Solid-State Storage

SATA, SAS, NL-SAS, and Fibre Channel describe the interface type, and a solid-state drive (SSD) refers to the classification of media type. Solid-state storage has several advantages over traditional spinning media disks, but it comes at a premium cost. The most prevalent type of solid-state storage is a solid-state drive. Some advantages include significantly lower latency, no spin-up time, faster transfer rates, lower power and cooling requirements, and no fragmentation concerns.

Recent years have shown greater adoption of SSDs in enterprise storage markets. These more expensive devices are usually reserved for workloads that have high-performance requirements. Mixing SSDs with spinning disks in storage arrays is common to minimize cost. These storage arrays often have software algorithms that automatically place the frequently accessed storage blocks on the SSDs and the less frequently accessed blocks on the lower-cost disks (referred to as auto-tiering), although manual segregation of disk pools is also acceptable. NAND Flash Memory is most commonly used in SSDs for enterprise storage.

Hybrid Drives

Hybrid drives combine traditional spinning disks with nonvolatile memory or small SSDs that act as a large buffer. This method provides the potential benefits of solid-state storage with the cost effectiveness of traditional disks. Currently, these drives are not commonly found in enterprise storage arrays.

Advanced Format (4K) Disk Compatibility

Windows Server 2012 introduced support for large sector disks that support 4096-byte sectors (referred to as 4K), rather than the traditional 512-byte sectors, which ship with most hard drives. This change offers higher capacity drives, better error correction, and more efficient signal-to-noise ratios. Windows Server provides continued support for this format and these drives are becoming more prevalent in the market.

However, this change introduces compatibility challenges. To support compatibility, two types of 4K drives exist: 512-byte emulation (512e) and 4K native. 512e drives present a 512-byte logical sector to use as the unit of addressing, and they present a 4K physical sector to use as the unit of atomic write (the unit defined by the completion of Read and Write operations in a single operation).
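
For example, the VHDX format lets you match the virtual disk's sector sizes to the underlying 4K media; the following hedged sketch creates a 4K-native virtual disk (the path and size are assumptions).

    # Create a VHDX that reports 4K logical and physical sectors (4K native)
    New-VHD -Path "D:\VMs\Data01.vhdx" -SizeBytes 1TB -Dynamic `
        -LogicalSectorSizeBytes 4096 -PhysicalSectorSizeBytes 4096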

Storage Controller Architectures

For servers that will be directly connected to storage devices or arrays (which could be servers running Hyper-V or file servers that will provide storage), the choice of storage controller architecture is critical to performance, scale, and overall cost.

SATA III

SATA III controllers can operate at speeds of up to 6 Gbps, and enterprise-oriented controllers can include varying amounts of cache on the controller to improve performance.

PCIe/SAS HBA

SAS controllers can also operate at speeds of up to 6 Gbps, and they are more common in server configurations than SATA. With Windows Server, it is important to understand the difference between host bus adapters (HBAs) and RAID controllers. A SAS HBA provides direct access to the disks, trays, or arrays attached to the controller. There is no support for configurations that use controller-based RAID. Disk high availability is provided by either the array or tray itself, or, in the case of Storage Spaces, by higher-level software layers. SAS HBAs are common in one-, two-, and four-port models.

To support Storage Spaces, HBAs must report the physical bus that is used to connect devices (for example, drives that are connected through the SAS bus are a valid configuration, whereas drives that are connected through a RAID bus are an invalid configuration). All commands must be passed directly to the underlying physical devices. The physical devices must not be abstracted (that is, formed into a logical RAID device), and the bus adapter must not respond to commands on behalf of the physical devices.

Figure 10 Example SAS JBOD Storage architecture

PCIe RAID/Clustered RAID

Peripheral Component Interconnect Express (PCIe) RAID controllers are the traditional cards that are found in servers. They provide access to storage systems, and they can include RAID technology. RAID controllers are not typically used in cluster scenarios, because clustering requires shared storage. If Storage Spaces is used, RAID should not be enabled because Storage Spaces handles data availability and redundancy.

Clustered RAID controllers are a type of storage interface card that can be used with shared storage and cluster scenarios. The clustered RAID controllers can provide shared storage across configured servers. The clustered RAID controller solution must pass the Cluster in a Box Validation Kit. This step is required to make sure that the solution provides the storage capabilities that are necessary for failover cluster environments.

Figure 11 Clustered RAID controllers

Fibre Channel HBA

Fibre Channel HBAs provide one of the more common connectivity methods for storage, particularly in clustering and shared storage scenarios. Some HBAs include two or four ports, with speeds of 4, 8, or 16 (Gen 5) Gbps. Windows Server supports a large number of logical unit numbers (LUNs) per HBA; the capacity is expected to exceed customer needs for addressable LUNs in a SAN.

Hyper-V in Windows Server provides the ability to present virtual Fibre Channel adapters within a guest operating system. This is not necessarily required, even if Fibre Channel is used to present storage to the servers that are running Hyper-V.

Although virtual Fibre Channel is discussed in later sections of this document, it is important to understand that the HBA ports that are used with virtual Fibre Channel should be set up in a Fibre Channel topology that supports N_Port ID Virtualization (NPIV), and they should be connected to an NPIV-enabled SAN. To utilize this feature, the Fibre Channel adapters must also support devices that present LUNs.
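
As a hedged sketch of this setup (the SAN and virtual machine names are assumptions, and one of several possible parameterizations is shown), a virtual SAN is defined on the host from the NPIV-capable HBA ports, and a virtual Fibre Channel adapter is then added to the guest.

    # Define a virtual SAN on the Hyper-V host from the physical, NPIV-capable Fibre Channel HBA ports
    $hba = Get-InitiatorPort | Where-Object ConnectionType -eq "Fibre Channel"
    New-VMSan -Name "Production_FC" -HostBusAdapter $hba

    # Add a virtual Fibre Channel adapter to the virtual machine and attach it to the virtual SAN
    Add-VMFibreChannelHba -VMName "SQLGuest01" -SanName "Production_FC"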

Storage Networking

A variety of storage networking protocols exist to support traditional SAN-based scenarios, network-attached storage scenarios, and the newer software-defined infrastructure scenarios that support file-based storage through the use of SMB.

Fibre Channel

Historically, Fibre Channel has been the storage protocol of choice for enterprise data centers for a variety of reasons, including performance and low latency. These considerations have offset the typically higher costs of Fibre Channel. The continually advancing performance of Ethernet from one Gbps to 10 Gbps and beyond has led to great interest in storage protocols that use Ethernet transports, such as iSCSI and Fibre Channel over Ethernet (FCoE).

Given the long history of Fibre Channel in the data center, many organizations have a significant investment in a Fibre Channel–based SAN infrastructure. Windows Server continues to provide full support for Fibre Channel hardware that is logo-certified. There is also support for virtual Fibre Channel in guest virtual machines through a Hyper-V feature in Windows Server 2012 and Windows Server.

iSCSI

In Windows Server and Windows Server 2012, the iSCSI Target Server is available as a built-in option under the File and Storage Services role, instead of as a separate downloadable add-on, so it is easier to deploy. The iSCSI Target Server capabilities were enhanced to support diskless network-boot capabilities. This demonstrates how the storage protocols in Windows Server and Windows Server 2012 are designed to complement each other across all layers of the storage stack.

In Windows Server and Windows Server 2012, the iSCSI Target Server feature provides a network-boot capability from operating system images that are stored in a centralized location. This capability supports commodity hardware for up to 256 computers.

Windows Server supports a maximum of 276 LUNs and 544 sessions per target. In addition, this capability does not require special hardware, but it is recommended that it be used in conjunction with 10 GbE adapters that support iSCSI boot capabilities. For Hyper-V, iSCSI-capable storage provides an advantage because iSCSI is a protocol that Hyper-V virtual machines can use for guest clustering.

The iSCSI Target Server uses the VHDX format, which was initially introduced in Hyper-V in Windows Server 2012, as the storage format for LUNs. The VHDX format provides data corruption protection during power failures and optimizes structural alignments of dynamic and differencing disks to prevent performance degradation on large-sector physical disks. This also provides the ability to provision target LUNs up to 64 TB and the ability to provision fixed-size and dynamically growing disks. In Windows Server, all new disks that are created in the iSCSI Target Server use the VHDX format; however, standard disks with the VHD format can be imported.

In addition, the iSCSI Target Server in Windows Server enables Force Unit Access (FUA) for I/O on its back-end virtual disk only if the front-end I/O that the iSCSI Target Server receives from the initiator requires it. This has the potential to improve performance, assuming FUA-capable back-end disks or JBODs are used with the iSCSI Target Server.
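
The following sketch shows the basic provisioning flow for the iSCSI Target Server. The paths, target name, size, and initiator IQN are placeholder assumptions, and the parameter names shown are those expected on Windows Server 2012 R2 and should be verified against the installed version.

    # Install the iSCSI Target Server role service
    Install-WindowsFeature -Name FS-iSCSITarget-Server

    # Create a VHDX-backed iSCSI virtual disk and a target, then map them together
    New-IscsiVirtualDisk -Path "E:\iSCSIVirtualDisks\LUN01.vhdx" -SizeBytes 200GB
    New-IscsiServerTarget -TargetName "HyperVCluster" `
        -InitiatorIds "IQN:iqn.1991-05.com.microsoft:node01.contoso.com"
    Add-IscsiVirtualDiskTargetMapping -TargetName "HyperVCluster" -Path "E:\iSCSIVirtualDisks\LUN01.vhdx"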

Fibre Channel over Ethernet (FCoE)

A key advantage of the protocols using an Ethernet transport is the ability to use a converged network architecture. Converged networks have an Ethernet infrastructure that serves as the transport for LAN and storage traffic. This can reduce costs by eliminating dedicated Fibre Channel switches and reducing cables.

Fibre Channel over Ethernet (FCoE) allows the potential benefits of using an Ethernet transport, while retaining the advantages of the Fibre Channel protocol and the ability to use Fibre Channel storage arrays.

Several enhancements to standard Ethernet are required for FCoE. The enriched Ethernet is commonly referred to as enhanced Ethernet or Data Center Ethernet. These enhancements require Ethernet switches that are capable of supporting enhanced Ethernet.

InfiniBand

InfiniBand is an industry-standard specification that defines an input/output architecture that is used to interconnect servers, communications infrastructure equipment, storage, and embedded systems. InfiniBand is a true fabric architecture that utilizes switched, point-to-point channels with data transfers of up to 120 gigabits per second (Gbps), in chassis backplane applications and through external copper and optical fiber connections.

InfiniBand provides a low-latency, high-bandwidth interconnection that requires low processing overhead. It is ideal for carrying multiple traffic types (such as clustering, communications, storage, and management) over a single connection.

Switched SAS

Although switched SAS is not traditionally viewed as a storage networking technology, it is possible to design switched SAS storage infrastructures. In fact, this can be a low cost and powerful approach when combined with Windows Server features, such as Storage Spaces and SMB 3.0.

SAS switches enable multiple host servers to be connected to multiple storage trays (SAS JBODs) with multiple paths between each, as shown in Figure 12. Multiple path SAS implementations use a single domain method to provide fault tolerance. Current mainstream SAS hardware supports 6 Gbps. SAS switches support domains that enable functionality similar to zoning in Fibre Channel.

Figure 12 SAS switch connected to multiple SAS JBOD arrays

Network File System

File-based storage is a practical alternative to SAN storage, and it has gained viability because it is simple to provision and manage. An example of this trend is the popularity of deploying and running VMware vSphere virtual machines from file-based storage that is accessed over the Network File System (NFS) protocol.

To help you utilize this, Windows Server includes an updated Server for NFS that supports NFS 4.1 and can utilize many other performance, reliability, and availability enhancements that are available throughout the storage stack in Windows. Some of the key features that are available in Server for NFS include:

  • Storage for VMware virtual machines over NFS. In Windows Server, you can confidently deploy Server for NFS as a high availability storage back end for VMware virtual machines. Critical components of the NFS stack have been designed to provide transparent failover semantics to NFS clients.
  • NFS 4.1 protocol. The NFS 4.1 protocol is a significant evolution, and Microsoft delivers a standards-compliant server-side implementation in Windows Server. Some of the features of NFS 4.1 include a flexible single-server namespace for easier share management, full Kerberos version 5 support for enhanced security (including authentication, integrity, and privacy), VSS snapshot integration for backup, and Unmapped UNIX User Access for easier user account integration. Windows Server supports simultaneous SMB 3.0 and NFS access to the same share, identity mapping by using stores based on RFC-2307 for easier and more secure identity integration, and high availability cluster deployments.
  • Windows PowerShell. In response to customer feedback, over 40 Windows PowerShell cmdlets provide task-based remote management for every aspect of Server for NFS, from configuring NFS server settings to provisioning shares and share permissions (see the sketch after this list).
  • Simplified identity mapping. Windows Server includes a flat file–based identity-mapping store. Windows PowerShell cmdlets replace cumbersome manual steps to provision Active Directory Lightweight Directory Services (AD LDS) as an identity-mapping store and to manage mapped identities.
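
A minimal sketch of NFS share provisioning with these cmdlets follows; the share name, path, client name, and permission values are illustrative assumptions, and the exact parameter names should be verified against the NFS cmdlet help on the installed version.

    # Create an NFS share that allows Kerberos and sys authentication and unmapped UNIX access
    New-NfsShare -Name "VMStore" -Path "D:\Shares\VMStore" `
        -Authentication Krb5,Sys -EnableUnmappedAccess $true

    # Grant a specific host read/write access to the share, including root access
    Grant-NfsSharePermission -Name "VMStore" -ClientName "esx01.contoso.com" `
        -ClientType Host -Permission ReadWrite -AllowRootAccess $true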

SMB 3.0

Similar to the other file-based storage options that were discussed earlier, Hyper-V and SQL Server can take advantage of Server Message Block (SMB)-based storage for virtual machines and SQL Server database data and logs. This support is enabled by the features and capabilities provided in Windows Server and Windows Server 2012 as outlined in the following sections.

Note It is strongly recommended that Scale-Out File Server is used in conjunction with IaaS architectures that use SMB storage as discussed in the following sections.

Figure 13 Example of a SMB 3.0-enabled network-attached storage
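
As a brief, hedged example of Hyper-V over SMB, a virtual machine can be created with its virtual disk placed directly on an SMB 3.0 file share; the UNC path, names, and sizes below are assumptions.

    # Create a virtual machine whose new VHDX is stored on an SMB 3.0 file share
    New-VM -Name "VM02" -Generation 2 -MemoryStartupBytes 4GB `
        -NewVHDPath "\\SOFS01\VMs\VM02\VM02.vhdx" -NewVHDSizeBytes 100GB -SwitchName "TenantSwitch"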

SMB Direct (SMB over RDMA)

The SMB protocol in Windows Server includes support for remote direct memory access (RDMA) on network adapters, which allows storage-performance capabilities that rival Fibre Channel. RDMA enables this performance capability because network adapters can operate at full speed with very low latency due to their ability to bypass the kernel and perform Read and Write operations directly to and from memory. This capability is possible because effective transport protocols are implemented on the network adapter hardware, which allows for zero-copy networking by bypassing the kernel.

By using this capability, applications (including SMB) can transfer data directly from memory, through the network adapter, to the network, and then to the memory of the application that is requesting data from the file share. This means that two kernel calls (one from the server and one from the client) are largely removed from the data transfer process, which results in greatly improved data transfer performance. This capability is especially useful for Read and Write intensive workloads, such as in Hyper-V or Microsoft SQL Server, and it results in remote file server performance that is comparable to local storage.

SMB Direct requires:

  • At least two computers running Windows Server. No additional features have to be installed, and the technology is available by default.
  • Network adapters that are RDMA-capable with the latest vendor drivers installed. SMB Direct supports common RDMA-capable network adapter types, including Internet Wide Area RDMA Protocol (iWARP), InfiniBand, and RDMA over Converged Ethernet (RoCE).

SMB Direct works in conjunction with SMB Multichannel to transparently provide a combination of exceptional performance and failover resiliency when multiple RDMA links between clients and SMB file servers are detected. Be aware that RDMA bypasses the kernel stack, and therefore RDMA does not work with NIC Teaming; however, it does work with SMB Multichannel, because SMB Multichannel is enabled at the application layer. The result is that SMB Multichannel provides the equivalent of NIC Teaming for access to SMB file shares.

In Windows Server, SMB Direct optimizes I/O with high-speed network adapters, including 40 Gbps Ethernet and 56 Gbps InfiniBand through the use of batching operations, RDMA remote invalidation, and non-uniform memory access (NUMA) optimizations. In Windows Server, SMB Direct also leverages Network Direct Kernel Provider Interface 1.2 (NDKPI 1.2), and it is backwards compatible with NDKPI 1.1.
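
To confirm that SMB Direct is actually available and in use, the RDMA capability of the adapters and the interfaces that the SMB client sees can be inspected; this is a diagnostic sketch rather than required configuration.

    # Check which network adapters are RDMA-capable and whether RDMA is enabled
    Get-NetAdapterRdma

    # Verify that the SMB client sees RDMA-capable (and RSS-capable) interfaces
    Get-SmbClientNetworkInterface | Format-Table FriendlyName, RdmaCapable, RssCapable, LinkSpeed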

SMB Multichannel

The SMB 3.0 protocol in Windows Server supports SMB Multichannel, which provides scalable and resilient connections to SMB shares by dynamically creating multiple connections for single sessions, or multiple sessions on single connections, depending on connection capabilities and current demand. This capability to create flexible session-to-connection associations gives SMB a number of key features, including:

  • Connection resiliency: With the ability to dynamically associate multiple connections with a single session, SMB gains resiliency against connection failures that are usually caused by network interfaces or components. SMB Multichannel also allows clients to actively manage paths of similar network capability in a failover configuration that automatically switches sessions to the available paths if one path becomes unresponsive.
  • Network usage: SMB can utilize receive-side scaling (RSS)–capable network interfaces with the multiple connection capability of SMB Multichannel to fully use high-bandwidth connections, such as those that are available on 10 GbE networks, during Read and Write operations with workloads that are evenly distributed across multiple CPUs.
  • Load balancing: Clients can adapt to changing network conditions to dynamically rebalance loads to a connection or across a set of connections that are more responsive when congestion or other performance issues occur.
  • Transport flexibility: Because SMB Multichannel also supports single session to multiple connection capabilities, SMB clients are flexible enough to adjust dynamically when new network interfaces become active. This is how SMB Multichannel is automatically enabled whenever multiple UNC paths are detected and can grow dynamically to use multiple paths as more are added, without administrator intervention.

SMB Multichannel has the following requirements, which are organized by how SMB Multichannel prioritizes connections when multiple connection types are available:

  • RDMA-capable network connections: SMB Multichannel can be used with a single InfiniBand connection on the client and server or with a dual InfiniBand connection on each server, connected to different subnets. Although SMB Multichannel offers scaling performance enhancements in single adapter scenarios through RDMA and RSS, if available, it cannot supply failover and load balancing capabilities without multiple paths. RDMA-capable network adapters include iWARP, InfiniBand, and RoCE.
  • RSS-capable network connections: SMB Multichannel can utilize RSS-capable connections in 1-1 connection scenarios or multiple connection scenarios. Multichannel load balancing and failover capabilities are not available unless multiple paths exist, but it provides scaling performance usage by spreading overhead between multiple processors by using RSS-capable hardware.
  • Load balancing and failover or aggregate interfaces: When RDMA or RSS connections are not available, SMB prioritizes connections that use a collection of two or more physical interfaces. This requires more than one network interface on the client and server, where both are configured as a network adapter team. In this scenario, load balancing and failover are the responsibility of the teaming protocol, not SMB Multichannel, when only one NIC Teaming connection is present and no other connection path is available.
  • Standard interfaces and Hyper-V virtual networks: These connection types can use SMB Multichannel capabilities, but only when multiple paths exist. For all practical purposes, a 1 GbE connection is the lowest-priority connection type that is capable of using SMB Multichannel.
  • Wireless network interfaces: Wireless interfaces are not capable of multichannel operations.

When connections are not similar between client and server, SMB Multichannel utilizes available connections when multiple connection paths exist. For example, if the SMB file server has a 10 GbE connection, but the client has only four 1 GbE connections, and each connection forms a path to the file server, then SMB Multichannel can create connections on each 1 GbE interface. This provides better performance and resiliency, even though the network capabilities of the server exceed the network capabilities of the client.

Note that SMB Multichannel distributes multiple file operations; a single file operation cannot be split over multiple channels simultaneously, and each individual Read or Write operation travels through only one of the channels. However, a single file copy, or access to a single file such as a particular VHD, consists of many such operations, so it still uses multiple channels.
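
The following hedged sketch shows how to inspect the connections that SMB Multichannel has established and, if needed, constrain connections to a given file server to specific interfaces; the server and interface names are assumptions.

    # Show the connections SMB Multichannel is using, including RSS and RDMA capability per path
    Get-SmbMultichannelConnection

    # Optionally restrict connections to a given file server to specific interfaces
    New-SmbMultichannelConstraint -ServerName "SOFS01" -InterfaceAlias "RDMA1","RDMA2"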

SMB Transparent Failover

SMB Transparent Failover helps administrators configure file shares in Windows failover cluster configurations for continuous availability. Continuous availability enables administrators to perform hardware or software maintenance on any cluster node without interrupting the server applications that are storing their data files on these file shares.

If there is a hardware or software failure, the server application nodes transparently reconnect to another cluster node without interrupting the server application I/O operations. By using Scale-Out File Server, SMB Transparent Failover allows the administrator to redirect a server application node to a different file-server cluster node to facilitate better load balancing.

SMB Transparent Failover has the following requirements:

  • A failover cluster that is running Windows Server with at least two nodes. The configuration of servers, storage, and networking must pass all of the tests performed in the Cluster Configuration Validation Wizard.
  • Scale-Out File Server role installed on all cluster nodes.
  • Clustered file server configured with one or more file shares that have continuous availability.
  • Client computers running Windows Server, Windows Server 2012, Windows 8.1, or Windows 8.

To realize the potential benefits of the SMB Transparent Failover feature, the client computer and the server must support SMB 3.0, which was first introduced in Windows Server 2012 and Windows 8. Computers running down-level SMB versions, such as SMB 2.1 or SMB 2.0, can connect to and access data on a file share that has continuous availability, but they will not realize the potential benefits of the SMB Transparent Failover feature. It is important to note that Hyper-V over SMB requires SMB 3.0; therefore, down-level versions of the SMB protocol are not relevant for these designs.
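
Administrators can verify that a share is configured for continuous availability, and that clients have registered with the witness service for transparent failover, by using Windows PowerShell on a cluster node. This is an illustrative sketch only:

    # Confirm that the share is continuously available
    Get-SmbShare | Select-Object Name, ScopeName, ContinuouslyAvailable
    # List SMB clients that have registered with the witness service for transparent failover
    Get-SmbWitnessClient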

SMB Encryption

SMB Encryption protects data in transit from snooping on untrusted networks, with no additional setup requirements. SMB 3.0 in Windows Server secures data transfers by encrypting them, to protect against tampering and eavesdropping attacks. The biggest potential benefit of using SMB Encryption instead of general-purpose solutions (such as IPsec) is that there are no deployment requirements or costs beyond changing the SMB settings on the server. The encryption algorithm that is used is AES-CCM, which also provides data-integrity validation.

SMB 3.0 uses a newer algorithm (AES-CMAC) for validation, instead of the HMAC-SHA-256 algorithm that SMB 2.0 uses. AES-CCM and AES-CMAC can be dramatically accelerated on most modern CPUs that have AES instruction support.

By using Windows Server, an administrator can enable SMB Encryption for the entire server or for only specific file shares. Because there are no other deployment requirements for SMB Encryption, it is an extremely cost-effective way to protect data from snooping and tampering attacks. Administrators can turn on SMB Encryption by using Server Manager or Windows PowerShell.
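
For example, encryption can be enabled per share or for the entire file server with Windows PowerShell; the share name below is illustrative:

    # Encrypt traffic for a single share
    Set-SmbShare -Name "VMStore1" -EncryptData $true
    # Or encrypt all SMB traffic handled by this file server
    Set-SmbServerConfiguration -EncryptData $true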

Volume Shadow Copy Service

Volume Shadow Copy Service (VSS) is a framework that enables volume backups to run while applications on a system continue to write to the volumes. A new feature called "VSS for SMB File Shares" was introduced in Windows Server to support applications that store their data files on remote SMB file shares. This feature enables VSS-aware backup applications to perform consistent shadow copies of VSS-aware server applications that store data on SMB 3.0 file shares. Prior to this feature, VSS supported only shadow copies of data that were stored on local volumes.

Scale-Out File Server

One of the main advantages of file storage over block storage is the ease of configuration, paired with the ability to configure folders that can be shared by multiple clients. SMB takes this one step further by introducing the SMB Scale-Out feature, which provides the ability to share the same folders from multiple nodes of the same cluster. This is made possible by the Cluster Shared Volumes (CSV) feature, which supports file sharing in Windows Server.

For example, if you have a four-node file-server cluster that uses Scale-Out File Server, an SMB client will be able to access the share from any of the four nodes. This active-active configuration lets you balance the load across cluster nodes by allowing an administrator to move clients without any service interruption. This means that the maximum file-serving capacity for a given share is no longer limited by the capacity of a single cluster node.

Scale-Out File Server also helps keep configurations simple, because a share is configured only once and is then consistently available from all nodes of the cluster. Additionally, SMB Scale-Out simplifies administration by removing the need for multiple cluster virtual IP addresses and multiple clustered file-server resources to utilize all cluster nodes.

Scale-Out File Server requires:

  • A failover cluster that is running Windows Server with at least two nodes. The cluster must pass the tests in the Cluster Configuration Validation Wizard. In addition, the clustered file-server role must be created as a Scale-Out File Server; this does not apply to traditional file-server clustered roles, and an existing traditional file-server clustered role cannot be used in (or upgraded to) a Scale-Out File Server cluster.
  • File shares that are created on a Cluster Shared Volume with continuous availability. This is the default setting.
  • Computers running Windows Server, Windows Server 2012, Windows 8.1, or Windows 8.
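
As an illustration, a Scale-Out File Server role and a continuously available share on a Cluster Shared Volume can be created with Windows PowerShell similar to the following; the role, cluster, path, and account names are placeholders:

    # Create the Scale-Out File Server role on an existing failover cluster
    Add-ClusterScaleOutFileServerRole -Name "SOFS01" -Cluster "FSCLUSTER01"
    # Create a continuously available share on a Cluster Shared Volume for Hyper-V computer accounts
    New-SmbShare -Name "VMStore1" -Path "C:\ClusterStorage\Volume1\Shares\VMStore1" `
        -ContinuouslyAvailable $true -FullAccess "CONTOSO\HV01$", "CONTOSO\HV02$"
    # Mirror the share permissions to the underlying file system folder
    Set-SmbPathAcl -ShareName "VMStore1"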

Windows Server provides several capabilities with respect to Scale-Out File Server functionality including support for multiple instances, bandwidth management, and automatic rebalancing. SMB in Windows Server provides an additional instance on each cluster node in Scale-Out File Server specifically for CSV traffic. A default instance can handle incoming traffic from SMB clients that are accessing regular file shares, while another instance only handles inter-node CSV traffic.

SMB in Windows Server uses data structures (locks, queues, and threads) to satisfy requests between clients and cluster nodes. In Windows Server, each node contains two logical instances of SMB: one instance to handle CSV metadata or redirected traffic between nodes, and a second instance to handle SMB clients that are accessing file share data. Windows Server provides separate data structures (locks and queues) for each type of traffic, improving the scalability and reliability of traffic between nodes.

SMB in Windows Server also supports the ability to configure bandwidth limits for predefined categories. SMB traffic is divided into three predefined categories that are named Default, VirtualMachine, and LiveMigration. It is possible to configure a bandwidth limit for each predefined category through Windows PowerShell or WMI by using bytes per second. This is especially useful when live migration and SMB Direct (SMB over RDMA) are utilized.
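
For example, after the SMB Bandwidth Limit feature is installed, a limit can be applied per category with Windows PowerShell; the value below is illustrative:

    # Install the SMB Bandwidth Limit feature on the Hyper-V host
    Add-WindowsFeature FS-SMBBW
    # Cap SMB traffic in the LiveMigration category at 750 MB per second
    Set-SmbBandwidthLimit -Category LiveMigration -BytesPerSecond 750MB
    Get-SmbBandwidthLimit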

In Windows Server, Scale-Out File Server also supports automatic rebalancing. SMB client connections are tracked per file share (instead of per server) when direct I/O is not available on the volume. Clients are then redirected to the cluster node with the best access to the volume that is used by the file share. This automatic behavior improves efficiency by reducing redirection traffic between file server nodes. Clients are redirected following an initial connection and when cluster storage is reconfigured.

File Services

Storage Spaces

Storage Spaces introduces a new class of sophisticated storage virtualization enhancements to the storage stack that incorporates two concepts:

  • Storage pools: Collections of physical disks that enable storage aggregation, elastic capacity expansion, and delegated administration.
  • Storage spaces: Virtual disks with associated attributes that include a desired level of resiliency, thin or fixed provisioning, automatic or controlled allocation on diverse storage media, and precise administrative control.

The Storage Spaces feature in Windows Server can utilize failover clustering for high availability, and it can be integrated with CSV for scalable deployments.

Figure 14 CSV v2 can be integrated with Storage Spaces

The features that Storage Spaces includes are:

  • Storage pooling: Storage pools are the fundamental building blocks for Storage Spaces. IT administrators can flexibly create storage pools, based on the needs of the deployment. For example, given a set of physical disks, an administrator can create one pool by using all of the physical disks that are available or multiple pools by dividing the physical disks as required. In addition, to maximize the value of storage hardware, the administrator can map a storage pool to combinations of hard disk drives and solid-state drives (SSDs). Pools can be expanded dynamically simply by adding more drives, thereby seamlessly scaling to cope with increasing data growth as needed.
  • Multitenancy: Administration of storage pools can be controlled through access control lists (ACLs) and delegated on a per-pool basis, thereby supporting hosting scenarios that require tenant isolation. Storage Spaces follows the familiar Windows security model. Therefore, it can be integrated fully with Active Directory Domain Services (AD DS).
  • Resilient storage: Storage Spaces supports two optional resiliency modes: mirroring and parity. Capabilities such as per-pool hot spare support, background scrubbing, and intelligent error correction enable optimal service availability despite storage component failures.
  • Continuous availability through integration with failover clustering: Storage Spaces is fully integrated with failover clustering to deliver continuous availability. One or more pools can be clustered across multiple nodes in a single cluster. Storage Spaces can then be instantiated on individual nodes, and it will seamlessly migrate or fail over to a different node in response to failure conditions or because of load balancing. Integration with CSV 2.0 enables scalable access to data on storage infrastructures.
  • Optimal storage use: Server consolidation frequently results in multiple datasets that share the same storage hardware. Storage Spaces supports thin provisioning to enable businesses to easily share storage capacity among multiple unrelated datasets, thereby promoting capacity use. Trim support enables capacity reclamation when possible.
  • Operational simplicity: Fully scriptable remote management is permitted through the Windows Storage Management API, Windows Management Instrumentation (WMI), and Windows PowerShell. Storage Spaces can be managed easily through the File Services GUI in Server Manager or by using task automation with many new Windows PowerShell cmdlets.
  • Fast rebuild: If a physical disk fails, Storage Spaces regenerates the data from the failed disk in parallel, using multiple disks in the pool as sources and targets rather than a single disk, which maximizes peak sequential throughput. No user action is necessary.

For single-node environments, Windows Server requires the following:

  • Serial ATA (SATA) or Serial Attached SCSI (SAS) connected disks (in an optional JBOD enclosure)

For multiserver and multisite environments, Windows Server requires the following:

  • Any requirements that are specified for failover clustering and CSV 2.0
  • Three or more SAS-connected disks (JBODs) that comply with Windows Certification requirements
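
As a simple sketch, a storage pool and a mirrored storage space can be created with Windows PowerShell; the pool, space, and size values are illustrative:

    # Gather physical disks that are eligible for pooling
    $disks = Get-PhysicalDisk -CanPool $true
    # Create a pool on the local storage subsystem
    $subsystem = Get-StorageSubSystem | Select-Object -First 1
    New-StoragePool -FriendlyName "Pool01" -StorageSubSystemFriendlyName $subsystem.FriendlyName -PhysicalDisks $disks
    # Create a two-way mirrored space in the pool
    New-VirtualDisk -StoragePoolFriendlyName "Pool01" -FriendlyName "Space01" `
        -ResiliencySettingName Mirror -NumberOfDataCopies 2 -Size 2TB
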
Storage Spaces Write-Back Cache

Storage Spaces in Windows Server supports an optional write-back cache that can be configured with simple, parity, and mirrored spaces. The write-back cache is designed to improve performance for workloads with small, random writes by using solid-state drives (SSDs) to provide a low-latency cache. The write-back cache has the same resiliency requirements as the Storage Spaces it is configured for. That is, a simple space requires a single journal drive, a two-way mirror requires two journal drives, and a three-way mirror requires three journal drives.

If you have dedicated journal drives, they are automatically selected for the write-back cache, and if there are no dedicated journal drives, drives that report the media type as SSDs are selected to host the write-back cache.

The write-back cache is associated with an individual space, and it is not shared. However, the physical SSDs that are associated with the write-back cache can be shared among multiple write-back caches. The write-back cache is enabled by default, if there are SSDs available in the pool. The default size of the write-back cache is 1 GB, and it is not recommended to change the size. After the write-back cache is created, its size cannot be changed.
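
When SSDs or dedicated journal disks are present in the pool, a write-back cache of a specific size can be requested when the space is created; this sketch uses illustrative names and the default 1 GB size:

    New-VirtualDisk -StoragePoolFriendlyName "Pool01" -FriendlyName "Space02" `
        -ResiliencySettingName Mirror -Size 500GB -WriteCacheSize 1GB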

Storage Tiers

Storage Spaces in Windows Server supports the capability for a virtual hard disk to have the best characteristics of SSDs and hard disk drives to optimize placement of workload data. The most frequently accessed data is prioritized to be placed on high-performance SSDs, and the less frequently accessed data is prioritized to be placed on high-capacity, lower-performance hard disk drives.

Data activity is measured in the background and periodically moved to the appropriate location with minimal performance impact to a running workload. Administrators can further override automated placement of files, based on access frequency. It is important to note that storage tiers are compatible only with mirror spaces or simple spaces. Parity spaces are not compatible with storage tiers.
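
A tiered space is defined by creating an SSD tier and an HDD tier in the pool and then sizing each tier when the virtual disk is created. The following sketch uses illustrative names and sizes, and optionally pins a frequently read file to the SSD tier:

    $ssdTier = New-StorageTier -StoragePoolFriendlyName "Pool01" -FriendlyName "SSDTier" -MediaType SSD
    $hddTier = New-StorageTier -StoragePoolFriendlyName "Pool01" -FriendlyName "HDDTier" -MediaType HDD
    New-VirtualDisk -StoragePoolFriendlyName "Pool01" -FriendlyName "TieredSpace" `
        -ResiliencySettingName Mirror -StorageTiers $ssdTier, $hddTier -StorageTierSizes 100GB, 900GB
    # Optionally pin a hot file (for example, a gold VDI image) to the SSD tier
    Set-FileStorageTier -FilePath "T:\Images\gold-image.vhdx" -DesiredStorageTierFriendlyName "SSDTier"
    Optimize-Volume -DriveLetter T -TierOptimize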

Resilient File System

Windows Server supports the updated local file system called Resilient File System (ReFS). ReFS promotes data availability and online operation, despite errors that would historically cause data loss or downtime. Data integrity helps protect business-critical data from errors and helps make sure that the data is available when needed. The ReFS architecture provides scalability and performance in an era of constantly growing dataset sizes and dynamic workloads.

ReFS was designed with three key goals in mind:

  • Maintain the highest possible levels of system availability and reliability, under the assumption that the underlying storage might be unreliable.
  • Provide a full end-to-end resilient architecture when it is used in conjunction with Storage Spaces, so that these two features magnify the capabilities and reliability of one another when they are used together.
  • Maintain compatibility with widely adopted and successful NTFS file system features, while replacing features that provide limited value.

In Windows Server, Cluster Shared Volumes includes compatibility for ReFS in addition to many other storage enhancements. ReFS in Windows Server also provides support to automatically correct corruption in a Storage Spaces parity space.

NTFS Improvements

In Windows Server and Windows Server 2012, NTFS has been enhanced to maintain data integrity when using cost-effective, industry-standard SATA drives. NTFS provides online corruption scanning and repair capabilities that reduce the need to take volumes offline. When they are combined, these capabilities let you deploy very large NTFS volumes with confidence.

Two key enhancements were made to NTFS in Windows Server 2012. The first one targets the need to maintain data integrity in inexpensive commodity storage. This was accomplished by enhancing NTFS so that it relies on the flush command instead of "forced unit access" for all operations that require Write ordering. This improves resiliency against metadata inconsistencies that are caused by unexpected power loss. This means that you can more safely use cost-effective, industry-standard SATA drives.

NTFS availability is the focus of the second key enhancement, and this is achieved through a combination of features, which include:

  • Online corruption scanning: Windows Server performs online corruption scanning operations as a background operation on NTFS volumes. This scanning operation identifies and corrects areas of data corruption if they occur, and it includes logic that distinguishes between transient conditions and actual data corruption, which reduces the need for CHKDSK operations.
  • Improved self-healing: To further improve resiliency and availability, Windows Server significantly increases online self-healing to resolve many issues on NTFS volumes without the need to take the volume offline to run CHKDSK.
  • Reduced repair times: In the rare case of data corruption that cannot be fixed with online self-healing, administrators are notified that data corruption has occurred, and they can choose when to take the volume offline for a CHKDSK operation. Furthermore, because of the online corruption-scanning capability, CHKDSK scans and repairs only tagged areas of data corruption. Because it does not have to scan the whole volume, the time that is necessary to perform an offline repair is greatly reduced. In most cases, repairs that would have taken hours on volumes that contain a large number of files, now take seconds.
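
The online scan and targeted repair workflow is exposed through the Repair-Volume cmdlet; a minimal sketch follows, using an illustrative drive letter:

    # Scan the volume online; detected corruption is flagged for later repair
    Repair-Volume -DriveLetter D -Scan
    # Take the volume offline only briefly to fix the flagged areas (spot fix)
    Repair-Volume -DriveLetter D -SpotFix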

Scale-Out File Server Cluster Architecture

In Windows Server, the following clustered file-server types are available:

  • Scale-Out File Server cluster for application data: This clustered file server lets you store server application data (such as virtual machine files in Hyper-V) on file shares, and obtain a similar level of reliability, availability, manageability, and performance that you would expect from a storage area network. All file shares are online on all nodes simultaneously. File shares that are associated with this type of clustered file server are called scale-out file shares. This is sometimes referred to as an Active/Active cluster.
  • File Server for general use: This is the continuation of the clustered file server that has been supported in Windows Server since the introduction of failover clustering. This type of clustered file server, and thus all of the file shares that are associated with the clustered file server, is online on one node at a time. This is sometimes referred to as an Active/Passive or a Dual/Active cluster. File shares that are associated with this type of clustered file server are called clustered file shares.

In Windows Server, a Scale-Out File Server cluster is designed to provide file shares that are continuously available for file-based server application storage. As discussed previously, scale-out file shares provide the ability to share the same folder from multiple nodes of the same cluster. For instance, if you have a four-node file-server cluster that is using the SMB Scale-Out feature, which was introduced in Windows Server 2012, a computer that is running Windows Server or Windows Server 2012 can access file shares from any of the four nodes. This is achieved by utilizing failover-clustering features in Windows Server and Windows Server 2012 and new capabilities in SMB 3.0.

File server administrators can provide scale-out file shares and continuously available file services to server applications and respond to increased demands quickly by bringing more servers online. All of this is completely transparent to the server application. When combined with Scale-Out File Server, this provides comparable capabilities to traditional SAN architectures.

Key potential benefits that are provided by a Scale-Out File Server cluster in Windows Server include:

  • Active/Active file shares. All cluster nodes can accept and serve SMB client requests. By making the file-share content accessible through all cluster nodes simultaneously, SMB 3.0 clusters and clients cooperate to provide transparent failover (continuous availability) to alternative cluster nodes during planned maintenance and unplanned failures without service interruption.
  • Increased bandwidth. The maximum file share bandwidth is the total bandwidth of all file-server cluster nodes. Unlike in previous versions of Windows Server, the total bandwidth is not constrained to the bandwidth of a single cluster node, but instead to the capability of the backing storage system. You can increase the total bandwidth by adding nodes.
  • CHKDSK with zero downtime. CHKDSK in Windows Server is significantly enhanced to dramatically shorten the time a file system is offline for repair. Cluster Shared Volumes (CSV) in Windows Server 2012 takes this one step further and eliminates the offline phase. A CSV file system can perform CHKDSK without affecting applications that have open handles in the file system.
  • Clustered Shared Volumes cache. CSV in Windows Server supports a read cache, which can significantly improve performance in certain scenarios, such as a Virtual Desktop Infrastructure.
  • Simplified management. You create the Scale-Out File Server cluster and then add the necessary CSVs and file shares. It is no longer necessary to create multiple clustered file servers, each with separate cluster disks, and then develop placement policies to ensure activity on each cluster node.

Storage Features

Data Deduplication

Fibre Channel and iSCSI SAN arrays often provide data deduplication functionality. By using the data deduplication feature in Windows Server, organizations can significantly improve the efficiency of storage capacity usage. In Windows Server, data deduplication provides the following features:

  • Capacity optimization: Data deduplication lets you store more data in less physical space. You can achieve significantly better storage efficiency than was previously possible with Single Instance Storage (SIS) or New Technology File System (NTFS) compression. Data deduplication uses variable size chunking and compression, which deliver optimization ratios of up to 2:1 for general file servers and up to 20:1 for VHD libraries.
  • Scalability and performance: Data deduplication is highly scalable, resource-efficient, and non-intrusive. It can run on dozens of large volumes of primary data simultaneously, without affecting other workloads on the server.
  • Reliability and data integrity: When you apply data deduplication, you must maintain data integrity. To help with data integrity, Windows Server utilizes checksum, consistency, and identity validation. In addition, to recover data in the event of corruption, Windows Server maintains redundancy for all metadata and the most frequently referenced data.
  • Bandwidth efficiency with BranchCache: Through integration with BranchCache, the same optimization techniques that are applied for improving data storage efficiency on the disk are applied to transferring data over a wide area network (WAN) to a branch office. This integration results in faster file download times and reduced bandwidth consumption.

Windows Server data deduplication uses a post-processing approach, which identifies files for optimization and then applies the algorithm for deduplication. The deduplication process moves data to a chunk store and selectively compresses the data, replacing redundant copies of each chunk with a reference to a single copy. When this process is complete, the original files are replaced with reparse points that contain references to optimized data chunks.

Data deduplication is supported only on NTFS data volumes that are hosted on Windows Server or Windows Server 2012, and on Cluster Shared Volumes in Windows Server. Deduplication is not supported on boot volumes, FAT or ReFS volumes, remote mapped or remote mounted drives, CSV file system volumes in Windows Server 2012, live data (such as SQL Server databases), Exchange Server stores, or virtual machines that use local storage on a server running Hyper-V.

Files that are not supported include files with extended attributes, encrypted files, and files smaller than 32 KB. Files with reparse points will also not be processed. From a design perspective, deduplication is not supported for files that are open and constantly changing for extended periods or that have high I/O requirements, such as running virtual machines or live SQL Server databases. The exception is that in Windows Server, deduplication supports live VHDs for VDI workloads.
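
For example, deduplication can be enabled on a dedicated VDI storage volume and an optimization job started with Windows PowerShell; the drive letter is illustrative:

    # Install the deduplication role service
    Add-WindowsFeature FS-Data-Deduplication
    # Enable deduplication for VDI (Hyper-V) usage on the volume that stores the virtual desktops
    Enable-DedupVolume -Volume "E:" -UsageType HyperV
    # Start an optimization job and check the savings
    Start-DedupJob -Volume "E:" -Type Optimization
    Get-DedupStatus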

Thin Provisioning and Trim

Like data deduplication, thin-provisioning technology improves the efficiency of how storage is used and provisioned. Instead of removing redundant data on the volume, thin provisioning gains efficiency by allocating just enough storage at the moment it is needed, and then increasing capacity as business needs grow over time.

Windows Server provides full support for thinly provisioned storage arrays, which lets you get the most out of your storage infrastructure. These sophisticated storage solutions offer just-in-time allocations (known as thin provisioning), and the ability to reclaim storage that is no longer needed (known as trim).
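
With Storage Spaces, for example, a space can be created larger than the currently available physical capacity, and the trim path can be exercised so that freed capacity is returned to the pool or array; names and sizes below are illustrative:

    # Thinly provisioned space: capacity is allocated from the pool only as data is written
    New-VirtualDisk -StoragePoolFriendlyName "Pool01" -FriendlyName "ThinSpace" `
        -ResiliencySettingName Mirror -ProvisioningType Thin -Size 10TB
    # Reissue trim/unmap requests so that previously freed blocks can be reclaimed
    Optimize-Volume -DriveLetter E -ReTrim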

Volume Cloning

Volume cloning is another common practice in virtualization environments. Volume cloning can be used for host and virtual machine volumes to dramatically improve provisioning times.

Volume Snapshot

SAN volume snapshots are a common method of providing a point-in-time, instantaneous backup of a SAN volume or LUN. These snapshots are typically block-level, and they only utilize storage capacity as blocks change on the originating volume. However, Windows Server does not control this behavior—it varies by storage array vendor.

Storage Tiering

Storage tiering physically partitions data into multiple distinct classes, based on attributes such as price or performance. Data can be dynamically moved among classes in a tiered storage implementation, based on access, activity, or other considerations.

Storage tiering normally combines varying types of disks that are used for different data types (for example, production, non-production, or backups).

Storage Management and Automation

Windows Server provides a unified interface, which is a powerful and consistent mechanism for managing storage. The storage interface can reduce complexity and operational costs, and it provides capabilities for advanced management of storage, in addition to a core set of defined WMI and Windows PowerShell interfaces.

The interface uses WMI for comprehensive management of physical and virtual storage, including non-Microsoft intelligent storage subsystems, and it provides a rich experience for IT professionals and developers by using Windows PowerShell scripting to help make a diverse set of solutions available.

Management applications can use a single Windows API to manage different storage types by using a storage management provider (SMP) or standards-based protocols such as the Storage Management Initiative Specification (SMI-S). The WMI-based interface provides a single mechanism through which to manage all storage, including non-Microsoft intelligent storage subsystems and virtualized local storage (known as Storage Spaces).
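
The same Windows PowerShell cmdlets therefore enumerate Storage Spaces and SMP- or SMI-S-managed arrays through one interface; a brief sketch:

    # Registered storage providers (SMP or SMI-S) and the subsystems they expose
    Get-StorageProvider
    Get-StorageSubSystem
    # Pools, virtual disks, and physical disks, regardless of the backing subsystem
    Get-StoragePool
    Get-VirtualDisk
    Get-PhysicalDisk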

Figure 15 shows the unified storage management architecture.

Figure 15 Unified storage management architecture

Windows Offloaded Data Transfers (ODX)

Whenever possible, the speed of your virtualization platform should rival that of physical hardware. Offloaded data transfers (ODX) is a feature in the storage stack in Windows Server. When used with offload-capable SAN storage hardware, ODX lets a storage device perform a file copy operation without the main processor of the server running Hyper-V reading the content from one storage place and writing it to another.

ODX enables rapid provisioning and migration of virtual machines, and it provides significantly faster transfers of large files, such as database or video files. By offloading the file transfer to the storage array, ODX minimizes latencies, promotes the use of array throughput, and reduces host resource usage such as central processing unit (CPU) and network consumption.

File transfers are automatically and transparently offloaded when you move or copy files, regardless of whether you perform drag-and-drop operations in Windows Explorer or use command-line file copy commands. No administrator setup or intervention is necessary.

Network Architecture

A variety of designs and new approaches to data center networks have emerged in recent years. The objective in most cases is to improve resiliency and performance while optimizing for highly virtualized environments.

Network Architecture Patterns

Hierarchical

Many network architectures include a hierarchical design with three or more tiers including the core tier, the aggregation (or distribution) tier, and the access tier. Designs are driven by the port bandwidth and quantity that are required at the edge, in addition to the ability of the core and aggregation tiers to provide higher speed uplinks to aggregate traffic. Additional considerations include Ethernet broadcast boundaries and limitations and loop-avoidance technologies.

Core tier

The core tier is the high-speed backbone for the network architecture. The core typically comprises two modular switch chassis to provide a variety of service and interface module options. The core tier might also interface with other network modules.

Aggregation tier

The aggregation (or distribution) tier consolidates connectivity from multiple access tier switch uplinks. This tier is commonly implemented in end-of-row switches, a centralized wiring closet, or main distribution frame room. The aggregation tier provides high-speed switching and more advanced features, like Layer 3 routing and other policy-based networking capabilities. The aggregation tier must have redundant, high-speed uplinks to the core tier for high availability.

Access tier

The access tier provides device connectivity to the data center network. This tier is commonly implemented by using Layer 2 Ethernet switches—typically through blade chassis switch modules or top-of-rack (ToR) switches. The access tier must provide redundant connectivity for devices, required port features, and adequate capacity for access (device) ports and uplink ports.

The access tier can also provide features that are related to NIC Teaming, such as Link Aggregation Control Protocol (LACP). Certain teaming solutions might require LACP switch features.

Figure 16 illustrates two three-tier network models: one provides 10 GbE to devices and the other provides 1 GbE to devices.

Figure 16 Comparison of gigabit Ethernet edge topologies

Flat Network

A flat network topology is adequate for very small networks. In a flat network design, there is no hierarchy. Each networking device has essentially the same job, and the network is not divided into layers or modules. A flat network topology is easy to design and implement, and it is easy to maintain if the network stays small. When the network grows, however, a flat network is undesirable. The lack of hierarchy makes troubleshooting difficult because instead of being able to concentrate troubleshooting efforts in just one area of the network, you might have to inspect the entire network.

Network Virtualization (Software-Defined Networking)

Hyper-V Network Virtualization (HNV) supports the concept of a virtual network, composed of one or more virtual subnets, that is independent of the underlying physical network. With these virtual networks (which are not the same as VPNs), the exact physical location of an IP subnet is decoupled from the virtual network topology. As a result, customers can easily move their subnets to the cloud while preserving their existing IP addresses and topology, so that existing services continue to work and are unaware of the physical location of the subnets.

Hyper-V Network Virtualization in Windows Server provides policy-based, software-controlled network virtualization that reduces the management overhead that enterprises face when they expand dedicated infrastructure-as-a-service (IaaS) clouds. In addition, it provides cloud hosting service providers with better flexibility and scalability for managing virtual machines so that they can achieve higher resource utilization.

Network Performance and Low Latency

Data Center Bridging

Separate isolated connections for network, live migration, and management traffic make managing network switches and other networking infrastructure a challenge. As data centers evolve, IT organizations look to some of the latest innovations in networking to help solve these issues. The introduction of 10 GbE networks, for example, helps support converged networks that can handle network, storage, live migration, and management traffic through a single connection, reducing the requirements for and costs of IT management.

Data-center bridging (DCB) refers to enhancements to Ethernet LANs that are used in data center environments. These enhancements consolidate the various forms of a network into a single technology, known as a converged network adapter or CNA. In the virtualized environment, Hyper-V in Windows Server and Windows Server 2012 can utilize data-center bridging hardware to converge multiple types of network traffic on a single network adapter, with a maximum level of service to each.

Data-center bridging is a hardware mechanism that classifies and dispatches network traffic, although it supports only a limited number of traffic classes. It converges various types of traffic, including network, storage, management, and live migration traffic. However, it can also classify network traffic that does not originate from the networking stack (for example, iSCSI traffic that is hardware accelerated when the system does not use the Microsoft iSCSI initiator).
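
As an illustration, a converged adapter can be configured so that SMB Direct traffic is tagged, guaranteed bandwidth, and given lossless behavior; the priority value, bandwidth percentage, and adapter name below are placeholders:

    # Classify SMB Direct traffic (port 445) into 802.1p priority 3
    New-NetQosPolicy "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
    # Reserve 50 percent of bandwidth for that traffic class using Enhanced Transmission Selection
    New-NetQosTrafficClass "SMB" -Priority 3 -BandwidthPercentage 50 -Algorithm ETS
    # Enable priority flow control for the SMB class and apply DCB to the adapter
    Enable-NetQosFlowControl -Priority 3
    Enable-NetAdapterQos -Name "ConvergedNIC1"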

Virtual Machine Queue (VMQ)

The virtual machine queue (VMQ) feature allows the network adapter of the host to pass DMA packets directly into individual virtual machine memory stacks. Each virtual machine device buffer is assigned a VMQ, which avoids needless packet copies and route lookups in the virtual switch. Essentially, VMQ allows the single network adapter of the host to appear as multiple network adapters to the virtual machines, so that each virtual machine effectively has its own dedicated network adapter. The result is less data in the buffers of the host and an overall performance improvement in I/O operations.

The VMQ is a hardware virtualization technology that is used for the efficient transfer of network traffic to a virtualized host operating system. A VMQ-capable network adapter classifies incoming frames to be routed to a VMQ receive queue, based on filters that associate the queue with the network adapter of a virtual machine. These hardware queues can have affinities to specific CPUs to allow scaling on a per–virtual machine network adapter basis.

Windows Server dynamically distributes the incoming network traffic to host processors, based on processor use and network load. In times of heavy network load, dynamic VMQ automatically uses more processors. In times of light network load, dynamic VMQ relinquishes those same processors.

Dynamic VMQ requires hardware network adapters and drivers that support Network Device Interface Specification (NDIS) 6.30 or higher.
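
VMQ behavior can be inspected and tuned per physical adapter and per virtual machine network adapter; a minimal sketch, with illustrative adapter and virtual machine names:

    # Review VMQ capability and current queue assignments on the physical adapter
    Get-NetAdapterVmq
    # Constrain the processors that service queues for this adapter
    Set-NetAdapterVmq -Name "ConvergedNIC1" -BaseProcessorNumber 2 -MaxProcessors 8
    # Leave VMQ enabled for the virtual machine (a weight of 0 disables it)
    Set-VMNetworkAdapter -VMName "VM01" -VmqWeight 100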

IPsec Task Offload

IPsec protects network communication by authenticating and encrypting some or all of the content of network packets. IPsec Task Offload in Windows Server utilizes the hardware capabilities of server network adapters to offload IPsec processing. This reduces the CPU overhead of IPsec encryption and decryption significantly.

In Windows Server, IPsec Task Offload is extended to virtual machines. Customers who use virtual machines and want to help protect their network traffic by using IPsec can utilize the IPsec hardware offload capability that is available in server network adapters. Doing so frees up CPU cycles to perform more application-level work and leaves the per-packet encryption and decryption to hardware.
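
Where the network adapter supports it, IPsec task offload can be confirmed on the host and a security-association budget can be granted to a virtual machine; the names and values below are illustrative:

    # Verify and enable IPsec task offload on the physical adapter
    Get-NetAdapterIPsecOffload
    Enable-NetAdapterIPsecOffload -Name "ConvergedNIC1"
    # Allow the virtual machine to offload up to 512 security associations
    Set-VMNetworkAdapter -VMName "VM01" -IPsecOffloadMaximumSecurityAssociation 512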

Quality of Service (QoS)

Quality of Service (QoS) is a set of technologies that provide you with the ability to cost-effectively manage network traffic in network environments. There are three options for deploying QoS in Windows Server:

  • Data center bridging. This is performed by hardware, and it is good for iSCSI environments. However, it requires hardware investments and it can be complex.
  • Policy-based QoS. Historically present in Windows Server, this capability is managed with Group Policy. The challenge is that it does not provide the required capabilities within the Microsoft iSCSI initiator or Hyper-V environments.
  • Hyper-V QoS. This capability works well for virtual machine workloads and virtual network adapters on servers running Hyper-V. However, it requires careful planning and an implementation strategy because it is not managed with Group Policy. (Networking is managed somewhat differently in Virtual Machine Manager.)

For most deployments, one or two 10 GbE network adapters should provide enough bandwidth for all the workloads on a server running Hyper-V. However, 10 GbE network adapters and switches are considerably more expensive than their 1 GbE counterparts. To make the most of 10 GbE hardware, a server running Hyper-V requires new functionality to manage bandwidth.

Windows Server expands the power of QoS by providing the ability to assign a minimum bandwidth to a virtual machine or service. This feature is important for service providers and companies that honor SLA clauses that promise a minimum network bandwidth to customers. It is equally important to enterprises that require predictable network performance when they run virtualized server workloads on shared hardware.

In addition to the ability to enforce maximum bandwidth, QoS in Windows Server provides a new bandwidth management feature: minimum bandwidth. Unlike maximum bandwidth, which is a bandwidth cap, minimum bandwidth is a bandwidth floor, and it assigns a certain amount of bandwidth to a specific type of traffic. It is possible to implement minimum and maximum bandwidth limits simultaneously.
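
For example, Hyper-V QoS minimum bandwidth is typically configured by creating the virtual switch in weight mode and then assigning relative weights to virtual network adapters; the switch, adapter, and weight values are illustrative:

    # Create the virtual switch with relative (weight-based) minimum bandwidth
    New-VMSwitch "TenantSwitch" -NetAdapterName "ConvergedNIC1" -MinimumBandwidthMode Weight
    # Guarantee this virtual machine a proportional share of bandwidth under contention
    Set-VMNetworkAdapter -VMName "VM01" -MinimumBandwidthWeight 5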

Remote Direct Memory Access and SMB Direct

SMB Direct (SMB over RDMA) is a storage protocol in Windows Server. It enables direct memory-to-memory data transfers between server and storage, with minimal CPU usage, while using standard RDMA-capable network adapters. SMB Direct is supported on three types of RDMA technology: iWARP, InfiniBand, and RoCE. In Windows Server, additional scenarios can take advantage of RDMA connectivity, including CSV redirected-mode traffic and live migration.
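
RDMA capability is detected automatically by SMB; administrators typically only confirm that it is enabled on the relevant adapters, as in this brief sketch (adapter names are illustrative):

    # Confirm which adapters are RDMA capable and enabled
    Get-NetAdapterRdma
    # Enable RDMA on the storage-facing adapters if it is disabled
    Enable-NetAdapterRdma -Name "RDMA-NIC1", "RDMA-NIC2"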

Receive Segment Coalescing

Receive segment coalescing improves the scalability of the servers by reducing the overhead for processing a large amount of network I/O traffic. It accomplishes this by coalescing multiple inbound packets into a large buffer.

Receive-Side Scaling

Receive-side scaling (RSS) spreads network receive interrupts over multiple processors, so a single processor is not required to handle all I/O interrupts, which was common in earlier versions of Windows Server. Active load balancing between the processors tracks the load on the CPUs and then redistributes the interrupts as necessary.

You can select which processors are used to handle RSS requests, including processors beyond the first 64, which allows you to utilize high-end computers that have a large number of logical processors.

RSS works with NIC Teaming to remove a limitation in earlier versions of Windows Server, where you had to choose between using NIC Teaming or RSS. RSS also works for User Datagram Protocol (UDP) traffic, and it can be managed and debugged by using WMI and Windows PowerShell.
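
RSS settings, such as the processor set used for scaling, can be reviewed and tuned per adapter; an illustrative sketch:

    # Review current RSS capability and indirection table for the adapter
    Get-NetAdapterRss -Name "ConvergedNIC1"
    # Restrict RSS to a range of processors and prefer NUMA-close cores
    Set-NetAdapterRss -Name "ConvergedNIC1" -BaseProcessorNumber 2 -MaxProcessors 8 -Profile Closest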

Virtual Receive-Side Scaling

Windows Server includes support for virtual receive-side scaling, which, much like standard RSS, allows virtual machines to distribute network processing loads across multiple virtual processors to increase network throughput within virtual machines.

Virtual receive-side scaling is only available on virtual machines running the Windows Server and Windows 8.1 operating systems, and it requires VMQ support on the physical adapter. Virtual receive-side scaling is disabled by default if the VMQ-capable adapter is less than 10 Gbps.

In addition, virtual receive-side scaling cannot be used on a virtual machine network adapter that has SR-IOV enabled. Virtual receive-side scaling coexists with NIC Teaming, live migration, and Network Virtualization using Generic Routing Encapsulation (NVGRE).

SR-IOV

The SR-IOV standard was introduced by the Peripheral Component Special Interest Group (PCI-SIG). This group owns and manages the Peripheral Component Interconnect (PCI) specifications as open industry standards. SR-IOV works with system support for virtualization technologies that provides remapping for interrupts and DMA, and it lets SR-IOV–capable devices be assigned directly to a virtual machine.

Hyper-V in Windows Server enables support for SR-IOV–capable network devices, and it allows the direct assignment of the SR-IOV virtual function of a physical network adapter to a virtual machine. This increases network throughput and reduces network latency, while reducing the host CPU overhead that is required for processing network traffic.
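
SR-IOV is enabled when the virtual switch is created and is then assigned to virtual machines through an I/O virtualization weight; a minimal sketch with placeholder names:

    # The external switch must be created with SR-IOV enabled; it cannot be added later
    New-VMSwitch "SRIOV-Switch" -NetAdapterName "SRIOV-NIC1" -EnableIov $true
    # Request a virtual function for the virtual machine network adapter
    Set-VMNetworkAdapter -VMName "VM01" -IovWeight 100
    Get-VMNetworkAdapter -VMName "VM01" | Select-Object VMName, IovWeight, Status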

TCP Chimney Offload

The TCP chimney architecture offloads the data transfer portion of TCP protocol processing for one or more TCP connections to a network adapter. This architecture provides a direct connection, called a chimney, between applications and an offload-capable network adapter.

The chimney offload architecture reduces host network processing for network-intensive applications, so networked applications scale more efficiently and end-to-end latency is reduced. In addition, fewer servers are needed to host an application, and servers are able to use the full Ethernet bandwidth.

Note Virtual machine chimney, also called TCP offload, was removed in Windows Server. The TCP chimney will not be available to guest operating systems because it applies only to host traffic.

Network High Availability and Resiliency

To increase reliability and performance in virtualized environments, Windows Server and Windows Server 2012 include built-in support for network adapter hardware that is NIC Teaming-capable. NIC Teaming is also known as load balancing and failover (LBFO).

Not all traffic will benefit from NIC Teaming. The most noteworthy exception is storage traffic, where iSCSI should be handled by MPIO, and SMB should be backed by SMB Multichannel. However, when a single set of physical network adapters is used for storage and networking traffic, teaming for storage traffic is acceptable and encouraged.

NIC Teaming

NIC Teaming allows multiple network adapters to be placed into a team for the purposes of bandwidth aggregation and traffic failover. This helps maintain connectivity in the event of a network component failure. NIC Teaming is included in Windows Server and Windows Server 2012.

NIC Teaming is compatible with all networking capabilities in Windows Server with five exceptions: SR-IOV, RDMA, Policy-Based QoS, TCP chimney, and 802.1X authentication. From a scalability perspective in Windows Server, a minimum of two and a maximum of 32 network adapters can be added to a single team, and an unlimited number of teams can be created on a single host.

NIC Teaming Types

Establishing NIC Teaming requires setting the teaming mode and distribution mode for the team. Two basic sets of algorithms are used for the teaming modes. These are exposed in the UI as three options—a switch-independent mode, and two switch-dependent modes: Static Teaming and Dynamic Teaming (LACP).

  • Switch-independent mode: These algorithms make it possible for team members to connect to different switches because the switch does not know that the interface is part of a team on the host. These modes do not require the switch to participate in the teaming. This is generally recommended for Hyper-V deployments.
  • Switch-dependent modes: These algorithms require the switch to participate in the teaming. All interfaces of the team are connected to the same switch.

There are two common choices for switch-dependent modes of NIC Teaming:

  • Static teaming (based on IEEE 802.3ad): This mode requires configuration on the switch and on the host to identify which links form the team. Because this is a statically configured solution, there is no additional protocol to assist the switch and host to identify incorrectly plugged cables or other errors that could cause the team to fail. Typically, this mode is supported by server-class switches.
  • Dynamic teaming (based on IEEE 802.1ax): This mode works by using the LACP to dynamically identify links that are connected between the host and a specific switch. Typical server-class switches support IEEE 802.1ax, but most require administration to enable LACP on the port. There are security challenges to allow an almost completely dynamic IEEE 802.1ax to operate on a switch. As a result, switches require that the switch administrator configure the switch ports that are allowed to be members of such a team.

Switch-dependent modes result in inbound and outbound traffic that approach the practical limits of the aggregated bandwidth.

Traffic-Distribution Algorithms

Aside from teaming modes, three algorithms are used for traffic distribution within NIC Teaming in Windows Server and Windows Server 2012. These are exposed in the UI under the Load Balancing mode as Dynamic, Hyper-V Switch Port, and Address Hash.

  • Dynamic: The Dynamic traffic distribution algorithm (sometimes referred to as adaptive load balancing) adjusts the distribution of load continuously in an attempt to more equitably carry the load across team members. This produces a higher probability that all the available bandwidth of the team can be used. In this mode, NIC Teaming recognizes bursts of transmission for each flow, called flowlets, and allows them to be redirected to a new network adapter for transmission. Outgoing traffic uses the address hash for distribution. On the receiving side, specific types of traffic (Address Resolution Protocol [ARP] and Internet Control Message Protocol [ICMP]) from a virtual machine are sent on a particular network adapter to encourage inbound traffic for that virtual machine to arrive on the same adapter. The remaining traffic is sent according to the address hash distribution. This mechanism provides the benefits of multiple distribution schemes. This mode is particularly useful when virtual machine queues (VMQs) are used, and it is generally recommended for Hyper-V deployments where guest teaming is not enabled.
  • Hyper-V Port: Used when virtual machines have independent MAC addresses that can be used as the basis for dividing traffic. There is an advantage in using this scheme in virtualization, because the adjacent switch always sees source MAC addresses on only one connected interface. This causes the switch to balance the egress load (the traffic from the switch to the host) on multiple links, based on the destination MAC address on the virtual machine. Like Dynamic mode, this mode is particularly useful when virtual machine queues (VMQs) are used, because a queue can be placed on the specific network adapter where the traffic is expected to arrive. However, this mode might not be granular enough to get a well-balanced distribution, and it will always limit a single virtual machine to the bandwidth that is available on a single interface. Windows Server uses the Hyper-V Switch Port as the identifier instead of the source MAC address, because a virtual machine in some instances might be using more than one MAC address.
  • Address Hash: Creates a hash value that is based on components of the packet and then assigns packets that have that hash value to one of the available interfaces. This keeps all packets from the same TCP stream on the same interface. Components that can be used as inputs to the hashing function include:
    • Source and destination MAC addresses
    • Source and destination IP addresses, with or without considering the MAC addresses (2-tuple hash)
    • Source and destination TCP ports, usually used with the IP addresses (4-tuple hash)
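
The teaming mode and load-balancing algorithm described above map directly to the parameters of the NIC Teaming cmdlets; a minimal sketch for a typical switch-independent Hyper-V host team (adapter names are illustrative):

    New-NetLbfoTeam -Name "HostTeam" -TeamMembers "NIC1", "NIC2" `
        -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic
    Get-NetLbfoTeam
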
Guest Virtual Machine NIC Teaming

NIC Teaming in Windows Server allows virtual machines to have virtual network adapters that are connected to more than one virtual switch and still have connectivity, even if the network adapter that is under that virtual switch is disconnected. This is particularly important when you are working with a feature such as SR-IOV traffic, which does not go through the Hyper-V virtual switch. Thus, it cannot be protected by a network adapter team that is under a virtual switch.

By using the virtual machine teaming option, you can set up two virtual switches, each of which is connected to its own SR-IOV–capable network adapter. NIC Teaming then works in one of the following ways:

  • Each virtual machine can install a virtual function from one or both SR-IOV network adapters, and if a network adapter disconnection occurs, it will fail over from the primary virtual function to the backup virtual function.
  • Each virtual machine can have a virtual function from one network adapter and a non-virtual function interface to the other switch. If the network adapter that is associated with the virtual function becomes disconnected, the traffic can fail over to the other switch without losing connectivity.

NIC Teaming Feature Compatibility

Information about the compatibility of the NIC Teaming feature in Windows Server is provided in the following whitepaper on Microsoft TechNet: Windows Server NIC Teaming (LBFO) Deployment and Management.

Network Isolation and Security

Windows Server contains new security and isolation capabilities through the Hyper-V virtual switch. With Windows Server, you can configure servers running Hyper-V to enforce network isolation among any set of arbitrary isolation groups, which are typically defined for individual customers or sets of workloads.

Windows Server provides isolation and security capabilities for multitenancy by offering the following new features:

  • Multitenant virtual machine isolation through private virtual LANs (pVLANs)
  • Protection from Address Resolution Protocol (ARP) and Neighbor Discovery protocol spoofing
  • Protection against Dynamic Host Configuration Protocol (DHCP) spoofing with DHCP guard
  • Isolation and metering by using virtual port access control lists (Port ACLs)
  • The ability to use the Hyper-V virtual switch trunk mode to direct traffic from multiple VLANs to a single network adapter in a virtual machine
  • Resource metering
  • Windows PowerShell and Windows Management Instrumentation (WMI)

VLANs

Currently, VLANs are the mechanism that most organizations use to help support tenant isolation and reuse address space. A VLAN uses explicit tagging (VLAN ID) in the Ethernet frame headers, and it relies on Ethernet switches to enforce isolation and restrict traffic to network nodes that have the same VLAN ID.

Trunk Mode to Virtual Machines

With a VLAN, a set of host machines or virtual machines appear to be on the same LAN (i.e., the same physical network segment or collision domain), independent of their actual physical locations. By using the Hyper-V virtual switch trunk mode, traffic from multiple VLANs can be directed to a single network adapter in a virtual machine that could previously receive traffic from only one VLAN. As a result, traffic from different VLANs is consolidated, and a virtual machine can listen to multiple VLANs. This feature can help you shape network traffic and enforce multitenant security in your data center.

Private VLANs

VLAN technology is traditionally used to subdivide a network and provide isolation for individual groups that share a common physical infrastructure. Windows Server introduces support for private VLANs (pVLANs), a technique that can provide isolation between two virtual machines that are on the same VLAN. This could be useful in the following scenarios:

  • Lack of free primary VLAN numbers in the data center or on physical switches. (There is a maximum of 4096, possibly less depending on the hardware that is used.)
  • Isolating multiple tenants from each other in community VLANs while still providing centralized services (such as Internet routing) to all of them simultaneously (by using promiscuous mode).

When a virtual machine does not have to communicate with other virtual machines, you can use private VLANs to isolate it from other virtual machines that are in your data center. By assigning each virtual machine in a private VLAN only one primary VLAN ID and one or more secondary VLAN IDs, you can put the secondary private VLANs into one of three modes, as described in Table 5. These private VLAN modes determine which virtual machines on the private VLAN a virtual machine can talk to. To isolate a virtual machine, you should put it in isolated mode.

The private VLAN modes are:

  • Isolated: Isolated virtual machines cannot exchange packets with each other at Layer 2, and they cannot see each other. However, they can see virtual machines in promiscuous mode that are located in the same primary VLAN. There can be only one isolated secondary VLAN in a primary VLAN.
  • Promiscuous: Much like a traditional VLAN, virtual machines in promiscuous mode can exchange packets with any other virtual machine that is located on the same primary VLAN.
  • Community: Virtual machines in community mode that are on the same secondary VLAN can exchange packets with each other at Layer 2, and they can also talk to virtual machines that are in promiscuous mode on the same primary VLAN. Virtual machines in community mode cannot talk to virtual machines that are in other community VLANs or in isolated mode.

Table 5 Private VLAN modes
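
Private VLAN modes are assigned per virtual machine network adapter; the following sketch shows one adapter in each mode, with illustrative virtual machine names and VLAN IDs:

    # Isolated: can reach only promiscuous adapters in primary VLAN 100
    Set-VMNetworkAdapterVlan -VMName "Tenant-VM1" -Isolated -PrimaryVlanId 100 -SecondaryVlanId 200
    # Community: adapters in secondary VLAN 201 can reach each other and promiscuous adapters
    Set-VMNetworkAdapterVlan -VMName "Tenant-VM2" -Community -PrimaryVlanId 100 -SecondaryVlanId 201
    # Promiscuous: typically the shared-services or gateway virtual machine
    Set-VMNetworkAdapterVlan -VMName "Services-VM" -Promiscuous -PrimaryVlanId 100 -SecondaryVlanIdList "200-400"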

ARP and Neighbor Discovery Spoofing Protection

The Hyper-V virtual switch helps provide protection against a malicious virtual machine stealing IP addresses from other virtual machines through ARP spoofing (also known as ARP poisoning in IPv4). In this type of man-in-the-middle attack, a malicious virtual machine sends a fake ARP message, which associates its own MAC address with an IP address that it does not own.

Unsuspecting virtual machines send network traffic that is targeted to that IP address to the MAC address of the malicious virtual machine, instead of to the intended destination. For IPv6, Windows Server helps provide equivalent protection for Neighbor Discovery spoofing. This is a mandatory scenario for hosting companies to consider when the virtual machine is not under control of the fabric or cloud administrators.

Router Guard

In Windows Server, the Hyper-V virtual switch helps protect against router advertisement and redirection messages that come from an unauthorized virtual machine pretending to be a router. In this situation, a malicious virtual machine attempts to be a router for other virtual machines. If a virtual machine accepts the network routing path, the malicious virtual machine can perform man-in-the-middle attacks, for example, steal passwords from SSL connections.

DHCP Guard

In a DHCP environment, a rogue DHCP server could intercept client DHCP requests and provide incorrect address information. The rogue DHCP server could cause traffic to be routed to a malicious intermediary that inspects all traffic before forwarding it to the legitimate destination.

To protect against this man-in-the-middle attack, the Hyper-V administrator can designate which Hyper-V virtual switches can have DHCP servers connected to them. DHCP server traffic from other Hyper-V virtual switches is automatically dropped. The Hyper-V virtual switch now helps protect against a rogue DHCP server that is attempting to provide IP addresses that would cause traffic to be rerouted.
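
DHCP guard and router guard are properties of the virtual machine network adapter and are typically enforced for all tenant virtual machines, while remaining off only for authorized infrastructure virtual machines; the names below are illustrative:

    # Drop DHCP server offers and router advertisements originating from this tenant VM
    Set-VMNetworkAdapter -VMName "Tenant-VM1" -DhcpGuard On -RouterGuard On
    # Leave the guard off only for virtual machines that are authorized to provide DHCP
    Set-VMNetworkAdapter -VMName "DHCP-Server-VM" -DhcpGuard Off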

Port ACLs

Port ACLs provide a mechanism for isolating networks and metering network traffic for a virtual port on the Hyper-V virtual switch. By using port ACLs, you can control which IP addresses or MAC addresses can (or cannot) communicate with a virtual machine. For example, you can use port ACLs to enforce isolation of a virtual machine by letting it talk only to the Internet, or communicate only with a predefined set of addresses.

By using the metering capability, you can measure network traffic that is going to or from a specific IP address or MAC address, which lets you report on traffic that is sent or received from the Internet or from network storage arrays.

You also can configure multiple port ACLs for a virtual port. Each port ACL consists of a source or destination network address and a permit, deny, or meter action. The metering capability also supplies information about the number of times that traffic was attempted to or from a virtual machine from a restricted (deny) address.
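
Port ACLs are attached to a virtual machine network adapter with allow, deny, or meter actions; the addresses and virtual machine name in this sketch are illustrative:

    # Allow traffic to and from the tenant's own subnet, and meter everything else
    Add-VMNetworkAdapterAcl -VMName "Tenant-VM1" -RemoteIPAddress 10.0.10.0/24 -Direction Both -Action Allow
    Add-VMNetworkAdapterAcl -VMName "Tenant-VM1" -RemoteIPAddress 0.0.0.0/0 -Direction Both -Action Meter
    # Review the configured ACLs
    Get-VMNetworkAdapterAcl -VMName "Tenant-VM1"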

Virtual Switch Extended Port ACLs

In Windows Server, extended port ACLs can be configured on the Hyper-V virtual switch to allow and block network traffic to and from the virtual machines that are connected to a virtual switch through virtual network adapters. By default, Hyper-V allows virtual machines to communicate with each other when they are connected to the same virtual switch and the network traffic between virtual machines doesn't leave the physical machine. In these cases, network traffic configurations on the physical network cannot manage traffic between virtual machines.

Extended port ACLs are particularly useful in multitenant environments, such as those operated by service providers, because they can be used to enforce security policies between resources in the fabric infrastructure. Tenants can also use extended port ACLs to enforce security policies that isolate their own resources.

In addition, virtualization has increased the number of security policies that are required on physical top-of-rack switches, because of the many-to-one nature of virtual machine connectivity behind each physical port. Extended port ACLs, which are applied at the virtual switch, can potentially decrease the number of security policies that are required for the large number of servers in a service provider or large enterprise IaaS fabric infrastructure.
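
As a sketch (hypothetical virtual machine name and port; higher weight values take precedence), extended port ACLs add protocol, port, and weight-based rules:

    # "Tenant01" is a hypothetical virtual machine name
    # Allow inbound HTTPS to the tenant virtual machine
    Add-VMNetworkAdapterExtendedAcl -VMName "Tenant01" -Action Allow -Direction Inbound -LocalPort 443 -Protocol TCP -Weight 100
    # Deny all other inbound traffic with a lower-weight (lower-precedence) rule
    Add-VMNetworkAdapterExtendedAcl -VMName "Tenant01" -Action Deny -Direction Inbound -Weight 1
    # List the extended ACLs that are applied to the virtual network adapter
    Get-VMNetworkAdapterExtendedAcl -VMName "Tenant01"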

Network Virtualization

Isolating the virtual machines for different departments or customers can be a challenge on a shared network. When entire networks of virtual machines must be isolated, the challenge becomes even greater. Traditionally, VLANs have been used to isolate networks, but VLANs are very complex to manage on a large scale. The primary drawbacks of VLANs include:

  • Cumbersome reconfiguration of production switches is required whenever virtual machines or isolation boundaries must be moved. Moreover, frequent reconfigurations of the physical network for the purposes of adding or modifying VLANs increases the risk of an outage.
  • VLANs have limited scalability because typical switches support no more than 1,000 VLAN IDs (while the specification supports a maximum of 4,094).
  • VLANs cannot span multiple Ethernet subnets, which limits the number of nodes in a single VLAN and restricts the placement of virtual machines based on physical location.

Windows Server Hyper-V Network Virtualization (HNV) enables you to isolate network traffic from different business units or customers on a shared infrastructure, without having to use VLANs. HNV also lets you move virtual machines as needed within your infrastructure while preserving their virtual network assignments. You can even use network virtualization to transparently integrate these private networks into a preexisting infrastructure on another site.

HNV extends the concept of server virtualization to permit multiple virtual networks, potentially with overlapping IP addresses, to be deployed on the same physical network. By using Hyper-V Network Virtualization, you can set policies that isolate traffic in a dedicated virtual network, independently of the physical infrastructure.

To virtualize the network, HNV uses the following elements:

  • Network Virtualization using Generic Routing Encapsulation (NVGRE)
  • Policy management server (Virtual Machine Manager)
  • Network Virtualization Gateway servers

The potential benefits of network virtualization include:

  • Tenant network migration to the cloud with minimum reconfiguration or effect on isolation. Customers can keep their internal IP addresses while they move workloads onto shared clouds for IaaS, thus minimizing the configuration changes that are needed for IP addresses, DNS names, security policies, and virtual machine configurations. In software-defined, policy-based data center networks, network traffic isolation does not depend on VLANs, but it is enforced within Hyper-V hosts, and based on multitenant isolation policies. Network administrators can use VLANs to manage traffic in the physical infrastructure if the topology is primarily static.
  • Tenant virtual machine deployment anywhere in the data center. Services and workloads can be placed or migrated to any server in the data center while keeping their IP addresses, without being limited to a physical IP subnet hierarchy or VLAN configurations.
  • Simplified network design and improved server and network resource use. The rigidity of VLANs and the dependency of virtual machine placement on a physical network infrastructure results in overprovisioning and underuse. By breaking this dependency, a virtual network increases the flexibility of virtual machine workload placement, thus simplifying network management and improving the use of servers and network resources. Placement of server workloads is simplified because migration and placement of workloads are independent of the underlying physical network configurations. Server administrators can focus on managing services and servers, while network administrators can focus on overall network infrastructure and traffic management.
  • Works with current hardware (servers, switches, appliances) to promote performance; however, network adapters that support NVGRE task offload are recommended. Hyper-V Network Virtualization can be deployed in current data centers, and it is compatible with emerging data center "flat network" technologies such as Transparent Interconnection of Lots of Links (TRILL), an IETF-standard architecture that is intended to expand Ethernet topologies.
  • Full management through Windows PowerShell and WMI. Although a policy management server such as System Center Virtual Machine Manager is highly recommended, it is possible to use Windows PowerShell to easily script and automate administrative tasks. Windows Server includes Windows PowerShell cmdlets for network virtualization that let you build command-line tools or automated scripts to configure, monitor, and troubleshoot network isolation policies.

HNV in Windows Server is implemented as part of the Hyper-V virtual switch (whereas in Windows Server 2012 it was implemented as an NDIS filter driver). This allows forwarding extensions for the Hyper-V virtual switch to coexist with network virtualization configurations.

HNV also supports dynamic IP address learning, which allows network virtualization to learn IP addresses that are assigned manually or by DHCP on the virtual network. In environments that use System Center Virtual Machine Manager, when a host learns a new IP address, it notifies Virtual Machine Manager, which adds the address to the centralized policy. This allows rapid dissemination and reduces the overhead that is associated with distributing the network virtualization routing policy. In Windows Server, HNV is also supported in configurations that use NIC Teaming.

Windows Server provides HNV Gateway server services to support site-to-site virtual private networks, Network Address Translation (NAT), and forwarding between physical locations to support multitenant hosting solutions that leverage network virtualization. This allows service providers and organizations that use HNV to support end-to-end communication from the corporate network or the Internet to the data center running HNV.

Without such gateway devices, virtual machines in a virtualized network are completely isolated from the outside, and they cannot communicate with non-network-virtualized systems such as other systems in the corporate network or on the Internet. In Windows Server and Windows Server 2012, the HNV Gateway can encapsulate and decapsulate NVGRE packets, based on the centralized network virtualization policy. It can then perform gateway-specific functionality on the resulting native customer address (CA) packets, such as IP forwarding and routing, NAT, or site-to-site tunneling.
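
The following sketch is illustrative only; it assumes the in-box Hyper-V and network virtualization (NetWNV) Windows PowerShell modules, and the virtual machine name, addresses, virtual subnet ID, and MAC address are hypothetical. In a production deployment, Virtual Machine Manager creates and distributes these policy records for you.

    # Assign the virtual network adapter of the virtual machine to a virtual subnet (VSID)
    Get-VMNetworkAdapter -VMName "Tenant01" | Set-VMNetworkAdapter -VirtualSubnetId 5001

    # Publish a lookup record that maps the customer address (CA) to the provider address (PA)
    New-NetVirtualizationLookupRecord -CustomerAddress "10.0.0.5" -ProviderAddress "192.168.1.10" `
        -VirtualSubnetID 5001 -MACAddress "00155D010A05" -Rule "TranslationMethodEncap"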

Compute Architecture

Server Architecture

The host server architecture is a critical component of the virtualized infrastructure, and a key variable in the consolidation ratio and cost analysis. The ability of the host server to handle the workload of a large number of consolidation candidates increases the consolidation ratio and helps provide the desired cost benefit.

The system architecture of the host server refers to the general category of the server hardware. Examples include rack mounted servers, blade servers, and large symmetric multiprocessor servers (SMP). The primary tenet to consider when selecting system architectures is that each server running Hyper-V will contain multiple guests with multiple workloads. Processor, RAM, storage, and network capacity are critical, as are high I/O capacity and low latency. The host server must be able to provide the required capacity in each of these categories.

Note The Windows Server Catalog is useful for assisting customers in selecting appropriate hardware. It contains information about all servers, storage, and other hardware devices that are certified for Windows Server and Hyper-V. The logo program and support policy for failover-cluster solutions changed in Windows Server 2012, and cluster solutions are not listed in the Windows Server Catalog. All individual components that make up a cluster configuration must earn the appropriate "Certified for" or "Supported on" Windows Server designations. These designations are listed in their device-specific category in the Windows Server Catalog.

Server and Blade Network Connectivity

Use multiple network adapters or multiport network adapters on each host server. For converged designs, network technologies that provide teaming or virtual network adapters can be utilized, provided that two or more physical adapters can be teamed for redundancy, and multiple virtual network adapters or VLANs can be presented to the hosts for traffic segmentation and bandwidth control.

Microsoft Multipath I/O

Multipath I/O (MPIO) architecture supports iSCSI, Fibre Channel, and SAS SAN connectivity by establishing multiple sessions or connections to the storage array.

Multipath solutions use redundant physical path components—adapters, cables, and switches—to create logical paths between the server and the storage device. If one or more of these components should fail (causing the path to fail), multipath logic uses an alternate path for I/O, so that applications can still access their data. Each network adapter (in the iSCSI case) or HBA should be connected by using redundant switch infrastructures, to provide continued access to storage in the event of a failure in a storage fabric component.

Failover times vary by storage vendor, and they can be configured by using timers in the Microsoft iSCSI Initiator driver or by modifying the parameter settings of the Fibre Channel host bus adapter driver.

In all cases, storage multipath solutions should be used. Generally, storage vendors will build a device-specific module on top of the MPIO software in Windows Server or Windows Server 2012. Each device-specific module and HBA will have its own unique multipath options and recommended number of connections.

Consistent Device Naming (CDN)

Windows Server supports consistent device naming (CDN), which provides the ability for hardware manufacturers to identify descriptive names of onboard network adapters within the BIOS. Windows Server assigns these descriptive names to each interface, providing users with the ability to match chassis printed interface names with the network interfaces that are created within Windows. The specification for this change is outlined in the Slot Naming PCI-SIG Engineering Change Request.

Failover Clustering

Cluster-Aware Updating (CAU)

Cluster-Aware Updating (CAU) reduces server downtime and user disruption by allowing IT administrators to update clustered servers with little or no loss in availability when updates are performed on cluster nodes. CAU transparently takes one node of the cluster offline, installs the updates, performs a restart (if necessary), brings the node back online, and moves on to the next node. This feature is integrated into the existing Windows Update management infrastructure, and it can be further extended and automated with Windows PowerShell for integrating into larger IT automation initiatives.

CAU facilitates the cluster updating operation while running from a computer running Windows Server or Windows 8.1. The computer running the CAU process is called an orchestrator (not to be confused with System Center Orchestrator). CAU supports two modes of operation: remote-updating mode or self-updating mode. In remote-updating mode, a computer that is remote from the cluster that is being updated acts as an orchestrator. In self-updating mode, one of the cluster nodes that is being updated acts as an orchestrator, and it is capable of self-updating the cluster on a user-defined schedule.

With CAU, the end-to-end cluster update process is cluster-aware and completely automated. It integrates seamlessly with an existing Windows Update Agent (WUA) and Microsoft Windows Server Update Services (WSUS) infrastructure.

CAU also includes an extensible architecture that supports new plug-in development to orchestrate any node-updating tools, such as custom software installers, BIOS updating tools, and network adapter and HBA firmware updating tools. After they have been integrated with CAU, these tools can work across all cluster nodes in a cluster-aware manner.
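
As a minimal sketch (hypothetical cluster name; the exact plug-in and scheduling parameters depend on your environment), a remote-updating run can be started from an orchestrator by using the ClusterAwareUpdating module:

    # "HVCluster01" is a hypothetical cluster name
    # Update each node in turn by using the Windows Update plug-in
    Invoke-CauRun -ClusterName "HVCluster01" -CauPluginName "Microsoft.WindowsUpdatePlugin" `
        -MaxFailedNodes 0 -MaxRetriesPerNode 3 -RequireAllNodesOnline -Force

    # Alternatively, add the CAU clustered role to enable self-updating mode on a schedule
    Add-CauClusterRole -ClusterName "HVCluster01" -DaysOfWeek Sunday -WeeksOfMonth 2 -Force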

Cluster Shared Volume (CSV)

The Cluster Shared Volumes (CSV) feature was introduced in Windows Server 2008 R2 as a more efficient way for administrators to deploy storage for cluster-enabled virtual machines on Hyper-V and other server roles, such as Scale-Out File Server and SQL Server.

Before CSV, administrators had to provision a LUN on shared storage for each virtual machine, so that each machine had exclusive access to its virtual hard disks and write contention could be avoided. By using CSV, all cluster hosts have simultaneous access to one or more shared volumes on which the storage for multiple virtual machines can be hosted. Thus, there is no need to provision a new LUN whenever you create a new virtual machine.

Windows Server provides the following CSV capabilities:

  • Flexible application and file storage: Cluster Shared Volumes extends its potential benefits beyond Hyper-V to support other application workloads and flexible file storage solutions. CSV 2.0 provides capabilities to clusters through shared namespaces to share configurations across all cluster nodes, including the ability to build continuously available cluster-wide file systems. Application storage can be served from the same shared resource as data, eliminating the need to deploy two clusters (an application and a separate storage cluster) to support high availability application scenarios.
  • Integration with other features in Windows Server: Allows for inexpensive scalability, reliability, and management simplicity through tight integration with Storage Spaces. You gain high performance and resiliency capabilities with SMB Direct and SMB Multichannel, and create more efficient storage with thin provisioning. In addition, Windows Server supports ReFS, data deduplication, parity, tiered Storage Spaces, and Write-back cache in Storage Spaces.
  • Single namespace: Provides a single consistent file namespace where files have the same name and path when viewed from any node in the cluster. CSV are exposed as directories and subdirectories under the ClusterStorage root directory.
  • Improved backup and restore: Supports several backup and restore capabilities, including support for the full feature set of VSS and support for hardware and software backup of CSV. CSV also offers a distributed backup infrastructure for software snapshots. The CSV software snapshot provider coordinates snapshot creation across the cluster, providing point-in-time semantics at the cluster level and the ability to perform remote snapshots.
  • Optimized placement policies: CSV ownership is evenly distributed across the failover cluster nodes, based on the number of CSV that each node owns. Ownership is automatically rebalanced during conditions such as restart, failover, and the addition of cluster nodes.
  • Increased resiliency: Multiple SMB instances run on each failover cluster node: a default instance that handles incoming traffic from SMB clients that access regular file shares, and a second CSV instance that handles only internode CSV metadata access and redirected I/O traffic. This improves the resiliency and scalability of internode SMB traffic between CSV nodes.
  • CSV Cache: Allows system memory (RAM) to be used as a write-through cache. The CSV Cache provides caching of read-only, unbuffered I/O, which can improve performance for applications that use unbuffered I/O when accessing files (for example, Hyper-V). CSV Cache delivers caching at the block level, which enables it to cache the pieces of data being accessed within a VHD file. CSV Cache reserves its cache from system memory and handles orchestration across the set of nodes in the cluster. In Windows Server, CSV Cache is enabled by default, and you can allocate up to 80% of the total physical RAM to the CSV cache (a configuration sketch follows this list).
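
A minimal configuration sketch (the value is set cluster-wide, in megabytes; the property name may differ in earlier versions of Windows Server):

    # Allocate 1 GB of RAM per node to the CSV cache
    (Get-Cluster).BlockCacheSize = 1024

    # Confirm the setting
    Get-Cluster | Format-List Name, BlockCacheSize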

There are several CSV deployment models, which are outlined in the following sections.

Single CSV per Cluster

In the "single CSV per cluster" design pattern, the SAN is configured to present a single large LUN to all the nodes in the host cluster. The LUN is configured as a CSV in failover clustering. All files that belong to the virtual machines that are hosted on the cluster are stored on the CSV. Optionally, data deduplication functionality that is provided by the SAN can be utilized (if it is supported by the SAN vendor).

Figure 17 Virtual machine on a single large CSV

Multiple CSV per Cluster

In the "multiple CSV per cluster" design pattern, the SAN is configured to present two or more large LUNs to all the nodes in the host cluster. The LUNs are configured as CSV in failover clustering. All virtual machine–related files are hosted on the cluster are stored on the CSV.

In addition, data deduplication functionality that the SAN provides can be utilized (if supported by the SAN vendor).

Figure 18 Virtual machines on multiple CSV with minimal segregation

For the single and multiple CSV patterns, each CSV has the same I/O characteristics, so that each individual virtual machine has all of its associated virtual hard disks (VHDs) stored on one CSV.

Figure 19 The virtual disks of each virtual machine reside on the same CSV

Multiple I/O Optimized CSV per Cluster

In the "multiple I/O optimized CSV per cluster" design pattern, the SAN is configured to present multiple LUNs to all the nodes in the host cluster; however, the LUNs are optimized for particular I/O patterns like fast sequential Read performance, or fast random Write performance. The LUNs are configured as CSV in failover clustering. All VHDs that belong to the virtual machines that are hosted on the cluster are stored on the CSV, but they are targeted to appropriate CSV for the given I/O needs.

Figure 20 Virtual machines with a high degree of virtual disk segregation

In the "multiple I/O optimized CSV per cluster" design pattern, each individual virtual machine has all of its associated VHDs stored on the appropriate CSV, per required I/O requirements.

Figure 21 Virtual machines with a high degree of virtual disk segregation

Note A single virtual machine can have multiple VHDs, and each VHD can be stored on different CSV (provided that all CSV are available to the host cluster on which the virtual machine is created).

BitLocker-Encrypted Cluster Volumes

Hyper-V, failover clustering, and BitLocker work together to create an ideal, highly secure platform for a private cloud infrastructure. Cluster disks that are encrypted with BitLocker Drive Encryption in Windows Server enable better physical security for deployments outside secure data centers, provide a critical safeguard for the private cloud infrastructure, and help protect against data leaks.

Hyper-V Failover Clustering

A Hyper-V host failover cluster is a group of independent servers that work together to increase the availability of applications and services. The clustered servers are connected by physical cables and software. If one of the cluster nodes fails, another node begins to provide service—a process that is known as failover. In the case of a planned live migration, users will experience no perceptible service interruption.

The host servers are one critical component of a dynamic, virtual infrastructure. Consolidation of multiple workloads on the host servers requires that those servers have high availability. Windows Server provides advances in failover clustering that enable high availability and live migration of virtual machines between physical nodes.

Host Failover-Cluster Topology

We recommend that the server topology consist of at least two Hyper-V host clusters. The first requires at least two nodes, and it is referred to as the fabric management cluster. The second cluster, plus any additional clusters, are referred to as fabric host clusters.

In smaller-scale scenarios or specialized solutions, the fabric management cluster can be consolidated into the fabric host cluster. In that case, special care must be taken to provide resource availability for the virtual machines that host the various parts of the management stack.

Each host cluster can contain up to 64 nodes. Host clusters require some form of shared storage, such as Storage Spaces, a Scale-Out File Server cluster, a Fibre Channel SAN, or an iSCSI SAN.

Cluster Quorum and Witness Configurations

In quorum configurations, every cluster node has one vote, and a witness (disk or file share) also has one vote. A witness is recommended when the number of voting nodes is even, but it is not required when the number of voting nodes is odd. We always recommend that you keep an odd total number of votes in a cluster. Therefore, a cluster witness should be configured to support cluster configurations in Hyper-V when the number of failover cluster nodes is even.

Choices for a cluster witness include a shared disk witness and a file-share witness. There are distinct differences between these two models.

  • Disk witness Consists of a dedicated LUN to serve as the quorum disk that is used as an arbitration point. A disk witness stores a copy of cluster database for all nodes to share. We recommend that this disk consist of a small partition that is at least 512 MB in size; however, it is commonly recommended to reserve a 1 GB disk for each cluster. This LUN can be NTFS- or ReFS-formatted, and it does not require the assignment of a drive letter.
  • File-share witness Configurations use a simple, unique file share that is located on a file server to support one or more clusters. This file share must have Write permissions for the cluster name object (CNO), along with all of the nodes. We highly recommend that this file share exist outside any of the cluster nodes, and therefore, carry the requirement of additional physical or virtual servers outside the Hyper-V compute cluster within the fabric. Writing to this file share results in minimal network traffic, because all nodes contain separate copies of the cluster database, and only cluster membership changes are written to the file share.

The additional challenge that this creates is that file-share witness configurations are susceptible to "split" or "partition in time" scenarios, which can create situations in which surviving nodes and starting nodes have different copies of the cluster database. File-share witnesses should therefore be used only in configurations in which no shared disk infrastructure exists.
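
As a sketch (hypothetical disk resource and file share names), the witness configuration can be set by using the FailoverClusters module:

    # Configure a disk witness by using an available cluster disk resource
    Set-ClusterQuorum -NodeAndDiskMajority "Cluster Disk Witness"

    # Or configure a file-share witness on a file server outside the cluster
    Set-ClusterQuorum -NodeAndFileShareMajority "\\FS01\HVClusterWitness"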

Additional quorum and witness capabilities in Windows Server include:

  • Dynamic witness: By default Windows Server clusters are configured to use dynamic quorum, which allows the witness vote to be dynamically adjusted and reduces the risk that the cluster will be impacted by witness failure. By using this configuration, the cluster decides whether to use the witness vote based on the number of voting nodes that are available in the cluster, which simplifies the witness configuration. In addition, a Windows Server cluster can dynamically adjust a running node's vote to keep the total number of votes at an odd number, which allows the cluster to continue to run in the event of a 50% node split where neither side would normally have quorum.
  • Force quorum resiliency: This capability improves the behavior of forcing quorum in the event of a partitioned cluster. A partitioned cluster occurs when a cluster breaks into subsets of nodes that are not aware of each other; the cluster service can then be restarted on one subset by forcing quorum.

Host Cluster Networks

A variety of host cluster networks are required for a Hyper-V failover cluster. The network requirements help enable high availability and high performance. The specific requirements and recommendations for network configuration are published in the TechNet Library in the Hyper-V: Live Migration Network Configuration Guide.

Note The following table provides some examples, and it does not contain all network access types. For instance, some implementations would include a dedicated backup network.

  • Storage. Purpose: Access storage through SMB, iSCSI, or Fibre Channel. (Fibre Channel does not need a network adapter.) Traffic requirements: High bandwidth and low latency. Recommended access: Usually dedicated and private access; refer to your storage vendor for guidelines.
  • Virtual machine access. Purpose: Workloads that run on virtual machines usually require external network connectivity to service client requests. Traffic requirements: Varies. Recommended access: Public access, which could be teamed for link aggregation or to fail over the cluster.
  • Management. Purpose: Managing the Hyper-V management operating system. This network is used by Hyper-V Manager or System Center Virtual Machine Manager (VMM). Traffic requirements: Low bandwidth. Recommended access: Public access, which could be teamed to fail over the cluster.
  • Cluster and Cluster Shared Volumes (CSV). Purpose: Preferred network that is used by the cluster for communications to maintain cluster health. Also used by CSV to send data between owner and non-owner nodes. If storage access is interrupted, this network is used to access the CSV or to maintain and back up the CSV. The cluster should have access to more than one network for communication to help make sure that it is a high availability cluster. Traffic requirements: Usually low bandwidth and low latency; occasionally high bandwidth. Recommended access: Private access.
  • Live migration. Purpose: Transfer virtual machine memory and state. Traffic requirements: High bandwidth and low latency during migrations. Recommended access: Private access.

Table 6 Network access types

Management Network

A dedicated management network is required so that hosts can be managed without competing with guest traffic. A dedicated network provides a degree of separation for the purposes of security and ease of management. This typically implies dedicating one network adapter per host and one port per network device to the management network.

Additionally, many server manufacturers provide a separate out-of-band (OOB) management capability that enables remote management of server hardware outside the host operating system.

iSCSI Network

If you are using iSCSI, a dedicated iSCSI network is required so that storage traffic is not in contention with any other traffic. This typically implies dedicating two network adapters per host and two ports per network device to the iSCSI network.

CSV/Cluster Communication Network

Usually, when the cluster node that owns a VHD file in a CSV performs disk I/O, the node communicates directly with the storage. However, storage connectivity failures sometimes prevent a given node from communicating directly with the storage. To maintain functionality until the failure is corrected, the node redirects the disk I/O through a cluster network (the preferred network for CSV) to the node where the disk is currently mounted. This is called CSV redirected I/O mode.

Live-Migration Network

During live migration, the contents of the memory of the virtual machine that is running on the source node must be transferred to the destination node over a LAN connection. To enable high-speed transfer, a dedicated live-migration network is required.

Virtual Machine Network(s)

The virtual machine networks are dedicated to virtual machine LAN traffic. A virtual machine network can be two or more 1 GbE networks, one or more networks that have been created through NIC Teaming, or virtual networks that have been created from shared 10 GbE network adapters.

Hyper-V Application Monitoring

With Windows Server, Hyper-V and failover clustering work together to bring higher availability to workloads that do not support clustering. They do so by providing a lightweight, simple solution to monitor applications that are running on virtual machines and by integrating with the host. By monitoring services and event logs inside the virtual machine, Hyper-V and failover clustering can detect if the key services that a virtual machine provides are healthy. If necessary, they provide automatic corrective action such as restarting the virtual machine or restarting a service within the virtual machine.
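
As a minimal sketch (hypothetical virtual machine and service names), application monitoring is configured from the cluster by using the FailoverClusters module:

    # Monitor the Print Spooler service inside the clustered virtual machine
    Add-ClusterVMMonitoredItem -VirtualMachine "Tenant01" -Service "Spooler"

    # List the items that are being monitored for the virtual machine
    Get-ClusterVMMonitoredItem -VirtualMachine "Tenant01"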

Virtual Machine Failover Prioritization

Virtual machine priorities can be configured to control the order in which specific virtual machines fail over or start. This helps make sure that high-priority virtual machines get the resources that they need and that lower-priority virtual machines are given resources as they become available.

Virtual Machine Anti-Affinity

Administrators can specify that two specific virtual machines cannot coexist on the same node in a failover scenario. By leveraging anti-affinity workloads, resiliency guidelines can be respected when they are hosted on a single failover cluster.

Virtual Machine Drain on Shutdown

Windows Server supports shutting down a failover cluster node in Hyper-V without first putting the node into maintenance mode to drain any running clustered roles. The cluster automatically migrates all running virtual machines to another host before the computer shuts down.

Shared Virtual Hard Disk

Hyper-V in Windows Server includes support for virtual machines to leverage shared VHDX files for shared storage scenarios such as guest clustering. Shared and non-shared virtual hard disk files that are attached as virtual SCSI disks appear as virtual SAS disks when you add a SCSI hard disk to a virtual machine.

Hyper-V Virtualization Architecture

Hyper-V Features

Hyper-V Host and Guest Scale-Up

Hyper-V in Windows Server supports running on a host system that has up to 320 logical processors and 4 terabytes (TB) of physical memory. This helps ensure compatibility with the largest scale-up server systems.

Hyper-V in Windows Server lets you configure a virtual machine with up to 64 virtual processors and up to 1 TB of memory, to support very large workload scenarios.

Hyper-V in Windows Server also supports running up to 8,000 virtual machines on a 64-node failover cluster. This is a significant improvement over Hyper-V in Windows Server 2008 R2, which supported a maximum of 16 cluster nodes and 1,000 virtual machines per cluster.

Hyper-V over SMB 3.0

Prior to Windows Server 2012, remote storage options for Hyper-V were limited to expensive Fibre Channel SAN or iSCSI SAN solutions that were difficult to provision for Hyper-V guests, or more inexpensive options that did not offer many features. By enabling Hyper-V to use SMB file shares for VHDs, administrators have an option that is simple to provision with support for CSV, inexpensive to deploy, and offers performance capabilities and features that rival those available with Fibre Channel SANs. As outlined earlier in the Storage Architecture section, SMB 3.0 can be leveraged for SQL Server and Hyper-V workloads. Hyper-V over SMB requires:

  • One or more computers running Windows Server with the Hyper-V and File and Storage Services roles installed.
  • A common Active Directory infrastructure. (The servers that are running AD DS do not need to be running Windows Server.)
  • Failover clustering on the Hyper-V side, on the File and Storage Services side, or both.

Hyper-V over SMB supports a variety of flexible configurations that offer several levels of capabilities and availability, which include single-node, dual-node, and multiple-node file server modes.
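
As an illustrative sketch (hypothetical names; the file share and virtual switch are assumed to exist), a virtual machine can be created with its virtual hard disk on an SMB 3.0 file share:

    # "\\SOFS01\VMStore" and "TenantSwitch" are hypothetical
    New-VM -Name "Tenant01" -Generation 2 -MemoryStartupBytes 4GB `
        -NewVHDPath "\\SOFS01\VMStore\Tenant01.vhdx" -NewVHDSizeBytes 100GB -SwitchName "TenantSwitch"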

Virtual Machine Mobility

Hyper-V live migration makes it possible to move running virtual machines from one physical host to another with no effect on the availability of virtual machines to users. Hyper-V in Windows Server introduces faster and simultaneous live migration inside or outside a clustered environment.

In addition to providing live migration in the most basic of deployments, this functionality facilitates more advanced scenarios, such as performing a live migration to a virtual machine between multiple, separate clusters to balance loads across an entire data center.

If you use live migration, you will see that live migrations can now use higher network bandwidths (up to 10 gigabits) to complete migrations faster. You can also perform multiple simultaneous live migrations to move many virtual machines quickly.

Hyper-V Live Migration

Hyper-V in Windows Server lets you perform live migrations outside a failover cluster. The two scenarios for this are:

  • Shared storage-based live migration. In this instance, the VHD of each virtual machine is stored on a CSV or on a central SMB file share, and live migration occurs over TCP/IP or the SMB transport. You perform a live migration of the virtual machines from one server to another while their storage remains on the shared CSV or SMB file share.
  • "Shared-nothing" live migration. In this case, the live migration of a virtual machine from one non-clustered Hyper-V host to another begins when the hard drive storage of the virtual machine is mirrored to the destination server over the network. Then you perform the live migration of the virtual machine to the destination server while it continues to run and provide network services.

In Windows Server, improvements have been made to live migration, including:

  • Live migration compression. To reduce the total time of live migration on a system that is network constrained, Windows Server provides performance improvements by compressing virtual machine memory data before it is sent across the network. This approach utilizes available CPU resources on the host to reduce the network load (see the configuration sketch after this list).
  • SMB Multichannel. Systems that have multiple network connections between them can utilize multiple network adapters simultaneously to support live migration operations and achieve higher total throughput.
  • SMB Direct-based live migration. Live migrations that use SMB over RDMA-capable adapters use SMB Direct to transfer virtual machine data, which provides higher bandwidth, multichannel support, hardware encryption, and reduced CPU utilization during live migrations.
  • Cross-version live migration. Live migration from servers running Hyper-V in Windows Server 2012 to Hyper-V in Windows Server is supported with zero downtime. However, this applies only when you are migrating virtual machines from Hyper-V in Windows Server 2012 to Hyper-V in Windows Server; down-level live migrations (moving from Hyper-V in Windows Server to a previous version of Hyper-V) are not supported.
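
A minimal configuration sketch (hypothetical host and virtual machine names; the performance option and concurrency limits should match your environment):

    # Enable live migration on the host and select the compression performance option
    Enable-VMMigration
    Set-VMHost -MaximumVirtualMachineMigrations 4 -VirtualMachineMigrationPerformanceOption Compression

    # Perform a "shared-nothing" live migration, moving the virtual machine and its storage to another host
    Move-VM -Name "Tenant01" -DestinationHost "HV02" -IncludeStorage -DestinationStoragePath "D:\VMs\Tenant01"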

Storage Migration

Windows Server supports live storage migration, which lets you move VHDs that are attached to a virtual machine that is running. When you have the flexibility to manage storage without affecting the availability of your virtual machine workloads, you can perform maintenance on storage subsystems, upgrade storage appliance firmware and software, and balance loads while the virtual machine is in use.

Windows Server provides the flexibility to move VHDs on shared and non-shared storage subsystems if an SMB network shared folder on Windows Server is visible to both Hyper-V hosts.
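
As a sketch (hypothetical names), the files of a running virtual machine can be moved to a new storage location:

    # Move all virtual machine files, including virtual hard disks, while the virtual machine keeps running
    Move-VMStorage -VMName "Tenant01" -DestinationStoragePath "\\SOFS01\VMStore\Tenant01"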

Hyper-V Virtual Switch

The Hyper-V virtual switch in Windows Server is a Layer 2 virtual switch that provides programmatically managed and extensible capabilities to connect virtual machines to the physical network. The Hyper-V virtual switch is an open platform that lets multiple vendors provide extensions that are written to standard Windows API frameworks.

The reliability of extensions is strengthened through the Windows standard framework, and the required non-Microsoft code for functions is reduced. It is backed by the Windows Hardware Quality Labs certification program. You can manage the Hyper-V virtual switch and its extensions by using Windows PowerShell, or programmatically by using WMI or the Hyper-V Manager UI.

The Hyper-V virtual switch architecture in Windows Server is an open framework that lets non-Microsoft vendors add new functionality such as monitoring, forwarding, and filtering into the virtual switch. Extensions are implemented by using Network Device Interface Specification (NDIS) filter drivers and Windows Filtering Platform (WFP) callout drivers. These public Windows platforms for extending Windows networking functionality are used as follows:

  • NDIS filter drivers: Used to monitor or modify network packets in Windows. NDIS filters were introduced with the NDIS 6.0 specification.
  • WFP callout drivers: Introduced in Windows Vista and Windows Server 2008, these drivers let independent software vendors (ISVs) create functionality to filter and modify TCP/IP packets, monitor or authorize connections, filter IPsec-protected traffic, and filter remote procedure calls (RPCs). Filtering and modifying TCP/IP packets provides unprecedented access to the TCP/IP packet processing path. In this path, you can examine or modify outgoing and incoming packets before additional processing occurs. By accessing the TCP/IP processing path at different layers, you can more easily create firewalls, antivirus software, diagnostic software, and other types of applications and services.

The Hyper-V virtual switch is a module that runs in the root partition of Windows Server. The switch module can create multiple virtual switch extensions per host. All virtual switch policies—including QoS, VLAN, and ACLs—are configured per virtual network adapter.

Any policy that is configured on a virtual network adapter is preserved during a virtual machine state transition, such as a live migration. Each virtual switch port can connect to one virtual network adapter. In the case of an external virtual port, it can connect to a single physical network adapter on the Hyper-V server or to a team of physical network adapters.

The framework allows for non-Microsoft extensions to extend and affect the behavior of the Hyper-V virtual switch. The extensibility stack comprises an extension miniport driver and an extension protocol driver that are bound to the virtual switch. Switch extensions are lightweight filter drivers that bind between these drivers to form the extension stack.

There are three classes of extensions:

  • Capture: Sit on top of the stack and monitor switch traffic.
  • Filter: Sit in the middle of the stack and can monitor and modify switch traffic.
  • Forwarding: Sit on the bottom of the stack and replace the virtual switch forwarding behavior.

Table 7 lists the various types of Hyper-V virtual switch extensions.

  • Network Packet Inspection. Purpose: Inspects network packets, but does not change them. Examples: sFlow and network monitoring. Extensibility component: NDIS filter driver.
  • Network Packet Filter. Purpose: Injects, modifies, and drops network packets. Examples: Security. Extensibility component: NDIS filter driver.
  • Network Forwarding. Purpose: Non-Microsoft forwarding that bypasses default forwarding. Examples: Cisco Nexus 1000V, OpenFlow, Virtual Ethernet Port Aggregator (VEPA), and proprietary network fabrics. Extensibility component: NDIS filter driver.
  • Firewall/Intrusion Detection. Purpose: Filters and modifies TCP/IP packets, monitors or authorizes connections, filters IPsec-protected traffic, and filters RPCs. Examples: Virtual firewall and connection monitoring. Extensibility component: WFP callout driver.

Table 7 Windows Server virtual switch extension types

Only one forwarding extension can be installed per virtual switch (overriding the default switching of the Hyper-V virtual switch), although multiple capture and filtering extensions can be installed. In addition, monitoring extensions let you gather statistical data by monitoring traffic at different layers of the switch. Multiple monitoring and filtering extensions can be supported at the ingress and egress portions of the Hyper-V virtual switch. Figure 22 shows the architecture of the Hyper-V virtual switch and the extensibility model.

Figure 22 Hyper-V extension layers

The Hyper-V virtual switch data path is bidirectional, which allows all extensions to see the traffic as it enters and exits the virtual switch. The NDIS send path is used as the ingress data path, while the NDIS receive path is used for egress traffic. Between ingress and egress, forwarding of traffic occurs by the Hyper-V virtual switch or by a forwarding extension.

Figure 23 outlines this interaction.

Figure 23 Hyper-V virtual switch bidirectional filter

Windows Server provides Windows PowerShell cmdlets for the Hyper-V virtual switch that let you build command-line tools or automated scripts for setup, configuration, monitoring, and troubleshooting. Windows PowerShell also helps non-Microsoft vendors build tools to manage their Hyper-V virtual switch extensions.

Monitoring/Port Mirroring

Many physical switches can monitor the traffic that flows through specific ports. The Hyper-V virtual switch also provides port mirroring, which lets you designate which virtual ports should be monitored and to which virtual port the monitored traffic should be directed for further processing.

For example, a security-monitoring virtual machine can look for anomalous patterns in the traffic that flows through other virtual machines on the switch. In addition, you can diagnose network connectivity issues by monitoring traffic that is bound for a particular virtual switch port.
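
As a minimal sketch (hypothetical virtual machine names on the same virtual switch), port mirroring is configured per virtual network adapter:

    # Mirror traffic from a workload virtual machine to a monitoring virtual machine
    Set-VMNetworkAdapter -VMName "Tenant01" -PortMirroring Source
    Set-VMNetworkAdapter -VMName "NetMonitor01" -PortMirroring Destination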

Virtual Fibre Channel

Windows Server provides Fibre Channel ports within the Hyper-V guest operating system. This lets you connect to Fibre Channel directly from virtual machines when virtualized workloads have to connect to existing storage arrays. This protects your investments in Fibre Channel, lets you virtualize workloads that use direct access to Fibre Channel storage, lets you cluster guest operating systems over Fibre Channel, and offers an important new storage option for servers that are hosted in your virtualization infrastructure.

Fibre Channel in Hyper-V requires:

  • One or more installations of Windows Server with the Hyper-V role installed. Hyper-V requires a computer with processor support for hardware virtualization.
  • A computer that has one or more Fibre Channel host bus adapters (HBAs), each of which has an updated HBA driver that supports virtual Fibre Channel. Updated HBA drivers are included with the HBA drivers for some models.
  • Virtual machines that are configured to use a virtual Fibre Channel adapter, which must use Windows Server, Windows Server 2012, Windows Server 2008 R2, or Windows Server 2008 as the guest operating system.
  • Connection only to data logical unit numbers (LUNs). Storage that is accessed through a virtual Fibre Channel that is connected to a LUN cannot be used as boot media.

Virtual Fibre Channel for Hyper-V provides the guest operating system with unmediated access to a SAN by using a standard World Wide Port Name (WWPN) that is associated with a virtual machine HBA. Hyper-V users can use Fibre Channel SANs to virtualize workloads that require direct access to SAN LUNs. Virtual Fibre Channel also allows you to operate in advanced scenarios, such as running the Windows failover clustering feature inside the guest operating system of a virtual machine that is connected to shared Fibre Channel storage.

Midrange and high-end storage arrays are capable of advanced storage functionality that helps offload certain management tasks from the hosts to the SANs. Virtual Fibre Channel presents an alternate hardware-based I/O path to the virtual hard disk stack in Windows software. This allows you to use the advanced functionality that is offered by your SANs directly from virtual machines running Hyper-V. For example, you can use Hyper-V to offload storage functionality (such as taking a snapshot of a LUN) on the SAN hardware by using a hardware Volume Shadow Copy Service (VSS) provider from within a virtual machine running Hyper-V.

Virtual Fibre Channel for Hyper-V guest operating systems uses the existing N_Port ID Virtualization (NPIV) T11 standard to map multiple virtual N_Port IDs to a single physical Fibre Channel N_Port. A new NPIV port is created on the host each time a virtual machine that is configured with a virtual HBA is started. When the virtual machine stops running on the host, the NPIV port is removed.

VHDX

Hyper-V in Windows Server supports the updated VHD format (called VHDX) that has much larger capacity and built-in resiliency. The principal features of the VHDX format are:

  • Support for virtual hard disk storage capacity of up to 64 TB
  • Additional protection against data corruption during power failures by logging updates to the VHDX metadata structures
  • Improved alignment of the virtual hard disk format to work well on large sector physical disks

The VHDX format also has the following features:

  • Larger block sizes for dynamic and differential disks, which allows these disks to attune to the needs of the workload
  • Four-kilobyte (4 KB) logical sector virtual hard disk that allows for increased performance when it is used by applications and workloads that are designed for 4 KB sectors
  • Efficiency in representing data (called "trim"), which results in smaller file sizes and allows the underlying physical storage device to reclaim unused space. (Trim requires directly attached storage or SCSI disks and trim-compatible hardware.)
  • The virtual hard disk size can be increased or decreased through the user interface while the virtual hard disk is in use, provided that it is attached to a SCSI controller.
  • Shared access by multiple virtual machines to support guest clustering scenarios.
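
As an illustrative sketch (hypothetical paths and sizes), VHDX files can be created and resized from Windows PowerShell:

    # Create a 1 TB dynamically expanding VHDX with 4 KB logical sectors
    New-VHD -Path "D:\VHDs\Data01.vhdx" -SizeBytes 1TB -Dynamic -LogicalSectorSizeBytes 4096

    # Grow the virtual hard disk later; online resizing requires that it is attached to a SCSI controller
    Resize-VHD -Path "D:\VHDs\Data01.vhdx" -SizeBytes 2TB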

Guest Non-Uniform Memory Access (NUMA)

Hyper-V in Windows Server supports non-uniform memory access (NUMA) in a virtual machine. NUMA refers to a computer memory design that is used in multiprocessor systems in which the memory access time depends on the memory location relative to the processor.

By using NUMA, a processor can access local memory (memory that is attached directly to the processor) faster than it can access remote memory (memory that is local to another processor in the system). Modern operating systems and high-performance applications (such as SQL Server) have developed optimizations to recognize the NUMA topology of the system, and they consider NUMA when they schedule threads or allocate memory to increase performance.

Projecting a virtual NUMA topology in a virtual machine provides optimal performance and workload scalability in large virtual machine configurations. It does so by letting the guest operating system and applications (such as SQL Server) utilize their inherent NUMA performance optimizations. The default virtual NUMA topology that is projected into a Hyper-V virtual machine is optimized to match the NUMA topology of the host.

Dynamic Memory

Dynamic Memory, which was introduced in Windows Server 2008 R2 SP1, helps you use physical memory more efficiently. By using Dynamic Memory, Hyper-V treats memory as a shared resource that can be automatically reallocated among running virtual machines. Dynamic Memory adjusts the amount of memory that is available to a virtual machine, based on changes in memory demand and on the values that you specify.

In Windows Server, Dynamic Memory has a new configuration item called "minimum memory." Minimum memory lets Hyper-V reclaim the unused memory from the virtual machines. This can result in increased virtual machine consolidation numbers, especially in VDI environments.

In Windows Server, Dynamic Memory for Hyper-V allows for the following:

  • Configuration of a lower minimum memory for virtual machines to provide an effective restart experience.
  • Increased maximum memory and decreased minimum memory on virtual machines that are running. You can also increase the memory buffer.

Windows Server 2012 introduced Hyper-V Smart Paging for robust restart of virtual machines. Although minimum memory increases virtual machine consolidation numbers, it also brings a challenge. If a virtual machine has a smaller amount of memory than its startup memory and it is restarted, Hyper-V needs additional memory to restart the virtual machine. Because of host memory pressure or the states of the virtual machines, Hyper-V might not always have additional memory available, which can cause sporadic virtual machine restart failures in Hyper-V environments. Hyper-V Smart Paging bridges the memory gap between minimum memory and startup memory to let virtual machines restart more reliably.
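
As a minimal sketch (hypothetical virtual machine name and memory values), Dynamic Memory is configured per virtual machine:

    # Enable Dynamic Memory with startup, minimum, and maximum values and a 20 percent buffer
    Set-VMMemory -VMName "Tenant01" -DynamicMemoryEnabled $true `
        -StartupBytes 1GB -MinimumBytes 512MB -MaximumBytes 8GB -Buffer 20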

Hyper-V Replica

You can use failover clustering to create high availability virtual machines, but this does not protect your company from an entire data center outage if you do not use hardware-based SAN replication across your data centers. Hyper-V Replica fills an important need by providing an affordable failure recovery solution for an entire site, down to a single virtual machine.

Hyper-V Replica is closely integrated with Windows failover clustering, and it provides asynchronous, unlimited replication of your virtual machines over a network link from one Hyper-V host at a primary site to another Hyper-V host at a replica site, without relying on storage arrays or other software replication technologies.

Hyper-V Replica in Windows Server supports replication between a source and target server running Hyper-V, and it can support extending replication from the target server to a third server (known as extended replication). The servers can be physically co-located or geographically separated.

Hyper-V Replica tracks the Write operations on the primary virtual machine and replicates these changes to the replica server efficiently over a WAN. The network connection between servers uses the HTTP or HTTPS protocol.

Hyper-V Replica supports integrated and certificate-based authentication. Connections that are configured to use integrated authentication are not encrypted. For an encrypted connection, use certificate-based authentication. Hyper-V Replica provides support for replication frequencies of 15 minutes, 5 minutes, or 30 seconds.

In Windows Server, you can access recovery points that are up to 24 hours old. (Windows Server 2012 allowed access to recovery points that were up to 15 hours old.)
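
As an illustrative sketch (hypothetical virtual machine and replica server names; the replication frequency parameter is available in the current Hyper-V module), replication can be enabled and started as follows:

    # Replicate every 5 minutes to the replica server over Kerberos (HTTP), keeping 24 recovery points
    Enable-VMReplication -VMName "Tenant01" -ReplicaServerName "HVREPLICA01" -ReplicaServerPort 80 `
        -AuthenticationType Kerberos -ReplicationFrequencySec 300 -RecoveryHistory 24

    # Perform the initial replication over the network
    Start-VMInitialReplication -VMName "Tenant01"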

Resource Metering

Hyper-V in Windows Server supports resource metering, which is a technology that helps you track historical data on the use of virtual machines and gain insight into the resource use of specific servers. In Windows Server, resource metering provides accountability for CPU and network usage, memory, and storage IOPS.

You can use this data to perform capacity planning, monitor consumption by business units or customers, or capture data that is necessary to help redistribute the costs of running a workload. You could also use the information that this feature provides to help build a billing solution, so that you can charge customers of your hosting services appropriately for their usage of resources.
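
As a minimal sketch (hypothetical virtual machine name), resource metering is enabled, measured, and reset per virtual machine:

    # Start collecting resource usage data for the virtual machine
    Enable-VMResourceMetering -VMName "Tenant01"

    # Later, report CPU, memory, storage, and network usage for the metering interval
    Measure-VM -VMName "Tenant01"

    # Reset the counters at the start of a new billing period
    Reset-VMResourceMetering -VMName "Tenant01"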

Enhanced Session Mode

Enhanced session mode allows redirection of devices, clipboard, and printers from clients that are using the Virtual Machine Connection tool. The enhanced session mode connection uses a Remote Desktop Connection session through the virtual machine bus (VMBus), which does not require an active network connection to the virtual machine as would be required for a traditional Remote Desktop Protocol connection.

Enhanced session mode connections provide additional capabilities beyond simple mouse, keyboard, and monitor redirection. It supports capabilities that are typically associated with Remote Desktop Protocol sessions, such as display configuration, audio devices, printers, clipboards, smart card, USB devices, drives, and supported Plug and Play devices.

Enhanced session mode requires a supported guest operating system such as Windows Server or Windows 8.1, and it may require additional configuration inside the virtual machine. By default, enhanced session mode is disabled in Windows Server, but it can be enabled through the Hyper-V server settings.
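
A minimal sketch of enabling the feature on the host (assuming the in-box Hyper-V module):

    # Allow enhanced session mode connections on the Hyper-V host
    Set-VMHost -EnableEnhancedSessionMode $true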

Hyper-V Guest Virtual Machine Design

Standardization is a key tenet of private cloud architectures and virtual machines. A standardized collection of virtual machine templates can drive predictable performance and greatly improve capacity planning capabilities. As an example, the following table illustrates the composition of a basic virtual machine template library. Note that this should not replace proper template planning by using System Center Virtual Machine Manager.

  • Template 1—Small. Specifications: 2 vCPU, 4 GB memory, 50 GB disk. Network: VLAN 20. Operating system: Windows Server 2008 R2. Unit cost: 1.
  • Template 2—Medium. Specifications: 8 vCPU, 16 GB memory, 100 GB disk. Network: VLAN 20. Operating system: Windows Server. Unit cost: 2.
  • Template 3—X-Large. Specifications: 24 vCPU, 64 GB memory, 200 GB disk. Network: VLAN 20. Operating system: Windows Server. Unit cost: 4.

Table 8 Example template specification

Virtual Machine Storage

Dynamically Expanding Virtual Hard Disks

Dynamically expanding VHDs provide storage capacity as needed to store data. The size of the VHD file is small when the disk is created, and it grows as data is added to the disk. The size of the VHD file does not shrink automatically when data is deleted from the virtual hard disk; however, you can use the Edit Virtual Hard Disk Wizard to make the disk more compact and decrease the file size after data is deleted.

Fixed-Size Disks

Fixed-size VHDs provide storage capacity by using a VHDX file of the size that is specified for the virtual hard disk when the disk is created. The size of the VHDX file remains fixed, regardless of the amount of data that is stored. However, you can use the Edit Virtual Hard Disk Wizard to increase the size of the virtual hard disk, which in turn increases the size of the VHDX file. Because the full capacity is allocated at the time of creation, fragmentation at the host level is not an issue. (Fragmentation inside the VHDX itself must be managed within the guest.)

Differencing Virtual Hard Disks

Differencing VHDs provide storage that lets you make changes to a parent VHD without altering the parent disk itself. The size of the VHD file for a differencing disk grows as changes are stored to the disk.

Pass-Through Disks

Pass-through disks allow Hyper-V virtual machine guests to directly access local disks or SAN LUNs that are attached to the physical server, without requiring the volume to be presented to the host server.

The virtual machine guest accesses the disk directly (by using the GUID of the disk) without having to utilize the file system of the host. Given that the performance difference between fixed disk and pass-through disks is now negligible, the decision is based on manageability. For instance, a disk that is formatted with VHDX is hardly portable if the data on the volume will be very large (hundreds of gigabytes), given the extreme amounts of time it takes to copy. For a backup scheme with pass-through disks, the data can be backed up only from within the guest.

When you are utilizing pass-through disks, no VHDX file is created because the LUN is used directly by the guest. Because there is no VHDX file, there is no dynamic sizing or snapshot capability.

Pass-through disks are not subject to migration with the virtual machine, and they are only applicable to standalone Hyper-V configurations. Therefore, they are not recommended for common, large scale Hyper-V configurations. In many cases, virtual Fibre Channel capabilities supersede the need to leverage pass-through disks when you are working with Fibre Channel SANs.

In-guest iSCSI Initiator

Hyper-V can also utilize iSCSI storage by directly connecting to iSCSI LUNs that are utilizing the virtual network adapters of the guest. This is mainly used for access to large volumes on SANs to which the Hyper-V host is not connected, or for guest clustering. Guests cannot boot from iSCSI LUNs that are accessed through the virtual network adapters without utilizing an iSCSI initiator.

Storage Quality of Service

Storage Quality of Service (QoS) in Windows Server provides the ability to specify a maximum I/O operations per second (IOPS) value for a virtual hard disk in Hyper-V. This prevents a tenant in a multitenant Hyper-V infrastructure from consuming excessive storage resources and impacting other tenants (a situation often referred to as a "noisy neighbor").

Maximum and minimum values are specified in terms of normalized IOPS, where every 8 KB of data is counted as one I/O. Administrators can configure storage QoS for a virtual hard disk that is attached to a virtual machine by setting maximum and minimum IOPS values for that disk. Maximum values are enforced; minimum values generate administrative notifications when the threshold is not being met.
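As a minimal sketch, the QoS values can be applied to an attached virtual hard disk with the Hyper-V PowerShell module; the virtual machine name, controller location, and thresholds below are placeholders.

    # Cap a tenant disk at 500 normalized IOPS and request a floor of 100 IOPS.
    Set-VMHardDiskDrive -VMName "Tenant01" -ControllerType SCSI -ControllerNumber 0 `
                        -ControllerLocation 0 -MinimumIOPS 100 -MaximumIOPS 500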

Virtual Machine Networking

Hyper-V guests support two types of virtual network adapters: synthetic and emulated.

  • Synthetic adapters. These use the Hyper-V VMBus architecture and are the high-performance, native devices. Synthetic devices require that the Hyper-V integration services are installed in the guest operating system.
  • Emulated adapters. These are available to all guests, even if integration services are not available. They perform much more slowly, and they should be used only if synthetic devices are unavailable.

You can create many virtual networks on the server running Hyper-V to provide a variety of communications channels. For example, you can create virtual switches to provide the following types of communication:

  • Private network. Provides communication between virtual machines only.
  • Internal network. Provides communication between the host server and virtual machines.
  • External network. Provides communication between a virtual machine and a physical network by creating an association to a physical network adapter on the host server.
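A minimal sketch of creating each switch type with the Hyper-V PowerShell module follows; the switch and adapter names are placeholders.

    New-VMSwitch -Name "Private01" -SwitchType Private                                      # VM-to-VM communication only
    New-VMSwitch -Name "Internal01" -SwitchType Internal                                    # host-to-VM communication
    New-VMSwitch -Name "External01" -NetAdapterName "Ethernet 1" -AllowManagementOS $true   # bound to a physical adapter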

Virtual Machine Compute

Virtual Machine Generation

In Windows Server, Hyper-V virtual machine generation is a new feature that determines the virtual hardware and functionality that is presented to the virtual machine. Hyper-V in Windows Server supports the following two generation types:

  • Generation 1 virtual machines. These represent the same virtual hardware and functionality that was available in previous versions of Hyper-V, which provides BIOS-based firmware and a series of emulated devices.
  • Generation 2 virtual machines. These provide the platform capability to support advanced virtual features. Generation 2 virtual machines deprecate the legacy BIOS and emulated devices, which can potentially increase security and performance, and they provide a simplified virtual hardware model that includes Unified Extensible Firmware Interface (UEFI) firmware.

Generation 2 virtual machines provide a simplified virtual hardware model in which devices such as networking, storage, video, and graphics are presented through the VMBus as synthetic devices. This model supports Secure Boot, booting from a SCSI virtual hard disk or virtual DVD, and PXE boot by using a standard network adapter.

Support for several legacy devices, including Integrated Drive Electronics (IDE) controllers and the legacy network adapter, is removed from generation 2 virtual machines. Generation 2 virtual machines are supported only for 64-bit versions of Windows Server, Windows Server 2012, Windows 8.1, and Windows 8 as guest operating systems.
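The generation is selected when the virtual machine is created and cannot be changed afterward. The following sketch creates a generation 2 virtual machine with the Hyper-V PowerShell module; the names, paths, and sizes are placeholders.

    New-VM -Name "FabricVM01" -Generation 2 -MemoryStartupBytes 4GB `
           -NewVHDPath "D:\VHDs\FabricVM01.vhdx" -NewVHDSizeBytes 60GB -SwitchName "External01"
    Set-VMProcessor -VMName "FabricVM01" -Count 2    # assign two virtual processors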

Virtual Processors and Fabric Density

Providing sufficient CPU density is a matter of identifying the required number of physical cores to serve as virtual processors for the fabric. When performing capacity planning, plan against physical cores rather than hyper-threads or symmetric multithreading (SMT) threads. Although SMT can boost performance by approximately 10 to 20 percent, SMT threads are not equivalent to cores.

A minimum of two CPU sockets is required for product line architecture (PLA) pattern configurations. Combined with a minimum six-core CPU model, 12 physical cores per scale-unit host are available to the virtualization layer of the fabric. Most modern server-class processors support a minimum of six cores, with some supporting up to 10 cores. Given the average virtual machine requirement of two virtual CPUs, a two-socket server with midrange six-core CPUs provides 12 physical cores, which yields a potential density of between 96 and 192 virtual CPUs on a single host, depending on the processor ratio that is used.

As an example, Table 9 outlines the estimated virtual machine density based on a two-socket, six-core processor that uses a predetermined processor ratio. Note that CPU ratio assumptions are highly dependent on workload analysis and planning, and they should be factored into any calculations. For this example, the processor ratio would have been defined through workload testing, so that an estimate of potential density and required reserve capacity could be calculated (a sketch of this calculation follows the table).

Nodes | Sockets | Cores | Total Cores | Logical CPU | pCPU/vCPU Ratio | Available Virtual CPU | Average Virtual CPU Workload | Estimated Raw Virtual Machine Density | Virtual Machine Density Less Reserve Capacity
1 | 2 | 6 | 12 | 12 | 8 | 96 | 2 | 48 | N/A
4 | 2 | 6 | 12 | 48 | 8 | 384 | 2 | 192 | 144
8 | 2 | 6 | 12 | 96 | 8 | 768 | 2 | 384 | 336
12 | 2 | 6 | 12 | 144 | 8 | 1152 | 2 | 576 | 528
16 | 2 | 6 | 12 | 192 | 8 | 1536 | 2 | 768 | 720
32 | 2 | 6 | 12 | 384 | 8 | 3072 | 2 | 1536 | 1344
64 | 2 | 6 | 12 | 768 | 8 | 6144 | 2 | 3072 | 2688
1 | 2 | 8 | 16 | 16 | 8 | 128 | 2 | 64 | N/A
4 | 2 | 8 | 16 | 64 | 8 | 512 | 2 | 256 | 192
8 | 2 | 8 | 16 | 128 | 8 | 1024 | 2 | 512 | 448
12 | 2 | 8 | 16 | 192 | 8 | 1536 | 2 | 768 | 704
16 | 2 | 8 | 16 | 256 | 8 | 2048 | 2 | 1024 | 960
32 | 2 | 8 | 16 | 512 | 8 | 4096 | 2 | 2048 | 1792
64 | 2 | 8 | 16 | 1024 | 8 | 8192 | 2 | 4096 | 3584
1 | 2 | 10 | 20 | 20 | 8 | 160 | 2 | 80 | N/A
4 | 2 | 10 | 20 | 80 | 8 | 640 | 2 | 320 | 240
8 | 2 | 10 | 20 | 160 | 8 | 1280 | 2 | 640 | 560
12 | 2 | 10 | 20 | 240 | 8 | 1920 | 2 | 960 | 880
16 | 2 | 10 | 20 | 320 | 8 | 2560 | 2 | 1280 | 1200
32 | 2 | 10 | 20 | 640 | 8 | 5120 | 2 | 2560 | 2240
64 | 2 | 10 | 20 | 1280 | 8 | 10240 | 2 | 5120 | 4480

Table 9 Example virtual machine density chart
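The arithmetic behind Table 9 can be reproduced with a short script. The following is an illustrative sketch only; the function name is hypothetical, and it assumes (as the table does) an 8:1 processor ratio, an average of two virtual CPUs per virtual machine, and one host of reserve capacity. Single-node configurations are shown as N/A in the table because reserving one host leaves no remaining capacity.

    # Hypothetical helper that reproduces the Table 9 arithmetic.
    function Get-VMDensityEstimate {
        param(
            [int]$Nodes, [int]$Sockets, [int]$CoresPerSocket,
            [int]$VcpuRatio = 8,     # pCPU/vCPU ratio derived from workload analysis
            [int]$VcpuPerVM = 2,     # average virtual CPUs per virtual machine
            [int]$ReserveNodes = 1   # host reserve capacity assumed by the table
        )
        $logicalCpu    = $Nodes * $Sockets * $CoresPerSocket
        $availableVcpu = $logicalCpu * $VcpuRatio
        $rawDensity    = [math]::Floor($availableVcpu / $VcpuPerVM)
        $reserveVMs    = [math]::Floor(($ReserveNodes * $Sockets * $CoresPerSocket * $VcpuRatio) / $VcpuPerVM)
        [pscustomobject]@{
            LogicalCPU          = $logicalCpu
            AvailableVirtualCPU = $availableVcpu
            RawDensity          = $rawDensity
            DensityLessReserve  = $rawDensity - $reserveVMs
        }
    }

    # Example: 4 nodes x 2 sockets x 6 cores returns 192 raw and 144 less reserve, matching Table 9.
    Get-VMDensityEstimate -Nodes 4 -Sockets 2 -CoresPerSocket 6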

There are also limits on the number of virtual processors that are supported in each Hyper-V guest operating system. For more information, please see Hyper-V Overview on Microsoft TechNet.

Linux-Based Virtual Machines

Linux virtual machine support in Windows Server and System Center is focused on providing greater parity with Windows-based virtual machines. The Linux Integration Services are now part of the Linux kernel (version 3.x and later), which means that it is no longer a requirement to install the Linux Integration Services as part of the image build process. For more information, please see The Linux Kernel Archives.

Linux virtual machines can natively take advantage of the following:

  • Dynamic Memory. Linux virtual machines can take advantage of the increased density and resource usage efficiency of Dynamic Memory. Memory ballooning and hot add are supported.
  • Support for online snapshots. Linux virtual machines can be backed up live, and they have consistent memory and file systems. Note that this is not application-aware consistency like what would be achieved with the Volume Shadow Copy Service in Windows.
  • Online resizing of VHDs. Linux virtual machines can expand VHDs in cases where the underlying LUNs are resized.
  • Synthetic 2D frame buffer driver. Improves display performance within graphical apps.

These capabilities are available when the virtual machines are hosted on Hyper-V in Windows Server 2012 or in Windows Server.

Automatic Virtual Machine Activation

Automatic Virtual Machine Activation is a new feature in Windows Server that makes it easier for service providers and large IT organizations to use the licensing advantages that are offered through the Windows Server Datacenter edition—specifically, unlimited virtual machine instances for a licensed system.

Automatic Virtual Machine Activation allows customers to simply license the server, and then run as many virtual machine instances of Windows Server as the system can deliver. When guests are running Windows Server and the host is running Windows Server Datacenter, the guests activate automatically. This makes it easier and faster to deploy cloud solutions that are based on Windows Server Datacenter.

Automatic Virtual Machine Activation (AVMA) requires that the host operating system (running on the bare metal) is running Hyper-V in Windows Server Datacenter. The supported guest operating systems (running in the virtual machine) are:

  • Windows Server Standard
  • Windows Server Datacenter
  • Windows Server Essentials
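Inside the guest, AVMA only requires installing the edition-specific AVMA key that Microsoft publishes; the guest then activates automatically against the Datacenter host through the integration services. The following is a minimal sketch; the key value is a placeholder.

    # Run with elevation inside the Windows Server guest.
    # Replace <AVMA key> with the published AVMA key that matches the guest edition.
    cscript.exe //B "$env:windir\System32\slmgr.vbs" /ipk "<AVMA key>"
    cscript.exe "$env:windir\System32\slmgr.vbs" /dlv     # display detailed activation status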

Microsoft Azure and IaaS Architecture

Microsoft Azure is the Microsoft platform for the public cloud. You can use this platform in many ways, for instance:

  • Build a web application that runs and stores its data in Microsoft data centers.
  • Store data and the applications that use this data on-premises (that is, outside the public cloud).
  • Create virtual machines for development, and then test or run production deployments of SharePoint and other applications.
  • Build massive, scalable applications that have thousands or millions of users.

For a detailed description of the Microsoft Azure services, see Microsoft Azure Solutions.

Microsoft Azure provides platform as a service (PaaS) and, through Microsoft Azure Virtual Machines, infrastructure as a service (IaaS). With the IaaS capability, Microsoft Azure becomes a core part of the Microsoft Cloud OS vision. To create hybrid cloud architectures, it is critical to have a deep understanding of the services and architecture of Microsoft Azure (shown in the following image).

Figure 24 Microsoft Azure poster

Note To download a full-size version of this poster, see Microsoft Azure Poster in the Microsoft Download Center.

Microsoft Azure Services

Microsoft Azure Compute Services

Virtual Machines

Microsoft Azure Virtual Machines enable you to deploy a Windows Server or Linux image in the cloud. You can select images from a gallery or provide your customized images.

Cloud Services

Microsoft Azure Cloud Services remove the need to manage server infrastructure. With Web and Worker roles, Cloud Services enable you to quickly build, deploy and manage modern applications.

Web Sites

Microsoft Azure Web Sites enables you to deploy web applications on a scalable and reliable cloud infrastructure. You can quickly scale up or scale automatically to meet your application needs.

Mobile Services

Microsoft Azure Mobile Services provides a scalable cloud backend for building applications for Windows Store, Windows Phone, Apple iOS, Android, and HTML or JavaScript. You can store data in the cloud, authenticate users, and send push notifications to your application within minutes.

Microsoft Azure Data Services

Storage

Microsoft Azure Storage offers non-relational data storage including Blob, Table, Queue and Drive storage.

SQL Database

Microsoft Azure SQL Database is a relational database service that enables you to rapidly create, extend, and scale relational applications into the cloud.

SQL Reporting

Microsoft Azure SQL Reporting allows you to build easily accessible reporting capabilities into your Microsoft Azure application. You can get up and running in hours versus days at a lower upfront cost without the hassle of maintaining your own reporting infrastructure.

Backup

Microsoft Azure Backup manages cloud backups through familiar tools in Windows Server, Windows Server Essentials, or Data Protection Manager in System Center or System Center 2012.

Cache

Microsoft Azure Cache is a distributed, in-memory, scalable solution that enables you to build highly scalable and responsive applications by providing fast access to data.

HDInsight

Microsoft Azure HDInsight is a service that is based on Apache Hadoop. It allows you to manage, analyze, and report on structured or unstructured data.

Hyper-V Recovery Manager

Microsoft Azure Hyper-V Recovery Manager is a service that uses the Microsoft Azure public cloud to orchestrate and manage replication of your primary data center to a secondary site for the purposes of data management and continuity, and disaster recovery. The service makes it possible for you to use off-premises automation to control on-premises private clouds that are defined in Microsoft System Center or System Center 2012 Service Pack 1 (SP1) Virtual Machine Manager.

Microsoft Azure Network Services

Virtual Network

Microsoft Azure Virtual Network enables you to create virtual private networks (VPNs) within Microsoft Azure and securely link them to other virtual networks or to an on-premises networking infrastructure.

Traffic Manager

Microsoft Azure Traffic Manager allows you to load balance incoming traffic across multiple hosted Microsoft Azure services, whether they're running in the same data center or across data centers around the world.

Microsoft Azure Application Services

Active Directory

Microsoft Azure Active Directory provides identity management and access control capabilities for your cloud applications. You can synchronize your on-premises identities and enable single sign-on to simplify user access to cloud applications.

Multi-Factor Authentication

Microsoft Azure Multi-Factor Authentication helps prevent unauthorized access to on-premises and cloud applications by providing an additional layer of authentication. You can follow organizational security and compliance standards while also addressing user demand for convenient access.

Service Bus

Microsoft Azure Service Bus is a messaging infrastructure that sits between applications and allows them to exchange messages for improved scale and resiliency.

Notification Hubs

Microsoft Azure Notification Hubs provide a highly scalable, cross-platform push notification infrastructure that enables you to broadcast push notifications to millions of users at once or tailor notifications to individual users.

BizTalk Services

Microsoft Azure BizTalk Services is a powerful and extensible cloud-based integration service that provides business-to-business and enterprise application integration capabilities for delivering cloud and hybrid integration solutions within a secure, dedicated, per-tenant environment.

Media Services

Microsoft Azure Media Services allows you to create workflows for the creation, management, and distribution of digital media. It offers cloud-based media solutions from existing technologies including encoding, format conversion, content protection, and on-demand and live streaming.

The Microsoft Azure Training Kit provides a robust set of training materials and documentation. For more information, see the Microsoft Azure Training Kit.

Microsoft Azure Accounts and Subscriptions

A Microsoft Azure subscription grants you access to Microsoft Azure services and to the Microsoft Azure Management Portal. The terms of the Microsoft Azure account determine the scope of activities that you can perform in the Management Portal and describe the limits on available storage, network, and compute resources. For more information, please see:

In the Management Portal, you see only the services that are created by using a subscription for which you are an administrator. The billing account sets the number of compute units (virtual machines), hosted services, and storage that can be used. You can view usage information for a service by clicking the name of the service in the Management Portal.

A Microsoft Azure subscription has two aspects:

  • The Microsoft Azure account, through which resource usage is reported and services are billed. Each account is identified by a Microsoft Account or a corporate email account, and it is associated with at least one subscription. The account owner monitors usage and manages billings through the Microsoft Azure Account Center.
  • The subscription, which governs access to and use of the Microsoft Azure subscribed service. The subscription holder uses the Management Portal to manage services.

The account and the subscription can be managed by the same individual or by different individuals or groups. In a corporate enrollment, an account owner might create multiple subscriptions to give members of the technical staff access to services.

Because resource usage within an account is reported and billed for each subscription, an organization can use subscriptions to track expenses for projects, departments, regional offices, and so on. The account owner uses the Microsoft account that is associated with the account to sign in to the Microsoft Azure Account Center. Individuals do not have access to the Management Portal unless they are an administrator for a subscription.

Subscriptions that are created through a corporate enrollment are based on credentials that the organization provides. The subscription holder (who uses the services, but is not responsible for billings) has access to the Management Portal but not to the Microsoft Azure Account Center. By contrast, the personal account holder (who performs both duties) can sign in to either portal by using the Microsoft account that is associated with the account.
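Beyond the portals, the Microsoft Azure PowerShell cmdlets of this era (the service management module) can be used to see which subscriptions a set of credentials can administer and to switch between them. The following is a minimal sketch; the subscription name is a placeholder.

    Add-AzureAccount                                              # sign in with a Microsoft account or organizational ID
    Get-AzureSubscription                                         # lists the subscriptions the signed-in administrator can manage
    Select-AzureSubscription -SubscriptionName "Finance-Test"     # placeholder; subsequent cmdlets target this subscription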

By default, Microsoft Azure subscriptions have the following boundaries:

  • 20 storage accounts (default)
  • 200 terabytes (TB) per storage account
  • 50 virtual machines in service
  • 25 PaaS roles in service (default)
  • 20 cloud services per subscription (default)
  • 250 endpoints per cloud service
  • 1,024 virtual machines in a virtual network

Sharing Service Management by Adding Co-Administrators

When a Microsoft Azure subscription is created, a service administrator is assigned. The default service administrator is the contact person for the subscription. For an individual subscription, this is the person who holds the Microsoft account that identifies the subscription. The Microsoft Azure account owner can assign a different service administrator by editing the subscription in the Microsoft Azure Account Center.

The service administrator for a subscription has full administrator rights to all Microsoft Azure services that are subscribed to and all hosted services that are deployed under the subscription. The service administrator also can perform administrative tasks for the subscription itself in the Management Portal. For example, the service administrator can manage storage accounts, affinity groups, and management certificates for the subscription.

To share management of hosted services, the service administrator can add co-administrators to the subscription. To be added as a co-administrator, a person needs only a Windows Live ID.

Subscription co-administrators share the same administrator rights as the service administrator, with one exception: a co-administrator cannot remove the service administrator from a subscription. Only the Microsoft Azure account owner can change the service administrator for a subscription by editing the subscription in the Microsoft Azure Account Center.

Important: Because service administrators and co-administrators in Microsoft Azure have broad Administrator rights for Microsoft Azure services, you should assign strong passwords for the Microsoft accounts that identify the subscribers and ensure that the credentials are not shared with unauthorized users.

In the Management Portal, the enterprise account owner only has the rights that are granted to any subscription holder. To sign in to the Management Portal, the account owner must be an administrator for a subscription. When account owners sign in to the Management Portal, they can see and manage only those hosted services that have been created under subscriptions for which they are an administrator. Enterprise account owners cannot see hosted services for subscriptions that they create for other people. To gain visibility into service management under subscriptions that they create, enterprise account owners can ask the subscription holders to add them as co-administrators.

Manage Storage Accounts for Your Subscription

You can add storage accounts to a Microsoft Azure subscription to provide access to data management services in Microsoft Azure. The storage account represents the highest level of the namespace for accessing each of the storage service components, including Blob services, Queue services, and Table services. Each storage account provides access to storage in a specific geographic region or affinity group.

Create Affinity Groups to Use with Storage Accounts and Hosted Services

By using affinity groups, you can co-locate storage and hosted services within the same data center. To use an affinity group with a hosted service, assign an affinity group instead of a geographic region when you create the service. The same option is available when you create a storage account. You cannot change the affinity group for an existing hosted service or storage account.
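A minimal sketch with the classic Azure PowerShell cmdlets follows; the names and region are placeholders. The affinity group is created first and then referenced when the storage account and hosted service are created.

    New-AzureAffinityGroup -Name "FinanceAG" -Location "West US" -Label "Finance workloads"
    New-AzureStorageAccount -StorageAccountName "financestorage01" -AffinityGroup "FinanceAG"
    New-AzureService -ServiceName "finance-app01" -AffinityGroup "FinanceAG"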

Add Management Certificates to a Microsoft Azure Subscription

Management certificates enable client access to Microsoft Azure resources for using the Microsoft Azure SDK tools, the Microsoft Azure Tools for Microsoft Visual Studio, or the Microsoft Azure Service Management REST APIs. For example, a management certificate is used to authenticate a user who is using Visual Studio tools to create and manage hosted services or using Windows PowerShell or command-line tools to deploy virtual machine role images.

Management certificates are not required when you work in the Management Portal. In the Management Portal, authentication is performed by using the credentials of the administrator who is performing the operation.

Creating and Managing Microsoft Azure Environments

Getting started with Microsoft Azure is relatively straightforward on an individual or small-business basis. However, for enterprise scenarios, proper management and utilization of the preceding constructs is critical from security, administration, and billing standpoints. As an example, the following diagram illustrates a simple scenario in which the Finance department utilizes a combination of billing accounts, Microsoft Azure subscriptions, Microsoft Azure Service Administrators, and Microsoft Azure Co-Administrators to model development, test, and production environments in Microsoft Azure.

Figure 25 Example Microsoft Azure account and subscription design

During creation of Microsoft Azure environments for medium- and large-size organizations, careful planning of the billing and administration scope is required. The preceding diagram outlines an example of how to model an organization that wants centralized billing, but with different organizational units, to manage and track its Microsoft Azure usage.

It is important to realize that in many areas, a Microsoft Azure subscription and the resources (networks, virtual machines, and so on) within that subscription have access boundaries. Communication between resources in two different subscriptions is not possible except by configuring publicly accessible endpoints or utilizing Microsoft Azure virtual private network (VPN) functionality to connect the on-premises data center to each subscription and routing traffic between the two. (This will be explained in more detail throughout this document.)

Related to billing accounts and subscriptions, administrator and co-administrator management is also a key consideration in designing Microsoft Azure environments. Although the use of a Microsoft account ID represents the default scenario, there are also options for federating an on-premises instance of Active Directory with Microsoft Azure Active Directory for a variety of scenarios, including managing administrative access to Microsoft Azure subscriptions.

If the customer uses an on-premises directory service, you can integrate it with their Microsoft Azure Active Directory to automate cloud-based administrative tasks and provide users with a more streamlined sign-in experience. Microsoft Azure Active Directory supports the following directory integration capabilities:

  • Directory synchronization: Used to synchronize on-premises directory objects (such as users, groups, and contacts) to the cloud to help reduce administrative overhead. After directory synchronization has been set up, administrators can provision directory objects from the on-premises instance of Active Directory to their tenant.
  • Single sign-on (SSO): Used to provide users with a more seamless authentication experience as they access Microsoft cloud services while they are signed in to the corporate network. To set up SSO, organizations must deploy a security-token service on premises. After SSO is set up, users can use their Active Directory corporate credentials (user name and password) to access the services in the cloud and in their existing on-premises resources.

Microsoft Azure Service-Level Agreements

For the most up-to-date information about Microsoft Azure service-level agreements (SLAs), see Service Level Agreements on the Microsoft Azure Support site.

Caching

Microsoft defines SLAs to customers for connectivity between the caching endpoints and our Internet gateway. SLA calculations are based on an average, over a monthly billing cycle, by using five-minute time intervals. For more information, see Caching SLAs on the Microsoft Download Center.

Content Delivery Network (CDN)

Microsoft guarantees that at least 99.9% of the time Microsoft Azure Content Delivery Network (CDN) will respond to client requests and deliver the requested content without error. We will review and accept data from any commercially reasonable independent measurement system that you choose to monitor your content. You must select a set of agents from the measurement system's list of standard agents that are generally available and represent at least five geographically diverse locations in major worldwide metropolitan areas (excluding People's Republic of China). For more information, see Microsoft Azure Content Distribution Network Service Level Agreement on the Microsoft Download Center.

Cloud Services, Virtual Machines, and Virtual Network

For Cloud Services, when you deploy two or more role instances in different fault and upgrade domains, Internet facing roles will have external connectivity as defined by the Azure SLA.

For all Internet facing virtual machines that have two or more instances deployed in the same availability set, Microsoft guarantees you will have external connectivity at least 99.95% of the time.

For Virtual Network, Microsoft provides Virtual Network Gateway server availability as defined by the Azure SLA. For more information, see Microsoft Azure Cloud Services, Virtual Machines, and Virtual Network SLA on the Microsoft Download Center.

Media Services

Availability of REST API transactions for Media Services encoding is defined by the Azure SLA. On-demand streaming will successfully service requests with a 99.9% availability guarantee for existing media content when at least one On-Demand Streaming Reserved Unit is purchased. Availability is calculated over a monthly billing cycle. For more information, see Microsoft Azure Media Services Service Level Agreement on the Microsoft Download Center.

Mobile Services

Availability of REST API calls to all provisioned Microsoft Azure Mobile Services running in Standard and Premium tiers in a customer subscription is defined by the Azure SLA. No SLA is provided for the free tier of Mobile Services. Availability is calculated over a monthly billing cycle. For more information, see Microsoft Azure Mobile Services SLA on the Microsoft Download Center.

Multi-Factor Authentication

Availability of Microsoft Azure Multi-Factor Authentication is defined by the Azure SLA. The service is considered unavailable when it is unable to receive or process authentication requests for the multi-factor authentication provider that is deployed in a customer's subscription. Availability is calculated over a monthly billing cycle. For more information, see Multi-Factor Authentication SLA on the Microsoft Download Center.

Service Bus

For Service Bus Relays, the Azure SLA defines the percentage of time that properly configured applications will be able to establish a connection to a deployed relay.

For Service Bus Queues and Topics, the Azure SLA defines the percentage of time that properly configured applications will be able to send or receive messages or perform other operations on a deployed queue or topic.

For Notification Hubs, the Azure SLA defines the percentage of time that properly configured applications will be able to send notifications or perform registration management operations against a Notification Hub deployed within the Basic or Standard tier. For more information, see Service Bus SLA on the Microsoft Download Center.

SQL Database

SQL Database customers will maintain connectivity between the database and the Azure Internet gateway at a "monthly availability" of 99.9% during a billing month. The monthly availability percentage for a specific customer database is the ratio of the time the database was available to the customer to the total time in the billing month. Time is measured in five-minute intervals in a 30-day monthly cycle. Availability is always calculated for a full billing month. An interval is marked as unavailable if the customer's attempts to connect to a database are rejected by the SQL Database gateway. For more information, see SQL Database Service Level Agreement on the Microsoft Download Center.

SQL Reporting

SQL Reporting will maintain a "monthly availability" of 99.9% during a billing month. The monthly availability percentage is the ratio of the total time the customer's SQL Reporting instances were available to the total time the instances were deployed in the billing month. Time is measured in five-minute intervals. Availability is always calculated for a full billing month. An interval is marked as unavailable if customer-initiated attempts to upload, execute, or delete reports fail to ever complete due to circumstances within Microsoft's control. For more information, see SQL Reporting Service Level Agreement on the Microsoft Download Center.

Storage

Microsoft will successfully process correctly formatted requests that we receive to add, update, read, and delete data as defined by the Azure SLA. We also guarantee that your storage accounts will have connectivity to our Internet gateway. For more information, see Microsoft Azure Storage Service Level Agreement on the Microsoft Download Center.

Web Sites

Microsoft Azure Web Sites running in the Standard tier will respond to client requests 99.9% of the time for a given billing month. Monthly availability is calculated as the ratio of the total time the customer's Standard websites were available to the total time the websites were deployed in the billing month. The website is deemed unavailable if the website fails to respond due to circumstances within Microsoft's control. For more information, see Microsoft Azure Web Sites SLA on the Microsoft Download Center.

Microsoft Azure Pricing

Microsoft Azure pricing is based on usage, and it has different metrics and pricing depending on the Microsoft Azure services or resources (storage, virtual machines, and so on) that are used. There are different payment models, from pay-as-you-go to prepaid plans. For details about pricing and pricing calculators, see Microsoft Azure pricing at-a-glance.

Extending the Data Center Fabric to Microsoft Azure

As customers move to a hybrid cloud architecture, a primary scenario is extending their data center fabric (compute, storage, or network) to the cloud. There are technical (burst capacity, backup, disaster recovery, and so on) and financial (usage-based costing) reasons for doing so.

Extending the fabric to the cloud can be performed in a number of ways, such as by utilizing Microsoft Azure, a Microsoft hosting partner, or a non-Microsoft cloud service (such as a service from Amazon or Google).

The two approaches presented in this document are applicable to Microsoft Azure or Microsoft hosting partners.

Data Management Services in Microsoft Azure

This section provides information about using data management services in Microsoft Azure to store and access data. Data management services in Microsoft Azure consist of:

  • Blob service: Stores large amounts of data that can be accessed from anywhere via HTTP or HTTPS
  • Queue service: Stores large numbers of messages that can be accessed via HTTP or HTTPS
  • Table service: Stores large amounts of structured, non-relational data

In addition, Microsoft Azure drives are a capability of the Blob service that acts as an NTFS volume that is mounted on a role instance's file system and is accessible to code running in a cloud service deployment.

StorSimple provides a hybrid cloud storage solution that integrates on-premises storage with data management services in Microsoft Azure. Additionally, multiple Microsoft partners and independent software vendors (ISVs) such as CommVault and STEALTH deliver solutions that integrate with Microsoft Azure.

Data Management Services

Data management services in Microsoft Azure underlie all of the PaaS and IaaS storage needs in Microsoft Azure. Although the data management services in Microsoft Azure are typically used in PaaS scenarios by application developers, they are relevant to IaaS, because all Microsoft Azure disks and images (including virtual machine VHD files) utilize the underlying data management services in Microsoft Azure, such as the Blob service and storage accounts.

The Microsoft Azure Blob service stores large amounts of unstructured data that can be accessed from anywhere in the world via HTTP or HTTPS. A single blob can be hundreds of gigabytes in size, and a single storage account can contain up to 100 TB of blobs. Common uses for the Blob service include:

  • Serving images or documents directly to a browser.
  • Storing files for distributed access.
  • Streaming video and audio.
  • Performing secure backup and disaster recovery.
  • Storing data for analysis by an on-premises service or a service hosted by Microsoft Azure.

You can use the Blob service to expose data publicly to the world or privately for internal application storage. The Blob service contains the following components:

  • Storage account: All access to Microsoft Azure Storage is done through a storage account. This is the highest level of the namespace for accessing blobs. An account can contain an unlimited number of containers, but their total size must be under 100 TB.
  • Container: A container provides a grouping for a set of blobs. All blobs must be in a container. An account can contain an unlimited number of containers. A container can store an unlimited number of blobs within the 100 TB storage-account limit.
  • Blob: A file of any type and size within the overall size limits that are outlined in this section. You can store two types of blobs in Microsoft Azure Storage: block blobs and page blobs. Most files are block blobs. A single block blob can be up to 200 GB in size. Page blobs can be up to 1 TB in size, and they are more efficient when ranges of bytes in a file are modified frequently. For more information about blobs, see Migrating Data to Microsoft Azure Blob Storage.
  • URL format: Blobs are addressable by using the following URL format: http://<storage account>.blob.core.windows.net/<container>/<blob>
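As a sketch of the account, container, and blob hierarchy, the classic Azure storage cmdlets can upload a file and address it with the URL format above; the account name, key, and file path are placeholders.

    $ctx = New-AzureStorageContext -StorageAccountName "financestorage01" -StorageAccountKey "<account key>"
    New-AzureStorageContainer -Name "documents" -Permission Off -Context $ctx      # private container
    Set-AzureStorageBlobContent -File "C:\Data\report.pdf" -Container "documents" -Blob "report.pdf" -Context $ctx
    # The blob is then addressable as:
    # https://financestorage01.blob.core.windows.net/documents/report.pdf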

For a detailed view of the Microsoft Azure Storage service architecture, the following resources are highly recommended:

Microsoft Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency

Data management services in Microsoft Azure are critical in a hybrid cloud scenario because all data that is stored in Microsoft Azure, including IaaS virtual machine VHD files, utilizes the underlying data management services in Microsoft Azure, which distribute data across multiple disks and data centers in a way that is transparent to the running virtual machines.

Individual storage accounts have the following scalability targets:

  • Capacity: Up to 200 TB
  • Transactions: Up to 20,000 entities, messages, and blobs per second
  • Bandwidth for a geo-replication storage account: Ingress up to 5 Gbps; egress up to 10 Gbps
  • Bandwidth for a local storage resource: Ingress up to 10 Gbps; egress up to 15 Gbps

Note The actual transaction and bandwidth targets that are achieved by your storage account will very much depend on the size of objects, access patterns, and the type of workload that your application exhibits. To go above these targets, a service should be built to use multiple storage accounts and partition the blob containers, tables, queues, and objects across those storage accounts. By default, a single Microsoft Azure subscription comes with 20 storage accounts. However, you can contact customer support to add more storage accounts if you have to store more data—for example, if you need petabytes of data.

Planning the usage of storage accounts for deployed virtual machines and services is a key design consideration.

Virtual Hard Disks

In Microsoft Azure Storage, drives, disks, and images are all virtual hard disks (VHDs) that are stored as page blobs within your storage account. There are several slightly different VHD formats: fixed, dynamic, and differencing. Currently, Microsoft Azure supports only the fixed format.

This format lays out the logical disk linearly within the file format, so that disk offset X is stored at blob offset X. At the end of the blob, a small footer describes the properties of the VHD. All of this, which is stored in the page blob, adheres to the standard VHD format, so that you can take this VHD and mount it on your server on-premises if you choose to.

Often, the fixed format wastes space, because most disks have large unused ranges in them. However, fixed VHDs are stored as page blobs, which are a sparse format, so you receive the benefits of fixed and expandable disks at the same time.

Figure 26 Azure storage

Storage Replication

Data management services in Microsoft Azure provide several options for data redundancy and replication. Some are enabled by default and included in the base pricing. For more information, see Storage Pricing Details.

Locally Redundant Storage

Locally redundant storage provides highly durable and available storage within a single location. Microsoft Azure maintains an equivalent of three copies (replicas) of your data within the primary location. For more information, see Symposium on Operating Systems Principles.

This ensures that Microsoft Azure can recover from common failures (disk, node, rack) without affecting the availability and durability of your storage account. All storage writes are performed synchronously across three replicas in three separate fault domains before success is returned to the client. In the event of a major data center disaster in which part of a data center was lost, Microsoft would contact customers about potential data loss for locally redundant storage by using the subscription contact information.

Geo Redundant Storage

Geo redundant storage provides Microsoft Azure's highest level of durability by additionally storing your data in a second location within the same region hundreds of miles away from the primary location. All Microsoft Azure Blob and Table data is geo-replicated; however, Queue data is not geo-replicated at this time.

With geo redundant storage, Microsoft Azure maintains three copies (replicas) of the data in the primary location and the secondary location. This ensures that each data center can recover from common failures on its own and provides a geo-replicated copy of the data in the event of a major disaster.

As in locally redundant storage, data updates are committed to the primary location before success is returned to the client. After this is complete, these updates are geo-replicated asynchronously to the secondary location.

Primary Region | Secondary Region
North Central US | South Central US
South Central US | North Central US
East US | West US
West US | East US
North Europe | West Europe
West Europe | North Europe
South East Asia | East Asia
East Asia | South East Asia

Geo redundant storage is enabled by default for all storage accounts that are in production. You can choose to disable this default state by turning off geo-replication in the Microsoft Azure portal for your accounts. You can also configure your redundant storage option when you create a new account through the Microsoft Azure Portal. For further details, see Introducing Geo-replication for Microsoft Azure Storage.

Important Geo-redundant storage is not compatible or supported when operating system disk striping is utilized. For example, if you are using sixteen 1 TB disks in a virtual machine and using operating system striping to create a single 16 TB volume in the virtual machine, the storage must be locally redundant only. For more information, see Getting Started with SQL Server in Microsoft Azure Virtual Machines.
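As a sketch, the replication option can be inspected or changed per storage account with the classic Azure PowerShell cmdlets, which exposed a GeoReplicationEnabled setting in this era; the account name is a placeholder.

    Get-AzureStorageAccount -StorageAccountName "financestorage01"          # shows the current settings for the account
    # Switch the account to locally redundant storage, for example for volumes that use guest disk striping.
    Set-AzureStorageAccount -StorageAccountName "financestorage01" -GeoReplicationEnabled $false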

StorSimple

StorSimple, a Microsoft company, is the leading vendor of cloud-integrated storage for Microsoft Azure. StorSimple solutions combine the data-management functions of primary storage, backup, archive, and disaster recovery with Microsoft Azure integration, enabling customers to optimize storage costs, data protection, and service agility. With its unique cloud-snapshot capability, StorSimple automatically protects and rapidly restores production data by using Microsoft Azure Storage.

The StorSimple solution combines a number of storage technologies, including Internet SCSI (iSCSI) storage area network (SAN), snapshot, backup, deduplication, and compression with storage services that are offered by cloud service providers. StorSimple solutions seamlessly integrate advanced SAN technologies (such as SSDs, SAS, automated storage tiering, deduplication, compression, and encryption) with cloud storage to reduce the storage footprint significantly and lower capital expenditures (CapEx) and operating expenditures (OpEx).

StorSimple provides a strong solution for tiered storage that is tightly integrated with Microsoft Azure Storage. StorSimple automatically optimizes the placement of data across storage options with different cost and performance characteristics, based on data usage. This enables a large number of architecture scenarios.

Note StorSimple is currently a hardware storage appliance that is available from Microsoft. The appliance acts as an iSCSI Target Server for on-premises servers and provides several tiers of on-premises storage (by using SSD and SAS) while natively integrating with Microsoft Azure Storage.

StorSimple consolidates primary storage, archive, backup, and disaster recovery through seamless integration with the cloud. By combining StorSimple software and a custom-designed, enterprise-grade hardware platform, StorSimple solutions provide high performance for primary storage and enable revolutionary speed, simplicity, and reliability for backup and recovery. Note that in the figure below, CiS refers to cloud-integrated storage.

Figure 27 Hybrid storage

  • Infrastructure consolidation. StorSimple solutions consolidate primary storage, archives, backup, and disaster recovery through seamless integration with the cloud.
  • Simpler, faster backup and recovery. StorSimple cloud-based snapshots enable revolutionary speed, simplicity, and reliability for backup and recovery. Users can achieve up to 100 times faster data recovery versus traditional backup methods used in the cloud.
  • Secure data storage. StorSimple applies AES-256 encryption for all data that is transferred and stored in the cloud by using a private key that is known only to the organization.
  • Lower overall storage costs. By integrating the cloud with local enterprise storage, StorSimple can reduce total storage costs through the use of Microsoft Azure storage.

StorSimple solutions use cloud storage as an automated storage tier, offloading capacity-management burdens and ongoing capital costs. Using local and cloud snapshots, application-consistent backups complete in a fraction of the time that traditional backup systems require, while reducing the amount of data that is transferred and stored in the cloud.

Cloud-based and location-independent disaster recovery allows customers to recover their data from virtually any location that has an Internet connection and test their disaster recovery plans without affecting production systems and applications. Thin provisioning from data in the cloud enables users to resume operations after a disaster much faster than possible with physical or cloud-based tape.

Appliance Model* | 5020 | 7020 | 5520 | 7520

Capacity
Usable local hard-drive capacity | 2 TB | 4 TB | 10 TB | 20 TB
SSD (Enterprise MLC [eMLC]) physical capacity | 400 GB | 600 GB | 1.2 TB | 2 TB
Effective local capacity** | 4–10 TB | 8–20 TB | 20–50 TB | 40–100 TB
Maximum capacity | 100 TB | 200 TB | 300 TB | 500 TB

High Availability Features
Dual, redundant, hot-swappable power-cooling modules (PCMs) | 2 × 764 W PCMs, 100–240 VAC (5020 and 7020); 2 × 764 W and 2 × 580 W PCMs, 100–240 VAC (5520 and 7520)
Network interfaces | 4 × 1 gigabit per second (Gbps) copper
Controllers | Dual, redundant, hot-swappable, active or hot-standby controllers with automatic failover
RAID protection | Yes, including SAS hot-spare

Storage Features
iSCSI with multipath I/O support | Yes
Primary data reduction | Yes
Acceleration | Nonvolatile random access memory, SSD, cloud storage acceleration
Microsoft certification | Windows Hardware Quality Labs
VMware certification | Yes, VMware vSphere versions 4.1 and 5.1
Support for VMware vStorage APIs for array integration | Yes (pending future certification)
Automatic storage tiers | SSD, SAS, and cloud storage
Adaptive I/O processing | Yes; optimizes I/O performance of mixed-pattern workloads
Data portability | Yes; access data sets across StorSimple appliances

Data Protection Features
Local backups | Yes, by using snapshots
Offsite backups or tape elimination | Yes, by using cloud snapshots and cloud clones
Microsoft VSS application consistent backups | Yes, by using Data Protection Console and hardware VSS provider
Windows Cluster Shared Volumes (CSV) and dynamic disk support | Yes; backup of CSV, mirrored dynamic disks, and multi-partition disks
Protected storage migration | Yes, by using Windows host-side mirroring; allows online backups and nondisruptive cutover

Security Features
Virtual private storage | Yes
Data-in-motion encryption | HTTPS/Secure Socket Layer (SSL)
Data-at-rest encryption | AES-256-CBC
Volume access control | IQN, CHAP
Additional security features | Multiple user accounts (local and Active Directory), role-based access, secure web proxy support

Manageability and Serviceability
Nondisruptive software upgrade | Yes, updates and new releases
Hot-swappable components | Controllers, power and cooling modules, nonvolatile random access memory batteries, SSD and SAS drives
Management and monitoring | Integrated web GUI, email alerts with call home capability, Simple Network Management Protocol (SNMP) v1/v2c

Hardware Footprint
Form factor | 2U rack mountable appliance (5020 and 7020); 4U rack mountable appliance (5520 and 7520)
Dimensions (L × W × H [in inches]) | 24.8" × 19" × 3.46" (5020 and 7020); 24.8" × 19" × 6.96" (5520 and 7520)

The StorSimple solution uses the industry-standard iSCSI SAN protocol to connect to servers. iSCSI is easily configured for use with Microsoft and VMware servers, and it is widely understood by storage administrators.

StorSimple is intended to be used as the primary storage solution for enterprise tier 2 applications, including email, file shares, Microsoft SharePoint, content management systems, virtual machines, and large unstructured data repositories. It is not built for latency sensitive applications such as online transaction processing.

The StorSimple solution uses three different types of storage: performance-oriented flash SSDs, capacity-oriented SAS disk drives, and cloud storage. Data is moved from one type of storage to another according to its relative activity level and customer-chosen policies. Data that becomes more active is moved to a faster type of storage and data that becomes less active is moved to a higher capacity type of storage.

There are four logical tiers in the system—two at the SSD level and one each in the SAS and cloud storage levels.

Tier Name | Storage Type | Data Activity | Reduction Applied
Native | SSD | New, most active | None
Hot | SSD | Existing, most active | Deduplication
Warm | SAS | Between hot and cool | Full
Cool | Cloud | Least active | Full

The fourth column in the preceding table indicates the type of data-reduction technology that is used in the various tiers. The native tier has none, the hot tier uses data deduplication, and the warm and cool tiers use full reduction, which means that data is both compressed and deduplicated. Notice that the progression from native tier to warm tier implies that data is deduplicated before it is compressed.

Data deduplication reduces the amount of data that is stored in the system by identifying data duplicates and removing excess copies. Data deduplication is particularly effective in virtual server environments. Compression reduces the amount of data that is stored in the system by identifying strings of repeated data values and replacing them with encoded values.

Another capacity conserving technology in the StorSimple solution is thin provisioning, which allocates storage capacity as it is needed, as opposed to reserving capacity in advance. All storage in StorSimple is thinly provisioned.

StorSimple provides a broad set of data management tools that enable customers to use data management services in Microsoft Azure in ways that are familiar to them, including archive and backup storage.

Cloud snapshots are point-in-time copies of data that are stored in cool tiers in the cloud. All cloud snapshots are fully reduced (deduplicated and compressed) to minimize the amount of storage that is consumed.

A StorSimple solution transfers and stores data that already has been fully reduced. This minimizes the cost of cloud storage, the transaction costs, and the WAN bandwidth that are associated with storing data in the cloud.

In a StorSimple system, data that is ranked lowest is sent to a cool tier in the cloud, where it remains until it is accessed again and promoted to the warm tier.

StorSimple systems provide volume-level cloud mapping between storage volumes on the StorSimple system and the Microsoft Azure public cloud. Different volumes can have cool tiers in the same or different Microsoft Azure storage accounts. Every StorSimple system keeps a metadata map that describes the state of the system and provides an image of the volume's contents at the time a snapshot is taken. This map is typically about 0.1 percent of the size of the stored data.

AES-256 encryption is applied to all data that is transmitted and stored in the cloud by the StorSimple solution to ensure its security. SHA-256 hashing is applied to all data that is transmitted and stored in the cloud as a means to guarantee data integrity.

Cloud clones are the equivalent of a synthetic full backup that contains all of the current data for a volume as of the last snapshot. They are stored in the cool tier for use in disaster-recovery scenarios, but they occupy repositories that are separate from cloud snapshots, and they can reside in the same or a different cloud repository than the volume's cloud snapshots. The following graphic shows cloud clones that are located in different repositories and shows that they can use the same or a different cloud service.

Figure 28 Distributed storage

A thin restore is a disaster-recovery process whereby a StorSimple system downloads data from the cloud. The first thing that is downloaded is the metadata map, after which users and applications can start accessing their working sets and download them. As data is downloaded, it is ranked and placed in the appropriate tier.

Figure 29 StorSimple backup

Thin restores tend to have extremely short recovery time objectives, because systems can begin accessing data after the metadata map has been downloaded. Thin restores do not restore cool data that does not belong to any working sets.

Location-independent recovery refers to the ability to perform thin restores from any location that has a suitable Internet connection. This differs from legacy disaster recovery operations, which are restricted to running at specific recovery sites.

Location independence adds an additional level of redundancy to the recovery process, and it does not require the capital investment of traditional replication solutions. A customer that has multiple data center locations can use StorSimple systems running in any of those locations to recover from disasters in any of the other sites. Similarly, a single StorSimple system can act as a spare for any of the others, providing an extremely cost-effective disaster recovery implementation.

Microsoft Azure and IaaS

There are a number of compute-related services in Microsoft Azure, including web and worker roles, cloud services, and HDInsight. The focus of this paper is hybrid cloud IaaS. Although some of the other compute capabilities are covered briefly, the focus is on virtual machines in Microsoft Azure.

Microsoft Azure Cloud Service

When you create a virtual machine or application and run it in Microsoft Azure, the virtual machine or the code and configuration together are called a Microsoft Azure cloud service.

By creating a cloud service, you can deploy multiple virtual machines or a multitier application in Microsoft Azure, define multiple roles to distribute processing, and allow flexible scaling of your application. A cloud service can consist of one or more virtual machines or web roles and worker roles, each of which has its own application files and configuration.

Virtual machines that are hosted in Microsoft Azure must be contained within cloud services. A single Microsoft Azure subscription by default is limited to 20 cloud services, and each cloud service can include up to 50 virtual machines.

Virtual Machines in Microsoft Azure

You can control and manage virtual machines in Microsoft Azure. After you create a virtual machine, you can delete and re-create it whenever you have to, and you can access the virtual machine just like any other server. You can use the following types of virtual hard disks (VHDs) to create a virtual machine:

  • Image: A template that you use to create a new virtual machine. An image does not have specific settings (such as the computer name and user account settings) that a running virtual machine has. If you use an image to create a virtual machine, an operating system disk is created automatically for the new virtual machine.
  • Disk: A VHD that you can start and mount as a running version of an operating system. After an image has been provisioned, it becomes a disk. A disk is always created when you use an image to create a virtual machine. Any VHD that is attached to virtualized hardware and running as part of a service is a disk.

You can use the following options to create a virtual machine from an image:

  • Create a virtual machine by using a platform image from the Microsoft Azure Management Portal.
  • Create and upload a .vhd file that contains an image to Microsoft Azure, and then use the uploaded image to create a virtual machine.
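A minimal sketch of the first option by using the classic Azure PowerShell cmdlets follows; the image filter, credentials, and service name are placeholders, and a subscription with a current storage account is assumed to be selected already. Omit -Location when deploying into a cloud service that already exists.

    $image = Get-AzureVMImage |
             Where-Object { $_.Label -like "Windows Server 2012 R2*" } |
             Select-Object -First 1
    New-AzureVMConfig -Name "AppVM01" -InstanceSize Medium -ImageName $image.ImageName |
        Add-AzureProvisioningConfig -Windows -AdminUsername "fabricadmin" -Password "<password>" |
        New-AzureVM -ServiceName "finance-app01" -Location "West US"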

Microsoft Azure provides specific combinations of central processing unit (CPU) cores and memory for virtual machines. These combinations are known as virtual machine sizes. When you create a virtual machine, you select a specific size. This size can be changed after deployment. The sizes that are available for virtual machines are the following:

Virtual Machine Size | CPU Cores | Memory | Disk Space for Cloud Services | Disk Space for Virtual Machines | Maximum Data Disks (1 TB Each) | Maximum IOPS (500 Maximum per Disk)
ExtraSmall | Shared | 768 MB | 19 GB | 20 GB | 1 | 1 × 500
Small | 1 | 1.75 GB | 224 GB | 70 GB | 2 | 2 × 500
Medium | 2 | 3.5 GB | 489 GB | 135 GB | 4 | 4 × 500
Large | 4 | 7 GB | 999 GB | 285 GB | 8 | 8 × 500
ExtraLarge | 8 | 14 GB | 2,039 GB | 605 GB | 16 | 16 × 500
A5 | 2 | 14 GB | | | 4 | 4 × 500
A6 | 4 | 28 GB | 999 GB | 285 GB | 8 | 8 × 500
A7 | 8 | 56 GB | 2,039 GB | 605 GB | 16 | 16 × 500

Source: Virtual Machine and Cloud Service Sizes for Microsoft Azure

Important Virtual machines will begin to incur cost as soon as they are provisioned, regardless of whether they are turned on.

Virtual Machine Storage in Microsoft Azure

A virtual machine in Microsoft Azure is created from an image or a disk. All virtual machines use one operating system disk, a temporary local disk, and possibly multiple data disks. All images and disks, except for the temporary local disk, are created from VHDs, which are .vhd files that are stored as page blobs in a storage account in Microsoft Azure. You can use platform images that are available in Microsoft Azure to create virtual machines, or you can upload your images to create customized virtual machines. The disks that are created from images are also stored in Microsoft Azure storage accounts. You can create new virtual machines easily by using existing disks.

VHD Files

A .vhd file is stored as a page blob in Microsoft Azure Storage, and it can be used for creating images, operating system disks, or data disks in Microsoft Azure. You can upload a .vhd file to Microsoft Azure and manage it just as you would any other page blob. The .vhd files can be copied or moved, and they can be deleted as long as a lease does not exist on the VHD.

A VHD can be in a fixed format or a dynamic format. Currently, however, only the fixed format of .vhd files is supported in Microsoft Azure. The fixed format lays out the logical disk linearly within the file, so that disk offset X is stored at blob offset X. At the end of the blob, a small footer describes the properties of the VHD. Often, the fixed format wastes space because most disks contain large unused ranges. However, in Microsoft Azure, fixed .vhd files are stored in a sparse format, so that you receive the benefits of fixed and dynamic disks at the same time.

When you create a virtual machine from an image, a disk is created for the virtual machine, which is a copy of the original .vhd file. To protect against accidental deletion, a lease is created if you create an image, an operating system disk, or a data disk from a .vhd file.

Before you can delete the original .vhd file, you must first delete the disk or image to remove the lease. To delete a .vhd file that is being used by a virtual machine as an operating system disk, you must delete the virtual machine, delete the operating system disk, and then delete the original .vhd file. To delete a .vhd file that is used as a source for a data disk, you must detach the disk from the virtual machine, delete the disk, and then delete the .vhd file.

Images

An image is a .vhd file that you can use as a template to create a new virtual machine. An image is a template because it does not have specific settings (such as the computer name and user account settings) that a configured virtual machine does. You can use images from the Image Gallery to create virtual machines, or you can create your own images.

The Microsoft Azure Management Portal enables you to choose from several platform images to create a virtual machine. These images include the Windows Server 2012 and Windows Server 2008 R2 operating systems and several distributions of the Linux operating system. A platform image can also contain applications, such as SQL Server.

To create a Windows Server image, you must run the Sysprep command on your development server to generalize it and shut it down before you can upload the .vhd file that contains the operating system.

Disks

You use disks in different ways with a virtual machine in Microsoft Azure. An operating system disk is a VHD that you use to provide an operating system for a virtual machine. A data disk is a VHD that you attach to a virtual machine to store application data. You can create and delete disks whenever you want to.

You can choose from multiple ways to create disks, depending on the needs of your application. For example, a typical way to create an operating system disk is to use an image from the Image Gallery when you create a virtual machine. You can create a new data disk by attaching an empty disk to a virtual machine. You can also create a disk by using a .vhd file that has been uploaded or copied to a storage account in your subscription. You cannot use the portal to upload .vhd files; if you want to upload .vhd files, you must use other tools that work with Microsoft Azure storage accounts, such as the Microsoft Azure PowerShell cmdlets, to upload or copy the file.
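As a minimal sketch, assuming the classic Azure PowerShell (Service Management) cmdlets and hypothetical storage account, container, and file names, an uploaded fixed-format .vhd can be registered as a disk or as a reusable image:

    # Hypothetical destination blob URI and local path
    $dest = "https://mystorageaccount.blob.core.windows.net/vhds/app01.vhd"

    # Upload the fixed-format .vhd as a page blob
    Add-AzureVhd -Destination $dest -LocalFilePath "D:\VHDs\app01.vhd"

    # Register the uploaded blob as an operating system disk...
    Add-AzureDisk -DiskName "App01-OSDisk" -MediaLocation $dest -OS Windows

    # ...or register it as an image, if the VHD was generalized with Sysprep
    Add-AzureVMImage -ImageName "App01-Image" -MediaLocation $dest -OS Windows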

Operating System Disk

Every virtual machine has one operating system disk. You can upload a VHD that can be used as an operating system disk, or you can create a virtual machine from an image, and a disk is created for you. An operating system disk is a VHD that you can start and mount as a running version of an operating system. Any VHD that is attached to virtualized hardware and running as part of a service is an operating system disk. The maximum size of an operating system disk is 127 GB.

When an operating system disk is created in Microsoft Azure, three copies of the disk are created for high durability. Additionally, if you choose to use a disaster recovery plan that is geo-replication–based, your VHD is also replicated at a distance of more than 400 miles away. Operating system disks are registered as SATA drives and labeled as drive C.

Data Disk

A data disk is a VHD that can be attached to a running virtual machine to persistently store application data. You can upload and attach to the virtual machine a data disk that already contains data, or you can use the Microsoft Azure Management Portal to attach an empty disk to the virtual machine. The maximum size of a data disk is 1 TB, and you are limited in the number of disks that you can attach to a virtual machine, based on the size of the virtual machine.

Data disks are registered as SCSI drives, and you can make them available for use within the operating system by using Disk Management in Windows. The maximum number of data disks per virtual machine size is shown in the preceding table.
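The following sketch, which assumes the classic Azure PowerShell cmdlets and hypothetical cloud service and virtual machine names, attaches a new, empty 500 GB data disk at LUN 0:

    Get-AzureVM -ServiceName "ContosoSvc" -Name "AppVM01" |
        Add-AzureDataDisk -CreateNew -DiskSizeInGB 500 -DiskLabel "Data01" -LUN 0 |
        Update-AzureVM

After the disk is attached, it appears as a raw SCSI disk inside the guest and must be initialized and formatted before use.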

If multiple data disks are attached to a virtual machine, striping inside the virtual machine operating system can be utilized to create a single volume on the multiple attached disks (a volume of up to 16 TB that consists of a stripe of sixteen 1 TB data disks). As mentioned previously, use of operating system striping is not possible with geo-redundant data disks.

Temporary Local Disk

Each virtual machine that you create has a temporary local disk, which is labeled as drive D. This disk exists only on the physical host server on which the virtual machine is running; it is not stored in blobs in Microsoft Azure Storage. This disk is used by applications and processes that are running in the virtual machine for transient and temporary storage of data. It is used also to store page files for the operating system.

Note Any data located on a temporary drive will not survive a host-machine failure or any other operation that requires moving the virtual machine to another piece of hardware.

The use of the letter D for this drive is the default. You can change the drive letter as follows:

  • Deploy the virtual machine normally, with or without the second data disk attached. (The data disk initially will be drive E, if it is a formatted volume.)
  • Move the page file from drive D to drive C.
  • Reboot the virtual machine.
  • Swap the drive letters on the current drives D and E.
  • Optionally, move the page file back to the resource drive (now drive E).

If the virtual machine is resized or moved to new hardware for service healing, the drive naming will stay in place. The data disk will stay at drive D, and the resource disk will always be the first available drive letter (which would be E, in this example).
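The drive-letter swap in step 4 can be scripted inside the guest. The following is a minimal sketch that assumes the Storage module cmdlets in Windows Server 2012, that the page file has already been moved off drive D, and that the data disk is currently drive E:

    Set-Partition -DriveLetter E -NewDriveLetter Z   # park the data disk on a free letter
    Set-Partition -DriveLetter D -NewDriveLetter E   # the temporary (resource) disk becomes E
    Set-Partition -DriveLetter Z -NewDriveLetter D   # the data disk becomes D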

Host Caching

The operating system disk and data disks have a host caching setting (sometimes called hosted cache mode) that enables improved performance under some circumstances. However, these settings can have a negative effect on performance in other circumstances, depending on the application. By default, host caching is OFF for Read and Write operations for data disks and ON for Read and Write operations for operating system disks.
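A minimal sketch of adjusting these settings with the classic Azure PowerShell cmdlets, using hypothetical service, virtual machine, and LUN values (valid HostCaching values are None, ReadOnly, and ReadWrite):

    Get-AzureVM -ServiceName "ContosoSvc" -Name "AppVM01" |
        Set-AzureOSDisk -HostCaching "ReadOnly" |
        Set-AzureDataDisk -HostCaching "None" -LUN 0 |
        Update-AzureVM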

RDP and Remote Windows PowerShell

New virtual machines that are created through the Microsoft Azure Management Portal have Remote Desktop Protocol (RDP) and Windows PowerShell remoting available.

Virtual Machine Placement and Affinity Groups

You can group the services in your Microsoft Azure subscription that must work together to achieve optimal performance by using affinity groups. When you create an affinity group, it lets Microsoft Azure know to keep all of the services that belong to your affinity group running on the same data center cluster. For example, if you wanted to keep different virtual machines close together, you would specify the same affinity group for those virtual machines and associated storage. Then when you deploy those virtual machines, Microsoft Azure will locate them in a data center as close to each other as possible. This reduces latency and increases performance, while potentially lowering costs.

Affinity groups are defined at the subscription level, and the name of each affinity group has to be unique within the subscription. When you create a new resource, you can use an affinity group that you previously created or create a new one.
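As a minimal sketch, assuming the classic Azure PowerShell cmdlets and hypothetical names and region, an affinity group is created once and then referenced by the related resources:

    New-AzureAffinityGroup -Name "ContosoAG" -Location "West US" -Label "Contoso fabric"

    # Reference the affinity group when creating related resources
    New-AzureStorageAccount -StorageAccountName "contosovhds" -AffinityGroup "ContosoAG"
    New-AzureService -ServiceName "ContosoSvc" -AffinityGroup "ContosoAG"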

Endpoints and ACLs

When you create a virtual machine in Microsoft Azure, it is fully accessible from any of your other virtual machines that are within the virtual network to which it is connected. All protocols (such as TCP, UDP, and Internet Control Message Protocol [ICMP]) are supported within the local virtual network. Virtual machines on your virtual network are automatically given an internal IP address from the private range (RFC 1918) that you defined when you created the network.

To provide access to your virtual machines from outside of your virtual network, you use the external IP address and configure public endpoints. These endpoints are similar to firewall and port forwarding rules, and they can be configured in the Microsoft Azure Management Portal. By default, when they are created by using the Microsoft Azure Management Portal, ports for RDP and Windows PowerShell remoting are opened. These ports use random public-port addresses, which are mapped to the correct ports on the virtual machines. You can remove these preconfigured endpoints if you have network connectivity through a VPN.

A network access control list (ACL) is a security enhancement that is available for your Microsoft Azure deployment. An ACL provides the ability to selectively permit or deny traffic for a virtual machine endpoint. This packet filtering capability provides an additional layer of security. Currently, you can specify network ACLs for virtual machine endpoints only. You cannot specify an ACL for a virtual network or a specific subnet that is contained in a virtual network.

By using network ACLs, you can do the following:

  • Selectively permit or deny incoming traffic based on a remote subnet IPv4 address range to a virtual machine input endpoint
  • Ban IP addresses
  • Create multiple rules per virtual machine endpoint
  • Specify up to 50 ACL rules per virtual machine endpoint
  • Organize the rules to ensure that the correct set of rules is applied on a given virtual machine endpoint (lowest to highest)
  • Specify an ACL for a specific remote subnet IPv4 address

An ACL is an object that contains a list of rules. When you create an ACL and apply it to a virtual machine endpoint, packet filtering takes place on the host node of your virtual machine. This means the traffic from remote IP addresses is filtered by the host node to match ACL rules, instead of on your virtual machine. This prevents your virtual machine from spending precious CPU cycles on packet filtering.

When a virtual machine is created, a default ACL is put in place to block all incoming traffic. However, if an endpoint is created for Port 3389, the default ACL is modified to allow all inbound traffic for that endpoint. Inbound traffic from any remote subnet is then allowed to that endpoint and no firewall provisioning is required. All other ports are blocked for inbound traffic unless endpoints are created for those ports. Outbound traffic is allowed by default.

You can selectively permit or deny network traffic for a virtual machine input endpoint by creating rules that specify "permit" or "deny." By default, when an endpoint is created, all traffic is permitted to the endpoint. For that reason, it's important to understand how to create Permit and Deny rules and place them in the proper order of precedence if you want granular control over the network traffic that you choose to allow to reach the virtual machine endpoint.

Points to consider:

  • No ACL: By default, when an endpoint is created, all traffic is permitted to the endpoint.
  • Permit: When you add one or more "permit" ranges, all other ranges are denied by default. Only packets from the permitted IP ranges can communicate with the virtual machine endpoint.
  • Deny: When you add one or more "deny" ranges, all other ranges of traffic are permitted by default.
  • Combination of Permit and Deny: You can use a combination of Permit and Deny rules when you want to set a specific IP range to be permitted or denied.

Network ACLs can be set up on specific virtual machine endpoints. For example, you can specify a network ACL for an RDP endpoint that is created on a virtual machine, which locks down access to certain IP addresses. A typical configuration permits access for RDP only from public virtual IPs in a defined range and denies all other remote IPs; rules are evaluated in a "lowest takes precedence" order.
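A minimal sketch of such a configuration with the classic Azure PowerShell cmdlets, using a hypothetical corporate address range, cloud service, virtual machine, and endpoint name; once a permit rule exists, all other ranges are denied by default:

    $acl = New-AzureAclConfig

    # Permit RDP only from the corporate public address range
    Set-AzureAclConfig -AddRule -ACL $acl -Action Permit `
        -RemoteSubnet "203.0.113.0/24" -Order 100 -Description "Corp RDP access"

    Get-AzureVM -ServiceName "ContosoSvc" -Name "AppVM01" |
        Set-AzureEndpoint -Name "RDP" -Protocol tcp -LocalPort 3389 -PublicPort 53389 -ACL $acl |
        Update-AzureVM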

Network ACLs can be specified on a load-balanced set endpoint. If an ACL is specified for a load-balanced set, the network ACL is applied to all virtual machines in that set. For example, if a load-balanced set is created with Port 80 and the set contains three virtual machines, the network ACL created on endpoint Port 80 of one virtual machine will automatically apply to the other virtual machines.

Figure 30 Load balanced endpoints

Virtual Machine High Availability

You can ensure the availability of your application by using multiple virtual machines in Microsoft Azure. By using multiple virtual machines in your application, you can make sure that your application is available during local network failures, local disk-hardware failures, and any planned downtime that the platform might require.

You manage the availability of your application that uses multiple virtual machines by adding the virtual machines to an availability set. Availability sets are directly related to fault domains and update domains. A fault domain in Microsoft Azure is defined by avoiding single points of failure, like the network switch or power unit of a rack of servers. In fact, a fault domain is closely equivalent to a rack of physical servers. When multiple virtual machines are connected together in a cloud service, an availability set can be used to ensure that the virtual machines are located in different fault domains. The following diagram shows two availability sets, each of which contains two virtual machines.

Figure 31 Availability sets

Microsoft Azure periodically updates the operating system that hosts the instances of an application. A virtual machine is shut down when an update is applied. An update domain is used to ensure that not all of the virtual machine instances are updated at the same time.

When you assign multiple virtual machines to an availability set, Microsoft Azure ensures that the virtual machines are assigned to different update domains. The previous diagram shows two virtual machines running Internet Information Services (IIS) in separate update domains and two virtual machines running SQL Server also in separate update domains.

Important The high availability concepts for virtual machines in Microsoft Azure are not the same as for on-premises Hyper-V. Microsoft Azure does not support live migration or movement of running virtual machines. For high availability, multiple virtual machines per application or role must be created, and Microsoft Azure constructs such as availability sets and load balancing must be utilized. Each application that is being considered for deployment must be analyzed to determine how the high availability features in Microsoft Azure can be utilized. If a given application cannot use multiple roles or instances for high availability (meaning that a single virtual machine that is running the application must be online at all times), Microsoft Azure cannot support that requirement.

You should use a combination of availability sets and load-balanced endpoints (discussed in subsequent sections) to help ensure that your application is always available and running efficiently.
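As a minimal sketch, assuming the classic Azure PowerShell cmdlets and hypothetical names, image, and credentials, an availability set is assigned when the virtual machine configuration is created, and Microsoft Azure then spreads the members across fault domains and update domains:

    $image = "<Windows Server platform image name>"

    $vm1 = New-AzureVMConfig -Name "Web01" -InstanceSize "Medium" -ImageName $image -AvailabilitySetName "WebAVSet" |
           Add-AzureProvisioningConfig -Windows -AdminUsername "fabricadmin" -Password "<password>"

    $vm2 = New-AzureVMConfig -Name "Web02" -InstanceSize "Medium" -ImageName $image -AvailabilitySetName "WebAVSet" |
           Add-AzureProvisioningConfig -Windows -AdminUsername "fabricadmin" -Password "<password>"

    New-AzureVM -ServiceName "ContosoSvc" -VMs $vm1, $vm2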

For more information about host updates for Microsoft Azure and how they affect virtual machines and services, see Microsoft Azure Host Updates: Why, When, and How.

Virtual Machine Load Balancing

External communication with virtual machines can occur through Microsoft Azure endpoints. These endpoints are used for different purposes, such as load-balanced traffic or direct virtual machine connectivity like RDP or SSH. You define endpoints that are associated with specific ports, and each endpoint is assigned a specific communication protocol. An endpoint can be assigned either a TCP-based or UDP-based transport protocol (TCP includes HTTP and HTTPS traffic).

Each endpoint that is defined for a virtual machine is assigned a public and private port for communication. The private port is defined for setting up communication rules on the virtual machine, and the public port is used by Microsoft Azure to communicate with the virtual machine from external resources.

If you configure it, Microsoft Azure provides round-robin load balancing of network traffic to publicly defined ports of a cloud service. When your cloud service contains instances of web roles or worker roles, you enable load balancing by setting the number of instances that are running in the service to two or more and by defining a public endpoint in the service definition. For virtual machines, you can set up load balancing by creating new virtual machines, connecting them under a cloud service, and adding load-balanced endpoints to the virtual machines.

A load-balanced endpoint is a specific TCP or UDP endpoint that is used by all virtual machines that are contained in a cloud service. The following image shows a load-balanced endpoint that is shared among three virtual machines and uses a public and private Port 80:

Figure 32 More load balanced endpoints

A virtual machine must be in a healthy state to receive network traffic. You can optionally define your own method for determining the health of the virtual machine by adding a load-balancing probe to the load-balanced endpoint. Microsoft Azure probes for a response from the virtual machine every 15 seconds and takes a virtual machine out of the rotation if no response has been received after two probes. You must use Windows PowerShell to define probes on the load balancer.
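The following sketch, which assumes the classic Azure PowerShell cmdlets and hypothetical service, virtual machine, and load-balanced set names, adds the same load-balanced endpoint with an HTTP probe to each virtual machine in the cloud service:

    foreach ($vmName in "Web01", "Web02", "Web03") {
        Get-AzureVM -ServiceName "ContosoSvc" -Name $vmName |
            Add-AzureEndpoint -Name "Web" -Protocol tcp -LocalPort 80 -PublicPort 80 `
                -LBSetName "WebLB" -ProbePort 80 -ProbeProtocol http -ProbePath "/" |
            Update-AzureVM
    }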

Limitations of Virtual Machines in Microsoft Azure

Although virtual machines in Microsoft Azure are full running instances of Windows or Linux from an operating system perspective, some operating system features and capabilities might not be supported, because the machines are virtualized and run on a cloud infrastructure.

The following table provides examples of Windows operating system features that are not supported for virtual machines in Microsoft Azure:

Operating System Role or Feature | Explanation
Hyper-V | It is not supported to run Hyper-V within a virtual machine that is already running on Hyper-V.
Dynamic Host Configuration Protocol (DHCP) | Virtual machines in Microsoft Azure do not support broadcast traffic to other virtual machines.
Failover clustering | Microsoft Azure does not handle failover clustering's "virtual" or floating IP addresses for network resources.
BitLocker on operating system disk | Microsoft Azure does not support the Trusted Platform Module (TPM).
Client operating systems | Licenses for Microsoft Azure do not support client operating systems.
Virtual Desktop Infrastructure (VDI) using Remote Desktop Services (RDS) | Licenses for Microsoft Azure do not support running VDI virtual machines through RDS.

More Microsoft applications are being tested and supported for deployment of virtual machines in Microsoft Azure. For more information, see Microsoft Server Software Support for Microsoft Azure Virtual Machines.

Networking in Microsoft Azure

Network services in Microsoft Azure provide a variety of solutions for network connectivity within Microsoft Azure, and between the on-premises infrastructure and Microsoft Azure. In 2012, Microsoft Azure made substantial upgrades to the fabric and network architecture to flatten the design and significantly increase the horizontal (or node-to-node) bandwidth that is available.

With software improvements, these upgrades provide significant bandwidth between compute and storage that uses a flat-network topology. The specific implementation of the flat network for Microsoft Azure is referred to as the "Quantum 10" (Q10) network architecture. Q10 provides a fully non-blocking, 10 Gbps–based, fully meshed network. It provides an aggregate backplane in excess of 50 terabytes per second (Tbps) of bandwidth for each data center running Microsoft Azure.

Another major improvement for reliability and throughput is moving from a hardware load balancer to a software load balancer. After these upgrades, the storage architecture and design that were described in previous sections were tuned to fully leverage the new Q10 network to provide flat-network storage for Microsoft Azure Storage.

For architectural details of the Q10 design, see VL2: A Scalable and Flexible Data Center Network.

Virtual Network in Microsoft Azure

Virtual Network in Microsoft Azure enables you to create secure site-to-site connectivity and protected private virtual networks in the cloud. You can specify the address space that will be used for your virtual network and the virtual network gateway. Additionally, new name-resolution features allow you to connect directly to role instances and virtual machines by host name. These features allow you to use Microsoft Azure as a protected private virtual network in the cloud.

Before you configure Virtual Network, you should carefully consider possible scenarios. For this release, it can be difficult to make changes after your virtual network has been created and you have deployed role instances and virtual machines. After this stage of deployment, you cannot easily modify the baseline network configuration, and many values cannot be modified without pulling back roles and virtual machines, and then reconfiguring them. Therefore, you should not attempt to create a virtual network and then try to adapt the scenario to fit the network. Scenarios that are enabled by Virtual Network in Microsoft Azure include:

  • Create secure site-to-site network connectivity between Microsoft Azure and your on-premises network, effectively creating a virtual branch office or data center in the cloud. This is possible by using a hosted VPN gateway and a supported VPN gateway device (including the Routing and Remote Access service (RRAS) in Windows Server 2012).
  • Extend your enterprise networks to Microsoft Azure.
  • Migrate existing applications and services to Microsoft Azure.
  • Host name resolution. You can specify an on-premises Domain Name System (DNS) server or a dedicated DNS server that is running elsewhere.
  • Persistent dynamic IP addresses for virtual machines. This means that the internal IP address of your virtual machines will not change, even when you restart a virtual machine.
  • Join virtual machines that are running in Microsoft Azure to your domain that is running on-premises.
  • Create point-to-site virtual networks, which enables individual workstations to establish VPN connectivity to Microsoft Azure virtual networks—for example, for developers at a remote site to connect to network services in Microsoft Azure.

Virtual Network in Microsoft Azure has the following properties:

  • Virtual machines can have only one IP address (or one IP address plus a virtual IP address, if they are load balanced; note that the virtual IP address is not assigned to the virtual machine itself, it's assigned to the gateway device that forwards packets to the virtual machine).
  • Every virtual machine gets an IP address from DHCP; static IP addresses are not supported.
  • Virtual machines on the same virtual network can communicate.
  • Virtual machines on different virtual networks cannot communicate directly and must loop back through the Internet to communicate.
  • Egress traffic from Microsoft Azure is charged.
  • Ingress traffic to Microsoft Azure is free (not charged).
  • All virtual machines by default have Internet access. There is currently no official way to force Internet traffic to go through on-premises devices, such as proxies.
  • There is only one virtual gateway per virtual network.

As mentioned previously, virtual networks and subnets in Microsoft Azure must utilize private (RFC 1918) IP-address ranges.

Site-to-Site VPN in Microsoft Azure

You can link Virtual Network in Microsoft Azure to an on-premises network through a site-to-site VPN connection.

To create a secure VPN connection, the person who will configure the VPN device must coordinate with the person who will create the Management Portal configuration. This coordination is required, because the Management Portal requires IP address information from the VPN device to start the VPN connection and create the shared key. The shared key is then exported to configure the VPN gateway device and complete the connection.

Sample configuration scripts are available for many, but not all, VPN devices. If your VPN device is in the list of supported devices, you can download the corresponding sample configuration script to help you configure the device. If you do not see your VPN device in the list, your device still might work with Virtual Network in Microsoft Azure if it satisfies the requirements. For more information, see About VPN Devices for Virtual Network.
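On the Microsoft Azure side, the gateway can be provisioned and the shared key retrieved with the classic Azure PowerShell cmdlets. This is a minimal sketch; the virtual network and local network site names are hypothetical, and it assumes the network configuration has already been defined in the Management Portal or imported:

    # Provision the hosted VPN gateway for the virtual network
    New-AzureVNetGateway -VNetName "ContosoVNet"

    # Retrieve the shared key that is used to configure the on-premises VPN device
    Get-AzureVNetGatewayKey -VNetName "ContosoVNet" -LocalNetworkSiteName "ContosoHQ"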

Point-to-Site VPN in Microsoft Azure

The point-to-site VPN in Microsoft Azure allows you to set up VPN connections between individual computers and Virtual Network in Microsoft Azure without a VPN device on-premises. This feature is called Point-to-Site Virtual Private Networking, which is the same as a remote access VPN connection from the client to the Azure Virtual Network. It greatly simplifies the setup of secure connections between Microsoft Azure and client computers, whether from an office environment or from remote locations.

It is especially useful for developers who want to connect to Virtual Network in Microsoft Azure (and to the individual virtual machines within it) from behind a corporate firewall or from a remote location.

Because the connection is point-to-site, developers do not need their IT staff to perform any activities to enable it, and no VPN hardware needs to be installed or configured. Instead, you can simply use the VPN client that is built in to Windows to tunnel to Virtual Network in Microsoft Azure. This tunnel uses the Secure Sockets Tunneling Protocol (SSTP), and it can traverse firewalls and proxies automatically while helping to secure the connection.

Here's a visual representation of the point-to-site scenarios:

Figure 33: Connecting virtual networks

Affinity Groups

After you have created a virtual network, an affinity group is also created. When you create resources (such as storage accounts) in Microsoft Azure, an affinity group tells Microsoft Azure that you want to keep these resources located together. After you have an affinity group, you should always reference it when you create related resources.

Name Resolution

Name resolution is an important consideration for virtual network design. Even though you may create a secure site-to-site VPN connection, communication by host name is not possible without name resolution.

To refer to virtual machines and role instances within a cloud service by host name directly, Microsoft Azure provides a name resolution service. This service is used for internal host name resolution within a cloud service. The name resolution service that is provided by Microsoft Azure is a completely separate service from that which is used to access your public endpoints on the Internet.

Before you deploy role instances or virtual machines, you must consider how you want name resolution to be handled. Two options are available: you can use the internal name resolution that is provided by Microsoft Azure or choose to specify a DNS server that is not maintained by Microsoft Azure. Not all configuration options are available for every deployment type. Carefully consider your deployment scenario before you make this choice.

DNS Considerations

When you define a virtual network, Microsoft Azure will provide a DNS Server service. However, if you want to use your existing DNS infrastructure for name resolution, or you have a dependency on Active Directory, you need to define your own. Defining your own in the virtual network configuration doesn't actually create a DNS server; instead you are configuring the DHCP Server service to include the DNS server IP that you define. This DNS server could be a reference to an existing on-premises DNS server, or a new DNS server that you will provision in the cloud.

Configuring your virtual network to use the name resolution service that is provided by Microsoft Azure is a relatively simple option. However, you may require a more full-featured DNS solution to support virtual machines or complex configurations. Your choice of name resolution method should be based on the scenario that it will support. You can use the following table to help make your decision.

Scenario | Name Resolution | Points to Consider
Cross-premises: Name resolution between role instances or virtual machines in Microsoft Azure and on-premises computers | DNS solution of your choice (not Microsoft Azure service) | Name resolution (DNS) design; address space; supported VPN gateway device; Internet-accessible IP address for your VPN gateway device
Cross-premises: Name resolution between on-premises computers and role instances or virtual machines in Microsoft Azure | DNS solution of your choice (not Microsoft Azure service) | Name resolution (DNS) design; address space; supported VPN gateway device; Internet-accessible IP address for your VPN gateway device
Name resolution between role instances located in the same cloud service | Microsoft Azure name resolution service (internal) | Name resolution (DNS) design
Name resolution between virtual machines located in the same cloud service | Microsoft Azure name resolution service (internal) | Name resolution (DNS) design
Name resolution between virtual machines and role instances located in the same Virtual Network, but different cloud services | DNS solution of your choice (or Microsoft Azure service) | Name resolution (DNS) design; address space; supported VPN gateway device; Internet-accessible IP address for your VPN gateway device
Name resolution between virtual machines and role instances that are located in the same cloud service but not in a Microsoft Azure Virtual Network | Not applicable | Virtual machines and role instances cannot be deployed in the same cloud service
Name resolution between role instances that are located in different cloud services but not in a Microsoft Azure Virtual Network | Not applicable | Connectivity between virtual machines or role instances in different cloud services is not supported outside a virtual network
Name resolution between virtual machines that are located in the same Microsoft Azure Virtual Network | DNS solution of your choice (not Microsoft Azure service) | Name resolution (DNS) design; address space; supported VPN gateway device; Internet-accessible IP address for your VPN gateway device
Use name resolution to direct traffic between data centers |  | See Microsoft Azure Traffic Manager
Control the distribution of user traffic to services hosted by Microsoft Azure |  | See Microsoft Azure Traffic Manager

Although the name resolution service that is provided by Microsoft Azure requires very little configuration, it is not the appropriate choice for all deployments. If your network requires name resolution across cloud services or across premises, you must use your own DNS server. If you want to register additional DNS records, you will have to use a DNS solution that is not provided by Microsoft Azure.

Microsoft Azure Traffic Manager

Microsoft Azure Traffic Manager allows you to control the distribution of user traffic to Microsoft Azure hosted services. The services can be running in the same data center or in different data centers around the world. Traffic Manager works by applying an intelligent policy engine to DNS queries for your domain names.

The following conceptual diagram demonstrates Traffic Manager routing. A user browses to the company domain, www.contoso.com, and the request must reach a hosted service that can accept it. The Traffic Manager policy dictates which hosted service receives the request. Although Traffic Manager conceptually routes traffic to a given hosted service, the actual process is slightly different because it uses DNS. No actual service traffic routes through Traffic Manager. The client computer calls the hosted service directly when Traffic Manager resolves the DNS entry for the company domain to the IP address of a hosted service.

Figure 34: Name resolution

The numbers in the preceding diagram correspond to the numbered descriptions in the following list:

  1. User traffic to company domain: The user requests information by using the company domain name. The typical process to resolve a DNS name to an IP address begins. Company domains must be reserved through normal Internet domain-name registration processes and are maintained outside of Traffic Manager. In this diagram, the company domain is www.contoso.com.
  2. Company domain to Traffic Manager domain: The DNS resource record for the company domain points to a Traffic Manager domain that is maintained in Microsoft Azure Traffic Manager. In the example, the Traffic Manager domain is contoso.trafficmanager.net.
  3. Traffic Manager domain and policy: The Traffic Manager domain is part of the Traffic Manager policy. Traffic enters through the domain. The policy dictates how to route that traffic.
  4. Traffic Manager policy rules processed: The Traffic Manager policy uses the chosen load-balancing method and monitoring status to determine which Microsoft Azure hosted service should service the request.
  5. Hosted-service domain name sent to user: Traffic Manager returns the DNS name of the chosen hosted service to the user. The user's local DNS server resolves that domain name to the IP address of the chosen hosted service.
  6. User calls hosted service: The client computer calls the chosen hosted service directly by using the returned IP address. Because the company domain and resolved IP address are cached on the client computer, the user continues to interact with the chosen hosted service until its local DNS cache expires.

Note The Windows operating system caches DNS host entries for the duration of their Time to Live (TTL). Whenever you evaluate Traffic Manager policies, retrieving host entries from the cache bypasses the policy, and you can observe unexpected behavior. If the TTL of a DNS host entry in the cache expires, new requests for the same host name should result in the client running a fresh DNS query. However, browsers typically cache these entries for longer periods, even after their TTL has expired. To reflect the behavior of a Traffic Manager policy accurately when accessing the application through a browser, it is necessary to force the browser to clear its DNS cache before each request.

Repeat: The process repeats itself when the client's DNS cache expires. The user might receive the IP address of a different hosted service, depending on the load-balancing method that is applied to the policy and the health of the hosted service at the time of the request.
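To avoid the cached-entry behavior that is described in the preceding note when you evaluate a Traffic Manager policy, the local DNS client cache can be cleared before each request. A minimal sketch on Windows Server 2012 or Windows 8 follows (earlier versions of Windows can use ipconfig /flushdns instead):

    Clear-DnsClientCache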

Additional details about this process are available in the Microsoft Azure Traffic Manager documentation.

Microsoft Azure Content Delivery Network

Microsoft Azure Content Delivery Network (CDN) offers developers a global solution for delivering high-bandwidth content by caching blobs and static content of compute instances on physical nodes in the United States, Europe, Asia, Australia, and South America. For a current list of CDN node locations, see Microsoft Azure CDN Node Locations.

The benefits of using CDN to cache Microsoft Azure data include:

  • Better performance and user experience for users who are far from a content source and are using applications for which many "Internet trips" are required to load content.
  • Large distributed scale to better handle instantaneous high load—for example, at the start of an event, such as a product launch.

To use Microsoft Azure CDN, you must have a Microsoft Azure subscription and enable the feature on the storage account or hosted service in the Microsoft Azure Management Portal. CDN is an add-on feature to your subscription, and it has a separate billing plan. For more information, see Microsoft Azure Pricing.

Fabric and Fabric Management

The PLA patterns at a high level include the concepts of compute, storage, and network fabrics. These fabrics are logically and physically independent of the components, such as those in System Center, that provide management of the underlying fabric.

Figure 33: Fabric and fabric management components

Fabric

The definition of the fabric is all of the physical and virtual resources under the scope of management of the fabric management infrastructure. The fabric is typically the entire compute, storage, and network infrastructure—usually implemented as Hyper-V host clusters—being managed by the System Center infrastructure.

For private cloud infrastructures, the fabric constitutes a resource pool that consists of one or more scale units. In a modular architecture, the concept of a scale unit refers to the point to which a module in the architecture can scale before another module is required. For example, an individual server is a scale unit, because it can be expanded to a certain point in terms of CPU and RAM; however, once it reaches its maximum scalability, an additional server is required to continue scaling. Each scale unit also has an associated amount of physical installation and configuration labor. With large scale units, like a preconfigured full rack of servers, the labor overhead can be minimized.

It is critical to know the scale limits of all hardware and software components when determining the optimum scale units for the overall architecture. Scale units enable the documentation of all the requirements (for example, space, power, HVAC, or connectivity) needed for implementation.

Fabric Management

Fabric management is the concept of treating discrete capacity pools of servers, storage, and networks as a single fabric. The fabric is then subdivided into capacity clouds, or resource pools, which carry characteristics like delegation of access and administration, service-level agreements (SLAs), and cost metering. Fabric management enables the centralization and automation of complex management functions that can be carried out in a highly standardized, repeatable fashion to increase availability and lower operational costs.

Fabric Management Host Architecture

In a private cloud infrastructure, it is recommended that the systems that make up the fabric resource pools be physically separate from the systems that provide fabric management. Much like the concept of having a top-of-rack (ToR) switch, it is recommended to provide separate fabric management hosts to manage the underlying services that provide capacity to the private cloud infrastructure. This model helps make sure that the availability of the fabric is separated from fabric management, and regardless of the state of the underlying fabric resource pools, management of the infrastructure and its workloads is maintained at all times.

To support this level of availability and separation, private cloud architectures should contain a separate set of hosts (minimum of two) configured as a failover cluster in which the Hyper-V role is enabled. Furthermore, these hosts should contain highly available virtualized instances of the management infrastructure (System Center) to support fabric management operations that are stored on dedicated CSVs.

Non-Converged Architecture Pattern

This section contains an architectural example that is based on the non-converged pattern validation requirements that were outlined in the previous sections. This example provides guidance about the hardware that is required to build the non-converged pattern reference architecture by using high-level, non-OEM–specific system models.

As explained earlier, the non-converged pattern comprises traditional blade or non-blade servers that utilize a standard network and storage network infrastructure to support a high availability Hyper-V failover cluster fabric infrastructure. This infrastructure pattern provides the performance of a large-scale Hyper-V host infrastructure and the flexibility of utilizing existing infrastructure investments at a lower cost than a converged architecture.

The following figure outlines a logical structure of components that follow this architectural pattern.

Figure 34 Non-converged architecture pattern

Compute

The compute infrastructure is one of the primary elements that must scale to support a large number of workloads. In a non-converged fabric infrastructure, a set of hosts that have the Hyper-V role enabled provide the fabric with the capability to achieve scale in the form of a large-scale failover cluster.

The following figure provides an overview of the compute layer of the private cloud fabric infrastructure.

Figure 35 Compute minimum configuration

Hyper-V Host Infrastructure

The Hyper-V host infrastructure is comprised of a minimum of four hosts and a maximum of 64 hosts in a single Hyper-V failover-cluster instance. Although Windows Server failover clustering supports a minimum of two nodes, a configuration at that scale does not provide a sufficient reserve capacity to achieve cloud attributes such as elasticity and resource pooling.

As with any failover-cluster configuration, reserve capacity must be accounted for in the host infrastructure. Adopting a simple n-1 methodology does not always provide a sufficient amount of reserve capacity to support the workloads that are running on the fabric infrastructure. For true resilience to outages, we recommend that you size the reserve capacity within a single scale unit to one or more hosts. This is critical for delivering availability within a private cloud infrastructure and it is a key consideration when you are advertising the potential workload capacity of the fabric infrastructure.

Equally important to the overall density of the fabric is the amount of physical memory that is available for each fabric host. For service provider and enterprise configurations, a minimum of 192 GB of memory is required. As the demand for memory within workloads increases, this becomes the second largest factor for scale and density in the compute fabric architecture.

As discussed earlier, Hyper-V provides Dynamic Memory to support higher densities of workloads through a planned oversubscription model. Although it is safe to assume that this feature will provide increased density for the fabric, a private cloud infrastructure should carefully consider the use of Hyper-V Dynamic Memory as part of the compute design due to supportability limitations and performance requirements in certain workloads. Always refer to the vendor workload recommendations and support guidelines when you enable Hyper-V Dynamic Memory.

Additional considerations that should be accounted for in density calculations include:

  • The amount of startup RAM that is required for each operating system
  • The minimum RAM that is allocated to the virtual machine after startup for normal operations
  • The maximum RAM that is assigned to the system to prevent oversubscription scenarios when memory demand is high

The Hyper-V parent partition (host) must have sufficient memory to provide services such as I/O virtualization, snapshot, and management to support the child partitions (guests). Previous guidance was provided to tune the parent partition reserve; however, when Dynamic Memory is used, the root reserve is calculated automatically (based on the root physical memory and NUMA architecture of the hosts) and no longer requires manual configuration.
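As a minimal sketch, assuming the Hyper-V module in Windows Server 2012 and hypothetical virtual machine and memory values (which should always follow workload vendor guidance), Dynamic Memory settings are applied per virtual machine:

    Set-VMMemory -VMName "FabricGuest01" -DynamicMemoryEnabled $true `
        -StartupBytes 2GB -MinimumBytes 1GB -MaximumBytes 8GB -Buffer 20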

Although guidance about network connectivity that uses onboard network connections is provided in the following section, you should make sure that out-of-band (OOB) network-management connectivity is provided to support the remote management and provisioning capabilities that are found within System Center.

To address these capabilities, the compute infrastructure should support a minimum of one OOB management interface, with support for Intelligent Platform Management Interface (IPMI) 1.5/Data Center Management Interface (DCMI) 1.0 or Systems Management Architecture for Server Hardware (SMASH) 1.0 over WS-Man. Failure to include this component will result in a compute infrastructure that cannot utilize automated provisioning and management capabilities in the private cloud solution.

It should be assumed that customers will also require multiple types (or classifications) of resource pools to support a number of scenarios and associated workloads. These types of resource pools are expected to be evaluated as part of the capabilities that the resulting fabric will be required to provide.

For example, a resource pool that is intended for VDI resources might have different hardware, such as specialized graphics cards, to support RemoteFX capabilities within Hyper-V. For these reasons, options for a compute infrastructure that provide advanced resource pool capabilities, such as the RemoteFX resource pool, should be available to address these needs and provide a complete solution.

Network

When you are designing the fabric network for the Hyper-V failover cluster in Windows Server, it is important to provide the necessary hardware and network throughput to provide resiliency and Quality of Service (QoS). Resiliency can be achieved through availability mechanisms, and QoS can be provided through dedicated network interfaces or through a combination of hardware and software QoS capabilities.

The following figure provides an overview of the network layer of the private cloud fabric infrastructure.

Figure 36 Network minimum configuration

Host Connectivity

During the design of the network topology and associated network components of the private cloud infrastructure, the following key considerations apply:

  • Provide adequate network port density: Designs should contain top-of-rack switches with sufficient density to support all host network interfaces.
  • Provide adequate interfaces to support network resiliency: Designs should contain a sufficient number of network interfaces to establish redundancy through NIC Teaming.
  • Provide network Quality of Service (QoS): Although dedicated cluster networks are an acceptable way to achieve QoS, utilizing high-speed network connections in combination with hardware- or software-defined network QoS policies provides a more flexible solution.

For PLA pattern designs, a minimum of two 10 GbE network interfaces and one OOB management connection is assumed as the baseline of network connectivity for the fabric architecture. The two 10 GbE interfaces are used for cluster traffic, and the OOB connection is available as a management interface. To provide resiliency, additional interfaces can be added and teamed by using the NIC Teaming feature in Windows Server.
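A minimal sketch of creating such a team with the inbox NIC Teaming cmdlets in Windows Server 2012 follows; the team and adapter names are hypothetical, and the teaming mode and load-balancing algorithm should match the switch configuration and overall design:

    New-NetLbfoTeam -Name "FabricTeam" -TeamMembers "NIC1", "NIC2" `
        -TeamingMode SwitchIndependent -LoadBalancingAlgorithm HyperVPort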

It is recommended to have redundant network communication between all private cloud cluster nodes. As previously described, host connectivity in a private cloud infrastructure should support the following types of communication that are required by Hyper-V and the failover clusters that make up the fabric:

  • Host management
  • Virtual machine
  • Live migration
  • iSCSI (if required)
  • Intra-cluster communication and CSV

Host management consists of isolated network traffic to manage the parent partition (host), and virtual machine traffic is on an accessible network for clients to access the virtual machines. The usage of the virtual machine traffic is highly dependent on the running workload and the interaction of the client with that application or service.

Live migration traffic is intermittent and used during virtual machine mobility scenarios such as planned failover events. This has the potential to generate a large amount of network traffic over short periods during transition between nodes. Live migration will default to the second lowest metric if three or more networks are configured in failover clustering.

When iSCSI is used, a dedicated storage network should be deployed within the fabric (because this is the non-converged pattern where storage traffic has a dedicated network). These interfaces should be disabled for cluster use, because cluster traffic can contribute to storage latency. Intracluster communication and CSV traffic consist of the following traffic types:

  • Network health monitoring
  • Intra-cluster communication
  • CSV I/O redirection

Network health monitoring traffic consists of heartbeats that are sent to monitor the health status of network interfaces in a full mesh manner. This lightweight unicast traffic (approximately 134 bytes) is sent between cluster nodes over all cluster-enabled networks.

Because of its sensitivity to latency, Quality of Service is important for this traffic, as opposed to bandwidth, because if heartbeat traffic becomes blocked due to network saturation, fabric nodes could be removed from cluster membership. By default, nodes exchange these heartbeats every second, and a node is considered to be down if it does not respond to five heartbeats.
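These defaults can be inspected (and, if necessary, tuned) on the cluster object; the following sketch assumes the FailoverClusters PowerShell module and shows the same-subnet values:

    $cluster = Get-Cluster
    $cluster.SameSubnetDelay        # milliseconds between heartbeats (default 1000)
    $cluster.SameSubnetThreshold    # missed heartbeats before a node is considered down (default 5)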

Intra-cluster communication is variable (based on workload), and it is responsible for sending database updates and synchronizing state changes between the nodes in the fabric cluster. This lightweight traffic communicates over a single interface. As with network health monitoring, Quality of Service rather than raw bandwidth is the primary concern, because this type of traffic is sensitive to latency during state changes, such as failover.

CSV I/O redirection traffic consists of lightweight metadata updates, and it can communicate over the same interface as the intracluster communication mentioned previously. It requires a defined Quality of Service to function properly. CSV uses SMB to route I/O over the network between nodes during failover events, so sufficient bandwidth is required to handle the forwarded I/O between cluster nodes. Additionally, CSV traffic will utilize SMB Multichannel and advanced network adapter capabilities such as RDMA; however, use of Jumbo Frames has shown little increase in performance.

Storage

Storage provides the final component for workload scaling, and as for any workload, storage must be designed properly to provide the required performance and capacity for overall fabric scale. In a non-converged fabric infrastructure, traditional SAN infrastructures that are connected over Fibre Channel or iSCSI provide the fabric with sufficient capacity to achieve storage scale.

The following figure provides an overview of the storage infrastructure for the non-converged pattern.

Figure 37 Storage minimum configuration

Storage Connectivity

For the operating system volume of the parent partition that is using direct-attached storage to the host, an internal SATA or SAS controller is required, unless the design utilizes SAN for all system storage requirements, including boot from SAN for the host operating system (Fibre Channel and iSCSI boot are supported in Windows Server). Depending on the storage protocol and devices that are used in the non-converged storage design, the following adapters are required to allow shared storage access:

  • If using Fibre Channel SAN, two or more host bus adapters (HBAs)
  • If using iSCSI, two or more 10 GbE network adapters or HBAs

As described earlier, Hyper-V in Windows Server supports the ability to present SAN storage to the guest workloads that are hosted on the fabric infrastructure by using virtual Fibre Channel adapters. Virtual SANs are logical equivalents of virtual network switches within Hyper-V, and each virtual SAN maps to a single physical Fibre Channel uplink. To support multiple HBAs, a separate virtual SAN must be created per physical Fibre Channel HBA and mapped exactly to its corresponding physical topology.

When configurations use multiple HBAs, MPIO must be enabled within the virtual machine workload. A virtual SAN assignment should follow a pattern that is similar to a Hyper-V virtual switch assignment, in that if there are different classifications of service within the SAN, it should be reflected within the fabric.
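A minimal sketch, assuming the Hyper-V module in Windows Server 2012 and hypothetical SAN and virtual machine names; one virtual SAN is created per physical Fibre Channel fabric, and a virtual Fibre Channel adapter is then added to the guest for each SAN:

    # Create a virtual SAN backed by the host Fibre Channel initiator port for fabric A
    # (select only the port that uplinks to that fabric)
    $portA = Get-InitiatorPort | Where-Object ConnectionType -eq "Fibre Channel" | Select-Object -First 1
    New-VMSan -Name "FC-FabricA" -HostBusAdapter $portA

    # Add a virtual Fibre Channel adapter for that SAN to a guest virtual machine
    Add-VMFibreChannelHba -VMName "GuestCluster01" -SanName "FC-FabricA"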

As discussed in earlier sections, all physical Fibre Channel equipment must support NPIV. Hardware vendors must also provide drivers that display the Designed for Windows logo for all Fibre Channel HBAs, unless the drivers are provided in Windows Server. If zoning that is based on physical Fibre Channel switch ports is part of the fabric design, all physical ports must be added to allow for virtual machine mobility scenarios across hosts in the fabric cluster. Although virtual machines can support iSCSI boot, boot from SAN is not supported over the virtual Fibre Channel adapter and should not be considered as part of workload design.

Storage Infrastructure

The key attribute of the storage infrastructure for the non-converged pattern is the use of a traditional SAN infrastructure to provide access to storage to the fabric, fabric management, and workload layers. As discussed earlier, the primary reasons to adopt or maintain this design are to preserve existing investments in SAN or to maintain the current level of flexibility and capabilities that a SAN-based storage-array architecture provides.

For Hyper-V failover cluster and workload operations in a non-converged infrastructure, the fabric components utilize the following types of storage:

  • Operating system: Non-shared physical boot disks (direct-attached storage or SAN) for the fabric management host servers
  • Cluster witness: Shared witness disk or file share to support the failover cluster quorum
  • Cluster Shared Volumes (CSV): One or more CSV LUNs for virtual machines (Fibre Channel or iSCSI), as presented by the SAN
  • Guest clustering [optional]: Shared Fibre Channel, shared VHDX, or shared iSCSI LUNs for guest clustering

The following figure provides a conceptual view of the storage architecture for the non-converged pattern.

Figure 38 Non-converged architecture pattern

As outlined in the overview, fabric and fabric management host controllers require sufficient storage to account for the operating system and paging files. We recommend that virtual memory in Windows Server be configured as "Automatically manage paging file size for all drives."

Although boot from SAN by using Fibre Channel or iSCSI storage is supported in Windows Server, it is common practice to configure local storage on each standard non-converged server for these purposes. In these cases, local storage should include a minimum of two disks that are configured as RAID 1 (mirror), with an optional global hot spare.

To provide quorum for the server infrastructure, we recommend utilizing a quorum configuration of Node and Disk Majority. A cluster witness disk is required to support this quorum model. In non-converged pattern configurations, we recommend that you provide a 1 GB witness disk that is formatted as NTFS for all fabric and fabric management clusters. This provides resiliency and prevents partition-in-time scenarios within the cluster.
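A minimal sketch of applying this quorum model with the FailoverClusters module follows; the cluster name and the witness disk resource name are hypothetical:

    Set-ClusterQuorum -Cluster "FabricCluster01" -NodeAndDiskMajority "Witness Disk"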

As described in earlier sections, Windows Server provides multiple hosts access to a shared disk infrastructure through CSV. For non-converged patterns, the SAN should be configured to provide adequate storage for virtual machine workloads. Because workload virtual disks often total many gigabytes, we recommend that, where the workload supports it, dynamically expanding disks be used to provide higher density and more efficient use of storage. Additional SAN capabilities, such as thin provisioning of LUNs, can assist with the consumption of physical space. However, this functionality should be evaluated to help make sure that workload performance is not negatively affected.

For the purposes of Hyper-V failover clustering, each CSV must be configured in Windows as a basic disk that is formatted as NTFS (FAT and FAT32 are not supported for CSV). A CSV cannot be used as a witness disk, and it cannot have Windows Data Deduplication enabled. Although supported, ReFS should not be used in conjunction with a CSV for Hyper-V workloads. There is no restriction on the number of virtual machines that an individual CSV can support, because metadata updates on a CSV are orchestrated on the server side and run in parallel to provide no interruption and increased scalability.
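As a minimal sketch, assuming the FailoverClusters module and a hypothetical clustered disk resource that has already been formatted as NTFS, the disk is added to Cluster Shared Volumes as follows:

    Add-ClusterSharedVolume -Cluster "FabricCluster01" -Name "Cluster Disk 2"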

Performance considerations fall primarily on the IOPS that the SAN provides, given that multiple servers from the Hyper-V failover cluster stream I/O to a commonly shared LUN. Providing more than one CSV to the Hyper-V failover cluster within the fabric can increase performance, depending on the SAN configuration.

To support guest clustering, LUNs can be presented to the guest operating system through iSCSI or Fibre Channel. Configurations for the non-converged pattern should include sufficient space on the SAN to support the number of LUNs that are needed for workloads with high-availability requirements that must be satisfied within the guest virtual machines and associated applications.

Converged Architecture Pattern

This section contains an architectural example that is based on the converged-pattern validation requirements that were previously outlined. This example will provide guidance about the hardware that is required to build the converged pattern reference architecture by using high-level, non-OEM–specific system models.

As explained earlier, the converged pattern comprises advanced blade servers that utilize a converged-network and storage-network infrastructure (often referred to as converged network architecture) to support a high availability Hyper-V failover-cluster fabric infrastructure. This infrastructure pattern provides the performance of a large-scale Hyper-V host infrastructure and the flexibility of utilizing software-defined networking capabilities at a higher system density than can be achieved through traditional non-converged architectures.

Although the converged and non-converged architectures share many aspects, this section outlines the key differences between the two patterns. The following diagram outlines an example logical structure of components that follow the converged architectural pattern.

Figure 39 Converged architecture pattern

In the converged pattern, the physical converged network adapters (CNAs) are teamed, and they present network adapters and Fibre Channel HBAs to the parent operating system. From the perspective of the parent operating system, it appears that network adapters and Fibre Channel HBAs are installed. The configuration of teaming and other settings is performed at the hardware level.

Compute

As in the non-converged pattern, the compute infrastructure remains the primary element that provides fabric scale to support a large number of workloads. Identical to the non-converged pattern, the converged fabric infrastructure consists of an array of hosts that have the Hyper-V role enabled, which provides the fabric with the capability to achieve scale in the form of a large-scale failover cluster.

Figure 40 provides an overview of the compute layer of the private cloud fabric infrastructure.

Figure 40 Compute minimum configuration

With the exception of storage connectivity, the compute infrastructure of the converged pattern is similar to that of the non-converged pattern. The difference is that the Hyper-V host clusters utilize FCoE or iSCSI to connect to storage over a high-speed, converged network architecture.

Hyper-V Host Infrastructure

As in non-converged infrastructures, the server infrastructure comprises a minimum of four hosts and a maximum of 64 hosts in a single Hyper-V failover-cluster instance. Although Windows Server failover clustering supports a minimum of two nodes, a configuration at that scale does not provide sufficient reserve capacity to achieve cloud attributes such as elasticity and resource pooling.

Converged infrastructures typically utilize blade servers and enclosures to provide compute capacity. In large-scale deployments in which multiple resource pools exist across multiple blade enclosures, we recommend that no more than 25 percent of a single cluster be contained in any one blade enclosure.

Network

When you are designing the fabric network for the Windows Server Hyper-V failover cluster, it is important to provide the necessary hardware and network throughput to provide resiliency and Quality of Service (QoS). Resiliency can be achieved through availability mechanisms, and QoS can be provided through dedicated network interfaces or through a combination of hardware and software QoS capabilities.

Figure 41 provides an overview of the network layer of the private cloud fabric infrastructure.

Figure 41 Network minimum configuration

Host Connectivity

During the design of the network topology and associated network components of the private cloud infrastructure, the following key considerations apply:

  • Provide adequate network port density: Designs should contain top-of-rack switches with sufficient density to support all host network interfaces.
  • Provide adequate interfaces to support network resiliency: Designs should contain a sufficient number of network interfaces to establish redundancy through NIC Teaming.
  • Provide network Quality of Service: Although dedicated cluster networks are an acceptable way to achieve QoS, high-speed network connections in combination with hardware-defined or software-defined network QoS policies provide a more flexible solution.

For PLA pattern designs, two 10 GbE converged network adapters (CNAs) and one out-of-band (OOB) management connection represent the minimum baseline of network connectivity for the fabric architecture. The two CNAs carry cluster and workload traffic, and the OOB connection serves as a management interface. To provide additional resiliency, more interfaces can be added and teamed by using a network adapter teaming solution from your OEM hardware. We recommend that you provide redundant network communication between all cluster nodes. As previously described, host connectivity in a private cloud infrastructure should support the following types of communication that are required by Hyper-V and the failover clusters that make up the fabric (a configuration sketch follows the list):

  • Host management
  • Virtual machine
  • Live migration
  • FCoE or iSCSI
  • Intracluster communication and CSV
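
The following sketch shows one way to apportion bandwidth among these traffic types when they share the converged adapters through a Hyper-V virtual switch in weight-based QoS mode. The switch name, adapter name, and weight values are illustrative assumptions; OEM hardware teaming or hardware QoS can be used instead.

    # Create a virtual switch in weight-based QoS mode on the converged team
    # (the adapter name "ConvergedTeam" is an example).
    New-VMSwitch -Name "FabricSwitch" -NetAdapterName "ConvergedTeam" `
        -MinimumBandwidthMode Weight -AllowManagementOS $false

    # Create host virtual network adapters for each traffic class.
    Add-VMNetworkAdapter -ManagementOS -Name "Management"    -SwitchName "FabricSwitch"
    Add-VMNetworkAdapter -ManagementOS -Name "LiveMigration" -SwitchName "FabricSwitch"
    Add-VMNetworkAdapter -ManagementOS -Name "Cluster-CSV"   -SwitchName "FabricSwitch"

    # Assign minimum bandwidth weights (values are examples only).
    Set-VMNetworkAdapter -ManagementOS -Name "Management"    -MinimumBandwidthWeight 10
    Set-VMNetworkAdapter -ManagementOS -Name "LiveMigration" -MinimumBandwidthWeight 30
    Set-VMNetworkAdapter -ManagementOS -Name "Cluster-CSV"   -MinimumBandwidthWeight 10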

In a converged network architecture, LAN and storage traffic utilize Ethernet as the transport. FCoE and iSCSI are possible choices for the converged infrastructure pattern. Although using storage over SMB could also be considered a converged architecture, it is treated as a separate design pattern in this guide.

The converged pattern refers to either FCoE or iSCSI approaches. Proper network planning is critical in a converged design. Use of Quality of Service (QoS), VLANs, and other isolation or reservation approaches is strongly recommended, so that storage and LAN traffic is appropriately balanced.
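
Where Data Center Bridging (DCB) is used to balance storage and LAN traffic on the converged adapters, the reservation can be expressed with the QoS cmdlets in Windows Server, as in the following sketch; the priority value, bandwidth percentage, and adapter names are assumptions for illustration.

    # Tag iSCSI traffic with an 802.1p priority (the value is an example).
    New-NetQosPolicy -Name "iSCSI" -iSCSI -PriorityValue8021Action 4

    # Reserve bandwidth for that priority with ETS and enable priority flow control.
    New-NetQosTrafficClass -Name "iSCSI" -Priority 4 -BandwidthPercentage 40 -Algorithm ETS
    Enable-NetQosFlowControl -Priority 4

    # Apply the DCB configuration to the converged network adapters (names are examples).
    Enable-NetAdapterQos -Name "CNA1","CNA2"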

Storage

Storage provides the final component for workload scaling, and as with any workload, it must be designed properly to provide the required performance and capacity for overall fabric scale. In a converged fabric infrastructure, connectivity to the storage uses an Ethernet-based approach, such as iSCSI or FCoE.

Figure 42 provides an overview of the storage infrastructure for the converged pattern.

Figure 42 Storage minimum configuration

Storage Connectivity

For the operating system volume of the parent partition that is using direct-attached storage to the host, an internal SATA or SAS controller is required, unless the design utilizes SAN for all system-storage requirements, including boot from SAN for the host operating system (Fibre Channel and iSCSI boot are supported in Windows Server).

Depending on the storage protocol and devices that are used in the converged storage design, the following adapters are required to allow shared storage access:

  • If using Fibre Channel SAN, two or more converged network adapters (CNAs)
  • If using iSCSI, two or more 10 Gigabit Ethernet (10 GbE) network adapters or iSCSI HBAs

As described earlier, Windows Server Hyper-V supports the ability to present SAN storage to the guest workloads that are hosted on the fabric infrastructure by using virtual Fibre Channel adapters. Virtual SANs are logical equivalents of virtual network switches within Hyper-V, and each Virtual SAN maps to a single physical Fibre Channel uplink. To support multiple CNAs, a separate Virtual SAN must be created per physical Fibre Channel CNA and mapped exactly to its corresponding physical topology. When configurations use multiple CNAs, MPIO must be enabled within the virtual machine workload itself. Virtual SAN assignment should follow a pattern similar to Hyper-V virtual switch assignment: if there are different classifications of service within the SAN, they should be reflected within the fabric.
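
The following sketch illustrates this mapping with the Hyper-V cmdlets; the virtual SAN names, worldwide names, and virtual machine name are placeholders for the values of the physical HBA ports and workload in a given host.

    # Create one Virtual SAN per physical Fibre Channel uplink
    # (worldwide names are placeholders for the physical HBA ports).
    New-VMSan -Name "FCSan-A" -WorldWideNodeName C003FF0000FFFF00 -WorldWidePortName C003FF0000FFFF01
    New-VMSan -Name "FCSan-B" -WorldWideNodeName C003FF0000FFFF00 -WorldWidePortName C003FF0000FFFF02

    # Add one virtual Fibre Channel adapter per Virtual SAN to the workload;
    # MPIO is then enabled inside the virtual machine itself.
    Add-VMFibreChannelHba -VMName "Workload01" -SanName "FCSan-A"
    Add-VMFibreChannelHba -VMName "Workload01" -SanName "FCSan-B"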

As discussed in earlier sections, all physical Fibre Channel equipment must support NPIV. Hardware vendors must also provide drivers that display the Designed for Windows logo for all Fibre Channel CNAs, unless drivers are provided in-box. If zoning that is based on physical Fibre Channel switch ports is part of the fabric design, all physical ports must be added to the zone to allow for virtual machine mobility scenarios across hosts in the fabric cluster. Although virtual machines can support iSCSI boot, boot from SAN is not supported over the virtual Fibre Channel adapter, and it should not be considered as part of workload design.

Storage Infrastructure

The key attribute of the storage infrastructure for the converged pattern is the use of a traditional SAN infrastructure, but it is accessed through an Ethernet transport for the fabric, fabric management, and workload layers. As discussed earlier, the primary reason to adopt or maintain this design is to preserve existing investments in a SAN infrastructure or to maintain the current level of flexibility and capabilities that a SAN-based storage-array architecture provides, while consolidating to a single Ethernet network infrastructure.

For Hyper-V failover-cluster and workload operations in a converged infrastructure, the fabric components utilize the following types of storage:

  • Operating system: Non-shared physical boot disks (direct-attached storage or SAN) for the fabric management host servers (unless using boot from SAN)
  • Cluster witness: Shared witness disk or file share to support the failover cluster quorum
  • Cluster Shared Volumes (CSV): One or more shared CSV LUNs for virtual machines (Fibre Channel or iSCSI), as presented by the SAN
  • Guest clustering [optional]: Shared Fibre Channel, shared VHDX, or shared iSCSI LUNs for guest clustering

Figure 43 provides a conceptual view of this architecture for the converged pattern.

Figure 43 Converged architecture pattern

As outlined in the overview, fabric and fabric management host controllers require sufficient storage to account for the operating system and paging files. In Windows Server, we recommend that virtual memory be configured as "Automatically manage paging file size for all drives."

Although boot from SAN by using Fibre Channel or iSCSI storage is supported in Windows Server, it is common to configure storage locally per server for the boot volume. In these cases, local storage should include a minimum of two disks that are configured as RAID 1 (mirror), with an optional global hot spare.

To provide quorum for the server infrastructure, we recommend that you utilize a quorum configuration of Node and Disk Majority. A cluster witness disk is required to support this quorum model. In converged pattern configurations, we recommend that a 1 GB witness disk that is formatted as NTFS be provided for all fabric and fabric management clusters. This provides resiliency and prevents partition-in-time scenarios within the cluster.

As described in earlier sections, Windows Server provides multiple hosts with access to a shared disk infrastructure through CSV. For converged patterns, the SAN should be configured to provide adequate storage for virtual machine workloads. Because workload virtual disks often exceed multiple gigabytes, we recommend that, where the workload supports it, dynamically expanding disks be used to provide higher density and more efficient use of storage. Additional SAN capabilities, such as thin provisioning of LUNs, can help manage the consumption of physical space. However, this functionality should be evaluated to make sure that workload performance is not negatively affected.

For the purposes of Hyper-V failover clustering, a CSV must be configured in Windows as a basic disk that is formatted as NTFS (FAT and FAT32 are not supported for CSV). A CSV cannot be used as a witness disk, and it cannot have Windows Data Deduplication enabled. Although ReFS is supported, it should not be used with CSV for Hyper-V workloads. There is no restriction on the number of virtual machines that an individual CSV can support, because metadata updates on a CSV are orchestrated on the server side and run in parallel, which provides uninterrupted operation and increased scalability.

Performance considerations fall primarily on the IOPS that the SAN provides, given that multiple servers from the Hyper-V failover cluster stream I/O to a commonly shared LUN. Providing more than one CSV to the Hyper-V failover cluster within the fabric can increase performance, depending on the SAN configuration.

To support guest clustering, LUNs can be presented to the guest operating system through iSCSI or Fibre Channel. Configurations for the converged pattern should include sufficient space on the SAN to support the LUNs that are needed for workloads with high-availability requirements that must be satisfied within the guest virtual machines and associated applications.

Software Defined Infrastructure Architecture Pattern

Key attributes of the Software Defined Infrastructure pattern (previously called the Continuous Availability over SMB Storage pattern) include the use of the SMB 3.02 protocol, and in the case of Variation A, the implementation of the new Scale-Out File Server cluster design pattern in Windows Server.

This section outlines a finished example of a Software Defined Infrastructure pattern that uses Variation A. The following diagram shows the high-level architecture.

Figure 44 Software Defined Infrastructure storage architecture pattern

The design consists of one or more Windows Server Scale-Out File Server clusters (left) combined with one or more Hyper-V host clusters (right). In this sample design, a shared SAS storage architecture is utilized by the Scale-Out File Server clusters. The Hyper-V hosts store their virtual machines on SMB shares in the file cluster, which is built on top of Storage Spaces and Cluster Shared Volumes.

A key choice in the Software Defined Infrastructure pattern is whether to use InfiniBand or Ethernet as the network transport between the failover clusters that are managed by Hyper-V and the clusters that are managed by the Scale-Out File Server. Currently, InfiniBand provides higher speeds per port than Ethernet (56 Gbps for InfiniBand compared to 10 or 40 GbE). However, InfiniBand requires a separate switching infrastructure, whereas an Ethernet-based approach can utilize a single physical network infrastructure.

Compute

The compute infrastructure is one of the primary elements that provides fabric scale to support a large number of workloads. In a Software Defined Infrastructure pattern fabric infrastructure, an array of hosts that have the Hyper-V role enabled provide the fabric with the capability to achieve scale in the form of a large-scale failover cluster.

Figure 45 provides an overview of the compute layer of the private cloud fabric infrastructure.

Figure 45 Compute minimum configuration

With the exception of storage connectivity, the compute infrastructure of this design pattern is similar to the infrastructure of the converged and non-converged patterns. However, the Hyper-V host clusters utilize the SMB protocol over Ethernet or InfiniBand to connect to storage.

Hyper-V Host Infrastructure

The server infrastructure consists of a minimum of four hosts and a maximum of 64 hosts in a single Hyper-V failover cluster instance. Although failover clustering in Windows Server supports a minimum of two nodes, a configuration at that scale does not provide sufficient reserve capacity to achieve cloud attributes such as elasticity and resource pooling.

Note The same sizing and availability guidance that is provided in the Hyper-V Host Infrastructure subsection (in the Non-Converged Architecture Pattern section) applies to this pattern.

Figure 46 provides a conceptual view of this architecture for the Software Defined Infrastructure pattern.

Figure 46 Software Defined Infrastructure architecture pattern

A key factor in this compute infrastructure is determining whether Ethernet or InfiniBand will be utilized as the transport between the Hyper-V host clusters and the Scale-Out File Server clusters. Another consideration is how RDMA (recommended) will be deployed to support the design.

As outlined in previous sections, RDMA cannot be used in conjunction with NIC Teaming. Therefore, in this design, which utilizes a 10 GbE network fabric, each Hyper-V host server in the compute layer contains four 10 GbE network adapters. One pair is for virtual machine and cluster traffic, and it utilizes NIC Teaming. The other pair is for storage connectivity to the Scale-Out File Server clusters, and it is RDMA-capable.
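
A minimal sketch of this adapter layout follows, assuming illustrative adapter names; the virtual machine and cluster pair is teamed, and the RDMA-capable storage pair is left un-teamed.

    # Team the pair that carries virtual machine and cluster traffic.
    New-NetLbfoTeam -Name "VMTeam" -TeamMembers "NIC1","NIC2" `
        -TeamingMode SwitchIndependent -LoadBalancingAlgorithm HyperVPort

    # Leave the storage pair un-teamed and confirm that RDMA is enabled on it.
    Enable-NetAdapterRdma -Name "Storage1","Storage2"
    Get-NetAdapterRdma -Name "Storage1","Storage2"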

Network

When designing the fabric network for the failover cluster for Windows Server Hyper-V, it is important to provide the necessary hardware and network throughput to provide resiliency and Quality of Service (QoS). Resiliency can be achieved through availability mechanisms, while QoS can be provided through dedicated network interfaces or through a combination of hardware and software QoS capabilities.

Figure 47 provides an overview of the network layer of the private cloud fabric infrastructure.

Figure 47 Network minimum configuration

Host Connectivity

When you are designing the network topology and associated network components of the private cloud infrastructure, certain key considerations apply. You should provide:

  • Adequate network port density: Designs should contain top-of-rack switches that have sufficient density to support all host network interfaces.
  • Adequate interfaces to support network resiliency: Designs should contain a sufficient number of network interfaces to establish redundancy through NIC Teaming.
  • Network Quality of Service: Although the use of dedicated cluster networks is an acceptable way to achieve Quality of Service, utilizing high-speed network connections in combination with hardware- or software-defined network QoS policies provides a more flexible solution.
  • RDMA support: For the adapters (InfiniBand or Ethernet) that will be used for storage (SMB) traffic, RDMA support is required.

The network architecture for this design pattern is critical because all storage traffic will traverse a network (Ethernet or InfiniBand) between the Hyper-V host clusters and the Scale-Out File Server clusters.

Storage

Storage Connectivity

For the operating system volume of the parent partition that is using direct-attached storage to the host, an internal SATA or SAS controller is required—unless the design utilizes SAN for all system storage requirements, including boot from SAN for the host operating system. (Fibre Channel and iSCSI boot are supported in Windows Server and Windows Server 2012.)

Depending on the storage transport that is utilized for the Software Defined Infrastructure design pattern, the following adapters are required to allow shared storage access:

Hyper-V host clusters:

  • 10 GbE adapters that support RDMA
  • InfiniBand adapters that support RDMA

Scale-Out File Server clusters:

  • 10 GbE adapters that support RDMA
  • InfiniBand adapters that support RDMA
  • SAS controllers (host bus adapters) for access to shared SAS storage

The number of adapters and ports that are required for storage connectivity between the Hyper-V host clusters and the Scale-Out File Server clusters depends on a variety of size and density planning factors. The larger the clusters and the higher the number of virtual machines that are to be hosted, the more bandwidth and IOPS capacity is required between the clusters.
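
Once connectivity is in place, SMB Multichannel and SMB Direct usage between the Hyper-V hosts and the Scale-Out File Server clusters can be confirmed from a Hyper-V host, as in the following sketch (no environment-specific names are required).

    # Confirm which client interfaces are RDMA capable.
    Get-SmbClientNetworkInterface

    # Confirm that active connections to the file server use multiple channels and RDMA.
    Get-SmbMultichannelConnection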

Scale-Out File Server Cluster Architecture

The key attribute of Variation A and B of the Software Defined Infrastructure design pattern is the usage of Scale-Out File Server clusters in Windows Server as the "front end" or access point to storage. The Hyper-V host clusters that run virtual machines have no direct storage connectivity. Instead, they have SMB Direct (RDMA)–enabled network adapters, and they store their virtual machines on file shares that are presented by the Scale-Out File Server clusters.
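
As a sketch, the Scale-Out File Server role and a share for virtual machine storage can be configured as follows; the role name, share path, and account names are illustrative, and matching NTFS permissions are also required on the folder.

    # Enable the Scale-Out File Server role on the file server cluster.
    Add-ClusterScaleOutFileServerRole -Name "SOFS01"

    # Create a continuously available share for virtual machine storage and grant
    # the Hyper-V host computer accounts and administrators full access.
    New-SmbShare -Name "VMStore1" -Path "C:\ClusterStorage\Volume1\VMStore1" `
        -ContinuouslyAvailable $true `
        -FullAccess "CONTOSO\HV01$","CONTOSO\HV02$","CONTOSO\Hyper-V Admins"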

For the PLA patterns, there are two options for the Scale-Out File Server clusters that are required for Variations A and B. The first is the Fast Track "small" SKU, or "Cluster-in-a-Box," which can serve as the storage cluster. Any validated small SKU can be used as the storage tier for the "medium" IaaS PLA Software Defined Infrastructure Storage pattern. The small SKU is combined with one or more dedicated Hyper-V host clusters for the fabric.

The second option is a larger, dedicated Scale-Out File Server cluster that meets all of the validation requirements that are outlined in the Software Defined Infrastructure Storage section. Figures 48, 49, and 50 illustrate these options.

Figure 48 Software Defined Infrastructure storage options

In the preceding design, a dedicated fabric management cluster and one or more fabric clusters use a Scale-Out File Server cluster as the storage infrastructure.

Figure 49 Another option for fabric management design

In the preceding design, a dedicated fabric management cluster and one or more fabric clusters use a Fast Track "small" (or "Cluster-in-a-Box") SKU as the storage infrastructure.

Cluster-in-a-Box

As part of the Fast Track program, Microsoft has been working with its server industry partners to create a new generation of simpler, high-availability solutions that deliver small implementations as a "Cluster-in-a-Box" or as consolidation appliance solutions at a lower price.

In this scenario, the solution is designed as a storage "building block" for the data center, such as a dedicated storage appliance. Examples of this scenario are cloud solution builders and enterprise data centers. For example, suppose that the solution supported Server Message Block (SMB) 3.0 file shares for Hyper-V or SQL Server. In this case, the solution would enable the transfer of data from the drives to the network at bus and wire speeds with CPU utilization that is comparable to Fibre Channel.

In a second scenario, the file server is deployed in an office environment or an enterprise equipment room that provides access to a switched network. As a high-performance file server, the solution can support variable workloads, hosted line-of-business (LOB) applications, and data.

The "Cluster-in-a-Box" design pattern requires a minimum of two clustered server nodes and shared storage that can be housed within a single enclosure design or a multiple enclosure design, as shown in Figure 50.

Figure 50 Cluster-in-a-Box

Fast Track Medium Scale-Out File Server Cluster

For higher-end scenarios in which larger I/O capacity or performance is required, larger multinode Scale-Out File Server clusters can be utilized. Higher performing networks (such as 10 GbE or 56 Gbps InfiniBand) can be used between the failover clusters that are managed by Hyper-V and the file cluster.

The Scale-Out File Server cluster design is scaled by adding additional file servers to the cluster. By using CSV 2.0, administrators can create file shares that provide simultaneous access to data files, with direct I/O, through all nodes in a file-server cluster. This provides better utilization of network bandwidth and load balancing of the file server clients (Hyper-V hosts).

Additional nodes also provide additional storage connectivity, which enables further load balancing between a larger number of servers and disks.

In many cases, the scaling of the file server cluster when you use SAS JBOD runs into limits in terms of how many adapters and individual disk trays can be attached to the same cluster. You can avoid these limitations and achieve additional scale by using a switched SAS infrastructure, as described in previous sections.

Figure 51 illustrates this approach. For simplicity, only file-cluster nodes are diagrammed; however, this could easily be four nodes or eight nodes for scale-out.

Figure 51 Medium scale-out file server

Highlights of this design include the SAS switches, which allow a significantly larger number of disk trays and paths between all hosts and the storage. This approach can enable hundreds of disks and many connections per server (for instance, two or more four-port SAS cards per server).

To have resiliency against the failure of one SAS enclosure, you can use two-way mirroring (use a minimum of three disks in the mirror for failover clustering and CSV) and enclosure awareness, which requires three physical enclosures. Two-way mirror spaces must use three or more physical disks; therefore, three enclosures are required, with at least one disk in each enclosure, so that the storage pool is resilient to the failure of one enclosure. For this design, the pool must be configured with the IsEnclosureAware flag, and the enclosures must be certified to use the Storage Spaces feature in Windows Server.
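
A sketch of an enclosure-aware, two-way mirror space follows; the pool name, virtual disk name, and size are illustrative, and the clustered storage subsystem friendly name varies by environment and should be verified with Get-StorageSubSystem.

    # Pool the shared SAS disks that are eligible for pooling
    # (the subsystem friendly name shown here is an example).
    $disks = Get-PhysicalDisk -CanPool $true
    New-StoragePool -FriendlyName "FabricPool" `
        -StorageSubSystemFriendlyName "Clustered Storage Spaces*" -PhysicalDisks $disks

    # Create a two-way mirror space that is enclosure aware, so that data copies
    # are placed in different JBOD enclosures.
    New-VirtualDisk -StoragePoolFriendlyName "FabricPool" -FriendlyName "CSV01" `
        -ResiliencySettingName Mirror -NumberOfDataCopies 2 `
        -IsEnclosureAware $true -Size 10TB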

For enclosure awareness, Storage Spaces leverages the array's failure and identification lights to indicate a drive failure or a specific drive's location within the disk tray. The array or enclosure must support SCSI Enclosure Services (SES) 3.0. Enclosure awareness is independent of a SAS switch or the number of compute nodes.

This design also illustrates a 10 GbE with RDMA configuration for the file server cluster to provide high bandwidth and low latency for SMB traffic. InfiniBand could also be used if requirements dictate. Balancing the available I/O capacity of the SAS infrastructure against the demands of the Hyper-V failover clusters that will utilize the file cluster for their storage is key to a good design. An extremely high-performance InfiniBand infrastructure does not make sense if the file servers will have only two SAS connections to storage.

Storage Infrastructure

For Hyper-V failover-cluster and workload operations in a continuous availability infrastructure, the fabric components utilize the following types of storage:

  • Operating system: Non-shared physical boot disks (direct-attached storage or SAN) for the file servers and Hyper-V host servers.
  • Cluster witness: File share to support the failover cluster quorum for the file server clusters and the Hyper-V host clusters (a shared witness disk is also supported).
  • Cluster Shared Volumes (CSV): One or more shared CSV LUNs for virtual machines on Storage Spaces that are backed by SAS JBOD.
  • Guest clustering [optional]: Requires iSCSI or shared VHDX. For this pattern, adding the iSCSI Target Server to the file server cluster nodes can enable iSCSI shared storage for guest clustering. However, shared VHDX is the recommended approach because it maintains separation between the consumer and the virtualization infrastructure that is supplied by the provider.
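
To illustrate the shared VHDX approach that the preceding list recommends for guest clustering, the following sketch attaches a shared data disk to two guest cluster nodes; the share path, file name, and virtual machine names are illustrative, and the disk must reside on a CSV or a Scale-Out File Server share.

    # Create the shared data disk on the Scale-Out File Server share.
    New-VHD -Path "\\SOFS01\VMStore1\GuestCluster\Data01.vhdx" -SizeBytes 100GB -Fixed

    # Attach the disk to both guest cluster nodes as a shared virtual disk.
    Add-VMHardDiskDrive -VMName "GuestNode1" -ControllerType SCSI `
        -Path "\\SOFS01\VMStore1\GuestCluster\Data01.vhdx" -SupportPersistentReservations
    Add-VMHardDiskDrive -VMName "GuestNode2" -ControllerType SCSI `
        -Path "\\SOFS01\VMStore1\GuestCluster\Data01.vhdx" -SupportPersistentReservations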

As outlined in the overview, fabric and fabric management host controllers require sufficient storage to account for the operating system and paging files. In Windows Server, we recommend that virtual memory be configured as "Automatically manage paging file size for all drives."

The sizing of the physical storage architecture for this design pattern is highly dependent on the quantity and type of virtual machine workloads that are to be hosted.

Because workload virtual disks often exceed multiple gigabytes, we recommend using dynamically expanding disks, where the workload supports them, to provide higher density and more efficient use of storage.

CSV on the Scale-Out File Server clusters must be configured in Windows as a basic disk that is formatted as NTFS. Although ReFS is supported on CSV, it is not recommended, and it is specifically not recommended for IaaS. In addition, a CSV cannot be used as a witness disk, and it cannot have Windows Data Deduplication enabled.

There is no restriction on the number of virtual machines that an individual CSV can support, because metadata updates on a CSV are orchestrated on the server side and run in parallel, which provides uninterrupted operation and increased scalability.

Performance considerations fall primarily on the IOPS that the file cluster provides, given that multiple servers from the Hyper-V failover cluster connect through SMB to a commonly shared CSV on the file cluster. Providing more than one CSV to the Hyper-V failover cluster within the fabric can increase performance, depending on the file cluster configuration.

Multi-Tenant Designs

In many private cloud scenarios, and nearly all hosting scenarios, a multi-tenant infrastructure is required. This section illustrates how a multi-tenant fabric infrastructure can be created by using Windows Server and the technologies described in this Fabric Architecture Guide.

In general, multi-tenancy implies multiple non-related consumers or customers of a set of services. Within a single organization, this could be multiple business units with resources and data that must remain separate for legal or compliance reasons. Most hosting companies require multi-tenancy as a core attribute of their business model. This might include a dedicated physical infrastructure for each hosted customer or logical segmentation of a shared infrastructure by using software-defined technologies.

Requirements Gathering

The design of a multi-tenant fabric begins with a careful analysis of the business requirements, which will drive the design. In many cases, legal or compliance requirements drive the design approach, which means that a team of several disciplines (for example, business, technical, and legal) should participate in the requirements gathering phase. If specific legal or compliance regimes are required, a plan to ensure compliance and ongoing auditing (internal or non-Microsoft) should be implemented.

To organize the requirements gathering process, an "outside in" approach can be helpful. For hosted services, the end customer or consumer is outside of the hosting organization. Requirements gathering can begin by taking on the persona of the consumer and determining how the consumer will become aware of and be able to request access to hosted services.

Then consider multiple consumers, and ask the following questions:

  • Will consumers use accounts that the host creates or accounts that they use internally to access services?
  • Is one consumer allowed to be aware of other consumers' identities, or is a separation required?

Moving further into the "outside in" process, determine whether legal or compliance concerns require dedicated resources for each consumer:

  • Can multiple consumers share a physical infrastructure?
  • Can traffic from multiple consumers share a common network?
  • Can software-defined isolation meet the requirements?
  • How far into the infrastructure must authentication, authorization, and accounting be maintained for each consumer (for example, only at the database level, or including the disks and LUNs that are used by the consumer in the infrastructure)?

The following list provides a sample of the types of design and segmentation options that might be considered as part of a multi-tenant infrastructure:

  • Physical separation by customer (dedicated hosts, network, and storage)
  • Logical separation by customer (shared physical infrastructure with logical segmentation)
  • Data separation (such as dedicated databases and LUNs)
  • Network separation (VLANs or private VLANs)
  • Performance separation by customer (shared infrastructure but guaranteed capacity or QoS)

The remainder of this section describes multi-tenancy options at the fabric level and how those technologies can be combined to enable a multi-tenant fabric.

Infrastructure Requirements

The aforementioned requirements gathering process should result in a clear direction and set of mandatory attributes that the fabric architecture must contain. The first key decision is whether a shared storage infrastructure or dedicated storage per tenant is required. For a host, driving toward as much shared infrastructure as possible is typically a business imperative, but there can be cases where it is prohibited.

As mentioned in the previous storage sections in this Fabric Architecture Guide, Windows Server supports a range of traditional storage technologies, such as JBOD, iSCSI and Fibre Channel SANs, and converged technologies such as FCoE. In addition, the newer capabilities of Storage Spaces, Cluster Shared Volumes, Storage Spaces tiering, and Scale-Out File Server clusters present a potentially lower cost solution for advanced storage infrastructures.

The shared versus dedicated storage infrastructure requirement drives a significant portion of the design process. If dedicated storage infrastructures per tenant are required, appropriate sizing and minimization of cost are paramount. It can be difficult to scale down traditional SAN approaches to a large number of small- or medium-sized tenants. In this case, the Scale-Out File Server cluster and Storage Spaces approach, which uses shared SAS JBOD, can scale down cost effectively to a pair of file servers and a single SAS tray.

Figure 52 Scale-Out File Server and Storage Spaces

On the other end of the spectrum, if shared but logically segmented storage is an option, nearly all storage options become potentially relevant. Traditional Fibre Channel and iSCSI SANs have evolved to provide a range of capabilities to support multi-tenant environments through technologies such as zoning, masking, and virtual SANs. With the scalability enhancements in the storage stack in Windows Server, large-scale shared storage infrastructures that use the Scale-Out File Server cluster and Storage Spaces can also be a cost-effective choice.

Although previous sections discussed architecture and scalability, this section highlights technologies for storage security and isolation in multi-tenant environments.

Multi-Tenant Storage Considerations

SMB 3.0

The Server Message Block (SMB) protocol is a network file sharing protocol that allows applications on a computer to read and write to files and to request services from server programs in a computer network. The SMB protocol can be used on top of TCP/IP or other network protocols.

By using the SMB protocol, an application (or the user of an application) can access files or other resources on a remote server. This allows users to read, create, and update files on the remote server. The application can also communicate with any server program that is set up to receive an SMB client request.

Windows Server provides the following ways to use the SMB 3.0 protocol:

  • File storage for virtualization (Hyper-V over SMB): Hyper-V can store virtual machine files (such as configuration files, virtual hard disk (VHD) files, and snapshots) in file shares over the SMB 3.0 protocol. This can be used for stand-alone file servers and for clustered file servers that use Hyper-V with shared file storage for the cluster.
  • SQL Server over SMB: SQL Server can store user database files on SMB file shares. Currently, this is supported by SQL Server 2012 and SQL Server 2008 R2 for stand-alone servers.
  • Traditional storage for end-user data: The SMB 3.0 protocol provides enhancements to the information worker (client) workloads. These enhancements include reducing the application latencies that are experienced by branch office users when they access data over wide area networks (WANs), and protecting data from eavesdropping attacks.

SMB Encryption

A security concern for data that traverses untrusted networks is that it is prone to eavesdropping attacks. Existing solutions for this issue typically use IPsec, WAN accelerators, or other dedicated hardware solutions. However, these solutions are expensive to set up and maintain.

Windows Server includes encryption that is built-in to the SMB protocol. This allows end-to-end data protection from snooping attacks with no additional deployment costs. You have the flexibility to decide whether the entire server or only specific file shares should be enabled for encryption. SMB Encryption is also relevant to server application workloads if the application data is on a file server and it traverses untrusted networks. With this feature, data security is maintained while it is on the wire.
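
SMB Encryption can be enabled per share or for the entire file server, as in the following sketch; the share name is illustrative.

    # Encrypt a single share that carries sensitive tenant data.
    Set-SmbShare -Name "VMStore1" -EncryptData $true -Force

    # Or require encryption for every share on the file server.
    Set-SmbServerConfiguration -EncryptData $true -Force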

Cluster Shared Volumes

By using Cluster Shared Volumes (CSV), you can unify storage access into a single namespace for ease-of-management. A common namespace folder that contains all CSV in the failover cluster is created at the path C:\ClusterStorage\. All cluster nodes can access a CSV at the same time, regardless of the number of servers, the number of JBOD enclosures, or the number of provisioned virtual disks.

This unified namespace enables high availability workloads to transparently fail over to another server if a server failure occurs. It also enables you to easily take a server offline for maintenance.

Clustered storage spaces can help protect against the following risks:

  • Physical disk failures: When you deploy a clustered storage space, protection against physical disk failures is provided by creating storage spaces with the mirror resiliency type. Additionally, mirror spaces use "dirty region tracking" to track modifications to the disks in the pool. When the system resumes from a power fault or a hard reset event and the spaces are brought back online, dirty region tracking creates consistency among the disks in the pool.
  • Data access failures: If you have redundancy at all levels, you can protect against failed components, such as a failed cable from the enclosure to the server, a failed SAS adapter, power faults, or failure of a JBOD enclosure. For example, in an enterprise deployment, you should have redundant SAS adapters, SAS I/O modules, and power supplies. To protect against a complete disk enclosure failure, you can use redundant JBOD enclosures.
  • Data corruptions and volume unavailability: The NTFS file system and the Resilient File System (ReFS) help protect against corruption. For NTFS, improvements to the CHKDSK tool in Windows Server can greatly improve availability. If you deploy highly available file servers, you can use ReFS to enable high levels of scalability and data integrity regardless of hardware or software failures.
  • Server node failures: Through the Failover Clustering feature in Windows Server, you can provide high availability for the underlying storage and workloads. This helps protect against server failure and enables you to take a server offline for maintenance without service interruption.

The following are some of the technologies in Windows Server that can enable multi-tenant architectures.

  • File storage for virtualization (Hyper-V over SMB): Hyper-V can store virtual machine files (such as configuration files, virtual hard disk (VHD) files, and snapshots) in file shares over the SMB 3.0 protocol. This can be used for stand-alone file servers and for clustered file servers that use Hyper-V with shared file storage for the cluster.
  • SQL Server over SMB: SQL Server can store user database files on SMB file shares. Currently, this is supported by SQL Server 2012 and SQL Server 2008 R2 for stand-alone servers.
  • Storage visibility to only a subset of nodes: Windows Server enables cluster deployments that contain application and data nodes, so storage can be limited to a subset of nodes.
  • Integration with Storage Spaces: This technology allows virtualization of cluster storage on groups of inexpensive disks. The Storage Spaces feature in Windows Server can integrate with CSV to permit scale-out access to data.

Security and Storage Access Control

A solution that uses file clusters, Storage Spaces, and SMB 3.0 in Windows Server eases the management of large-scale storage solutions, because nearly all of the setup and configuration is Windows-based, with associated Windows PowerShell support.

If desired, particular storage can be made visible to only a subset of nodes in the file cluster. This can be used in some scenarios to leverage the cost and management advantage of larger shared clusters, and to segment those clusters for performance or access purposes.

Additionally, access control lists can be applied at various levels of the storage stack (for example, file shares, CSV, and storage spaces). In a multi-tenant scenario, this means that the full storage infrastructure can be shared and managed centrally, and that dedicated, controlled access to segments of the storage infrastructure can be designed. A particular customer could have LUNs, storage pools, storage spaces, cluster shared volumes, and file shares dedicated to them, and access control lists can ensure that only that tenant has access to them.

Additionally, by using SMB Encryption, all access to the file-based storage can be encrypted to protect against tampering and eavesdropping attacks. The biggest benefit of using SMB Encryption over more general solutions (such as IPsec) is that there are no deployment requirements or costs beyond changing the SMB settings in the server. The encryption algorithm used is AES-CCM, which also provides data integrity validation.

Multi-Tenant Network Considerations

The network infrastructure is one of the most common and critical layers of the fabric where multi-tenant design is implemented. It is also an area of rapid innovation because the traditional methods of traffic segmentation, such as VLANs and port ACLs, are beginning to show their age, and they are unable to keep up with highly virtualized, large-scale hosting data centers and hybrid cloud scenarios.

The following sections describe the range of technologies that are provided in Windows Server for building modern, secure, multi-tenant network infrastructures.

Windows Network Virtualization

Hyper-V Network Virtualization provides the concept of a virtual network that is independent of the underlying physical network. With this concept of virtual networks, which are composed of one or more virtual subnets, the exact physical location of an IP subnet is decoupled from the virtual network topology.

As a result, customers can easily move their subnets to the cloud while preserving their existing IP addresses and topology in the cloud so that existing services continue to work unaware of the physical location of the subnets.

Hyper-V Network Virtualization provides policy-based, software-controlled network virtualization that reduces the management overhead that is faced by enterprises when they expand dedicated IaaS clouds, and it provides cloud hosts with better flexibility and scalability for managing virtual machines to achieve higher resource utilization.

An IaaS scenario that has multiple virtual machines from different organizational divisions (referred to as a dedicated cloud) or different customers (referred to as a hosted cloud) requires secure isolation. Virtual local area networks (VLANs) can present significant disadvantages in this scenario.

For more information, see Hyper-V Network Virtualization Overview in the TechNet Library.

VLANs: Currently, VLANs are the mechanism that most organizations use to support address space reuse and tenant isolation. A VLAN uses explicit tagging (VLAN ID) in the Ethernet frame headers, and it relies on Ethernet switches to enforce isolation and restrict traffic to network nodes with the same VLAN ID. As described earlier, there are disadvantages with VLANs, which introduce challenges in large-scale, multi-tenant environments.

IP address assignment: In addition to the disadvantages that are presented by VLANs, virtual machine IP address assignment presents issues, which include:

  • Physical locations in the data center network infrastructure determine virtual machine IP addresses. As a result, moving to the cloud typically requires rationalization and possibly sharing IP addresses across workloads and tenants.
  • Policies are tied to IP addresses, such as firewall rules, resource discovery, and directory services. Changing IP addresses requires updating all the associated policies.
  • Virtual machine deployment and traffic isolation are dependent on the network topology.

When data center network administrators plan the physical layout of the data center, they must make decisions about where subnets will be physically placed and routed. These decisions are based on IP and Ethernet technology that influence the potential IP addresses that are allowed for virtual machines running on a given server or a blade that is connected to a particular rack in the data center. When a virtual machine is provisioned and placed in the data center, it must adhere to these choices and restrictions regarding the IP address. Therefore, the typical result is that the data center administrators assign new IP addresses to the virtual machines.

The issue with this requirement is that in addition to being an address, there is semantic information associated with an IP address. For instance, one subnet might contain given services or be in a distinct physical location. Firewall rules, access control policies, and IPsec security associations are commonly associated with IP addresses. Changing IP addresses forces the virtual machine owners to adjust all their policies that were based on the original IP address. This renumbering overhead is so high that many enterprises choose to deploy only new services to the cloud, leaving legacy applications alone.

Hyper-V Network Virtualization decouples virtual networks for customer virtual machines from the physical network infrastructure. As a result, it enables customer virtual machines to maintain their original IP addresses, while allowing data center administrators to provision customer virtual machines anywhere in the data center without reconfiguring physical IP addresses or VLAN IDs.

Each virtual network adapter in Hyper-V Network Virtualization is associated with two IP addresses:

  • Customer address: The IP address that is assigned by the customer, based on their intranet infrastructure. This address enables the customer to exchange network traffic with the virtual machine as if it had not been moved to a public or private cloud. The customer address is visible to the virtual machine and reachable by the customer.
  • Provider address: The IP address that is assigned by the host or the data center administrators, based on their physical network infrastructure. The provider address appears in the packets on the network that are exchanged with the server running Hyper-V that is hosting the virtual machine. The provider address is visible on the physical network, but not to the virtual machine.

The customer addresses maintain the customer's network topology, which is virtualized and decoupled from the actual underlying physical network topology and addresses, as implemented by the provider addresses. Figure 53 shows the conceptual relationship between virtual machine customer addresses and network infrastructure provider addresses as a result of network virtualization.

Figure 53 Hyper-V Network Virtualization customer and provider addresses

Key aspects of network virtualization in this scenario include:

  • Each customer address in the virtual machine is mapped to a physical host provider address.
  • Virtual machines send data packets in the customer address spaces, which are put into an "envelope" with a provider address source and destination pair, based on the mapping.
  • The customer address and provider address mappings must allow the hosts to differentiate packets for different customer virtual machines.

As a result, the network is virtualized by the network addresses that are used by the virtual machines.

Hyper-V Network Virtualization supports the following modes to virtualize the IP address:

  • Generic Routing Encapsulation: In Network Virtualization using Generic Routing Encapsulation (NVGRE), the virtual machine's packet is encapsulated inside another packet. The header of this new packet has the appropriate source and destination provider address (PA) IP addresses, in addition to the virtual subnet ID, which is stored in the Key field of the GRE header. This mode is intended for the majority of data centers that deploy Hyper-V Network Virtualization. (A configuration sketch follows this list.)
  • IP Rewrite: In this mode, the source and destination customer address (CA) IP addresses are rewritten with the corresponding PA addresses as the packets leave the end host. Similarly, when virtual subnet packets enter the end host, the PA IP addresses are rewritten with the appropriate CA addresses before they are delivered to the virtual machines. IP Rewrite is targeted at special scenarios in which the virtual machine workloads require or consume very high bandwidth throughput (approximately 10 Gbps) on existing hardware, and the customer cannot wait for NVGRE-aware hardware.
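
The following sketch shows the kind of policy records that implement the NVGRE mapping on a host. In practice, System Center Virtual Machine Manager maintains these records; the addresses, interface index, MAC address, virtual machine name, and IDs shown here are illustrative.

    # Assign the provider address to the physical interface that carries tenant traffic.
    New-NetVirtualizationProviderAddress -InterfaceIndex 12 `
        -ProviderAddress "192.168.10.21" -PrefixLength 24

    # Map the virtual machine's customer address to this host's provider address
    # by using NVGRE encapsulation and virtual subnet ID 5001.
    New-NetVirtualizationLookupRecord -CustomerAddress "10.0.0.5" `
        -ProviderAddress "192.168.10.21" -VirtualSubnetID 5001 `
        -MACAddress "00155D010105" -Rule "TranslationMethodEncap"

    # Associate the virtual machine's network adapter with the virtual subnet.
    Set-VMNetworkAdapter -VMName "BlueVM01" -VirtualSubnetId 5001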

Hyper-V Virtual Switch

The Hyper-V virtual switch is a software-based, layer-2 network switch that is available in Hyper-V Manager when you install the Hyper-V role. The switch includes programmatically managed and extensible capabilities to connect virtual machines to virtual networks and to the physical network. In addition, Hyper-V virtual switch provides policy enforcement for security, isolation, and service levels.

The Hyper-V virtual switch in Windows Server introduces several features and enhanced capabilities for tenant isolation, traffic shaping, protection against malicious virtual machines, and simplified troubleshooting.

With built-in support for Network Device Interface Specification (NDIS) filter drivers and Windows Filtering Platform (WFP) callout drivers, the Hyper-V virtual switch enables independent software vendors (ISVs) to create extensible plug-ins (known as virtual switch extensions) that can provide enhanced networking and security capabilities. Virtual switch extensions that you add to the Hyper-V virtual switch are listed in the Virtual Switch Manager feature of Hyper-V Manager.

The capabilities provided in the Hyper-V virtual switch mean that organizations have more options to enforce tenant isolation, to shape and control network traffic, and to employ protective measures against malicious virtual machines.

Some of the principal features that are included in the Hyper-V virtual switch are:

  • ARP and Neighbor Discovery spoofing protection: Helps protect against a malicious virtual machine that uses Address Resolution Protocol (ARP) spoofing to steal IP addresses from other virtual machines. Also helps protect against attacks that are launched through IPv6 by using Neighbor Discovery spoofing. For more information, see Neighbor Discovery.
  • DHCP guard protection: Protects against a malicious virtual machine representing itself as a Dynamic Host Configuration Protocol (DHCP) server for man-in-the-middle attacks.
  • Port ACLs: Provides traffic filtering, based on Media Access Control (MAC) or Internet Protocol (IP) addresses and ranges, which enables you to set up virtual network isolation.
  • Trunk mode to virtual machines: Enables administrators to set up a specific virtual machine as a virtual appliance, and then direct traffic from various VLANs to that virtual machine.
  • Network traffic monitoring: Enables administrators to review traffic that is traversing the network switch.
  • Isolated (private) VLAN: Enables administrators to segregate traffic on multiple VLANs, to more easily establish isolated tenant communities.

The features in this list can be combined to deliver a complete multi-tenant network design.
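
The following sketch shows how several of these protections might be applied to a tenant virtual machine; the virtual machine name, address range, and VLAN IDs are illustrative.

    # Enable DHCP guard and router guard on the tenant virtual machine.
    Set-VMNetworkAdapter -VMName "TenantVM01" -DhcpGuard On -RouterGuard On

    # Add a port ACL that blocks traffic to and from another tenant's address range.
    Add-VMNetworkAdapterAcl -VMName "TenantVM01" -RemoteIPAddress "10.2.0.0/16" `
        -Direction Both -Action Deny

    # Place the virtual machine in an isolated private VLAN.
    Set-VMNetworkAdapterVlan -VMName "TenantVM01" -Isolated `
        -PrimaryVlanId 100 -SecondaryVlanId 201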

Example Network Design

In Hyper-V Network Virtualization, a customer is defined as the owner of a group of virtual machines that are deployed in a data center. A customer can be a corporation or enterprise in a multi-tenant public data center, or a division or business unit within a private data center. Each customer can have one or more customer networks in the data center, and each customer network consists of one or more virtual subnets.

Customer network

  • Each customer network consists of one or more virtual subnets. A customer network forms an isolation boundary where the virtual machines within a customer network can communicate with each other. As a result, virtual subnets in the same customer network must not use overlapping IP address prefixes.
  • Each customer network has a routing domain, which identifies the customer network. The routing domain ID is assigned by data center administrators or data center management software, such as System Center Virtual Machine Manager (VMM). The routing domain ID has a GUID format, for example, "{11111111-2222-3333-4444-000000000000}".

Virtual subnets

  • A virtual subnet implements the Layer 3 IP subnet semantics for virtual machines in the same virtual subnet. The virtual subnet is a broadcast domain (similar to a VLAN). Virtual machines in the same virtual subnet must use the same IP prefix, although a single virtual subnet can accommodate an IPv4 and an IPv6 prefix simultaneously.
  • Each virtual subnet belongs to a single customer network (with a routing domain ID), and it is assigned a unique virtual subnet ID (VSID). The VSID must be unique within the data center, and it is in the range 4096 to 2^24-2.

A key advantage of the customer network and routing domain is that it allows customers to bring their network topologies to the cloud. The following diagram shows an example in which Blue Corp has two separate networks, the R&D Net and the Sales Net. Because these networks have different routing domain IDs, they cannot interact with each other. That is, Blue R&D Net is isolated from Blue Sales Net, even though both are owned by Blue Corp. Blue R&D Net contains three virtual subnets. Note that the routing domain ID and VSID are unique within a data center.

Figure 54 Customer networks and virtual subnets

In this example, the virtual machines with VSID 5001 can have their packets routed or forwarded by Hyper-V Network Virtualization to virtual machines with VSID 5002 or VSID 5003. Before delivering the packet to the virtual switch, Hyper-V Network Virtualization will update the VSID of the incoming packet to the VSID of the destination virtual machine. This will only happen if both VSIDs are in the same routing domain ID. If the VSID that is associated with the packet does not match the VSID of the destination virtual machine, the packet will be dropped. Therefore, virtual network adapters with RDID 1 cannot send packets to virtual network adapters with RDID 2.

Each virtual subnet defines a Layer 3 IP subnet and a Layer 2 broadcast domain boundary similar to a VLAN. When a virtual machine broadcasts a packet, this broadcast is limited to the virtual machines that are attached to switch ports with the same VSID. Each VSID can be associated with a multicast address in the provider address. All broadcast traffic for a VSID is sent on this multicast address.

In addition to being a broadcast domain, the VSID provides isolation. A virtual network adapter in Hyper-V Network Virtualization is connected to a Hyper-V switch port that has a VSID ACL. If a packet arrives on this Hyper-V virtual switch port with a different VSID, the packet is dropped. Packets will only be delivered on a Hyper-V virtual switch port if the VSID of the packet matches the VSID of the virtual switch port. This is the reason that packets flowing from VSID 5001 to 5003 must modify the VSID in the packet before delivery to the destination virtual machine.

If the Hyper-V virtual switch port does not have a VSID ACL, the virtual network adapter that is attached to that virtual switch port is not part of a Hyper-V Network Virtualization virtual subnet. Packets that are sent from a virtual network adapter that does not have a VSID ACL will pass unmodified through the Hyper-V Network Virtualization.

When a virtual machine sends a packet, the VSID of the Hyper-V virtual switch port is associated with this packet in the out-of-band (OOB) data. If Generic Routing Encapsulation (GRE) is the IP virtualization mechanism, the GRE Key field of the encapsulated packet contains the VSID.

On the receiving side, Hyper-V Network Virtualization delivers the VSID in the OOB data, along with the decapsulated packet, to the Hyper-V virtual switch. If IP Rewrite is the IP virtualization mechanism and the packet is destined for a different physical host, the IP addresses are rewritten from CA addresses to PA addresses on the sending host, and the VSID in the OOB data is dropped. On the receiving host, Hyper-V Network Virtualization looks up the applicable policy and adds the VSID back to the OOB data before the packet is passed to the Hyper-V virtual switch.

Multi-Tenant Compute Considerations

Similar to the storage and network layers, the compute layer of the fabric can be dedicated per tenant or shared across multiple tenants. That decision greatly impacts the design of the compute layer. Two primary decisions are required to begin the design process:

  • Will the compute layer be shared between multiple tenants?
  • Will the compute infrastructure provide high availability by using failover clustering?

This leads to four high-level design options:

  • Dedicated stand-alone server running Hyper-V
  • Shared stand-alone server running Hyper-V
  • Dedicated Hyper-V failover clusters
  • Shared Hyper-V failover clusters

Live migration without shared storage (also known as "shared nothing" live migration) in Windows Server makes stand-alone servers running Hyper-V a viable option when the running virtual machines do not require high availability. It enables a virtual machine to be moved from one Hyper-V host running Windows Server to another with nothing more than a network connection between the hosts; shared storage is not required.

For hosts that deliver stateless application and web hosting services, this may be an acceptable option. Live migration without shared storage enables administrators to move virtual machines and evacuate a host for patching without causing downtime to the running virtual machines. However, stand-alone hosts do not provide virtual machine high availability; if the host fails, its virtual machines are not automatically restarted on another host.
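
As a minimal sketch of how this might be configured with the Hyper-V module for Windows PowerShell (host names, the migration network, and storage paths are hypothetical):

    # Enable live migration on each stand-alone host and restrict it to a dedicated network.
    # (Kerberos authentication also requires constrained delegation to be configured in Active Directory.)
    Enable-VMMigration
    Set-VMHost -VirtualMachineMigrationAuthenticationType Kerberos
    Add-VMMigrationNetwork "192.168.20.0/24"

    # Move a running VM, including its storage, to another host ("shared nothing" live migration).
    Move-VM -Name "Web01" -DestinationHost "HV02" -IncludeStorage -DestinationStoragePath "D:\VMs\Web01"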

The decision of using a dedicated vs. a shared Hyper-V host is primarily driven by the compliance or business model requirements discussed previously.

Hyper-V Role

The Hyper-V role enables you to create and manage a virtualized computing environment by using the virtualization technology that is built in to Windows Server. Installing the Hyper-V role installs the required components and optionally installs management tools. The required components include the Windows hypervisor, Hyper-V Virtual Machine Management Service, the virtualization WMI provider, and other virtualization components such as the virtual machine bus (VMBus), virtualization service provider, and virtual infrastructure driver.

The management tools for the Hyper-V role consist of:

  • GUI-based management tools: Hyper-V Manager, a Microsoft Management Console (MMC) snap-in, and Virtual Machine Connection (which provides access to the video output of a virtual machine so you can interact with the virtual machine).
  • Hyper-V-specific cmdlets for Windows PowerShell: Windows Server and Windows Server 2012 include the Hyper-V module for Windows PowerShell, which provides command-line access to all the functionality that is available in the GUI, in addition to functionality that is not available through the GUI (see the sketch after this list).
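
For example, the role and its management tools can be installed and verified from Windows PowerShell. The following is a minimal sketch; it assumes the computer meets the Hyper-V hardware requirements.

    # Install the Hyper-V role together with the GUI and PowerShell management tools,
    # then restart to complete hypervisor installation.
    Install-WindowsFeature -Name Hyper-V -IncludeManagementTools -Restart

    # After the restart, confirm the role state and count the available Hyper-V cmdlets.
    Get-WindowsFeature -Name Hyper-V
    Get-Command -Module Hyper-V | Measure-Object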

The scalability and availability improvements in Hyper-V allow for significantly larger clusters and greater consolidation ratios, which are key to lowering the total cost of ownership for enterprises and hosting providers. Hyper-V in Windows Server supports significantly larger configurations of virtual and physical components than previous releases of Hyper-V. This increased capacity enables you to run Hyper-V on large physical computers and to virtualize high-performance, scaled-up workloads.

Hyper-V provides a multitude of options for segmentation and isolation of virtual machines that are running on the same server. This is critical for shared servers running Hyper-V and cluster scenarios where multiple tenants will host their virtual machines on the same servers. By design, Hyper-V ensures isolation of memory, VMBus, and other system and hypervisor constructs between all virtual machines on a host.

Failover Clustering

Failover Clustering in Windows Server supports increased scalability, continuously available file-based server application storage, easier management, faster failover, automatic rebalancing, and more flexible architectures for failover clusters.

For the purposes of a multi-tenant design, Hyper-V failover clusters can be used in conjunction with the aforementioned Scale-Out File Server clusters to provide an end-to-end Microsoft solution across the storage, network, and compute architectures.
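
A minimal sketch of building a Hyper-V failover cluster and making a virtual machine highly available follows (node names, the cluster name, the IP address, and the virtual machine name are hypothetical):

    # Install the Failover Clustering feature on each node, validate the configuration,
    # and then create the cluster.
    Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools
    Test-Cluster -Node "HV01", "HV02"
    New-Cluster -Name "HVCLUS01" -Node "HV01", "HV02" -StaticAddress "192.168.10.50"

    # Make an existing virtual machine highly available as a clustered role.
    Add-ClusterVirtualMachineRole -VMName "Web01"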

Resource Metering

Service providers and enterprises that deploy private clouds need tools to charge business units that they support while providing the business units with the appropriate resources to match their needs. For hosting service providers, it is equally important to issue chargebacks based on the amount of usage by each customer.

To implement advanced billing strategies that measure the assigned capacity of a resource and its actual usage, earlier versions of Hyper-V required users to develop their own chargeback solutions that polled and aggregated performance counters. These solutions could be expensive to develop and sometimes led to loss of historical data.

To assist with more accurate, streamlined chargebacks while protecting historical information, Hyper-V in Windows Server and Windows Server 2012 provides Resource Metering, which is a feature that allows customers to create cost-effective, usage-based billing solutions. With this feature, service providers can choose the best billing strategy for their business model, and independent software vendors can develop more reliable, end-to-end chargeback solutions by using Hyper-V.
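
A minimal sketch of the Resource Metering cmdlets in the Hyper-V module follows (the virtual machine name and save interval are hypothetical; a chargeback solution would persist and aggregate the reported values):

    # Start collecting usage data for a tenant virtual machine, and set how often
    # the host persists metering data.
    Enable-VMResourceMetering -VMName "Web01"
    Set-VMHost -ResourceMeteringSaveInterval (New-TimeSpan -Hours 1)

    # Report average CPU, average and maximum memory, disk allocation, and network traffic,
    # and then reset the counters at the end of the billing period.
    Measure-VM -VMName "Web01"
    Reset-VMResourceMetering -VMName "Web01"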

Management

This guide deals only with fabric architecture solutions, and not the more comprehensive topic of fabric management by using System Center. However, there are significant management and automation options for multi-server, multi-tenant environments that are enabled by Windows Server technologies.

Windows Server provides management efficiency with broader automation for common management tasks. For example, Server Manager in Windows Server enables multiple servers on the network to be managed effectively from a single computer.

With the Windows PowerShell 3.0 command-line interface, Windows Server provides a platform for robust, multi-computer automation for all elements of a data center, including servers, Windows operating systems, storage, and networking. It also provides centralized administration and management capabilities such as deploying roles and features remotely to physical and virtual servers, and deploying roles and features to virtual hard disks, even when they are offline.
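
A minimal sketch of these remote and offline deployment capabilities follows (server names and the virtual hard disk path are hypothetical):

    # Query the Hyper-V role state on several hosts from a single management computer.
    Invoke-Command -ComputerName "HV01", "HV02", "HV03" -ScriptBlock { Get-WindowsFeature -Name Hyper-V }

    # Add a role to a virtual hard disk while it is offline.
    Install-WindowsFeature -Name Web-Server -Vhd "\\FS01\VHDs\Web01.vhdx" -ComputerName "HV01"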