Azure Storage

Microsoft Azure Storage is a Microsoft-managed service that provides durable, scalable, and redundant storage. Microsoft takes care of maintenance and handles critical problems for you. An Azure subscription can host up to 100 storage accounts, each of which can hold 500 TB. If you have a business case, you can talk to the Azure Storage team and get approval for up to 250 storage accounts in a subscription.

Azure Storage consists of four data services: Blob storage, File storage, Table storage, and Queue storage. Blob storage supports both standard and premium storage, with premium storage using only SSDs for the fastest performance possible. Another new feature added in 2016 is cool storage, allowing you to store large amounts of rarely accessed data for a lower cost.

In this chapter, we look at the four Azure Storage services. We talk about each one, discuss what they are used for, and show how to create storage accounts and manage the data objects. We'll also touch briefly on securing your applications' use of Azure Storage.

Storage accounts

This reference table shows the various kinds of storage accounts and what objects are used with each.

Type of storage account                         | Services supported        | Types of blobs supported
General-purpose Standard storage account        | Blob, File, Table, Queue  | Block blobs, page blobs, append blobs
General-purpose Premium storage account         | Blob                      | Page blobs
Blob storage account, hot and cool access tiers | Blob                      | Block blobs and append blobs

You can view your data objects using one of a number of storage explorers, each of which has different capabilities. For your convenience, Microsoft has a page listing several of these, including its own: https://azure.microsoft.com/documentation/articles/storage-explorers/.

While you can view and update some data in the Azure portal, the customer experience is not complete. For example, you cannot upload blobs or add and view messages in a queue. In this chapter, we use the Azure portal, Visual Studio Cloud Explorer, and PowerShell to access the data.

Note After this chapter was completed, the Microsoft Azure Storage Explorer team released a new version that supports all four types of storage objects—blobs, files, tables, and queues. This is a free multi-platform tool that you can download from here: http://storageexplorer.com/

General-purpose storage accounts

There are two kinds of general-purpose storage accounts.

Standard storage

The most widely used storage accounts are Standard storage accounts, which can be used for all four types of data—blobs, files, tables, and queues. Standard storage accounts use magnetic media to store data.

Premium storage

Premium storage provides high-performance storage for page blobs and specifically virtual hard disks (VHDs). Premium storage accounts use SSD to store data. Microsoft recommends using Premium storage for all of your virtual machines (VMs).

Blob storage accounts

The Blob storage account is a specialized storage account used to store block blobs and append blobs. You can't store page blobs in these accounts; therefore, you can't store VHD files. These accounts allow you to set an access tier to Hot or Cool; the tier can be changed at any time.

The hot access tier is used for files that are accessed frequently. For blobs stored in the hot access tier, you pay a higher cost for storing the blobs, but the cost for accessing the blobs is much lower.

The cool access tier is used for files that are accessed infrequently. For blobs stored in the cool access tier, you pay a higher cost for accessing the blobs, but the cost of storage is much lower.

Storage services

Azure Storage supports four kinds of objects that can be stored—blobs, files (on a file share), tables, and queues. Let's take a closer look at each one of these.

Blob storage

The word blob is an acronym for binary large object. Blobs are basically files like those that you store on your computer (or tablet, mobile device, etc.). They can be pictures, Microsoft Excel files, HTML files, virtual hard disks (VHDs)—pretty much anything.

The Azure Blob service gives you the ability to store files and access them from anywhere in the world by using URLs, the REST interface, or one of the Azure SDK storage client libraries. Storage client libraries are available for multiple languages, including .NET, Node.js, Java, PHP, Ruby, and Python. To use the Blob service, you have to create a storage account. Once you have a storage account, you can create containers, which are similar to folders, and then put blobs in the containers. You can have an unlimited number of containers in a storage account and an unlimited number of blobs in each container, up to the maximum size of a storage account, which is 500 TB. The Blob service supports only a single-level hierarchy of containers; in other words, containers cannot contain other containers.

Azure Storage supports three kinds of blobs: block blobs, page blobs, and append blobs.

  • Block blobs are used to hold ordinary files up to 195 GB in size (4 MB × 50,000 blocks). The primary use case for block blobs is the storage of files that are read from beginning to end, such as media files or image files for websites. They are named block blobs because files larger than 64 MB must be uploaded as small blocks, which are then consolidated (or committed) into the final blob.
  • Page blobs are used to hold random-access files up to 1 TB in size. Page blobs are used primarily as the backing storage for the VHDs used to provide durable disks for Azure Virtual Machines (Azure VMs), the IaaS feature in Azure Compute. They are named page blobs because they provide random read/write access to 512-byte pages.
  • Append blobs are made up of blocks like block blobs, but they are optimized for append operations. These are frequently used for logging information from one or more sources into the same blob. For example, you might write all of your trace logging to the same append blob for an application running on multiple VMs. A single append blob can be up to 195 GB.

Blobs are addressable through a URL, which has the following format:

https://[storage account name].blob.core.windows.net/[container]/[blob name]

The Blob service supports only a single physical level of containers. However, it supports the simulation of a file system with folders within the containers by allowing blob names to contain the '/' character. The client APIs provide support to traverse this simulated file system. For example, if you have a container called animals and you want to group the animals within the container, you could add blobs named cats/tuxedo.png, cats/marmalade.png, and so on. The URL would include the entire blob name including the "subfolder," and it would end up looking like this:

https://mystorage.blob.core.windows.net/animals/cats/tuxedo.png

https://mystorage.blob.core.windows.net/animals/cats/marmalade.png

When looking at the list of blobs using a storage explorer tool, you can see either a hierarchical directory tree or a flat listing. The directory tree would show cats as a subfolder under animals and would show the .png files in the subfolder. The flat listing would list the blobs with the original names, cats/tuxedo.png and cats/marmalade.png.
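To see the simulated foldering in action, here is a minimal PowerShell sketch; it assumes a storage context in $ctx (created with New-AzureStorageContext, shown later in this chapter), an existing container named animals, and hypothetical local file paths.

# Upload two images into a simulated "cats" folder by putting a '/' in the blob names.
Set-AzureStorageBlobContent -File "D:\pics\tuxedo.png" -Container "animals" `
    -Blob "cats/tuxedo.png" -Context $ctx
Set-AzureStorageBlobContent -File "D:\pics\marmalade.png" -Container "animals" `
    -Blob "cats/marmalade.png" -Context $ctx

# Listing with a prefix returns only the blobs in the simulated folder.
Get-AzureStorageBlob -Container "animals" -Prefix "cats/" -Context $ctx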

You also can assign a custom domain to the storage account, which changes the root of the URL, so you could have something like this:

http://[storage.companyname.com]/[container]/[blobname]

This eliminates cross-domain issues when accessing files in blob storage from a website because you could use the company domain for both. Blob storage also supports Cross-Origin Resource Sharing (CORS) to help with this type of cross-source usage.

Note

At this time, Microsoft does not support using a custom domain name with HTTPS.

File storage

The Azure Files service enables you to set up highly available network file shares that can be accessed by using the standard Server Message Block (SMB) protocol. This means that multiple VMs can share the same files with both read and write access. The files can also be accessed using the REST interface or the storage client libraries. The Files service removes the need for you to host your own file shares in an Azure VM and go through the tricky configuration required to make it highly available.

One thing that's really special about Azure file shares versus file shares on-premises is that you can access the file from anywhere by using a URL that points to the file (similar to the blob storage URL displayed above). To do this, you have to append a shared access signature (SAS). We'll talk more about shared access signatures in the section on Security.

File shares can be used for many common scenarios:

  • Many on-premises applications use file shares; this makes it easier to migrate those applications that share data to Azure. If you mount the file share to the same drive letter that the on-premises application uses, the part of your application that accesses the file share should work without any changes.
  • Configuration files can be stored on a file share and accessed by multiple VMs.
  • Diagnostic logs, metrics, crash dumps, etc. can be saved to a file share to be processed and analyzed later.
  • Tools and utilities used by multiple developers in a group can be stored on a file share to ensure that everyone uses the same version and that they are available to everyone in the group.

To make the share visible to a VM, you just mount it as you would any other file share, and then you can access it through the network URL or the drive letter to which it was assigned. The network URL has the format \\[storage account name].file.core.windows.net\[share name]. After the share is mounted, you can access it using the standard file system APIs to add, change, delete, and read the directories and files.
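For example, mounting a share from a Windows command prompt (or PowerShell prompt) looks roughly like this; the account name mystorage, the share name myshare, and the drive letter are placeholders, and the password is one of the storage account keys.

# Map the Azure file share to drive Z: (run inside an Azure VM, or anywhere port 445 is open).
net use Z: \\mystorage.file.core.windows.net\myshare /u:mystorage <storage-account-key>

# The share can then be used like any other drive.
dir Z:\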

To create or view a file share or upload or download files to it from outside Azure, you can use the Azure portal, PowerShell, the Azure Command-Line Interface (CLI), the REST APIs, one of the storage client libraries, or AzCopy, a command-line tool provided by Microsoft. (For more information on AzCopy, check out this link: http://azure.microsoft.com/documentation/articles/storage-use-azcopy/.) There are also several storage explorers you can use, as noted at the beginning of this chapter.

Here are some of the points about Azure Files that you need to know:

  • When using SMB 2.1, the share is available only to VMs within the same region as the storage account. This is because SMB 2.1 does not support encryption.
  • When using SMB 3.0, the share can be mounted on VMs in different regions, or even the desktop.

Note that to mount an Azure file share on the desktop, port 445 (SMB) must be open, so you may need to negotiate that with your company. Many ISPs and corporate IT departments block this port. This TechNet wiki shows a list of ISPs reported by Microsoft customers as allowing or disallowing port 445 traffic: http://social.technet.microsoft.com/wiki/contents/articles/32346.azure-summary-of-isps-that-allow-disallow-access-from-port-445.aspx

  • If using a Linux VM, you can only mount shares available within the same region as the storage account. This is because while the Linux SMB client supports SMB 3.0, it does not currently support encryption. The Linux developers responsible for SMB functionality have agreed to implement this, but there is no known time frame.
  • If using a Mac, you can't mount Azure File shares because Apple's Mac OS doesn't support encryption on SMB 3.0. Apple has agreed to implement this, but there is no known time frame.
  • You can access the data from anywhere by using the REST APIs (rather than SMB).
  • The storage emulator does not support Azure Files.
  • The file shares can be up to 5 TB.
  • Throughput is up to 60 MB/s per share.
  • The size limit of the files placed on the share is 1 TB.
  • There are up to 1,000 IOPS (of size 8 KB) per share.
  • Active Directory–based authentication and access control lists (ACLs) are not currently supported, but it is expected that they will be supported at some time in the future. For now, the Azure Storage account credentials are used to provide authentication for access to the file share. This means anybody with the share mounted will have full read/write access to the share.
  • For files that are accessed repeatedly, you can maximize performance by splitting a set of files among multiple shares.

Table storage

Azure Table storage is a scalable NoSQL data store that enables you to store large volumes of semistructured, nonrelational data. It does not allow you to do complex joins, use foreign keys, or execute stored procedures. Each table has a single clustered index that can be used to query the data quickly. You also can access the data by using LINQ queries and OData with the WCF Data Services .NET libraries. A common use of table storage is for diagnostics logging.

To use table storage, you have to create a storage account. Once you have a storage account, you can create tables and fill them with data.

A table stores entities (rows), each of which contains a set of key/value pairs. Each entity has three system properties: a partition key, a row key, and a timestamp. The partition key and row key combination must be unique; together they make up the primary key for the table. The PartitionKey property is used to shard (partition) the entities across different storage nodes, allowing for load balancing across storage nodes. All entities with the same PartitionKey are stored on the same storage node. The RowKey is used to provide uniqueness within a given partition.

To get the best performance, you should give a lot of thought to the PartitionKey and RowKey and how you need to retrieve the data. You don't want all of your data to be in the same partition; nor do you want each entity to be in its own partition.

The Azure Table service provides scalability targets for both storage account and partitions. The Timestamp property is maintained by Azure, and it represents the date and time the entity was last modified. Azure Table service uses this to support optimistic concurrency with Etags.

In addition to the system properties, each entity has a collection of key/value pairs called properties. There is no schema, so each entity can contain a different set of properties. For example, you could be doing logging, and one entity could contain a payload of {customer id, customer name, request date/time, request} and the next could have {customer id, order id, item count, date-time order filled}. You can store up to 252 key/value pairs in each table entity.
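To make this concrete, here is a hedged PowerShell sketch of inserting one entity; it assumes a storage context in $ctx and uses the .NET storage client types that the Azure Storage cmdlets load. The table name, keys, and property values are placeholders.

# Create the table (use Get-AzureStorageTable instead if it already exists).
$table = New-AzureStorageTable -Name "cities" -Context $ctx

# Build an entity: PartitionKey = state abbreviation, RowKey = city name.
$entity = New-Object -TypeName Microsoft.WindowsAzure.Storage.Table.DynamicTableEntity `
    -ArgumentList "CA", "SanFrancisco"
$entity.Properties.Add("Population", `
    [Microsoft.WindowsAzure.Storage.Table.EntityProperty]::GeneratePropertyForInt(852469))
$entity.Properties.Add("LandArea", `
    [Microsoft.WindowsAzure.Storage.Table.EntityProperty]::GeneratePropertyForDouble(46.87))

# Insert the entity.
$table.CloudTable.Execute([Microsoft.WindowsAzure.Storage.Table.TableOperation]::Insert($entity))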

The number of tables is unlimited, up to the size limit of a storage account.

Tables can be managed by using the storage client library. The Table service also supports a REST API that implements the OData protocol; tables are addressable with the OData protocol using a URL in the following format: http://[storage account name].table.core.windows.net/[table name]

Queue storage

The Azure Queue service is used to store and retrieve messages. Queue messages can be up to 64 KB in size, and a queue can contain millions of messages—up to the maximum size of a storage account. Queues generally are used to create a list of messages to be processed asynchronously. The Queue service supports best-effort first in, first out (FIFO) queues.

For example, you might have a background process (such as a worker role or Azure WebJob) that continuously checks for messages on a queue. When it finds a message, it processes the message and then removes it from the queue. One of the most common examples is image or video processing.

Let's say you have a web application that allows a customer to upload images into a container in blob storage. Your application needs to create thumbnails for each image. Rather than making the customer wait while this processing is done, you put a message on a queue with the customer ID and container name. Then, you have a background process that retrieves the message and parses it to get the customer ID and the container name. The background process then retrieves each image, creates a thumbnail, and writes the thumbnail back to the same blob storage container as the original image. After all images are processed, the background process removes the message from the queue.

What if you need the message to exceed 64 KB in size? In that case, you could write a file with the information to a blob in blob storage and put the URL to the file in the queue message. The background process could retrieve the message from the queue and then take the URL and read the file from blob storage to do the required processing.

Azure Queues provide at-least-once semantics in which each message may be read one or more times. This makes it important that all processing of the message be idempotent, which means the outcome of the processing must be the same regardless of how many times the message is processed.

When you retrieve a message from a queue, it is not deleted from the queue—you have to delete it when you're done with it. When the message is read from the queue, it becomes invisible. The Invisibility Timeout is the amount of time to allow for processing the message—if the message is not deleted from the queue within this amount of time, it becomes visible again for processing. In general, you want to set this property to the largest amount of time that would be needed to process a message so that while one instance of a worker role is processing it, another instance doesn't find it (visible) on the queue and try to process it at the same time.

You don't want to read the message from the queue, delete it from the queue, and then start processing it. If the receiver fails, that queue entry will never be processed. Leaving the message on the queue (but invisible) until the processing has completed handles the case of the receiving process failing—eventually, the message will become visible again and will be processed by another instance of the receiver.
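As a rough sketch, the flow looks like this with the Azure Storage PowerShell cmdlets and the .NET queue client they expose; the queue name, message text, and the 60-second invisibility timeout are placeholders, and $ctx is a storage context.

# Create a queue and add a message.
$queue = New-AzureStorageQueue -Name "thumbnailrequests" -Context $ctx
$message = New-Object -TypeName Microsoft.WindowsAzure.Storage.Queue.CloudQueueMessage `
    -ArgumentList "customer123,uploads"
$queue.CloudQueue.AddMessage($message)

# Retrieve a message; it stays on the queue but is invisible for 60 seconds.
$msg = $queue.CloudQueue.GetMessage([TimeSpan]::FromSeconds(60))

# $msg.DequeueCount shows how many times the message has been retrieved (poison-message check).
# Process the message, then delete it before the invisibility timeout expires.
$queue.CloudQueue.DeleteMessage($msg)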

You can simulate a workflow by using a different queue for each step. A message can be processed from one queue from which it is deleted on completion, and then that processing can place a new message on a different queue to initiate processing for the next step in the workflow. You can also prioritize messages by using queues and processing the messages in them with different priorities.

The Queue service provides poison message support through the dequeue count. The concern is that an invalid message could cause an application handling it to crash, causing the message to become visible on the queue again only to crash the application again the next time the message is processed. Such a message is referred to as a poison message. You can prevent this by checking the dequeue count for the message. If this exceeds some level, the processing of the message should be stopped, the message deleted from the queue, and a copy inserted in a separate poison message queue for offline review. You could process those entries periodically and send an email when an entry is placed on the queue, or you could just let them accumulate and check them manually.

If you want to process the queue messages in batches, you can retrieve up to 32 messages in one call and then process them individually. Note, however, that when you retrieve a batch of messages, it sets the Invisibility Timeout for all of the messages to the same time. This means you must be able to process all of them within the time allotted.

Redundancy

What happens if the storage node on which your blobs are stored fails? What happens if the rack holding the storage node fails? Fortunately, Azure supports something called redundancy. There are four choices for redundancy; you specify which one to use when you create the storage account. You can change the redundancy settings after they are set up, except in the case of zone redundant storage.

Locally Redundant Storage (LRS).

Azure Storage provides high availability by ensuring that three copies of all data are made synchronously before a write is deemed successful. These copies are stored in a single facility in a single region. The replicas reside in separate fault domains and upgrade domains. This means the data is available even if a storage node holding your data fails or is taken offline to be updated. When you make a request to update storage, Azure sends the request to all three replicas and waits for successful responses for all of them before responding to you. This means that the copies in the primary region are always in sync.

LRS is less expensive than GRS, and it also offers higher throughput. If your application stores data that can be easily reconstructed, you may opt for LRS.

Geo-Redundant Storage (GRS).

GRS makes three synchronous copies of the data in the primary region for high availability, and then it asynchronously makes three replicas in a paired region for disaster recovery. Each Azure region has a defined paired region within the same geopolitical boundary for GRS. For example, West US is paired with East US. This has a small impact on scalability targets for the storage account. The GRS copies in the paired region are not accessible to you, and GRS is best viewed as disaster recovery for Microsoft rather than for you. In the event of a major failure in the primary region, Microsoft would make the GRS replicas available, but this has never happened to date.

Read-Access Geo-Redundant Storage (RA-GRS).

This is GRS plus the ability to read the data in the secondary region, which makes it suitable for partial customer disaster recovery. If there is a problem with the primary region, you can change your application to have read-only access to the paired region. The storage client library supports a fallback mechanism via Microsoft.WindowsAzure.Storage.RetryPolicies.LocationMode to try to read from the secondary copy if the primary copy can't be reached. This feature is built in for you. Your customers might not be able to perform updates, but at least the data is still available for viewing, reporting, etc.

You also can use this if you have an application in which only a few users can write to the data but many people read the data. You can point your application that writes the data to the primary region but have the people only reading the data access the paired region. This is a good way to spread out the performance when accessing a storage account.

Zone-Redundant Storage (ZRS).

This option can only be used for block blobs in a standard storage account. It replicates your data across two to three facilities, either within a single region or across two regions. This provides higher durability than LRS, but ZRS accounts do not have metrics or logging capability.

Security and Azure Storage

Azure Storage provides a set of security features that help developers build secure applications. You can secure your storage account by using Role-Based Access Control (RBAC) and Microsoft Azure Active Directory (Azure AD). You can use client-side encryption, HTTPS, or SMB 3.0 to secure your data in transit. You can enable Storage Service Encryption, and the Azure Storage service will encrypt data written to the storage account. OS and Data disks for VMs now have Azure Disk Encryption that can be enabled. And secure access to the data plane objects (such as blobs) can be granted using a shared access signature (SAS). Let's talk a little more about each of these.

For more detail and guidance about any of these security features, please check out the Azure Storage Security Guide at https://azure.microsoft.com/documentation/articles/storage-security-guide/.

Securing your storage account

The first thing to think about is securing your storage account.

Storage account keys

Each storage account has two authentication keys—a primary and a secondary—either of which can be used for any operation. There are two keys to allow occasional rollover of the keys to enhance security. It is critical that these keys be kept secure because their possession, along with the account name, allows unlimited access to any data in the storage account.

Say you're using key 1 for your storage account in multiple applications. You can regenerate key 2 and then change all the applications to use key 2, test them, and deploy them to production. Then, you can regenerate key 1, which removes access from anybody who is still using it. A good example of when you might want to do this is if your team uses a storage explorer that retains the storage account keys and someone leaves the team or the company—you don't want them to have access to your data after they leave. This can happen without much notice, so you should have a procedure in place that identifies all the apps that need to change, and practice rotating keys on a regular basis so that it's simple and not a big problem when you need to rotate them in a hurry.
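For a Resource Manager storage account, the rotation itself is one cmdlet per key. A minimal sketch (the resource group and account names are placeholders):

# Regenerate the secondary key (key2), move your applications to it, then regenerate key1.
New-AzureRmStorageAccountKey -ResourceGroupName "bookch4rg" -Name "bookch4ps" -KeyName key2

# List the current keys so you can update the applications' configuration.
Get-AzureRmStorageAccountKey -ResourceGroupName "bookch4rg" -Name "bookch4ps"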

Using RBAC, Azure AD, and Azure Key Vault to control access to Resource Manager storage accounts

RBAC and Azure AD

With Resource Manager RBAC, you can assign roles to users, groups, or applications. The roles are tied to a specific set of actions that are allowed or disallowed. Using RBAC to grant access to a storage account only handles the management operations for that storage account. You can't use RBAC to grant access to objects in the data plane like a specific container or file share. You can, however, use RBAC to grant access to the storage account keys, which can then be used to read the data objects.

For example, you might grant someone the Owner role to the storage account. This means they can access the keys and thus the data objects, and they can create storage accounts and do pretty much anything.

You might grant someone else the Reader role. This allows them to read information about the storage account. They can read resource groups and resources, but they can't access the storage account keys and therefore can't access the data objects.

If someone is going to create VMs, you must grant them the Virtual Machine Contributor role, which grants them access to retrieve the storage account keys but not to create storage accounts. They need the keys to create the VHD files that are used for the VM disks.
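Role assignments can be made in the Azure portal or with PowerShell. A hedged sketch, using a hypothetical user and scoping the Reader role to a single storage account (the subscription ID, resource group, and account names are placeholders):

# Grant a user the Reader role on one storage account only.
New-AzureRmRoleAssignment -SignInName "someuser@contoso.com" `
    -RoleDefinitionName "Reader" `
    -Scope "/subscriptions/<subscription-id>/resourceGroups/azurebookch4rg/providers/Microsoft.Storage/storageAccounts/azurebooktest"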

Azure Key Vault

Azure Key Vault helps safeguard cryptographic keys and secrets used by Azure applications and services. You could store your storage account keys in an Azure Key Vault. What does this do for you? While you can't control access to the data objects directly using Active Directory, you can control access to an Azure Key Vault using Active Directory. This means you can put your storage account keys in Azure Key Vault and then grant access to them for a specific user, group, or application.

Let's say you have an application running as a Web App that uploads files to a storage account. You want to be really sure nobody else can access those files. You add the application to Azure Active Directory and grant it access to the Azure Key Vault with that storage account's keys in it. After that, only that application can access those keys. This is much more secure than putting the keys in the web.config file where a hacker could get to them.
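A hedged sketch of what that might look like with the AzureRM Key Vault cmdlets; the vault name and secret name are placeholders, and the vault is assumed to already exist with an access policy that lets the application read secrets.

# Store a storage account key as a Key Vault secret so access to it is governed by Azure AD.
$key = ConvertTo-SecureString -String "yourStorageAccountKey" -AsPlainText -Force
Set-AzureKeyVaultSecret -VaultName "azurebookvault" -Name "bookch4ps-key1" -SecretValue $key

# An authorized application (or user) retrieves it at run time instead of reading web.config.
(Get-AzureKeyVaultSecret -VaultName "azurebookvault" -Name "bookch4ps-key1").SecretValueText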

Securing access to your data

There are two ways to secure access to your data objects. We just talked about the first one—by controlling access to the storage account keys.

The second way to secure access is by using shared access signatures and stored access policies. A shared access signature (SAS) is a string containing a security token that can be attached to the URI for an asset that allows you to delegate access to specific storage objects and to specify constraints such as permissions and the date/time range of access.

You can grant access to blobs, containers, queue messages, files, and tables. With tables, you can grant access to specific partition keys. For example, if you were using geographical state for your partition key, you could give someone access to just the data for California.

You can fine-tune this by using a separation of concerns. You can give a web application permission to write messages to a queue, but not to read them or delete them. Then, you can give the worker role or Azure WebJob the permission to read the messages, process the messages, and delete the messages. Each component has the least amount of security required to do its job.

Here's an example of an SAS, with each parameter explained: http://mystorage.blob.core.windows.net/mycontainer/myblob.txt (URL to the blob)

?sv=2015-04-05 (storage service version) 
&st=2015-12-10T22%3A18%3A26Z (start time, in UTC time and URL encoded) 
&se=2015-12-10T22%3A23%3A26Z (end time, in UTC time and URL encoded) 
&sr=b (resource is a blob) 
&sp=r (read access) 
&sip=168.1.5.60-168.1.5.70 (requests can only come from this range of IP addresses) 
&spr=https (only allow HTTPS requests) 
&sig=Z%2FRHIX5Xcg0Mq2rqI3OlWTjEg2tYkboXr1P9ZUXDtkk%3D (signature used for the authentication of the SAS) 

Note that the SAS query parameters must be URL encoded, such as %3A for colon (:) and %20 for a space. This SAS gives read access to a blob from 12/10/2015 10:18 PM to 12/10/2015 10:23 PM.

When the storage service receives this request, it will take the query parameters and create the &sig value on its own and compare it to the one provided here. If they agree, it will verify the rest of the request. If our URL pointed to a file on a file share instead of a blob, the request would fail because blob is specified. If the request were to update the blob, it would fail because only read access has been granted.

There are both account-level SAS and service-level SAS. With account-level SAS, you can do things like list containers, create containers, delete file shares, and so on. With service-level SAS, you can only access the data objects. For example, you can upload a blob into a container.
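You don't normally build these query strings by hand; the storage client libraries and the PowerShell cmdlets generate them for you. A minimal sketch of a service-level SAS, assuming a storage context in $ctx and the container and blob names used later in this chapter:

# Generate a SAS URL granting read access to one blob for the next 30 minutes.
New-AzureStorageBlobSASToken -Container "test-ps" -Blob "SnowyCabin.jpg" -Permission r `
    -StartTime (Get-Date) -ExpiryTime (Get-Date).AddMinutes(30) -Context $ctx -FullUri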

You can also create stored access policies on container-like objects such as blob containers and file shares. This will let you set the default values for the query parameters, and then you can create the SAS by specifying the policy and the query parameter that is different for each request. For example, you might set up a policy that gives read access to a specific container. Then, when someone requests access to that container, you create an SAS from the policy and use it.

There are two advantages to using stored access policies. First, this hides the parameters that are defined in the policy. So if you set your policy to give access to 30 minutes, it won't show that in the URL—it just shows the policy name. This is more secure than letting all of your parameters be seen.

The second reason to use stored access policies is that they can be revoked. You can either change the expiration date to be prior to the current date/time or remove the policy altogether. You might do this if you accidentally provided access to an object you didn't mean to. With an ad hoc SAS URL, you have to remove the asset or change the storage account keys to revoke access.
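A hedged PowerShell sketch of the policy-based approach (the policy and container names are placeholders, and $ctx is a storage context):

# Create a stored access policy on a container, then issue an SAS that references it.
New-AzureStorageContainerStoredAccessPolicy -Container "test-ps" -Policy "read30min" `
    -Permission r -ExpiryTime (Get-Date).AddMinutes(30) -Context $ctx
New-AzureStorageContainerSASToken -Name "test-ps" -Policy "read30min" -Context $ctx -FullUri

# To revoke every SAS issued from the policy, remove it (or shorten its expiry time).
Remove-AzureStorageContainerStoredAccessPolicy -Container "test-ps" -Policy "read30min" -Context $ctx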

Shared access signatures and stored access policies are the two most secure ways to provide access to your data objects.

Securing your data in transit

Another consideration when storing your data in Azure Storage is securing the data when it is being transferred between the storage service and your applications.

First, you should always use the HTTPS protocol, which ensures secure communication over the public Internet. Note that if you are using SAS, there is a query parameter that can be used that specifies that only the HTTPS protocol can be used with that URL.

For Azure File shares, SMB 3.0 running on Windows encrypts the data going across the public Internet. When Apple and Linux add security support to SMB 3.0, you will be able to mount file shares on those machines and have encrypted data in transit.

Last, you can use the client-side encryption feature of the .NET and Java storage client libraries to encrypt your data before sending it across the wire. When you retrieve the data, you can then unencrypt it. This is built in to the storage client libraries for .NET and Java. This also counts as encryption at rest because the data is encrypted when stored.

Encryption at rest

Let's look at the various options available to encrypt the stored data.

Storage Service Encryption (SSE)

This is a new feature currently in preview. This lets you ask the storage service to encrypt blob data when writing it to Azure Storage. This feature has been requested by many companies to fulfill security and compliance requirements. It enables you to secure your data without having to add any code to any of your applications. Note that it only works for blob storage; tables, queues, and files will be unaffected.

This feature is per-storage account, and it can be enabled and disabled using the Azure portal, PowerShell, the CLI, the Azure Storage Resource Provider REST API, or the .NET storage client library. The keys are generated and managed by Microsoft at this time, but in the future you will get the ability to manage your own encryption keys.

This can be used with both Standard and Premium storage, but only with the new Resource Manager accounts. During the preview, you have to create a new storage account to try out this feature.

One thing to note: after being enabled, the service encrypts data written to the storage account. Any data already written to the account is not encrypted. If you later disable the encryption, any future data will not be encrypted, but it does retain encryption on the data written while encryption was enabled.

If you create a VM using an image from the Azure Marketplace, Azure performs a shallow copy of the image to your storage account in Azure Storage, and it is not encrypted even if you have SSE enabled. After it creates the VM and starts updating the image, SSE will start encrypting the data. For this reason, Microsoft recommends that you use Azure Disk Encryption on VMs created from images in the Azure Marketplace if you want them fully encrypted.

Azure Disk Encryption

This is another new feature that is currently in preview. This feature allows you to specify that the OS and data disks used by an IaaS VM should be encrypted. For Windows, the drives are encrypted with industry-standard BitLocker encryption technology. For Linux, encryption is performed using DM-Crypt.

Note For Linux VMs already running in Azure or new Linux VMs created from images in the Azure Marketplace, encryption of the OS disk is not currently supported. Encryption of the OS volume for Linux VMs is supported only for VMs that were encrypted on-premises and uploaded to Azure. This restriction only applies to the OS disk; encryption of data volumes for a Linux VM is supported.

Azure Disk Encryption is integrated with Azure Key Vault to allow you to control and manage the disk encryption keys.

Unlike SSE, when you enable this, it encrypts the whole disk, including data that was previously written. You can bring your own encrypted images into Azure and upload them and store the keys in Azure Key Vault, and the image will continue to be encrypted. You can also upload an image that is not encrypted or create a VM from the Azure Gallery and ask that its disks be encrypted.

This is the method recommended by Microsoft to encrypt your IaaS VMs at rest. Note that if you turn on both SSE and Azure Disk Encryption, it will work fine. Your data will simply be double-encrypted.

Client-side encryption

We looked at client-side encryption when discussing encryption in transit. The data is encrypted by the application and sent across the wire to be stored in the storage account. When retrieved, the data is decrypted by the application. Because the data is stored encrypted, this is encryption at rest.

For this encryption, you can encrypt the data in blobs, tables, and queues, rather than just blobs as with SSE. Also, you can bring your own keys or use keys generated by Microsoft. If you store your encryption keys in Azure Key Vault, you can use Azure Active Directory to specifically grant access to the keys. This allows you to control who can read the vault and retrieve the keys being used for client-side encryption.

This is the most secure method of encrypting your data, but it does require that you add code to perform the encryption and decryption. If you only have blobs that need to be encrypted, you may choose to use a combination of HTTPS and SSE to meet the requirement that your data be encrypted at rest.

Using Storage Analytics to audit access

You may want to see how people are accessing your storage account. Do all the requests use an SAS? How many people are accessing the storage account using the actual storage account keys?

To check this, you can enable logging in Azure Storage Analytics and examine the results after a while. Enabling logging tells the Azure Storage service to log all requests to the storage account. (Note that at this time, only blobs, tables, and queues are supported.)

The logs are stored in a container called $logs in blob storage. They are stored by date and time, collected by hour. If there is no activity, no logs are generated.
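Logging can be enabled from the portal (under Diagnostics) or with the data-plane PowerShell cmdlets. A minimal sketch, assuming a storage context in $ctx:

# Log read, write, and delete requests for the Blob service and keep the logs for 10 days.
# Repeat with -ServiceType Table or Queue as needed.
Set-AzureStorageServiceLoggingProperty -ServiceType Blob -LoggingOperations Read,Write,Delete `
    -RetentionDays 10 -PassThru -Context $ctx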

Here are the fields that are stored in the logs.

<version-number>;<request-start-time>;<operation-type>;<request-status>;<http-status-code>;<end-to-end-latency-in-ms>;<server-latency-in-ms>;<authentication-type>;<requester-account-name>;<owner-account-name>;<service-type>;<request-url>;<requested-object-key>;<request-id-header>;<operation-count>;<requester-ip-address>;<request-version-header>;<request-header-size>;<request-packet-size>;<response-header-size>;<response-packet-size>;<request-content-length>;<request-md5>;<server-md5>;<etag-identifier>;<last-modified-time>;<conditions-used>;<user-agent-header>;<referrer-header>;<client-request-id>

The fields we're interested in are operation-type, request-status, and authentication-type. So if you look at a log file, these are the three cases we can look for:

  1. The blob is public, and it is accessed using a URL without an SAS. In this case, the request-status will be AnonymousSuccess and the authentication type will be anonymous.

    1.0;2015-11-17T02:01:29.0488963Z;GetBlob;AnonymousSuccess;200;124;37;anonymous;;mystorage…

  2. The blob is private and was used with an SAS. In this case, the request-status is SASSuccess and the authentication type is sas.

    1.0;2015-11-16T18:30:05.6556115Z;GetBlob;SASSuccess;200;416;64;sas;;mystorage…

  3. The blob is private, and the storage key was used to access it. In this case, the request-status is Success and the authentication type is authenticated.

    1.0;2015-11-16T18:32:24.3174537Z;GetBlob;Success;206;59;22;authenticated;mystorage…

To view and analyze these log files, you can use the Microsoft Message Analyzer (free from Microsoft).

You can download the Message Analyzer here: https://www.microsoft.com/download/details.aspx?id=44226.

The operating guide is here: https://technet.microsoft.com/library/jj649776.aspx.

The Message Analyzer lets you search and filter the data. An example of when you might want to do this is if you have your keys stored in Azure Key Vault and only one application has access to the Azure Key Vault. In that case, you might search for instances where GetBlob was called and make sure there aren't any calls that were authenticated in any other way.

Important For Azure Storage Analytics, the metrics tables start with $metrics, and the logs container in blob storage is called $logs. You cannot even see the tables and container using PowerShell, the Visual Studio Cloud Explorer, or the Azure portal.

You can see the tables and container and even open and view the entities and blobs using the Microsoft Azure Storage Explorer (http://storageexplorer.com). The Cerebrata Azure Management Studio and Cloud Portam allow you to access and view these objects (http://www.cerebrata.com) as well.

You can also write your own code using one of the storage client libraries to retrieve the data from table storage and blob storage. Other storage explorers listed in the article at the beginning of this chapter may also enable you to view this data.

Using Cross-Origin Resource Sharing (CORS)

When a web browser running in one domain makes an HTTP request for a resource in another domain, it's called a cross-origin HTTP request. If the request is made in a script language such as JavaScript, the browser will not allow the request.

For example, if a web application running on contoso.com makes a request for a jpeg on fabrikam.blob.core.windows.net, it will be blocked.

What if you actually want to share the images in your storage account with Contoso? Azure Storage allows you to enable CORS—Cross-Origin Resource Sharing. For this example, you would enable CORS on the fabrikam storage account and allow access from contoso.com. You can do this by using the REST API or the storage client library.
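You can also set the rule with PowerShell. A hedged sketch; the origin and the other rule values are placeholders, and $ctx is a storage context for the fabrikam account.

# Allow GET requests from contoso.com against this storage account's Blob service.
$corsRule = @{
    AllowedOrigins  = @("http://www.contoso.com")
    AllowedMethods  = @("Get")
    AllowedHeaders  = @("*")
    ExposedHeaders  = @("*")
    MaxAgeInSeconds = 3600
}
Set-AzureStorageCORSRule -ServiceType Blob -CorsRules $corsRule -Context $ctx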

Creating and managing storage

In this section, we are going to go through several exercises to show the different ways you can access your data objects. First, we'll use the Azure portal and the Visual Studio Cloud Explorer, then we'll do some of the same operations using PowerShell. Here's what we'll do:

  • Create a storage account using the Azure portal.

  • Create a blob container and upload blobs using the Visual Studio Cloud Explorer.
  • Create a file share and upload files using the Azure portal.
  • Create a table and add records to it using Visual Studio Cloud Explorer.
  • Create a storage account using Azure PowerShell.
  • Create a blob container and upload blobs using PowerShell.
  • Create a file share and upload files using PowerShell.

To do the Azure PowerShell demos, you need to install Azure PowerShell. If you haven't used Azure PowerShell before, please check out Chapter 8, "Management tools," or this article that shows how to install and configure Azure PowerShell: https://azure.microsoft.com/documentation/articles/powershell-install-configure/.

Create a storage account using the Azure portal

To create a storage account, log into the Azure portal. Click New > Data + Storage > Storage Account. You see a screen similar to Figure 4-1.

First, fill in a name for the storage account. The name must be globally unique because it is used as part of the URL. This will be used in the endpoints for blobs, files, tables, and queues. In Figure 4-1, the storage account name is azurebooktest. This means the blobs (for example) will be addressable as http://azurebooktest.blob.core.windows.net.

The next field displayed is the Deployment Model. You want to create a Resource Manager storage account, so select Resource Manager.

Account Kind can be General Purpose or Blob Storage. Select General Purpose so you can use the same account for blobs, files, and tables.

For Replication, the default is GRS (Geo-Redundant Storage). Change this to LRS (Locally Redundant Storage), which has the lowest cost. For test data, you don't need it to be replicated in a completely different region.

If you manage multiple subscriptions, select the one you want to be used for this storage account.

For Resource Group, let's create a new one just for this chapter. Specify the name of the resource group. In Figure 4-1, the resource group is called azurebookch4rg.

For Location, select the Azure region closest to you for the best performance.

Select the Pin To Dashboard check box and click Create. Azure will provision the storage account and add it to the Dashboard.

Now that you've created a Resource Manager storage account in its own resource group, let's take a look at it.

If your storage account wasn't automatically displayed after being created, click your new storage account from the Dashboard. A blade will be displayed with information about your storage account (Figure 4-2).

Click All Settings to bring up the Settings blade (Figure 4-3).

Figure 4-3 Settings blade for the new storage account.

Here are some of the options in the Settings blade:

  • Access Keys. This shows you your storage account name and the two access keys. From the Access Keys blade, you can copy any of the values to the Windows clipboard. You can also regenerate the storage account access keys here.
  • Configuration. This allows you to change the replication. Yours is LRS if that's what you selected when creating the storage account. You can change it here to GRS or RA-GRS.
  • Custom Domain. This is where you can configure a custom domain for your storage account. For example, rather than calling it robinscompany.blob.core.windows.net, you can assign a domain to it and refer to it as storage.robinscompany.com.
  • Encryption. This is where you can sign up for the Storage Service Encryption preview. At some point, this will be where you enable and disable SSE for the storage account.
  • Diagnostics. This is where you can turn on the Storage Analytics and the logging.
  • Users. This is where you can grant management-plane access for this specific storage account.

Create a container and upload blobs using Visual Studio Cloud Explorer

Now you want to create a container and upload some files to it using Visual Studio Cloud Explorer.

Run Visual Studio. If you don't have the Azure Tools installed, you can use the Web Platform Installer to install them.

Click View > Cloud Explorer. You see a screen like the one in Figure 4-4.

Click the Settings icon to get to the login screen (Figure 4-5).

Figure 4-5 Select the Azure account with which to log into the Cloud Explorer.

If you don't have any Azure accounts displayed in the list, click the drop-down list and select Add An Account. If you do have accounts displayed, select the one you want to use and log into it. Click Apply. After logging in, you see something like Figure 4-6.

Figure 4-6 Visual Studio Cloud Explorer, showing resources.

Open the storage account you created with the portal. In the example, that's azurebooktest. The storage account has Blob Containers, Queues, and Tables. Right-click Blob Containers and select Create Blob Container, as displayed in Figure 4-7.

It shows a text box; type in the container name. The example uses test-vs. Press Enter; now it shows your new container under Blob Containers. Double-click the container name to open a screen where you can upload blobs (Figure 4-8).

To upload blobs into the container, click the icon on the top row next to the filter that shows an up arrow with a line over it (this is the same icon used in Figure 4-14). The Upload New File dialog opens (Figure 4-9). Browse to find a file. You can set a folder name here. Note that this is the pseudo-foldering discussed earlier—it includes the folder name in the blob name with a forward slash. If you leave the folder blank, it will put the file in the root of the container.

Figure 4-9 Dialog for uploading blobs into the container.

Upload some files into the root and some files into a folder. You should see something similar to Figure 4-10. This figure shows a folder called images and two blobs in the root. Note that it shows the URL to the blobs. If you open the images folder, it will show the blobs there, and all of the URLs will have /images/ in them.

Figure 4-10 Screen showing blobs uploaded into the container.

You can delete blobs from the container by using the red X icon, and you can download blobs and view them in the picture viewer by double-clicking the entry in the table or by clicking the forward arrow icon.

One thing this tool does not allow you to do is set the Access Type of the container. By default, the Cloud Explorer sets it to Private. The Access Type defines who can access the blobs and the container. If this is Private, the container and the blobs in the container can only be accessed by someone who has the account credentials (account name and key) or a URL that includes an SAS. If you set this to Blob, then anyone with a URL can view the associated blob but cannot view the container properties and metadata or the list of blobs in the container. If you set this to Container, then everyone has read access to the container and the blobs therein.

You can change this in the Azure portal and through some storage explorers. In the Azure portal, go to the storage account, click Blobs, and then select the container. A blade will open on the right showing the blobs in the container. Click Access Policy to set it to Blob or Container.
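You can also change the access type with PowerShell. A minimal sketch, assuming a storage context in $ctx (created with New-AzureStorageContext, shown later in this chapter) and the container created above:

# Set the container's access type: Off (private), Blob, or Container.
Set-AzureStorageContainerAcl -Name "test-vs" -Permission Blob -Context $ctx

# Check the current setting.
Get-AzureStorageContainerAcl -Name "test-vs" -Context $ctx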

The Cloud Explorer is a pretty simple implementation of accessing blob storage. It does not allow you to upload or download folders full of images. For more sophisticated applications, check out the list of storage explorers provided earlier in this section.

Create a file share and upload files using the Azure portal

In this section, you will create an Azure File share and then upload some files to it. For this demo, you'll use the Azure portal. You can't use the Cloud Explorer in Visual Studio because it doesn't support Azure Files.

Log into the Azure portal. Click All Resources and then select the storage account you created using the portal. In the examples, this was azurebooktest. You should see something like Figure 4-11.

Click Files to open the File Service blade shown in Figure 4-12.

You don't have any file shares yet. Create one by clicking File Share. This will show the New File Share blade (Figure 4-13).

Provide a name for the file share. If you want the maximum size of the file share to be less than the allowed 5,120 GB, specify the desired value in the Quota field. To maximize the size of the file share, leave the Quota blank.

Click Create at the bottom of the blade, and Azure will create the file share for you and display it in the File Service blade.

Click the new file share to bring up the file share's blade. You see something like Figure 4-14.

Let's look at what the icons do.

  • Connect: This gives you the NET USE statement that you can use in a command window to map the network share to a local drive letter.
  • Upload: This allows you to upload files.
  • Directory: This lets you create a directory in the folder currently displayed. For you, that's the root folder.
  • Refresh: This refreshes the displayed information.
  • Delete share: This will delete the file share and all the files on it.
  • Properties: This shows the Properties blade for the file share. This shows the name, URL, quota, usage, and so on.
  • Quota: This lets you modify the quota specified.

Now upload some files. Click the Upload icon to show the Upload Files blade (Figure 4-15).

Click the file folder icon. In the Choose File To Upload dialog that displays, browse to any folder and select some files to upload. You can upload up to four files at a time. If you select more than four, it will ignore the extras. After selecting them and returning to the Upload Files blade, it shows the files in a list. Click the Start Upload button displayed in Figure 4-16 to upload the files.

The portal will show the progress while uploading the files and then show the files in the File Share blade, as illustrated in Figure 4-17.

Figure 4-17 Uploaded files.

Create a table and add records using the Visual Studio Cloud Explorer

Now you can create a table in your storage account and add some entities to it. You can use one of the storage explorer tools mentioned earlier in this book, but let's see how easy it is to use the Visual Studio Cloud Explorer to do this task.

If you've done the steps in the last section that showed how to use the Cloud Explorer to add blobs to blob storage, this will be just as easy. If you don't still have the Cloud Explorer open, open it again and log in to your Azure account again.

In Cloud Explorer, right-click Tables and select Create Table. You will be prompted for the name of the table, which must be unique within your storage account. After pressing Enter to create the new table, double-click the table name to see something similar to Figure 4-18.

You don't have any entities, so add one by clicking the icon with the + in it.

As discussed in the section "Table storage" earlier in this chapter, you have to think about what you want to use for PartitionKey and RowKey to get the best performance.

For this example, use geographic state abbreviation for the PartitionKey and city name for the RowKey. For properties, add Population as Int32 and LandArea as a Double. Fill in values for each of the fields. Figure 4-19 shows what the entity looks like before adding it to the table.

Click OK to save the entity. Add another entity, and this time, add another property besides Population and LandArea, such as GPSCoordinates. Add a couple more entities, including whatever properties you want. If you want to edit an entity after saving it, you can right-click the entity and select Edit. You also can delete entities using this view.

After entering a few entities, you should have something similar to Figure 4-20.

You can see the PartitionKey and RowKey combination is unique for all of the entities. The rest of each row in the table is the list of key/value pairs. Not all entities have the same properties. The entity for San Francisco only has LandArea and Population; the entity for San Jose is the only one with GPSCoordinates. This is a strength of Azure Tables—the key/value pairs can vary for each entity.

You can create tables by using a designer such as this one in Visual Studio, but for adding, changing, and deleting entities in an application, you will probably want to write your own code using the storage client library. For examples, please check out this link: http://azure.microsoft.com/documentation/articles/storage-dotnet-how-to-use-tables/.

Create a storage account using PowerShell

Let's see how to do many of the same operations using Azure PowerShell cmdlets.

First, you need to run Azure PowerShell ISE.

Log into your Azure account using the PowerShell cmdlet Login-AzureRmAccount. You will be prompted for your Azure credentials; go ahead and log in.

> Login-AzureRmAccount

Note: There is also a cmdlet called Add-AzureAccount. This is for using classic resources. All of the cmdlets for Resource Manager accounts have "Rm" after the word "Azure" in the cmdlet.

After logging into the account, it should show the subscription in the command window.

Now you need a resource group in which to put your storage account. Use the same one you created in the portal when you created the storage account there. If you put all of the resources created in this chapter in the same resource group, then at the end you can delete them in one fell swoop by deleting the resource group.

If you want to create a new resource group, you can do that with the New-AzureRmResourceGroup cmdlet like this:

> New-AzureRmResourceGroup "nameofgroup" -Location "location"

An example of Location is West US.

You can retrieve a list of resource groups by using the Get-AzureRmResourceGroup cmdlet. When you run this, you see the resource group you set up when creating the storage account in the portal (Figure 4-21).

Now let's create the storage account. You want to create a Resource Manager storage account and specify the resource group. You also specify the storage account name, the location, and the type, which is for the redundancy type. You want to use locally redundant storage for the same reasons mentioned when creating the storage account using the Azure portal. Select your own storage account name. Here's what the command looks like:

> New-AzureRmStorageAccount -ResourceGroupName "bookch4rg" -StorageAccountName "bookch4ps" -Location "West US" -Type "Standard_LRS"

For a full list of locations, you can run the PowerShell cmdlet Get-AzureRmLocation.

Fill in your own values, and when you're ready, press Enter to execute the command. It will take a couple of minutes. When it's done, it will show you your new storage account. It should look like Figure 4-22.

Figure 4-22 The PowerShell output from creating the storage account.

If you log into the Azure portal, you can see your new resource group and the new storage account in the resource group.

Create a container and upload blobs using PowerShell

Now you'll create a container and upload some blobs. In the example, the test files are in D:\_TestImages. That path is used when uploading those files to Blob storage.

Note These cmdlets are Azure Storage data-plane cmdlets, not Azure Service Management (ASM) or Azure Resource Manager cmdlets, which are management-plane cmdlets. The cmdlet to create a storage account is a management-plane cmdlet. These data-plane cmdlets can be used with both ASM and Resource Manager storage accounts.

If you're not already running the PowerShell ISE and logged into your Azure account, do that now. You're going to create a script that you can save and use later. In addition to the path to your local pictures, you will need the name and access key of your storage account.

Set up variable names for the storage account name and key—$StorageAccountName and $StorageAccountKey. Fill in your storage account name and key here.

$StorageAccountName = "yourStorageAccountName"

$StorageAccountKey = "yourStorageAccountKey"
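
If you would rather not copy the key out of the portal by hand, you can retrieve it with the management-plane cmdlet Get-AzureRmStorageAccountKey. Treat the following as a sketch; the shape of the output has changed between Azure PowerShell versions, so adjust the last line for yours.

# Retrieve the keys for the storage account created earlier
$keys = Get-AzureRmStorageAccountKey -ResourceGroupName "bookch4rg" -Name "bookch4ps"
# Newer versions return a list of key objects
$StorageAccountKey = $keys[0].Value
# The earliest versions returned an object with Key1/Key2 properties instead:
# $StorageAccountKey = $keys.Key1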

Next, you'll define the storage account context using the storage account name and key. You will use this context for authentication with subsequent commands against the storage account. This is easier (and safer) than specifying the storage account name and key all the time.

$ctx = New-AzureStorageContext -StorageAccountName $StorageAccountName `
    -StorageAccountKey $StorageAccountKey

Note that there is a continuation character (the backward tick mark) at the end of the first line.

Next, you'll add a variable for the name of your container, then you'll create the container. The example uses test-ps.

$ContainerName = "test-ps"

#create a new container with public access to the blobs

New-AzureStorageContainer -Name $ContainerName -Context $ctx -Permission Blob

This creates a container in your storage account (as defined by the context) with a permission of Blob, which means the blobs can be accessed on the Internet with a URL.
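
Because the permission is set to Blob, each blob you upload will be readable anonymously at a URL of the following form (the angle-bracket pieces are placeholders for your own names):

https://<storage account name>.blob.core.windows.net/<container name>/<blob name>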

Now you need to set a variable pointing at the local directory that holds the images. You can upload any files; just remember that the larger they are, the longer they will take to upload. Using a variable here makes it easier to change the path later if you use it in multiple places.

$localFileDirectory = "D:\_TestImages\"

Now you can upload a blob. First, you'll set a variable for the blob name, giving it the same value as the file name. Then, you'll append it to the $localFileDirectory variable to build the full local path. The file will be uploaded from the local disk to the specified container.

$BlobName = "SnowyCabin.jpg"

$localFile = $localFileDirectory + $BlobName

Set-AzureStorageBlobContent -File $localFile -Container $ContainerName `
    -Blob $BlobName -Context $ctx

To run the script, press F5. To run only parts of the script, highlight the lines you want to run and press F8 (or click the Run Selection icon). If you run the script repeatedly, remember that the container only needs to be created once; after that succeeds, select and run only the commands that follow it. When you run the upload, you get back verification in the command window (Figure 4-23).

To upload more files, copy and paste the three lines of PowerShell, changing the $BlobName variable for each set you paste.
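
If you would rather not paste the same three lines over and over, a short alternative sketch is to loop over everything in the local folder (note that this uploads every file it finds there):

# Upload every file in the local folder, using each file name as the blob name
Get-ChildItem $localFileDirectory -File | ForEach-Object {
    Set-AzureStorageBlobContent -File $_.FullName -Container $ContainerName `
        -Blob $_.Name -Context $ctx
}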

After uploading some files, you can list them by using the Get-AzureStorageBlob PowerShell cmdlet.

# get list of blobs and see the new one has been added to the container

Get-AzureStorageBlob -Container $ContainerName -Context $ctx

You can also see the container and blobs if you log into the Azure portal and go to the storage account.

There are also PowerShell commands for downloading blobs, deleting blobs, copying blobs, etc.
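
For example, here is a rough sketch of a download, a server-side copy, and a delete using the same context; the local destination folder and the second container name are placeholders, and the destination container must already exist:

# Download a blob to a local folder
Get-AzureStorageBlobContent -Blob "SnowyCabin.jpg" -Container $ContainerName `
    -Destination "D:\Downloads\" -Context $ctx

# Start a server-side copy of the blob into another container in the same account
Start-AzureStorageBlobCopy -SrcBlob "SnowyCabin.jpg" -SrcContainer $ContainerName `
    -DestContainer "test-ps-copy" -DestBlob "SnowyCabin.jpg" -Context $ctx

# Delete the original blob
Remove-AzureStorageBlob -Blob "SnowyCabin.jpg" -Container $ContainerName -Context $ctx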

Create a file share and upload files using PowerShell

Now you're going to create a file share in the storage account and upload some files to it using PowerShell. This is very similar to the PowerShell for uploading blobs.

In our example, the storage account is called bookch4ps; the test files are in D:\_TestImages. That path is needed when uploading those files to File storage.

If needed, run the PowerShell ISE and log into your Azure account. You're going to create a script that you can save and use later. In addition to the path to your local pictures, you will need the name and access key of your storage account.

Set up variable names for the storage account name and key: $StorageAccountName and $StorageAccountKey. Fill in your storage account name and key.

$StorageAccountName = "yourStorageAccountName"

$StorageAccountKey = "yourStorageAccountKey"

Next, you'll define the storage account context using the storage account name and key. You will use this context for authentication with subsequent commands against the storage account. This is easier (and safer) than specifying the storage account name and key all the time.

$ctx = New-AzureStorageContext -StorageAccountName $StorageAccountName `
    -StorageAccountKey $StorageAccountKey

Note that there is a continuation character at the end of the first line—the backward tick mark.

Now you'll set the variable for the name of the file share to whatever you like; the example will use psfileshare. Then, you'll create the new file share, assigning it to the variable $s.

$shareName = "psfileshare"

$s = New-AzureStorageShare $shareName -Context $ctx
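
File shares can also contain directories. This walkthrough uploads to the root of the share, but if you want a directory, here is a minimal sketch (the directory name images is just an example):

# Optional: create a directory named "images" in the share
New-AzureStorageDirectory -Share $s -Path "images"

You could then pass a path such as images/DogInCatTree.png to the upload cmdlet shown below.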

Now set a variable for the local location of the files to be uploaded.

$localFolderName = "D:\_TestImages\"

Now you can do the actual upload of the file. Set a variable for the file name, create the local path (directory + file name), and then use the PowerShell cmdlet Set-AzureStorageFileContent to upload the file.

$fileName = "DogInCatTree.png"

$localFile = $localFolderName + $fileName

Set-AzureStorageFileContent -Share $s -Source $localFile -Path $fileName

Copy these three lines a couple of times and change the file name in each copy to upload multiple files. Now run the script and watch as the successful commands are echoed back to you.

You can call Get-AzureStorageFile to retrieve the list of files in the root of the file share.

Get-AzureStorageFile -Share $s

Figure 4-25 shows the output from the example.

There are also PowerShell commands for downloading files, deleting files, copying files, etc.
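
As a rough sketch, here are a download and a delete against the share created above; the local destination folder is a placeholder:

# Download a file from the share to a local folder
Get-AzureStorageFileContent -Share $s -Path "DogInCatTree.png" -Destination "D:\Downloads\"

# Delete the file from the share
Remove-AzureStorageFile -Share $s -Path "DogInCatTree.png"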

AzCopy: A very useful tool

Before finishing the chapter on Azure Storage, you need to know about AzCopy. This is a free command-line tool provided by the Azure Storage team for moving data around. One of its core capabilities is asynchronous server-side copying: when you copy blobs or files from one storage account to another, they are not downloaded from the first storage account to your local machine and then uploaded to the second storage account. The blobs and files are copied directly within Azure.

Here are some of the things you can do with AzCopy:

  • Upload blobs from the local folder on a machine to Azure Blob storage.
  • Upload files from the local folder on a machine to Azure File storage.
  • Copy blobs from one container to another in the same storage account.
  • Copy blobs from one storage account to another, either in the same region or in a different region.
  • Copy files from one file share to another in the same storage account.
  • Copy files from one storage account to another, either in the same region or in a different region.
  • Copy blobs from a blob container to an Azure File share in the same storage account or in a different storage account.
  • Copy files from an Azure File share to a blob container in the same storage account or in a different storage account.
  • Export a table to an output file in JSON or CSV format. You can export this to blob storage.
  • Import the previously exported table data from a JSON file into a new table. (Note: It won't import from a CSV file.)

As you can see, there are a lot of possibilities when using AzCopy. It also has a bunch of options. For example, you can tell it to only copy data where the source files are newer than the target files. You can also have it copy data only where the source files are older than the target files. And you can combine these options to ask it to copy only files that don't exist in the destination at all.
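
To give you a feel for the classic (pre-v10) Windows AzCopy syntax in use at the time of writing, here are two hedged examples; the second account name, the container names, and the keys are placeholders.

Upload a local folder to a blob container, recursively:

> AzCopy /Source:D:\_TestImages /Dest:https://bookch4ps.blob.core.windows.net/test-ps /DestKey:<destination key> /S

Copy a container to another storage account with a server-side copy:

> AzCopy /Source:https://bookch4ps.blob.core.windows.net/test-ps /Dest:https://otheraccount.blob.core.windows.net/backup /SourceKey:<source key> /DestKey:<destination key> /S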

AzCopy is frequently used to make backups of Azure Blob storage. Maybe you have files in Blob storage that are updated by your customer frequently, and you want a backup in case there's a problem. You can do something like this:

  • Do a full backup on Saturday from the source container to a target container and put the date in the name of the target container.
  • For each subsequent day, do an incremental copy—copy only the files that are newer in the source than in the destination.

If your customer uploads a file by mistake and contacts you before the end of the day, you can retrieve the previous version from the backup copy.
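
As a hedged sketch of those two steps with the classic AzCopy options (/S copies recursively; /XO skips source blobs that are not newer than the destination), where all account names, container names, dates, and keys are placeholders:

Saturday's full copy into a dated container:

> AzCopy /Source:https://bookch4ps.blob.core.windows.net/test-ps /Dest:https://backupaccount.blob.core.windows.net/backup-<date> /SourceKey:<source key> /DestKey:<destination key> /S

Each subsequent day's incremental copy into the same container:

> AzCopy /Source:https://bookch4ps.blob.core.windows.net/test-ps /Dest:https://backupaccount.blob.core.windows.net/backup-<date> /SourceKey:<source key> /DestKey:<destination key> /S /XO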

Here are some other use cases:

  • You want to move your data from a classic storage account to a Resource Manager storage account. You can do this by using AzCopy, and then you can change your applications to point to the data in the new location.
  • You want to move your data from general-purpose storage to cool storage. You would copy your blobs from the general-purpose storage account to the new Blob storage account, then delete the blobs from the original location.

For more information and a ton of examples, check out https://azure.microsoft.com/documentation/articles/storage-use-azcopy/.

The Azure Data Movement Library

Many people wanted to be able to call AzCopy programmatically to handle their own specialized scenarios. Because of this, the Azure Storage team open sourced the Azure Storage Data Movement Library, giving you programmatic access to the capabilities of AzCopy. For more information, check out the repository and samples on GitHub at https://github.com/Azure/azure-storage-net-data-movement.