Dive into Microsoft Azure Stack Architecture (part 2)

Since the Azure Stack architecture blog post became rather long, this post covers the second part. You can find part one here.

Initial Azure Stack VM sizes

In Azure Stack TP2 there are only a handful of VM sizes, but at GA many more VM sizes will be supported, although not all VM sizes can be accommodated yet because some require specific hardware configurations.

13-inital-azure-vm-sizes

Azure Stack Compute Requirements

The minimum Azure Stack configuration requires at least four compute nodes. These servers are hyper-converged, meaning they combine the Hyper-V and Storage roles of Windows Server 2016. A compute node has two processor sockets with a minimum of 8 cores per socket; with multi-threading enabled this offers 32 logical processors. A server should have a minimum of 256 GB of memory, and each server in a hyper-converged cluster should be identical in terms of CPU, storage and network.

Software Defined Networking Capabilities in Windows Server 2016

Azure Stack greatly benefits from a large number of software defined networking capabilities in Windows Server 2016:

  • Network Controller
    • Central control plane
    • Fault tolerant
  • Virtual Networking
    • BYO address space
    • Multiple subnets
    • Distributed router
  • Network Security
    • Distributed firewall
    • Network Security Groups
    • BYO Virtual Appliances
  • Robust Gateways
    • Robust availability model
    • Multi-tenancy for all modes of operation
  • Software Load Balancing
    • L4 load balancing
    • NAT for tenants and Azure Stack infrastructure
  • Data Plane Improvements
    • Performance: 10Gb, 40Gb and higher
    • RDMA over Virtual Switch

This very rich software defined networking (SDN) ecosystem is all configured automatically during installation of Azure Stack and can be used as soon as the installation has finished. Compare this to all the pain an admin has to go through to understand, architect and build all of this on their own. Even with the assistance of all the service templates and wizards in Virtual Machine Manager 2016, this is still a very complex task that rarely succeeds on the first attempt (if at all).
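
To give an idea of what "usable immediately" means: as soon as the portal is up, a tenant can start consuming the SDN stack, for example by creating a Network Security Group through ARM. The sketch below uses the AzureRM cmdlets against the Azure Stack ARM endpoint and assumes you are already logged in; the resource group, location and rule values are made-up examples.

```powershell
# Assumes an existing login to the Azure Stack ARM endpoint with the AzureRM cmdlets;
# resource group "rg-demo", location "local" and the rule values are example values.
$rule = New-AzureRmNetworkSecurityRuleConfig -Name "Allow-HTTPS" -Description "Allow inbound HTTPS" `
    -Access Allow -Protocol Tcp -Direction Inbound -Priority 100 `
    -SourceAddressPrefix "*" -SourcePortRange "*" `
    -DestinationAddressPrefix "*" -DestinationPortRange 443

# The NSG is realized by the distributed firewall in the SDN stack
New-AzureRmNetworkSecurityGroup -Name "web-nsg" -ResourceGroupName "rg-demo" `
    -Location "local" -SecurityRules $rule
```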

Physical Network Switch Topology

Each Azure Stack rack can contain one or more Scale Units (hyper-converged clusters) and holds three switches:

  • Two DATA Top of Rack (ToR) switches for Windows Server 2016 Converged NIC (SDN + Storage)
  • One BMC switch for physical host control and 3rd party hardware monitoring

Each hyper-converged node contains a 2-port 10Gb (or faster) physical NIC. By using the new Switch Embedded Teaming (SET) feature, the DATA network can converge both SDN and Storage traffic, guaranteeing port and link resiliency. Each port is connected to its own ToR switch.
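
For reference, this is roughly what such a converged Switch Embedded Teaming configuration looks like in plain Windows Server 2016 PowerShell (Azure Stack does this for you during deployment); the switch, vNIC and adapter names are examples.

```powershell
# Create a SET-enabled vSwitch over the two physical 10Gb+ ports (adapter names are examples)
New-VMSwitch -Name "ConvergedSwitch" -NetAdapterName "NIC1","NIC2" `
    -EnableEmbeddedTeaming $true -AllowManagementOS $false

# Host vNICs for management and the two SMB/storage paths
Add-VMNetworkAdapter -ManagementOS -SwitchName "ConvergedSwitch" -Name "Management"
Add-VMNetworkAdapter -ManagementOS -SwitchName "ConvergedSwitch" -Name "SMB01"
Add-VMNetworkAdapter -ManagementOS -SwitchName "ConvergedSwitch" -Name "SMB02"

# Pin each SMB vNIC to one physical port so storage traffic uses both ToR switches
Set-VMNetworkAdapterTeamMapping -ManagementOS -VMNetworkAdapterName "SMB01" -PhysicalNetAdapterName "NIC1"
Set-VMNetworkAdapterTeamMapping -ManagementOS -VMNetworkAdapterName "SMB02" -PhysicalNetAdapterName "NIC2"

# Enable RDMA on the SMB host vNICs
Enable-NetAdapterRdma -Name "vEthernet (SMB01)","vEthernet (SMB02)"
```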

Network Switch Connectivity

In the below diagram, two Azure Stack racks are shown, interconnected in a mesh fabric by Aggregate and ToR switches. The BMC switches are redundantly connected to the in-rack ToR switches. Each ToR switch has a dual connection with its partner switch, and both aggregate switches. Each 2-port physical NIC connects to each in-rack ToR switch.

14-network-switch-connectivity

Azure Stack Network: Subnets Used

The Azure Stack network is architected to use five subnets for the following purposes:

  • Network Switch Management
  • BMC (iLO, DRAC, etc.)
  • Infrastructure
  • Storage
  • External Virtual IPs

Azure Stack Virtual IP Addressing

Azure Stack uses a virtual IP address for each of the following services:

  • Tenant Portal
  • Admin Portal
  • ARM
  • Storage (Blob, Table, Queues)
  • xRP, Key Vault
  • ADFS, Graph
  • Site-to-Site Endpoint
  • Tenant 1, Tenant 2 … Tenant N

15-virtual-ip-addresses

Windows Server 2016 – HNV Networking

The next diagram provides a full architecture diagram of a 4-node Azure Stack Scale Unit using Hyper-V Network Virtualization (HNV). It shows a 3-node Network Controller guest cluster, a 2-node Software Load Balancer (SLB) multiplexer (MUX) and a 3-node gateway guest cluster, connected to three main networks:

  • Management Network – 10.184.108.0/24
  • Transit Network – 10.10.10.0/24
  • HNV Provider Network – 10.10.56.0/24

16-hnv-networking

The diagram also shows two tenant VNETs using identical subnets (192.168.0.0/24), which is possible because HNV supports Bring-Your-Own (BYO) address spaces.
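
As a quick sketch of how two tenants could deploy those identical address spaces through ARM: in reality each tenant would run this against their own subscription, and the resource group names, location ("local") and prefixes below are illustrative only.

```powershell
# Assumes a login to the Azure Stack ARM endpoint with the AzureRM cmdlets.
# In practice each tenant would create this in their own subscription.
$subnet = New-AzureRmVirtualNetworkSubnetConfig -Name "default" -AddressPrefix "192.168.0.0/24"

# Two tenants can use the exact same address space thanks to HNV
New-AzureRmVirtualNetwork -Name "vnet-tenant1" -ResourceGroupName "rg-tenant1" `
    -Location "local" -AddressPrefix "192.168.0.0/24" -Subnet $subnet
New-AzureRmVirtualNetwork -Name "vnet-tenant2" -ResourceGroupName "rg-tenant2" `
    -Location "local" -AddressPrefix "192.168.0.0/24" -Subnet $subnet
```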

Azure Stack NIC and Switch

Each Azure Stack hyper-converged host requires a single dual-port 10 Gb+ NIC that supports Remote Direct Memory Access (RDMA) for SMB Direct, using either RoCE v2 or iWARP.

Each DATA ToR switch requires:

  • BGP
  • Data Center Bridging (DCB)
    • Enhanced Transmission Selection (ETS)
    • Priority Flow Control (PFC)
  • Switch Independent Teaming used by the host

For more details about RDMA and especially RoCE, please read the excellent blogs by my MVP buddy Didier van Hoye aka @WorkingHardInIT.
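
For a RoCE-based deployment, the host side of that DCB configuration typically looks something like the sketch below; the matching PFC and ETS settings must also be applied on the ToR switches. Priority 3, the 50% bandwidth reservation and the NIC names are example values, not Azure Stack's actual settings.

```powershell
# Classify SMB Direct (RDMA over port 445) traffic with 802.1p priority 3
New-NetQosPolicy -Name "SMBDirect" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

# Enable Priority Flow Control only for the storage priority
Enable-NetQosFlowControl -Priority 3
Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7

# Reserve bandwidth for SMB using Enhanced Transmission Selection (ETS)
New-NetQosTrafficClass -Name "SMBDirect" -Priority 3 -BandwidthPercentage 50 -Algorithm ETS

# Apply the DCB settings to the physical ports (names are examples)
Enable-NetAdapterQos -Name "NIC1","NIC2"
```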

Storage Spaces Direct: Use Cases

For the initial version of Azure Stack, Microsoft decided to go for the hyper-converged model which combines Hyper-V and Storage Spaces Direct in a 4-node cluster as a minimum Scale Unit. Future incarnations of Azure Stack may also support the converged model which separates the Hyper-V compute cluster(s) from the Storage Spaces Direct cluster(s).

16-storage-spaces-direct

Characteristics of hyper-converged:

  • Compute and storage resources together
  • Compute and storage scale together
  • Compute and storage managed together
  • Typically small to medium sized deployments

Characteristics of converged:

  • Compute and storage resources separate
  • Compute and storage scale independently
  • Compute and storage managed independently
  • Typically larger scale deployments

If one day we have to move from hyper-converged to converged, live migration will be our friend: the VHDs remain on the S2D cluster, while the VMs' memory can be transferred to the Hyper-V clusters at very high speed thanks to the RDMA network and the flat Layer-3 network with a minimum of network hops.

Storage Stack

The beauty of the Windows Server 2016 compute and storage architecture is that very little has to change between converged and hyper-converged, and the basic building blocks remain exactly the same. In fact, most of the layers were already in place with Windows Server 2012 R2. There are a few notable differences of course: we now use the new Resilient File System (ReFS) instead of NTFS, and the Software Storage Bus, which extends across multiple physical servers to form one Storage Spaces storage pool.
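
To illustrate how little is involved at this layer, this is roughly how a standalone Windows Server 2016 hyper-converged cluster would be brought up with Storage Spaces Direct. Azure Stack performs the equivalent steps automatically during deployment; the node and cluster names below are examples.

```powershell
# Node and cluster names are examples; run the S2D-specific validation first
Test-Cluster -Node "node1","node2","node3","node4" `
    -Include "Storage Spaces Direct","Inventory","Network","System Configuration"

# Create the cluster without any clustered storage yet
New-Cluster -Name "S2DCluster" -Node "node1","node2","node3","node4" -NoStorage

# Enable Storage Spaces Direct: claims all eligible local disks into a single pool
# and builds the Software Storage Bus and cache automatically
Enable-ClusterStorageSpacesDirect
```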

16-storage-stack

Azure Stack design choices

Microsoft made the following design choices for Azure Stack:

  • Hyper-Converged deployment
  • File System: CSVFS with ReFS
    • Cluster-wide data access
    • Fast VHD(X) creation, expansion and checkpoints
  • Storage Spaces
    • Single scalable pool with all disk devices (except boot)
    • Multiple virtual disks per pool (Mirrored or Parity)
  • Software Storage Bus
    • Storage Bus Cache
    • Leverages SMB3 and SMB Direct
  • Servers with local disks
    • SATA, SAS and NVMe

Built-In Cache

An integral part of the Software Storage Bus is the built-in cache. This cache is scoped to the local machine only and is agnostic to the storage pool or particular virtual disks. The cache is also conveniently configured automatically as soon as S2D is enabled on the cluster. It creates a special partition on each caching device, leaving 32 GB for pool and virtual disk metadata. The cache maintains a round-robin binding of capacity devices (HDD) to cache devices (SSD) and automatically rebinds when the topology changes.

18-cache

The built-in cache has the following characteristics:

  • All writes up to 256 KB are cached
  • Reads of 64 KB or less are cached on first miss
  • Reads larger than 64 KB are cached on second miss within 10 minutes
  • Sequential reads of 32 KB or larger are not cached
  • On all-flash systems only writes are cached
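
If you want to see how the Software Storage Bus has claimed the devices, a quick sketch using plain Windows Server 2016 storage cmdlets (no Azure Stack specifics assumed):

```powershell
# Cache devices claimed by the Software Storage Bus show Usage 'Journal';
# capacity devices remain 'Auto-Select'
Get-PhysicalDisk | Sort-Object MediaType |
    Format-Table FriendlyName, MediaType, Usage, Size -AutoSize
```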

Volume Types

In Storage Spaces Direct, three distinct volume types exist:

  • Mirror for performance
  • Parity for capacity
  • Multi-resilient for balancing performance and capacity
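
In plain Windows Server 2016 PowerShell, the three volume types can be created roughly like this; the pool wildcard, volume names, sizes and the default tier names "Performance" and "Capacity" are assumptions for the sketch.

```powershell
# Three-way mirror volume for performance (pool and volume names are examples)
New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "Mirror01" `
    -FileSystem CSVFS_ReFS -ResiliencySettingName Mirror -Size 2TB

# Parity (erasure coded) volume for capacity
New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "Parity01" `
    -FileSystem CSVFS_ReFS -ResiliencySettingName Parity -Size 4TB

# Multi-resilient volume: a mirror tier plus a parity tier in a single volume
# (assumes the default WS2016 tier names "Performance" and "Capacity")
New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "MultiRes01" `
    -FileSystem CSVFS_ReFS -StorageTierFriendlyNames Performance,Capacity `
    -StorageTierSizes 200GB,1800GB
```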

19-volume-types

Because the initial integrated systems offered at GA have only HDD and SSD, Azure Stack will not yet be able to support

Azure Stack Storage

The Azure Stack storage configuration consists of a single Storage Spaces pool per cluster with one ReFS file system per server, which accommodates the addition and removal of resources. Adding or removing a physical disk rearranges and spreads the data blocks across the disks, taking full advantage of all available physical disks, unlike a Windows Server 2012 R2 storage space, which does not gain performance when physical disks are added to a pool.

The minimum hardware requirements per Storage Spaces Direct cluster node are:

  • 2 cache devices
  • 4 capacity devices

In practice, a modern 2U server can hold at least 8 to 12 large form factor (LFF) capacity disks. For example, an HPE DL380 Gen9 server used for Azure Stack starts with 4 x 1.2 TB SATA SSD disks (cache), 10 x 6 TB SATA HDD disks (capacity) and 2 x 340 GB SSD disks (OS) in a single M.2 slot. If you combine that in a 4-node cluster, you get a pool with 240 TB of capacity and 19.2 TB of cache, before creating resilient mirror and/or parity volumes.
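
On such a cluster the raw numbers can be verified per media type with something like the following sketch (generic Storage cmdlets, nothing Azure Stack specific):

```powershell
# Raw capacity per media type in the S2D pool (roughly 240 TB HDD plus 19.2 TB SSD cache
# for the 4-node example above; the OS disks are not part of the pool)
Get-StoragePool -IsPrimordial $false | Get-PhysicalDisk |
    Group-Object MediaType |
    Select-Object Name, Count,
        @{ Label = "TotalTB"; Expression = { [math]::Round(($_.Group | Measure-Object Size -Sum).Sum / 1TB, 1) } }
```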

Spaces Direct Health Service

Azure Stack takes advantage of the Spaces Direct Health Service, which is built into the Windows Server 2016 operating system. The Health Resource Provider (HRP) processes events and produces alerts which can be accessed via a REST API, allowing both the Azure Stack portal and other monitoring tools to consume the health data. The next diagram shows the multitude of S2D events that are tracked.

20-health-resource-provider
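
The same health data can be pulled straight from a Windows Server 2016 cluster with the Health Service cmdlets, for example:

```powershell
# Cluster-wide health plus capacity and performance metrics from the Health Service
Get-StorageSubSystem -FriendlyName "*Cluster*" | Get-StorageHealthReport

# Current faults (failed drives, lost connectivity, etc.) with severity and remediation hints
Get-StorageSubSystem -FriendlyName "*Cluster*" | Debug-StorageSubSystem
```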

Azure-consistent storage solution view

On top of the Azure Stack infrastructure services offered by the Scale-Out File Server (SOFS) and Storage Spaces Direct (S2D), there is a virtualized services layer which is divided into a data services cluster and a resource provider cluster. The former offers tenant-facing storage cloud services such as the blob, table and queue services. The latter offers the storage account services (also tenant oriented) and the storage cloud admin service (admin oriented). For example, when a tenant creates and then deletes a storage account, an administrator can recover the deleted storage account on behalf of the tenant through the storage cloud admin service.

The storage solutions in Azure Stack are consumed by applications using an Azure Stack storage account, the storage APIs, or tools like Azure Storage Explorer. The Azure Stack portal, PowerShell storage cmdlets, ACS cmdlets, CLI and client SDKs can also access the virtualized storage services.
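
As a small sketch of that consistency: the regular Azure Storage PowerShell cmdlets work against the Azure Stack blob service once the storage context is pointed at the right endpoint suffix. The account name, key placeholder and endpoint suffix below are made-up values for illustration.

```powershell
# Account name, key and the Azure Stack storage endpoint suffix are placeholders
$ctx = New-AzureStorageContext -StorageAccountName "tenantstorage01" `
    -StorageAccountKey "<storage-account-key>" -Endpoint "azurestack.local"

# Use the regular Azure Storage cmdlets against the Azure Stack blob service
New-AzureStorageContainer -Name "images" -Context $ctx
Set-AzureStorageBlobContent -File ".\server2016.vhd" -Container "images" -Context $ctx
```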

21-azure-consistent-storage-solution-view

Deployment characteristics

There are a number of tasks that must be completed for an Azure Stack deployment and integration in the datacenter of a service provider or enterprise:

  • Rack and Stack the servers and switches
  • Configure the network switches
  • Integrate with the border network
  • Deployment of Azure Stack onto Scale Units (SU)
  • Connectivity to Azure Active Directory (AAD) or Active Directory Federation Services (ADFS), etc.

22-integrationi-in-datacenter

Validation

Furthermore, there is continuous validation of the various hardware components (NICs, HBAs, drives, network switches, servers) and of each hardware configuration as a properly working integrated system. The reason only three OEMs were selected, each with a single hardware SKU, is that this part should not be underestimated. We know from experience, and Microsoft has learned the hard way with Cloud Platform System (CPS), that system integration needs to be done very carefully and that success is highly dependent on the quality of the parts, the firmware and the drivers. Remember the VMQ nightmare in R2? Validation follows the traditional Windows Hardware Lab Kit (HLK) procedures, but stringent Azure Stack specific integration tests have been added.

Hardware Monitoring Overview

The following diagram provides a detailed overview of how hardware monitoring is done with Azure Stack. Since no agents are allowed on a server, all monitoring must be done via the Baseboard Management Controller (BMC), which is specific to the hardware vendor. External monitoring software can connect either to the BMC or to the REST API of the Health Resource Provider (HRP).

23-hardware-monitoring-overview

Patching and Update

Azure Stack will be updated at regular intervals with pre-validated updates for both software and firmware in a single package. Microsoft could not yet confirm whether that update will come via Microsoft or the hardware partners, but it will definitely be single-sourced. The updating process is designed not to disrupt tenant workloads and to be as reliable and easy to use as possible.

24-patch-and-update

Azure Stack: Backup and Disaster Recovery

Just like with Azure, we will have two important features in Azure Stack: Azure Backup and Azure Site Recovery (ASR). However, for the Azure Stack infrastructure itself there will be a different protection service. Backups of the Azure Stack infrastructure configuration will be taken to an external file share, which in turn can be protected by any Microsoft or 3rd party backup/restore service.

25-backup-and-disaster-recovery

Azure Stack Security

Windows Server 2016 has received a lot of attention in terms of security, as customers have shared their concerns about how hackers have compromised their business and how long it took before this was noticed. Microsoft now simply assumes that systems are already breached or will be breached. The following measures were taken to significantly improve security:

  • Constrained admin
    • Least privilege
    • Role-based Access Control (RBAC)
    • Just Enough Administration (JEA) (see the sketch after this list)
  • Application whitelisting
  • Network whitelisting
  • Customized auditing
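
As an illustration of the JEA building block (generic Windows Server 2016 PowerShell, not Azure Stack's actual role definitions), a constrained endpoint might be set up like this; the module path, AD group and cmdlet list are examples.

```powershell
# A throw-away module folder to hold the role capability (paths and names are examples)
$modPath = "C:\Program Files\WindowsPowerShell\Modules\OpsJea"
New-Item -ItemType Directory -Path "$modPath\RoleCapabilities" -Force | Out-Null
New-Item -ItemType File -Path "$modPath\OpsJea.psm1" -Force | Out-Null

# Role capability: operators may only restart services and read event logs
New-PSRoleCapabilityFile -Path "$modPath\RoleCapabilities\Maintenance.psrc" `
    -VisibleCmdlets "Restart-Service","Get-EventLog"

# Session configuration mapping an AD group to that constrained role
New-PSSessionConfigurationFile -Path "$env:TEMP\Maintenance.pssc" `
    -SessionType RestrictedRemoteServer `
    -RoleDefinitions @{ "CONTOSO\FabricOperators" = @{ RoleCapabilities = "Maintenance" } }

# Register the JEA endpoint; operators then connect with:
#   Enter-PSSession -ComputerName <host> -ConfigurationName Maintenance
Register-PSSessionConfiguration -Name "Maintenance" -Path "$env:TEMP\Maintenance.pssc"
```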

Windows Server 2016 systems will be hardened by default by using:

  • Data at rest encryption
  • Security OS baseline
  • Disabled legacy protocols (NTLM, SMB v1, etc.)
  • Customized Anti-virus configurations
