Update 1.1904.0.36 and Changes in Update Processes

Update 1.1904.0.36 was officially released for Azure Stack today. Just like last month's, this update mainly focuses on fixes and improvements. I am impressed with the newer, smaller, and faster updates. Up until 1903, my little 4-node scale unit was taking an average of 16 to 20 hours to update. The past 2 updates have taken just over 12 hours each. That is amazing and makes it a lot easier to manage our maintenance windows each month.

P&U Changes

The product team has changed how updates alert the operator. They have also been working on how updates will be deployed.

First of all, the portal will now alert the operator in two ways, with what they call “Soft” or “Hard” alerts.

  • The Soft Alert is a warning alert letting the operator know the recent update needs attention. It also tells the operator that Microsoft recommends opening a service request during normal business hours.
  • The Hard Alert is a critical alert. This will display a message in the portal saying Microsoft recommends opening a service request as soon as possible.

The Soft and Hard alerts are based on the update process running Test-AzureStack. Based on the output it generates, the most appropriate alert is raised in the portal.
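As a rough illustration of that decision, here is a minimal Python sketch. The names `Alert` and `choose_update_alert` are hypothetical and not part of Azure Stack. It only assumes what the alert text in the release notes says: when an update needs attention and Test-AzureStack passed, the operator gets the Soft alert, and when Test-AzureStack also failed, the operator gets the Hard alert.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    severity: str        # "Warning" for the Soft alert, "Critical" for the Hard alert
    recommendation: str  # what the portal recommends the operator do


def choose_update_alert(test_azurestack_passed: bool) -> Alert:
    """Hypothetical helper: pick the alert for an update that needs attention."""
    if test_azurestack_passed:
        # Soft alert: the stamp looks healthy, so the request can wait.
        return Alert("Warning", "Open a service request during normal business hours.")
    # Hard alert: the health check failed too, so escalate right away.
    return Alert("Critical", "Open a service request as soon as possible.")


print(choose_update_alert(True).severity)
print(choose_update_alert(False).severity)
```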

The GRF

In 1903, the “Global Remediation Framework”, also known as the GRF, was introduced. It is a subsystem in the P&U orchestration. It detects various conditions and the current state of NC, ACS, etc. One big change is the way they treat some of these components. If the orchestration can’t get various services running in a timely manner, the process actually FRUs the node. In the 1903 update, this was limited to Service Fabric and the nodes within it, except for the NC. In 1904, they are including others like WAS, WASP, GW, and SLB. Basically, they treat these nodes as cattle, not pets. 🙂

The ERCS

They also bumped memory on the ERCS VMs from 8 GB to 12 GB. This should greatly improve performance on the ERCS VMs, which host the Action Plan execution for the update.

To Include or Not Include

They are also changing how they package Windows Updates in the P&U. It comes down to the payload and whether it contains cloud-critical fixes. If the latest Windows Server Cumulative Update (LCU) isn’t needed, then they just distribute new NuGet packages to the infrastructure. This doesn’t depend on the physical nodes, so the overall runtime will be lower. It also means they don’t have to re-image the hosts, which means not having to put them into maintenance mode, not having to drain them, and not having to rebalance storage on the backend, since it runs on Storage Spaces Direct. Combined, this cuts the total update time drastically.
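To make the time savings concrete, the packaging decision can be sketched in Python as below. This is only an illustration under stated assumptions: the function `plan_update_steps`, the step names, and their order are hypothetical and do not reflect the real action plan. It simply shows that the host-level work only appears when the payload includes the LCU.

```python
from typing import List


def plan_update_steps(includes_lcu: bool) -> List[str]:
    """Hypothetical step list; names and order are illustrative only."""
    # The NuGet distribution happens regardless of the payload.
    steps = ["distribute NuGet packages to the infrastructure"]
    if includes_lcu:
        # Host-level work is only needed when the Windows Server LCU ships.
        steps += [
            "put host into maintenance mode",
            "drain host",
            "re-image host",
            "rebalance storage (Storage Spaces Direct)",
        ]
    return steps


print(len(plan_update_steps(includes_lcu=False)))
print(len(plan_update_steps(includes_lcu=True)))
```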

These are some of the many changes coming to Azure Stack to help keep the platform stable and running smoothly.

Update 1.1904.0.36 Information

Just like 1903, 1904 didn’t include new features. I know the product team has been working very hard these past few months to stabilize the platform, so the last few updates have focused on improvements and fixes.

https://docs.microsoft.com/en-us/azure-stack/operator/azure-stack-release-notes-1904

Improvements

  • Added a notification in the administrator portal, when the currently logged in user does not have the necessary permissions, which enables the dashboard to load properly. It also contains a link to the documentation that explains which accounts have the appropriate permissions, depending on the identity provider used during deployment.
  • Added improvements to VM resiliency and uptime, which resolves the scenario in which all VMs go offline if the storage volume containing the VM configuration files goes offline.
  • Added optimization to the number of VMs evacuated concurrently and placed a cap on bandwidth consumed, to address VM brownouts or blackouts if the network is under heavy load. This change increases VM uptime when a system is updating.
  • Improved resource throttling when a system is running at scale to protect against internal processes exhausting platform resources, resulting in failed operations in the portal.
  • Improved filtering capabilities enable operators to apply multiple filters at the same time. You can only sort on the Name column in the new user interface.
  • Improvements to the process of deleting offers, plans, quotas, and subscriptions. You can now successfully delete offers, quotas, plans, and subscriptions from the Administrator portal if the object you want to delete has no dependencies. For more information, see this article.
  • Improved syslog message volume by filtering out unnecessary events and providing a configuration parameter to select the desired severity level for forwarded messages. For more information on how to configure the severity level, refer to Azure Stack datacenter integration – syslog forwarding.
  • The Azure Stack Infrastructure consumes an additional 12 GB + (4 GB * Number of Azure Stack hosts) from the 1904 update onwards. This means that in a 4 node stamp there will be an additional capacity consumption of 28 GB (12 GB + 4 GB * 4) reflected in the capacity screen of the Azure Stack administrator portal. Your update to the 1904 release should succeed even if the additional memory consumption puts your Azure Stack stamp over capacity. If your Azure Stack stamp is over memory usage AFTER the update is completed, you will see an alert reflecting this state, with remediation steps to de-allocate some VMs.
  • Added a new capability to the Get-AzureStackLog cmdlet by incorporating an additional parameter, -OutputSASUri. You can now collect Azure Stack logs from your environment and store them in the specified Azure Storage blob container. For more information, see Azure Stack diagnostics.
  • Added a new memory check in the Test-AzureStack UpdateReadiness group, which checks to see if you have enough memory available on the stack for the update to complete successfully.
  • Improvements to hardware updates, which reduces the time it takes to complete drive firmware update to 2-4 hours. The update engine dynamically determines which portions of the update need to execute, based on content in the package.
  • Added robust operation prechecks to prevent disruptive infrastructure role instance operations that affect availability.
  • Improvements to idempotency of infrastructure backup action plan.
  • Improvements to Azure Stack log collection. These improvements reduce the time it takes to retrieve the set of logs. Also, the Get-AzureStackLog cmdlet no longer generates default logs for the OEM role. You must execute the Invoke-AzureStackOnDemandLog cmdlet, specifying the role to retrieve the OEM logs. For more information, see Azure Stack diagnostics.
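The additional infrastructure memory mentioned in the list above (12 GB plus 4 GB per host) is easy to work out for other stamp sizes. The small Python sketch below does that. The function name `additional_infra_memory_gb` is hypothetical, and the node counts other than 4 are just sample inputs.

```python
def additional_infra_memory_gb(host_count: int) -> int:
    """Extra memory (GB) the infrastructure consumes from 1904 onwards."""
    if host_count < 1:
        raise ValueError("host_count must be at least 1")
    # 12 GB fixed, plus 4 GB for every Azure Stack host.
    return 12 + 4 * host_count


for hosts in (4, 8, 12, 16):
    print(f"{hosts} hosts: {additional_infra_memory_gb(hosts)} GB")
```

For the 4-node stamp from the release notes this gives 28 GB, matching the figure quoted there.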

Changes

  • Removed the option for Azure Stack operators to stop and shut down infrastructure role instances in the administrator portal. The restart functionality ensures a clean shutdown attempt before restarting the infrastructure role instance. For advanced scenarios, the API and PowerShell functionality remains available.
  • There is a new Marketplace management experience, with separate screens for Marketplace images and resource providers. For now, the Resource providers window is empty, but in future releases new PaaS service offerings will appear and be managed in the Resource providers window.
  • Changes to the update experience in the operator portal. There is a new grid for resource provider updates. The ability to update resource providers is not available yet.
  • Changes to the update installation experience in the operator portal. To help Azure Stack operators respond appropriately to an update issue, the portal now provides more specific recommendations based on the health of the scale unit, as derived automatically by running Test-AzureStack and parsing the results. Based on the result, it will inform the operator to take one of two actions:
    • A “soft” warning alert is displayed in the portal that reads “The most recent update needs attention. Microsoft recommends opening a service request during normal business hours. As part of the update process, Test-AzureStack is performed, and based on the output we generate the most appropriate alert. In this case, Test-AzureStack passed.”
    • A “hard” critical alert is displayed in the portal that reads, “The most recent update failed. Microsoft recommends opening a service request as soon as possible. As part of the update process, Test-AzureStack is performed, and based on the output we generate the most appropriate alert. In this case, Test-AzureStack also failed.”

Fixes

  • Fixed an issue in which the syslog configuration was not persisted through an update cycle, causing the syslog client to lose its configuration, and the syslog messages to stop being forwarded. Syslog configuration is now preserved.
  • Fixed an issue in CRP that blocked deallocation of VMs. Previously, if a VM contained multiple large managed disks, deallocating the VM might have failed with a timeout error.
  • Fixed issue with Windows Defender engine impacting access to scale-unit storage.
  • Fixed a user portal issue in which the Access Policy window for blob storage accounts failed to load.
  • Fixed an issue in both administrator and user portals, in which erroneous notifications about the global Azure portal were displayed.
  • Fixed a user portal issue in which selecting the Feedback tile caused an empty browser tab to open.
  • Fixed a portal issue in which changing a static IP address for an IP configuration that was bound to a network adapter attached to a VM instance, caused an error message to be displayed.
  • Fixed a user portal issue in which attempting to Attach Network Interface to an existing VM via the Networking window caused the operation to fail with an error message.
  • Fixed an issue in which Azure Stack did not support attaching more than 4 Network Interfaces (NICs) to a VM instance.
  • Fixed a portal issue in which adding an inbound security rule and selecting Service Tag as the source, displayed several options that are not available for Azure Stack.
  • Fixed the issue in which Network Security Groups (NSGs) did not work in Azure Stack in the same way as global Azure.
  • Fixed an issue in Marketplace management which hides all downloaded products if registration expires or is removed.
