Get notified about service health incidents that have occurred in Azure

Wouldn't it be nice to receive notifications about service health incidents in Azure? Automated notifications about platform issues can save you troubleshooting time.

The status of Azure services is publicly available on the Azure status page, but that site basically offers only a traffic-light view without detailed information about an issue. Detailed, subscription-level service health notifications can be viewed or subscribed to in Azure Service Health. These notifications are a sub-class of activity log events and can also be found in the activity log. Service health notifications can be informational or actionable, depending on the class.

Notifications fall into the following classes:

Action required: Azure might notice something unusual happening on your account and work with you to remedy it. Azure sends you a notification, either detailing the actions you need to take or explaining how to contact Azure engineering or support.
Incident: An event that impacts service is currently affecting one or more of the resources in your subscription.
Maintenance: A planned maintenance activity that might impact one or more of the resources under your subscription.
Information: Potential optimizations that might help improve your resource use.
Security: Urgent security-related information regarding your solutions that run on Azure.

Source

This blog post covers how to subscribe to service health alerts and route them to a Teams channel.

How to subscribe to Azure Service Health notifications?

You can find Service Health in the Azure portal by using search. Service Health gives you an overview of service issues and planned maintenance. You can configure delivery of service health notifications by clicking "Add service health alert". On the Add service health alert page you can choose the event types (service issue, planned maintenance, health advisories, security advisory) you're interested in and the delivery method (email/SMS/webhook) you want to use. This sample uses a webhook endpoint provided by a Logic App.

Note! You can also configure health alerts at the resource level by selecting Resource health under Support + troubleshooting.

How to create a webhook endpoint provided by Logic App?

The Logic App's role is to receive a service health alert and send it to an MS Teams channel. This sample uses Adaptive Cards to visualize the notification message in Teams. Azure also sends a new notification event when the issue is resolved. We don't want to flood the Teams channel by generating a new message every time, even though only the status has changed. This sample shows how to post the status update as a reply in the original Teams message thread.

In the final result, the alert appears in the Teams channel as an Adaptive Card, and status updates appear as replies in the same message thread.

What are Adaptive cards in Teams?

Adaptive Cards are actionable snippets of content that you can add to a conversation through a bot or messaging extension. Using text, graphics, and buttons, these cards provide rich communication to your audience. Source.

AdaptiveCards.io provides good documentation on the topic. The Adaptive Card schema used in this sample was built with the AdaptiveCards.io designer tool.

Logic App implementation steps

The Logic App flow consists of the following steps. Each step is described in more detail below.

Step 1: When a HTTP request is received

This trigger creates an HTTP POST endpoint that receives the service health notification.

Use the example notification payload (see "Example service health notification" below) to generate the request body JSON schema.
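
To make the payload structure concrete, here is a minimal Python sketch of the fields this flow relies on. It assumes the incoming payload has the same shape as the example notification at the end of this post; the function name `summarize_notification` is hypothetical and not part of any Azure SDK.

```python
import json


def summarize_notification(payload: dict) -> dict:
    """Pick out the fields that the Teams card displays (hypothetical helper)."""
    props = payload.get("properties") or {}
    return {
        "trackingId": props.get("trackingId"),
        "title": props.get("title"),
        "service": props.get("service"),
        "region": props.get("region"),
        "communication": props.get("communication"),
        "level": payload.get("level"),
        # "status" is an object with "value" and "localizedValue" keys.
        "status": (payload.get("status") or {}).get("value"),
    }


# Trimmed copy of the example notification (assumed shape, shortened text).
raw = """
{
  "level": "Warning",
  "status": {"value": "Resolved", "localizedValue": "Resolved"},
  "properties": {
    "title": "Network Infrastructure - UK South",
    "service": "Service Fabric",
    "region": "UK South",
    "communication": "Engineers are investigating.",
    "trackingId": "1NA0F-BJGY2"
  }
}
"""

summary = summarize_notification(json.loads(raw))
print(summary["trackingId"], summary["status"])
```

The tracking ID and the status value are the two fields the later steps depend on: the first identifies the incident, the second is what changes between updates.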

Step 2: Get Teams messages

This action retrieves all messages from a specific Teams channel. The message information is used later to identify an existing message to which the status update will be added as a reply.

Step 3: Filter messages with attachments

The Adaptive Card content contains all the information about the service health notification (ID, Service, Region, Communication etc.). This content is included in the attachment object of the Teams message model. This action filters out messages that don't have an attachment.

Expression used:

@greater(length(item()?['attachments']), 0)
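
For readers who prefer to see the filter as ordinary code, the following Python sketch applies the same test to a list of messages. The sample messages and the helper name `has_attachments` are made up for illustration.

```python
def has_attachments(message: dict) -> bool:
    # Same test as greater(length(attachments), 0).
    # In this sketch a missing or null "attachments" field counts as empty.
    return len(message.get("attachments") or []) > 0


# Made-up messages: only the first one carries an Adaptive Card attachment.
messages = [
    {"id": "1", "attachments": [{"contentType": "application/vnd.microsoft.card.adaptive", "content": "{}"}]},
    {"id": "2", "attachments": []},
    {"id": "3"},
]

with_cards = [m for m in messages if has_attachments(m)]
print([m["id"] for m in with_cards])  # → ['1']
```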

Step 4: Filter messages with trackingId

This action keeps only the Teams messages whose card content contains the tracking ID provided in the notification. The tracking ID is unique to the incident.

Expression used:

item()?['attachments'][0]['content']
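
The expression above only selects the card content of the first attachment. How that content is compared with the tracking ID is configured in the filter action itself; the Python sketch below assumes a simple "contains" check on the content string. The helper name `matches_tracking_id` and the sample messages are hypothetical.

```python
def matches_tracking_id(message: dict, tracking_id: str) -> bool:
    attachments = message.get("attachments") or []
    if not attachments:
        return False
    # The card content is stored as a JSON string, so a substring test
    # is enough to find the tracking ID (assumed comparison).
    content = attachments[0].get("content") or ""
    return tracking_id in content


# Made-up messages whose card content mentions different incidents.
messages = [
    {"id": "100", "attachments": [{"content": '{"text": "1NA0F-BJGY2"}'}]},
    {"id": "101", "attachments": [{"content": '{"text": "OTHER-ID"}'}]},
]

matching = [m for m in messages if matches_tracking_id(m, "1NA0F-BJGY2")]
print([m["id"] for m in matching])  # → ['100']
```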

Step 5: Condition

The Condition action decides whether to create a new Teams message or to add the status update as a reply to an existing message.

Expression used:

length(body('Filter_messages_with_trackingId'))
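
The branching can be sketched in Python as follows. This assumes the condition compares the length with zero (greater than 0 means an existing thread was found), which matches how the true and false paths are described in this post; the function name `choose_action` is hypothetical.

```python
from typing import Optional, Tuple


def choose_action(matching_messages: list) -> Tuple[str, Optional[str]]:
    """Return the action to take and, for replies, the ID of the message to reply to."""
    if len(matching_messages) > 0:
        # True path: an earlier message for this incident exists.
        return ("reply", matching_messages[0]["id"])
    # False path: first notification for this incident.
    return ("new_message", None)


print(choose_action([{"id": "123456789"}]))  # → ('reply', '123456789')
print(choose_action([]))                     # → ('new_message', None)
```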

Step 6: Condition - True-path

True-path handles the case where the status update is added to the original Teams message thread.

Expression used:

body('Filter_messages_with_trackingId')[0]

A Parse JSON action is used to enable strong typing of the Teams message.

Strong typing is also enabled for the status information.

Update status as a reply to the existing thread by Message Id. 
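
As a rough Python analogy of the true path: parse the first matching message into a typed object, read the status, and build a reply that targets the original message. The class `TeamsMessage`, both helper functions, and the shape of the reply dictionary are assumptions made for illustration; the Teams connector has its own input format.

```python
from dataclasses import dataclass


@dataclass
class TeamsMessage:
    """Typed view of the fields this flow needs from a Teams message."""
    id: str
    card_content: str


def parse_message(raw: dict) -> TeamsMessage:
    attachments = raw.get("attachments") or []
    content = (attachments[0].get("content") or "") if attachments else ""
    return TeamsMessage(id=raw["id"], card_content=content)


def build_reply(message: TeamsMessage, status: dict) -> dict:
    # "status" has the same shape as in the notification: value + localizedValue.
    return {"replyToId": message.id, "text": f"Status: {status['value']}"}


original = parse_message(
    {"id": "123456789", "attachments": [{"content": "[ADAPTIVE CARD CONTENT]"}]}
)
reply = build_reply(original, {"value": "Resolved", "localizedValue": "Resolved"})
print(reply)  # → {'replyToId': '123456789', 'text': 'Status: Resolved'}
```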

Step 6: Condition - False-path

False-path adds a new notification message to the Teams channel. The Adaptive Card schema JSON is pasted into the action, and its static values are replaced with values from the notification.
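
The idea of filling the card template with notification values can be sketched in Python as below. This is a simplified card, not the full layout shown at the end of this post: it keeps the two-column label/value structure and the same element types. The function name `build_card` is hypothetical.

```python
import json


def build_card(level: str, props: dict) -> dict:
    """Build a simplified Adaptive Card from notification properties."""
    labels = ["Id", "Title", "Service", "Region", "Communication"]
    keys = ["trackingId", "title", "service", "region", "communication"]

    def text(value: str, **extra) -> dict:
        return {"type": "TextBlock", "text": value, "wrap": True, **extra}

    return {
        "type": "AdaptiveCard",
        "version": "1.2",
        "body": [
            text("**Service Health Alert**", size="large", weight="bolder"),
            text(level, color="attention", weight="bolder"),
            {
                "type": "ColumnSet",
                "columns": [
                    # Left column: field labels. Right column: field values.
                    {"type": "Column", "width": "110px",
                     "items": [text(label, weight="bolder") for label in labels]},
                    {"type": "Column", "width": "auto",
                     "items": [text(str(props.get(key, ""))) for key in keys]},
                ],
            },
        ],
    }


card = build_card("Warning", {
    "trackingId": "1NA0F-BJGY2",
    "title": "Network Infrastructure - UK South",
    "service": "Service Fabric",
    "region": "UK South",
    "communication": "Engineers are investigating.",
})
card_json = json.dumps(card)  # the string that would be posted as card content
print(card["body"][2]["columns"][1]["items"][0]["text"])  # → 1NA0F-BJGY2
```

Because the tracking ID ends up inside the card content, later notifications for the same incident can find this message again.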

Now the solution is ready, and your team is notified when something happens on the Azure platform!

Sample data used in this sample

Example service health notification

This example notification payload message is from Microsoft documentation.

{
  "channels": "Admin",
  "correlationId": "c550176b-8f52-4380-bdc5-36c1b59d3a44",
  "description": "Active: Network Infrastructure - UK South",
  "eventDataId": "c5bc4514-6642-2be3-453e-c6a67841b073",
  "eventName": {
      "value": null
  },
  "category": {
      "value": "ServiceHealth",
      "localizedValue": "Service Health"
  },
  "eventTimestamp": "2017-07-20T23:30:14.8022297Z",
  "id": "/subscriptions/<subscription ID>/events/c5bc4514-6642-2be3-453e-c6a67841b073/ticks/636361902148022297",
  "level": "Warning",
  "operationName": {
      "value": "Microsoft.ServiceHealth/incident/action",
      "localizedValue": "Microsoft.ServiceHealth/incident/action"
  },
  "resourceProviderName": {
      "value": null
  },
  "resourceType": {
      "value": null,
      "localizedValue": ""
  },
  "resourceId": "/subscriptions/<subscription ID>",
  "status": {
      "value": "Resolved",
      "localizedValue": "Resolved"
  },
  "subStatus": {
      "value": null
  },
  "submissionTimestamp": "2017-07-20T23:30:34.7431946Z",
  "subscriptionId": "<subscription ID>",
  "properties": {
    "title": "Network Infrastructure - UK South",
    "service": "Service Fabric",
    "region": "UK South",
    "communication": "Starting at approximately 21:41 UTC on 20 Jul 2017, a subset of customers in UK South may experience degraded performance, connectivity drops or timeouts when accessing their Azure resources hosted in this region. Engineers are investigating underlying Network Infrastructure issues in this region. Impacted services may include, but are not limited to App Services, Automation, Service Bus, Log Analytics, Key Vault, SQL Database, Service Fabric, Event Hubs, Stream Analytics, Azure Data Movement, API Management, and Azure Cognitive Search. Multiple engineering teams are engaged in multiple workflows to mitigate the impact. The next update will be provided in 60 minutes, or as events warrant.",
    "incidentType": "Incident",
    "trackingId": "1NA0F-BJGY2",
    "impactStartTime": "2017-07-20T21:41:00.0000000Z",
    "impactedServices": "[{\"ImpactedRegions\":[{\"RegionName\":\"UK South\"}],\"ServiceName\":\"Service Fabric\"}]",
    "defaultLanguageTitle": "Network Infrastructure - UK South",
    "defaultLanguageContent": "Starting at approximately 21:41 UTC on 20 Jul 2017, a subset of customers in UK South may experience degraded performance, connectivity drops or timeouts when accessing their Azure resources hosted in this region. Engineers are investigating underlying Network Infrastructure issues in this region. Impacted services may include, but are not limited to App Services, Automation, Service Bus, Log Analytics, Key Vault, SQL Database, Service Fabric, Event Hubs, Stream Analytics, Azure Data Movement, API Management, and Azure Cognitive Search. Multiple engineering teams are engaged in multiple workflows to mitigate the impact. The next update will be provided in 60 minutes, or as events warrant.",
    "stage": "Active",
    "communicationId": "636361902146035247",
    "version": "0.1.1"
  }
}
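
One detail worth noticing in this payload is that `impactedServices` is not a nested object but a JSON-encoded string, so it needs a second parsing pass before it can be used. A minimal Python sketch, with the hypothetical helper name `impacted_pairs`:

```python
import json

# The "properties" object from the example, reduced to the relevant field.
properties = {
    "impactedServices": '[{"ImpactedRegions":[{"RegionName":"UK South"}],"ServiceName":"Service Fabric"}]'
}


def impacted_pairs(props: dict) -> list:
    """Return (service, region) pairs from the JSON-encoded impactedServices string."""
    services = json.loads(props.get("impactedServices") or "[]")
    return [
        (service["ServiceName"], region["RegionName"])
        for service in services
        for region in service.get("ImpactedRegions", [])
    ]


print(impacted_pairs(properties))  # → [('Service Fabric', 'UK South')]
```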

Example Teams message

This is an example Teams message as retrieved by the Logic App through the Teams connector. Note! The Adaptive Card content is included in the attachments (content field).

[
  {
    "id": "123456789",
    "replyToId": null,
    "etag": "123456789",
    "messageType": "message",
    "createdDateTime": "2021-10-18T08:04:11.541Z",
    "lastModifiedDateTime": "2021-10-18T08:04:11.541Z",
    "lastEditedDateTime": null,
    "deletedDateTime": null,
    "subject": null,
    "summary": null,
    "chatId": null,
    "importance": "normal",
    "locale": "en-us",
    "webUrl": "",
    "policyViolation": null,
    "eventDetail": null,
    "from": {
      "device": null,
      "user": null,
      "application": {
        "id": "00000000-0000-0000-0000-000000000000",
        "displayName": "Flow",
        "applicationIdentityType": "bot"
      }
    },
    "body": {
      "contentType": "html",
      "content": "<attachment id=\"123456789\"></attachment>"
    },
    "channelIdentity": {
      "teamId": "00000000-0000-0000-0000-000000000000",
      "channelId": ""
    },
    "attachments": [
      {
        "id": "123456789",
        "contentType": "application/vnd.microsoft.card.adaptive",
        "contentUrl": null,
        "content": "[ADAPTIVE CARD CONTENT]",
        "name": null,
        "thumbnailUrl": null
      }
    ],
    "mentions": [],
    "reactions": []
  }
]

Example Adaptive Card content

{
    "type": "AdaptiveCard",
    "body": [
        {
            "items": [
                {
                    "columns": [
                        {
                            "width": "stretch",
                            "items": [
                                {
                                    "size": "large",
                                    "text": "**Service Health Alert**",
                                    "weight": "bolder",
                                    "type": "TextBlock"
                                }
                            ],
                            "type": "Column"
                        },
                        {
                            "width": "stretch",
                            "items": [
                                {
                                    "color": "attention",
                                    "horizontalAlignment": "right",
                                    "size": "large",
                                    "text": "Warning",
                                    "weight": "bolder",
                                    "wrap": true,
                                    "spacing": "none",
                                    "type": "TextBlock"
                                }
                            ],
                            "type": "Column"
                        }
                    ],
                    "type": "ColumnSet"
                }
            ],
            "style": "emphasis",
            "bleed": true,
            "type": "Container"
        },
        {
            "columns": [
                {
                    "width": "110px",
                    "items": [
                        {
                            "text": "Id",
                            "weight": "bolder",
                            "wrap": true,
                            "type": "TextBlock"
                        },
                        {
                            "text": "Title",
                            "weight": "bolder",
                            "wrap": true,
                            "type": "TextBlock"
                        },
                        {
                            "text": "Service",
                            "weight": "bolder",
                            "wrap": true,
                            "spacing": "small",
                            "type": "TextBlock"
                        },
                        {
                            "text": "Region",
                            "weight": "bolder",
                            "wrap": true,
                            "spacing": "small",
                            "type": "TextBlock"
                        },
                        {
                            "text": "Communication",
                            "weight": "bolder",
                            "wrap": true,
                            "spacing": "small",
                            "type": "TextBlock"
                        }
                    ],
                    "type": "Column"
                },
                {
                    "width": "auto",
                    "items": [
                        {
                            "text": "1NA0F-BJGY2",
                            "wrap": true,
                            "type": "TextBlock"
                        },
                        {
                            "text": "Network Infrastructure - UK South",
                            "wrap": true,
                            "type": "TextBlock"
                        },
                        {
                            "text": "Service Fabric",
                            "wrap": true,
                            "spacing": "small",
                            "type": "TextBlock"
                        },
                        {
                            "text": "UK South",
                            "wrap": true,
                            "spacing": "small",
                            "type": "TextBlock"
                        },
                        {
                            "text": "Starting at approximately 21:41 UTC on 20 Jul 2017, a subset of customers in UK South may experience degraded performance, connectivity drops or timeouts when accessing their Azure resources hosted in this region. Engineers are investigating underlying Network Infrastructure issues in this region. Impacted services may include, but are not limited to App Services, Automation, Service Bus, Log Analytics, Key Vault, SQL Database, Service Fabric, Event Hubs, Stream Analytics, Azure Data Movement, API Management, and Azure Cognitive Search. Multiple engineering teams are engaged in multiple workflows to mitigate the impact. The next update will be provided in 60 minutes, or as events warrant.",
                            "wrap": true,
                            "spacing": "small",
                            "type": "TextBlock"
                        }
                    ],
                    "type": "Column"
                }
            ],
            "spacing": "medium",
            "separator": true,
            "type": "ColumnSet"
        }
    ],
    "version": "1.2"
}
