Kyndryl AIOps

Introduction to Kyndryl AIOps

Alerts
Published On Jun 04, 2024 - 1:23 PM

The Alerts panel shows critical and warning alerts from ticketed and to-be-ticketed events, grouped by the business application they impact.
Infrastructure and related application health events can harm the health of an infrastructure device. The panel lists alerts for potentially damaging events from newest to oldest, top to bottom. Because changing conditions on specific devices affect applications, the console displays each alert with a date and time stamp.
The Integrated AIOps Alerts insight into the account management system shows the real-time health of business-critical applications. It identifies which proactive actions need to be taken to keep an issue with a business application from becoming chronic. The IT Health view provides the delivery team with the next best course of action to address the area of concern.
An alert is visible for an application in the dashboard when one or more of its servers are in Red or Amber status. The red and amber statuses indicate the severity of the reported issue. Use the scrollbar in the card to view the other alerts. The blue square with an outward-pointing arrow at the top right opens the view of all business applications. Clicking any item opens the alerts detail page.
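The visibility rule above (an application shows an alert when at least one of its servers is Red or Amber) can be sketched as follows. This is only an illustration, not the product's implementation: the `Server` class, the `has_alert` function, the "Green" status label, and the sample applications are all hypothetical.

```python
from dataclasses import dataclass

# Statuses that cause an alert to appear for an application.
ALERTING_STATUSES = {"Red", "Amber"}


@dataclass
class Server:
    hostname: str  # hypothetical field name
    status: str    # "Red", "Amber", or (assumed) "Green"


def has_alert(servers):
    """Return True when one or more servers are in Red or Amber status."""
    return any(server.status in ALERTING_STATUSES for server in servers)


# Hypothetical sample data: application name -> servers supporting it.
apps = {
    "payments": [Server("pay-01", "Green"), Server("pay-02", "Amber")],
    "intranet": [Server("web-01", "Green")],
}

alerting_apps = [name for name, servers in apps.items() if has_alert(servers)]
print(alerting_apps)
```

In this sketch only "payments" is reported, because one of its servers is Amber while every "intranet" server is Green.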

Business Value and Benefits

Provides observability into what is currently happening for the affected business applications. This can include ticketed events and, going forward, other information from Application Performance Monitoring (APM) and Digital End-user Experience tools the customer may have deployed.

Metrics

The Alerts panel shows both critical and warning alerts with critical alerts listed first, as mapped to their business applications.
There are seven event severity categories:
  • Fatal (6)
  • Critical (5)
  • Major (4)
  • Minor (3)
  • Warning (2)
  • Undefined (1)
  • Clear (0)
The event severity appears in the Alerts and Health cards associated with an application. In most cases, the severity categories depend on the account setup.
Below are the existing incident ticket severities of the AIOps Health view:
  • (Green) Healthy: 0 = clear; 1 = undefined or unknown; 2 = warning
  • (Amber) Warning: 3 = minor; 4 = major
  • (Red) Critical: 5 = critical
  • (Red) Fatal: 6 = fatal
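The mapping from numeric event severity to Health view status can be expressed as a small lookup. The sketch below follows the mapping listed above; the function name `health_for_severity` and the tuple return shape are hypothetical, and an actual account may use a different setup.

```python
# Event severity codes and their category names, as listed in this section.
SEVERITY_NAMES = {
    6: "Fatal",
    5: "Critical",
    4: "Major",
    3: "Minor",
    2: "Warning",
    1: "Undefined",
    0: "Clear",
}


def health_for_severity(severity):
    """Map an event severity code (0-6) to a (color, health status) pair."""
    if severity not in SEVERITY_NAMES:
        raise ValueError(f"unknown severity code: {severity!r}")
    if severity == 6:
        return ("Red", "Fatal")
    if severity == 5:
        return ("Red", "Critical")
    if severity in (3, 4):
        return ("Amber", "Warning")
    # Codes 0 (clear), 1 (undefined), and 2 (warning) all count as healthy.
    return ("Green", "Healthy")


for code in sorted(SEVERITY_NAMES):
    print(code, SEVERITY_NAMES[code], health_for_severity(code))
```

Note that an event with the severity category Warning (2) still maps to a Healthy status in this view, while Minor (3) and Major (4) map to the Warning status.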
The following table describes each KPI/metric used within the insight.
KPI/Metric Name | KPI/Metric Description
--- | ---
ID | System ID for DC Resources and Resource ID for MC Resource
Host Name | Hostname for DC Resources and Resource Name for MC Resources
Health | The current health status of the resources that support the application, including Critical, Warning, and Healthy
Status | Active or Inactive
Provider | Service provider associated with that application, such as AWS, Azure, and IBM
Provider Account | The unique provider account ID
Resource Category | Compute, Database, Network, Mainframe etc.
Resource Type | EC2, Elastic Load Balancing, BatchV, etc.
OS | Operating System of the Resources
Application | The application with which the resource is associated
App category | Category to which application belongs
Region | Geographical region serviced by the resource (Example: Europe)
Location | The exact location from where the resources are available (Example: Munich)
Health | Server health event status

Use cases

Use case 1: Call user's attention to critical event alerts on the landing page
The Alerts panel in the top left corner of the landing page highlights all critical and high event alerts for servers, mapped to their business applications. These are active ticketed events. The panel lists alerts from newest to oldest, with all "critical" alerts listed before "high" alerts.
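The ordering described above (critical before high, and newest to oldest within each group) can be illustrated with a two-pass stable sort. This is a sketch under assumptions: the dictionary keys, the `panel_order` function, and the sample alerts are hypothetical and do not reflect the product's data model.

```python
from datetime import datetime

# Lower rank sorts first, so "critical" alerts precede "high" alerts.
SEVERITY_RANK = {"critical": 0, "high": 1}

# Hypothetical sample alerts.
alerts = [
    {"app": "payments", "severity": "high", "raised": datetime(2024, 6, 4, 13, 0)},
    {"app": "orders", "severity": "critical", "raised": datetime(2024, 6, 4, 12, 0)},
    {"app": "billing", "severity": "critical", "raised": datetime(2024, 6, 4, 12, 30)},
]


def panel_order(items):
    """Order alerts critical-first, newest to oldest within each severity."""
    # Pass 1: newest first.
    newest_first = sorted(items, key=lambda a: a["raised"], reverse=True)
    # Pass 2: sorted() is stable, so the time order survives within each rank.
    return sorted(newest_first, key=lambda a: SEVERITY_RANK[a["severity"]])


ordered = [alert["app"] for alert in panel_order(alerts)]
print(ordered)
```

Here the two critical alerts come first ("billing" before "orders" because it is newer), followed by the high alert for "payments", even though that alert is the most recent overall.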
Use case 2: Show event details and the associated tickets so that users can avoid logging in to different tools such as M&E and ITSM.
The alerts detail page and events detail page provide details of the alerts on each server hostname and the associated tickets with their latest status. The data is updated every 2 minutes.