Kyndryl AIOps

Home screen overview
Published On Aug 30, 2024 - 10:42 PM

Learn more about the dashboards, insights, and widgets available on your home screen.
Disclaimer: Contact your Kyndryl representative if you have questions about the documentation or insights.
The home screen gives you a view of metrics, dashboards, insights, widgets, and IT Health indicators. Each section describes essential information such as business applications, bring your own data, and health indicators for service management, security and resiliency, management by objective, and devices that are out of capacity or near end of life.
You can use the dashboards, insights, and widgets on the home screen to gain the following benefits:
  • Reduce the resources you need by reducing toil (noise reduction) and automating as much of the manual effort in your account as possible.
  • Re-skill or redeploy resources to concentrate on your needs.
  • Avoid potential business-impacting issues by taking proactive action.
  • Reduce outages and thus improve the uptime and availability of servers and applications.
  • Improve and sustain the compliance posture of your account.

Descriptions of the metrics

When you log into your account, the following metrics are displayed on the landing page. Each entry gives the metric name, its description (including color thresholds, where defined), and the range of data it covers.

Alerts
Shows active ticketed events grouped by business application. When available, other observability information can also be displayed. (Amber: event severity Warning (3) / Red: event severity Critical (4) or Fatal (> 5)).

Actionable Insights
Shows insights in which AI/ML cross-references multiple data types to identify an area of concern or improvement and recommend the next best action. You see only the actionable insights from the insight library that apply to your account.
Range of data: Current month to date (MTD)

Servers having less than 99% uptime
Shows the percentage of servers in the account whose uptime is less than 99%. (Gray 0-5% / Red > 5%).
Range of data: Rolling 30 days
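A minimal sketch of how this percentage and its color could be derived. The `uptime_by_server` input, the helper names, and the choice to treat exactly 5% as Gray are assumptions for illustration, not the product's implementation.

```python
def servers_below_uptime_pct(uptime_by_server: dict[str, float], target: float = 99.0) -> float:
    """Return the percentage of servers whose uptime is below the target."""
    if not uptime_by_server:
        return 0.0  # assumption: no servers means nothing to flag
    below = sum(1 for uptime in uptime_by_server.values() if uptime < target)
    return 100.0 * below / len(uptime_by_server)


def uptime_color(pct_below: float) -> str:
    """Gray 0-5% / Red > 5%; exactly 5% is assumed to be Gray."""
    return "Red" if pct_below > 5.0 else "Gray"


# Hypothetical rolling-30-day uptime percentages per server.
servers = {"srv-a": 99.9, "srv-b": 98.7, "srv-c": 100.0, "srv-d": 99.5}
pct = servers_below_uptime_pct(servers)
print(f"{pct:.1f}% of servers below 99% uptime: {uptime_color(pct)}")
```

One of the four hypothetical servers falls below 99%, so 25% of servers are flagged and the metric shows Red.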

Business Application MTBF Hours
Shows the Mean Time Between Failures (MTBF) for the three applications with the lowest MTBF in the account. An application's % uptime is calculated as the product of the % uptime of all servers mapped to that business application, and its total uptime is calculated from this % uptime and the total working time. The application's MTBF is therefore based on the uptime % of its underlying servers. (Gray 95-100% / Red 0-95%).
Range of data: Past 90 days
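The calculation described above can be sketched as follows. The product-of-uptimes rule comes from the description; dividing total uptime by the number of failures is the standard MTBF definition and is assumed here, as are the 24-hour working day, the treatment of exactly 95% as Gray, and all helper names.

```python
import math


def application_uptime_fraction(server_uptimes_pct: list[float]) -> float:
    """Application uptime = product of the uptime of every mapped server."""
    return math.prod(u / 100.0 for u in server_uptimes_pct)


def application_mtbf_hours(server_uptimes_pct: list[float], working_hours: float, failures: int) -> float:
    """Total uptime divided by failures (standard MTBF definition, assumed here)."""
    total_uptime = application_uptime_fraction(server_uptimes_pct) * working_hours
    return total_uptime / failures if failures else math.inf


def application_uptime_color(uptime_pct: float) -> str:
    """Gray 95-100% / Red 0-95%; exactly 95% is assumed to be Gray."""
    return "Gray" if uptime_pct >= 95.0 else "Red"


def lowest_mtbf(mtbf_by_app: dict[str, float], n: int = 3) -> list[tuple[str, float]]:
    """The n applications with the lowest MTBF, as shown on the widget."""
    return sorted(mtbf_by_app.items(), key=lambda item: item[1])[:n]


# Hypothetical application: two servers, 90 days x 24 hours of working time, 3 failures.
uptime = application_uptime_fraction([99.0, 98.0])
mtbf = application_mtbf_hours([99.0, 98.0], working_hours=90 * 24, failures=3)
print(f"uptime {uptime:.2%}, MTBF {mtbf:.1f} h, {application_uptime_color(uptime * 100)}")
```

Because the server uptimes multiply, an application that depends on many servers has lower uptime (and so lower MTBF) than any single one of those servers.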

Total Inventory Items
Shows the number of devices in the inventory. The drill-down also shows the mapping of devices to business applications.
Range of data: The latest from SESDR

Remediation with corrective closure
The Automated Corrective Closure number is the count of incident tickets for which a CACF playbook takes corrective action. The Automated Corrective Closure percentage is calculated by dividing the Automated Corrective Closure number by the number of incident tickets in scope, and this percentage is what the metric displays. (Gray > 75% / Amber 50-75% / Red < 50%).
Range of data: Past 3 months
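A minimal sketch of the corrective closure percentage and its color bands, assuming hypothetical ticket counts and helper names; returning 0% when no incidents are in scope is also an assumption.

```python
def corrective_closure_pct(auto_corrected: int, incidents_in_scope: int) -> float:
    """Tickets corrected by a CACF playbook divided by all in-scope incident tickets."""
    if incidents_in_scope == 0:
        return 0.0  # assumption: no in-scope incidents is reported as 0%
    return 100.0 * auto_corrected / incidents_in_scope


def closure_color(pct: float) -> str:
    """Gray > 75% / Amber 50-75% / Red < 50%."""
    if pct > 75.0:
        return "Gray"
    if pct >= 50.0:
        return "Amber"
    return "Red"


# Hypothetical 3-month window: 130 of 200 in-scope incidents closed by a playbook.
pct = corrective_closure_pct(auto_corrected=130, incidents_in_scope=200)
print(f"{pct:.1f}% corrective closure: {closure_color(pct)}")
```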

Incidents per server per month
Incidents per server per month is calculated by dividing the number of incident tickets in scope by the number of servers in your account.
Range of data: Past 30 days by default
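The division described above, as a short sketch with a hypothetical helper name and hypothetical counts:

```python
def incidents_per_server_per_month(incidents_in_scope: int, server_count: int) -> float:
    """Divide the in-scope incident tickets by the number of servers in the account."""
    if server_count <= 0:
        raise ValueError("server_count must be positive")
    return incidents_in_scope / server_count


# Hypothetical account: 45 in-scope incidents in the past 30 days across 150 servers.
rate = incidents_per_server_per_month(45, 150)
print(f"{rate:.2f} incidents per server per month")
```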

Best practice deployment
Displays the attainment of all policies across the environment. Deviation from best practice alignment is measured by the number of findings (policy check failures) reported by the best practice checks that SDE Automation Tools (SAT) run. (Gray 90-100% / Amber 85-90% / Red 0-85%).
Range of data: Past 90 days by default

SSL certificates expiring (<30 days)
Shows all identified SSL certificates that will expire in the next 30 days. Gray indicates certificates that are still valid, Amber indicates expired certificates, and Red indicates certificates that will expire within the next 30 days.
Range of data: Next 30 days by default
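The color meanings above can be sketched as a small classifier. The function name is hypothetical, and counting a certificate that expires today or exactly 30 days from now as Red is an assumption about boundary handling.

```python
from datetime import date, timedelta


def certificate_color(not_after: date, today: date, window_days: int = 30) -> str:
    """Classify a certificate by its expiry date using the documented colors."""
    if not_after < today:
        return "Amber"  # already expired
    if not_after <= today + timedelta(days=window_days):
        return "Red"  # expires within the window (boundaries assumed inclusive)
    return "Gray"  # still valid beyond the window


today = date(2024, 8, 30)
# Hypothetical expiry dates: one past, one within 30 days, one well beyond.
for expiry in (date(2024, 8, 1), date(2024, 9, 15), date(2025, 1, 1)):
    print(expiry, certificate_color(expiry, today))
```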

P1/P2 active Incidents
Shows all active (non-resolved) P1 and P2 incident tickets for the account. Gray: every active incident is within its age range, that is, P1 incidents open 0-1 day and P2 incidents open 0-2 days. Red: at least one incident is a P1 open for more than 1 day or a P2 open for more than 2 days.
Range of data: Past 30 days by default; historical data for the past 210 days

P1/P2 active Problem
Shows the active (non-resolved) P1 and P2 problem tickets for the account and displays the count of P1 problem tickets open for more than 6 days or P2 problem tickets open for more than 10 days. (Gray: P1 0-6 days and P2 0-10 days / Red: P1 > 6 days or P2 > 10 days).
Range of data: Past 30 days by default; historical data for the past 210 days
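The incident and problem metrics apply the same aging rule with different limits, which can be sketched with one helper. The `Ticket` type, the helper names, and measuring age in fractional days are assumptions; a ticket exactly at its limit is treated as Gray, matching the 0-1, 0-2, 0-6, and 0-10 day Gray ranges.

```python
from dataclasses import dataclass

# Age limits in days from the descriptions above; only P1 and P2 tickets are in scope.
INCIDENT_LIMITS = {"P1": 1, "P2": 2}
PROBLEM_LIMITS = {"P1": 6, "P2": 10}


@dataclass
class Ticket:
    priority: str    # "P1" or "P2"
    age_days: float  # days since the ticket was opened; the ticket is still active


def overdue(tickets: list[Ticket], limits: dict[str, int]) -> list[Ticket]:
    """Active tickets older than the limit for their priority."""
    return [t for t in tickets if t.age_days > limits[t.priority]]


def active_count_color(tickets: list[Ticket], limits: dict[str, int]) -> str:
    """Red if even one ticket exceeds its limit, otherwise Gray."""
    return "Red" if overdue(tickets, limits) else "Gray"


# Hypothetical active tickets, evaluated against both rule sets.
active = [Ticket("P1", 0.5), Ticket("P2", 3.0)]
print(active_count_color(active, INCIDENT_LIMITS))  # the P2 ticket exceeds 2 days
print(active_count_color(active, PROBLEM_LIMITS))   # neither ticket exceeds 6 or 10 days
print(len(overdue(active, PROBLEM_LIMITS)))         # count shown for the problem metric
```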

Critical Changes
Highlights changes that require additional due diligence to avoid business-critical outages. AI/ML assesses the risk of each change request, that is, the risk that the change fails or causes incidents, and the metric displays the critical change risk for the account within the next 72 hours. (Gray 0-1 critical change / Red > 1).
Range of data: Upcoming changes in the next 72 hours; historical data for the past 1 year

Unsuccessful Changes
Shows the percentage of changes for the account that failed. (Gray = 0 / Amber 0-5% / Red > 5%).
Range of data: Past 30 days by default; historical data for the past 210 days
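A minimal sketch of the unsuccessful change percentage and its three color bands. The helper names and counts are hypothetical, and reporting 0% when there are no changes is an assumption.

```python
def unsuccessful_change_pct(failed: int, total: int) -> float:
    """Percentage of changes in the window that failed."""
    return 100.0 * failed / total if total else 0.0  # assumption: no changes -> 0%


def three_band_color(pct: float, amber_max: float = 5.0) -> str:
    """Gray = 0 / Amber above 0 up to amber_max / Red above amber_max."""
    if pct == 0:
        return "Gray"
    return "Amber" if pct <= amber_max else "Red"


# Hypothetical 30-day window: 3 of 120 changes failed.
pct = unsuccessful_change_pct(failed=3, total=120)
print(f"{pct:.1f}% unsuccessful: {three_band_color(pct)}")
```

Passing `amber_max=2` applies the same banding to the Patch Overdue % thresholds (Gray = 0 / Amber 0-2 / Red > 2), assuming those values are also percentages.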

Patch Overdue %
Shows the devices for which patches have not been applied on time, and displays the percentage of devices with overdue patches. (Gray = 0 / Amber 0-2 / Red > 2).
Range of data: Past 1 day by default

Devices with health check issues
Shows the percentage of devices that are either missing a health check run or have more than 5 health check deviations. (Gray 0-2% / Red > 2%).
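The flagging rule above can be sketched as follows. The `Device` type, representing a missing health check run as `None`, the helper names, and the hypothetical fleet are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Device:
    name: str
    deviations: Optional[int]  # None: no health check run was found for the device


def has_health_check_issue(device: Device, max_deviations: int = 5) -> bool:
    """Flag devices that are missing a health check or have more than 5 deviations."""
    return device.deviations is None or device.deviations > max_deviations


def health_check_issue_pct(devices: list[Device]) -> float:
    if not devices:
        return 0.0
    flagged = sum(has_health_check_issue(d) for d in devices)
    return 100.0 * flagged / len(devices)


def two_band_color(pct: float, gray_max: float = 2.0) -> str:
    """Gray 0-2% / Red > 2%."""
    return "Red" if pct > gray_max else "Gray"


# Hypothetical fleet of 50 devices: one missing its health check, one with 7 deviations.
fleet = [Device(f"dev-{i}", 0) for i in range(48)]
fleet += [Device("dev-48", None), Device("dev-49", 7)]
pct = health_check_issue_pct(fleet)
print(f"{pct:.1f}% of devices with health check issues: {two_band_color(pct)}")
```

The Failed Backups metric uses the same Gray 0-2% / Red > 2% banding, so `two_band_color` would apply to it as well.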

Failed Backups
Shows the devices for which the backup has failed, and displays the failure percentage for the account. (Gray 0-2% / Red > 2%).

Devices out of capacity
Shows the devices that have less CPU, memory, and/or disk capacity than the best practice recommendation, and displays the percentage of devices in the account that are running out of capacity. (Gray 0-10% / Amber 10-30% / Red > 30%).
Range of data: Past 3 months
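A rough sketch of this metric. The best practice recommendations are not given here, so the utilization limits in `BEST_PRACTICE_MAX_UTIL` are hypothetical placeholders, and modeling "less capacity than recommended" as utilization above a limit is an assumption, as is placing boundary values in the lower band.

```python
# Hypothetical best practice limits (maximum utilization %); not documented values.
BEST_PRACTICE_MAX_UTIL = {"cpu": 80.0, "memory": 85.0, "disk": 90.0}


def out_of_capacity(utilization: dict[str, float]) -> bool:
    """True if CPU, memory, or disk exceeds its (assumed) best practice limit."""
    return any(utilization.get(res, 0.0) > limit for res, limit in BEST_PRACTICE_MAX_UTIL.items())


def pct_out_of_capacity(devices: list[dict[str, float]]) -> float:
    if not devices:
        return 0.0
    return 100.0 * sum(out_of_capacity(d) for d in devices) / len(devices)


def capacity_color(pct: float) -> str:
    """Gray 0-10% / Amber 10-30% / Red > 30%; boundaries assumed to fall in the lower band."""
    if pct <= 10.0:
        return "Gray"
    return "Amber" if pct <= 30.0 else "Red"


# Hypothetical utilization for five devices.
fleet = [
    {"cpu": 45.0, "memory": 60.0, "disk": 70.0},
    {"cpu": 92.0, "memory": 70.0, "disk": 50.0},  # CPU above the assumed limit
    {"cpu": 30.0, "memory": 40.0, "disk": 85.0},
    {"cpu": 20.0, "memory": 50.0, "disk": 60.0},
    {"cpu": 50.0, "memory": 55.0, "disk": 65.0},
]
pct = pct_out_of_capacity(fleet)
print(f"{pct:.0f}% of devices out of capacity: {capacity_color(pct)}")
```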

Devices EOS/EOL
Shows devices whose End of Support (EOS) or End of Life (EOL) dates have passed or are due within one year, and displays the percentage of such devices in the account. (Gray 0-10% / Amber 10-30% / Red > 30%).
Range of data: The latest from HWSW Inventor
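The one-year window above can be sketched as a date check. Using 365 days for "one year", representing an unknown date as `None`, the helper names, and the device dates are assumptions for illustration.

```python
from datetime import date, timedelta
from typing import Optional


def is_eos_or_eol(eos: Optional[date], eol: Optional[date], today: date, horizon_days: int = 365) -> bool:
    """True if either date has passed or falls within the next year (365 days assumed)."""
    cutoff = today + timedelta(days=horizon_days)
    return any(d is not None and d <= cutoff for d in (eos, eol))


def eos_eol_color(pct: float) -> str:
    """Gray 0-10% / Amber 10-30% / Red > 30%; boundaries assumed to fall in the lower band."""
    if pct <= 10.0:
        return "Gray"
    return "Amber" if pct <= 30.0 else "Red"


today = date(2024, 8, 30)
# Hypothetical (EOS, EOL) dates for four devices.
devices = [
    (date(2023, 12, 31), date(2026, 12, 31)),  # EOS already passed
    (None, date(2025, 6, 30)),                 # EOL within one year
    (date(2027, 1, 1), date(2029, 1, 1)),      # both dates beyond the horizon
    (None, None),                              # no lifecycle dates recorded
]
flagged = sum(is_eos_or_eol(eos, eol, today) for eos, eol in devices)
pct = 100.0 * flagged / len(devices)
print(f"{pct:.0f}% of devices at EOS/EOL: {eos_eol_color(pct)}")
```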

Data refresh frequency

The following table shows the refresh frequency for each type of data. Not all insights that use this data are refreshed at the same rate, and "real-time" should be read as "near real-time".
Mode of refresh    Data type             Schedule
Real-time          Events                Every 2 min
Real-time          Incident tickets      Every 2 min
Real-time          Change requests       Every 2 min
Real-time          Problem tickets       Every 5 min
Real-time          Service tickets       Every 5 min
Real-time          Automation tickets    Every 30 min
Daily              Inventory             Daily
                   Netcool LDS