Server Uptime (All devices (servers) Uptime)
Published On Jun 04, 2024 - 1:23 PM

This section describes the Server Uptime (All devices (servers) Uptime) insight for I-AIOps.
Whether a server has less than 99% uptime is determined from the P1 tickets associated with that server and the duration from incident creation to incident resolution. Uptime is measured over a 30-day rolling period.

Description

This IT Health Indicator and insight shows the percentage of servers with less than 99% uptime. Uptime and MTBF are calculated over a 30-day rolling period (720 hours), taking into account the duration between the created and resolved times of P1 tickets. The insight also shows the uptime and MTBF, in hours, for each server.
Business Value and benefits:
To measure, assess and improve the quality and reliability of our services and assets.

Metrics

Whether a server has less than 99% uptime is determined from the P1 tickets associated with that server and the duration from incident creation to incident resolution. Uptime is measured over a 30-day (720-hour) rolling period.
The hosts must be linked to a business application.
Uptime percentage = ((Total Working Time - Total Breakdown Time due to P1 incident tickets) / Total Working Time) x 100
Servers having less than 99% uptime percentage = (Servers having less than 99% uptime / All servers for the account) x 100
Uptime = Total Working Time - Total Breakdown Time
Mean time between failures (MTBF) = Uptime / Number of Failures (P1 incident tickets)
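As a worked illustration of the formulas above, the following minimal Python sketch computes the uptime percentage, MTBF, and the share of servers below 99% uptime. The function names are hypothetical and chosen for this example only. Returning None for MTBF when a server has no P1 failures is an assumption, since the text does not say how that case is handled.

```python
WINDOW_HOURS = 720.0  # Total Working Time: 30-day rolling period per server


def uptime_hours(breakdown_hours, working_hours=WINDOW_HOURS):
    """Uptime = Total Working Time - Total Breakdown Time."""
    return working_hours - breakdown_hours


def uptime_percentage(breakdown_hours, working_hours=WINDOW_HOURS):
    """Uptime expressed as a percentage of Total Working Time."""
    return uptime_hours(breakdown_hours, working_hours) / working_hours * 100.0


def mtbf_hours(breakdown_hours, failures, working_hours=WINDOW_HOURS):
    """MTBF = Uptime / Number of Failures (P1 incident tickets)."""
    if failures == 0:
        return None  # assumption: MTBF is treated as undefined without failures
    return uptime_hours(breakdown_hours, working_hours) / failures


def percent_servers_below(uptime_percentages, threshold=99.0):
    """Percentage of servers whose uptime percentage is below the threshold."""
    if not uptime_percentages:
        return 0.0
    below = sum(1 for value in uptime_percentages if value < threshold)
    return below / len(uptime_percentages) * 100.0


if __name__ == "__main__":
    # A server with 9 hours of breakdown time across 3 P1 tickets.
    print(round(uptime_percentage(9.0), 2))
    print(mtbf_hours(9.0, 3))
    # Four servers, one of which is below 99% uptime.
    print(percent_servers_below([98.75, 100.0, 99.5, 100.0]))
```

With 9 hours of breakdown time, uptime is 711 of 720 hours, so the server falls below the 99% mark and is counted in the account-level percentage.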
Servers less than 99% uptime (progress MTD and projection)
  1. Two graphs show the Uptime % trends for the account.
  2. The first graph shows the actual trend for the previous month.
  3. The second graph has two sections. The blue line is the actual trend for the month to date (MTD), and the blue dotted line is the forecast based on the current month's trend for the account.
  4. The red line in both graphs depicts the target, or threshold, of 5% for the account.
  5. If the forecast crosses the threshold at any point, the account team should check the MTD trend, analyze the repetitive failures on different servers, and take corrective action. This helps bring the forecast, and therefore the actual trend, below the threshold by the end of the month.
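The threshold check described in the list above can be sketched as follows. The text does not state how the forecast is produced, so this example substitutes a plain least-squares straight line fitted to the MTD values. That choice, the function names, and the assumption that the series holds one daily value of the percentage of servers below 99% uptime are all illustrative and not taken from the product.

```python
THRESHOLD_PCT = 5.0  # target/threshold shown as the red line


def linear_projection(mtd_values, days_in_month):
    """Extend a daily MTD series to month end with a least-squares line.

    Assumption: a straight-line fit stands in for the product's forecast,
    whose actual method is not documented in the text.
    """
    n = len(mtd_values)
    remaining_days = range(n + 1, days_in_month + 1)
    if n == 0:
        return []
    if n == 1:
        # A single point has no trend; carry the value forward.
        return [mtd_values[0] for _ in remaining_days]
    days = range(1, n + 1)
    mean_x = sum(days) / n
    mean_y = sum(mtd_values) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, mtd_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * day for day in remaining_days]


def forecast_crosses_threshold(mtd_values, days_in_month, threshold=THRESHOLD_PCT):
    """True if any projected value exceeds the threshold before month end."""
    projected = linear_projection(mtd_values, days_in_month)
    return any(value > threshold for value in projected)


if __name__ == "__main__":
    rising = [1.0, 2.0, 3.0]  # hypothetical MTD values for days 1 to 3
    flat = [2.0, 2.0, 2.0]
    print(forecast_crosses_threshold(rising, 30))
    print(forecast_crosses_threshold(flat, 30))
```

A rising series is flagged well before it reaches 5%, which is the point at which the MTD trend and repetitive failures would be reviewed.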
The following table describes each KPI/metric used within the insight and how it is calculated.
KPI/Metric Name
    KPI/Metric Description
Servers < 99% uptime (in percentage)
    Shows the percentage of servers for the account that have less than 99% uptime (Gray 0-5% / Red > 5%).
Number of servers
    Total number of servers for the account.
Number of Failures
    Number of P1 tickets for all servers in the account in the last 30 days.
Mean Time Between Failures (in hours)
    MTBF for each server, based on the uptime and the number of P1 failures.
Uptime %
    Uptime % is calculated for each server as per the computation method.
Total Uptime (in hours)
    (Total Working Time - Breakdown Time) of the server.
    Note - Overlaps of tickets are considered without double counting breakdown time.
MTTR (in hours)
    MTTR stands for Mean Time to Resolve an incident. It is the average time taken for the tickets to be resolved.
Number of Failures
    Number of P1 tickets for the respective server. It takes into account all P1 incident tickets in Kyndryl scope* associated with the specified server.
    * All tickets in scope of Kyndryl including, but not limited to, storage, network, compute, mainframe, database, middleware, and applications. Excludes customer, vendor (unless they are a direct subsidiary), and helpdesk/DWS/service desk resolver group queues.
Total Working Time
    Total server available time for the 30-day rolling period, i.e., 720 hours per server.
Total Breakdown Time
    The cumulative time measured from the open date (DD, HH:MM:SS) to the resolution date (DD, HH:MM:SS) of the P1 incident tickets.
    Note - Overlaps of tickets are considered without double counting breakdown time.
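The note on overlapping tickets can be illustrated with an interval merge: overlapping P1 tickets on the same server are combined before their durations are summed, so shared time is counted once. This is a minimal sketch with hypothetical function names. Clipping tickets to the 30-day window is an assumption, as is computing MTTR as a simple average of each ticket's own created-to-resolved duration.

```python
from datetime import datetime, timedelta


def breakdown_hours(tickets, window_start, window_end):
    """Total Breakdown Time in hours for one server, without double counting.

    tickets: list of (created, resolved) datetime pairs for P1 incidents.
    Assumption: ticket time outside the rolling window is ignored.
    """
    clipped = []
    for created, resolved in tickets:
        start = max(created, window_start)
        end = min(resolved, window_end)
        if start < end:
            clipped.append((start, end))
    clipped.sort()

    total = timedelta(0)
    current_start = current_end = None
    for start, end in clipped:
        if current_end is None or start > current_end:
            # A gap: close the previous merged interval and start a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # An overlap: extend the merged interval instead of adding twice.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total.total_seconds() / 3600.0


def mttr_hours(tickets):
    """Average created-to-resolved duration in hours (None without tickets)."""
    if not tickets:
        return None
    total = sum((resolved - created for created, resolved in tickets), timedelta(0))
    return total.total_seconds() / 3600.0 / len(tickets)


if __name__ == "__main__":
    window_end = datetime(2024, 6, 1)
    window_start = window_end - timedelta(days=30)
    tickets = [
        (datetime(2024, 5, 10, 8), datetime(2024, 5, 10, 12)),
        (datetime(2024, 5, 10, 10), datetime(2024, 5, 10, 14)),  # overlaps the first
        (datetime(2024, 5, 20, 0), datetime(2024, 5, 20, 3)),
    ]
    print(breakdown_hours(tickets, window_start, window_end))
    print(mttr_hours(tickets))
```

In this example the first two tickets overlap by two hours, so they contribute six hours of breakdown time rather than eight, while MTTR still averages each ticket's full duration.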