KPI/Metric Name | KPI/Metric Description | Data Range |
---|---|---|
Alerts | Shows active ticketed events grouped by business application. When available, other observability information can also be displayed. (Amber: event severity Warning (3) / Red: event severity Critical (4) or Fatal (> 5)). | - |
Actionable Insights | Shows insights, based on AI/ML cross-referencing of multiple data types, that address an area of concern or improvement and provide the next best action. Only the actionable insights from the insight library that apply to the account are shown. | Current Month to Date (MTD) |
Servers having less than 99% uptime | Shows the percentage of the account's servers that have less than 99% uptime. (Gray 0-5% / Red > 5%). | Rolling 30 days |
Business Application MTBF Hours | Shows the Mean Time Between Failures (MTBF) of the 3 business applications with the lowest MTBF for the account. An application's MTBF is based on the uptime % of its underlying servers: the application's % uptime is calculated as the product of the % uptime of all servers mapped to the business application, and its total uptime is calculated from this % uptime and the total working time. (Gray 95-100% / Red 0-95%). | Past 90 days |
Total Inventory Items | Shows the number of devices within the inventory. The drilldown also shows the mapping to the business applications. | Latest from SESDR |
Remediation with corrective closure | Shows the corrective closure percentage based on the Automated Corrective Closure number. The Automated Corrective Closure number is defined as all incident tickets for which a CACF playbook took a corrective action. The Automated Corrective Closure percentage is calculated by dividing the Automated Corrective Closure number by all incident tickets in Kyndryl scope. (Gray > 75% / Amber 50-75% / Red < 50%). | Past 3 months |
Incidents per server per month | Calculated by dividing all incident tickets in Kyndryl scope by all servers of the account, prorated per day. Only servers are counted; network devices, storage devices, and other device types are excluded. (Gray 0-0.8 / Amber 0.8-1.0 / Red > 1.0). | Default selection of past 30 days. Historical data of 210 days |
Best practice deployment | Displays attainment of all policies across the environment. Deviation from best practices is measured by the number of findings (policy check failures) reported by the best practice checks executed by SDE Automation Tools (SAT). (Gray 90-100% / Amber 85-90% / Red 0-85%). | Default selection of past 90 days |
SSL certificates expiring (<30 days) | Shows the number of identified SSL certificates that have expired or expire within the next 30 days. (Gray certificates alive / Amber certificates expired / Red certificates expiring within the next 30 days). | Default selection of next 30 days. |
P1/P2 active Incidents | Shows all active (non-resolved) P1 and P2 incident tickets for the account. (Gray: all active P1 incidents are within 0-1 day and all active P2 incidents are within 0-2 days / Red: at least one incident meets the criteria P1 > 1 day or P2 > 2 days). | Default selection of past 30 days. Historical data of 210 days |
P1/P2 active Problem | Shows the active (non-resolved) P1 and P2 problem tickets for the account and the count of problem tickets open for P1 > 6 days or P2 > 10 days. (Gray P1 0-6 days and P2 0-10 days / Red P1 > 6 days or P2 > 10 days). | Default selection of past 30 days. Historical data of 210 days |
Critical Changes | Highlights changes that require additional due diligence to avoid a business-critical outage. It uses AI/ML to evaluate the risk level with which the support teams raised the change and performs a risk assessment against the change failing or causing incidents. Displays the critical change risk for the account within the next 72 hours. (Gray 0-1 critical changes / Red > 1). | Upcoming changes in next 72 hours. Historical data for past 1 year |
Unsuccessful Changes | Shows the percentage of the account's changes that failed. (Gray 0% / Amber 0-5% / Red > 5%). | Default selection of past 30 days. Historical data of 210 days |
Patch Overdue % | Shows the devices on which patches have not been applied on time, and the percentage of devices with overdue patches. (Gray 0% / Amber 0-2% / Red > 2%). | Default selection of past 1 day. |
Devices with health check issues | Shows the percentage of devices that are either missing a health check run or have more than 5 health check deviations. (Gray 0-2% / Red > 2%). | - |
Failed Backups | Shows the devices for which the backup has failed and the percentage of failed backups for the account. (Gray 0-2% / Red > 2%). | - |
Devices out of capacity | Shows the devices that have less CPU, memory, and/or disk capacity than the best practice recommendation, and the percentage of the account's devices that are running out of capacity. (Gray 0-10% / Amber 10-30% / Red > 30%). | Past 3 months |
Devices EOS/EOL | Shows devices whose End of Support (EOS) or End of Life (EOL) dates have passed or are due within one year, and the percentage of the account's devices that are at EOL or EOS. (Gray 0-10% / Amber 10-30% / Red > 30%). | Latest from HWSW Inventor |
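
Several of the KPI definitions above are formulas plus color thresholds. The Python sketch below illustrates a few of them and is not the dashboard's implementation: all function names are hypothetical, MTBF is assumed to follow the standard definition (total uptime divided by the number of failures, which the table does not spell out), "prorated per day" is assumed to mean normalizing to a 30-day month, and the color assigned to a value exactly on a threshold boundary is a guess.

```python
from math import prod


def pct_servers_below(uptimes_pct: list[float], threshold: float = 99.0) -> float:
    """Percentage of servers whose uptime is below the threshold."""
    return 100.0 * sum(u < threshold for u in uptimes_pct) / len(uptimes_pct)


def app_uptime_pct(server_uptimes_pct: list[float]) -> float:
    """Application % uptime: product of the % uptime of every mapped server."""
    return 100.0 * prod(u / 100.0 for u in server_uptimes_pct)


def mtbf_hours(uptime_pct: float, working_hours: float, failures: int) -> float:
    """ASSUMPTION: standard MTBF = total uptime hours / number of failures."""
    total_uptime_hours = working_hours * uptime_pct / 100.0
    return total_uptime_hours / failures if failures else float("inf")


def corrective_closure_pct(cacf_corrected: int, incidents_in_scope: int) -> float:
    """Automated Corrective Closure number / all incident tickets in scope."""
    return 100.0 * cacf_corrected / incidents_in_scope


def incidents_per_server_month(incidents: int, servers: int, days: int) -> float:
    """Incidents per server, prorated per day; ASSUMPTION: 30-day month."""
    return 30 * incidents / (servers * days)


def closure_color(pct: float) -> str:
    # Gray > 75% / Amber 50-75% / Red < 50%
    if pct > 75:
        return "Gray"
    return "Amber" if pct >= 50 else "Red"


def incidents_color(per_server: float) -> str:
    # Gray 0-0.8 / Amber 0.8-1.0 / Red > 1.0
    if per_server > 1.0:
        return "Red"
    return "Amber" if per_server > 0.8 else "Gray"


def p1_p2_color(active: list[tuple[str, float]]) -> str:
    """Red if any active P1 is older than 1 day or any P2 older than 2 days."""
    max_age_days = {"P1": 1.0, "P2": 2.0}
    return "Red" if any(age > max_age_days[p] for p, age in active) else "Gray"
```

For example, an application whose three servers report 99.5%, 99.8%, and 99.9% uptime gets an application uptime of about 99.2%, which falls in the Gray band (95-100%). Multiplying the server uptimes means that every additional mapped server can only lower the application's uptime.
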

Mode of refresh | Data Type | Schedule |
---|---|---|
Real-time | Events | Every 2 min |
Real-time | Incident tickets | Every 2 min |
Real-time | Change requests | Every 2 min |
Real-time | Problem tickets | Every 5 min |
Real-time | Service tickets | Every 5 min |
Real-time | Automation tickets | Every 30 min |
Daily | Inventory | |
Daily | Netcool LDS | |

Metric Name | Purpose | Frequency of Usage | Persona using the Insight |
---|---|---|---|
Alerts | Account Business Health | Daily | Delivery Partner, Delivery Manager, SRE and team leaders |
Actionable Insights | Address areas of concern and areas of improvement | Weekly | SRE, Delivery Partner, Delivery Manager, T&I and team leaders |
Servers having less than 95% uptime | Reduce Outages in the system | Weekly | SRE, Delivery Partner, Delivery Manager, T&I and team leaders |
Business Application MTBF Hours | Reduce Outages in the system | Weekly | SRE, Delivery Partner, Delivery Manager, T&I and team leaders |
SSL Certificate Expiring | Reduce Noise in the system | Weekly | SRE, Delivery Partner, Delivery Manager, T&I and team leaders |
Remediation with Corrective Closure | Reduce manual activities | Weekly | SRE, Delivery Manager and T&I |
Incidents per server per month | Reduce the noise in the system | Daily or Weekly | SRE, Delivery Partner, Delivery Manager, T&I and team leaders |
Best Practice deployment | Apply industry best practices | Weekly | SRE, Delivery Manager and T&I |
P1/P2 active Incidents | Account Business Health | Daily | Delivery Partner, Delivery Manager, SRE and team leaders |
P1/P2 active Problem | Account Business Health | Daily | Delivery Partner, Delivery Manager, SRE and team leaders |
Critical Changes Upcoming in next 3 days | Account Business Health | Daily | Delivery Partner, Delivery Manager, SRE and team leaders |
Critical Changes Upcoming in next weekly CAB cycle | Account Business Health | Weekly CAB | Delivery Partner, Delivery Manager, SRE and team leaders |
Critical Changes: changes that require less review rigor | Account Business Health | Quarterly Continuous Improvement meetings | Delivery Partner, Customer, Delivery Manager, SRE and team leaders |
Unsuccessful Changes | Account Business Health | Weekly | Delivery Partner, Delivery Manager, SRE and team leaders |
Patch Overdue % | Account Compliance Posture | Weekly | SRE, ISA (Integrated Security Analyst) and team members |
Devices with health check issues | Account Compliance Posture | Weekly | SRE, ISA (Integrated Security Analyst) and team members |
Failed Backups | Reduce backup failures | Weekly | SRE, Delivery Partner, Delivery Manager, T&I and team leaders |
Devices out of capacity | Capacity management for the account | Weekly | SRE, Delivery Partner, Delivery Manager, T&I and team leaders |
Devices EOS/EOL | Obsolescence management | Monthly | SRE, Delivery Partner, Delivery Manager, T&I and team leaders |