DevOps Performance metrics

Published On Jun 19, 2024 - 1:56 PM

This topic explores DevOps Research and Assessment (DORA) metrics and their role in evaluating and improving DevOps practices within organizations.
The key metrics identified by Google's DORA team for measuring the performance of software development teams are Deployment Frequency, Change Lead Time, Change Failure Rate, and Mean Time to Recovery. Together they provide a comprehensive overview of team performance and are vital for evaluating and enhancing your software development process.
DevOps performance metrics are pivotal for DevOps teams as they offer insights into software development, delivery, and maintenance efficiency. They distinguish between high, medium, and low-performing teams, establishing a standard that aids organizations in continuously enhancing DevOps performance for superior business results.
These metrics facilitate continuous improvement by offering tangible indicators for iterative adjustments and refinement. They also aid in benchmarking a team's performance against industry standards, promoting goal setting and performance enhancement.
DORA comprises four metrics:
  • Deployment frequency: The frequency with which an organization successfully releases to production.
  • Change lead time: The amount of time required for a committed change to reach customers in production.
  • Change failure rate: The percentage of deployments that cause a failure in production.
  • Mean time to recovery: The amount of time required for an organization to recover from a failure in production.
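The four metrics above can be summarized with a short, purely illustrative Python sketch; the record fields, timestamps, and observation window below are hypothetical and do not reflect how DevOps Intelligence stores or computes the data.

```python
from datetime import datetime, timedelta

# Hypothetical deployment records: commit time, production deploy time, and
# whether the deployment caused a failure in production.
deployments = [
    {"committed": datetime(2024, 6, 1, 9, 0), "deployed": datetime(2024, 6, 1, 15, 0), "failed": False},
    {"committed": datetime(2024, 6, 2, 10, 0), "deployed": datetime(2024, 6, 3, 11, 0), "failed": True},
    {"committed": datetime(2024, 6, 4, 8, 0), "deployed": datetime(2024, 6, 4, 12, 0), "failed": False},
]
# Hypothetical production incident records: detection time and recovery time.
incidents = [
    {"detected": datetime(2024, 6, 3, 12, 0), "recovered": datetime(2024, 6, 3, 14, 0)},
]

period_days = 7  # length of the observation window in days

# Deployment frequency: successful production deployments per day.
deployment_frequency = len(deployments) / period_days

# Change lead time: mean time from code commit to production deployment.
change_lead_time = sum((d["deployed"] - d["committed"] for d in deployments), timedelta()) / len(deployments)

# Change failure rate: share of deployments that caused a production failure.
change_failure_rate = sum(d["failed"] for d in deployments) / len(deployments)

# Mean time to recovery: mean time from failure detection to restored service.
mean_time_to_recovery = sum((i["recovered"] - i["detected"] for i in incidents), timedelta()) / len(incidents)

print(deployment_frequency, change_lead_time, change_failure_rate, mean_time_to_recovery)
```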
Data for these metrics is retained for 180 days. Users can choose to view data for any period within the last 180 days at the following granularities:
  • 1 day
  • 7 days
  • 14 days
  • 30 days
  • 60 days
  • 90 days
  • 180 days (all retained data)
  • Custom date range
Using the Select duration feature, located on the right directly above the graphic displays, the user can select the data history range from a dropdown menu containing these selections.

Deployment Frequency

Deployment frequency is a key performance indicator in DevOps, measuring the rate at which an organization deploys new code or releases software to end-users in a production environment. This metric showcases the speed and agility of a development team, as more frequent deployments typically imply a faster, more responsive development cycle.
Each successful deployment to production is counted once, even if the same release is deployed to multiple production environments.
The graph represents the total and average number of deployments for the selected history period. It is essential to map the application across all technical services to ensure the accuracy of data flow in the graphs.
For a more in-depth analysis, click on the launch icon. This action will open a detailed view of a graph and a corresponding table. Similar to the summary graph, you can adjust the time frame for the data in this detailed view.
The details table includes technical service, application name, status, duration, deployment date, release, and environment.
The Deployment frequency table displays all data based on the graph's selected time frame and application. All columns in this table can be sorted. Above this table, you will find a search box that allows searching Technical services and Applications by name.

Change Lead Time

Change lead time is a crucial DevOps metric that gauges the average duration between when a code change is committed and when that change is successfully deployed into the production environment. This metric reflects your development pipeline's overall efficiency and responsiveness, encompassing all process stages, from code integration and testing to the actual deployment.
The graph represents the total and average number of deployments for the selected history period.
For a more in-depth analysis, click on the launch icon. This action will open a detailed view of a graph and a corresponding table. Similar to the summary graph, you can adjust the time frame for the data in this detailed view.
The details table contains the application name, first commit, deployment date, change lead time, and epic details with a redirecting link.
The Change Lead Time table displays all data based on the graph's selected time frame and application. Only the Application column can be sorted. Above this table, you will find a search field that allows searching the Application by name.

Change Failure Rate

The Change failure rate is a significant DevOps metric that signifies the percentage of deployed changes that fail in the production environment, necessitating an immediate fix. These failures can include service degradation, system outages, functionality issues, or any other significant problems that negatively impact the end-user experience or system stability.
The interactive graph displays the total and average failure rate of deployments over a selected period.
For a more in-depth analysis, click on the launch icon. This action will open a detailed view of a graph and a corresponding table. Similar to the summary graph, you can adjust the time frame for the data in this detailed view.
The detail table provides information such as application name, change failure rate, and incident details. The table includes selectable links that redirect you to relevant sections or additional information for easy navigation.
The Change failure rate table displays all data based on the graph's selected time frame and application. Only the Application column can be sorted. Above this table, you will find a search field that allows searching the Application by name.

Mean Time to Recovery (MTTR)

Mean time to recovery (MTTR), also known as Mean Time to Repair, quantifies how rapidly a team can restore service following a failure that impacts customers.
The graph represents the total and average number of deployments for the selected history period. This graph allows you to filter data by priority; by default, All priorities is selected.
For a more in-depth analysis, click on the launch icon. This action will open a detailed view of a graph and a corresponding table. Similar to the summary graph, you can adjust the time frame for the data in this detailed view.
The details table includes the Technical service, Application, Number of Incidents, Mean Time To Recovery, and Incident details. For easy navigation, redirecting links are incorporated within the table.
The Mean time to recovery table displays all data based on the time frame selected. All columns in this table can be sorted except the Incident column. Above this table, you will find a search box that allows searching Technical services and Applications by name.
In the case of ServiceNow (SNOW), MTTR is calculated from the SNOW Start Time.

Benchmarking DORA metrics

A significant advantage of using DORA metrics is that they enable the practice of benchmarking: comparing your software development team's performance to industry standards or best practices by measuring the four key metrics against four performance levels. These levels are cataloged in the following table:
DORA benchmark categories

| Performance level | Deployment Frequency | Change Lead Time | Change Failure Rate | Mean Time to Recovery |
| --- | --- | --- | --- | --- |
| Elite | More than once each day | Less than one day | Up to 5% | Less than one hour |
| High | Once each day to once each week | Between one day and one week | 5% – 10% | One hour to one day |
| Medium | Once each week to once each month | Between one week and one month | 10% – 15% | One day to one week |
| Low | Less than once each month | More than one month | More than 15% | More than one week |
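As an illustration of how these thresholds can be applied, the following sketch maps a deployment rate to the performance levels in the table above; the function name, the unit (deployments per day), and the sample values are assumptions for the example only.

```python
def classify_deployment_frequency(deploys_per_day: float) -> str:
    """Map a deployment rate (deployments per day) to a DORA performance level,
    following the benchmark table above. Illustrative helper, not product code."""
    if deploys_per_day > 1:          # more than once each day
        return "Elite"
    if deploys_per_day >= 1 / 7:     # once each day to once each week
        return "High"
    if deploys_per_day >= 1 / 30:    # once each week to once each month
        return "Medium"
    return "Low"                     # less than once each month


print(classify_deployment_frequency(2.0))   # Elite
print(classify_deployment_frequency(0.03))  # Low
```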

Configuring DORA metrics

Configuring DORA metrics for data collection is crucial in measuring your team's software development performance. To accurately track Deployment Frequency, Change Lead Time, Change Failure Rate, and Mean Time to Recovery, you must set up a data collection pipeline. This process often involves integrating your code repositories, like GitHub or GitLab, with data aggregation and visualization tools.
Open-source projects like Four Keys can facilitate this by automatically setting up a data ingestion pipeline and compiling the data into a comprehensible dashboard. This setup helps to monitor your metrics and view trends over time, providing valuable insights into your team's operational efficiency and effectiveness.
Adding Deployment information
Deployment data can be obtained in one of two ways:
  • Adding a connection where release information is available (tools such as ArgoCD) and adding the service config from the tool's configuration page
  • Posting the record through Bring Your Own Deployment (BYODeployment)
When adding a connection, the Incident Host URL fields should be populated with your ServiceNow instance host URL. Refer to the sample image for configuring deployment data collection using a third-party tool:
To post the record through BYODeployment, refer to the configuration page: Bring Your Own Deploy. Ensure the following:
  • The fields Environment and Release match the provided change request.
  • The itsm_host and endpoint hostname match the ServiceNow instance URL.
If posting the data through Bring Your Own Deployment, set isProduction to true.
Refer to the following image for a sample payload:
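In addition to the sample image, the following is a minimal sketch of posting such a record; the endpoint path, headers, and use of the requests library are assumptions rather than the documented BYODeployment contract, while the environment, release, itsm_host, and isProduction fields follow the guidance above.

```python
import requests

# Illustrative only: the endpoint path below is a placeholder, not the
# documented BYODeployment API. See "Bring Your Own Deploy" for the
# authoritative payload format and authentication details.
BYOD_ENDPOINT = "https://<devops-intelligence-host>/byodeployment"  # hypothetical URL

payload = {
    "environment": "production",                             # must match the change request
    "release": "release-2024.06.19",                         # must match the change request
    "itsm_host": "https://<your-instance>.service-now.com",  # ServiceNow instance URL
    "isProduction": True,                                     # required when posting via BYODeployment
}

response = requests.post(BYOD_ENDPOINT, json=payload, timeout=30)
response.raise_for_status()
```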
Deployment frequency
This section describes how to configure data collection for Deployment frequency:
Connect to your release info source
You can connect existing tools where release information is available and add the service config from the tool's configuration page. For instance, if you are using ArgoCD, create the service config with the details available in the documentation page for configuring that service, in this case, Configuring ArgoCD. Instructions for configuring all supported services are available in the DevOps Intelligence documentation (Kyndryl Docs) by clicking the Supported tools and integration tile.
Remember to populate the Incident Host URL fields with your ServiceNow instance host URL.
Use the BYODeployment API
Alternatively, you can post the record from BYODeployment using supported APIs. To find out more, see DevOps Intelligence API Integration. Ensure that the fields environment and release match those provided in the change request. Also, make sure that itsm_host and the endpoint hostname match your ServiceNow instance URL. If you post the data via BYODeployment, set isProduction to true.
Change lead time (CLT)
This section describes how to configure Change lead time for BitBucket, GitHub, GitLab, and Azure DevOps.
Atlassian BitBucket:
  • BitBucket Cloud classifies issues as 'proposal' for epic tasks and 'task' for user stories.
  • Link tasks to their proposals using a specific format: the description of the task should start with Proposal:<space> followed by the full URL link to the proposal. Once done, all tasks are connected to that proposal.
  • When working on any task, create a PR when merging code. The PR body should contain the full link to the task. The issues will then receive PR data showing all associated PRs, and the same information flows to the proposal.
  • For cherry-pick PRs, the PR body should contain the full link to the task and also the full link to the default-branch PR, so that the cherry-pick PR can be traced back to the corresponding PR in the default branch.
  • Enable protection rules for release branches to prevent creation/deletion.
  • Provide a release format prefix, e.g., 'release/' (BitBucket's default format).
For more information on BitBucket configuration, please see: Configuring Atlassian Bitbucket
GitHub and GitLab:
  • On GitHub and GitLab, use the 'Epic' label for epic tasks and the 'User Story' label for user stories.
  • Link user stories to their epic tasks using a specific format: the description of the issue should start with Epic:<space> followed by the full URL link to the epic (the label created in the first step and the one referenced here must be the same); see the sketch after this list. Once done, all stories are connected to that Epic.
  • When working on any story, create a PR when merging code. The PR body should contain the full link to the user story. The issues will then receive PR data showing all associated PRs, and the same information flows to the Epic.
  • For cherry-pick PRs, the PR body should contain the full link to the user story and also the full link to the default-branch PR, so that the cherry-pick PR can be traced back to the corresponding PR in the default branch.
  • Enable protection rules for release branches.
  • Provide a release format prefix, e.g., 'release-20' (all releases should start with this).
For more information on GitHub and GitLab configuration, please refer to the following links: Configuring GitHub and GitHub Enterprise and Configuring GitLab
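The issue description format described above can be checked automatically before an issue is saved. The following sketch is illustrative only; the regular expression, helper name, and sample URL are assumptions, not part of the product.

```python
import re

# Checks that an issue description starts with "Epic: <full URL of the epic>",
# the linking format described above. Illustrative helper, not product code.
EPIC_LINK_PATTERN = re.compile(r"^Epic:\s+(?P<epic_url>https?://\S+)")

def extract_epic_link(description: str):
    """Return the epic URL if the description follows the expected format, else None."""
    match = EPIC_LINK_PATTERN.match(description)
    return match.group("epic_url") if match else None

# Hypothetical example; replace <org>/<repo> with real values.
print(extract_epic_link("Epic: https://github.com/<org>/<repo>/issues/42\nImplements the login flow"))
```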
Azure DevOps:
  • In Azure DevOps, create feature and user story tasks for handling epics.
  • Connect user stories to their epic tasks using a specific format: the description of the user story should start with Feature:<space> followed by the full URL link to the feature. Once done, all stories are connected to that Feature.
  • When working on any story, create a PR when merging code. The PR body should contain the full link to the user story. The issues will then receive PR data showing all associated PRs, and the same information flows to the Feature.
  • For cherry-pick PRs, the PR body should contain the full link to the user story and also the full link to the master-branch PR, so that the cherry-pick PR can be traced back to the corresponding PR in master.
  • Enable protection rules for release branches.
  • Provide a release format prefix, e.g., 'release-20' (all releases should start with this).
For more information on Azure DevOps configuration, please see: Configuring Azure DevOps.
Once the release cut is done and the release branch is created, all releases are automatically linked to their User Stories. The Epic Change lead time is calculated from the first User Story's first PR merge date to the last User Story's last merge and release deployment time.
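A simplified, illustrative reading of that calculation is sketched below; the story identifiers, timestamps, and the assumption that the end point is the release deployment time are examples only, not the product's exact algorithm.

```python
from datetime import datetime

# Hypothetical PR merge timestamps for the user stories linked to one epic.
story_pr_merges = {
    "US-101": [datetime(2024, 6, 3, 10, 0), datetime(2024, 6, 5, 16, 0)],
    "US-102": [datetime(2024, 6, 4, 9, 0)],
}
release_deployed_at = datetime(2024, 6, 10, 14, 0)  # release deployment time

# Epic change lead time: from the first story's first PR merge to the
# release deployment of the last story's changes.
first_pr_merge = min(m for merges in story_pr_merges.values() for m in merges)
epic_change_lead_time = release_deployed_at - first_pr_merge
print(epic_change_lead_time)  # 7 days, 4:00:00
```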
To use Jira as the issue-tracking tool with a separate source control tool, you must provide the SCM Host URL when configuring the Jira connection. Note that if you specify a separate source tool in the SCM Host URL, a separate IAM connection for that specific source tool must also be established.
Change failure rate (CFR)
For consistent data collection, ensure you format the 'short description' field in the change request as follows:
  • '***deployment to ENVIRONMENTNAME *** of release RELEASENAME ***'
For "deployment to" and "of release" are required keywords. The placeholders marked with three asterisks (***) should be replaced with appropriate values.
The following image depicts the example of a short description field in a change request. Data will be visualized when incidents are in a closed state.
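As a sketch of that format (illustrative only; the regular expression and helper name are not part of the product), the required keywords and placeholders can be checked like this:

```python
import re

# Enforces the required keywords "deployment to" and "of release" and extracts
# the environment and release names. Illustrative only, not the product's
# validation logic.
SHORT_DESCRIPTION_PATTERN = re.compile(
    r"deployment to\s+(?P<environment>\S+).*?of release\s+(?P<release>\S+)",
    re.IGNORECASE,
)

def parse_short_description(text: str):
    """Return (environment, release) if the short description matches, else None."""
    match = SHORT_DESCRIPTION_PATTERN.search(text)
    return (match.group("environment"), match.group("release")) if match else None

# Hypothetical example values for the environment and release placeholders.
print(parse_short_description("Scheduled deployment to production of release release-2024.06"))
# ('production', 'release-2024.06')
```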
The Incident should also be mapped to the initiating Change Request by providing the change request ID in the Related Records section, under the Change Request and Cause By Change fields.
Linking Incidents with Change Requests
MTTR calculations are performed for Incidents that are either in a closed or resolved state. All incidents that are moved to a resolved or closed state within 180 days are synced. When creating Incidents, the Configuration Item field should be populated; it is mapped to TechnicalServiceName in DevOps Intelligence. If the Configuration Item field is not populated, No CI is marked by default for such incidents and reflected in the Technical Service column of the MTTR details page.
To correlate incidents with their respective Change Requests, provide the Change Request ID in the 'Related Records' section under the 'Change Request' and 'Cause By Change' fields when creating an incident. The following image presents sample information from ServiceNow.
Mean time to recovery
Mean time to recovery calculations are performed for incidents in a closed or resolved state. When creating incidents, ensure that the Configuration Item field is populated. This field maps to the TechnicalServiceName in DI. In cases where the Configuration Item field is not populated, No CI will be labeled by default for such incidents, which will then be reflected in the Technical Service column of the MTTR details page.
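A minimal sketch of this grouping, under the assumption of simple incident records; the field names and timestamps below are illustrative and not DevOps Intelligence's internal schema:

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical resolved/closed incident records. "configuration_item" stands in
# for the Configuration Item field that maps to the technical service name;
# a missing value falls back to "No CI", as described above.
incidents = [
    {"configuration_item": "payments-api", "state": "closed",
     "started": datetime(2024, 6, 1, 9, 0), "resolved": datetime(2024, 6, 1, 11, 0)},
    {"configuration_item": None, "state": "resolved",
     "started": datetime(2024, 6, 2, 8, 0), "resolved": datetime(2024, 6, 2, 9, 0)},
]

recovery_times = defaultdict(list)
for incident in incidents:
    if incident["state"] not in ("closed", "resolved"):
        continue  # only closed or resolved incidents count toward MTTR
    service = incident["configuration_item"] or "No CI"
    recovery_times[service].append(incident["resolved"] - incident["started"])

# MTTR per technical service, e.g. 2 hours for 'payments-api', 1 hour for 'No CI'.
mttr_by_service = {
    service: sum(times, timedelta()) / len(times)
    for service, times in recovery_times.items()
}
print(mttr_by_service)
```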