Technology Risk – Why your risk appetite can leave you biting off more than you can chew

2nd July 2020

Risk Management Quality Assurance

Better understanding your technical risk

As your organisation looks to increasingly leverage DevOps approaches and automation to enable Digital delivery at pace, your risk position needs to be front and centre of every stage of your delivery pipeline.

Illustration of a semi circular dial with the needle pointing up to an area labelled medium, with areas labelled low and high on either side

These risks can be identified through evaluation of the required quality attributes of your products. This enables your teams to build and implement the technical solutions that deliver the business outcomes your customers demand and also the technical capabilities your teams demand to deliver efficiently and effectively.

Building Quality in

The book Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations (Nicole Forsgren PhD, Jez Humble, Gene Kim) is one of the most widely accepted books on lean software and DevOps. It suggests four key indicators of software delivery performance, which correlate with the success of the organisation.

These indicators can be used to measure the difference between “Elite” performing and “Low” performing organisations:

1. Change Lead Time

“Elite” performers have a lead time for changes of less than 1 day, while “Low” performers have a lead time for changes of between 1 month and 6 months.

2. Deployment Frequency

“Elite” performers deploy on demand (multiple deploys per day), while “Low” performers usually deploy to production between once per month and once every 6 months.

3. Change Failure Rate

“Elite” performers have a change failure rate of 0–15%, while “Low” performers have a rate of 46–60%.

4. Mean Time to Restore (MTTR)

“Elite” performers have an MTTR of less than 1 hour, while “Low” performers have an MTTR of between 1 week and 1 month.

Chart explaining the relationship of performance metrics: Lead Time, Deployment Frequency, Failure Rate and Time to Restore

Source: https://medium.com/sourcedtech/top-4-metrics-to-measure-your-software-delivery-performance-4a693665cbb4
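
To make the four indicators concrete, here is a minimal sketch of how they could be calculated from your own deployment records. The Deployment record and its field names are assumptions made for illustration, not part of any particular tool, and the choice of median for lead time and mean for time to restore is one reasonable option rather than a prescribed method.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean, median
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def four_key_metrics(deployments: List[Deployment], period_days: float) -> dict:
    """Calculate the four indicators for deployments made in a period."""
    if not deployments:
        raise ValueError("at least one deployment is required")

    # Change lead time: commit to production, in hours.
    lead_hours = [
        (d.deployed_at - d.committed_at).total_seconds() / 3600
        for d in deployments
    ]
    # Time to restore: failed deployment to restored service, in hours.
    restore_hours = [
        (d.restored_at - d.deployed_at).total_seconds() / 3600
        for d in deployments
        if d.failed and d.restored_at is not None
    ]
    failed_count = sum(1 for d in deployments if d.failed)

    return {
        "median_lead_time_hours": median(lead_hours),
        "deploys_per_day": len(deployments) / period_days,
        "change_failure_rate": failed_count / len(deployments),
        "mean_time_to_restore_hours": mean(restore_hours) if restore_hours else None,
    }
```

Feeding this from your pipeline and incident data gives you a baseline to compare against the “Elite” and “Low” ranges above.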

This blog explains how assessing quality characteristics through the lens of testing and delivering in DevOps can open up new insights into not just product risk but also process risk: weaknesses in how you deliver solutions that may be reducing the value you deliver to your customers.

By improving your process and approaches, you can take your software delivery performance to “Elite” level by better understanding the risks contained in your delivery approach and taking steps to address and mitigate these.

QC - Maintainability

The relentless drive to deliver new value for customers can often mean a constant demand to deliver new functionality, often at the cost of creating technical debt. This can impact an organisation’s ability to perform well against the four key indicators described above that make the difference between “Elite” and “Low” performing businesses.

To address this, we recommend the following:

Modularity – Design systems that are loosely coupled. This enables faster build and test and enables more frequent deployment of small changes.

Analysability – Design and document systems to ensure that a change (and its impact) is clear and understood. This allows changes to be made with a known footprint and a clear risk assessment, which in turn enables a risk-based testing approach. Combined with well-designed and implemented logging, this can reduce change failures and accelerate time to restore when failure events occur.

Testability – Design and build systems that can be observed, controlled and understood. This enables tests and feedback loops to be atomic (i.e. small tests with a single responsibility) and implemented at multiple levels across your technology stack. Better testability supports all four indicators: shorter lead times through more efficient testing, increased deployment frequency through risk-based testing, a reduced change failure rate through deeper test coverage, and faster time to restore through targeted fix testing using atomic automated tests.
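
As a small illustration of controllability and atomic tests, the sketch below uses a hypothetical change-freeze rule. The can_deploy function and its parameters are invented for this example. The point is that the current date is passed in rather than read from the system inside the function, so each test can control it and check exactly one behaviour.

```python
from datetime import date
from typing import Callable


def can_deploy(freeze_start: date, freeze_end: date,
               today: Callable[[], date] = date.today) -> bool:
    """Allow deployment unless today falls inside the change-freeze window.

    `today` is injected so that tests can control the date.
    """
    return not (freeze_start <= today() <= freeze_end)


# Atomic tests: each one controls its own inputs and checks one behaviour.
def test_blocks_deploy_inside_freeze():
    result = can_deploy(date(2020, 12, 24), date(2020, 12, 28),
                        today=lambda: date(2020, 12, 25))
    assert result is False


def test_allows_deploy_after_freeze():
    result = can_deploy(date(2020, 12, 24), date(2020, 12, 28),
                        today=lambda: date(2020, 12, 29))
    assert result is True
```

Because each test has a single responsibility, a failure points directly at the behaviour that broke, which is what makes targeted fix testing quick.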

QC - Compatibility

In modern service-based architecture models, compatibility is key to ensuring services can communicate and interact with expected results. This requires each service's design and interactions to be agnostic of the inner workings of other services.

Interoperability – Invest time and effort to define the interoperability requirements of your system. Tools such as SwaggerHub will help you design and document APIs and services.

Teams can then use contract testing to verify interactions against mock-ups that replicate the behaviour of other services. This can reduce change failure rates by providing fast feedback at the development stage and accelerate lead time by removing the dependency on multi-tenancy environments.
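
The idea behind contract testing can be sketched in a few lines. This is a simplified, hand-rolled illustration and not the API of any contract testing tool: ORDER_CONTRACT, mock_order_service and the field names are all hypothetical. The consumer states which fields and types it relies on, and the check reports every way a response breaks that expectation.

```python
# Contract agreed between a consumer and a (hypothetical) order service:
# the fields the consumer relies on, and the type of each.
ORDER_CONTRACT = {"order_id": str, "status": str, "total_pence": int}


def contract_violations(response: dict, contract: dict) -> list:
    """Return a list of the ways a response breaks the contract."""
    violations = []
    for field, expected_type in contract.items():
        if field not in response:
            violations.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            violations.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(response[field]).__name__}"
            )
    return violations


def mock_order_service(order_id: str) -> dict:
    """Mock-up standing in for the real service during development."""
    return {"order_id": order_id, "status": "DISPATCHED", "total_pence": 1299}
```

Running the same check against both the mock-up and the real provider's responses is what keeps the two in step, without needing a shared environment for every change.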

QC - Reliability

When your product is your business, reliability is a key attribute. Your product's capability to deliver a service dependably and accurately builds your customers' trust in the service. Unexpected loss of service, performance or data can drive customers to your competitors.

Recoverability – The cost of a failure event is often not fully risk assessed or understood until it takes out your production system. We see this regularly during deployments and upgrades.

Investing in a repeatable deployment process (using capabilities such as Continuous Integration and Continuous Deployment tooling) will allow you to deploy consistently and learn across multiple environments before releasing to your customers.

Techniques such as site reliability engineering ensure that teams learn constantly about how their systems operate and continually improve their systems based on these learnings, ensuring better availability and faster Mean Time to Restore.

Fault tolerance – The days when a fault could take down an application should be behind us. When designing and implementing digital solutions using services, your architecture needs to be dependency mapped, with failure scenarios identified through risk analysis. This will inform a design that can insert circuit breaker patterns to reduce the probability of full outage due to a service failure (https://martinfowler.com/bliki/CircuitBreaker.html).
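
A minimal sketch of the circuit breaker pattern is shown below. It is a simplified, single-threaded illustration of the idea described in the linked article, not a production implementation. The class name, parameters and default values are assumptions chosen for this example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures before opening
        self.reset_timeout = reset_timeout          # seconds before a retry
        self._clock = clock                         # injectable for testing
        self._failures = 0
        self._opened_at = None

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"  # timeout elapsed: allow one trial call
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            # Fail fast instead of waiting on a service known to be down.
            raise CircuitOpenError("circuit open; call skipped")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # Open on reaching the threshold, or re-open after a failed trial.
            if (self._failures >= self.failure_threshold
                    or self._opened_at is not None):
                self._opened_at = self._clock()
            raise
        # Success: close the circuit and reset the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Wrapping calls to a dependency in breaker.call(...) means a failing service is skipped quickly, giving the caller the chance to fall back to a degraded response rather than failing entirely.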

Summary

Technical risk doesn’t simply exist within the products you deliver, but also in how you deliver your products. You need to improve your focus on how you design, deliver and implement your solutions.

With a clear focus on addressing the risks to delivering against the four key indicators we have explained in this blog, you can improve the service you provide for your customers and drive your organisation’s success.

In our White Paper, “The Evolution of Risk”, we demonstrate how you can use quality attributes to determine the risk that REALLY exists within your delivery.

2i’s unique AssureRMF improvement process will deliver the Risk Intelligence (RQ) that will enable the insight you require to take action to transform your digital delivery processes to “Elite” level.