Radical Incident Reduction – The 30-30-30 Program

Want to pump up your IT delivery with a program that excites executives, provides great business value and greatly lowers unplanned labor costs, incident volumes and service outages? The 30-30-30 Program is a fast-paced, attention-getting program that targets a 30% reduction in incident counts, a 30% reduction in resolution times and a 30% increase in support staff skills for getting to root causes and eliminating the incidents they cause. This article explains how the program works and presents some examples used in a large IT organization. The approach described can be used in any IT organization.

One of the biggest headaches in any IT organization is unplanned labor. This is time spent by IT executives, managers and support staff dealing with incidents and outages, rework, and recovery activities. In today’s DevOps world, this is often called technical debt. It carries a price tag, typically about $62.00 per hour on average for most IT organizations today. That figure doesn’t include other costs such as project delays, poor customer satisfaction, penalties and fines that are triggered when everyone has to drop what they’re working on to deal with these unplanned issues.

The main objective of the 30-30-30 Program is to target this headache. The 3 key objectives include:

Reduce Incidents – Proactively trend incidents and initiate actions to remove their underlying root causes, thereby stopping incidents before users see them and reducing unplanned labor costs.

Reduce Incident Impact – For those incidents that can’t be readily removed, shorten the time needed to initiate workarounds and resolutions, and take actions that minimize incident impact.

Increase Skillsets – Upgrade Service Desk and support team analysis and problem-solving skills to reduce the amount of labor spent on analyzing incidents and getting to root cause.

Here are the steps you can take to get your program going:

Step 1 – Ensure Reporting Is In Place for Incidents

Make sure you have adequate reporting in place that can provide basic metrics – these include:

  • Incident counts (ideally by service, application/device or support team)
  • Incident durations – typically from ticket creation to ticket resolution
  • At least 6 months to a year of ticket history

In addition, you will also need:

  • Problem Ticket logging and reporting capabilities
  • Continual Service Improvement (CSI) Register for logging and prioritizing resolution activities and their progress

While the above can be found in many IT Service Management systems, you can also handle these with spreadsheets if those don’t exist.
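As a rough illustration of the basic metrics listed above, the sketch below derives incident counts and average durations per service from a ticket export. The ticket rows, the field layout and the function names are hypothetical; a real ITSM tool would supply its own export format.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical ticket export: (service, created, resolved) as ISO timestamps.
tickets = [
    ("Email", "2024-01-03T09:00", "2024-01-03T13:00"),
    ("Email", "2024-01-10T08:30", "2024-01-10T10:30"),
    ("Telephony", "2024-01-05T14:00", "2024-01-06T14:00"),
]


def duration_hours(created: str, resolved: str) -> float:
    """Elapsed hours from ticket creation to ticket resolution."""
    delta = datetime.fromisoformat(resolved) - datetime.fromisoformat(created)
    return delta.total_seconds() / 3600


def summarize(rows):
    """Return {service: (incident count, mean duration in hours)}."""
    counts = Counter()
    total_hours = defaultdict(float)
    for service, created, resolved in rows:
        counts[service] += 1
        total_hours[service] += duration_hours(created, resolved)
    return {s: (counts[s], total_hours[s] / counts[s]) for s in counts}


summary = summarize(tickets)
for service, (count, mean_hours) in summary.items():
    print(f"{service}: {count} incidents, {mean_hours:.1f} h average duration")
```

The same grouping could be done by application, device or support team, depending on what the ticket data records.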

Step 2 – Baseline Your Unplanned Labor Costs

In this step you will baseline your current unplanned labor costs. Here is an approach (taken from a real IT organization) to do that:

  A. Get the overall IT annual labor budget ($125M)
  B. Get the number of IT employees (1,100)
  C. Determine the average IT salary cost (A/B or $113,636)
  D. Estimate average hours worked in a year for IT employees (1,832 – an industry average)
  E. Calculate the IT hourly labor cost (C/D or $62)
  F. Get the incident ticket duration in hours for each incident ticket (from your reports)
  G. Apply a Labor Factor (the percentage of ticket duration that incurred labor versus time the ticket was just sitting in a queue waiting – it’s okay to estimate – our example used 5% for all tickets)
  H. Calculate the unplanned labor cost for each ticket (from above: E * F * G) and sum this across all incident tickets
  I. Divide the sum of unplanned labor costs (H) by the overall labor budget (A) to get the unplanned labor rate
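The baseline calculation can be sketched in a few lines of Python. The budget, headcount, annual hours and 5% labor factor are the figures from the example organization; the ticket durations and the function names are hypothetical placeholders, so treat this as an illustration under those assumptions and not as the organization's actual tooling.

```python
# Figures A, B, D and the 5% labor factor (G) come from the example above.
ANNUAL_LABOR_BUDGET = 125_000_000   # A
EMPLOYEES = 1_100                   # B
HOURS_PER_YEAR = 1_832              # D
LABOR_FACTOR = 0.05                 # G

avg_salary = ANNUAL_LABOR_BUDGET / EMPLOYEES   # C = A / B
hourly_cost = avg_salary / HOURS_PER_YEAR      # E = C / D


def ticket_cost(duration_hours, labor_factor=LABOR_FACTOR):
    """Unplanned labor cost of one ticket: E * F * G."""
    return hourly_cost * duration_hours * labor_factor


def unplanned_labor_rate(durations):
    """Return (H, H / A) for a list of ticket durations in hours."""
    total = sum(ticket_cost(d) for d in durations)   # H
    return total, total / ANNUAL_LABOR_BUDGET


# Hypothetical ticket durations in hours (F); real values come from reports.
durations = [4.0, 2.0, 24.0, 72.0]
total_cost, rate = unplanned_labor_rate(durations)

print(f"Average salary cost: ${avg_salary:,.0f}")
print(f"Hourly labor cost:   ${hourly_cost:,.2f}")
print(f"Unplanned labor:     ${total_cost:,.2f} ({rate:.6%} of labor budget)")
```

If different ticket types spend different shares of their lifetime in a queue, `labor_factor` can be passed per ticket instead of using one value for all.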

Step 3 – Agree the Program with Senior Executive Leadership

The results from Step 2 can be put in a spreadsheet or reporting tool and presented to executive leadership for their buy-in. Here is an example of what was shown to senior executives in our example IT organization:

Every dollar shown is wasted labor. Our example leadership did not like knowing that telephone incidents were costing them $2.3M a year, email issues $2.4M a year, etc. The target of the Program was to cut these costs by at least 30% across the board. This was what got buy-in from leadership to support the program and the areas highlighted above were to receive the most focus.

You can also highlight the unplanned labor portion of their IT labor budget. This is simply the total of all annual unplanned labor costs divided by the IT annual labor budget. For example, if your IT organization spends $10M on labor and the unplanned labor adds up to $2M, then 20% of the IT labor budget is being wasted on unplanned labor activities – not a percentage many executives like to see!
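The arithmetic behind the executive view is simple enough to sketch. The two service costs and the $10M/$2M figures are the ones quoted in the text; the dictionary layout and function names are assumptions made for illustration, and a real summary would list every service.

```python
# Annual unplanned labor cost per service (figures quoted in the text).
annual_cost = {"Telephone": 2_300_000, "Email": 2_400_000}
REDUCTION_TARGET = 0.30  # the Program's 30% cost reduction target


def savings_targets(costs, target=REDUCTION_TARGET):
    """Dollar savings each service must deliver to meet the target."""
    return {service: cost * target for service, cost in costs.items()}


def unplanned_share(total_unplanned, labor_budget):
    """Fraction of the labor budget consumed by unplanned labor."""
    return total_unplanned / labor_budget


targets = savings_targets(annual_cost)
share = unplanned_share(2_000_000, 10_000_000)

for service, target in targets.items():
    print(f"{service}: cut at least ${target:,.0f} per year")
print(f"Unplanned labor share of budget: {share:.0%}")
```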

It is important that executive leadership be visible and seen as leading this program. This will set the proper tone for staff and management support of this initiative.

Step 4 – Train On Problem Management Core Concepts

For this step, establish a series of communication events to introduce and train staff on the program and problem management best practices. These events should include:

  • An overview of the 30-30-30 Program
  • Training on Problem Management processes and techniques
  • Staff and management roles and responsibilities for the program

Guidance and resources for Problem Management processes and techniques can be found in many places on the internet. Two good starting points are the Problem Management chapter of the ITIL 2011 Service Operation book and third-party providers such as Kepner-Tregoe.

Step 5 – Arm Staff With Techniques That Get To Root Cause

As part of training, it also helps to establish a Problem Management Toolkit as a support aid for IT staff and management. This is just an inventory of different techniques that can be used to quickly control activities and drive down to root causes. A list of these might include techniques such as:

  • 5 Whys
  • Fault Isolation
  • Hypothesis Testing
  • Observation Post
  • Pareto Analysis
  • Kepner-Tregoe Analysis
  • Affinity Mapping
  • Chronological Analysis
  • Pain Value Analysis
  • Fault Tree Analysis
  • Trend Analysis
  • Causal Loop Analysis
  • Service Outage Analysis
  • Problem Brainstorming

While there is not enough space here to delve into each in detail, much information on any of these can be found on the internet (type any of the above into a search engine). The ITIL book chapter mentioned earlier also describes many of these.

As a starting point, focus on the 5 Whys. This technique is very simple, easy to use and effective. The steps are:

  1. Describe what took place
  2. Ask “Why?”
  3. Listen to answer given
  4. Ask “Why?” again
  5. Repeat steps 2-4 until root cause is identified
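For teams that want to record the chain of answers in a ticket or worksheet, the loop above can be sketched as follows. The `five_whys` function, the `max_depth` cap and the sample answers are all hypothetical; in practice the answers come from the people involved, not from a lookup table.

```python
def five_whys(event, answer_why, max_depth=5):
    """Ask "Why?" repeatedly and return the chain from event to root cause.

    answer_why(statement) returns the cause given for that statement,
    or None when no deeper cause can be identified.
    """
    chain = [event]
    for _ in range(max_depth):
        cause = answer_why(chain[-1])
        if cause is None:
            break  # no deeper answer: treat the last entry as the root cause
        chain.append(cause)
    return chain


# Hypothetical answers collected during a problem review.
answers = {
    "Users are locked out of Application XYZ": "Logins fail with access errors",
    "Logins fail with access errors": "Duplicate directory entries exist for some users",
    "Duplicate directory entries exist for some users": "Accounts are re-created by hand during onboarding",
}

chain = five_whys("Users are locked out of Application XYZ", answers.get)
root_cause = chain[-1]

for depth, statement in enumerate(chain):
    print("  " * depth + statement)
```

The depth cap is only a guard against going in circles; the technique itself stops whenever the team agrees a root cause has been reached, which may take fewer or more than five rounds.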

As a guide, the table below lists the above techniques and what situations they may best be deployed in:

Step 6 – Proactively Raise Problem Tickets and Monitor Progress

At this point, have IT staff, management and support teams undertake activities to proactively identify root causes and activities to reduce incidents and their duration. Some key considerations when undertaking this effort include:

  • We want low-hanging fruit – don’t spend lots of time identifying actions that yield little reduction. However, tiny efforts and no-brainers should still be noted, as many of these done in aggregate might yield significant savings
  • Focus on outcomes – every reduction opportunity proposed must provide an estimate of how many tickets will be avoided and an estimate for unplanned labor cost savings (using the approaches described in Step 2 earlier)
  • Document the cost-saving opportunities from Step 2 above – these will be used to confirm and prioritize everyone’s actions (see Step 7 – findings may show that many teams might need to be involved to implement the improvements)
  • Recognize that root causes may not always be technical (e.g. lack of skills, training, communication, no ownership, poor vendor support may also be root causes)
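One way to produce the savings estimate that each proposed opportunity needs is to reuse the Step 2 formula on the tickets expected to be avoided. The sketch below assumes the $62 hourly cost and 5% labor factor from the example organization; the ticket count, the average duration and the function name are hypothetical.

```python
HOURLY_COST = 62.0    # hourly labor cost from the Step 2 example
LABOR_FACTOR = 0.05   # share of ticket duration that incurs labor


def estimated_savings(tickets_avoided, avg_duration_hours,
                      hourly_cost=HOURLY_COST, labor_factor=LABOR_FACTOR):
    """Annual unplanned labor cost avoided if the tickets never occur."""
    return tickets_avoided * avg_duration_hours * labor_factor * hourly_cost


# Hypothetical opportunity: 1,200 tickets a year averaging 8 hours each.
saving = estimated_savings(tickets_avoided=1_200, avg_duration_hours=8.0)
print(f"Estimated annual saving: ${saving:,.0f}")
```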

Some key Program roles and activities to undertake from a program perspective can include:

Step 7 – Implement Actions to Resolve Problems or Reduce Impact

A register should be maintained as each team identifies improvement actions. Actions get logged into the register along with their estimated cost saving opportunities. The Problem Manager then coordinates agreement across IT to prioritize which actions will take place. A monthly meeting is suggested to do this along with a review of progress of any actions agreed in previous meetings.

At a minimum, the CSI Register should contain the following:

  • ID Number to quickly reference the action (e.g. X001)
  • A brief title or description of the action (e.g. Fix Application XYZ User Lockouts)
  • A detailed description (e.g. Remove duplicate Active Directory entries to avoid access errors…)
  • Effort estimate (e.g. 1 month, 3 months, 9 months, etc…)
  • Savings Estimate (e.g. $40-60K)

The above is reviewed by the Problem Manager and key stakeholders to prioritize and identify action items to be undertaken. Those that are approved are then assigned to support teams and service owners for implementation. 
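If the register is kept in a script or spreadsheet export instead of an ITSM tool, its minimum fields map naturally onto a small record type. The sketch below is one possible shape: the field names, the second entry and the ranking rule (estimated savings per month of effort) are assumptions for illustration, not something the Program prescribes.

```python
from dataclasses import dataclass


@dataclass
class RegisterEntry:
    id: str
    title: str
    description: str
    effort_months: int
    savings_low: int    # lower bound of the savings estimate, in dollars
    savings_high: int   # upper bound of the savings estimate, in dollars

    @property
    def savings_midpoint(self) -> float:
        return (self.savings_low + self.savings_high) / 2


def prioritize(entries):
    """Rank by estimated savings per month of effort (an assumed rule)."""
    return sorted(entries,
                  key=lambda e: e.savings_midpoint / e.effort_months,
                  reverse=True)


register = [
    RegisterEntry("X002", "Hypothetical second action",
                  "Placeholder entry for illustration", 3, 90_000, 120_000),
    RegisterEntry("X001", "Fix Application XYZ User Lockouts",
                  "Remove duplicate Active Directory entries to avoid access errors...",
                  1, 40_000, 60_000),
]

ranked = prioritize(register)
for entry in ranked:
    print(f"{entry.id}  {entry.title}  ~${entry.savings_midpoint:,.0f} "
          f"over {entry.effort_months} month(s)")
```

The ranking only gives the monthly review meeting a starting order; the Problem Manager and stakeholders still decide which actions are approved.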

To communicate Program activities to key stakeholders and executives, the chart below presents one way of summarizing what the Program is undertaking:

Figure 2: Cost Savings Presentation Example:

At a minimum, the Program should provide communications on a monthly basis showing business based outcomes such as lower costs and incident counts.

On an ongoing basis, the Problem Manager also monitors compliance with the Program. This can be done by looking at the incident counts for each support team or service and comparing them against measures such as the number of problem tickets raised, total cost savings identified and implementation assignments. The bottom line is to make sure each team or service is proactively working to improve things and not just being reactive.
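A simple compliance check along these lines might compare problem tickets raised against incident counts for each team. In the sketch below, the team names, the counts and the 1% threshold are hypothetical; the right threshold is a judgment call for each organization.

```python
def flag_reactive_teams(incident_counts, problem_tickets, min_ratio=0.01):
    """Return teams raising fewer problem tickets per incident than min_ratio."""
    flagged = []
    for team, incidents in incident_counts.items():
        raised = problem_tickets.get(team, 0)  # teams with no entry raised none
        if incidents > 0 and raised / incidents < min_ratio:
            flagged.append(team)
    return sorted(flagged)


# Hypothetical monthly figures per support team.
incident_counts = {"Network": 900, "Email": 1_500, "Desktop": 400}
problem_tickets = {"Network": 12, "Email": 3}

flagged = flag_reactive_teams(incident_counts, problem_tickets)
print("Teams to follow up with:", ", ".join(flagged))
```

The same comparison could use total cost savings identified or implementation assignments in place of problem ticket counts.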

Challenges You May Run Into

As a last note, the table below identifies challenges you may run into with this Program and what you might do about them: