Predicting Future Incident Counts - Use Regression Analysis!

Regression Analysis is one of many ways in which forecasting and prediction can be done. This article presents a brief step-by-step approach that uses observed incident counts to predict future incident trends, which can help build a strong business case to executives for investing in further actions and improvements.

Regression Analysis is a statistical approach that can be used to predict future values of a dependent variable (here, incident counts) from a time series of past observations, with time as the independent variable. It performs forecasting using generally accepted statistical methods. Regression attempts to find a “best fitting” straight line between data points (plotted X and Y coordinates on a graph) such that the line can then be extended to determine future points on that graph. This approach can be used with reasonable confidence when the following criteria are met:

It makes sense to predict the future behavior of incidents being analyzed based on past performance. Trending makes sense to show what might happen if no actions are taken.
Past data, upon which the analysis is based, represents a true trend and does not vary widely based on major changes in business or IT activities (like a big merger, acquisition or deployment of a major new application).
Enough data is available to trend (a minimum of 6 months of incidents is recommended, but 12 months or more gives more reliable trends).
Incident data has no significant outliers (i.e. each month has a steady stream of incidents, without one month being unusually high or low; if one month is, you could remove the data for that month from the analysis).

The approach presented here for Regression Analysis uses a statistical regression technique called the Least Squares Method. Least Squares attempts to determine the best fitting straight line that exists between data points of X and Y coordinates on a graph. Here is a picture of what we’re trying to achieve:

In the above, we are plotting monthly incident counts over time. The red line presents the calculated trend. The blue line presents the actual observed monthly incident counts. The trend shows the prediction of how incident counts will rise if nothing is done.

Using the above example, a monthly incident count will represent the Y value, and the month in which it is observed will represent the X value. The goal is to predict future values of Y (incident counts) for X values (months) in which there are currently no observations. The Least Squares trend value for each Y is calculated as follows:

Y = a + bX

The Y and X values represent plot points: the predicted monthly incident count and the month it occurs in, respectively. The a and b values are calculated only once, using the regression method on the observed values (to be described later). The above formula is then evaluated for each X value to determine the plot point locations for the “best fitting” trend line.

Working our example backwards, the following table shows the data points for the actual incident counts followed by trend line incident counts:

Those rows without an actually observed incident count value now just present the forecasted incident value based on a trend of the actual incidents that came before them.

How did we calculate the incident trend count values? Here are the steps:

First calculate the b value from observed data.
Then calculate the a value from observed data.
Calculate the trend value for each row in your data table.
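In formula form, the three steps correspond to the standard least-squares expressions for a straight line. The symbol n (the number of observed months) is introduced here for convenience and is not used elsewhere in this article; all sums run over the observed rows only.

```latex
b = \frac{n \sum XY - \sum X \sum Y}{n \sum X^{2} - \left(\sum X\right)^{2}},
\qquad
a = \frac{\sum Y}{n} - b \, \frac{\sum X}{n},
\qquad
Y = a + bX
```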

Step 1 – Calculating The “b” Value

Note: these steps only apply to data rows that have Incident Count Actual values. In our example, this would be the 12 months where data was observed (rows 1-12 in the above table).

1. Count the rows that have Incident Count Actual values (e.g. 12)
2. Sum the Month Qualitative values (e.g. 1 + 2 + 3 + 4, etc. or 78)
3. Sum the Incident Count Actual values (e.g. 106,053)
4. Square each Month Qualitative value (e.g. row 1 would be 1, row 2 would be 4, row 3 would be 9, etc.)
5. Sum the squares derived in Step 4 (e.g. you should get 650)
6. Multiply each Month Qualitative value by its corresponding Incident Count Actual value (e.g. row 1 would be 1 * 7,966, row 2 would be 2 * 7,497, row 3 would be 3 * 6,699, etc.)
7. Sum the values derived in Step 6 (e.g. you should get 729,750)
8. Multiply the value in Step 7 by Step 1 (e.g. 729,750 * 12 or 8,757,000)
9. Subtract Step 7 from Step 8 (e.g. 8,757,000 – 729,750 or 8,027,250); note that this value is not used in any later step
10. Multiply Step 2 by Step 3 (e.g. 78 * 106,053 or 8,272,134)
11. Subtract Step 10 from Step 8 (e.g. 8,757,000 – 8,272,134 or 484,866)
12. Multiply Step 1 by Step 5 (e.g. 12 * 650 or 7,800)
13. Square Step 2 (e.g. 78 * 78 or 6,084)
14. Subtract Step 13 from Step 12 (e.g. 7,800 – 6,084 or 1,716)
15. Divide Step 11 by Step 14 (e.g. 484,866 / 1,716 or 282.5559)

Your “b” value is Step 15 or 282.5559

Step 2 – Calculating The “a” Value

Note: these steps only apply to data rows that have Incident Count Actual values. In our example, this would be the 12 months where data was observed (rows 1-12 in the previous table).

Continuing the steps from above:

16. Divide Step 3 by Step 1 (e.g. 106,053 / 12 or 8,837.75)
17. Multiply Step 2 by Step 15 (your “b” number, e.g. 78 * 282.5559 or 22,039.36)
18. Divide Step 17 by Step 1 (e.g. 22,039.36 / 12 or 1,836.6133)
19. Subtract Step 18 from Step 16 (e.g. 8,837.75 – 1,836.6133 or 7,001.1367)

Your “a” number is Step 19 or 7,001.1367
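The two calculations above can be checked with a few lines of Python. This is only a sketch: the function name `least_squares_from_sums` and its parameter names are made up for illustration, and the inputs are the totals quoted in the steps (row count, sum of months, sum of incident counts, sum of squared months, sum of month times count).

```python
def least_squares_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (a, b) for the trend line Y = a + bX from the totals in the steps."""
    # Steps 8 and 10-15: slope b
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Steps 16-19: intercept a = (average Y) - b * (average X)
    a = sum_y / n - b * sum_x / n
    return a, b


# Totals from the worked example (12 observed months).
a, b = least_squares_from_sums(
    n=12, sum_x=78, sum_y=106_053, sum_x2=650, sum_xy=729_750
)

print(f"b = {b:.4f}")  # b = 282.5559
print(f"a = {a:.4f}")  # a = 7001.1364
```

Carrying full precision gives an a value that differs from the hand calculation in the fourth decimal place, because the hand calculation rounds b to four decimals before multiplying. The difference is far too small to affect a forecast of whole incidents.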

Step 3 – Calculate The Trend Value For Each Data Row In Your Table

Now that values have been determined for a and b based on the observed (actual) incident counts, the forecast analysis can be run. The formula, presented again, is:

Y = a + bX

This can now be run for each observed and non-observed row in your table. Note that X is the numeric month value (1, 2, 3, etc.) and Y is the forecasted incident count.

Using our example:

You can now plot the Incident Count Y values as your trend line similar to the graph example shown earlier.
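As a rough sketch of this step, the trend values can be generated in Python from the same totals used to derive a and b. The 24-month horizon and the names `trend_value` and `trend` are assumptions chosen for illustration; use whatever horizon your own table covers.

```python
# Totals from the 12 observed months in the worked example.
n, sum_x, sum_y, sum_x2, sum_xy = 12, 78, 106_053, 650, 729_750

# Slope and intercept of the least-squares trend line.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n


def trend_value(month):
    """Trend line value Y = a + bX for a numeric month X."""
    return a + b * month


# Months 1-12 are observed; months 13-24 are the forecast
# (the 24-month horizon is an assumption for this sketch).
trend = {month: trend_value(month) for month in range(1, 25)}

for month, value in trend.items():
    print(f"{month:>2}  {value:>10,.0f}")
```

Each printed row pairs a month number with its trend line incident count, which is the series to plot as the trend line.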

Other Powerful Uses Of This Forecasting Approach

The results as shown here make a business case for how incidents will trend if no actions are taken to improve anything. They show how high the incident counts can go and when those impacts may occur.

Management may sometimes push back on this. After all, it’s just numbers. Who knows? In this case, simply track the incidents that actually take place in succeeding months and compare them to your trend line. In some cases, the actual counts have been seen to go above the trend line (“hey, it’s worse than you thought – convinced yet?”).

Another approach is to tie financials into this. For example, if your hourly labor cost to deal with an incident is $62 and incidents average 2 hours (or $124) to deal with, then multiply the forecasted numbers by that cost. Now management can see a financial penalty for taking no action. These can add up to some big numbers that get attention.
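A minimal sketch of that cost calculation follows. The $62 rate and 2-hour average come from the example above; the two forecast counts are illustrative trend values for months 13 and 14, rounded to whole incidents, and the names `forecast_cost` and `forecast` are made up for this sketch.

```python
HOURLY_LABOR_COST = 62   # dollars per hour, from the example above
HOURS_PER_INCIDENT = 2   # average handling time, from the example above


def forecast_cost(incident_count):
    """Labor cost in dollars of handling the given number of incidents."""
    return incident_count * HOURLY_LABOR_COST * HOURS_PER_INCIDENT


# Illustrative forecasted counts, keyed by month number.
forecast = {13: 10_674, 14: 10_957}

for month, count in forecast.items():
    print(f"Month {month}: {count:,} incidents, ${forecast_cost(count):,}")
# Month 13: 10,674 incidents, $1,323,576
# Month 14: 10,957 incidents, $1,358,668
```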

You can also apply a little bit of machine learning to this. Keep your forecasted results as an outcome model. For a training model, update a copy of the data with each new month's observations and recalculate to see whether that fine-tunes your initial forecasts.
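One way to read this suggestion is to refit the line each time a new month is observed and compare the new slope with the original one. The sketch below does that under stated assumptions: the function name `fit_line` is made up, and the monthly counts are hypothetical numbers chosen to keep the arithmetic easy to follow, not data from the example.

```python
def fit_line(months, counts):
    """Least-squares fit of counts against months; returns (a, b) for Y = a + bX."""
    n = len(months)
    sum_x = sum(months)
    sum_y = sum(counts)
    sum_x2 = sum(x * x for x in months)
    sum_xy = sum(x * y for x, y in zip(months, counts))
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


# Hypothetical observations (not from the article's table).
months = [1, 2, 3, 4, 5, 6]
counts = [100, 110, 120, 130, 140, 150]

# Original model, kept unchanged for comparison.
a0, b0 = fit_line(months, counts)

# Month 7 arrives with 175 incidents; the original model predicted 160.
predicted = a0 + b0 * 7
months.append(7)
counts.append(175)

# Updated model, refit on all seven months.
a1, b1 = fit_line(months, counts)

print(f"original slope {b0:.2f}, updated slope {b1:.2f}")
print(f"month 7: predicted {predicted:.0f}, actual {counts[-1]}")
```

If the updated slope keeps drifting away from the original, that is a sign the initial forecast is understating or overstating the trend.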

You can also use this to estimate the impact of improvement initiatives. For each initiative, estimate the reduction in incident counts. For example, fixing XYZ might lower monthly incident rates by 5%. Apply that as a factor to your forecasted rates and show how that can impact the trend line.
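Applying such a factor is simple arithmetic, sketched below. The function name `apply_reduction` and the two baseline counts are illustrative assumptions; the 5% figure is the example from the text. The sketch applies a single flat percentage to every forecasted month, which is one possible interpretation of "apply that as a factor".

```python
def apply_reduction(forecast_counts, reduction):
    """Scale each forecasted count down by `reduction` (0.05 means 5%)."""
    if not 0 <= reduction <= 1:
        raise ValueError("reduction must be between 0 and 1")
    return [count * (1 - reduction) for count in forecast_counts]


# Illustrative forecasted counts for two future months.
baseline = [10_674, 10_957]

# Estimated effect of an initiative that lowers incidents by 5%.
adjusted = apply_reduction(baseline, 0.05)

for before, after in zip(baseline, adjusted):
    print(f"{before:,} -> {after:,.0f}")
```

Plotting the adjusted series next to the original trend line shows the gap the initiative is expected to create.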

To be even more adventurous, you can also show the impact of the forecasted rates. For example, would more support staff have to be hired when incidents get over a certain level? Can the Service Desk handle call volumes or will it break at some point?