Two weeks ago, we started talking about service levels. Last week, we discussed how to write them, and I mentioned that the best way to gain experience was to do it – repeatedly. I stand by that statement, but if you’ve never written service levels before, or don’t have much experience writing them, you might need some help getting started. So I’m going to provide some starting points for a few key service level metrics. These are the ones common to software-related contracts – so they won’t apply to everyone or to every situation. But they might give you a jumping-off point for creating your own.
So, before you can measure a service level, you have to define one (or more). As I stated before, software-related services are typically measured by two major factors: Problem Response (how quickly the vendor responds to a call for help) and Problem Resolution (how quickly the vendor solves the problem). As two measures of time, they’re similar, but these are two independent measures – a vendor can do well with one and poorly with the other, for example. Additionally, embedded in both of these metrics is a key definition – the concept of Severity. So we actually have to start with the definitions and work forward.
Not all problems are created equal. Severity is the disambiguation of a particular issue’s importance. You should create at least three Severity levels, perhaps four, but never more. I like four because I think that it offers enough distinction between each Severity level without becoming so nuanced as to be irrelevant. I define Sev1 Problems as any problem resulting in a full or partial production stoppage or data inaccuracy. Sev2 Problems are a significant production inhibitor. Sev3 Problems are those where we can do our work, but only through manual intervention that causes significant production or performance inefficiency, or where reporting functions are unavailable. Finally, Sev4 Problems are any condition in/of the software other than those defined as Sev1-3, which affects the service or operation of our systems or network, but does not render such system or network unusable or inoperable.
The net result is that Sev1’s are “the sky is falling” moments; Sev2’s are “holy crap”; Sev3’s are “we’re pulling an all-nighter” and Sev4’s are “I don’t like having to do something in this really wacked-out way because the software doesn’t work to the manual’s spec”. Now, you can redefine these Severity Levels any way that you wish… but the general formula should be followed (not just because I say so… but because these are almost industry standard). As you’ll see in a moment, the distinction between each level is also important in terms of how it impacts your metrics. Additionally, the “missing” 5th severity level is one I simply don’t include anymore – but if you do include it, it would be the “user interface” issue – the color palette that makes things hard to read, the minor nit that isn’t inhibiting in any way, it’s just an annoyance.
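If you track problems in a ticketing system, the four definitions above can be written down as a simple decision rule. Here is a minimal Python sketch of that idea. The flag names (`production_stopped`, `data_inaccurate` and so on) are hypothetical labels invented for illustration, not contract language, and you would adapt them to your own definitions.

```python
from enum import IntEnum


class Severity(IntEnum):
    """The four Severity levels described above (1 is the most severe)."""
    SEV1 = 1  # full or partial production stoppage, or data inaccuracy
    SEV2 = 2  # significant production inhibitor
    SEV3 = 3  # work only possible via inefficient manual intervention,
              # or reporting functions unavailable
    SEV4 = 4  # any other condition that does not make the system unusable


def classify(production_stopped=False,
             data_inaccurate=False,
             significant_inhibitor=False,
             manual_workaround=False,
             reporting_unavailable=False):
    """Return the most severe level that matches the reported symptoms.

    All parameter names are assumptions made for this example.
    """
    if production_stopped or data_inaccurate:
        return Severity.SEV1
    if significant_inhibitor:
        return Severity.SEV2
    if manual_workaround or reporting_unavailable:
        return Severity.SEV3
    return Severity.SEV4
```

Checking the most severe conditions first means a problem that matches several definitions always lands in the highest-priority bucket, which is usually what you want the contract to say as well.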
OK, so now that you have the Severity Levels defined, you can get back to the creation of metrics for Response and Resolution time. As I said before, Response Time is how quickly the vendor is going to answer a call for help. Thinking logically then, the more severe the problem (Sev1 being the most severe), the more quickly the vendor should respond, because a delayed response does more damage. My standard starts with a 2 hour response time for Sev1, 4 hours for Sev2, 8 hours for Sev3 and 12 hours for Sev4. Remember, this is just response time – the time it should take the vendor to give you a PLAN for a resolution, not to actually solve the problem.
With Resolution Time, I’m measuring time, but I’m also measuring completeness, as Resolutions are dependent upon the problem being fully solved (hence the word “resolution”). For Sev1 Problems, I need immediate assistance, tempered with a little understanding of how software development works. So I ask for 100% of Problem(s) resolved within 24 hours. From there I follow a stepped path similar to the Response Times, adding 24 hours per level. Sev2’s should be resolved in 48 hours, Sev3’s in 72 hours and Sev4’s in 96 hours.
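To make the numbers concrete, here is a small Python sketch that stores the Response and Resolution targets above in two lookup tables and checks a single problem against them. It rests on two assumptions that are not stated in the text and that your contract should spell out: the clock starts when the problem is reported, and time is counted in elapsed calendar hours rather than business hours. The function name `check_ticket` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta

# Target hours per Severity level, taken from the figures above.
RESPONSE_HOURS = {1: 2, 2: 4, 3: 8, 4: 12}
RESOLUTION_HOURS = {1: 24, 2: 48, 3: 72, 4: 96}


def check_ticket(severity, opened, responded, resolved):
    """Return (response_met, resolution_met) for one problem.

    Assumes both clocks start at `opened` and run in calendar hours.
    """
    response_met = (responded - opened) <= timedelta(hours=RESPONSE_HOURS[severity])
    resolution_met = (resolved - opened) <= timedelta(hours=RESOLUTION_HOURS[severity])
    return response_met, resolution_met


# Example: a Sev1 answered after 1.5 hours but fixed after 27 hours.
opened = datetime(2024, 1, 1, 9, 0)
result = check_ticket(
    severity=1,
    opened=opened,
    responded=opened + timedelta(hours=1.5),
    resolved=opened + timedelta(hours=27),
)
print(result)  # (True, False): response met, resolution missed
```

The two results are reported separately on purpose, since Response and Resolution are independent measures: a vendor can meet one and miss the other, as in the example.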
Seems pretty simple, actually. And, in many cases, it can be. But again, if I didn’t have a fairly thorough understanding of the software development, testing/QA and bug identification/repair process, I might be tempted to ask for unreasonable metrics, or alternatively, to agree to extremely long ones. Again, the moral of the story is to know what you want to measure and why, and go from there. Next week, we’ll talk about what happens when someone blows a service level.