Monthly Archives: January 2009

Increased Feed Size

At the request of a reader, I’ve increased the number of old posts now included in your RSS feed.  I would appreciate any feedback (good or bad) as a result of this change.  I anticipate that the first time your feed reader downloads the feed, it’ll take a few extra seconds, but I’m not sure of anything else.  So again, comments are appreciated.


Eight weeks til BN!

After eight weeks of twists and turns with my publisher and with the Bowker Books in Print system, the Software Licensing Handbook, Second Edition is now available through Barnes and Noble (and other fine booksellers).  It’s a long sordid tale – not really worth telling.  I’m just happy it’s available again.  They’re even offering it at a discount.

ps. I’ll give a free set of the first three levels of Software Licensing Education Series videos to one person who can name the song title spoofed by the title to this post.  Post your attempt in the comments.  I’ll select a random name from all correct answers by next week at this time.

Service Level Examples

Two weeks ago, we started talking about service levels.  Last week, we discussed how to write them and I mentioned that the best way to gain experience was to do it – repeatedly.  I stand by that statement, but if you’ve never done it before or don’t have a lot of experience in writing them, then you might need some help getting started.  So I’m going to provide you with some starting points for a few key service level metrics.  These are the ones common for software-related contracts – so they’re not going to be universally applicable to everyone or to all situations.  But they might give you a jumping off point for the creation of your own.

So, before you can measure a service level, you have to define one (or more).  As I stated before, software-related services are typically measured by two major factors: Problem Response (how quickly the vendor responds to a call for help) and Problem Resolution (how quickly the vendor solves the problem).  As two measures of time, they’re similar, but these are two independent measures – a vendor can do well with one and poorly with the other, for example.  Additionally, embedded in both of these metrics is a key definition – the concept of Severity.  So we actually have to start with the definitions and work forward.

Not all problems are created equal.  Severity is the disambiguation of a particular issue’s importance.  You should create at least three Severity levels, perhaps four, but never more.  I like four because I think that it offers enough distinction between each Severity level without becoming so nuanced as to be irrelevant.  I define Sev1 Problems as any problem resulting in a full or partial production stoppage or data inaccuracy.  Sev2 Problems are a significant production inhibitor.  Sev3 Problems are those where we can do our work, but only through manual intervention that requires significant production or performance inefficiency, or where reporting functions are unavailable.  Finally, Sev4 Problems are any condition in/of the software other than those defined as Sev1-3, which affects the service or operation of our systems or network, but does not render such system or network unusable or inoperable.

The net result is that Sev1’s are “the sky is falling” moments; Sev2’s are “holy crap”; Sev3’s are “we’re pulling an all-nighter” and Sev4’s are “I don’t like having to do something in this really wacked-out way because the software doesn’t work to the manual’s spec”.  Now, you can redefine these Severity Levels any way that you wish… but the general formula should be followed (not just because I say so… but because these are almost industry standard).  As you’ll see in a moment, the distinction between each level is also important in terms of how it impacts your metrics.  Additionally, the “missing” 5th severity level is one I simply don’t include anymore – but if you do so, it would be the “user interface” issue – the color palette that makes things hard to read, the minor nit that isn’t inhibiting in any way, it’s just an annoyance.

OK, so now that you have the Severity Levels defined, you can get back to the creation of metrics for Response and Resolution time.  As I said before, Response Time is how quickly the vendor is going to answer a call for help.  Thinking logically then, the more severe the problem, the more quickly the vendor should respond, because a delayed response would cause more damage.  My standard starts with a 2-hour response time for Sev1, 4 hours for Sev2, 8 hours for Sev3 and 12 hours for Sev4.  Remember, this is just response time – the time it should take the vendor to give you a PLAN for a resolution, not to actually solve the problem.

With Resolution Time, I’m measuring time, but I’m also measuring completeness, as Resolutions are dependent upon the problem being fully solved (hence the definition of the word “resolution”).  For Sev1 Problems, I need immediate assistance, tempered with a little understanding of how software development works.  So I ask for 100% of Problem(s) resolved in 24 hours.  The other levels step up from there, much as the Response Times do.  Sev2’s should be resolved in 48 hours, Sev3’s in 72 hours and Sev4’s in 96 hours.
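
To make the two grids concrete, here is a minimal Python sketch that stores the Response and Resolution targets from this post and checks one problem ticket against them.  The table layout, the function name check_ticket and the sample timestamps are all made up for illustration, and it assumes both clocks start when the problem is reported and run on plain wall-clock hours (your contract might count business hours instead).

```python
from datetime import datetime, timedelta

# Targets are the figures from this post; the structure and names are illustrative only.
TARGETS = {
    1: {"response": timedelta(hours=2), "resolution": timedelta(hours=24)},
    2: {"response": timedelta(hours=4), "resolution": timedelta(hours=48)},
    3: {"response": timedelta(hours=8), "resolution": timedelta(hours=72)},
    4: {"response": timedelta(hours=12), "resolution": timedelta(hours=96)},
}


def check_ticket(severity, opened, responded, resolved):
    """Return (response_met, resolution_met) for a single problem ticket.

    Assumption: both clocks start at `opened` and count wall-clock time.
    """
    target = TARGETS[severity]
    response_met = (responded - opened) <= target["response"]
    resolution_met = (resolved - opened) <= target["resolution"]
    return response_met, resolution_met


# Hypothetical Sev1: vendor answered in 1.5 hours but needed 30 hours to fix it.
opened = datetime(2009, 1, 5, 9, 0)
result = check_ticket(
    1,
    opened,
    opened + timedelta(hours=1, minutes=30),
    opened + timedelta(hours=30),
)
print(result)  # (True, False)
```

The point of returning the two results separately is the one made above: Response and Resolution are independent measures, so a vendor can pass one and fail the other.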

Seems pretty simple, actually.  And, in many cases, it can be.  But again, if I didn’t have a fairly thorough understanding of the software development, testing/QA and bug identification/repair process, I might be tempted to ask for unreasonable metrics, or alternatively, be willing to agree to extremely long times as well.  Again, the moral of the story is to know what you want to measure and why and go from there.  Next week, we’ll talk about what happens when someone blows a service level.

CPI-U All Items

If you’ve taken my advice from the Software Licensing Handbook and included maintenance fee cap language that ties any increase in fees to the Consumer Price Index or x%, “whichever is less”, well, you might be in for a treat! Depending on the index you chose, and the time schedule for it (whether you chose an all-year average, or the average as of a given date for the prior twelve month period), there’s a chance that your CPI number is going to be a negative number.

Yup, that’s right, you might have a built-in maintenance fee decreasing mechanism in your contract. Now, you only have to go find it and find your CPI number.  Oh, and it might also be the time to hope that you have a contract management system and that this is one of the data points you’re tracking.
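
If you want to see the arithmetic, here is a small Python sketch of a “CPI or x%, whichever is less” cap.  The fee amounts and percentages are invented for illustration, the function names are mine, and it assumes your clause has no floor (some contracts say fees can never go down, in which case none of this applies).  Plug in the actual index figure your contract points to.

```python
def capped_change_pct(cpi_change_pct, cap_pct):
    """Fee change under a 'CPI or x%, whichever is less' clause.

    Assumption: the clause has no floor, so a negative CPI passes through.
    """
    return min(cpi_change_pct, cap_pct)


def next_year_fee(current_fee, cpi_change_pct, cap_pct):
    """Apply the capped percentage change to the current maintenance fee."""
    pct = capped_change_pct(cpi_change_pct, cap_pct)
    return round(current_fee * (1 + pct / 100), 2)


# Hypothetical numbers: a 10,000 fee with a 3% cap.
print(next_year_fee(10000.00, 3.8, 3.0))   # CPI above the cap -> 10300.0
print(next_year_fee(10000.00, -0.4, 3.0))  # negative CPI -> 9960.0
```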

Zen and the Art of Service Levels (with apologies to Robert Pirsig and Eugen Herrigel)

“The aim of Zen practice is to discover [this] Buddha-nature within each person, through meditation and mindfulness of daily experiences. Zen practitioners believe that this provides new perspectives and insights on existence, which ultimately lead to enlightenment.” —Wikipedia

As silly as it sounds, the way to master service levels is to draft them over and over.  Yeah, this is the same way to get better at anything, contracts especially.  But service levels are a little special.  I think it’s because they’re going the way of the Dodo.  As few people ask for them, even fewer know to even think about them.  It’s the same cycle that increases the quality of service levels – just in reverse.  Pirsig’s book was focused on trying to define “quality” and in the end, he settled upon a mix of rationality and romanticism.

I said before that service levels have to be SMART: Specific, Measurable, Attainable, Relevant, Time-Bound.  We’ll blend the rationality and romanticism as we go.

Specific – Service levels start with an understanding of the exact quantities of some metric.  This could really be anything, but tempered with the next quality, you have to be able to count it.  Typically, we start with things that are time-related: uptimes and downtimes, repair times and fix times.  Rationality wins here almost every day (the truly romantic notion is that service levels aren’t needed at all because everything’s going to work out as planned) – these things are really easy to measure… and frankly, ease of measurement is necessary because the folks who will be monitoring the service levels aren’t really interested in tracking them.  But why not be a little romantic, too?  Pick something unique about the particular situation.  Maybe you’re licensing software that processes transactions (so you’d count the transactions processed), or maybe you’ve hired an outsourcer to answer your support calls and service levels could be managed based on the number of successful versus unsuccessful customer service resolutions.

Measurable – This might seem obvious, but you’ve got to be able to measure what it is you’re going to base any metrics upon.  Just counting isn’t necessarily enough.  Rather, you might need to be able to track start/stop times/days (and then do the math to calculate the difference).  If the calculation is manual, you also need people who can keep track.  This, perhaps, is the most problematic part of any service level management… as the folks who want the benefits of the service level (usually managers) are not the people watching the clock or experiencing the outages first-hand (the staff).  So unless the staff has some sort of reason to monitor the metric accordingly, none of this is going to matter.

Attainable – I promised you before that the Myth of the Nines would come back into your life, and here it is.  The simple truth is that Five-9 availability is a pipe dream.  5.26 minutes of downtime a year.  Just think about how long your average PC takes to power-cycle.  Servers are typically a little longer.  Even with redundant systems, backups, high-availability resources and every other technical resource… it’s just not reasonable.  Notice I didn’t say that it was impossible.  It’s 100% possible.  You can have 100% availability.  The issue is cost.  No one ever wants to PAY for that kind of availability.  Not even your most demanding customers.  Wanna’ test this theory?  Price it out from your vendor(s) (as it’ll take more than one to keep even a single service up 24/7/365) and ask your most demanding customer if they’ll pay for the ENTIRE service themselves (since that’s the real cost to get it).  Let me know if they’re willing to do it, because I have a bridge or two to sell them.  Seriously, I’m not trying to be facetious.  I’m a pretty demanding customer myself, but even I know and understand financial limits.
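
The 5.26 minute figure is easy to check for yourself.  Here is a quick Python sketch that turns an availability percentage into allowed downtime per year; the function name is mine, and it assumes an average year of 365.25 days and that every minute of the year counts (no carve-outs for scheduled maintenance).

```python
# Assumption: an average year of 365.25 days, with no maintenance windows excluded.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def allowed_downtime_minutes(availability_pct):
    """Minutes of downtime per year permitted by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


# Walk up the nines: two, three, four and five.
for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% uptime allows {allowed_downtime_minutes(pct):.2f} minutes of downtime a year")
```

Each extra nine cuts the allowance by a factor of ten, which is why the price climbs so quickly as you add them.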

Relevant – Tied to measurable and specific is that each of your service level metrics be relevant to whatever service you’re receiving/providing.  So if you’ve chosen to measure successful versus unsuccessful customer service resolutions, but it’s not tied to the behavior of the service provider, that’s not a relevant metric.  The provider doesn’t have any control over what is being measured, even with perfect behavior.  So where is their incentive to work towards meeting the metrics (or agreeing to them in the first place)?

Time-Bound – Service levels are limited to time.  At first, this sounds quite limiting, but we’re not talking about time in terms of the length of the relationship (service levels should extend for the entire length of the relationship).  Rather, the time we’re talking about here is the time frame in which each metric will be measured.  So, perhaps you’re watching uptime on a daily basis… or the number of widgets produced in a week… or the number of successful service calls completed in a year… or the average length of time it takes to fix a problem of a given severity level over the span of a quarter.
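
Here is what that last example might look like once someone has actually logged the start and stop times.  This is only a Python sketch: the ticket log is invented, the function name is mine, and it assumes a ticket belongs to the quarter in which it was opened.

```python
from datetime import datetime
from statistics import mean

# Hypothetical log entries: (severity, opened, resolved)
tickets = [
    (2, datetime(2009, 1, 12, 8, 0), datetime(2009, 1, 13, 20, 0)),  # 36 hours
    (2, datetime(2009, 2, 3, 14, 0), datetime(2009, 2, 6, 2, 0)),    # 60 hours
    (3, datetime(2009, 3, 9, 9, 0), datetime(2009, 3, 10, 9, 0)),    # different severity
    (2, datetime(2009, 4, 1, 9, 0), datetime(2009, 4, 2, 9, 0)),     # opened after the quarter
]


def average_resolution_hours(tickets, severity, period_start, period_end):
    """Average hours to resolve problems of one severity opened within a period.

    Assumption: a ticket counts toward the period in which it was opened.
    Returns None when no matching tickets exist.
    """
    hours = [
        (resolved - opened).total_seconds() / 3600
        for sev, opened, resolved in tickets
        if sev == severity and period_start <= opened < period_end
    ]
    return mean(hours) if hours else None


q1_sev2 = average_resolution_hours(
    tickets, 2, datetime(2009, 1, 1), datetime(2009, 4, 1)
)
print(q1_sev2)  # 48.0
```

Notice that an average can hide a miss: the 60-hour ticket blew a 48-hour target even though the quarter’s average came in right at 48.  Decide whether your metric is per-incident or averaged before you sign.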

OK, so now that you’ve considered all five requirements, you should have one or more appropriate service levels.  If you still need some ideas, check back with me for the next installment.  Meanwhile, if you have some ideas for inclusion in the next installment, send them along!

Service Level Basics

I eat out a lot – exempting breakfast (I don’t eat it), I would say that I’m at a restaurant for about 10 of every 14 available meals.  Never mind what this does to my budget, let’s focus on the food.  Now, I’m a pretty simple eater – in fact, I love things plain.  When I go to McDonald’s or Burger King, I get the burgers with nothing on them – just meat and bread.  Add in some fries and a drink, and I’m a happy man.

So in most situations, I’ve got three components to my meal: an entree, a side and a drink.  Statistically speaking, there are eight possible combinations of quality (assume that each item can only be good or bad):

bad burger, bad fries, bad drink
bad burger, bad fries, good drink
bad burger, good fries, good drink
bad burger, good fries, bad drink
good burger, bad fries, bad drink
good burger, bad fries, good drink
good burger, good fries, bad drink
good burger, good fries, good drink

Thus, for any given purchase of just these three components to my meal, I have a 1 in 8 chance of getting all three “good” items.  That’s a 12.5% chance [thanks to John O. for correcting my math] that I’m going to enjoy all three items.  On the other hand, there’s also only a 1 in 8 chance of having all three be “bad”.  There’s a 3 in 8 chance that one will be “good” and a 3 in 8 chance that two will be “good”.  So what do I do?

I set my expectations accordingly and know that there’s a 50% chance that I’ll enjoy at least two of the items (the 3 in 8 that two will be good plus the 1 in 8 that all three will be good).  Yes, I know that there’s a 50% chance for the reverse – but remember also that there are some other variables that we need to account for.  In all name-brand fast-food joints, there are quality standards set by the franchisor.  McDonald’s, Burger King, Chick-fil-A, Wendy’s, Arby’s, KFC, Taco Bell, etc… they all have: food that is pre-packaged and sent to the stores (reducing the likelihood of differentiation by store); cooking standards (look behind the counter some time and see if you can find the poster showing the correct “doneness levels”); even standard equipment (fryers, etc) to reduce variations.

So in actuality, there’s a better than 50% chance that my food will be “good” (meeting the corporate standard) because of these outside variables.
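
If you’d rather count than trust my math, the eight combinations can be enumerated in a few lines of Python.  This sketch keeps the simplifying assumption from above, that each item is equally likely to be good or bad and that the three are independent; the variable names are mine.

```python
from collections import Counter
from itertools import product

items = ("burger", "fries", "drink")

# Every good/bad combination for the three items (2 * 2 * 2 = 8 outcomes).
outcomes = list(product(("good", "bad"), repeat=len(items)))
total = len(outcomes)

# Tally outcomes by how many of the three items came out good.
good_counts = Counter(outcome.count("good") for outcome in outcomes)

for n_good in range(len(items) + 1):
    print(f"{n_good} good: {good_counts[n_good]} in {total}")

# Chance of enjoying at least two items, assuming equally likely outcomes.
at_least_two = (good_counts[2] + good_counts[3]) / total
print(at_least_two)  # 0.5
```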

OK, so what does this all have to do with software, services and service levels?

Well, it’s 100% the same.  Service levels are quality-based promises a customer seeks from a vendor.  There are a lot of variables (such as the software), a few standardized items (usually the hardware), and you try to pick a few key metrics that you think will be able to give you a quality rating on the meal (the service itself).  The question is whether you can appropriately gauge how often you’re going to be satisfied with what you’ve purchased and cope with it when you’re not.

In the software and services world, service levels are typically measured in response time or uptime, used to enforce the vendor’s sales-pitches that the particular good or service will be as incredible as it was during the demo.  Vendors, of course, don’t like service levels, and customers, predictably, love them.  However, in all of the years I’ve been playing this game, I very rarely see service levels that benefit either party.

To be effective, service levels have to be SMART (as made popular by Peter Drucker):  Specific, Measurable, Attainable, Relevant and Time-Bound (we discussed these earlier when talking about writing SOWs, too).  So while you might have a service-level grid in your template agreement, for any particular product or service, you have to evaluate those pre-defined levels and see if they make sense for whatever it is that is being purchased.  This is no easy task and requires a lot of input from your colleagues down in IT support, architecture, engineering and management.  You have to look first at the product or service’s use (Is it customer facing?  Is it mission critical (yes, be honest on this one)?  Is it internal-use-only?  Is it backend-use-only?)  Then you have to look at WHEN the product is going to be used (day, night, weekends, random).

But most important, you have to look at the actual impact of being without the product or service and for how long you can be without it before a real negative impact sets in.  So, for example, how long could you be without your word processing application company-wide before productivity takes a significant hit?  Can you actually calculate the damages that would result if no one had access to e-mail for an hour or two?  Probably not.  So you’re left with guesswork.  Which makes a vendor (and many customers) pretty squeamish about putting hard dollars to soft numbers.

Over the next few posts, I’m going to talk about a few specifics and we’re going to re-visit the Myth of the Nines.  Get out your red pens and engage track-changes… we’re going to alter your service level perceptions.

Oh, and because I was talking about the 1 in 8 chance of getting three good food items earlier, well, it happened yesterday.  The Burger King in the Charlotte Airport nailed a plain Whopper, fries and a Coke.  And it was at a time when I was REALLY desperate for good food, too.  About a 13% chance, perhaps, but still possible.