More on 5-9 Availability

I had a few posts on 5-9 availability in the last two years.  Today, ZDNet reports that 3Tera is offering 5 9’s.  The really positive thing I saw in the article is that 3Tera plans to provide automatic SLA credits in the event of downtime that blows the metric – which, if there’s truly 5-9 availability, shouldn’t take much doing.

I’ll ask the question again that I asked before:  Is it worth the cost?

Software Licensing Education Series – 400s Track Now Available!

Designed for the busy or on-the-go professional, the Software Licensing Education Series (SLES) is video-based training on the complete gamut of software licensing topics. Presented in a college-course level format, with topics increasing in complexity and building upon prior lessons, the SLES offers an audio-visual learner another way to gain knowledge on licensing topics.  Each video is approximately 20-30 minutes in length, so each Track contains about 2 hours of expert instruction in core software licensing topics!

The 400 Track videos include:
SLES 401 – Services Issues 2
SLES 402 – Maintenance and Support 1
SLES 403 – Maintenance and Support 2 (special 1-hour course)
SLES 404 – ASP and SaaS Issues

(500s Tracks are currently in production and will be released shortly!)

Videos are formatted for a computer or portable video player (such as an iPod) and consist of a slide-show format with voice-over instruction, so you can even learn just by listening!

Service Levels and Remedies

We’ve been discussing Service Levels for the last few weeks.  We covered construction, setup, drafting and even some recommendations on how the Service Levels might look.  But what happens in the event that the Service Levels aren’t met (they’re “blown”)?

We should start with a review of why we sought Service Levels in the first place – which is usually because you really just want what you’ve been promised in terms of service or performance.  In this case, remedies for blown Service Levels should be two-fold:  1) to get service restored as quickly as possible, and 2) to compensate for any damages caused as a result of the blown Service Level.  So you’ve probably drafted the Service Level metrics to put a lot of importance on fewer outages or less downtime – with significant financial penalties.  But you don’t really want the money, you want great service.  The net result is that in the event of a blown Service Level, you probably won’t enforce the financial penalty… it’s more bark than bite.  But you are a stickler for responsiveness and attentiveness, and you will enforce the remedies if you feel that not only have the Service Levels been blown, but that you’re being ignored, too.  This is justifiable.

But maybe you take a more jaded view and already know that you’re not going to get exactly what was promised.  You drafted the Service Levels with really tight controls – trying to count every little thing (without regard to what’s really important in the particular deal).  You feel that software or services are exact sciences and that if the vendor is silly enough to make a promise, you should get everything you’re “entitled” to.  The result is that at the first hint of a blown Service Level, you’re on the phone with the vendor asking for service credits (never mind restoration of service).

Personally, I hope you’re in the first group of folks and not the second.  Software and Services, by their nature, are going to have hiccups.  This is just how it goes.  Even older, established products can have difficulties (environments are constantly changing).  So instead of trying to beat the vendor with your contract, you use the contract strategically to make sure that you’re getting what is really important to the deal … and not some sort of small financial benefit for each blown level.

However, there are times where it is absolutely justifiable to use the contract as a sword and a shield.  One of these cases is what’s known as “death by ducks” – the situation where you have many small Service Level issues.  None of them, on their own, would be worth enforcing as a true blown event.  But together, you get pecked to death by the small things because they cumulatively add up to a significant performance issue.  Here, you should have anticipated this issue and have a small extra section in your Service Level language detailing the remedies available in the event of x number of small issues that add up to a certain Severity Level.  Heck, you can even define it as a certain number of Sev3’s = a Sev2.  And a certain number of Sev2’s = a Sev1.  How would you handle it?
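The escalation math above can be sketched in a few lines of code.  This is just an illustration in Python; the thresholds (five Sev3’s count as a Sev2, three Sev2’s count as a Sev1) are hypothetical numbers I’ve picked for the example – your own deal should set its own:

```python
from collections import Counter

# Hypothetical escalation thresholds -- tune these to your own deal:
# 5 open Sev3 issues count as one Sev2; 3 Sev2-equivalents count as one Sev1.
SEV3_PER_SEV2 = 5
SEV2_PER_SEV1 = 3

def effective_severity(open_issues):
    """Roll small issues up into higher severities ("death by ducks").

    open_issues is an iterable of severity numbers (1-4).
    Returns the highest effective severity (1 is worst), or None if empty.
    """
    counts = Counter(open_issues)
    # Promote groups of Sev3s into Sev2-equivalents.
    counts[2] += counts[3] // SEV3_PER_SEV2
    # Promote groups of Sev2-equivalents into Sev1-equivalents.
    counts[1] += counts[2] // SEV2_PER_SEV1
    for sev in (1, 2, 3, 4):
        if counts[sev]:
            return sev
    return None  # no open issues

# Ten nagging Sev3s escalate into two Sev2-equivalents -- not yet a Sev1.
print(effective_severity([3] * 10))  # 2
```

The point isn’t the particular numbers – it’s that the roll-up rule is written down in the contract, so nobody has to argue later about whether the ducks add up.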


Service Level Examples

Two weeks ago, we started talking about service levels.  Last week, we discussed how to write them and I mentioned that the best way to gain experience was to do it – repeatedly.  I stand by that statement, but if you’ve never done it before or don’t have a lot of experience in writing them, then you might need some help getting started.  So I’m going to provide you with some starting points for a few key service level metrics.  These are the ones common for software-related contracts – so they’re not going to be universally applicable to everyone or to all situations.  But they might give you a jumping off point for the creation of your own.

So, before you can measure a service level, you have to define one (or more).  As I stated before, software-related services are typically measured by two major factors: Problem Response (how quickly the vendor responds to a call for help) and Problem Resolution (how quickly the vendor solves the problem).  Both are measures of time, but they’re independent – a vendor can do well on one and poorly on the other.  Additionally, embedded in both of these metrics is a key definition – the concept of Severity.  So we actually have to start with the definitions and work forward.

Not all problems are created equal.  Severity is the classification of a particular issue’s importance.  You should create at least three Severity levels, perhaps four, but never more.  I like four because I think that it offers enough distinction between each Severity level without becoming so nuanced as to be irrelevant.  I define Sev1 Problems as any problem resulting in a full or partial production stoppage or data inaccuracy.  Sev2 Problems are a significant production inhibitor.  Sev3 Problems are those where we can do our work, but only through manual intervention that requires significant production or performance inefficiency, or where reporting functions are unavailable.  Finally, Sev4 Problems are any condition in/of the software other than those defined as Sev1-3, which affects the service or operation of our systems or network, but does not render such system or network unusable or inoperable.

The net result is that Sev1’s are “the sky is falling” moments; Sev2’s are “holy crap”; Sev3’s are “we’re pulling an all-nighter” and Sev4’s are “I don’t like having to do something in this really whacked-out way because the software doesn’t work to the manual’s spec”.  Now, you can redefine these Severity Levels any way that you wish… but the general formula should be followed (not just because I say so… but because these are almost industry standard).  As you’ll see in a moment, the distinction between each level is also important in terms of how it impacts your metrics.  Additionally, the “missing” 5th severity level is one I simply don’t include anymore – but if you do include one, it would be the “user interface” issue – the color palette that makes things hard to read, the minor nit that isn’t inhibiting in any way, it’s just an annoyance.

OK, so now that you have the Severity Levels defined, you can get back to the creation of metrics for Response and Resolution time.  As I said before, Response Time is how quickly the vendor is going to answer a call for help.  Thinking logically, then: the higher the Severity, the faster the vendor should respond, because a delayed response causes more damage.  My standard starts with 2 hour response time for Sev1, 4 hours for Sev2, 8 hours for Sev3 and 12 hours for Sev4.  Remember, this is just response time – the time it should take the vendor to give you a PLAN for a resolution, not to actually solve the problem.

With Resolution Time, I’m measuring time, but I’m also measuring completeness, as Resolutions are dependent upon the problem being fully solved (hence the definition of the word “resolution”).  For Sev1 Problems, I need immediate assistance, tempered with a little understanding of how software development works.  So I ask for 100% of Problem(s) resolved in 24 hours.  I follow a similar escalating pattern to the Response Times: Sev2’s should be resolved in 48 hours, Sev3’s in 72 hours and Sev4’s in 96 hours.
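Put together, the Response and Resolution targets boil down to a small lookup table.  Here’s a rough sketch in Python – the `Ticket` structure is my own invention for illustration, and the hour values are just the starting points from this post, not anything standardized:

```python
from dataclasses import dataclass

# The post's starting-point targets, in hours -- negotiate your own.
RESPONSE_HOURS = {1: 2, 2: 4, 3: 8, 4: 12}       # time to a PLAN
RESOLUTION_HOURS = {1: 24, 2: 48, 3: 72, 4: 96}  # time to a full fix

@dataclass
class Ticket:
    severity: int           # 1 (worst) through 4
    responded_after: float  # hours from report to vendor's plan
    resolved_after: float   # hours from report to full resolution

def blown_metrics(t: Ticket) -> list:
    """Return which Service Level metrics this ticket blew, if any."""
    blown = []
    if t.responded_after > RESPONSE_HOURS[t.severity]:
        blown.append("response")
    if t.resolved_after > RESOLUTION_HOURS[t.severity]:
        blown.append("resolution")
    return blown

# A Sev1 answered in 3 hours (limit 2) but fixed in 20 (limit 24):
print(blown_metrics(Ticket(severity=1, responded_after=3, resolved_after=20)))
# ['response']
```

Notice that the example blows Response while meeting Resolution – exactly the independence between the two metrics described above.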

Seems pretty simple, actually.  And, in many cases, it can be.  But again, if I didn’t have a fairly thorough understanding of the software development, testing/QA and bug identification/repair process, I might be tempted to ask for unreasonable metrics, or alternatively, be willing to agree to extremely long times as well.  Again, the moral of the story is to know what you want to measure and why and go from there.  Next week, we’ll talk about what happens when someone blows a service level.


Zen and the Art of Service Levels (with apologies to Robert Pirsig and Eugen Herrigel)

“The aim of Zen practice is to discover [this] Buddha-nature within each person, through meditation and mindfulness of daily experiences. Zen practitioners believe that this provides new perspectives and insights on existence, which ultimately lead to enlightenment.” —Wikipedia

As silly as it sounds, the way to master service levels is to draft them over and over.  Yeah, this is the same way to get better at anything, contracts especially.  But service levels are a little special.  I think it’s because they’re going the way of the Dodo: as fewer people ask for them, even fewer know to think about them.  It’s the same cycle that would improve the quality of service levels – just running in reverse.  Pirsig’s book was focused on trying to define “quality” and in the end, he settled upon a mix of rationality and romanticism.

I said before that service levels have to be SMART: Specific, Measurable, Attainable, Relevant, Time-Bound.  We’ll blend the rationality and romanticism as we go.

Specific – Service levels start with an understanding of the exact quantities of some metric.  This could really be anything, but tempered with the next quality, you have to be able to count it.  Typically, we start with things that are time-related: uptimes and downtimes, repair times and fix times.  Rationality wins here almost every day (the truly romantic notion is that service levels aren’t needed at all because everything’s going to work out as planned) – these things are really easy to measure… and frankly, ease of measurement is necessary because the folks who will be monitoring the service levels aren’t really interested in tracking them.  But why not be a little romantic, too?  Pick something unique about the particular situation.  Maybe you’re licensing software that processes transactions (so you’d count the transactions processed), or maybe you’ve hired an outsourcer to answer your support calls and service levels could be managed based on the number of successful versus unsuccessful customer service resolutions.

Measurable – This might seem obvious, but you’ve got to be able to measure what it is you’re going to base any metrics upon.  Just counting isn’t necessarily enough.  Rather, you might need to be able to track start/stop times/days (and then do the math to calculate the difference).  If the calculation is manual, you also need people who can keep track.  This, perhaps, is the most problematic part of any service level management… as the folks who want the benefits of the service level (usually managers) are not the people watching the clock or experiencing the outages first-hand (the staff).  So unless the staff has some sort of reason to monitor the metric accordingly, none of this is going to matter.
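Even the simplest version of “measurable” means somebody keeps an outage log with start and stop timestamps and does the arithmetic.  A minimal sketch, with made-up dates – which is exactly the staffing problem above, because someone on staff still has to record these:

```python
from datetime import datetime

# An outage log: (start, stop) timestamps someone actually wrote down.
outages = [
    (datetime(2008, 3, 1, 2, 15), datetime(2008, 3, 1, 2, 40)),   # 25 min
    (datetime(2008, 3, 9, 14, 0), datetime(2008, 3, 9, 14, 12)),  # 12 min
]

# Total downtime is just the summed differences, converted to minutes.
downtime_minutes = sum(
    (stop - start).total_seconds() / 60 for start, stop in outages
)
print(downtime_minutes)  # 37.0
```

The math is trivial; the discipline of logging every start/stop is the hard part.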

Attainable – I promised you before that the Myth of the Nines would come back into your life, and here it is.  The simple truth is that Five-9 availability is a pipe dream.  5.26 minutes of downtime a year.  Just think about how long your average PC takes to power-cycle.  Servers are typically a little longer.  Even with redundant systems, backups, high-availability resources and every other technical resource… it’s just not reasonable.  Notice I didn’t say that it was impossible.  It’s 100% possible.  You can have 100% availability.  The issue is cost.  No one ever wants to PAY for that kind of availability.  Not even your most demanding customers.  Wanna test this theory?  Price it out from your vendor(s) (as it’ll take more than one to keep even a single service up 24/7/365) and ask your most demanding customer if they’ll pay for the ENTIRE service themselves (since that’s the real cost to get it).  Let me know if they’re willing to do it, because I have a bridge or two to sell them.  Seriously, I’m not trying to be facetious.  I’m a pretty demanding customer myself, but even I know and understand financial limits.
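Where does that 5.26 minutes come from?  It’s just the downtime budget each “nines” level leaves you over a year.  A quick Python check:

```python
# Downtime budget implied by each "nines" level over one year.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # ~525,960 minutes

budgets = {}
for nines in range(2, 6):
    availability = 1 - 10 ** -nines  # e.g. 0.99999 for five nines
    budgets[nines] = MINUTES_PER_YEAR * (1 - availability)
    print(f"{nines} nines ({availability:.3%}): "
          f"{budgets[nines]:,.2f} minutes of downtime/year")
```

Two nines buys you about 3.7 days of downtime a year; five nines leaves you 5.26 minutes – less than one server reboot.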

Relevant – Tied to measurable and specific is that each of your service level metrics be relevant to whatever service you’re receiving/providing.  So if you’ve chosen to measure successful versus unsuccessful customer service resolutions, but it’s not tied to the behavior of the service provider, that’s not a relevant metric.  The provider doesn’t have any control over what is being measured, even with perfect behavior.  So where is their incentive to work towards meeting the metrics (or agreeing to them in the first place)?

Time-Bound – Service levels are limited to time.  At first, this sounds quite limiting, but we’re not talking about time in terms of the length of the relationship (service levels should extend for the entire length of the relationship).  Rather, the time we’re talking about here is the time frame in which each metric will be measured.  So, perhaps you’re watching uptime on a daily basis… or the number of widgets produced in a week… or the number of successful service calls completed in a year… or the average length of time it takes to fix a problem of a given severity level over the span of a quarter.
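The same metric changes meaning depending on the window you measure it in – 37 minutes of downtime is trivial over a year and catastrophic over a day.  A small sketch of bucketing a (made-up) outage log into monthly windows:

```python
from collections import defaultdict

# Same metric, different measurement windows: downtime minutes
# bucketed per calendar month.  Dates and durations are invented.
outage_log = [
    ("2008-03-01", 25),  # (day of outage, minutes down)
    ("2008-03-09", 12),
    ("2008-04-02", 50),
]

per_month = defaultdict(int)
for day, minutes in outage_log:
    per_month[day[:7]] += minutes  # key on "YYYY-MM"

print(dict(per_month))  # {'2008-03': 37, '2008-04': 50}
```

Pick the window deliberately: a monthly window lets one bad week blow a metric that a yearly window would have absorbed.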

OK, so now that you’ve considered all five requirements, you should have one or more appropriate service levels.  If you still need some ideas, check back with me for the next installment.  Meanwhile, if you have some ideas for inclusion in the next installment, send them along!


Service Level Basics

I eat out a lot – exempting breakfast (I don’t eat it), I would say that I’m at a restaurant for about 10 of every 14 available meals.  Never mind what this does to my budget, let’s focus on the food.  Now, I’m a pretty simple eater – in fact, I love things plain.  When I go to McDonald’s or Burger King, I get the burgers with nothing on them – just meat and bread.  Add in some fries and a drink, and I’m a happy man.

So in most situations, I’ve got three components to my meal: an entree, a side and a drink.  Statistically speaking, there are eight possible combinations of quality (assume that each item can only be good or bad):

bad burger, bad fries, bad drink
bad burger, bad fries, good drink
bad burger, good fries, good drink
bad burger, good fries, bad drink
good burger, bad fries, bad drink
good burger, bad fries, good drink
good burger, good fries, bad drink
good burger, good fries, good drink

Thus, for any given purchase of just these three components to my meal, I have a 1 in 8 chance of getting all three “good” items.  That’s a 12.5% chance [thanks to John O. for correcting my math] that I’m going to enjoy all three items.  On the other hand, there’s also only a 1 in 8 chance of having all three be “bad”.  There’s a 3 in 8 chance that one will be “good” and a 3 in 8 chance that two will be “good”.  So what do I do?
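You can check those odds by brute force.  A quick Python sketch, assuming each item is independently good or bad with even odds:

```python
from itertools import product
from fractions import Fraction

# Enumerate the eight good/bad combinations for burger, fries, drink.
outcomes = list(product([True, False], repeat=3))  # True = "good"

def chance(predicate):
    """Fraction of equally likely outcomes satisfying the predicate."""
    hits = sum(1 for o in outcomes if predicate(o))
    return Fraction(hits, len(outcomes))

print(chance(lambda o: all(o)))       # 1/8 -> all three good
print(chance(lambda o: sum(o) >= 2))  # 1/2 -> at least two good
```

Of course, the even-odds assumption is exactly what the franchise quality standards below are designed to break.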

I set my expectations accordingly and know that there’s a 50% chance that I’ll enjoy at least two of the items (the 3 in 8 that two will be good plus the 1 in 8 that all three will be good).  Yes, I know that there’s a 50% chance for the reverse – but remember also that there are some other variables that we need to account for.  In all name-brand fast-food joints, there are quality standards set by the franchisor.  McDonald’s, Burger King, Chick-fil-A, Wendy’s, Arby’s, KFC, Taco Bell, etc… they all have: food that is pre-packaged and sent to the stores (reducing the likelihood of differentiation by store); cooking standards (look behind the counter some time and see if you can find the poster showing the correct “doneness levels”); even standard equipment (fryers, etc) to reduce variations.

So in actuality, there’s a better than 50% chance that my food will be “good” (meeting the corporate standard) because of these outside variables.

OK, so what does this all have to do with software, services and service levels?

Well, it’s 100% the same.  Service levels are quality-based promises a customer seeks from a vendor.  There are a lot of variables (such as the software), a few standardized items (usually the hardware), and you try to pick a few key metrics that you think will be able to give you a quality rating on the meal (the service itself).  The question is whether you can appropriately gauge how often you’re going to be satisfied with what you’ve purchased and cope with it when you’re not.

In the software and services world, service levels are typically measured in response time or uptime, used to enforce the vendor’s sales pitches that the particular good or service will be as incredible as it was during the demo.  Vendors, of course, don’t like service levels, and customers, predictably, love them.  However, in all of the years I’ve been playing this game, I very rarely see service levels that benefit either party.

To be effective, service levels have to be SMART (as made popular by Peter Drucker):  Specific, Measurable, Attainable, Relevant and Time-Bound (we discussed these earlier when talking about writing SOWs, too).  So while you might have a service-level grid in your template agreement, for any particular product or service, you have to evaluate those pre-defined levels and see if they make sense for whatever it is that is being purchased.  This is no easy task and requires a lot of input from your colleagues down in IT support, architecture, engineering and management.  You have to look first at the product or service’s use (Is it customer facing?  Is it mission critical (yes, be honest on this one)?  Is it internal-use-only?  Is it backend-use-only?)  Then you have to look at WHEN the product is going to be used (day, night, weekends, random).

But most important, you have to look at the actual impact of being without the product or service and for how long you can be without it before a real negative impact sets in.  So, for example, how long could you be without your word processing application company-wide before productivity takes a significant hit?  Can you actually calculate the damages that would result if no one had access to e-mail for an hour or two?  Probably not.  So you’re left with guesswork.  Which makes a vendor (and many customers) pretty squeamish about putting hard dollars to soft numbers.

Over the next few posts, I’m going to talk about a few specifics and we’re going to re-visit the Myth of the Nines.  Get out your red pens and engage track-changes… we’re going to alter your service level perceptions.

Oh, and because I was talking about the 1 in 8 chance of getting three good food items earlier, well, it happened yesterday.  The Burger King in the Charlotte Airport nailed a plain Whopper, fries and a coke.  And it was at a time when I was REALLY desperate for good food, too.  A 12.5% chance, perhaps, but still possible.


Services Descriptions

I saw an agreement the other day which listed the services in about two sentences.  This might be an acceptable description if the services are personal lawnmowing… or room painting.

Generally speaking, your services description for any project of any length needs to be more than a paragraph.  No, it’s not strictly about length – it’s about quality.  But while quality doesn’t rely on length (see Ken Adams for more detail on that argument), length is an indicator of completeness, which is what’s really important.

Let’s look at a basic project, like lawnmowing.  A single sentence services description would probably look like this:

“Provider will mow the grass at xyz location one time per week between today and 1 year from today.”

Is that a complete enough description?  Perhaps.

How do you know if it’s complete enough?  Start to ask questions.

1.  How big is the area to be serviced?

2.  How long is it going to take to mow the lawn?

3.  Are there times of the day where the noise would be problematic?

4.  Should the provider “edge” the lawn at the curbs and sidewalks?

5.  What should be done with the clippings?

6.  Who is going to be doing the mowing?

7.  Who supplies the equipment?

8.  Is there an adjustment for the cost of fuel for the mowers?

OK… I think I’ve made the point.  Even for something as seemingly simple as lawnmowing, it doesn’t take long to generate a list of questions that should be answered in your services description/SOW.

Just remember, your services description is all that stands between you and your money for the quality of the services performed.  You will NOT get to use what’s in your head, what you assume, and in some cases, what you think to be industry standard, to hold the services provider accountable after contract signature.  In other words, if you can think it, WRITE IT DOWN.

Seems a little anal retentive?  Sure.  You can even apologize and tell the other side that you’re just weird.  Making fun of yourself actually is beneficial to your negotiating position if you can do it with sincerity (read the book I recommended last week for details – it’s in there).  Trust me, a little anal retentiveness now goes a long way to preventing problems later.

Want to know more?  Get the Software Licensing Education Series – 300s Track.