Service Operation: Major Incident Management Overview

In a recent post I mentioned we’d offer some thoughts on integrating the workflows of Incident, Major Incident and Problem Management.

But first, let’s begin with some suggestions on how to approach Major Incident Management and then in a future article we’ll discuss how it might integrate with Incident, Change and Problem Management.

Within the Service Operation and Service Transition lifecycle stages where most ITSM projects focus, there are three processes that can be effectively started without immediate use of an ITSM tool to enable them. These processes are Problem Management, Change Management and Major Incident Management.

In fact, Major Incident Management is a process that should be designed almost completely separately from any ITSM software considerations. The tool should be almost an afterthought, helping to document what was done, but not driving any workflow during a Major Incident. You should be able to respond to any Major Incident independent of any supporting software or systems because, depending on the situation, you have to be prepared for those systems to be unavailable.

Another, more prosaic, reason for this system independence is the urgent nature of a Major Incident.

Whenever you appoint the owner of a Major Incident, or assign someone a Task to perform in troubleshooting or resolving a Major Incident, there needs to be real-time contact and confirmation. There is no time to waste waiting for tickets in work queues to be noticed and picked up. This is a process that relies on immediate communication and acknowledgment.

In considering the design of the process itself, one necessary element that must be defined is the authority to declare. A Major Incident is a deliberately declared state. It is an organizational decision to prioritize response to a specific Incident above all other activities. Some organizations have the Major Incident process owner declare all Major Incidents; in others, the office of the CIO does this. It is also common to delegate the authority to declare to the Service Owner of the principal Service affected by the Incident.

Wherever you choose to locate your Major Incident declaration authority, be sure to consider the chain of command in the event you cannot immediately contact the first person on the list.
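To make the chain-of-command idea concrete, here is a minimal sketch of the fallback logic. It is only an illustration of the ordering, not a suggestion that software should drive the declaration. The `Authority` type, the `first_reachable` function and the example names are all hypothetical, and the `confirm` callback stands in for a real-time contact such as a phone call that is actually answered.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Authority:
    """One person in the declaration chain of command (hypothetical type)."""
    name: str
    role: str


def first_reachable(
    chain: Sequence[Authority],
    confirm: Callable[[Authority], bool],
) -> Optional[Authority]:
    """Walk the chain in order and return the first person who confirms
    in real time. Returns None if nobody in the chain can be reached."""
    for person in chain:
        if confirm(person):
            return person
    return None


# Illustrative chain only; the roles come from the options discussed above.
chain = [
    Authority("Pat", "Major Incident process owner"),
    Authority("Sam", "Office of the CIO"),
    Authority("Lee", "Service Owner"),
]

# Assume, for the example, that only Sam picks up the phone.
declarer = first_reachable(chain, lambda person: person.name == "Sam")
```

The point of the sketch is that the list is ordered and has an explicit "nobody answered" outcome, which is the case your process needs to plan for in advance.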

It is helpful to define some guidelines governing the threshold for Major Incident declaration. My advice would be to not go overboard with this. Some organizations try to define these boundaries to the penny, ‘if we are losing $10,000.01 an hour or more then we have a Major Incident.’ Reality is seldom going to be that precise. Keeping the declaration guidelines general is usually best.

Here are some suggested categories of thresholds you might consider providing guidance around:

  • Service interruption that causes potential impact on safety
  • Impact on vital business processes
  • Impact on teaching and learning (for our higher education clients)
  • Loss of revenue
  • Loss of reputation – Public Perception
  • Compliance – Regulatory breaches
  • Impact on ability to perform work

These are just some commonly used categories. The parameters you develop will vary depending on the specific nature of your industry and organization. For example, one higher education client identified the concept of a ‘Teaching Emergency’:

  • The inability of at least one faculty member to teach OR more than one student to learn, now or within the next 24 hours.
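A guideline worded this plainly can be read as a simple yes/no test. The sketch below is one possible reading of the ‘Teaching Emergency’ definition above. The function and parameter names are hypothetical, and it assumes that "now" can be represented as zero hours until impact.

```python
def is_teaching_emergency(
    faculty_unable_to_teach: int,
    students_unable_to_learn: int,
    hours_until_impact: float,
) -> bool:
    """Return True when the 'Teaching Emergency' guideline is met.

    Assumption: hours_until_impact is 0 for an impact happening now,
    and a positive number for an impact expected in the future.
    """
    # "now or within the next 24 hours"
    within_window = 0 <= hours_until_impact <= 24
    # "at least one faculty member ... OR more than one student"
    people_affected = (
        faculty_unable_to_teach >= 1 or students_unable_to_learn > 1
    )
    return within_window and people_affected


# One faculty member cannot teach right now.
faculty_case = is_teaching_emergency(1, 0, 0)

# A single student is affected, which does not meet "more than one student".
single_student_case = is_teaching_emergency(0, 1, 0)
```

Note how little the rule needs in order to be applied: two counts and a time window. That is the level of generality worth aiming for in declaration guidelines.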

In this post we have provided some suggestions to get you started thinking about how to approach designing an IT Major Incident Management process. In our next article we will continue the topic with a discussion of the process roles and high-level workflow.

If you would like to discuss these concepts further, you can contact us at contact_us@service-catalyst.com or call us at +1.888.718.1708 and let us know you would like to discuss Major Incident Management or anything about ITSM and ServiceNow implementation services.