What Is Incident Management?
Unplanned events can be expected in any industry, but when they happen in IT, they can lead to serious consequences. Identifying an incident quickly and having a plan in place can be the difference between keeping things running smoothly and dealing with a catastrophe. This is where incident management comes into play. Incident management is a process for responding to an unplanned event or service interruption, with the goal of restoring service to its original state. Ideally, service is restored as quickly as possible with minimal lasting impact. In this blog post, we’ll take a look at why effective incident management is important and how to get started on creating a plan so that your association can stay on track.
Service Request vs Incident
A service request is a more routine occurrence: nothing is broken, and service is not impacted. These requests usually ask for information, advice, or access to a service, application, or equipment on behalf of the user. They are common requests with a defined process and a predictable workflow and outcome, so they can be resolved relatively quickly and easily. Some examples of a service request include:
An incident, however, is more serious: it involves an unplanned interruption that can impact the productivity of employees as well as the quality of service for the end user. These disruptions can cost both time and money. Some common incident types are:
An incident management plan will establish a clear process that enables IT to be both consistent and efficient when tackling these unexpected issues.
Incident Management Best Practices
Developing a standard operating procedure gives your IT department a clear, repeatable process for addressing incidents consistently and effectively.
Below, we offer some guidelines to help carve out a clear path to resolution when an unexpected IT incident occurs:
Incidents arise in various ways, usually from a user reporting an issue or from event monitoring triggering an alert. When this happens, acknowledge the issue and, if applicable, reach out to the user for clarification.
Next, log the incident. This is usually done by creating a ticket that includes any important information. Tracking updates within the ticket is helpful as you move through the process.
Then conduct your initial investigation. Start by categorizing the ticket: does the incident affect a system, service, or device? Next, establish urgency and assign a priority based on possible impact. If necessary, link the ticket to related ongoing incidents that may be producing multiple alerts or user calls. Finally, check the knowledge base for known errors or related incidents that may point to a resolution.
If a solution has not yet been identified, the next step is to identify a possible cause so that you can provide a satisfactory solution or workaround. Be sure to record your findings in the incident ticket.
Now the solution is implemented. First, obtain the approval required to apply it. Next, communicate the solution to all relevant parties. If necessary, submit any requests needed to implement it.
Once you’ve confirmed that the affected service has been restored, you can update the incident ticket with resolution details. Be thorough so that you can refer to the ticket for future troubleshooting.
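To make the steps above more concrete, here is a minimal sketch of how a ticket might move through them in code. It is only an illustration under our own assumptions: the `IncidentTicket` class, the `Status` names, and the impact/urgency matrix are hypothetical and are not taken from any particular ticketing tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional


class Status(Enum):
    DETECTED = 1
    LOGGED = 2
    CLASSIFIED = 3
    DIAGNOSED = 4
    RESOLVED = 5
    CLOSED = 6


# Hypothetical matrix: (impact, urgency) -> priority, where 1 is handled first.
PRIORITY_MATRIX = {
    ("high", "high"): 1,
    ("high", "low"): 2,
    ("low", "high"): 2,
    ("low", "low"): 3,
}


@dataclass
class IncidentTicket:
    summary: str
    reported_by: str
    status: Status = Status.DETECTED
    category: Optional[str] = None
    priority: Optional[int] = None
    updates: List[str] = field(default_factory=list)

    def _move(self, allowed_from, new_status, note):
        # Refuse out-of-order steps so every ticket follows the same path.
        if self.status not in allowed_from:
            raise ValueError(
                f"cannot go from {self.status.name} to {new_status.name}"
            )
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        self.updates.append(f"{stamp} [{new_status.name}] {note}")
        self.status = new_status

    def log(self, details):
        self._move({Status.DETECTED}, Status.LOGGED, details)

    def classify(self, category, impact, urgency):
        priority = PRIORITY_MATRIX[(impact, urgency)]
        self._move({Status.LOGGED}, Status.CLASSIFIED,
                   f"category={category}, priority={priority}")
        self.category, self.priority = category, priority

    def diagnose(self, finding):
        self._move({Status.CLASSIFIED}, Status.DIAGNOSED, finding)

    def resolve(self, solution, approved):
        # A known error can skip diagnosis, so CLASSIFIED is also accepted.
        if not approved:
            raise PermissionError("solution needs approval before it is applied")
        self._move({Status.CLASSIFIED, Status.DIAGNOSED}, Status.RESOLVED, solution)

    def close(self, resolution_details):
        self._move({Status.RESOLVED}, Status.CLOSED, resolution_details)


# Made-up example incident, walked through every step.
ticket = IncidentTicket("Email service unreachable", reported_by="monitoring")
ticket.log("Alert fired; users confirm they cannot send mail")
ticket.classify("service", impact="high", urgency="high")
ticket.diagnose("Mail server disk is full")
ticket.resolve("Cleared old logs and restarted the mail service", approved=True)
ticket.close("Service confirmed restored; disk alert threshold lowered")
```

Each step appends a timestamped note to `ticket.updates`, so the closed ticket doubles as a record you can refer back to when troubleshooting later.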
Identify your Incident Types (Source: InfoTech)
This doesn’t mean identifying an incident when it occurs but rather anticipating what could go wrong and having a plan in place. You’re essentially creating a crisis management plan focused specifically on IT. To get started, brainstorm possible scenarios and devise a plan for each one. Start with your most critical assets, those whose failure would have the largest impact on your users or clients. Your runbook is the go-to guide in the event of each incident.
For each incident type, you’ll want to detail its frequency and impact. Some things to consider include:
Incident Type: What is the problem at hand?
Incident Description: Provide details on the incident in question.
Frequency: How often could this type of incident happen?
Expected Trend: Is this frequency likely to change? Are there revisions to controls/processes that can affect frequency?
Functional Impact: To what degree could this affect critical services?
Information Impact: How much sensitive data is affected?
Recoverability Effort: How easy will it be to recover from this incident?
Document Owner: Who will be responsible for this specific incident?
Notes: Include modifications and/or clarifications to provide important details about the incident.
Once you have a thorough list, you can use it to prioritize the order in which you develop your runbooks.
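If you keep this list in a structured form, the prioritization can be sketched in a few lines. The example below is one possible approach, not a prescribed method: the 1 to 3 rating scale, the `score` heuristic, and the sample entries are all assumptions made up for illustration.

```python
from dataclasses import dataclass

# Assumed scale for every numeric rating: 1 = low, 2 = medium, 3 = high.


@dataclass
class IncidentType:
    incident_type: str
    description: str
    frequency: int
    expected_trend: str
    functional_impact: int
    information_impact: int
    recoverability_effort: int
    document_owner: str
    notes: str = ""

    def score(self) -> int:
        # Assumed heuristic: how often it could happen times how much it hurts.
        severity = (self.functional_impact
                    + self.information_impact
                    + self.recoverability_effort)
        return self.frequency * severity


def runbook_order(incident_types):
    # Highest score first: these runbooks get written first.
    return sorted(incident_types, key=lambda t: t.score(), reverse=True)


# Made-up sample entries.
register = [
    IncidentType("Phishing email", "Staff member clicks a malicious link",
                 3, "rising", 1, 2, 1, "Security lead"),
    IncidentType("Printer offline", "Office printer stops responding",
                 2, "stable", 1, 1, 1, "Help desk"),
    IncidentType("Database outage", "Member database is unreachable",
                 2, "stable", 3, 3, 3, "Infrastructure lead"),
]

order = [t.incident_type for t in runbook_order(register)]
print(order)
```

Multiplying frequency by a combined severity is only one way to rank; you may prefer to weight functional impact more heavily or to rank on severity alone.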
Creating Incident Runbooks
A runbook is a document that provides a detailed guide to how your association should handle a given type of incident. Having an incident runbook in place means that your team is equipped with, and well-versed in, the standard procedures needed to respond to and resolve incidents in real time. This is a key factor in setting your association’s procedures for responding to unplanned occurrences. More importantly, it helps avoid chaos by ensuring incidents are addressed consistently.
Each runbook will look different depending on how it is customized. Below is a loose framework for important considerations when creating your own runbooks.
Your runbook can be as robust or basic as you’d like, but it’s important to realize that logging each incident in your runbook not only offers a set process for resolving issues but also provides a roadmap for the future. If you’ve learned something new that can speed up the resolution of future incidents, it can easily be logged and referenced moving forward.
Wrapping IT Up
Incidents in the IT space can lead to serious consequences, which is why having an incident management plan in place is important to restore service in a timely and effective manner. Having an incident runbook in place ensures that teams are equipped with standard procedures for responding to and resolving incidents. Would you like to learn more about implementing an incident management plan for your association? At Cimatri, our team of IT experts are Certified Association Executives (CAE) who have spent over two decades specifically working with associations. Contact us today and let’s discover how we can help you get started on your next IT project!