Definition of Terms
Impact is determined by how many personnel or functions are affected.
There are three grades of impact:
- 3 – Low – One or two personnel. Service is degraded but still operating within SLA specifications.
- 2 – Medium – Multiple personnel in one physical location. Service is degraded and still functional but not operating within SLA specifications. It appears the cause of the Problem falls across multiple service provider groups.
- 1 – High – All users of a specific service. Personnel from multiple agencies are affected. Public-facing service is unavailable.
The impact of the incidents associated with a problem will be used in determining the priority for resolution.
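As a minimal sketch, the three impact grades can be modelled as an enumeration. The rule that a problem takes the widest impact among its associated incidents is an assumption made for illustration; the text above only states that incident impact feeds into priority. The names `Impact` and `problem_impact` are hypothetical.

```python
from enum import IntEnum


class Impact(IntEnum):
    """Impact grades as defined above; a lower number means a wider impact."""
    HIGH = 1
    MEDIUM = 2
    LOW = 3


def problem_impact(incident_impacts):
    """Return the widest impact among a problem's associated incidents.

    Assumption: the problem inherits the most serious grade, which is the
    smallest number. The source document does not prescribe this rule.
    """
    if not incident_impacts:
        raise ValueError("a problem needs at least one associated incident")
    return Impact(min(incident_impacts))
```

For example, a problem linked to one Low and one High incident would be treated as High impact under this assumption.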
An incident is an unplanned interruption to an IT Service or reduction in the Quality of an IT Service.
The failure of any item, software or hardware, used in the support of a system is also an Incident, even if that failure has not yet affected service.
For example, the failure of one component of a redundant high availability configuration is an incident even though it does not interrupt service.
An incident occurs when the operational status of a Production item changes from working to failing or about to fail, resulting in a condition in which the item is not functioning as it was designed or implemented.
The resolution for an incident involves implementing a repair to restore the item to its original state.
A design flaw does not create an incident. If the product is working as designed, even though the design is not correct, the correction needs to take the form of a service request to modify the design.
The service request may be expedited based upon the need, but it is still a modification, not a repair.
A database that contains information on how to fulfil requests and resolve incidents using previously proven methods / scripts.
A Known Error is a problem that has an identified root cause and for which a workaround or (temporary) solution has been identified. This term also describes a fault in the infrastructure that can be attributed to one or more faulty Configuration Items (CIs) and that causes, or may cause, one or more incidents for which a workaround and/or resolution is identified.
Proactive Problem Management
Proactive Problem Management is one of two important Problem Management processes.
It is used to detect and prevent future problems/incidents.
Proactive Problem Management includes the identification of trends or potential weaknesses.
Proactive Problem Management is performed by the Service Operations group.
A problem is the underlying cause of an incident and can be identified in the following ways:
- It is identified as soon as an incident occurs that cannot be matched to existing or recorded problems for which a root cause is to be sought.
- It is identified as a result of multiple Incidents that exhibit common symptoms.
- It is identified from a single significant Incident, indicative of a single error, for which the cause is unknown, but for which the impact is significant (a Major Incident).
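The second and third identification routes above can be sketched as a grouping step. This is a rough illustration only: the tuple layout, the idea of a single "symptom" string per incident, and the function name `candidate_problems` are all assumptions and do not come from any particular ITSM tool. Incidents whose symptom already maps to a recorded problem are skipped.

```python
from collections import defaultdict


def candidate_problems(incidents, known_symptoms, min_incidents=2):
    """Group unmatched incidents by symptom and flag likely new problems.

    incidents: iterable of (incident_id, symptom, is_major) tuples
               (hypothetical record shape).
    known_symptoms: symptoms already linked to a recorded problem.
    A symptom is flagged when several incidents share it, or when a
    single Major Incident exhibits it.
    """
    by_symptom = defaultdict(list)
    major_symptoms = set()
    for incident_id, symptom, is_major in incidents:
        if symptom in known_symptoms:
            continue  # already matched to an existing problem record
        by_symptom[symptom].append(incident_id)
        if is_major:
            major_symptoms.add(symptom)
    return {
        symptom: ids
        for symptom, ids in by_symptom.items()
        if len(ids) >= min_incidents or symptom in major_symptoms
    }
```

In practice, matching on symptoms is rarely an exact string comparison; the exact-match key here just keeps the sketch short.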
The Problem Repository is a database containing relevant information about all problems whether they have been resolved or not.
General status information along with notes related to activity should also be maintained in a format that supports standardised reporting.
Priority is determined by utilising a combination of the Problem’s impact and severity.
For a full explanation of the determination of priority refer to the section of this document titled Priority Determination.
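One common way to combine two graded inputs is a lookup matrix. The sketch below shows the mechanism only: the values in `PRIORITY_MATRIX` are placeholders invented for illustration, and the authoritative mapping is the one given in the Priority Determination section. The names `PRIORITY_MATRIX` and `determine_priority` are hypothetical.

```python
# Keys are (impact, severity), each graded 1 = High, 2 = Medium, 3 = Low.
# PLACEHOLDER VALUES: illustrative only, not the document's actual matrix.
PRIORITY_MATRIX = {
    (1, 1): 1, (1, 2): 1, (1, 3): 2,
    (2, 1): 1, (2, 2): 2, (2, 3): 3,
    (3, 1): 2, (3, 2): 3, (3, 3): 3,
}


def determine_priority(impact, severity):
    """Look up a priority for the given impact and severity grades."""
    try:
        return PRIORITY_MATRIX[(impact, severity)]
    except KeyError:
        raise ValueError(
            f"impact and severity must each be 1, 2 or 3; got {impact}, {severity}"
        ) from None
```

Keeping the mapping in a table rather than a formula makes it easy to replace the placeholder values with the real matrix without touching the lookup code.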
Reactive Problem Management
Reactive Problem Management is one of two important Problem Management processes.
It is used to analyse and resolve the causes of incidents.
Reactive Problem Management is performed by the Service Operations group.
Time elapsed between the time the problem is reported and the time it is assigned to an individual for resolution.
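The elapsed time defined above is a simple difference of two timestamps. A minimal sketch, assuming both timestamps are recorded as `datetime` values in the same time zone; the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta


def response_time(reported_at: datetime, assigned_at: datetime) -> timedelta:
    """Time from when the problem was reported to when it was assigned."""
    if assigned_at < reported_at:
        raise ValueError("assignment cannot precede the report")
    return assigned_at - reported_at
```

Note that this measures wall-clock time; whether out-of-hours periods should be excluded depends on the applicable Service Level Agreement and is not addressed here.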
A Resolution is the correction of a root cause so that the related incidents do not continue to occur.
Request for Change
A Request for Change (RFC) proposes a change to eliminate a known error and is addressed by the Change Management process.
A root cause of an incident is the fault in the service component which made the incident occur.
A Service Agreement is a general agreement outlining services to be provided, as well as costs of services and how they are to be billed.
A service agreement may be initiated between IT Enterprise and another agency.
A service agreement is distinguished from a Service Level Agreement in that there are no ongoing service level targets identified in a Service Agreement.
Service Level Agreement
Often referred to as the SLA, the Service Level Agreement is the agreement between IT Enterprise and the customer outlining the services to be provided and the operational support levels, as well as the costs of services and how they are to be billed.
Service Level Target
Service Level Target is a commitment that is documented in a Service Level Agreement.
Service Level Targets are based on Service Level Requirements, and are needed to ensure that the IT Service continues to meet the original Service Level Requirements.
Service Level Targets should be specific, measurable, achievable, relevant, and timely.
Severity is determined by how much the user is restricted from performing their work.
There are three grades of severity:
- 3 – Low – Issue prevents the user from performing a portion of their duties.
- 2 – Medium – Issue prevents the user from performing critical time sensitive functions.
- 1 – High – Service or major portion of a service is unavailable.
The severity of a problem will be used in determining the priority for resolution.
A workaround is a way of reducing or eliminating the impact of an incident or problem for which a full resolution is not yet available.
Scope of Problem Management
The scope of Problem Management includes a standard set of processes, procedures, responsibilities and metrics that are utilised by all IT Enterprise services, applications, systems and network support teams.
Problem Management includes the activities required to diagnose the root cause of incidents and to determine the resolution to those problems.
It is also responsible for ensuring that the resolution is implemented through the appropriate control procedures, especially Change Management and Release Management.
Problem Management maintains information about problems and the appropriate workarounds and resolutions, so that the organisation is able to reduce the number and impact of incidents over time.
In this respect, Problem Management has a strong interface with Knowledge Management, and tools such as the Known Error Database will be used for both.
Although Incident and Problem Management are separate processes, they are closely related and will typically use the same tools, and use the same categorisation, impact and priority coding systems.
This will ensure effective communication when dealing with related incidents and problems.
Inputs and Outputs
Inputs to the Problem Management Process include the following:
- Problem records
- Incident details
- Configuration details from the Configuration Management Database.
- Supplier details about the products used in the infrastructure.
- Service Catalog and Service Level Agreements.
- Details about the infrastructure and the way it behaves, such as capacity records, performance measurements, Service Level reports, etc.
Outputs of the Problem Management Process include the following:
- Problem records
- Known Error Database
- Requests for Change
- Closed Problem records
- Management information
Metrics reports should generally be produced monthly with quarterly summaries.
Metrics to be reported are:
- Total numbers of problems (as a control measure).
- Breakdown of problems at each stage (e.g. logged, work in progress, closed etc.)
- Size of current problem backlog.
- Number and percentage of major problems.
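The metrics listed above could be derived from a list of problem records along the following lines. This is a sketch under stated assumptions: each record is a dict with a `stage` string and a `major` flag, and every problem not in the `closed` stage counts toward the backlog. Neither the record shape nor the backlog rule comes from the source.

```python
from collections import Counter


def problem_metrics(problems):
    """Summarise problem records for a periodic metrics report.

    problems: list of dicts with 'stage' (str) and 'major' (bool) keys
              (hypothetical record shape).
    """
    total = len(problems)
    by_stage = Counter(p["stage"] for p in problems)
    # Assumption: anything not yet closed is part of the backlog.
    backlog = sum(1 for p in problems if p["stage"] != "closed")
    major = sum(1 for p in problems if p["major"])
    major_pct = round(100 * major / total, 1) if total else 0.0
    return {
        "total": total,
        "by_stage": dict(by_stage),
        "backlog": backlog,
        "major": major,
        "major_pct": major_pct,
    }
```

Running the same function over a month's records and over a quarter's records would give the monthly report and the quarterly summary respectively.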