Consulting, training, and coaching to help you solve messy problems that disrupt your operations or cause customers to take their business elsewhere.
Leveraging a Crisis:
How to Learn the Truth About What Really Happened and Use It to Change How You Do Business
by Jeanne Sawyer

Like other high-tech companies, a major networking company (we'll call it "BigNet") has found that keeping up with rapid growth presents a significant challenge. In particular, growth stressed the Customer Service Organization's escalation process to the point where it could no longer effectively meet either customer needs or BigNet's own internal needs. Fundamentally, the process was not designed to handle the growing numbers of customers or the increasing diversity in customer and BigNet requirements. The result was increasing dissatisfaction and frustration on the part of everyone who participated in identifying and resolving escalated problems. Increasing numbers of problems were treated as exceptions, which is expensive and time-consuming for all concerned. Relief was needed urgently, but where to begin?

This article describes the true story of how BigNet approached this business process problem.

Peter, the Vice President of Customer Service, decided to take a different approach than usual for such projects. Rather than chartering a team to draw "as is" and "should be" maps of the escalation process, Peter decided to choose a specific incident that was a representative demonstration of what happened to the overloaded process. With this approach, if the escalation process wasn't really causing the problem or was only a partial contributor, we would find out. He would analyze that incident using a method called TØ Root Cause Analysis, a tool specifically designed to learn the truth about complicated incidents. The method focuses actions on preventing recurrences and improving recovery when problems occur. This approach promised, and delivered, a quick and accurate way to understand exactly what was really happening and to identify the most immediate opportunities for improvement.

Peter and his team chose the most recent automatic teller machine (ATM) outage in a particular customer's network to be the TØ event. This customer had experienced a series of outages and was so angry and frustrated by the constant problems that BigNet was worried about losing a major customer. Such outages are particularly complicated to resolve because so many physical components, software products and different companies are involved. In this case, BigNet provided the network management hardware and software.
Other participants included the bank that owned and operated the ATM, three phone companies, contractors responsible for installing and maintaining the cabling, and several contractors responsible for parts delivery and providing field engineers. There had been numerous meetings among technical staff and executives from the participating companies, but outages continued to occur and take too long to resolve. The outage selected for analysis was clearly representative of an escalated situation!
Peter wanted to use TØ Root Cause Analysis (RCA) because it focuses on an event, or specific incident. Since an incident is something undesirable that happens at an identifiable time and place (the TØ event), the analysis begins from an objective viewpoint. Everybody could easily agree that ATM outages are undesirable. In an already volatile relationship, this was an important advantage over other root cause analysis methods, which require an agreed problem definition before you can begin.

Beyond discovering the cause of the incident, TØ RCA uncovers what happens during recovery. Speedy recovery when a crisis occurs can substantially reduce the impact of the incident, so it is highly beneficial to identify root causes and prevent recurrence of problems in the recovery process. In this example, the ATM outage lasted 22 hours, and several months later, BigNet, the customer and the other participating companies were still dealing with follow-on problems associated with the outage. Thus the ability to improve the recovery process was also a major motivation in choosing to perform TØ RCA.

A diagram illustrates the picture that TØ RCA creates. "Still water" before the TØ incident represents the time when everything is completely normal, with no ripple hinting at what is to come. "Still water" after the TØ incident represents the time when recovery is complete and there are no lingering traces of what occurred.

TØ RCA Pays Off in Improved Application Availability and Improved Productivity

Performing TØ RCA is intensive work and, like any such investment, must yield specific benefits. When applied to an appropriate incident, TØ RCA improves application availability for end users and improves productivity for all parties concerned. In the case discussed here, that includes technical staff and executives from BigNet as well as from the customer and all the other participating companies.
The productivity improvements derive from the following specific benefits of using TØ RCA:
Capitalizing on Problem Occurrences

TØ RCA takes advantage of the fact that problem incidents don't just happen for no reason. By studying an incident as the direct outcome of a series of events, we can break the problem into understandable and manageable pieces: we draw the picture of how the precursor events interact and lead directly to the TØ incident. Similarly, recovery from an incident is the direct outcome of the activities that take place following the TØ incident. The recovery can be speedy and effective or less than terrific. The TØ incident is the basis for a discovery process that gives us the information we need to prevent future occurrences of a whole class of incidents.

The process has seven steps:
1. Define the TØ event precisely.
3. Identify "key problematic events."
4. Analyze apparent causes.
6. Identify corrective actions.
The escalation process, originally targeted as the culprit, had little to do with the problem. Redesigning it would have had no impact on the real issues that were frustrating the bank. Instead, the real root causes were discovered. ATMs are now installed correctly the first time, and other issues that were uncovered in the analysis have also been addressed. The bank, which was almost ready to change vendors, is now BigNet's largest support customer.

TØ Root Cause Analysis is a straightforward way to find out why something happened and leverage that knowledge to prevent similar incidents in the future. It also enables speedier recovery when incidents do occur. One significant incident is the basis for a discovery process: we analyze it as the result of a chain of events. By preventing recurrence of individual problematic events, we reduce the chances of the TØ outage incident recurring. Similarly, we eliminate individual problematic events in the recovery process. This practical approach keeps things manageable. The end result is to improve service and save money for customers and suppliers alike.
Solving Problems Permanently℠
JSawyer@SawyerPartnership.com
tel. 408-929-3622
Copyright ©2010.
The Sawyer Partnership. All rights reserved.
Jeanne Sawyer, Ph.D.