Steps to Solving Problems Effectively
1: Define Success Criteria and Measure Performance
2: Identify Priorities and Risks
3: Develop and Execute Action Plan
4: Measure Results
5: Celebrate
A crisis was coming. It
was the middle of April, and busy season would begin on July 1, with a seasonal 40%
increase in call volume. Busy season for this major utility company's multi-site call
center would last until September 1. The new computer systems were anything but stable,
and performance was inadequate even at the current volumes. The utility company and two
key vendors were frantically trying to resolve the multitude of problems, but it was
obvious to all concerned that the current path was leading to a crisis. At risk were
millions of dollars and everyone's reputation. Failure would be national front-page
news.
This true story of how three
companies implemented a successful "Busy Season Crisis Avoidance" (BSCA) project
provides a step-by-step example of how customers and vendors can solve problems
effectively and build a real partnership by choosing where they want to go and defining
an effective route to get there.
Step 1: Define Success Criteria and Measure Performance
Identifying and
defining any problem begins with data: we start by defining what success means for all
participants in measurable, objective terms. This tells us what to measure and defines
both "threshold" and "optimal" levels of performance. The threshold
level defines the point at which we fail; the optimal level is where we really want to be.
The question to answer is, "How will we know the problem is solved?" or, as in this
example, "How will we know that a crisis was successfully avoided?" This key step forces
us to focus carefully, get specific, and
agree about what is important.
However, we can use the success criteria for more
than simply answering "yes" or "no" to whether we solved the problem.
We can track trends in these key measures to help us:
- Determine the truth about what is actually
occurring,
- Understand and plan for risks and vulnerabilities,
- Focus action on achieving specific results, and
- Verify that the action taken achieved those
results.
In the BSCA project, the success criteria were
defined and agreed to in an intensive, facilitated all-day work session. Participants
included key management and technical representatives from both vendors and the customer.
It was hard work. The group struggled with limiting the success criteria to the minimum
required to survive busy season, without adding "would-be-nice" objectives. If
extras were added, no matter how desirable for other reasons, the work required to survive
busy season would be diluted or even obstructed. Survival depended on our ability
to focus on exactly what was needed to survive busy season: nothing more, nothing
less. This was not the time for stretch goals.
Choosing the right metrics was the hardest part
because everyone had to discipline their thinking differently than they ever had before.
The first difficulty was staying focused on defining survival. The second difficulty was
pinning it down to specific, measurable criteria. The third difficulty was accepting that,
for this project, establishing thresholds that were less than perfect was not only OK, but
necessary. We succeeded by brainstorming possible metrics, then questioning each
rigorously:
- Was the metric important to surviving busy season?
If we did not make the threshold, would the president of the utility, or the utilities
commission, be calling to ask what happened? If we did achieve it, did we deserve
congratulations and a party?
- Was the metric specific and measurable? Would we
know, unambiguously, whether or not we were successful?
After much discussion, the group agreed on
threshold and optimal levels of system performance and availability that, if met, would
define success. These included such measures as system response time, numbers of timeouts,
amount of time lost to unplanned outages, and occurrences of databases not being
synchronized.
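The idea of pairing each metric with a threshold level and an optimal level can be sketched in a few lines of Python. This is only an illustration: the class name, the status labels, and the numbers are hypothetical, and it assumes a lower-is-better metric such as timeouts or response time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessCriterion:
    """One success metric where lower values are better (assumption)."""
    name: str
    optimal: float    # where we really want to be
    threshold: float  # the point at which we fail

    def status(self, value: float) -> str:
        """Classify a single measurement against the two agreed levels."""
        if value > self.threshold:
            return "failed"
        if value <= self.optimal:
            return "optimal"
        return "acceptable"


# Hypothetical levels, not taken from the BSCA project.
timeouts = SuccessCriterion(
    name="retrieval timeouts during busy hour", optimal=5, threshold=20
)

for measurement in (3, 12, 25):
    print(measurement, timeouts.status(measurement))
```

Writing both levels down next to the metric name keeps the "would-be-nice" goals out: a measurement either fails, passes, or reaches the level the group actually wanted.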
Once the metrics were defined, it was relatively
straightforward to begin collecting the data to demonstrate how we were progressing. For
BSCA, updated graphs were produced and reviewed weekly. This allowed us to detect and
intervene early if any trend lines headed in the wrong direction or got dangerously close
to the threshold levels. Because we started collecting data at the same time we started
taking action, we were not able to demonstrate improvement over previous performance. This
was not important, however, since we really didn't care about details of past
history. What was important was that the metrics showed our intervening actions were
effective at keeping performance below the trouble threshold. The figure below shows an
example of how we tracked the data from the beginning of the project through busy season
for retrieval timeouts during busy hour. The project began in April. The immediate,
positive impact is clear, and the results were better than anyone thought possible at the
beginning of the project.
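A weekly review of this kind can be approximated with a small check that flags a metric when it exceeds the threshold, gets close to it, or keeps rising. The sketch below is an assumption about how such a check might look, not the tooling used in BSCA; the function name, the 80% warning margin, the three-week window, and the sample data are all hypothetical.

```python
def review(values, threshold, warn_fraction=0.8, window=3):
    """Return warnings for a lower-is-better metric measured weekly.

    values:        weekly measurements, oldest first
    threshold:     the level at which we fail
    warn_fraction: how close to the threshold counts as "dangerously close"
    window:        number of most recent weeks used to detect a rising trend
    """
    warnings = []
    latest = values[-1]
    if latest > threshold:
        warnings.append("threshold exceeded")
    elif latest >= warn_fraction * threshold:
        warnings.append("close to threshold")

    # A trend is flagged only when every week in the window is worse
    # than the week before it.
    recent = values[-window:]
    if len(recent) == window and all(b > a for a, b in zip(recent, recent[1:])):
        warnings.append("rising trend")
    return warnings


# Hypothetical weekly timeout counts against a threshold of 20.
print(review([18, 12, 9, 7], threshold=20))   # steadily improving
print(review([5, 7, 9, 17], threshold=20))    # rising and near the limit
```

The point of the check is early warning: a rising trend is reported while the metric is still below the threshold, which leaves time to intervene.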

Step 2: Identify Priorities and Risks
The next step is to identify priorities
based on the success criteria. By analyzing the data, we can choose the areas to focus on
that are most important to achieving our success metrics. For example, for BSCA, if
timeout levels had started to rise, we knew we would have to investigate why and address
that specific problem immediately. If we did not, timeouts would soon exceed the threshold
level, and we would not survive busy season. The defined success criteria dictate
the priorities: we would focus on anything that threatened our success metrics, and
not waste our time on anything that had little or no impact on those metrics.
Step 2 also requires that we analyze the
situation specifically for risks. We consider both threats to our business that could
result from attempting to address the issue and getting it wrong (or from not addressing
the issue at all) and vulnerabilities, or possible obstacles to success. What could
go wrong? Once we know what could go wrong, we can take steps to eliminate the
possibility, or at least contain the impact.
Priorities and risks are always evaluated with
respect to their impact on our ability to achieve the success criteria. The goal is to
resolve first the issues with the largest negative impact on, in this case, the
performance and availability metrics. For example, everyone realized that the software
vendor had been making numerous changes to the application software, primarily to make
available new features that the customer requested. However, the constant change was a
major contributor to making the system unstable, a consequence that would cause BSCA to
fail. Once we agreed that surviving busy season was more important than the new features,
and once we understood from the data how much system availability suffered when the
software was changed, it was easy to agree that no new software would be installed during
busy season unless it was to correct a bug that threatened our performance and
availability metrics, i.e., our success criteria.
Similar analysis enabled the customer to decide
to wait until after busy season to retire some old equipment that the new system would
eventually make unnecessary. Although the customer had obviously sound reasons to want the
inventory off the books as early as possible, it wasn't worth jeopardizing busy
season survival with yet another major change. The success criteria gave us the ability to
say "no" or "not now" to changes that previously had been considered
mandatory.
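The rule the group agreed on (no changes during busy season unless they correct a bug that threatens the success metrics) is simple enough to express as a decision function. The sketch below is illustrative only: the names are hypothetical, the year is invented because the article does not give one, and treating every change outside busy season as installable is a simplification.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ChangeRequest:
    description: str
    install_date: date
    fixes_metric_threat: bool  # corrects a bug that threatens a success metric


def decide(request, season_start, season_end):
    """Return "install" or "defer" for one proposed change."""
    in_busy_season = season_start <= request.install_date <= season_end
    if in_busy_season and not request.fixes_metric_threat:
        return "defer"  # "not now": wait until busy season is over
    return "install"


# Busy season ran from July 1 to September 1; the year here is hypothetical.
start, end = date(2024, 7, 1), date(2024, 9, 1)

new_feature = ChangeRequest("requested new feature", date(2024, 7, 15), False)
bug_fix = ChangeRequest("fix for retrieval timeouts", date(2024, 7, 15), True)
retirement = ChangeRequest("retire old equipment", date(2024, 9, 15), False)

for request in (new_feature, bug_fix, retirement):
    print(request.description, "->", decide(request, start, end))
```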
Step 3: Develop and Execute the Project Plan
With the priorities and risks clearly
identified, the next step is to decide what to do about them and then do it. For
each priority and risk, we take the following steps:
- Chunk it into component issues and define success
criteria to verify that each issue is resolved,
- For each issue, analyze for root causes and verify
that the causes are real and important to achieving the success criteria,
- For each cause, identify what deliverables will
enable us to eliminate the cause, and
- For each deliverable, identify what actions we
must take to create the deliverable.
The actions are the bottom line: until somebody
does something to change things, we'll keep getting what we're getting. Taking
action to avoid or mitigate risks is as critical as taking action to resolve issues that
are directly causing the problem.
In BSCA, for example, we identified one major
chunk as the lack of change management. Individuals from the utility and the vendors would
change things on the system according to their independent requirements, usually with
inadequate planning and often conflicting with each other. We determined that one of the
causes of the change management problems was that there was no procedure for authorizing
any given change or determining when that change should be implemented. Deliverables
included documented procedures, a change request form that had to be approved before the
change could be implemented, and change criteria for deciding whether a change was
essential to BSCA and therefore should be implemented during busy season.
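The breakdown from issue to cause to deliverable to action forms a small tree, and the change management example can be written out that way. The following is a minimal sketch: the class and function names are hypothetical, and the individual actions are invented for illustration because the article does not list them.

```python
from dataclasses import dataclass, field


@dataclass
class Deliverable:
    name: str
    actions: list = field(default_factory=list)


@dataclass
class Cause:
    description: str
    deliverables: list = field(default_factory=list)


@dataclass
class Issue:
    name: str
    causes: list = field(default_factory=list)


def action_list(issue):
    """Flatten the tree into (cause, deliverable, action) rows."""
    return [
        (cause.description, deliverable.name, action)
        for cause in issue.causes
        for deliverable in cause.deliverables
        for action in deliverable.actions
    ]


def deliverables_without_actions(issue):
    """Deliverables nobody is working on yet: nothing changes until someone acts."""
    return [
        deliverable.name
        for cause in issue.causes
        for deliverable in cause.deliverables
        if not deliverable.actions
    ]


change_management = Issue(
    name="lack of change management",
    causes=[
        Cause(
            description="no procedure for authorizing or scheduling a change",
            deliverables=[
                # The actions below are hypothetical examples.
                Deliverable("documented procedures",
                            ["draft procedure", "review with both vendors"]),
                Deliverable("change request form", ["design form"]),
                Deliverable("change criteria"),  # no actions assigned yet
            ],
        )
    ],
)

print(len(action_list(change_management)), "actions planned")
print("needs actions:", deliverables_without_actions(change_management))
```

Listing deliverables that have no actions is a quick way to spot parts of the plan that exist only on paper.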
Step 4: Measure Results
As actions are implemented, we measure how
we're doing by continually checking our success criteria metrics. If the metrics
indicate an undesirable trend, or worse, that we've exceeded the threshold levels, we
immediately re-evaluate what we're doing to find out why and adjust course as
necessary. In this case, it is clear from the steady downward trend of the number of
timeouts that the actions are having the desired effect. Some of the other metrics were
less ideal, but in every case the metric provided advance warning and we were able to
identify what caused the reversal and correct it before we were back in serious trouble.
Step 5: Celebrate
Avoiding a major crisis is hard work and
the people who achieve it deserve congratulations. This key step is often overlooked,
perhaps because new issues are always ready for our attention or perhaps because it's
hard to identify when a crisis has been avoided. In any case, avoiding the crisis is far
more beneficial to the customer and vendors alike, and we must take care to identify when
that occurs and reward the individuals who make it happen.
The success of BSCA was celebrated in
mid-September, after the seasonal increase was over and the metrics were available that
proved we had survived. In this case, everyone who participated in the effort from the
utility and both vendors was invited to a special breakfast hosted by senior executives
from all three companies. The charts demonstrating the achievement were proudly displayed.
The busy season crisis was avoided, and
measurable results proved it. The party was great, the customer executives got their
bonuses, and the vendor got more business. Nobody made front-page news.