The Risks You Can’t Foresee
Idea in Brief
The Problem
Even a company with a world-class risk management system will come up against novel risks it has not planned for.
Why It Happens
Some risks are so remote that no manager imagines them. And even if the firm does envision them, it may be unwilling to invest in the capabilities and resources to cope with them because they seem so unlikely.
The Solution
Recognize novel risks by being alert for anomalies, interpreting reports from the field, and scanning for unusual events outside your industry. Once you’ve identified a novel risk event, mobilize an incident team or empower your people on the front lines to deal with it quickly.
Well-run companies prepare for the risks they face. Those risks can be significant, and while they’re not always addressed successfully—think Deepwater Horizon, rogue securities traders, and explosions at chemical plants—the risk management function of a company generally helps it develop protocols and processes to anticipate, assess, and mitigate them.
Yet even a world-class risk management system can’t prepare a company for everything. Some risks are so remote that no individual manager or group of managers could ever imagine them. And even when firms envision a far-off risk, it may seem so improbable that they’re unwilling to invest in the capabilities and resources to cope with it. Such distant threats, which we call novel risks, can’t be managed by using a standard playbook.
In this article we’ll explore the defining characteristics of these risks, explain how to detect whether they’ve materialized, and then describe how to mobilize resources and capabilities to lessen their impact.
What Makes Risks Novel
Unlike the more-familiar and routine risks a company faces, novel risks are difficult to quantify in terms of likelihood or impact. They arise in one of three situations:
The triggering event is outside the risk bearer’s realm of imagination or experience or happens somewhere far away.
These kinds of events are sometimes labeled black swans, but they’re not inherently unpredictable. The global financial crisis of 2008, for instance, has often been described as a black swan because most banks investing in and trading mortgage-backed securities were blind to the risks embedded in their portfolios. They didn’t envision a general decline in real estate prices. A small number of investors and banks familiar with real estate and financial markets, however, did anticipate a mortgage market meltdown and earned huge profits by shorting mortgage-backed securities.
Often, unforeseen risks arise from distant events at a company’s supplier. Take the case of a small fire in a Philips semiconductor plant in Albuquerque, New Mexico, in March 2000. Triggered by a lightning strike, it was extinguished by the local fire department within minutes. The plant manager dutifully reported the fire to the plant’s customers, telling them that it had caused only minor damage and that production would resume in a week. The purchasing manager at Ericsson, a major customer, checked that his on-hand inventory of the plant’s semiconductors would meet production needs over the next couple of weeks and didn’t escalate the issue.
Unfortunately, the fire’s smoke and soot and the extensive hosing of the facility had contaminated the clean rooms where highly sensitive electronic wafers were fabricated, and production didn’t restart for several months. By the time the Ericsson purchasing manager learned about the delay, all alternative suppliers of several of the plant’s wafers had already been committed to other companies. The component shortages cost Ericsson $400 million in lost revenues from the delayed launch of its next-generation mobile phone and contributed to its exit from this market the following year.
Multiple routine breakdowns combine to trigger a major failure.
Large, interconnected technologies, systems, and organizations can lead to a situation in which a number of events, each manageable in isolation, coincide to create a “perfect storm.” Consider Boeing’s development of the 787 Dreamliner. For this plane, Boeing introduced new structural materials—composites rather than aluminum—to make the airframe lighter; required its first-tier suppliers to take unprecedented responsibility for design, engineering, and the integration of subassemblies; and replaced the hydraulic controls used in previous generations of aircraft with electronic controls that required large lithium batteries for backup. A Boeing engineer interviewed in the Seattle Times in 2011 noted that compared with all prior models, the 787 was “a more complicated airplane, with newer ideas, new features, new systems, new technologies.”
Boeing experienced seven major and unexpected delays to the 787’s development, with commercial flights beginning three and a half years later than originally planned. The delays added more than $10 billion in development costs and forced Boeing to purchase a major supplier to prevent its insolvency. After the 787 was launched, its onboard lithium batteries caught fire during a number of flights, which led authorities to ground all the planes for several months. The company told Reuters, “We made too many changes at the same time—new technology, new design tools, and a change in the supply chain—and thus outran our ability to manage it effectively.”
The risk materializes very rapidly and on an enormous scale.
Organizations train personnel, design equipment, and map out responses to address foreseeable risks but judge it impractical or uneconomical to prepare for events that are beyond a certain magnitude. Some events, moreover, are so huge that they make even the best cost-benefit analysis obsolete and happen so fast that they overwhelm planned responses. We call this category tsunami risks, after the Fukushima nuclear plant catastrophe in Japan, an archetypal example.
Fukushima, like many other power plants in Japan, had been designed to withstand rare events such as earthquakes and ocean waves up to 5.7 meters high. But the Tōhoku earthquake in March 2011 generated a remarkable 14-meter-high tsunami that swept over the plant’s seawall, filling its basements and knocking out the emergency generators at the plant, which had already suffered severe damage from the quake. The impact was overwhelming: The plant had three nuclear meltdowns and three hydrogen explosions, releasing radioactive contamination throughout the local region and forcing more than 100,000 people to evacuate. During the next three years, Tokyo Electric paid out more than $38 billion to compensate individuals and businesses for the disruption.
The Covid-19 pandemic is similar. The world was already familiar with managing global outbreaks of viruses that cause acute respiratory symptoms, including the SARS epidemic in 2003, H5N1 "avian" flu from 2004 to 2006, and H1N1 in 2009. The SARS-CoV-2 coronavirus, despite being closely related to the SARS virus, was novel because infected people could be contagious while remaining asymptomatic for an extended period, spreading the virus much farther and faster than most national health care systems had planned for.
Companies can sometimes avoid the worst consequences of novel risks by using scenario analysis, a routine risk management tool, to identify them and then taking action to mitigate them. But even if applied frequently, this technique will not cover all eventualities, and sooner or later companies will confront risks they’re unprepared for.
Recognizing Novel Risks
The clearest signal that a novel risk is emerging is anomalies—things that just don’t make sense. This sounds obvious, but most anomalies are difficult for people to recognize or process.
Take two of the cases already described. An experienced purchasing manager for semiconductors should arguably have realized that the soot, smoke, and large quantities of water that accompany even a minor fire could compromise the integrity of clean rooms. A senior risk manager at Boeing, presumably familiar with complex engineering projects, should have anticipated that novel risks could arise in the development of a plane when first-tier suppliers were performing major tasks they had never done before, the plane incorporated materials never used before at such a scale in a large aircraft, and familiar analog hydraulic controls were replaced with entirely new electronic ones.
Failures to pick up signals are rooted in well-documented biases. Decades of behavioral research show that people pay attention to information that confirms their beliefs but disregard it when it conflicts with them. They often dismiss repeated deviances and near misses as mere blips. This “normalization of deviance” gets reinforced by groupthink, which causes team leaders to suppress or ignore concerns and anomalies reported by lower-level personnel.
Biases are also often reinforced by standard procedures. In 1998, for example, a Deutsche Bahn high-speed train derailed in Lower Saxony, Germany, killing 101 people and seriously injuring 88 others. But the accident could have been avoided. A passenger had seen a large piece of metal (later determined to have been a section of a wheel) emerge from the floor into a cabin, where it became wedged between two passenger seats. Yet he didn’t activate a nearby emergency brake, because a prominently displayed sign warned that travelers would be subject to a large fine if they pulled the brake without authorization—a measure intended to prevent unnecessary train stoppages.
The passenger dutifully went to find a conductor, who had the authority to activate the brake but still failed to do so. When the conductor was sued for negligence by Deutsche Bahn, he successfully defended his actions by claiming that he had followed an established rule that required him to visually inspect any problem (which in this case was several carriages away) before triggering an emergency stop. His adherence to the protocol for managing a routine risk delayed his response to the novel event—with catastrophic consequences.
The bottom line is that recognizing a novel risk requires people to suppress their instincts, question their assumptions, and think deeply about the situation. This System 2 thinking, as Daniel Kahneman terms it, is unfortunately more time-consuming and more demanding than making a rapid evaluation and following the rules. And in cases like the train derailment, the pressure of the moment makes it more, rather than less, likely that people will default to their instinctive thinking mode. Given those problems, companies cannot rely on managers familiar with routine risk protocols to identify novel risks. They should instead:
Empower a senior executive to worry about what could go wrong.
At Nokia, another large customer of the Philips Albuquerque semiconductor plant, information about any unusual event in a supply chain had to be reported to a senior vice president of operations, logistics, and sourcing. This executive, who had few day-to-day operational responsibilities, served as the company’s top troubleshooter, or—as we like to say—its “chief worry officer.”
This role differs from that of a traditional chief risk officer, whose priorities are to improve the management of known routine risks and to identify new risks that can then be transformed into manageable routine risks. By contrast, the worry officer has to quickly recognize the emergence of any novel risk and mobilize a process for addressing it in real time.
When Nokia’s purchasing manager received the call about the plant fire, he checked that existing inventory levels were adequate and logged it as a routine event, just as his Ericsson counterpart had done. But following protocol, he reported it to the senior VP as a supply chain anomaly. The VP investigated further and learned that parts shortages from the plant could potentially disrupt more than 5% of the company’s annual production.
The VP mobilized a 30-person multifunction team to manage the potential threat. Engineers redesigned some chips so that they could be obtained from alternative sources, and the team quickly purchased most of the remaining chips from other suppliers. But there were two types of chips for which Philips was the only supplier. The VP called the Nokia CEO, reaching him on the corporate plane, briefed him about the situation, and got him to reroute the plane to land in the Netherlands and go meet with Philips’s CEO at Philips headquarters.
After the meeting the two companies agreed that “Philips and Nokia would operate as one company regarding those components,” according to an interview the troubleshooter gave the Wall Street Journal. In effect, Nokia could now use Philips as its captive supplier for the two scarce chips. The relationship allowed Nokia to maintain production of existing phones, launch its next generation of phones on time, and benefit when Ericsson exited the mobile phone market.
Digitize event reporting.
Digital technology can be a powerful tool in the search for anomalies, as the experiences of the Swiss electricity utility Swissgrid illustrate. Through a user-friendly mobile app, RiskTalk, Swissgrid’s employees can quickly report safety violations, maintenance problems, and imminent equipment failures. A rotating group of risk, safety, and quality managers monitor the app’s messages in a central control room, applying data analytics to connect the dots between these small and unrelated reports and identify potential novel risks. A control room manager who believes that a low-probability novel risk might materialize can analyze it more deeply to determine whether to implement a nonroutine response. In effect, members of the team serve as the company’s chief worry officers, empowered to think deeply about and respond quickly to novel risks.
In addition to encouraging employee reports, companies can look outside their organizations for information about novel risks. Swissgrid has joined forces with the Swiss army, the Swiss national police force, and several other federal and state agencies and corporations to develop a real-time national crisis-management platform that can be accessed by all parties involved. Each entity uses the platform to report any issue it learns about, such as a forest fire, an accident triggering a massive traffic jam, or unusual snow conditions or avalanches in the Alps. Risk managers at Swissgrid, connected to the platform, get early visibility into external situations that could potentially interrupt the reliable flow of electricity to customers.
Imagine what if.
Companies can also identify potential novel risks indirectly—by looking at what has happened in other industries and countries and then asking themselves, “What if that happens here?”
At Swissgrid the senior risk officer keeps an eye out for unsettling developments like the Swissair bankruptcy and the high-profile cyberattack on the shipping giant Maersk. Following any such event, he schedules an extraordinary-risk workshop attended by senior managers and risk officers from every business unit and by external subject-matter experts. After deliberation, the group creates an action plan that can be deployed should something similar occur in Swissgrid’s supply chain. This systematic process helps the company spot potential novel risks and transform them into managed ones.
As Swissgrid’s CEO, Yves Zumwald, has noted, “Our business, with individual risks and intricate connections spread across all our units, is too complex for any one individual to fathom. Yet we cannot wait for problems to show up and then solve them like firefighters. [The systems we have put in place] enable us to solve a lot of problems proactively.” Those now include many risks that would be complete surprises to most other companies.
Responding to Novel Risks
For all a company’s efforts to anticipate what-ifs, novel risks will still emerge, and companies will not have a script or a playbook for managing them “right of boom,” or after disaster has struck. Also, nothing in the backgrounds of operating or risk managers will help them respond quickly and appropriately. In this situation a company needs to make decisions that are (a) good enough, (b) taken soon enough to make a difference, (c) communicated well enough to be understood, and (d) carried out well enough to be effective until a better option emerges. A company has two options for right-of-boom responses:
Deploy a critical-incident-management team.
This standard approach to a novel risk—creating a central team to oversee the response—works well when an event has widespread impact but doesn’t need a complete, immediate solution.
The team should consist of employees from different functions and levels of the company, external people with relevant expertise, and representatives of stakeholders and partners. For a novel event such as the Covid-19 epidemic, for example, a company’s critical-incident team would need people with medical, public health, and public policy expertise, which the firm might not have in-house. For managing the consequences of delays in large-scale product development—for instance, for a new aircraft—the team should work closely with its suppliers. Over time, as the situation changes and new information emerges, the membership of the team may change.
The team deciphers the situation, identifies the most important issues, and establishes priorities among the firm’s multiple, and sometimes competing, constituencies and interests. It can delegate specific questions, such as how to access and preserve cash and how to manage key components in the supply chain, to other individuals or subgroups to examine, but the team must maintain responsibility for coordinating all aspects of the response.
The team usually meets at least daily and more often if the event is evolving rapidly. It manages communication within the firm and coaches the CEO on external communications. All communications should be brutally honest about the reality of the situation, highlight clearly what the organization doesn’t yet know, provide a rational basis for hope, and empathize with all stakeholders affected by the event.
The discussion dynamics are important. A critical-incident team brings together diverse individuals who may have never met before and might be reluctant to speak candidly among people they don’t know, especially those higher up in the organization. The aim is to encourage inquiry, not advocacy, which is why meetings must be psychologically safe gatherings where everyone can offer untested ideas and disagree. What is right is far more important than who is right. That’s partly why someone other than the team’s leader should facilitate meetings. By listening rather than speaking, the leader reduces the likelihood that subordinates will defer to their perception of the chief decision-maker’s opinion.
Manage the crisis at the local level.
Some novel risks don’t allow for the luxury of a critical-incident team. Time is of the essence, and details about the situation are difficult to communicate to company headquarters far from where the threat has emerged. In those situations, responses must be delegated to personnel closest to the event.
Take Adventure Travel Agency (not its real name), a Boston-based company offering trips off the beaten track to experienced travelers. It initially employed U.S. tour guides who were familiar with its targeted customers. But the CEO soon learned, painfully, that any trip could involve accidents, illness, and disruptions from extreme weather, natural disasters, political unrest, hotel cancellations, airline delays, and strikes. Novel risks came with the business’s territory.
In a lengthy, costly process, the company replaced its American guides with local guides in each country, who had considerable knowledge of their regions and strong local contacts. It empowered the new guides to problem-solve and implement a response to any novel situation that arose during a trip. The company believed that the guides had the best information about challenges that might come up; the best knowledge, connections, and resources to develop creative responses; the best understanding of the tour group’s preferences regarding responses; and the ability to put the chosen solution quickly into effect. The company’s headquarters assisted them by performing tasks best handled by a central staff (such as rescheduling flights and rebooking hotel reservations).
The travel company’s decentralized approach of authorizing operations people to also serve as risk managers departs from established risk management standards. But for a distant novel-risk event requiring an immediate response, centralized risk managers would have limited information about the event, be unaware of local options and preferences, and have little to no ability to rapidly implement a response.
The OODA Loop
The OODA loop—observe, orient, decide, act—was devised by a Korean War–era fighter pilot, Colonel John Boyd, who believed that pilots whose OODA loops were faster than those of their adversaries would control air battles. After a novel risk event, a critical-incident team with an OODA loop that outpaces changes in the environment will better control the event’s impact on the company.
Initially, the team observes to learn all it can about the situation. The team orients itself by making sense of the situation and identifying its key elements. Members generate options, assess the likely consequences of each, select the best one, and take steps to implement the chosen response—treating the decision not as a permanent commitment to a course of action but as part of an ongoing experiment. The team begins the next OODA loop by observing the event’s evolution—particularly how its own actions modified the situation.
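The cycle described above can be sketched as a simple iterative program. This is purely an illustrative model, not Boyd's formulation or a tool the companies in this article use; every function name and data structure here is hypothetical:

```python
# Illustrative sketch of an OODA loop as an iterative decision cycle.
# All names and structures are hypothetical, for exposition only.

def observe(environment):
    """Gather the latest facts about the evolving situation."""
    return dict(environment)  # snapshot of what is currently known

def orient(observations, prior_model):
    """Make sense of the observations by updating the working model."""
    model = dict(prior_model)
    model.update(observations)
    return model

def decide(model, options):
    """Pick the option judged best against the current model --
    a 'probably approximately correct' choice, not a final commitment."""
    return max(options, key=lambda option: option["expected_benefit"])

def act(decision, environment):
    """Implement the chosen response, which in turn changes the situation."""
    environment["last_action"] = decision["name"]
    environment["impact"] -= decision["expected_benefit"]

def ooda_loop(environment, options, cycles):
    model = {}
    for _ in range(cycles):               # each pass is one OODA cycle
        observations = observe(environment)
        model = orient(observations, model)
        decision = decide(model, options)
        act(decision, environment)        # the next cycle observes this effect
    return environment

# Example: three cycles against a toy incident with two candidate responses
result = ooda_loop(
    {"impact": 100},
    [{"name": "reroute supply", "expected_benefit": 20},
     {"name": "wait and see", "expected_benefit": 0}],
    cycles=3,
)
print(result["impact"])  # residual impact shrinks with each cycle
```

The point of the sketch is the loop structure itself: each decision is treated as an experiment whose effect on the environment is observed at the start of the next cycle, rather than as a one-time commitment.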
The initial decisions by either a centralized team or a local employee will be speculative, given how little information will be available in an uncertain, dynamic environment. Being exactly correct cannot be the performance standard. Any response may, in hindsight, have been suboptimal. But the company has no alternative other than to make a quick, "probably approximately correct" decision, learn from it, acquire new information, and act again and again to stay ahead of events. (For more on how to do this, see the sidebar "The OODA Loop.")
Risks come in many forms and flavors. Companies can manage the ones they know about and anticipate. But novel risks—those that emerge completely out of the blue—will arise either from complex combinations of seemingly routine events or from unprecedentedly massive events. Companies need to detect them and then activate a response that differs from standard approaches to managing routine risks. That response must be rapid, improvisational, iterative, and humble, since not every action taken will work as intended.
A version of this article appeared in the November–December 2020 issue of Harvard Business Review.