Beware! The Blue Screen of Death

The recent global IT outage sparked by an erroneous software update from cybersecurity company CrowdStrike has amplified calls for crisis planning and should prompt pro AV customers to take a fresh look at their crisis resilience, writes Ken Dunn

There were a few moments on 19 July 2024 when things were looking decidedly dicey in terms of the global IT outlook. A faulty update distributed by cybersecurity company CrowdStrike to its Falcon Sensor security platform caused significant problems with Microsoft Windows computers running the software – including instances of the dreaded Blue Screen of Death (BSOD), whereby an affected computer is effectively rendered non-operational until there is user intervention.

Fishtech uses Crestron technology in its Cyber Defense Center to provide cloud security

Widely referred to as the largest outage in IT history, the incident meant that (in the words of The Guardian) “flights and hospital appointments were cancelled, payroll systems seized up, and TV channels went off-air”. Whilst the fear that every impacted computer would have to be corrected manually did not come to pass, and a fix was released relatively quickly, it has been estimated that the worldwide financial cost could end up being more than $10bn – which suggests the expense of a more protracted global outage doesn’t really bear thinking about.

LONG-TERM CONSEQUENCES
But it’s likely this incident will have consequences far beyond the immediate financial and practical considerations. The long-term outlook for CrowdStrike, whose cloud-delivered Falcon Platform prevents or responds to security breaches, remains to be seen. Meanwhile, the potential impact of such an outage will give organisations everywhere pause for thought after witnessing how far the ripples can travel.

For travel centres, corporate facilities and other critical pro AV user groups, perhaps the most positive way of framing recent events is that they underline the need for regular infrastructural reviews and effective contingency planning in case any future outages come knocking.

Akamai is a provider of cloud computing, security and content delivery services. The company says its flagship platform, the Akamai Connected Cloud, has been designed to stave off threats, and it frequently produces detailed research on specific security incidents. The recent CrowdStrike episode was no exception, inspiring a fascinating blog post that underlined the extent to which bad actors regard these incidents as an opportunity for scams devised to steal information and spread malware.

Invited to consider what lessons companies should derive from the recent outage, Steve Winterfeld, Akamai’s advisory CISO, says there are two main aspects. “One is to understand what your vendor agreements are. It’s critical to know what they’re responsible for, when they’re responsible for notifying, and what aid they’re responsible for giving. Then the second aspect is around your [own] crisis management plan, which is where it really gets interesting…”

‘Interesting’ in this case arguably equates to ‘potentially very complex’ given there are so many outage types that a crisis management plan needs to encompass. As well as deliberate DDoS or ransomware attacks, there are also triggers of non-malicious intent like the recent erroneous CrowdStrike update. Winterfeld continues: “What I want to emphasise is that [this episode] was around a specific piece of technology, but I don’t think that’s as important as your crisis management plan to deal with it.”

That means devising a strategy to address a number of questions. Winterfeld says: “How are you going to recover? How fast can you recover? If there is a [BSOD] and you couldn’t recover quickly, what is going to help you out and enable you to get past something that took out some percentage of all your technology?”

Datapath’s terminals are used by first responders, industrial sites and military units

Ultimately, it falls on organisations to ensure they have done their due diligence in terms of business continuity, disaster recovery and cyber-resiliency. “What we’re really talking about here is resilience,” confirms Winterfeld. “Business continuity and disaster recovery are more [related to] when it’s an unintended issue; cyber resilience is more when it has [malicious origins] to my mind. But either way, you still have the same sort of resiliency responsibilities, which is where you get to this concept of having a very solid system that’s really cohesive.”

CRISIS COMPLEXITY
It’s evident that Winterfeld has reservations about intricate infrastructural set-ups involving multiple suppliers. He believes that they may on occasion make it harder to establish and communicate what kind of crisis has emerged. He notes: “But then you get into complexity and, personally as a CISO, I think complexity is the enemy of operations and security. [For example] say you’re having a problem in the cloud and you’re not having one on-prem. If you have different security, or operations tools, in the cloud versus on-prem, it makes for complex troubleshooting, [so really] you have to pick a lot of hybrid tools that will operate in both environments.”

As to whether the recent outage should make customers reconsider their use of on-prem vs cloud, Winterfeld says he prefers the cloud because in theory it has the ability to surge, expand, and recover faster. But ultimately, there is still a big role for individual companies to ensure they have anticipated every eventuality: “How good is your plan and how well have you rehearsed it?” he asks.

Regarding preparedness, Winterfeld perceives a fair amount of consistency within industries, especially those that are heavily dependent on their online presence. He ranks media and entertainment in the middle of the pack for maturity. He believes the sector is dealing more with privacy issues than with outright malicious attacks intended to take operators down. However, he clearly feels operators still have work to do.

Among Winterfeld’s advice for pro AV users is an emphasis on careful and effective network segmentation. “There’s nobody I can think of that should have a big flat network where if something happens and one person gets sick, everybody gets sick,” he says.

Kramer tech in a control room

He believes continued vigilance about DDoS and ransomware attacks is essential because, although they will surge and wane, these threats will continue to be around. But he also urges awareness of the growing risk of ‘rogue’ or ‘zombie’ APIs.

UPDATE RISKS
Cerberus Tech, a leader in cloud-native IP contribution and distribution, was another company to respond to Installation’s call for advice to pro AV users in the wake of recent events. “Never deploy untested or unstable updates,” says Brad Carter, CEO and co-founder of Cerberus, whose solutions include the Livelink platform. “While backups might not always be an option, having all your eggs in one basket exposes you to risks. I’ve always prioritised providing customers with choices. For broadcast workflows, users can construct main and backup workflows that can be activated on demand across different protocols and clouds. This flexibility doesn’t eliminate issues, but offers a safeguard against potential problems.”

He also urges caution in the light of the increased role that AI may play in delivering and maintaining security mechanisms in the future. “Despite AI advancements, human error remains a factor,” he believes. “Instead of using this incident to push for more AI, we should consider the risks of autonomous AI updates.”

SILENT THREATS
The threat posed by back-end updates that happen silently is also highlighted by Craig Bury, CTO and partner of consultancy Three Media Holdings. He says: “Such considerations probably would not be at the top of the list of things to be concerned about inside a broadcast or media operation because you’ve got scheduling and all the other stuff, such as financial systems, that you need to consider.”

Bury believes there are good reasons to perceive the recent incident as a wake-up call regarding applications in which security software updates itself automatically in the background without the user having to do anything. The danger is that an invalid component in an update ends up causing a major outage. The incident also strengthens the case for enterprises and businesses to investigate tools which allow security updates to be applied by the local IT support people, or the user. Such an approach provides more control, he says, which is especially important in critical operations like those encountered in command & control centres, or areas of broadcast.

As a general rule, Bury advises being aware of the potential risks and using the appropriate mitigation tools. And that surely applies to any organisation in pro AV, broadcast or somewhere in-between. It’s a complex security environment, in which risk factors and vulnerabilities vary between companies, as will the specific operations and assets most critical to their continued existence. In short, there is no one-size-fits-all approach even as more and more companies migrate towards a cloud-centric model.

Indeed, July’s incident has also underlined the interconnectedness of things and how an issue with one element can suddenly disrupt an entire operation. So if there was any lingering complacency about resilience beforehand, it should really have dissipated for good now.