Konstantin Nagorny, Chief Engineer at Linxdatacenter, St. Petersburg
The investigation into the fire that destroyed OVH's Strasbourg data center (SBG2) in March 2021 concluded that the disaster resulted from gross violations of fire safety requirements.
Here we take a closer look at the circumstances of the fire and make a list of dos and don'ts for the future.
The fire at OVH's Strasbourg data center (SBG2) took 3.6 million websites and services offline worldwide. The outage affected government portals, banks, retailers, and media.
Despite the efforts of firefighters, the fire grew out of control and the whole building was lost. The circumstances remind us that even major providers' infrastructure can suffer from design mistakes and be vulnerable to disasters.
Early theories and the final report
The early findings from a preliminary investigation pointed to the uninterruptible power supply (UPS) devices as the cause of the fire. According to media reports, they caught fire a few hours after maintenance that included replacement of a few parts.
The UPS devices reportedly caught fire again several days after the main fire had been extinguished. It’s highly likely that the data center owner used lithium batteries, which are almost impossible to extinguish because extreme heat triggers self-sustaining exothermic chemical reactions inside them. The fact that the batteries reignited while switched off supports this hypothesis.
The fire brigade of the Bas-Rhin department that worked on the site reported electric arcs longer than 1 m in the UPS room.
Because there was no main switch, it took around 3 hours to cut the power supply to the data center. The UPS inverters remained energized the whole time, and some servers continued to work.
In addition, the electrical cable routes were not protected against fire, and the floors and ceilings of the data center were made of wood. The free cooling system, which normally supplied the servers with cool fresh air, helped the fire spread across the building.
Finally, SBG2 had no automatic fire extinguishing system that could have mitigated the disaster.
Let’s take a closer look at how all these details led to the massive fire, and how it could possibly have been prevented.
Wooden floors and ceilings
A data center should be designed as a complex of isolated compartments with walls, doors, and service shafts constructed of fire-resistant materials. These structures should be able to resist fire for 45 to 60 minutes.
All these measures should provide enough time to evacuate personnel, call firefighters, and power off the equipment. Yet this is no panacea: the temperature may get high enough to damage even metal structures. That’s why some countries impose even higher standards and require all metal structures to be additionally covered with fire-retardant paint.
Most likely, the wooden components of the data center had some coating, but a low-quality one. Additionally, the SBG2 spaces weren’t really isolated compartments, which allowed the fire to spread quickly from area to area and eventually destroy the entire building.
Lack of an automatic fire extinguishing system
There’s currently no generally accepted standard for automatic fire extinguishing in data centers. Most often, the decision is left to the owner.
Yet it’s mandatory for a data center to have an automated fire alarm system that can detect a fire and launch processes that prevent it from spreading and causing further damage.
Even in the absence of an automated fire extinguishing system, there are solutions that compensate for it. For instance, the owner can reduce the use of flammable materials in some critical areas in the data center.
Spread of fire due to free cooling
After the fire alarm goes off, the ventilation system should be shut down, and fire dampers between compartments should be closed. The dampers are located at the edge of each compartment and prevent fire from spreading, as well as cut off oxygen supply from the outside.
In case of fire, a ventilation system functioning as normal can be a serious factor that helps the fire to spread.
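The alarm response described above can be sketched in a few lines of Python. This is only an illustrative model, not a real building management interface: the `Compartment` class, its fields, and the `on_fire_alarm` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Compartment:
    """One fire compartment (hypothetical model for illustration)."""
    name: str
    ventilation_on: bool = True
    dampers_closed: bool = False
    alarm: bool = False


def on_fire_alarm(compartments, source):
    """React to a fire alarm raised in the compartment named `source`.

    Returns the actions taken, in order, as (action, compartment) pairs.
    """
    if source not in {c.name for c in compartments}:
        raise ValueError(f"unknown compartment: {source}")

    actions = []
    for c in compartments:
        if c.name == source:
            c.alarm = True

    # Step 1: shut down ventilation so fans stop feeding the fire with air.
    for c in compartments:
        if c.ventilation_on:
            c.ventilation_on = False
            actions.append(("ventilation_off", c.name))

    # Step 2: close the fire dampers to isolate every compartment.
    for c in compartments:
        if not c.dampers_closed:
            c.dampers_closed = True
            actions.append(("dampers_closed", c.name))

    return actions


rooms = [Compartment("ups_room"), Compartment("server_room")]
for action, room in on_fire_alarm(rooms, "ups_room"):
    print(action, room)
```

Calling the function a second time returns an empty list, because the ventilation is already off and the dampers are already closed. Repeated alarm signals therefore do no harm in this model.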
Emergency power-off (EPO)
A kill switch, or emergency power-off, is a safety mechanism to power off the whole facility in case of emergency when the power can’t be turned off in the usual manner.
Some people consider a single EPO switch a risk to operational resilience, since pressing it by accident powers off the whole data center. Dedicated kill switches for different data center systems can be a more reliable solution. For instance, separate switches can be used for the separate UPS feeds that power the servers, reducing the risk of both feeds shutting down at once in case of a false alarm or an accidental trigger.
The EPO buttons must be located outside the equipment rooms so that employees can reach them safely in case of fire. If no dedicated EPO boards are installed, an ordinary electrical switch can do the job, provided it is located on a switchboard outside the equipment room.
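The benefit of separate kill switches per UPS feed can be illustrated with a small Python sketch. The rack names, the feed names "A" and "B", and the `powered_racks` function are assumptions made for this example and do not describe any particular facility.

```python
def powered_racks(rack_feeds, tripped):
    """Return the racks that still have at least one energized UPS feed.

    rack_feeds: dict mapping a rack name to the set of feeds powering it
    tripped:    set of feeds whose EPO switch has been pressed
    """
    return sorted(rack for rack, feeds in rack_feeds.items() if feeds - tripped)


# Hypothetical layout: two dual-fed racks and one single-fed rack.
racks = {
    "rack1": {"A", "B"},
    "rack2": {"A", "B"},
    "rack3": {"A"},
}

# Accidental trigger of the switch for feed A only:
print(powered_racks(racks, {"A"}))

# Real emergency, both switches pressed:
print(powered_racks(racks, {"A", "B"}))
```

With per-feed switches, an accidental trigger on feed A leaves the dual-fed racks running on feed B, while pressing both switches still de-energizes everything. A single facility-wide switch corresponds to always tripping both feeds at once.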
There are several measures to consider to improve the fire safety of a data center.
The premises of the data center should be built as isolated fire compartments. The walls, the floors, and the ceiling should be fireproof to prevent fire from spreading in any direction. Even if the insides of a data center are designed as isolated fire compartments, a roof made of flammable materials will make it extremely vulnerable to fire. In case of fire, the flames will spread all over it, and firefighters will have to flood the entire data center with water to extinguish it.
Cable penetrations through the walls of fire compartments should be filled with non-flammable materials that have the same level of fire resistance as the walls. For instance, a special foam can be used. Both power cables and telecom cables should have fire-resistant insulation to prevent fire from spreading along them like a fuse.
Cable routes and electrical switches should be in good working condition. Electrical switches and EPO buttons should be located outside the equipment rooms and be safe to use to power off the burning equipment.
The data center should have an automated fire alarm system that takes the design of the server rooms into account. For instance, to detect smoke in a large volume of fast-moving air, special early smoke detection sensors should be used.
In premises where diesel generators are located, IR sensors that detect flames in the infrared spectrum should be used, because regular heat and smoke sensors are prone to false alarms there. In the noisy environment of rooms with UPSs and diesel generator units, strobe lights should be used to warn employees in time.
If a fire alarm is triggered, the ventilation should be turned off automatically and the fire dampers should be closed to isolate the fire compartments and stop the supply of oxygen to the premises. A data center with a closed cooling system can continue operating during a local fire. A free cooling system, by contrast, shuts down completely once the ventilation is stopped.
It’s a good idea to install gas fire extinguishing systems in critical areas of a data center, such as server rooms, UPS rooms, and battery rooms. The gas should be safe for humans so that employees are not harmed if they find themselves trapped in the premises by fire.
Fires in data centers are usually caused by faulty electrical wiring, electrical equipment and switches, or by welding and other work involving open flames. It’s important to minimize the risk of man-made fires by documenting and monitoring all such work. Of course, smoking should be prohibited inside the data center.
Any flammable materials should be stored outside the server rooms and other critical areas.
We hope this article helped you to learn more about fire safety in data centers.