A Technical Yet Accessible Guide to Spain’s Blackout Investigation: Week 2
In this edition, I dive into the latest official data, highlight key elements of the Spanish power grid, and provide a cyber-physical perspective on a highly relevant topic: nuclear power plants.
Index
1. The loss of 2200 MW.
2. Interconnection with France
3. Golfech Nuclear Power Plant
4. Nuclear Power Plants: A crash course.
5. Conclusions
1. The loss of 2200 MW.
Right before the weekend, the European Network of Transmission System Operators for Electricity (ENTSO-E) released a very significant piece of information: within 20 seconds, a series of generation trips in southern Spain led to a loss of 2200 MW. To put things in context, this figure is roughly equivalent to the output of two nuclear power plants (or two units). Quite a brutal event.
Can this information be reconciled with what we knew up until now? I'll compare the ENTSO-E timeline with the official information provided by relevant actors, such as governments, energy companies or the Spanish TSO, REE.
ENTSO-E
"Starting at 12:32:57 CET and within 20 seconds afterwards, presumably a series of different generation trips were registered in the south of Spain, accounting to an initially estimated total of 2200 MW."
Spanish Government (Sara Aagesen, Minister for Ecological Transition and the Demographic Challenge)
"We looked at the minutes before the blackout happened and saw that 19 seconds before it, there was also a loss of generation."
The Spanish government pointed to three separate "generation loss events" within those 20 seconds. Initially, only two were spotted; a third was later identified as having occurred 19 seconds before the first of those two. On the other hand, ENTSO-E mentions "presumably a series of different generation trips", which does not contradict the Minister's statement but opens the door to further interpretations. Let's elaborate on it.
Everyone working on the open-source analysis so far has used data available from Gridradar's Malaga PMU, but in view of ENTSO-E's statement I think it shouldn't be assumed that this data accurately reflects the power grid's status during those seconds and the preceding minutes. Anyway, based on that data I created this graph to introduce the issue.
As ENTSO-E states, "At the moment of the incident, there were no oscillations and the power system variables were within the normal operating range." This indicates that, prior to the first event, the Spanish power grid was functioning normally, with no evident failures (N condition). At 12:32:57, all sources concur that a "generation loss event", or "generation trip", occurred. Let's refer to this and any subsequent incidents as simply "events". This is the first event in the timeline (although it was the last of the three to be discovered).
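As a quick sanity check, the timestamps quoted so far can be lined up with a few lines of Python. This is only a sketch of the arithmetic: the variable names are mine, and it assumes the 19-second gap mentioned by the Minister is measured from the 12:32:57 event.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S"

# Figures taken from the statements quoted above (times in CET).
first_event = datetime.strptime("12:32:57", FMT)  # ENTSO-E start of the sequence
gap_reported = timedelta(seconds=19)              # Spanish government: 19 seconds
entsoe_window = timedelta(seconds=20)             # ENTSO-E: "within 20 seconds"

# Assumption: the 19 s gap is counted from the first event.
next_event = first_event + gap_reported
window_end = first_event + entsoe_window

print("Next event at:", next_event.strftime(FMT))
print("ENTSO-E window ends at:", window_end.strftime(FMT))

# True if the government's gap fits inside ENTSO-E's 20-second window.
fits = next_event <= window_end
print("Consistent:", fits)
```

Under that assumption, the two accounts are compatible: the later event falls one second before the end of the ENTSO-E window.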
The key question now is whether an "event" represents a single generation plant tripping, a group of plants sharing the same primary source disconnecting simultaneously, or a mix. My intention is not to speculate, but to use these legitimate questions to introduce important concepts that will help us better understand the official reports once they are eventually published.
Scenario #1
In this scenario I'll try to explain the kind of reasoning that led me to consider a massive disconnection of photovoltaic plants as a likely explanation for these 'missing' 2200 MW. However, please note that this does not imply that photovoltaic plants are the root cause of the blackout.
Let's assume that each event represents a single generation plant tripping. Therefore, right after the first event (12:32:57), the Spanish power grid would be operating in N-1, and then apparently recovering. I assume it recovered based on the official statement made by REE, the day after the blackout (so let's be cautious because information was limited back then).
REE (Spanish TSO)
"At around 12:33, the system was stable, with the variables that define the operation of the electrical system — frequency, voltage, and power flows — remaining stable and within secure operating conditions."
So I must assume that the system recovered from the first event (12:32:57), but also from the second one (12:33:16), according to the following statement:
REE (Spanish TSO)
"Immediately afterwards [12:33:16], what we have been able to identify is consistent with a loss of generation in the southwestern region of the Spanish peninsular system, a loss that was successfully managed, after which system stability was restored"
That stability didn't last long, though. The third event (12:33:17) was too much.
REE (Spanish TSO)
"A second and a half later [12:33:17], however, another event occurred, also consistent with a generation loss. This second event degraded the operating conditions of the electrical system from that moment onward."
If we assume each of these three events represents a single generation plant, this would imply that plants of at least 500 MW were involved. Losing 2200 MW from just three plants in southern Spain would require the involvement of at least one combined cycle power plant, as these are the largest generation plants in that area.
My initial reasoning was something along the lines of: "Okay, maybe a failure/attack at a specific substation disconnected three power plants, or maybe the other way around". For example, the 'Pinar del Rey' substation connects the lines from the three 'San Roque' combined cycle plants, which together account for approximately 2400 MW, not far from the 2200 MW that were 'lost'.
However, if that had been the case, it would be surprising if either REE or the operators of these plants (the Spanish energy giants) were unable to identify at least one of the facilities involved immediately, or shortly thereafter, given their direct visibility over them. On top of that, we have to take into account that these combined cycle power plants are most likely classified as critical infrastructure.
The nail in the coffin for this theory is the power mix publicly available on REE's website. At 12:30, just before the blackout, total combined cycle generation was 1633 MW, so the numbers simply don’t add up.
If combined cycle power plants alone cannot account for the 2200 MW, there's only one other possibility (excluding a mix of generation sources, which again would involve a top generation plant): photovoltaic.
The three-events-three-plants scenario can be easily discarded here, as there are no photovoltaic power plants greater than 500 MW.
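The elimination argument above boils down to a few comparisons, sketched here in Python. The figures are the ones quoted in the text; the variable names are mine, and the 500 MW ceiling for a single photovoltaic plant is taken from the previous paragraph rather than from any plant registry.

```python
# Figures quoted in the text above.
LOST_MW = 2200              # ENTSO-E initial estimate of lost generation
EVENTS = 3                  # events reported by the Spanish government
CCGT_TOTAL_1230_MW = 1633   # total combined cycle output at 12:30 (REE power mix)
MAX_PV_PLANT_MW = 500       # upper bound for one PV plant, as stated in the text

# If each event were a single plant, this is the average size required.
avg_required_mw = LOST_MW / EVENTS

# Could all combined cycle generation running at 12:30 cover the loss?
ccgt_can_cover = CCGT_TOTAL_1230_MW >= LOST_MW

# Could three PV plants, each at the stated ceiling, cover the loss?
pv_three_plants_max_mw = EVENTS * MAX_PV_PLANT_MW
pv_three_can_cover = pv_three_plants_max_mw >= LOST_MW

print(f"Average required per plant: {avg_required_mw:.0f} MW")
print(f"Combined cycle alone sufficient: {ccgt_can_cover}")
print(f"Three PV plants sufficient: {pv_three_can_cover}")
```

Both checks come out negative, which is why the one-plant-per-event reading does not hold for either technology on its own.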
Therefore, the most plausible scenario is that these 'events', whether generation loss events or generation trips, actually involve groups of plants. A loss of 2200 MW coming exclusively from a group of photovoltaic plants is entirely feasible, as can be seen clearly on the map.
There is nothing in this scenario that contradicts any of the official statements, as we are merely discussing a consequence of whatever the initiating event may have been. As has been made clear from the beginning, and as ENTSO-E acknowledges in its statement:
"The blackout is the result of a complex sequence of events"
In fact, the official statements seem to point to multiple plants:
REE (Spanish TSO)
"The fact that the disconnections occurred in the southwestern region of the peninsula may suggest that the generation loss was solar."
UNEF - Unión Fotovoltaica Española (Industry association)
"The energy injected into the grid was scheduled the day before, and yesterday the planned schedule was being strictly followed. The photovoltaic plants did not disconnect voluntarily; they were disconnected from the grid."
This statement from UNEF is particularly interesting, but too ambiguous (at least for me) to conclude anything. Is it pointing to the plants' protections? Is it suggesting REE's automatic or manual operations? In the analysis from 'Week 1', I introduced, among other things discussed here, the Sistema de Reducción Automático de Potencia (SRAP), which could have been a way to disconnect them. However, in the context of this statement, it's worth introducing another important concept related to the Spanish power grid: Programming Units (PU).
The PU is the fundamental structure used in the market and in REE's balancing services, which ensure that electricity supply matches demand in real time. A PU may consist of a single large generation plant (over 100 MW) or a group of smaller generation facilities that share the same primary energy source. In the latter case, each individual facility is referred to as a Physical Unit. Therefore, a Programming Unit can comprise a series of geographically distant photovoltaic plants grouped together under one operational entity. In the event of a grid imbalance or congestion that requires intervention, the CECRE uses both automatic and manual systems to execute the necessary actions. A key system supporting these operations is GEMAS, which was originally designed to safely integrate the surge in wind energy. It has since evolved to include other renewable sources, such as photovoltaic plants.
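To make the relationship between Programming Units and Physical Units more concrete, here is a minimal data-model sketch in Python. It is purely illustrative: the class names, fields, plant names and capacities are hypothetical and do not reflect REE's actual systems or data formats.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class PhysicalUnit:
    """One individual generation facility (hypothetical fields)."""
    name: str
    primary_source: str
    capacity_mw: float


@dataclass
class ProgrammingUnit:
    """A group of Physical Units sharing the same primary energy source."""
    code: str
    primary_source: str
    units: List[PhysicalUnit] = field(default_factory=list)

    def add(self, unit: PhysicalUnit) -> None:
        # Enforce the rule from the text: one primary source per PU.
        if unit.primary_source != self.primary_source:
            raise ValueError(
                f"{unit.name} is {unit.primary_source}, "
                f"but {self.code} groups {self.primary_source} units"
            )
        self.units.append(unit)

    @property
    def capacity_mw(self) -> float:
        return sum(u.capacity_mw for u in self.units)


# Hypothetical example: two distant PV plants under one PU.
pu = ProgrammingUnit(code="PU-EXAMPLE", primary_source="photovoltaic")
pu.add(PhysicalUnit("PV plant A", "photovoltaic", 40.0))
pu.add(PhysicalUnit("PV plant B", "photovoltaic", 30.0))
print(pu.capacity_mw)  # 70.0
```

The point of the sketch is that an order addressed to a single PU can reach several plants that are far apart, which matters when thinking about grouped disconnections.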
This 'system of systems' is complex, involving a wide range of components such as real-time state estimators, custom forecasting models for both wind and solar generation (SIPREOLICO and SIPRESOLAR), and inputs from the TSO, DSOs, and balancing service providers. To sum up, REE operators and systems consider everything necessary to potentially reconfigure the grid and maintain a secure N-1 operational model. The resulting actions (e.g., set-points, limitation and activation orders, etc.) are then sent to the Physical Units via the CCGs, as explained in the previous edition of this analysis.
Given the complexity of these systems and operations, it’s worth considering that a corner case, bug, or technical fault may have played a role in the blackout, involving a cyber component, though not in the form of an attack.
Scenario #2
In this scenario, these events would actually mark the boundaries of a series of generation trips involving multiple groups of plants. The power grid never fully recovered from the first event, resulting in an N–k scenario over the course of those 20 seconds (a 'series of generation trips' as mentioned by ENTSO-E). Until more details emerge, I can’t make a definitive assessment, but I wanted to present it as a possibility worth considering.
2. Interconnection with France
The ENTSO-E timeline mentions "The TSOs of Spain (Red Electrica) and France (RTE) took actions to mitigate these oscillations". I couldn't find any official statement about these mitigations.
So I decided to take a look at how photovoltaic generation behaved in relation to exports to France on the day of the blackout, using official data from REE. Two hours before the blackout, starting around 10:30 AM, a time when, on a sunny day with clear skies, solar output should be ramping up, there were two pronounced drops in photovoltaic generation (~900 MW and ~500 MW). One of them occurred just before the oscillations began.
This appears to be related to a controlled adjustment made to address an 'unexpected' drop and surge in demand.
During that period, photovoltaic generation started to closely correlate with the flow of electricity to France, to the extent that by 10:50 AM, Spain was actually importing 267 MW from France. I'm not a grid operations expert, and although surely this isn't the first time something like this has happened, I'm genuinely curious whether it may have influenced the sequence of events that followed. As always, I welcome (and appreciate) insights from readers with technical expertise in this specific area.
For comparison, I’ve also included data from the previous week, which had similar weather conditions.
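For readers who want to quantify that kind of relationship themselves, a Pearson correlation coefficient between the two time series is a reasonable first step. The sketch below is not the method used for the charts in this article, and the sample values are placeholders, not REE data. It assumes both series are sampled at the same timestamps, with exports positive and imports negative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Placeholder values only (NOT real measurements).
pv_mw = [100.0, 200.0, 300.0, 400.0]
export_to_france_mw = [10.0, 20.0, 30.0, 40.0]  # negative would mean import

r = pearson(pv_mw, export_to_france_mw)
print(f"r = {r:.3f}")
```

A value close to 1 would indicate that the two series move together. It says nothing about causation, so it should only be read as a prompt for further questions.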
3. Golfech Nuclear Power Plant
There is some controversy surrounding the Golfech Nuclear Power Plant. A theory circulating in certain Spanish media outlets suggests that the plant tripped before the blackout, potentially contributing to grid instability. However, France's TSO, RTE, stated that the plant was automatically disconnected after the blackout.
In the timeline published by ENTSO-E, we can observe the following:
"Starting at 12:32:57 CET and within 20 seconds afterwards, presumably a series of different generation trips were registered in the south of Spain, accounting to an initially estimated total of 2200 MW. No generation trips were observed in Portugal and France."
Therefore, I give no credibility to the theory that Golfech tripped before the blackout.
Let’s use this context to introduce some key concepts about a topic that’s especially relevant right now: nuclear power plants.
4. Nuclear Power Plants: A crash course.
I've studied many different cyber-physical systems, but none have captivated me as much as nuclear power plants.
The main purpose of a nuclear power plant is to generate electricity, operating on the same principle as thermal power plants that also rely on the Rankine cycle. The key difference lies in how the heat is generated: thermal power plants burn fossil fuels to produce heat, while nuclear power plants use fission as the heat source, thus providing carbon-free energy.
A nuclear power plant can be separated into two different parts:
Nuclear Island: This section houses the Nuclear Steam Supply System, which includes the nuclear reactor and most of its support, operation, control, and safety systems. It also contains all the components necessary to produce the steam that drives the turbine.
Conventional Island: This part is similar to any other thermal power plant based on the Rankine cycle. However, it may also include systems and components that contribute to the reactor's safety.
An NPP with a Pressurized Water Reactor (PWR) usually comprises three different circuits.
However, the most fascinating processes take place inside the reactor. It's important to understand that PWRs are designed with inherent safety mechanisms that rely solely on the laws of physics. I’ve created this diagram to introduce three of the most important ones.
Fuel Temperature Coefficient
The fuel temperature coefficient (FTC) is mainly driven by Doppler broadening of U-238 resonances, a beautiful physical process, which increases neutron absorption as fuel temperature rises. A negative FTC helps prevent power surges by reducing neutron flux with temperature increases.
Void Coefficient
PWRs are designed with a negative void coefficient, meaning the formation of steam bubbles (voids) in the moderator, due to pressure drops, reduces neutron moderation and adds negative reactivity, helping stabilize the reactor. In contrast, old RBMK reactors like Chernobyl’s had a positive void coefficient, and no containment. As a result, invoking 'Chernobyl' as a cautionary example is not relevant to the design and safety features of modern nuclear power plants.
Moderator Temperature Coefficient
PWRs are designed to be under-moderated, so any increase in moderator temperature (and corresponding decrease in density) reduces neutron moderation, introducing negative reactivity and reducing neutron multiplication.
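The three mechanisms above can be summarized with the usual first-order reactivity balance. The notation is mine and is only a linearized sketch: $\alpha_F$, $\alpha_M$ and $\alpha_V$ stand for the fuel temperature, moderator temperature and void coefficients, and $\Delta T_F$, $\Delta T_M$ and $\Delta V$ for the changes in fuel temperature, moderator temperature and void fraction.

```latex
\Delta\rho_{\text{feedback}} \approx
  \alpha_F\,\Delta T_F + \alpha_M\,\Delta T_M + \alpha_V\,\Delta V,
\qquad
\alpha_F < 0,\quad \alpha_M < 0,\quad \alpha_V < 0
```

With all three coefficients negative, any increase in fuel temperature, moderator temperature or void fraction contributes negative reactivity, which is what pushes the reactor back towards a stable state.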
What does it mean when you hear that a nuclear power plant has been 'automatically disconnected'?
There are two options: either the reactor (Nuclear Island) is tripped, or the turbine (Conventional Island) is.
If the reactor is tripped, the turbine cannot continue operating because of the resulting load imbalance: its steam demand can no longer be met. Similarly, if the turbine is tripped, the reactor must also be shut down; otherwise, steam pressure would build up to dangerous levels.
However, this doesn't mean the nuclear power plant 'failed'. Quite the opposite: all safety systems functioned as intended, bringing the plant to a safe state.
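The coupling between the two islands can be expressed as a tiny piece of logic. This is a deliberately simplified sketch of the rule as stated above, with hypothetical names. Real plant protection logic is far more detailed and is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class PlantState:
    """Trip status of the two islands (simplified, hypothetical model)."""
    reactor_tripped: bool = False
    turbine_tripped: bool = False


def propagate_trip(state: PlantState) -> PlantState:
    """Apply the rule from the text: a trip on either island forces the other."""
    tripped = state.reactor_tripped or state.turbine_tripped
    return PlantState(reactor_tripped=tripped, turbine_tripped=tripped)


def delivering_power(state: PlantState) -> bool:
    """In this model the plant feeds the grid only while the turbine runs."""
    return not state.turbine_tripped


# A turbine trip alone ends with both islands tripped and no output.
after = propagate_trip(PlantState(turbine_tripped=True))
print(after, delivering_power(after))
```

Whichever side trips first, the end state is the same from the grid's point of view: the unit stops delivering power, and that is the designed, safe outcome.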
Load Following vs Base load
NPPs can be licensed to operate in load-following mode (France) instead of base load (Spain). This implies that there are important differences in the way an NPP is operated, including its related digital I&C systems (especially the Control Rod Drive Mechanism). Load-following mode provides flexibility to the grid but also poses significant challenges.
It is worth mentioning that load-following nuclear power plants in France helped Spain recover from the blackout.
What About potential Cyber Attacks?
In October 2024 I published "A Practical Analysis of Cyber-Physical Attacks Against Nuclear Reactors". Essentially, if someone had approached me to assess the options for attacking a nuclear power plant, this would have been my deliverable.
This free, 140-page research paper aims to provide a comprehensive technical analysis of hypothetical cyber-physical attacks targeting the safety systems of nuclear reactors (PWRs), such as the Reactor Protection System (RPS) and the Engineered Safety Features Actuation System (ESFAS). It is partially based on the reverse engineering of Framatome’s Teleperm XS (TXS), a digital instrumentation and control platform designed specifically for safety systems in nuclear power plants, deployed in numerous nuclear reactors worldwide, including those in Europe, the USA, Russia, and China.
The paper also contains a comprehensive introduction that describes the nuclear engineering and nuclear physics concepts behind nuclear fission, Pressurized Water Reactors (PWRs) and NPPs, which are required to follow the subsequent cyber-physical attack scenarios. Prior knowledge of nuclear physics or reactor engineering is not assumed, making it accessible.
5. Conclusions
Once again, the battle against disinformation is crucial. This blackout is basically a technical issue, so let’s keep it that way.
The potential cyber-attack theory seems to be fading as more details emerge, although, as I reported to the Spanish authorities right after the blackout, there are still many things to improve. The FT has just published a story on this.
It will be important to fully understand the sequence of events (something that the official reports will surely provide) in order to assess whether cyber-attacks could realistically replicate some of the steps that led to the grid failure, under specific conditions.