Abstract
Tokamaks are the most promising route to practical nuclear fusion reactors. A disruption in a tokamak is a violent event that terminates the confined plasma and can cause unacceptable damage to the device. Machine learning models have been widely used to predict incoming disruptions. However, future reactors, with much higher stored energy, cannot provide enough unmitigated disruption data at high performance to train a predictor before damaging themselves. Here we apply a deep parameter-based transfer learning method to disruption prediction. We train a model on the J-TEXT tokamak and transfer it, with only 20 discharges, to EAST, which differs greatly from J-TEXT in size, operation regime, and configuration. Results demonstrate that the transferred model reaches a performance similar to that of a model trained directly on about 1900 EAST discharges. Our results suggest that the proposed method can tackle the challenge of predicting disruptions for future tokamaks like ITER using knowledge learned from existing tokamaks.
Introduction
Nuclear fusion energy could be the ultimate energy source for humankind. The tokamak is the leading candidate for a practical nuclear fusion reactor. It uses magnetic fields to confine plasma at extremely high temperatures (100 million K). Disruption is a catastrophic loss of plasma confinement, which releases a large amount of energy and can cause severe damage to the tokamak device1,2,3,4. Disruption is one of the biggest hurdles to realizing magnetically confined fusion. Disruption mitigation systems (DMS) such as massive gas injection (MGI) and shattered pellet injection (SPI) can effectively mitigate the damage caused by disruptions in current devices5,6. For large tokamaks such as ITER, unmitigated disruptions during high-performance discharges are unacceptable. Predicting potential disruptions is a critical factor in effectively triggering the DMS, so it is important to predict disruptions accurately and with enough warning time7. Currently, there are two main approaches to disruption prediction research: rule-based and data-driven methods. Rule-based methods are based on the current understanding of disruptions, focus on identifying event chains and disruption paths, and provide interpretability8,9,10,11. Feature engineering may also benefit from broader domain knowledge that is not specific to disruption prediction and does not require detailed knowledge of disruptions. Data-driven methods, on the other hand, learn from the vast amount of data accumulated over the years and have achieved excellent performance, but lack interpretability12,13,14,15,16,17,18,19,20. Each approach can benefit from the other: rule-based methods can be accelerated by surrogate models, while data-driven methods benefit from domain knowledge when choosing input signals and designing the model. Currently, both approaches need sufficient data from the target tokamak to train the predictors before they are applied. Most methods published in the literature focus on predicting disruptions for one specific device and lack generalization ability. Since unmitigated disruptions of high-performance discharges would severely damage a future fusion reactor, it is challenging to accumulate enough disruptive data, especially in the high-performance regime, to train a usable disruption predictor.
There have been attempts to build models that work on a new machine using data from existing machines. Previous studies across different machines have shown that using predictors trained on one tokamak to directly predict disruptions in another leads to poor performance15,19,21. Domain knowledge is necessary to improve performance. The Fusion Recurrent Neural Network (FRNN) was trained with mixed discharges from DIII-D and a ‘glimpse’ of discharges from JET (5 disruptive and 16 non-disruptive discharges), and is able to predict disruptive discharges in JET with high accuracy15. The Hybrid Deep-Learning (HDL) architecture was trained with 20 disruptive discharges and thousands of discharges from EAST, combined with more than a thousand discharges from DIII-D and C-Mod, and achieved boosted performance in predicting disruptions in EAST19. An adaptive disruption predictor was built based on the analysis of rather large databases of AUG and JET discharges, and was transferred from AUG to JET with a success rate of 98.14% for mitigation and 94.17% for prevention22.
Mixing data from the target and existing machines is one form of transfer learning, namely instance-based transfer learning. However, the information carried by the limited data from the target machine can be flooded by data from the existing machines. These works were also carried out among tokamaks with similar configurations and sizes, whereas the gap between future tokamak reactors and any tokamak existing today is very large23,24. Differences in machine size, operation regime, configuration, feature distribution, disruption causes, characteristic paths, and other factors will all result in different plasma performance and different disruption processes. Thus, in this work we selected the J-TEXT and EAST tokamaks, which differ greatly in configuration, operation regime, time scale, feature distributions, and disruption causes, to demonstrate the proposed transfer learning method. J-TEXT is a tokamak with a full-carbon wall where the main types of disruptions are those induced by density limits and locked modes25,26,27,28,29. In contrast, EAST is a tokamak with a metal wall where disruptions caused by density limits and locked modes are also observed, but the most frequent causes of disruptions are temperature hollowing, edge cooling, and vertical displacement events (VDEs)30,31,32,33. In addition, from the view of the experimental setup, J-TEXT focuses on investigating disruptions, while EAST focuses on long-pulse, steady-state operation. These differences create a significant gap between J-TEXT and EAST.
It is also necessary to point out that the methods published in the literature benefit from domain knowledge related to disruptions15,19,22. The input diagnostics and features are representative of disruption dynamics, and the methods are designed carefully to fit the inputs. However, most of them borrow from successful models in computer vision (CV) or natural language processing (NLP) applications. The design of such models is often influenced by how humans perceive the problem and depends heavily on the nature of the data and on domain knowledge34,35. A tokamak, however, produces data quite different from images or text. It uses many diagnostic instruments to measure different physical quantities, and different diagnostics have different spatial and temporal resolutions and are sampled at different intervals, producing heterogeneous time-series data. A neural network structure tailored specifically to fusion diagnostic data is therefore needed.
In this paper, we present a transfer learning-based method to predict disruptions for future fusion reactors, and demonstrate it by transferring a model trained on J-TEXT, a mid-sized tokamak with a circular limiter configuration, to EAST, a larger tokamak with a non-circular divertor configuration with elongation and triangularity. (i) We present a feature extractor that is expected to learn from existing tokamaks and extract general disruption-related features across different tokamaks, namely the Fusion Feature Extractor (FFE, see Fig. 1). The FFE is the basis for transferring the pre-trained model to the target domain (see The deep learning-based FFE design in Methods). (ii) We train and evaluate the FFE-based disruption predictor on the J-TEXT tokamak, to verify that the predictor is able to extract disruption-related precursors and to obtain a high-performance pre-trained model in the source domain. (iii) We apply the parameter-based transfer learning method to predicting disruptions for future tokamaks. We transfer the model pre-trained on the circular tokamak to predict disruptions in the non-circular divertor tokamak with only a few discharges and verify the performance with numerical experiments. (iv) We show with further numerical experiments that the low-level layers of the FFE extract general features across tokamaks.
The colored layers are the inputs of the model. Each color code stands for a different sampling rate for the diagnostic. The Mirnov coils are resampled to 50 kHz (in blue); the core channel of plasma density and the soft X-ray array are resampled to 10 kHz (in orange); the radiation arrays (soft X-ray and the Absolute eXtreme UltraViolet (AXUV) radiation measurement) and other diagnostics (saddle coils, plasma current, and displacements) are resampled to 1 kHz (in gray). The input diagnostics are chosen according to disruption dynamics, such as magnetohydrodynamic (MHD) instabilities, density limits, and radiation asymmetry. The FFE is designed and implemented according to the characteristics of the diagnostics: the parallel convolution 1D layers are used to extract spatial features and high-frequency temporal features within a feature frame. For different kinds of diagnostics with different typical frequencies, appropriate sampling rates and sliding window sizes are applied for better feature extraction, as shown in the figure. Diagnostics bearing low-frequency features are concatenated with the output of the parallel convolution 1D layers to form a feature frame. Multiple frames form a time series for the Long Short-Term Memory (LSTM) layer to capture temporal features on a larger time scale. The extracted features are then fed into the classifier to tell whether the sample indicates a future disruption. © IOP Publishing Ltd. All rights reserved.
Results and discussion
The purpose of this research is to improve the disruption prediction performance on the target tokamak using mostly knowledge from the source tokamak. The model performance in the target domain largely depends on the performance of the model in the source domain36. Thus, we first need to obtain a high-performance pre-trained model with J-TEXT data. We then apply the model to the target domain, the EAST dataset, with a freeze & fine-tune transfer learning technique, and compare it with other strategies. Finally, we analyze experimentally whether the transferred model is able to extract general features and the role each part of the model plays.
Training and evaluation of the source model on the J-TEXT tokamak
The performance of the transferred model in the target domain largely depends on the knowledge it learns from the source domain. In our case, we focus on mitigating disruptions on the EAST tokamak based on knowledge from J-TEXT. J-TEXT is a medium-sized tokamak with a circular plasma configuration37,38. Disruption-related experiments on J-TEXT primarily focus on disruptions induced by tearing modes and density limits. If we use the resistive diffusion time (\({\tau }_{R}={a}^{2}/\eta\), where \(a\) is the minor radius of a tokamak and \(\eta\) the magnetic diffusivity) to briefly describe the time scale of physical phenomena taking place in a tokamak39,40, we can assume that the time scales of certain physical processes are shorter on J-TEXT (\({\tau }_{R}\approx 25\,{\rm{ms}}\)) than on other, larger machines. The source model is trained with data from the 2017–2018 J-TEXT campaigns. The training set consists of 494 discharges (189 disruptive), which covers the main disruptive paths in J-TEXT and contains a large number of samples. The validation set consists of 140 discharges (70 disruptive), to help evaluate the performance of the model and to guide adjusting the model so that it is not over-parameterized. The test set consists of 220 discharges (110 disruptive). Discharges with a current decay rate of over \(2.5\,{\rm{MA}}\,{\rm{s}}^{-1}\) are recognized as disruptive, and the disruption time, denoted \({T}_{{\rm{disr}}}\), is defined as the beginning of the current quench. The warning time given by the model is denoted \({T}_{{\rm{alarm}}}\). Samples within a fixed length of 62 ms before \({T}_{{\rm{disr}}}\) are labeled as disruptive. Although the literature has shown that the time scale of the “disruptive” phase can differ considerably depending on the disruptive path, results based on the FFE are better when a constant is used to label the disruptive discharges (see Labeling in Methods). Earlier experiments showed that the MGI needs at least 5 ms of advance warning to mitigate a disruption on J-TEXT5,41. Another 5 ms safety margin is added for future mitigation methods which would take longer to take effect. Thus, alarms launched less than 10 ms in advance are considered tardy alarms in J-TEXT, and alarms launched more than 300 ms in advance are considered early alarms. In this paper, we care about whether most of the disruptions during the flattop phase are predicted early enough to trigger the DMS and mitigate the disruption, and whether false alarms affect the normal operation of the tokamak. The output of the model is the probability of a sample being disruptive, with a time resolution of 1 ms. When the probability exceeds the selected threshold (0.75 in this work) several consecutive times (5 times in this work), the discharge is considered disruptive and the DMS is supposed to be triggered. Thus, we use the success alarm rate (\({\rm{SAR}}=\frac{{N}_{{\rm{success}}}}{{N}_{{\rm{disruptive}}}}\)) and the false alarm rate (\({\rm{FAR}}=\frac{{N}_{{\rm{false}}}}{{N}_{{\rm{non}}\text{-}{\rm{disruptive}}}}\)) to evaluate whether the predictor can effectively trigger the DMS while giving as few false alarms as possible22. It is worth noting that \({N}_{{\rm{success}}}\) counts the disruptions that are considered to be mitigated effectively, which excludes early alarms and tardy alarms.
The SAR and FAR are the most important metrics for disruption mitigation, as they represent the disruptions mitigated effectively and the normal discharges interrupted incorrectly. To better visualize the results, the balanced accuracy (\({\rm{BA}}=({\rm{SAR}}+(1-{\rm{FAR}}))/2\)) is selected as an overall metric that covers both SAR and FAR.
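To make the triggering logic and the metrics concrete, the following is a minimal Python sketch, assuming per-discharge arrays of model outputs sampled every 1 ms; the 0.75 threshold, the 5-sample consecutive requirement, and the 10 ms/300 ms tardy/early limits follow the text, while all function and variable names are our own.

```python
def alarm_time(prob, threshold=0.75, consecutive=5, dt_ms=1.0):
    """Return the time (ms) at which the predicted disruptivity exceeds the
    threshold for `consecutive` samples in a row, or None if no alarm is raised."""
    count = 0
    for i, p in enumerate(prob):
        count = count + 1 if p > threshold else 0
        if count >= consecutive:
            return i * dt_ms
    return None

def evaluate(disr_alarms, disr_times, nondisr_alarms, tardy_ms=10.0, early_ms=300.0):
    """Compute SAR, FAR and BA. `disr_alarms` holds alarm times for disruptive
    discharges (None = missed), `disr_times` the corresponding disruption times,
    and `nondisr_alarms` the alarm times for non-disruptive discharges."""
    success = sum(1 for alarm, t_disr in zip(disr_alarms, disr_times)
                  if alarm is not None and tardy_ms <= t_disr - alarm <= early_ms)
    sar = success / len(disr_alarms)
    far = sum(a is not None for a in nondisr_alarms) / len(nondisr_alarms)
    return sar, far, (sar + (1.0 - far)) / 2.0
```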
The accumulated percentage of disruptions predicted versus warning time is shown in Fig. 2. All disruptive discharges are successfully predicted if tardy and early alarms are counted, and the SAR reaches 92.73% once they are excluded. To further gain physics insight and to investigate what the model is learning, a sensitivity analysis is performed by retraining the model with one or several signals of the same kind left out at a time. The results of the sensitivity analysis are shown in Fig. 3. The classification performance indicates that the FFE is able to extract important information from J-TEXT data and has the potential to be transferred to the EAST tokamak.
The vertical dashed line at 10 ms indicates the tardy-alarm threshold. The model is able to predict 100% of disruptions in the test set on the J-TEXT tokamak with a warning time of at least 1 ms, and 96.36% of disruptions with a warning time of at least 10 ms. A warning time of 5 ms is enough for the Disruption Mitigation System (DMS) to take effect on the J-TEXT tokamak. To ensure the DMS will take effect (Massive Gas Injection (MGI) and future mitigation methods which would take a longer time), a warning time larger than 10 ms is considered effective.
The Fusion Feature Extractor (FFE) based model is retrained with one or several signals of the same kind left out each time. The drop in performance compared with the model trained with all signals indicates the importance of the dropped signals. Signals are ordered from top to bottom in decreasing order of importance. It appears that the radiation arrays (soft X-ray (SXR) and the Absolute eXtreme UltraViolet (AXUV) radiation measurement) contain the information most relevant to disruptions on J-TEXT, despite a sampling rate of only 1 kHz. Although the core channel of the radiation array is not dropped and is sampled at 10 kHz, it cannot compensate for the lost spatial information. Density and locked-mode-related signals also contain a large amount of disruption-related information. Statistically, the majority of disruptions in J-TEXT are induced by locked modes and density limits, which aligns with these results. However, the Mirnov coils, which measure magnetohydrodynamic (MHD) instabilities with higher frequencies, do not contribute much, probably because these instabilities do not lead to disruptions directly. The plasma current also contributes little, because the plasma current does not change much on J-TEXT.
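The leave-one-group-out retraining behind Fig. 3 can be sketched as follows; `build_and_train` and `evaluate_ba` are hypothetical callables standing in for the full FFE training and evaluation pipeline, and the grouping of channels is illustrative.

```python
def sensitivity_analysis(all_channels, signal_groups, build_and_train, evaluate_ba):
    """Retrain the model with one group of signals left out at a time and report
    the drop in balanced accuracy relative to the model trained with all signals."""
    baseline = evaluate_ba(build_and_train(all_channels))
    importance = {}
    for group_name, group_channels in signal_groups.items():
        kept = [ch for ch in all_channels if ch not in group_channels]
        importance[group_name] = baseline - evaluate_ba(build_and_train(kept))
    # A larger drop in BA means the dropped group carried more disruption-relevant information.
    return dict(sorted(importance.items(), key=lambda kv: kv[1], reverse=True))
```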
A typical J-TEXT disruptive discharge with a tearing mode is shown in Fig. 4. Figure 4a shows the plasma current and Fig. 4b shows the relative temperature fluctuation. The disruption occurs at around 0.22 s, as indicated by the red dashed line. As shown in Fig. 4e, f, a tearing mode occurs from the beginning of the discharge and lasts until the disruption. As the discharge proceeds, the rotation speed of the magnetic islands gradually slows down, as indicated by the frequencies of the poloidal and toroidal Mirnov signals. According to statistics on J-TEXT, 3–5 kHz is a typical frequency band for the m/n = 2/1 tearing mode. Since J-TEXT does not have a high-performance scenario, most low-frequency tearing modes develop into locked modes and cause disruptions within a few milliseconds. The predictor gives an alarm as the frequencies of the Mirnov signals approach 3.5 kHz. The predictor was trained with raw signals without any extracted features; the only information the model has about tearing modes is the sampling rate and sliding window length of the raw Mirnov signals. As shown in Fig. 4c, d, the model nevertheless recognizes the typical frequency of the tearing mode and sends out the warning 80 ms ahead of the disruption.
a shows the plasma current of the discharge and b shows the electron cyclotron emission (ECE) signal, which indicates the relative temperature fluctuation; c and d show the frequencies of the poloidal and toroidal Mirnov signals; e, f show the raw poloidal and toroidal Mirnov signals. The red dashed line indicates Tdisruption, when the disruption takes place. The orange dash-dot line indicates Twarning, when the predictor warns about the upcoming disruption. The green horizontal line indicates the frequency of 3.5 kHz. The model sends out a warning when the frequency of the Mirnov signals approaches 3.5 kHz, where the rotating speed of the magnetic island slows down and eventually stays at a low frequency. The warning indicates that the model learns to identify 2/1 tearing modes in J-TEXT. The model learns this feature by itself with only limited information given to it.
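The frequency traces in Fig. 4c, d can in principle be reproduced with a short-time Fourier transform of the raw Mirnov signals; below is a minimal SciPy sketch, where the 50 kHz sampling rate follows Fig. 1 and the window length and the synthetic test signal are illustrative.

```python
import numpy as np
from scipy.signal import spectrogram

def dominant_frequency(mirnov_signal, fs=50_000, nperseg=1024):
    """Track the dominant frequency (Hz) of a raw Mirnov signal over time;
    a 2/1 tearing mode on J-TEXT typically shows up in the 3-5 kHz band."""
    f, t, sxx = spectrogram(mirnov_signal, fs=fs, nperseg=nperseg,
                            noverlap=nperseg // 2)
    return t, f[np.argmax(sxx, axis=0)]

# Example with a synthetic signal whose frequency sweeps downward, mimicking
# a magnetic island that slows down before locking.
t = np.arange(0, 0.2, 1 / 50_000)
synthetic = np.sin(2 * np.pi * (5000 - 5000 * t) * t)
times, freqs = dominant_frequency(synthetic)
```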
To further verify the FFE’s ability to extract disruption-related features, two other models are trained using the same input signals and discharges, and tested on the same J-TEXT discharges for comparison. The first is a deep neural network model with a structure similar to the FFE, as shown in Fig. 5. The difference is that all diagnostics are resampled to 100 kHz and sliced into 1 ms time windows, rather than treating spatial and temporal features with different sampling rates and sliding window lengths. The samples are fed into the model directly, without considering the heterogeneous nature of the features. The other model is a support vector machine (SVM). The inputs of the SVM are manually extracted features guided by the physical mechanisms of disruptions42,43,44. Features containing temporal and spatial profile information are extracted based on domain knowledge of the diagnostics and disruption physics. The input signals for the feature engineering are the same as those of the FFE-based predictor. Mode numbers, typical frequencies of MHD instabilities, and the amplitude and phase of the n = 1 locked mode are extracted from the Mirnov and saddle coils. Kurtosis, skewness, and variance are extracted from the radiation arrays (AXUV and SXR). Other important disruption-related signals, such as density, plasma current, and displacement, are concatenated with the extracted features.
The performance of the three models is compared in Table 1. The disruption predictor based on the FFE outperforms the other models. The SVM model with manual feature extraction also beats the general deep neural network (NN) model by a large margin. These results further show that domain knowledge helps improve model performance. Used properly, domain knowledge also improves a deep learning model when it is incorporated into the design of the model and its inputs.
The performance of the transferred model on the EAST tokamak
For deep neural networks, transfer learning starts from a pre-trained model that was previously trained on a large, sufficiently representative dataset. The pre-trained model is expected to learn general feature maps from the source dataset. It is then optimized on a smaller and more specific dataset, using a freeze & fine-tune process45,46,47. By freezing some layers, their parameters stay fixed during fine-tuning, so that the model retains the knowledge it learned from the large dataset. The remaining layers are fine-tuned: they are further trained on the specific dataset and their parameters are updated to better fit the target task. In our case, the FFE trained on J-TEXT is expected to extract low-level features shared across different tokamaks, such as those related to MHD instabilities. The top layers of the pre-trained model (those closer to the output), usually the classifier as well as the top of the feature extractor, extract high-level features specific to the source task. These top layers are usually fine-tuned or replaced to make them more relevant for the target task.
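A minimal Keras-style sketch of this freeze & fine-tune procedure is shown below, assuming a saved pre-trained FFE model and small EAST training/validation datasets (`east_train_ds`, `east_val_ds`, `class_weights` are assumed placeholders); the file path and the layer-naming convention used to decide what to freeze are illustrative.

```python
import tensorflow as tf

# Load the model pre-trained on the source tokamak (hypothetical file name).
model = tf.keras.models.load_model("ffe_pretrained_jtext.h5")

# Freeze the low-level feature-extraction layers so their parameters stay fixed.
for layer in model.layers:
    if layer.name.startswith("parallel_conv1d"):  # illustrative naming convention
        layer.trainable = False

# Fine-tune the remaining layers (LSTM and classifier) on the small target dataset
# at a much lower learning rate than in pre-training.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy")
model.fit(east_train_ds, validation_data=east_val_ds,
          epochs=10, class_weight=class_weights)
```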
In our case, the model pre-trained on the J-TEXT tokamak has already proven its effectiveness in extracting disruption-related features on J-TEXT. To further test its ability to predict disruptions across tokamaks with transfer learning, a group of numerical experiments is carried out on the new target tokamak, EAST. Compared to J-TEXT, EAST is much larger and operates in a steady-state divertor configuration with elongation and triangularity, at much higher plasma performance (see Dataset in Methods). The configuration and operation regime gap between J-TEXT and EAST is much larger than the gap between ITER-like configuration tokamaks. Information and results about the numerical experiments are shown in Table 2.
In the experiments, different strategies are applied to predict disruptions in the same test set from the EAST tokamak. All five cases apply the same set of hyper-parameters and are tested on the same EAST discharges. The hyper-parameters are chosen by systematically scanning the number of ParallelConv1D blocks and LSTM layers; the model with two ParallelConv1D blocks and one LSTM layer performs best on the J-TEXT validation set. We choose a labeling threshold of 172 ms (see Labeling in Methods). Alarms launched less than 30 ms in advance are considered tardy alarms in EAST, while alarms launched more than 3 s in advance are considered early alarms, considering the time scale of EAST as well as the requirements for the DMS in ITER48. Case 1 is trained with 1896 EAST discharges (355 of them disruptive) from scratch and is selected as the baseline (BA = 78.06%). Case 2 is fine-tuned with 20 EAST discharges (10 disruptive) based on the pre-trained model from J-TEXT, and comes closest to the baseline (BA = 71.39%). Case 3 is trained with 494 J-TEXT discharges (189 disruptive) and 20 EAST discharges (10 disruptive) from scratch. Compared with Case 2, even though the same training data are used, mixing the data together (BA = 67.50%) exploits neither the limited information from EAST nor the general knowledge from J-TEXT. One possible explanation is that the EAST discharges are not representative enough and the architecture is flooded with J-TEXT data. Case 4 is trained with 20 EAST discharges (10 disruptive) from scratch. To avoid over-parameterization during training, we applied L1 and L2 regularization to the model and adjusted the learning rate schedule (see Overfitting handling in Methods). The performance (BA = 60.28%) indicates that the limited data from the target domain alone are not enough to extract general features of disruption. Case 5 uses the pre-trained model from J-TEXT directly (BA = 59.44%). Using the source model alone leaves the general knowledge about disruptions contaminated by knowledge specific to the source domain. To conclude, the freeze & fine-tune technique reaches a performance similar to the full-data baseline using only 20 discharges, and outperforms all other cases by a large margin. Combining the source tokamak model and data from the target tokamak properly with parameter-based transfer learning helps make better use of knowledge from both domains.
Interpretable investigation of the transferred model
To validate whether the model captures general, common patterns among tokamaks even with great differences in configuration and operation regime, and to explore the role each part of the model plays, we designed further numerical experiments, as demonstrated in Fig. 6. The numerical experiments for this interpretable investigation of the transferred model are described in Table 3. In each case, a different part of the model is frozen. In case 1, the bottom layers of the ParallelConv1D blocks are frozen. In case 2, all layers of the ParallelConv1D blocks are frozen. In case 3, all layers in the ParallelConv1D blocks, as well as the LSTM layers, are frozen. Within each case, a more in-depth comparison is made by applying different freeze & fine-tune strategies. In cases 1-a, 2-a, and 3-a, the parameters of the frozen layers are not updated, while the rest of the model is further tuned with 20 EAST discharges (10 disruptive). In cases 1-b, 2-b, and 3-b, the unfrozen layers are replaced with new, untrained layers, which are trained from scratch with the same 20 EAST discharges rather than updating the original parameters. In cases 1-c, 2-c, and 3-c, the same process as in 1-a, 2-a, and 3-a is applied first; the frozen layers are then unfrozen so that their parameters can be updated, and the whole model is further trained with the same 20 EAST discharges.
The bottom layers, which are closer to the inputs (the ParallelConv1D blocks in the diagram), are frozen and their parameters stay unchanged when further tuning the model. The layers that are not frozen (the upper layers closer to the output: the long short-term memory (LSTM) layer and the classifier made up of fully connected layers in the diagram) are further trained with the 20 EAST discharges. In cases 1-b, 2-b, and 3-b, the unfrozen layers are first replaced with new, untrained layers, and are then trained with the 20 discharges.
The results show that the best practice is to freeze all layers in the ParallelConv1D blocks and only fine-tune the LSTM layers and the classifier, without unfreezing the frozen layers (case 2-a; the metrics are shown as case 2 in Table 2). The frozen layers are considered able to extract general features across tokamaks, while the rest are thought to be tokamak-specific. We assume that the ParallelConv1D layers extract features within a frame, which is a 1 ms time slice, while the LSTM layers focus on extracting features on a longer, tokamak-dependent time scale.
Comparing across the cases, freezing all layers in the ParallelConv1D blocks performs best (case 2-a/b/c, BA = 71.39%/71.11%/68.89%). Freezing only the bottom layers of the ParallelConv1D blocks performs slightly worse (case 1-a/b/c, BA = 70.56%/70.56%/67.50%), and freezing both the ParallelConv1D blocks and the LSTM layers performs the worst (case 3-a/b/c, BA = 68.33%/68.33%/67.50%). The time scales of global plasma dynamics and local instabilities differ between the two tokamaks, as indicated by the different typical resistive time scales (\({\tau }_{R}\approx 25\,{\rm{ms}}\) on J-TEXT while \({\tau }_{R}\ge 500\,{\rm{ms}}\) on EAST). There is no obvious way to manually adjust the trained LSTM layers to compensate for these time-scale changes. The LSTM layers from the source model fit the time scale of J-TEXT, but do not match the time scale of EAST. The results demonstrate that the LSTM layers are fixed to the J-TEXT time scale during training on J-TEXT and are not suitable for the longer time scale of the EAST tokamak.
In cases 1-b, 2-b, and 3-b, instead of directly fine-tuning the unfrozen layers, we abandoned the pre-trained unfrozen layers, replaced them with new layers that had never been trained, and then fine-tuned them with the same 20 discharges from EAST. The performance of this approach is almost the same as that of cases 1-a, 2-a, and 3-a. In particular, abandoning the knowledge in the pre-trained LSTM and classifier layers does not cause performance degradation (cases 1-b and 2-b). These results further indicate that the LSTM layers extract tokamak-specific features related to the time scale of a particular tokamak and are not very useful when transferring to a new tokamak, while the ParallelConv1D layers extract domain-invariant features across different tokamaks. Moreover, the performances of cases 1-c, 2-c, and 3-c, in which the frozen layers are unfrozen and further tuned, are much worse. This indicates that the limited data from the target tokamak are not representative enough, and that the common knowledge is likely to be flooded by patterns specific to those few discharges, resulting in worse performance.
In conclusion, the results of the numerical experiments demonstrate that parameter-based transfer learning does help predict disruptions in a future tokamak with limited data, and outperforms other strategies by a large margin. Additionally, the layers in the ParallelConv1D blocks are capable of extracting general, low-level features of disruptive discharges across different tokamaks. The LSTM layers, however, extract larger-time-scale features specific to a certain tokamak and remain fixed to the time scale of the tokamak they were pre-trained on. Since different tokamaks vary greatly in resistive diffusion time scale and configuration, these layers contribute little to predicting disruptions on a future tokamak with a different time scale. Further discoveries about the physical mechanisms in plasma physics could, however, contribute to a normalized time scale across tokamaks. With a better way to process signals on a larger time scale, even the LSTM layers of the neural network could extract general information from diagnostics across different tokamaks. Our results show that parameter-based transfer learning is effective and has the potential to predict disruptions in future fusion reactors with different configurations.
Conclusions
Parameter-based transfer learning can be very helpful in transferring disruption prediction models to future reactors. ITER is designed with a major radius of 6.2 m and a minor radius of 2.0 m, and will operate in a very different regime and scenario than any of the existing tokamaks23. In this work, we transfer the source model trained on the mid-sized circular limiter plasmas of the J-TEXT tokamak to the much larger, non-circular divertor plasmas of the EAST tokamak, with only a few discharges. This successful demonstration suggests that the proposed method can contribute to predicting disruptions in ITER with knowledge learned from existing tokamaks with different configurations. Specifically, to improve the performance in the target domain, it is of great significance to improve the performance in the source domain. We designed the deep learning-based FFE neural network structure based on an understanding of tokamak diagnostics and basic disruption physics, and it has proven able to extract disruption-related patterns efficiently. The FFE provides a foundation for transferring the model to the target domain. The freeze & fine-tune parameter-based transfer learning technique is applied to transfer the J-TEXT pre-trained model to a larger tokamak with a handful of target data. The method greatly improves the performance of predicting disruptions for future tokamaks compared with other strategies, including instance-based transfer learning (mixing target and existing data together). Knowledge from existing tokamaks can thus be efficiently applied to future fusion reactors with different configurations. However, the method still needs further improvement before it can be applied directly to disruption prediction in future tokamaks.
In addition, there is still more potential for making better use of data by combining other types of transfer learning techniques. Making full use of data is the key to disruption prediction, especially for future fusion reactors. Parameter-based transfer learning can work together with other methods to further improve the transfer performance. For example, instance-based transfer learning can guide the production of the limited target tokamak data used in the parameter-based transfer method, to improve the transfer efficiency. Disruptions in magnetically confined plasmas share the same physical laws. Although disruptions in tokamaks with different configurations belong to their respective domains, it is possible to extract domain-invariant features across all tokamaks. Physics-driven feature engineering, deep domain generalization, and other representation-based transfer learning techniques can be applied in further research.
Finally, the deep learning-based FFE has further potential for other fusion-related machine learning tasks. Multi-task learning is an approach to inductive transfer that improves generalization by using the domain information contained in the training signals of related tasks as domain knowledge49. A shared representation learned from each task helps the other tasks learn better. Although the feature extractor is trained for disruption prediction, it could be reused for other fusion-related purposes, such as the classification of tokamak plasma confinement states. The pre-trained model is considered to have extracted disruption-related, low-level features that would help other fusion-related tasks be learned better. The pre-trained feature extractor could drastically reduce the amount of data needed to train models for operation mode classification and other new fusion research tasks.
Methods
Dataset
Different tokamaks have different diagnostic systems. However, they are expected to share the same or similar diagnostics for essential operations. To develop a feature extractor for diagnostics that supports transferring to future tokamaks, at least two tokamaks with similar diagnostic systems are required. Considering the large number of diagnostics to be used, the tokamaks should also be able to provide enough data covering various kinds of disruptions for training, such as disruptions induced by density limits, locked modes, and other causes. Moreover, future reactors will operate in a higher-performance regime than existing tokamaks, so the target tokamak should operate in a higher-performance regime and a more advanced scenario than the source tokamak on which the disruption predictor is trained. With these considerations, the J-TEXT and EAST tokamaks are selected as suitable platforms for this study as a possible use case. The J-TEXT tokamak provides the pre-trained model, which is considered to contain general knowledge of disruptions, while the EAST tokamak is the target device on which disruptions are predicted with the transferred model.
The J-TEXT tokamak has been operated since its first plasma was obtained at the end of 200738. The J-TEXT tokamak is a medium-sized tokamak with a major radius R = 1.05 m and a minor radius a = 0.25 m. A complete diagnostic system with over 300 channels of various diagnostics has been installed on J-TEXT. The typical discharge of the J-TEXT tokamak in the limiter configuration is done with a plasma current Ip of ~200 kA, a toroidal field Bt of ~2.0 T, a pulse length of 800 ms, plasma densities ne of 1–7 × 1019 m−3, and an electron temperature Te of about 1 keV. J-TEXT has accumulated a diverse range of data on various types of disruptions, especially disruptions induced by density limits and locked modes.
The EAST tokamak is an ITER-like, fully super-conducting tokamak with a major radius R = 1.85 m and a minor radius a = 0.45 m50. The EAST tokamak shares some of the common diagnostic systems with J-TEXT, such as the measurements of radiation, displacement, locked modes, MHD instabilities, plasma current, plasma density as well as other common diagnostics. The typical discharge of the EAST tokamak in the divertor configuration is done with a plasma current Ip of around 450 kA, a toroidal field Bt of around 1.5 T, a pulse length of around 10 s, and a βN of around 2.1.
The study is conducted on the J-TEXT and EAST disruption databases built in previous work13,51. Discharges from the J-TEXT tokamak are used to validate the effectiveness of the deep fusion feature extractor and to provide a pre-trained model for transferring to predict disruptions on the EAST tokamak. To keep the inputs of the disruption predictor the same, 47 channels of diagnostics are selected from each of J-TEXT and EAST, as shown in Table 4. When selecting the channels, the consistency of the diagnostics' geometry and views across discharges, as well as between the two tokamaks, is considered as much as possible. The diagnostics are able to cover the typical frequency of 2/1 tearing modes, the cycle of sawtooth oscillations, radiation asymmetry, and other sufficiently low-level spatial and temporal information. As the diagnostics span multiple physical and temporal scales, different sampling rates are selected for different diagnostics.
854 discharges (525 disruptive) from the 2017–2018 campaigns are picked out from J-TEXT. The discharges cover all the channels we selected as inputs and include all types of disruptions in J-TEXT. Most of the dropped disruptive discharges were triggered manually and did not show any sign of instability before the disruption, such as the ones terminated by MGI (Massive Gas Injection). Some discharges were also dropped because of invalid data in most of the input channels. In transfer learning it is difficult for the model in the target domain to outperform that in the source domain, so the pre-trained model from the source domain is expected to include as much information as possible. In this case, the model pre-trained with J-TEXT discharges is supposed to acquire as much disruption-related knowledge as possible. Thus, the discharges chosen from J-TEXT are randomly shuffled and split into training, validation, and test sets. The training set contains 494 discharges (189 disruptive), the validation set contains 140 discharges (70 disruptive), and the test set contains 220 discharges (110 disruptive). Normally, to simulate real operational scenarios, the model should be trained with data from earlier campaigns and tested with data from later ones, since model performance can degrade as the experimental environment varies between campaigns. A model good enough for one campaign is probably not as good for a new campaign; this is the “aging problem”. However, when training the source model on J-TEXT, we care more about disruption-related knowledge, so we split the J-TEXT data sets randomly. As for the EAST tokamak, a total of 1896 discharges including 355 disruptive discharges are selected as the training set, 60 disruptive and 60 non-disruptive discharges as the validation set, and 180 disruptive and 180 non-disruptive discharges as the test set. It is worth noting that, since the output of the model is the probability of a sample being disruptive with a time resolution of 1 ms, the imbalance in disruptive and non-disruptive discharges does not affect model learning. The samples, however, are imbalanced, since samples labeled as disruptive make up only a small fraction. How we deal with the imbalanced samples is discussed in the “Weight calculation” section. Both the training and validation sets are selected randomly from earlier campaigns, while the test set is selected randomly from later campaigns, simulating real operating scenarios. For the use case of transferring across tokamaks, 10 non-disruptive and 10 disruptive discharges from EAST are randomly selected from earlier campaigns as the training set, while the test set is kept the same as above, to simulate realistic operational scenarios chronologically. Given our emphasis on the flattop phase, we constructed our dataset to contain only samples from this phase. Furthermore, since the number of non-disruptive samples is significantly higher than the number of disruptive samples, we used only the disruptive samples from the disruptive discharges and disregarded their non-disruptive samples. This split of the datasets results in slightly worse performance than randomly splitting the datasets from all available campaigns. The split of the datasets is shown in Table 4.
Normalization
With the database established, normalization is performed to eliminate the numerical differences between diagnostics and to map the inputs to an appropriate range that facilitates the initialization of the neural network. According to the results of J.X. Zhu et al.19, the performance of a deep neural network depends only weakly on the normalization parameters as long as all inputs are mapped to an appropriate range. Thus, the normalization is performed independently for the two tokamaks, and for the two EAST datasets the normalization parameters are calculated individually from the respective training sets. The inputs are normalized with the z-score method, \({X}_{{\rm{norm}}}=\frac{X-{\rm{mean}}(X)}{{\rm{std}}(X)}\). Ideally, the normalized inputs would have zero mean and unit variance if they followed a Gaussian distribution. However, not all inputs follow a Gaussian distribution, and some have extreme values that could affect the normalization. We therefore clipped any mapped values beyond (−5, 5) to avoid outliers with extremely large values, so that the final range of all normalized inputs was between −5 and 5. A clipping value of 5 was deemed appropriate for model training, as it is not so large as to cause issues and is large enough to differentiate outliers from normal values.
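A minimal sketch of this normalization step, assuming per-channel arrays `train_data` and `test_data` exist; variable names are illustrative.

```python
import numpy as np

def zscore_normalize(x, mean, std, clip=5.0):
    """Z-score normalization with clipping: values beyond +/- clip are truncated
    so that outliers with extremely large values do not dominate the inputs."""
    return np.clip((x - mean) / std, -clip, clip)

# Statistics are computed on the training set only and reused for validation/test data.
train_mean = train_data.mean(axis=0)
train_std = train_data.std(axis=0)
x_normalized = zscore_normalize(test_data, train_mean, train_std)
```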
Labeling
All discharges are split into consecutive temporal sequences. A time threshold before disruption is defined for different tokamaks in Table 5 to indicate the precursor of a disruptive discharge. The “unstable” sequences of disruptive discharges are labeled as “disruptive” and other sequences from non-disruptive discharges are labeled as “non-disruptive”. To determine the time threshold, we first obtained a time span based on prior discussions and consultations with tokamak operators, who provided valuable insights into the time span within which disruptions could be reliably predicted. We then conducted a systematic scan within the time span. Our aim was to identify the constant that yielded the best overall performance in terms of disruption prediction. By iteratively testing various constants, we were able to select the optimal value that maximized the predictive accuracy of our model.
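A minimal sketch of this constant-threshold labeling; sample times and the disruption time are in milliseconds, and the threshold is 62 ms for J-TEXT and 172 ms for EAST in this work (function and variable names are our own).

```python
import numpy as np

def label_samples(sample_times_ms, is_disruptive, t_disr_ms=None, threshold_ms=62.0):
    """Label each sample of a discharge: samples within `threshold_ms` before the
    disruption time are 'disruptive' (1); all other samples are 'non-disruptive' (0)."""
    times = np.asarray(sample_times_ms, dtype=float)
    labels = np.zeros(times.shape, dtype=int)
    if is_disruptive:
        labels[times >= t_disr_ms - threshold_ms] = 1
    return labels
```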
However, research has shown that the time scale of the “disruptive” phase can vary depending on the disruptive path. Labeling samples with an unfixed, precursor-related time is more scientifically accurate than using a constant. In our study, we first trained the model using “real” labels based on precursor-related times, which made the model more confident in distinguishing between disruptive and non-disruptive samples. However, we observed that the model’s performance on individual discharges decreased compared to a model trained with constant-labeled samples, as demonstrated in Table 6. Although the precursor-related model was still able to predict all disruptive discharges, more false alarms occurred, degrading performance. These results indicate that the model is more sensitive to unstable events and has a higher false alarm rate when using precursor-related labels. For disruption prediction itself, precursor-related labels are in principle preferable. However, since the disruption predictor is designed to trigger the DMS effectively and to reduce incorrectly raised alarms, constant-based labels rather than precursor-related labels are the better choice in our work. We therefore ultimately opted to use a constant to label the “disruptive” samples, striking a balance between sensitivity and false alarm rate.
Weight calculation
As not all sequences from disruptive discharges are used, and the number of non-disruptive discharges is far larger than that of disruptive ones, the dataset is greatly imbalanced. To deal with this, weights for both classes are calculated and passed to the neural network to help it pay more attention to the under-represented class, the disruptive sequences. The weights for both classes are calculated as \({W}_{{\rm{disruptive}}}=\frac{{N}_{{\rm{disruptive}}}+{N}_{{\rm{non}}\text{-}{\rm{disruptive}}}}{2{N}_{{\rm{disruptive}}}}\) and \({W}_{{\rm{non}}\text{-}{\rm{disruptive}}}=\frac{{N}_{{\rm{disruptive}}}+{N}_{{\rm{non}}\text{-}{\rm{disruptive}}}}{2{N}_{{\rm{non}}\text{-}{\rm{disruptive}}}}\). Scaling by \(\frac{{N}_{{\rm{disruptive}}}+{N}_{{\rm{non}}\text{-}{\rm{disruptive}}}}{2}\) helps keep the loss at a similar magnitude.
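The weight calculation above translates directly into code; a minimal sketch (the dictionary keys 1/0 for disruptive/non-disruptive follow the usual Keras `class_weight` convention and are our own choice).

```python
def class_weights(n_disruptive, n_non_disruptive):
    """Class weights as defined above: scaling by (N_total / 2) keeps the weighted
    loss at a magnitude similar to the unweighted case."""
    total = n_disruptive + n_non_disruptive
    return {
        1: total / (2.0 * n_disruptive),      # disruptive (under-represented) class
        0: total / (2.0 * n_non_disruptive),  # non-disruptive class
    }
```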
Overfitting handling
Overfitting occurs when a model is too complex and is able to fit the training data too well, but performs poorly on new, unseen data. This is often caused by the model learning noise in the training data, rather than the underlying patterns. To prevent overfitting in training the deep learning-based model due to the small size of samples from EAST, we employed several techniques. The first is using batch normalization layers. Batch normalization helps to prevent overfitting by reducing the impact of noise in the training data. By normalizing the inputs of each layer, it makes the training process more stable and less sensitive to small changes in the data. In addition, we applied dropout layers. Dropout works by randomly dropping out some neurons during training, which forces the network to learn more robust and generalizable features. L1 and L2 regularization were also applied. L1 regularization shrinks the less important features’ coefficients to zero, removing them from the model, while L2 regularization shrinks all the coefficients toward zero but does not remove any features entirely. Furthermore, we employed an early stopping strategy and a learning rate schedule. Early stopping stops training when the model’s performance on the validation dataset starts to degrade, while learning rate schedules adjust the learning rate during training so that the model can learn at a slower rate as it gets closer to convergence, which allows the model to make more precise adjustments to the weights and avoid overfitting to the training data.
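The techniques listed above can be combined in a Keras model roughly as follows; the regularization coefficients, dropout rate, and callback patience are illustrative, not the values used in the paper (the learning-rate schedule itself appears in the training sketch further below).

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def regularized_dense(units, drop_rate=0.3, l1=1e-5, l2=1e-4):
    """A dense block combining L1/L2 penalties, batch normalization, and dropout."""
    return tf.keras.Sequential([
        layers.Dense(units, activation="relu",
                     kernel_regularizer=regularizers.l1_l2(l1=l1, l2=l2)),
        layers.BatchNormalization(),
        layers.Dropout(drop_rate),
    ])

# Early stopping halts training when the validation loss stops improving.
callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                     restore_best_weights=True),
]
```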
The deep learning-based FFE design
Our deep learning model, or disruption predictor, is made up of a feature extractor and a classifier, as demonstrated in Fig. 1. The feature extractor consists of ParallelConv1D layers and LSTM layers. The ParallelConv1D layers are designed to extract spatial features and temporal features on a relatively small time scale. Temporal features with different time scales are sliced with different sampling rates and time steps. To avoid mixing up information from different channels, a parallel 1D convolution structure is adopted: different channels are fed separately into different parallel convolution 1D layers to produce individual outputs. The extracted features are then stacked and concatenated with the other diagnostics that do not need feature extraction on a small time scale. The concatenated features make up a feature frame. Several consecutive feature frames make up a sequence, which is then fed into the LSTM layers to extract features on a larger time scale. We choose ReLU as the activation function for these layers. After the LSTM layers, the outputs are fed into a classifier consisting of fully connected layers. All layers except the output layer also use ReLU as the activation function. The last layer has two neurons and applies a sigmoid activation, outputting the probabilities of the sequence being disruptive or not; the result is then fed into a softmax function to determine whether the slice is disruptive.
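A simplified Keras sketch of this structure is given below. The channel counts, samples per frame, layer sizes, and the single-sigmoid output are illustrative simplifications (the paper describes a two-neuron output followed by softmax); only the overall pattern, parallel per-diagnostic Conv1D branches applied frame by frame, concatenation with the low-frequency diagnostics, an LSTM over the frame sequence, and a dense classifier, follows the description above.

```python
import tensorflow as tf
from tensorflow.keras import layers

n_frames = 128  # consecutive 1 ms feature frames per input sequence

def conv_branch(name):
    """One parallel Conv1D branch extracting features inside a single 1 ms frame."""
    return tf.keras.Sequential([
        layers.Conv1D(16, 3, activation="relu"),
        layers.Conv1D(16, 3, activation="relu"),
        layers.GlobalMaxPooling1D(),
    ], name=name)

# High-frequency diagnostics: (frames, samples per frame, channels); counts are illustrative.
mirnov_in = layers.Input(shape=(n_frames, 50, 24), name="mirnov_50khz")
core_in = layers.Input(shape=(n_frames, 10, 3), name="core_10khz")
low_in = layers.Input(shape=(n_frames, 20), name="low_freq_1khz")

# Apply each branch to every frame; channels stay separate between branches.
mirnov_feat = layers.TimeDistributed(conv_branch("mirnov_branch"))(mirnov_in)
core_feat = layers.TimeDistributed(conv_branch("core_branch"))(core_in)

# Each feature frame = convolutional features + low-frequency diagnostics.
frames = layers.Concatenate()([mirnov_feat, core_feat, low_in])

# LSTM captures temporal features across frames; dense layers classify the sequence.
h = layers.LSTM(64)(frames)
h = layers.Dense(32, activation="relu")(h)
disruptivity = layers.Dense(1, activation="sigmoid", name="disruptivity")(h)
model = tf.keras.Model([mirnov_in, core_in, low_in], disruptivity)
```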
Training and transferring details
When pre-training the model on J-TEXT, 8 RTX 3090 GPUs are used to train the model in parallel and speed up the hyperparameter search. Since the samples are greatly imbalanced, class weights are calculated and applied according to the distribution of both classes. The training set for the pre-trained model finally reaches ~125,000 samples. To avoid overfitting and to generalize better, the model contains only ~100,000 parameters. A learning rate schedule is also applied to further mitigate overfitting: the learning rate follows an exponential decay schedule, with an initial learning rate of 0.01 and a decay rate of 0.9. Adam is chosen as the optimizer of the network, and binary cross-entropy as the loss function. The pre-trained model is trained for 100 epochs. For each epoch, the loss on the validation set is monitored, and the model is checkpointed at the end of the epoch in which the validation loss is best. When the training process is finished, the best model is loaded as the pre-trained model for further evaluation.
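The training configuration described above maps to Keras roughly as follows, continuing the sketches from the previous sections; `train_ds`, `val_ds`, and `class_weights` are assumed placeholders, and `decay_steps` is illustrative since the text only specifies the initial learning rate and decay rate.

```python
import tensorflow as tf

# Exponential learning-rate decay from 0.01 with a decay rate of 0.9 (decay_steps illustrative).
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01, decay_steps=1000, decay_rate=0.9)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule),
              loss="binary_crossentropy")

# Checkpoint the model at the end of the epoch with the best validation loss.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "ffe_pretrained_jtext.h5", monitor="val_loss", save_best_only=True)

model.fit(train_ds, validation_data=val_ds, epochs=100,
          class_weight=class_weights, callbacks=[checkpoint])
```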
When transferring the pre-trained model, part of the model is frozen. The frozen layers are commonly at the bottom of the neural network, as they are considered to extract general features; their parameters are not updated during training. The remaining layers are not frozen and are tuned with the new data fed to the model. Since the size of the data is very small, the model is tuned at a much lower learning rate of 1e-4 for 10 epochs to avoid overfitting. When replacing layers, the unfrozen layers are replaced with layers of the same structure as in the previous model, but their weights and biases are randomly initialized; the model is again tuned at a learning rate of 1e-4 for 10 epochs. When unfreezing the frozen layers, the previously frozen layers are unfrozen so that their parameters become updatable again, and the model is further tuned at an even lower learning rate of 1e-5 for 10 epochs; even so, these models still suffer greatly from overfitting.
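The layer-replacement and unfreezing variants can be sketched as follows, complementing the basic freeze & fine-tune sketch shown earlier; the layer-naming convention is again illustrative, and a real implementation would reinitialize each layer with its own original initializers.

```python
import tensorflow as tf

def reinitialize_unfrozen_layers(model, frozen_prefix="parallel_conv1d"):
    """Replace the unfrozen layers' weights with fresh random values, keeping the
    same structure, so they are trained from scratch on the target discharges."""
    initializer = tf.keras.initializers.GlorotUniform()
    for layer in model.layers:
        if not layer.name.startswith(frozen_prefix):
            layer.set_weights([initializer(w.shape).numpy()
                               for w in layer.get_weights()])
    return model

def unfreeze_and_tune(model, learning_rate=1e-5):
    """Unfreeze every layer and tune the whole model at an even lower learning rate
    (this variant overfits badly with only 20 target discharges)."""
    for layer in model.layers:
        layer.trainable = True
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                  loss="binary_crossentropy")
    return model
```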
Data availability
Raw data were generated at the J-TEXT and EAST facilities. Derived data are available from the corresponding author upon reasonable request.
Code availability
The computer code that was used to generate figures and analyze the data is available from the corresponding author upon reasonable request.
References
ITER Physics Expert Group on Disruptions, Plasma Control, and MHD & ITER Physics Basis Editors. MHD stability, operational limits and disruptions. Nucl. Fusion 39, 2251–2389 (1999).
Hender, T. C. et al. MHD stability, operational limits and disruptions. Nucl. Fusion 47, S128–S202 (2007).
Boozer, A. H. Theory of tokamak disruptions. Phys. Plasmas 19, 058101 (2012).
Schuller, F. C. Disruptions in tokamaks. Plasma Phys. Control. Fusion. 37, A135–A162 (1995).
Luo, Y. H. et al. Designing of the massive gas injection valve for the joint Texas experimental tokamak. Rev. Sci. Instrum. 85, 083504 (2014).
Li, Y. et al. Design of a shattered pellet injection system on J-TEXT tokamak. Rev. Sci. Instrum. 89, 10K116 (2018).
Sugihara, M. et al. Disruption scenarios, their mitigation and operation window in ITER. Nucl. Fusion 47, 337–352 (2007).
Aymerich, E. et al. A statistical approach for the automatic identification of the start of the chain of events leading to the disruptions at JET. Nucl. Fusion 61, 036013 (2021).
Lungaroni, M. et al. On the potential of ruled-based machine learning for disruption prediction on JET. Fusion Eng. Des. 130, 62–68 (2018).
Rattá, G. A. et al. An advanced disruption predictor for JET tested in a simulated real-time environment. Nucl. Fusion 50, 025005 (2010).
Rea, C., Montes, K. J., Erickson, K. G., Granetz, R. S. & Tinguely, R. A. A real-time machine learning-based disruption predictor in DIII-D. Nucl. Fusion 59, 096016 (2019).
Yang, Z. et al. A disruption predictor based on a 1.5-dimensional convolutional neural network in HL-2A. Nucl. Fusion 60, 016017 (2020).
Guo, B. H. et al. Disruption prediction using a full convolutional neural network on EAST. Plasma Phys. Control. Fusion. 63, 025008 (2021).
Guo, B. H. et al. Disruption prediction on EAST tokamak using a deep learning algorithm. Plasma Phys. Control. Fusion. 63, 115007 (2021).
Kates-Harbeck, J., Svyatkovskiy, A. & Tang, W. Predicting disruptive instabilities in controlled fusion plasmas through deep learning. Nature 568, 526–531 (2019).
Ferreira, D. R., Carvalho, P. J. & Fernandes, H. Deep learning for plasma tomography and disruption prediction from bolometer data. IEEE Trans. Plasma Sci. 48, 36–45 (2019).
Aymerich, E. et al. Disruption prediction at JET through deep convolutional neural networks using spatiotemporal information from plasma profiles. Nucl. Fusion 62, 66005 (2022).
Churchill, R. M., Tobias, B., Zhu, Y. & Team, D. Deep convolutional neural networks for multi-scale time-series classification and application to tokamak disruption prediction using raw, high temporal resolution diagnostic data. Phys. Plasmas 27, 62510 (2020).
Zhu, J. X. et al. Hybrid deep learning architecture for general disruption prediction across tokamaks. Nucl. Fusion 61, 049501 (2021).
Martin, E. J. & Zhu, X. W. Scenario adaptive disruption prediction study for next generation burning-plasma tokamaks. Nucl. Fusion 61, 114005 (2021).
Windsor, C. G. et al. A cross-tokamak neural network disruption predictor for the JET and ASDEX Upgrade tokamaks. Nucl. Fusion 45, 337–350 (2005).
Murari, A. et al. On the transfer of adaptive predictors between different devices for both mitigation and prevention of disruptions. Nucl. Fusion 60, 56003 (2020).
Shimomura, Y., Aymar, R., Chuyanov, V., Huguet, M. & Parker, R. ITER overview. Nucl. Fusion 39, 1295–1308 (1999).
Shimada, M. et al. Overview and summary. Nucl. Fusion 47, S1–S17 (2007).
Huang, M. et al. The operation region and MHD modes on the J-TEXT tokamak. Plasma Phys. Control. Fusion. 58, 125002 (2016).
Shi, P. et al. First time observation of local current shrinkage during the MARFE behavior on the J-TEXT tokamak. Nucl. Fusion 57, 116052 (2017).
Shi, P. et al. Observation of the high-density front at the high-field-side in the J-TEXT tokamak. Plasma Phys. Control. Fusion. 63, 125010 (2021).
Wang, N., Ding, Y., Rao, B. & Li, D. A brief review on the interaction between resonant magnetic perturbation and tearing mode in J-TEXT. Rev. Mod. Plasma Phys. 6, 26 (2022).
He, Y. et al. Prevention of mode coupling by external applied resonant magnetic perturbation on the J-TEXT tokamak. Plasma Phys. Control. Fusion. 65, 65011 (2023).
Chen, D., Shen, B., Yang, F., Qian, J. & Xiao, B. Characterization of plasma current quench during disruption in EAST tokamak. Chin. Phys. B. 24, 25205 (2015).
Wang, B. et al. Establishment and assessment of plasma disruption and warning databases from EAST. Plasma Sci. Technol. 18, 1162–1168 (2016).
Chen, D. L. et al. Disruption mitigation with high-pressure helium gas injection on EAST tokamak. Nucl. Fusion 58, 36003 (2018).
Zhang, C. et al. Plasma-facing components damage and its effects on plasma performance in EAST tokamak. Fusion Eng. Des. 156, 111616 (2020).
Voulodimos, A., Doulamis, N., Doulamis, A. & Protopapadakis, E. Deep learning for computer vision: a brief review. Comput. Intell. Neurosci. 2018, 1–13 (2018).
Young, T., Hazarika, D., Poria, S. & Cambria, E. Recent trends in deep learning based natural language processing. IEEE Comput. Intell. Mag. 13, 55–75 (2018).
Ben-David, S., Blitzer, J., Crammer, K. & Pereira, F. Analysis of representations for domain adaptation. Adv. Neural Inf. Process Syst. 19, 151–175 (2006).
Ding, Y. et al. Overview of the J-TEXT progress on RMP and disruption physics. Plasma Sci. Technol. 20, 125101 (2018).
Liang, Y. et al. Overview of the recent experimental research on the J-TEXT tokamak. Nucl. Fusion 59, 112016 (2019).
Liu, Z. X. et al. Experimental observation and simulation analysis of the relationship between the fishbone and ITB formation on EAST tokamak. Nucl. Fusion 60, 122001 (2020).
Blum, J. & Le Foll, J. Plasma equilibrium evolution at the resistive diffusion timescale. Comp. Phys. Rep. 1, 465–494 (1984).
Chen, Z. Y. et al. The behavior of runaway current in massive gas injection fast shutdown plasmas in J-TEXT. Nucl. Fusion 56, 112013 (2016).
Shen, C. et al. Investigation of the eddy current effect on the high frequency response of the Mirnov probe on J-TEXT. Rev. Sci. Instrum. 90, 123506 (2019).
Wang, C. et al. Disruption prevention using rotating resonant magnetic perturbation on J-TEXT. Nucl. Fusion 60, 102992 (2020).
Shen, C. et al. IDP-PGFE: an interpretable disruption predictor based on physics-guided feature extraction. Nucl. Fusion 63, 46024 (2023).
Shen, Z., Liu, Z., Qin, J., Savvides, M. & Cheng, K. Partial is better than all: revisiting fine-tuning strategy for few-shot learning. Proc. AAAI Conf. Artif. Intell. 2021, 9594–9602 (2021).
Rajpurkar, P., Chen, E., Banerjee, O. & Topol, E. J. AI in health and medicine. Nat. Med. 28, 31–38 (2022).
Singh, J., Hanson, J., Paliwal, K. & Zhou, Y. RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning. Nat. Commun. 10, 5407 (2019).
De Vries, P. C. et al. Requirements for triggering the ITER disruption mitigation system. Fusion Sci. Technol. 69, 471–484 (2016).
Caruana, R. Multitask learning. Mach. Learn. 28, 41–75 (1997).
Gao, X. et al. Experimental progress of hybrid operational scenario on EAST tokamak. Nucl. Fusion 60, 102001 (2020).
Zhang, M., Wu, Q., Zheng, W., Shang, Y. & Wang, Y. A database for developing machine learning based disruption predictors. Fusion Eng. Des. 160, 111981 (2020).
Zheng, W. et al. Overview of machine learning applications in fusion plasma experiments on J-TEXT tokamak. Plasma Sci. Technol. 24, 124003 (2022).
Acknowledgements
The authors are very grateful for the help of J-TEXT team and the EAST team. This work is supported by the National Key R&D Program of China (no. 2022YFE03040004) and the National Natural Science Foundation of China (no. 51821005).
Author information
Contributions
W.Z. and F.X. conceived the method, the design of the model, as well as the experiments and carried them out, and co-wrote the paper; D.C., B.G., B.S., and B.X. helped with accessing and using the EAST database; C.S. and X.A. helped with accessing the J-TEXT database and J-TEXT feature extraction; W.Z. and D.C. offered computational resources; Z.C., Y.D., and C.S. contributed to the initial discussions and provided feedback on the manuscript; Y.P., N.W., M.Z., Z.C., and Z.Y. provided general guidance during the research process.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Communications Physics thanks Cristina Rea, Alessandro Pau, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.