Abstract
Impact sensitivity is a critical property of energetic molecules, indicating their tendency to react when subjected to mechanical stimuli such as impact. Nitro compounds are widely used as explosives across industrial, military, and civilian applications, making their safe handling a significant concern for engineers and scientists working with these materials. Predicting whether a molecule has the potential to pose safety risks is therefore of great importance. This study aimed to develop a QSPR model for predicting the impact sensitivity of 404 nitro compounds using the Monte Carlo algorithm implemented in CORAL-2023 software. The Simplified Molecular Input Line Entry System (SMILES) was employed to represent the molecular structures, while correlation weight descriptors were computed. Four target functions (TF0, TF1, TF2, and TF3) were used to generate robust models. The first model applied Monte Carlo optimization without the inclusion of IIC (information index of correlation) or CII (correlation index of information); the second incorporated IIC; the third incorporated CII; and the fourth applied both IIC and CII. Comparative statistical analyses indicated that the model integrating both IIC and CII demonstrated superior predictive performance, with the best results observed in split 2 (R2Validation = 0.7821, IICValidation = 0.6529, CIIValidation=0.8766, Q2Validation = 0.7715, and \(\:{r}_{m}^{2}\) = 0.7464).
Similar content being viewed by others
Introduction
The sensitivity of energetic molecules is a critical parameter for the explosives industry. These materials may detonate under various stimuli, including shock, impact, electric sparks, friction, and heat, which can occur during their synthesis, storage, handling, or application. Among the different types of sensitivity, impact sensitivity is particularly significant, as it has been implicated in numerous workplace accidents1,2,3,4. Therefore, determining the impact sensitivity of energetic materials is essential for ensuring safety.
Experimentally, impact sensitivity is commonly assessed using the drop-weight impact test. In this method, milligram quantities of the energetic material are placed between a flat steel anvil and a steel striker. A 2.5 kg weight is dropped from a specified height (in centimeters) onto the striker plate, and the occurrence or absence of an explosion is recorded. The height at which there is a 50% probability of an explosion is denoted as H505;6. However, the reproducibility of these experimental measurements is often affected by factors such as sample size, morphology, variations in test apparatus, and differing test conditions7,8,9,10. Moreover, experimental determination of H50 is time-consuming, and it is not feasible for compounds that have not yet been synthesized. Consequently, there is growing interest in theoretical methods to establish quantitative relationships between molecular structure and the impact sensitivity of energetic molecules11.
Quantitative structure–property relationships (QSPR), along with related approaches such as QSAR (quantitative structure–activity relationships) and QSTR (quantitative structure–toxicity relationships), use mathematical equations to predict a desired property (e.g., property, activity, or toxicity) based on molecular descriptors derived from structural features2;6,11;12. These models allow chemists to estimate properties of newly designed or undeveloped compounds13,14. In recent years, the CORAL (CORrelation And Logic: http://www.insilico.eu/coral/). software, which leverages the Monte Carlo algorithm, has emerged as a powerful tool for building QSPR models using Simplified Molecular Input Line Entry System (SMILES) notations15,16,17. Through Monte Carlo optimization, the software extracts optimal descriptors of correlation weight (DCW) from the SMILES representation of chemical structures, and these descriptors are used to calculate the target function18,19,20. Literature indicates that the predictive performance of QSPR and QSAR models can be significantly enhanced through two statistical benchmarks: the index of ideality of correlation (IIC) and the correlation intensity index (CII). These metrics improve the models’ ability to account for both the correlation coefficient and the residual values of the test molecules’ endpoints21,22,23.
The primary objective of this study is to develop robust QSPR models to estimate the impact sensitivity of 404 nitro energetic molecules using the Monte Carlo algorithm implemented in the CORAL program. Four different approaches are employed: the balance of correlation method without IIC or CII, the balance of correlation method with IIC, the balance of correlation method with CII, and the balance of correlation method incorporating both IIC and CII. The performance of these models is compared using four randomly generated dataset subsets: active training, passive training, calibration, and validation sets. To enhance the mechanical interpretation, correlation weights are used to identify structural features associated with increased or decreased impact sensitivity.
Methods
Data
To establish predictive QSPR models, a dataset comprising energetic compounds with known impact sensitivity (H50) values was compiled from previously published literature sources10,24,25. After excluding duplicate compounds and those with ambiguous molecular structures, a total of 404 unique compounds were retained. The molecular structures of these compounds were drawn using the free software BIOVIA Draw and converted into Simplified Molecular Input Line Entry System (SMILES) notations for computational analysis.
The impact sensitivity values, originally expressed as H50 (in cm), were transformed into their corresponding logarithmic scale (log H50), which served as the endpoint for QSPR modeling. The log H50 values ranged from 0.699 to 2.690, encompassing a broad spectrum of sensitivity characteristics. Comprehensive details, including compound IDs, SMILES notations, observed and predicted log H50 values, and information regarding the applicability domain of the constructed model (using the TF3 target function for four splits), are provided in the supplementary information (Table S1).
The dataset was divided into four splits to facilitate robust model development and evaluation. Each split was randomly partitioned into four subsets: active training, passive training, calibration, and validation sets. Table 1 demonstrates that the four splits employed in this study are non-identical, and the percent identity among the splits was calculated as follows:
where \(\:{N}_{i,\:j}\) denote the number of elements that are assigned to the same set in both the \(\:{i}^{th}\)and \(\:{j}^{th}\) splits (i.e., the set can be active training, passive training, calibration, or validation). Similarly, \(\:{N}_{i}\) represents the number of elements assigned to a particular set in the \(\:{i}^{th}\) split, while \(\:{N}_{\:j}\) represents the number of elements assigned to the same set in the \(\:{j}^{th}\) split.
The roles and tasks of each set in QSPR modeling have been thoroughly described in previous studies26,27,28.
Optimal descriptor
The hybrid optimal descriptors, DCW (T, N), were utilized to construct the QSPR model for energetic compounds. These descriptors are derived by integrating molecular attributes obtained from SMILES notations and molecular graphs. By combining these two types of molecular representations, this hybrid approach improves the statistical quality of the model compared to those based exclusively on SMILES or molecular graphs29.
The DCW of the hybrid optimal descriptors is calculated using the following mathematical function:
The following mathematical equations are used to calculate the Descriptors of Correlation Weight (DCW) for both SMILES and Hierarchical Structural Graph (HSG) representations.
The notations employed in Eqs. (3) and (4) are explained in detail in Table 2. Based on the numerical values of the correlation weights for the descriptors, the predictive model for logH50 can be formulated as follows:
where C0 and C1 are regression coefficients, and T and N represent parameters of the Monte Carlo optimization procedure. In this optimization process, the numerical values of T and N are determined to achieve the optimal statistical criteria for the calibration set, denoted as T∗ and N∗ 16,30,31.
Monte Carlo optimization
Equations (3) and (4) require the numerical values of the correlation weights (CW). Monte Carlo optimization is employed to calculate these numerical values using the CORAL software. In this study, to develop robust QSPR models, four target functions for the Monte Carlo optimization are evaluated: TF0 (correlation balance without CII or IIC), TF1 (correlation balance modified with IIC), TF2 (correlation balance modified with CII), and TF3 (correlation balance modified with both IIC and CII). The statistical quality of each target function is then compared. The mathematical expressions for the four target functions are defined as follows:
Where rATRN represents the correlation coefficient between the predicted and observed logH50 values for the active training set, rPTRN represents the correlation coefficient for the passive training set. The constants (empirical constant), WIIC, and WCII generally have fixed values; specifically, these values are 0.1, 0.5, and 0.3, respectively. The flow chart of the optimization of the correlation weight with target function (TF) is cited in the literature32.
A literature survey indicates that the inclusion of IIC and CII enhances the statistical quality of the validation set33,34,35. Data from the calibration set are used to calculate IIC, and the mathematical equations for calculating IIC and CII are well-established in the literature21,33,36.
A flow diagram representation of the QSPR model established and methodology cited in the literature37 and depicted in Fig. 1.
Results and discussion
In the present manuscript, the CORAL program was used to apply various methods for generating 16 QSPR models. These methods include: the balance of correlation approach without IIC or CII (TF0, Eq. 6), the balance of correlation approach with IIC (TF1, Eq. 7), the balance of correlation approach with CII (TF2, Eq. 8), and the balance of correlation approach with both IIC and CII (TF3, Eq. 9). The statistical outcomes of each method are then compared.
Comparison of target methods
In this study, to achieve reliable statistical results, four models were generated for each method using hybrid optimal descriptors derived from SMILES and HSG features. The statistical characteristics of the various QSPR models, constructed with different target functions, are presented in Table 3. Although all the established QSPR models meet the standard metrics and show acceptable statistical results, the models generated using the target function TF3 outperform those based on TF0, TF1, and TF2. The superior predictive performance of the models using TF3 can be attributed to the combined effect of IIC and CII on the system38,39,40. Thus, TF3, computed with Eq. (9), is recognized as the best target function. The following QSPR model equations were obtained using Monte Carlo optimization with TF3 for logH50:
The robustness of a QSPR model is assessed using various validation metrics. Among these, classical metrics such as Q2 and R2 are commonly employed to evaluate the predictive potential of the model. A model is considered acceptable when the numerical values of R2 and Q2 exceed 0.5 and 0.7, respectively41,42. Roy and colleagues43,44,45 proposed that \(\:{r}_{m}^{2}\) may serve as a stricter metric than Q2 and R2. If the value of \(\:{r}_{m}^{2}\) exceeds 0.5, the model can be interpreted as reliable. Additionally, other crucial statistical criteria for assessing various QSPR models include the average (\(\:\stackrel{-}{{r}_{m}^{2}}\)) and ∆r_m2 for the validation set, where (\(\:\stackrel{-}{{r}_{m}^{2}})\:\)should be greater than 0.5 and \(\:\varDelta\:{r}_{m}^{2}\) should be smaller than 0.2. In this study, these metrics, along with others, were utilized for model validation.
Table 3 shows that all four splits created using the target function TF3 meet the statistical quality criteria and fulfill the standards reported in the literature42,43. Among the four models formulated with TF3, Split 2 was identified as the best, as it exhibited the highest R2 and \(\:{r}_{m}^{2}\) values for the validation set (R2 = 0.7821 and \(\:{r}_{m}^{2}\) = 0.7464).
Also, to compare these four models, the Akaike information criteria (AIC) values were calculated and compared. This parameter assessment the prediction error of the models generated from a single dataset and consequently, judges the relative efficiency of the models46,47. It has been reported that among several models built from a data set, the model with the lowest AIC value has higher relative efficiency. Herein, to confirm superiority Split 2, the AIC criterion was computed for all four models formulated with TF3. It has been observed in each of the splits that the AICs of models are as follows: split 1= -325.95, split 2=-326.073, split 3= -323.029, split 4= -308.386, which demonstrates that in terms of performance, split 2 has a relatively better efficiency.
Figure 2 compares the numerical values of R2 and \(\:{r}_{m}^{2}\) for the validation set across the four splits designed using the four target functions. Figure 3 shows the plot of observed logH50 versus calculated logH50 for the six models developed with TF3. The calculated logH50 values exhibit a strong correlation with their observed counterparts for all models constructed using TF3. Additionally, Fig. 4 illustrates the residuals of logH50 versus experimental logH50 for the four models. As shown in Fig. 4, the residuals of logH50 are well-distributed around the horizontal line centered at zero. All these results indicate that the generated QSPR models are robust and well-fitted.
Model validation
To ensure the reliability, predictability, and quality of the constructed models, three validation strategies were employed: (1) internal validation or cross-validation using the training set data, (2) external validation using the test set data, and (3) Y-scrambling or data randomization. The training set was evaluated using various statistical criteria, such as R2, Q2LOO, MAEtrain, and others. The robustness of the validation set was assessed using external statistical parameters, including R2pred or Q2F1, Q2F2, Q2F3, r2mtest (i.e. \(\:\stackrel{-}{{r}_{m}^{2}}\) and \(\:\varDelta\:{r}_{m}^{2}\) ), MAEpred, and the concordance correlation coefficient (CCC), among others. To ascertain statistical significance and eliminate the possibility of random chance influencing the results, a Y-randomization test was performed on the generated models. Additionally, IIC and CII were applied to further validate the reliability and robustness of the QSPR models48. The mathematical equations for these statistical metrics are well-documented in the literature49,50,51.
Applicability domain
The applicability domain (AD) of a QSAR/QSPR model refers to the physicochemical, structural, or biological space within which the training set of the model has been developed. The resulting model is considered reliably applicable only to those molecules that fall within this domain52. It is essential to verify the AD of a QSAR/QSPR model to ensure its predictive reliability. The AD represents a theoretical region within the chemical space, defined using the model’s descriptors and responses, where reliable predictions can be made.
In this study, the AD of both the training and test sets was assessed to determine whether any compounds lie outside the defined AD.
For the Monte Carlo-based QSPR/QSAR models constructed using the CORAL program, the distribution of SMILES or HSG attributes in the active training, passive training, and calibration sets is used to define the AD. A substance is considered an outlier if it falls outside the scope of the AD.
In CORAL, the AD of a constructed model is characterized using the statistical defect of the SMILES attribute (Sk). The defects of the SMILES attribute (DSk) can be computed as follows:
PATRN(Sk), PPTRN(Sk) and PCAL(Sk) represent the probabilities of attributes in the active training set, passive training set, and calibration set, respectively. NATRN(Sk), NPTRN(Sk), and NCAL(Sk), are the frequencies of attributes in the active training set, passive training set, and calibration set, respectively.
The SMILES statistical defect (D) is calculated as the sum of the statistical defects of all attributes.
NA is the number of active SMILES attributes for the given compounds.
In CORAL, a SMILES is an outlier if:
\(\:{\stackrel{-}{\text{D}\text{e}\text{f}\text{e}\text{c}\text{t}}}_{\text{A}\text{T}\text{R}\text{N}}\) D represents the average statistical defect for the dataset of the active training set. While a statistical defect does not necessarily indicate that a compound is an outlier, it highlights concerns regarding the representativeness of the structural elements of the compound. Specifically, a large statistical defect suggests that a significant portion of the compound’s structure is not adequately supported by the correlation weights.
However, various logH50 models constructed with TF3 exhibited different degrees of applicability domain (AD), with coverage ranging from 91 to 93% of the hypothetical space of compounds. Split 1 demonstrated the largest domain, covering 372 compounds out of 404, while Split 3 showed the smallest domain, covering 362 out of 404 compounds. In each model, AD varies with changes in the descriptors, making it model-specific. Nevertheless, if a compound lies outside the AD range of one split, it will typically fall within the AD range of another split (see supporting information). Consequently, all compounds were included in the model development process.
Mechanistic interpretation
In CORAL, the correlation weights (CWs) of structural attributes (Sk) are calculated across three Monte Carlo runs, followed by a mechanistic interpretation based on the numerical statistics of these CWs. Descriptors that either increase or decrease the endpoint (logH50) are identified consistently across all runs. If the CW(Sk) is positive in every Monte Carlo optimization run, the attribute is classified as an enhancing promoter, whereas a consistently negative CW(Sk) indicates a reducing promoter. Attributes with CWs that alternate between positive and negative across runs are considered inconsistent and are excluded as valuable descriptors. The common descriptors that have consistently exhibited either an increasing or decreasing trend across multiple splits are summarized in Table 4.
The graph-based descriptors associated with increased logH50 include the presence of fluorine and chlorine [(F.Cl).Y.Y.], the presence of nitrogen, oxygen, and absence of sulfur, and phosphorus [NOSP11000000], the absence of symmetry (xyzyx) ([xyzyx0]….), and the presence of aromatic carbon, with or without branching [c…1……., c…c…(…, c………., 1…c…(…, c…(…….].
Compounds 261, 264, and 280, which incorporate chlorine atoms (Table 5), demonstrate elevated logH50 values. This observation indicates that the presence of chlorine atoms contributes to reduced impact sensitivity. Similarly, the high logH50 value of compound 136 (Table 5), containing fluorine, further suggests that the incorporation of fluorine atoms also leads to decreased impact sensitivity. These four compounds were the only ones in the dataset that contained fluorine and chlorine. The impact of these descriptors on logH50 values will be further examined in detail, drawing on evidence from relevant literature and documented studies. This comprehensive analysis will help elucidate the relationship between molecular structure and the sensitivity characteristics of explosive compounds, contributing to a deeper understanding of their behavior under mechanical impact.
The influence of halogens on the structure of explosive compounds and their effect on logH50 values is a critical area of study. Halogens, due to their electronegativity and the ability to form strong bonds with carbon, significantly impact the thermal stability and sensitivity of energetic materials. For instance, the incorporation of halogens such as chlorine or fluorine into the molecular structure can reduce impact sensitivity, as reflected by higher logH50 values. This effect arises from the halogen atoms’ ability to enhance the intermolecular interactions and increase the overall bond dissociation energy of the compound, thereby stabilizing the molecular framework. Moreover, halogens may alter the decomposition pathways of explosives, influencing the release of energy during detonation53.
The presence of nitrogen (N) and oxygen (O) in the molecular structure of explosives can have significant effects on impact sensitivity (logH50), stability, and explosive performance. These elements are commonly found in explosive compounds in the form of various groups such as nitro groups (-NO₂) and nitrogen groups (e.g., -NH₂), which can impart different characteristics to the explosive materials. Below are the primary effects of N and O presence in explosives:
Nitro groups are well-known and widely used in explosives, and they have a strong impact on explosive properties and impact sensitivity. Nitro groups generally increase impact sensitivity due to their high electronegativity. These groups weaken the bonds in the molecular structure, requiring less energy for detonation. As a result, materials containing nitro groups tend to have lower H50 values and are more easily affected by impact. Additionally, the presence of nitrogen and oxygen in nitro groups reduces the overall stability of the explosive, making it more prone to decomposition when subjected to impact or heat54.
Nitrogen is often found in combination with other groups, especially in aromatic compounds such as 2,4,6-trinitrobenzene-1,3,5-triamine (TATB) or HMX (high-melting explosive). The presence of nitrogen in these compounds can have varying effects: In some materials, like TATB, the presence of amino groups (NH₂) forms hydrogen bonds that help reduce impact sensitivity, resulting in an increase in logH50. These hydrogen bonds contribute to structural stability, making the material less sensitive to external forces. Amino-based explosives may demonstrate lower impact sensitivity because nitrogen-hydrogen and nitrogen-oxygen bonds enhance stability55. Table 5 presents examples of aromatic compounds containing an amino group that exhibit high logH50 values (compounds no. 223, 341, and 332).
The presence of aromatic carbon and the characteristics of aromaticity in the molecular structure can have significant impacts on the sensitivity of explosives, including logH50. Generally, the aromatic structure, due to its unique features, can increase stability and reduce impact sensitivity. Aromatic structures, containing π bonds and six-membered carbon rings, are typically more stable. These π bonds in aromatic molecules can help distribute and stabilize energy across the molecule, which may lead to reduced sensitivity to impact and an increase in logH50, as the explosive material would require more energy to break its stable structure. Furthermore, aromatic molecules are more resistant to impact due to the high stability of the carbon-carbon bonds in the aromatic rings. These bonds are usually stronger than non-aromatic intramolecular bonds, requiring more energy (i.e., a stronger impact) to initiate an explosion. This directly contributes to the increase in logH50 53.
The SMILES-based descriptors associated with decreased logH50 include presence of double bond with branching [=…(…….], presence of double bond [=……….], presence of double bond and oxygen with branching [ =…O…(…], presence of oxygen and double bond with branching [O…=…(…], and presence of oxygen with double bond [O…=…….]. The graph-based descriptors associated with decreased logH50 include the Nearest Neighbor Codes for Oxygen (NNC-O) equal to 101 [NNC-O…101.], and the presence of the path of length 2 equal to 2 for oxygen atom (PT2-O…2…).
The presence of double bonds can significantly impact the impact sensitivity (H50) of explosive materials, often leading to an increase in sensitivity and a decrease in H50. Double bonds are typically more reactive under mechanical impact due to their relatively weaker bond strength compared to single bonds, making them more susceptible to rupture under pressure or impact. As a result, explosives with double bonds may more readily undergo reactions or detonation with minimal external force. Furthermore, the electronic structure of double bonds can create regions of high electron density or free electron areas, which may serve as vulnerable points in the molecular structure, increasing the likelihood of detonation under impact. Additionally, double bonds often alter the electronic density distribution within a molecule, influencing electrostatic interactions both within and between molecules. These changes can reduce stability and resistance to mechanical stimuli, further contributing to lower logH50 values and higher impact sensitivity8.
The simultaneous presence of oxygen and double bonds in explosive materials can have a complex and multifaceted effect on impact sensitivity (logH50), as each of these features independently influences the molecular structure and reactivity of the material. Oxygen, particularly in nitro groups, can enhance explosive potential by increasing the production of hot gases and high-pressure during detonation reactions. Double bonds, due to their electronic structure, create vulnerable sites in the molecule, such as π bonds, which are more susceptible to rupture under impact or pressure. This combination can decrease the overall stability of the explosive and increase its sensitivity to impact, thereby lowering logH50 values. Furthermore, the presence of both oxygen and double bonds can alter the electronic density distribution within the molecule, potentially weakening the molecular structure and increasing the likelihood of detonation with less external force. As a result, the synergistic effect of oxygen and double bonds can increase explosive reactivity, reduce molecular stability, and lead to a decrease in logH50, thereby enhancing impact sensitivity56. Table 5 presents examples of aliphatic compounds containing an oxygen double bond that exhibit low logH50 values (compounds no. 110, 111, 109, and 33).
Comparison with earlier reported models
In previous studies, several QSPR models have been proposed to predict impact sensitivity. The predictive capabilities of the QSPR models discussed here have been compared with those of other reported models, and some statistical properties of the studied models are summarized in Table 6. Kamlet7,57 developed models based on linear correlations between log50 (H50%) and oxygen balance (OB) for 78 nitro compounds. While his models showed strong correlations, they lacked external validation. Nefati et al.58, Cho et al.25, and Jun et al.59 used artificial neural networks (ANNs) to create prediction models for impact sensitivity for 204, 234, and 33 energetic molecules, respectively. Keshavarz and colleagues4,10,60,61 published multiple multilinear regression (MLR) and ANN models to predict the impact sensitivity of 291, 58, 69, and 46 nitro compounds across independent studies. Although most of the mentioned prediction models exhibited significant correlations, their predictive performance was not validated on an external dataset.
However, there are only a few models that have been validated on an external dataset. In two independent studies, Wang et al.3,62 developed various models using different approaches, including multiple linear regression (MLR), partial least squares (PLS), and artificial neural networks (ANN), for 186 compounds, and backpropagation neural network (BPNN), MLR, and PLS for 156 compounds. For QSPR modeling of 186 non-heterocyclic nitroenergetic compounds, they employed a genetic algorithm-PLS method to select nine optimal descriptors. They then applied a nonlinear ANN method to model the relationship between the selected descriptors and impact sensitivity. The results obtained from the ANN method were superior to those from the PLS and MLR methods (Table 4).
For QSPR modeling of 156 nitroenergetic compounds, the entire dataset was randomly divided into a training set (64 compounds), a validation set (63 compounds), and a prediction set (29 compounds). Electrotopological-state indices (ETSI) were used as molecular structure descriptors to predict the impact sensitivity of the compounds. Although their BPNN models demonstrated high predictive potential, a lack of robustness was observed, likely due to over-parameterization, as the models relied on up to 16 descriptors. Additionally, the applicability domains of the models were not considered, which limits their practical use.
Nefati et al.58 developed a model using thirteen molecular descriptors to predict the impact sensitivity of 204 explosive molecules via ANN. However, their model was not validated using an external dataset, which leaves unclear the prediction capability of the model for new molecules not included in the training process. Furthermore, it was not determined which specific property correlates most strongly with impact sensitivity.
Xu et al.11, using 3D descriptors, developed a QSPR model to predict the impact sensitivity of 156 nitro compounds previously introduced by Wang et al.3. The entire dataset of 156 compounds was divided into a training set (127 compounds) and a test set (29 compounds). Both MLR and ANN were employed to select descriptors and build the models. The MLR-based model achieved an R2 of 0.7222 and an s of 0.177 for the test set, while the ANN-based model showed an R2 of 0.8656 and an s of 0.130. Based on these results, they concluded that the ANN model was more reliable than the linear MLR model.
Morrill and Byrd63 developed a QSPR model based on eight quantum chemical descriptors for 227 nitroenergetic compounds using the best multilinear regression (BMLR) method. After removing seven outliers, the model’s R2 value was 0.8141. However, despite having a large number of compounds, only eight compounds were used as a test set in their model, which raises concerns about the model’s generalizability and reliability.
In 2012, Fayet et al.2 proposed a global QSPR model to predict the impact sensitivity of a dataset consisting of 161 nitro compounds. They used multilinear regression to establish correlations between chemical quantum descriptors and experimental impact sensitivities. The dataset was divided into a training set of 108 nitro compounds, with an R2 of 0.750 and RMS of 0.19, and a test set of 53 nitro compounds, with an R2 of 0.759 and RMS of 0.230. The model was developed using the multilinear regression (BMLR) method based on 15 quantum chemical descriptors.
In 2014, Fayet et al.12 expanded their work by developing a series of QSPR models for 60 nitramines using topological and constitutional descriptors to predict their impact sensitivity. To predict the log h50% endpoint, they applied 17 different training and validation set partitions from the dataset, using two different ratios of compounds (40/20 or 45/15) and two division approaches (based on property ranking or random division). They focused on models derived from partitions 2, 6, 9, and 13, as these exhibited the best performance among the 17 partitions studied. The BMLR model from partition 9 was selected as the best model due to its highest predictive capabilities (R2 Ext = 0.90 and RMS = 0.14). All of Fayet’s models were constructed in accordance with the five OECD principles for the validation of QSPR models intended for regulatory use.
In 2021, Deng Q et al.64 constructed six Machine learning (ML) models to probe the correlation between the impact sensitivity and the molecular structure of 240 nitroaromatic compounds using the ANN and sure independence screening and sparsifying operator (SISSO). The R2 and RMSE their ANN model for the test set were 0.95 and 0.92 respectively while the R2 and RMSE of SISSO model were 0.85 and 1.07 respectively, which is indicate that ANN significantly outperforms SISSO for the impact sensitivity model.
In 2013, Mathieu6 basis on the of single descriptor, built a QSPR model to predict the impact sensitivity of 156 nitro molecules which is used in Wang’s et al. models. The R2 and RMS for an external test set of the model were 0.721 and 0.265.
In 2014, Mathieu and Alaime65 using the same database of 156 nitro compounds suggested a physically based quantitative model for impact sensitivities with a satisfactory reliability (R2 ≃ 0.8). Later, in 2015 66, they proposed a model to estimate impact sensitivities of 308 energetic compounds and it yields values of 0.6934 and 0.248 for the R2 and SE computed for a test set of 74 compounds. In 2016, they followed their research with suggestion a semi empirical model based on physical assumptions and compared it with Fayet’s et al.12 and Prana’s et al.67 models. For construction their model, 129 compounds consist of 50 C-NO2 compounds studied by Prana et al.67 60 N-NO2 compounds studied by Fayet et al.12. and 19 compounds studied by Mathieu and Alaime66 considered. These data splitted into training and test sets and used 74 compounds in the training set (34 C − NO2 and 40 N − NO2 compounds). The R2 and RMS for test set of the model were 0.71 and 0.22. Finally, in 2017, Mathieu24 investigated the correlation between detonation performance and the sensitivity using the regression analysis, and explained there is a significant correlation analysis with R2 close to 0.8 for 156 aliphatic compounds.
In 2021, Deng Q. et al.64 developed six machine learning (ML) models to explore the relationship between impact sensitivity and the molecular structure of 240 nitroaromatic compounds, employing ANN and the Sure Independence Screening and Sparsifying Operator (SISSO). The ANN model showed excellent predictive performance, achieving an R2 of 0.95 and an RMSE of 0.92 for the test set, while the SISSO model had an R2 of 0.85 and an RMSE of 1.07. These results suggest that the ANN model significantly outperforms the SISSO model for predicting impact sensitivity.
In 2013, Mathieu6 developed a QSPR model based on a single descriptor to predict the impact sensitivity of 156 nitro molecules, which were also included in Wang et al.‘s study. The model yielded an R2 of 0.721 and an RMS of 0.265 for an external test set.
In 2014, Mathieu and Alaime65 used the same dataset of 156 nitro compounds to propose a physically-based quantitative model for predicting impact sensitivities, which demonstrated satisfactory reliability (R2 ≈ 0.8). The following year, in 2015 66, they extended their research by developing a model to estimate the impact sensitivities of 308 energetic compounds, which produced an R2 of 0.6934 and a standard error (SE) of 0.248 for a test set of 74 compounds.
In 2016, they furthered their work by proposing a semi-empirical model based on physical principles and comparing it with the models of Fayet et al.12 and Prana et al.67. For their model, they utilized a dataset of 129 compounds, including 50 C-NO₂ compounds studied by Prana et al.67, 60 N-NO₂ compounds studied by Fayet et al.12, and 19 compounds from their own previous study66. The data were divided into training and test sets, with 74 compounds in the training set (34 C − NO₂ and 40 N − NO₂ compounds). This model achieved an R2 of 0.71 and an RMS of 0.22 for the test set.
Finally, in 2017, Mathieu24 conducted a study to explore the correlation between detonation performance and sensitivity using regression analysis. His findings revealed a significant correlation, with an R2 close to 0.8, for 156 aliphatic compounds.
To the best of our knowledge, only two studies12,67 have reported OECD-compliant predictive models for impact sensitivity of nitro compounds. In contrast, all the models developed in the present study were constructed in accordance with all OECD principles for QSPR models intended for regulatory use. These principles include a defined endpoint, an unambiguous algorithm, a measure of predictive potential, a defined applicability domain, and a mechanistic interpretation.
The previously reported models were developed using smaller datasets (n < 308), whereas the current study uses the largest dataset to date, comprising 404 nitro compounds. Utilizing a larger dataset helps mitigate the occurrence of high R2 values, making R2 less reliable as a comparison metric. In earlier studies, QSPR models were typically built using two sets (training and test sets), except for the models by Wang et al., where the dataset was randomly divided into a training set, validation set, and prediction set. In contrast, the QSPR models in the present study were constructed using four sets: training, invisible training, calibration, and validation sets.
Another notable distinction is that none of the previously reported models provided mechanistic interpretations, whereas the models presented here offer mechanistic insights. Specifically, we have identified the key structural attributes responsible for the increase or decrease of logH50, based on the built models.
By utilizing a larger training dataset, the study has achieved acceptable statistical quality for the test set. The robustness and reliability of the constructed QSPR models have been validated using a comprehensive set of robust criteria, including R2, Q2, Q2F1, Q2F2, Q2F3, CCC, CII, IIC, CR2p, RMSE, MAE, \(\:{\varDelta\:r}_{m}^{2}\), \(\:\stackrel{-}{{r}_{m}^{2}}\), F, and Y-test. Consequently, based on these validation criteria, it can be concluded that the models generated in this work are statistically more significant and reliable when compared to the previously reported models.
Conclusion
This study presents the development of global and reliable QSPR models to predict the impact sensitivity of 404 nitro compounds using the Monte Carlo optimization procedure within the CORAL program. The models were established and evaluated according to the OECD principles for QSPR/QSAR acceptance in regulatory settings, demonstrating high internal and external predictivity and a broad applicability domain. A total of four splits were created, resulting in the establishment of 16 QSPR models. For each split, four target functions, TF0 (without IIC or CII), TF1 (with IIC), TF2 (with CII), and TF3 (with both IIC and CII), were employed. Among all the models, those generated using TF3 showed the most robustness and superior predictability. Based on the statistical quality of the model established for split 2 using TF3, it was selected as the best-performing model.
The results highlighted that the simultaneous use of IIC and CII significantly enhances the reliability of the models. Additionally, the structural attributes responsible for the increase or decrease of log H50 in nitro compounds were extracted. In conclusion, this study develops a simple, time- and cost-efficient, and highly effective in silico approach for predicting the impact sensitivity of energetic compounds.
Data availability
The authors declare that the data supporting the findings of this study are available within the paper and its Supplementary Information files. Should any raw data files be needed in another format they are available from the corresponding author upon reasonable request.
References
Fayet, G., Rotureau, P., Prana, V. & Adamo, C. Global and local quantitative structure–property relationship models to predict the impact sensitivity of nitro compounds. Process Saf. Prog. 31(3), 291–303 (2012).
Global and Local Qspr Models To Predict the Impact Sensitivity of Nitro Compounds. Glob Congr Process Saf 2012—Top Conf (2012).
Wang, R., Jiang, J., Pan, Y., Cao, H. & Cui, Y. Prediction of impact sensitivity of nitro energetic compounds by neural network based on electrotopological-state indices. J. Hazard. Mater. 166(1), 155–186 (2009).
Keshavarz, M. H. Prediction of impact sensitivity of nitroaliphatic, nitroaliphatic containing other functional groups and nitrate explosives. J. Hazard. Mater. 148(3), 648–652 (2007).
Coffey, C. S. & De Vost, V. Impact testing of explosives and propellants. Propellants Explos. Pyrotech. 20(3), 105–115 (1995).
Mathieu, D. Toward a physically based quantitative modeling of impact sensitivities. J. Phys. Chem. A. 117(10), 2253–2259 (2013).
Kamlet, M. & Adolph, H. The relationship of impact sensitivity with structure of organic high explosives. Ii Polynitroaromatic Explosives Propellants Explosives Pyrotechnics 4(2), 30–34 (1979).
Rice, B. M. & Hare, J. J. A quantum mechanical investigation of the relation between impact sensitivity and the charge distribution in energetic molecules. J. Phys. Chem. A. 106(9), 1770–1783 (2002).
Siqueira Soldaini Oliveira, R., Borges, I. Jr & Propellants Correlation between molecular charge properties and impact sensitivity of explosives: Nitrobenzene derivatives. Explos. Pyrotech. 46(2), 309–321 (2021).
Keshavarz, M. H. & Jaafari, M. Investigation of the various structure parameters for predicting impact sensitivity of energetic molecules via artificial neural network. Propellants Explosives Pyrotechnics 31(3), 216–225 (2006).
Xu, J. et al. Qspr studies of impact sensitivity of nitro energetic compounds using three-dimensional descriptors. J. Mol. Graph. Model. 36, 10–19 (2012).
Fayet, G. & Rotureau, P. Development of simple Qspr models for the impact sensitivity of nitramines. J. Loss Prev. Process Ind. 30, 1–8 (2014).
Toropov, A. A. et al. Qsar as a random event: modeling of nanoparticles uptake in paca2 cancer cells. Chemosphere 92(1), 31–37 (2013).
Kumar, A. & Kumar, P. Cytotoxicity of quantum Dots: use of quasismiles in development of reliable models with index of ideality of correlation and the consensus modelling. J. Hazard. Mater. 402, 123777 (2021).
Toropova, A. P., Toropov, A. A., Leszczynska, D. & Leszczynski, J. How the coral software can be used to select compounds for efficient treatment of neurodegenerative diseases? Toxicol. Appl. Pharmcol. 408, 115276 (2020).
Lotfi, S., Ahmadi, S., Azimi, A. & Kumar, P. Prediction of second-order rate constants of the sulfate radical anion with aromatic contaminants using the Monte Carlo technique. New J. Chem. 47(42), 19504–19515 (2023).
Kumar, A. & Kumar, P. Identification of good and bad fragments of tricyclic triazinone analogues as potential pkc-θ inhibitors through smiles–based Qsar and molecular Docking. Struct. Chem. 32, 149–165 (2021).
Ahmadi, S., Lotfi, S. & Kumar, P. A Monte Carlo method based Qspr model for prediction of reaction rate constants of hydrated electrons with organic contaminants. SAR QSAR Environ. Res. 31(12):935–950. (2020).
Toropova, A. P. & Toropov, A. A. Whether the validation of the predictive potential of toxicity models is a solved task? Curr. Top. Med. Chem. 19(29), 2643–2657 (2019).
Kumar, A., Kumar, P. & Singh, D. Qsrr modelling for the investigation of gas chromatography retention indices of flavour and fragrance compounds on Carbowax 20 m glass capillary column with the index of ideality of correlation and the consensus modelling. Chemometr. Intell. Lab. Syst. 224, 104552 (2022).
Kumar, P. & Kumar, A. Coral: Predictions of Quality of Rice Based on Retention Index Using a Combination of Correlation Intensity Index and Consensus Modelling. Qspr/qsar Analysis Using Smiles and quasi-smiles 21–462 (Springer, 2023).
Toropova, A. P., Toropov, A. A., Roncaglioni, A. & Benfenati, E. The index of ideality of correlation improves the predictive potential of models of the antioxidant activity of tripeptides from frog skin (litoria rubella). Comput. Biol. Med. 133, 104370 (2021).
Toropov, A. A. & Toropova, A. P. Correlation intensity index: Building up models for mutagenicity of silver nanoparticles. Sci. Total Environ. 737, 139720 (2020).
Mathieu, D. Sensitivity of energetic materials: theoretical relationships to detonation performance and molecular structure. Ind. Eng. Chem. Res. 56(29), 8191–8201 (2017).
Cho, S-G. et al. Optimization of neural networks architecture for impact sensitivity of energetic molecules. Bull. Korean Chem. Soc. 26(3), 399–408 (2005).
Lotfi, S., Ahmadi, S. & Kumar, P. Correction: ecotoxicological prediction of organic chemicals toward Pseudokirchneriella subcapitata by Monte Carlo approach. RSC Adv. 12(53), 34567–34567 (2022).
Toropov, A. A. et al. The study of the index of ideality of correlation as a new criterion of predictive potential of qspr/qsar-models. Sci. Total Environ. 659, 1387–1394 (2019).
Kumar, P. & Kumar, A. Coral: Qsar models of cb1 cannabinoid receptor inhibitors based on local and global smiles attributes with the index of ideality of correlation and the correlation contradiction index. Chemometr. Intell. Lab. Syst. 200, 103982 (2020).
Toropova, A. P., Toropov, A. A., Veselinović, J. B., Miljković, F. N. & Veselinović, A. M. Qsar models for hept derivates as Nnrti inhibitors based on Monte Carlo method. Eur. J. Med. Chem. 77, 298–305 (2014).
Duhan, M. et al. Synthesis, molecular Docking and Qsar study of thiazole clubbed pyrazole hybrid as α-amylase inhibitor. J. Biomol. Struct. Dynamics. 39(1), 91–107 (2021).
Kumar, P., Kumar, A. & Sindhu, J. Design and development of novel focal adhesion kinase (fak) inhibitors using Monte Carlo method with index of ideality of correlation to validate Qsar. SAR QSAR Environ. Res. 30(2), 63–80 (2019).
Iovine, N., Toropova, A. P., Toropov, A. A., Roncaglioni, A. & Benfenati, E. Simulation of the long-term toxicity towards Bobwhite quail (colinus virginianus) by the Monte Carlo method. J. Xenobiotics 15(1), 3 (2024).
Toropova, A. P., Toropov, A. A., Roncaglioni, A. & Benfenati, E. Monte Carlo technique to study the adsorption affinity of Azo dyes by applying new statistical criteria of the predictive potential. SAR QSAR Environ. Res. 33(8), 621–630 (2022).
Ahmadi, S., Lotfi, S., Azimi, A. & Kumar, P. Multicellular target Qsar models for predicting of novel inhibitors against pancreatic cancer by Monte Carlo approach. Results Chem. 10, 101734 (2024).
Toropova, A. P., Toropov, A. A. & Fjodorova, N. Quasi-smiles for predicting toxicity of nano-mixtures to daphnia magna. NanoImpact 28, 100427 (2022).
Lotfi, S., Ahmadi, S., Azimi, A. & Kumar, P. In silico aquatic toxicity prediction of chemicals towards daphnia magna and fathead minnow using monte carlo approaches. Toxicol. Mech. Methods 1–21 (2024).
Kumar, P. & Kumar, A. In Silico enhancement of Azo dye adsorption affinity for cellulose fibre through mechanistic interpretation under guidance of Qspr models using Monte Carlo method with index of ideality correlation. SAR QSAR Environ. Res. 31(9), 697–715 (2020).
Kumar, P., Kumar, A., Sindhu, J. & Lal, S. Qsar models for nitrogen containing monophosphonate and bisphosphonate derivatives as human Farnesyl pyrophosphate synthase inhibitors based on Monte Carlo method. Drug Res. 69(03), 159–167 (2019).
Ahmadi, S., Lotfi, S., Hamzehali, H. & Kumar, P. A simple and reliable Qspr model for prediction of chromatography retention indices of volatile organic compounds in peppers. RSC Adv. 14(5), 3186–3201 (2024).
Bagri, K., Kumar, A., Nimbhal, M. & Kumar, P. Index of ideality of correlation and correlation contradiction index: A confluent perusal on acetylcholinesterase inhibitors. Mol. Simul. 46(10), 777–786 (2020).
Sokolović, D. et al. Monte carlo-based Qsar modeling of dimeric pyridinium compounds and drug design of new potent acetylcholine esterase inhibitors for potential therapy of myasthenia Gravis. Struct. Chem. 27, 1511–1519 (2016).
Golbraikh, A. & Tropsha, A. Beware of q2! J. Mol. Graph. Model. 20(4), 269–276 (2002).
Ojha, P. K., Mitra, I., Das, R. N. & Roy, K. Further exploring rm2 metrics for validation of Qspr models. Chemometr. Intell. Lab. Syst. 107(1), 194–205 (2011).
Pratim Roy, P., Paul, S., Mitra, I. & Roy, K. On two novel parameters for validation of predictive Qsar models. Molecules 14(5), 1660–1701 (2009).
Roy, K. & Kar, S. The rm2 metrics and regression through origin approach: reliable and useful validation tools for predictive Qsar models (commentary on ‘is regression through origin useful in external validation of Qsar models?’). Eur. J. Pharm. Sci. 62, 111–114 (2014).
Chatterjee, M. & Roy, K. Data fusion quantitative read-across structure-activity-activity relationships (q-rasaars) for the prediction of toxicities of binary and ternary antibiotic mixtures toward three bacterial species. J. Hazard. Mater. 459, 132129 (2023).
Bumham, K. P. & Anderson, D. R. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach (Spnnger-Veflag, 2002).
Nimbhal, M., Bagri, K., Kumar, P. & Kumar, A. The index of ideality of correlation: A statistical yardstick for better Qsar modeling of glucokinase activators. Struct. Chem. 31, 831–839 (2020).
Roy, K., Das, R. N., Ambure, P. & Aher, R. B. Be aware of error measures. Further studies on validation of predictive Qsar models. Chemometr. Intell. Lab. Syst. 152, 18–33 (2016).
Chirico, N. & Gramatica, P. Real external predictivity of Qsar models: how to evaluate it? Comparison of different validation criteria and proposal of using the concordance correlation coefficient. J. Chem. Inf. Model. 51(9), 2320–2335 (2011).
Hamzehali, H., Lotfi, S., Ahmadi, S. & Kumar, P. Quantitative structure–activity relationship modeling for predication of Inhibition potencies of Imatinib derivatives using smiles attributes. Sci. Rep. 12(1), 21708 (2022).
Ojha, P. K. & Roy, K. Development of a robust and validated 2d-qspr model for sweetness potency of diverse functional organic molecules. Food Chem. Toxicol. 112, 551–562 (2018).
Oliveira, M. A. S., Oliveira, R. S. S. & Borges, I. Quantifying bond strengths via a coulombic force model: application to the impact sensitivity of nitrobenzene, nitrogen-rich Nitroazole, and non-aromatic nitramine molecules. J. Mol. Model. 27, 1–17 (2021).
Zhang, C., Shu, Y., Huang, Y., Zhao, X. & Dong, H. Investigation of correlation between impact sensitivities and nitro group charges in nitro compounds. J. Phys. Chem. B. 109(18), 8978–8982 (2005).
Liu, Q. et al. Exchanging of nh2/nhnh2/nhoh groups: an effective strategy for balancing the energy and safety of fused-ring energetic materials. Chem. Eng. J. 466, 143333 (2023).
Li, J. A multivariate relationship for the impact sensitivities of energetic n-nitrocompounds based on bond dissociation energy. J. Hazard. Mater. 174(1–3), 728–733 (2010).
. The relationship of impact sensitivity with structure of organic high explosives. I. Polynitroaliphatic explosives. Proceedings of the 6th Symposium on Detonation, ONR Report ACR (1976).
Nefati, H., Cense, J-M. & Legendre, J-J. Prediction of the impact sensitivity by neural networks. J. Chem. Inf. Comput. Sci. 36(4), 804–810 (1996).
Jun, Z., Xin-lu, C., Bi, H. & Xiang-dong, Y. Neural networks study on the correlation between impact sensitivity and molecular structures for nitramine explosives. Struct. Chem. 17, 501–507 (2006).
Keshavarz, M. H. & Pouretedal, H. R. Simple empirical method for prediction of impact sensitivity of selected class of explosives. J. Hazard. Mater. 124(1–3), 27–33 (2005).
Keshavarz, M. H., Pouretedal, H. R. & Semnani, A. Novel correlation for predicting impact sensitivity of nitroheterocyclic energetic molecules. J. Hazard. Mater. 141(3), 803–807 (2007).
Wang, R., Jiang, J. & Pan, Y. Prediction of impact sensitivity of nonheterocyclic nitroenergetic compounds using genetic algorithm and artificial neural network. J. Energ. Mater. 30(2), 135–155 (2012).
Morrill, J. A. & Byrd, E. F. Development of quantitative structure–property relationships for predictive modeling and design of energetic materials. J. Mol. Graph. Model. 27(3), 349–355 (2008).
Deng, Q. et al. Probing impact of molecular structure on bulk modulus and impact sensitivity of energetic materials by machine learning methods. Chemometr. Intell. Lab. Syst. 215, 104331 (2021).
Mathieu, D. & Alaime, T. Predicting impact sensitivities of nitro compounds on the basis of a semi-empirical rate constant. J. Phys. Chem. A. 118(41), 9720–9726 (2014).
Mathieu, D. & Alaime, T. Impact sensitivities of energetic materials: exploring the limitations of a model based only on structural formulas. J. Mol. Graph. Model. 62, 81–86 (2015).
Prana, V., Fayet, G., Rotureau, P. & Adamo, C. Development of validated Qspr models for impact sensitivity of nitroaliphatic compounds. J. Hazard. Mater. 235, 169–177 (2012).
Author information
Authors and Affiliations
Contributions
S. L was involve in ddata collection, and curation, investigation, methodology incl. statistics, software, visualization, writing of original draft, and revision. S. A was involved in conceptualization, data curation, formal analysis, investigation, methodology incl. statistics, software, review, and editing. A P.T was involved in conceptualization, formal analysis, visualization, review, and editing. A A. T was involved in formal analysis, visualization, review, and editing. All authors reviewed the manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Lotfi, S., Ahmadi, S., Toropova, A.P. et al. Construction of reliable QSPR models for predicting the impact sensitivity of nitroenergetic compounds using correlation weights of the fragments of molecular structures. Sci Rep 15, 11160 (2025). https://doi.org/10.1038/s41598-025-95129-0
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-025-95129-0