Introduction

Reactions that reorganize multiple C–C single bonds to rapidly construct the core, complexity and diversity of bioactive organic molecules are of great importance in synthetic science, biology and medicine. Angular tricycles, in which three carbocycles are fused at one quaternary carbon center1,2, are one such crucial core. They occur widely in bioactive polycyclic molecules and contribute substantially to their bioactivity3,4,5,6, for example in terpenoids and their related lactones, lactams and alkaloids7,8,9. Since the 1970s, chemists have sought to synthesize angular tricyclic molecules and their relatives for research in chemistry, biology and medicine. However, the strategies developed usually struggle to construct the key angular tricycle because of its inherent rigidity and steric hindrance, so stepwise approaches have been needed in most cases10,11,12,13,14,15,16. We recently developed the most efficient one-step method to date for accessing these scaffolds (Fig. 1), which involves a synergistic cascade reorganization of several C–C single bonds of 1,3-cycloalkenylidine ketone substrates. Further advantages include catalysis by various Lewis acids, high yields (>90%) and short reaction times (0.5–12 h) in many cases, broad scope and easy availability of substrates, mild conditions and convenient operation. Our team has applied this reaction to synthesize a series of complex bioactive natural polycyclic terpenoids containing angular tricycles17,18,19,20. In addition, a qualitative substitution/ring-strain effect and a reaction mechanism involving an initial rate-determining 4π-electrocyclization have been preliminarily investigated (Fig. 1).
To gain further insight into its structure–property relationships, especially reactivity and regioselectivity, and thereby enable its broad and efficient use in synthetic chemistry, we report here an in-depth kinetic study of this reaction using in situ IR technology and DFT calculations. Machine learning, a powerful tool for finding correlations between molecular structural factors and chemical reaction behaviors (i.e., yields, selectivity, reactivity), is increasingly used in chemical science21,22,23,24,25,26,27. Here this method is used, for the first time, to build a regioselectivity prediction model for the title reaction.

Fig. 1: Cascade 4πe-cyclization/dicycloexpansion reaction.

Proposed reaction mechanism and kinetic illustration of the title cascade reaction.

To establish quantitative structure–reactivity and structure–regioselectivity relationships, the rate constants k were measured using the in situ IR technique28,29,30 together with 1H qNMR analysis31,32,33. The free energy ΔG1 of the initial 4π-electrocyclization transition state d and the formation energy ΔG2 of allylic carbocation e were calculated using DFT and, together with the Hammett constant σp (ref. 34), were used to express the structural variation of the substrates. Well-fitted linear relationships of lnk/(ΔG1/T), ΔG1/ΔG2 and ln(k/kH)/σp, as well as further crucial kinetic parameters of this reaction, were also obtained. The ΔΔG data of all allylic carbocation intermediates e were obtained by DFT calculations to determine the priority of the first cycloexpansion and were compiled into a training set; Support Vector Regression (SVR), Neural Network (NN), Linear Regression (LR) and other algorithms were then trained on this set24,35,36 to establish a model for predicting the reaction regioselectivity. The algorithms were trained on 120 virtual examples, and the reliability of the model was validated with six selected substrates, 15a and 28a–32a (ref. 19). The model proved practical for designing suitable reactants to obtain desired angular tricyclic skeletons and for foreseeing reaction processes and experimental results. Here, we show a kinetic study based on in situ IR and DFT, and regioselectivity prediction by combining DFT with machine learning algorithms.

Results

Design of the substrate scope

Our study began with the preparation of substrates 1a–30a by simple aldol condensation/dehydration from ketone and cycloalkanone materials17,19; the substrates were classified into six representative groups according to their structural variation. The first group (1a–5a) was designed to examine the σ→π electron-donating effect of the alkyl groups R1/R2 on the π-system; the second group (10a–15a, 26a–27a), bearing aryl, carbonate ester or cyano groups, to examine the π→π conjugative effect; the third group, bearing arylthio groups (16a–21a), and the fourth group, bearing halogens (F, Cl, Br) (23a–25a), to examine the n→π and σ←π effects of heteroatomic substituents; and the fifth group (1a, 6a–9a) to examine the ring strain/size effect (m = 4, n = 3–7). Six examples (15a, 28a–32a) were used for subsequent validation of the regioselectivity prediction model (i.e., cycloexpansion priority of m vs. n). All rate constants and the corresponding ΔG1 values are summarized in Table 1 (see Supplementary Table 4 in the SI for details).

Table 1 Rate constants k of various substituted substrates

Kinetic study for acquisition of rate constants k

The rate constants k were obtained indirectly through the reaction rate equation, so the reaction order of this cascade reaction was studied first. Taking the reaction of 1a to 1b (R1 = R2 = Me, m = n = 4, Fig. 1) as an example, we first identified the characteristic IR absorptions of 1a and 1b in CHCl3 at 1618 cm−1 and 1659 cm−1, respectively (Fig. 2a). Converting the absorbance–time relationship of 1b into its concentration–time relationship gave the cross-section (black curve in Fig. 2b), which shows the yield of 1b rising with reaction time.
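The conversion from an absorbance trace to a concentration profile and then to a reaction rate can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of the Supplementary Information: it assumes that absorbance is proportional to concentration, that the last point of the trace is the plateau, and that the plateau corresponds to the final qNMR yield times the initial substrate concentration. The function names and the trace values are hypothetical.

```python
def absorbance_to_concentration(absorbance, final_yield, c0):
    """Scale a product absorbance trace to concentration in M.

    Assumption: absorbance is proportional to concentration and the last
    point of the trace is the plateau, which corresponds to final_yield * c0.
    """
    plateau = absorbance[-1]
    return [a / plateau * final_yield * c0 for a in absorbance]


def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line through the points (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def initial_rate(times, conc, fraction=0.5):
    """Rate (M/s) fitted to the early points below fraction * plateau."""
    cutoff = fraction * conc[-1]
    early = [(t, c) for t, c in zip(times, conc) if c <= cutoff]
    ts = [t for t, _ in early]
    cs = [c for _, c in early]
    return least_squares_slope(ts, cs)


# Hypothetical trace (not measured data): linear rise, then plateau.
times = [0, 60, 120, 180, 240, 300, 360, 420]          # s
absorbance = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.5, 0.5]  # arbitrary units
conc = absorbance_to_concentration(absorbance, final_yield=0.95, c0=0.20)
rate = initial_rate(times, conc)
print(f"initial rate = {rate:.2e} M/s")
```

Fitting only the early part of the profile keeps the slope estimate away from the plateau region; the 50% cutoff is an arbitrary choice for this sketch.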

Fig. 2: Rate constant acquisition via in situ IR.

a 3D reaction profiles of 1a to 1b tracked via in situ IR technique; b Reaction profiles for yielding 1b with different initial concentration of In(SbF6)3 and 1a, indicated as [In(SbF6)3]0 and [1a]0, respectively.

The kinetic reaction order was obtained by varying the initial concentrations of the catalyst, [In(SbF6)3]0, and of substrate 1a, [1a]0. In these tests, the black and red curves shared the same [In(SbF6)3]0 but differed in [1a]0. The calculated reaction rates for yielding 1b in the black and red curves were 4.82 × 10−4 M•s−1 and 5.06 × 10−4 M•s−1, respectively. This small difference suggested a zero-order kinetic dependence on the substrate concentration [1a]. When [In(SbF6)3]0 was lowered from 0.02 to 0.01 M, the calculated reaction rate for yielding 1b (blue curve) was 2.39 × 10−4 M•s−1, about half of the rate in the black curve, indicating a first-order kinetic dependence on the catalyst concentration [In(SbF6)3]. Accordingly, the rate equation could be expressed as $k_{\mathrm{obs}} = k\,[\mathrm{In(SbF_6)_3}]^{1}[\mathbf{a}]^{0}$, where kobs is the reaction rate for generating product b and k is the rate constant. The detailed experimental operation and calculation procedure, and the full data obtained, are summarized in the Supplementary Information (Supplementary Table 4). By varying the reaction temperature, the activation enthalpy ΔH was calculated as 18.4 kcal/mol and the activation entropy ΔS as −7.2 cal•(K•mol)−1 from the Eyring equation. Notably, the negative ΔS value indicated that the 4π-electrocyclization process was the rate-determining step.
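The two calculations in this paragraph can be sketched in a few lines. The first function estimates a reaction order from two rates measured at two concentrations; the second fits the linearized Eyring equation, ln(k/T) = ln(kB/h) + ΔS/R − ΔH/(RT), to rate constants at several temperatures. The function names are illustrative, and the temperatures below are hypothetical: the rate constants are generated from the reported ΔH and ΔS only to show that the fit returns them, and they are not measured values.

```python
import math

R_CAL = 1.987204                            # gas constant, cal/(mol*K)
KB_OVER_H = 1.380649e-23 / 6.62607015e-34   # Boltzmann/Planck, 1/(K*s)


def reaction_order(rate_a, rate_b, conc_a, conc_b):
    """Order n in rate ~ conc**n from two (rate, concentration) pairs."""
    return math.log(rate_a / rate_b) / math.log(conc_a / conc_b)


def eyring_rate_constant(dh_kcal, ds_cal, temp):
    """Eyring rate constant (1/s) for dH in kcal/mol and dS in cal/(mol*K)."""
    exponent = ds_cal / R_CAL - dh_kcal * 1000.0 / (R_CAL * temp)
    return KB_OVER_H * temp * math.exp(exponent)


def eyring_fit(temps, ks):
    """Least-squares fit of ln(k/T) against 1/T; returns (dH, dS)."""
    x = [1.0 / t for t in temps]
    y = [math.log(k / t) for k, t in zip(ks, temps)]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    dh_kcal = -slope * R_CAL / 1000.0
    ds_cal = (intercept - math.log(KB_OVER_H)) * R_CAL
    return dh_kcal, ds_cal


# Order in catalyst from the black and blue curves quoted in the text.
order_catalyst = reaction_order(4.82e-4, 2.39e-4, 0.02, 0.01)
print(f"order in catalyst = {order_catalyst:.2f}")

# Round trip with hypothetical temperatures (K); ks are synthetic.
temps = [263.15, 273.15, 283.15, 293.15]
ks = [eyring_rate_constant(18.4, -7.2, t) for t in temps]
dh, ds = eyring_fit(temps, ks)
print(f"dH = {dh:.1f} kcal/mol, dS = {ds:.1f} cal/(K*mol)")
```

The order in substrate could be estimated with the same function from the black and red curves, given the two [1a]0 values, which are not quoted in this section.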

Quantitative structure-reactivity relationship

All the rate constants k were measured via in situ IR experiments. DFT calculations for all examples also gave the highest energy barrier, ΔG1, for the initial 4π-electrocyclization in the corresponding reaction process, demonstrating that this cyclization was the rate-limiting step. The relationship between lnk and ΔG1/T was therefore plotted (Fig. 3a). Although substrates 1a–30a were initially classified into six different groups, they all followed one of two well-fitted linear relationships (R-squared: 0.938 and 0.947), shown in blue and red, with slopes of −343.2 and −155.2, respectively. In detail, the reactivity of substrates 16a–21a, 23a–25a and 28a, whose heteroatom-based substituents (phenylthio, halogen and thienyl) exert n→π electron-donating but σ←π electron-withdrawing effects, followed the blue line with the steeper slope of −343.2, which indicated that their reactivity should be more sensitive to reaction temperature. Interestingly, in this series, the more electron-withdrawing arylthio substituents gave higher reactivity than the more electron-donating ones (p-CF3C6H4S > p-ClC6H4S ≈ p-FC6H4S > PhS > p-MeC6H4S > 2-thienyl > p-MeOC6H4S). By contrast, the reactivity of the other substrates 1a–15a, 22a and 26a–27a, with various carbon-based substituents (alkyl, aryl, carbonate ester, cyano) or with varied ring sizes m/n (= 3–7), followed the red line with the shallower slope of −155.2. Here the more electron-donating substituents R1/R2 showed higher activating ability, in the order i-Pr > Et > Me and p-OMeC6H4 > p-MeC6H4 > Ph > p-FC6H4 > p-ClC6H4 > p-CF3C6H4. The electron-donating effect of substituents on the benzene rings of the aryl (red) and arylthio (blue) substrates thus showed quite different trends in chemical reactivity, possibly because of the different directions of their electronic induction. For the red line, carbon-based substituents always showed the π→π electron-donating effect, whereas for the blue line, different heteroatom-based substituents showed different net electronic effects of σ←π and n→π. Hammett analysis was performed to explore the linear relationship between ln(k/kH) of the aryl and arylthio substrates and σp of the respective Rp, and is shown in the SI. When R2 = Me, m = 4 and n was varied from 3 to 7, the highest reactivity was observed for the substrate with n = 3 (ref. 19).
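A linear fit of this kind, reporting slope, intercept and R-squared, can be sketched as below. The helper name `linear_fit` is illustrative, and the ΔG1 values, the intercept and the scatter are hypothetical placeholders (assumed here to be in kcal/mol at 273.15 K), not the data of Table 1; only the slope of −343.2 used to generate them is taken from the text. The same helper would apply to ln(k/kH) against σp or to ΔG1 against ΔG2.

```python
def linear_fit(x, y):
    """Ordinary least squares for y = slope * x + intercept, with R-squared."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical inputs, for illustration only.
TEMP = 273.15                                   # K, assumed
dg1 = [20.0, 20.5, 21.0, 21.5, 22.0]            # kcal/mol, placeholder values
scatter = [0.02, -0.03, 0.01, 0.02, -0.02]      # placeholder deviations
x = [g / TEMP for g in dg1]
ln_k = [20.0 - 343.2 * xi + s for xi, s in zip(x, scatter)]

slope, intercept, r_squared = linear_fit(x, ln_k)
print(f"slope = {slope:.1f}, R-squared = {r_squared:.4f}")
```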

Fig. 3: Linear fitting of structure-reactivity relationship.

a Relationship of lnk to ΔG1/T, indicating reactivity of a varying with R2′s property and (m, n) ring′s size; b Relationship of ΔG1 to the formation energy ΔG2 of allylic carbocation e. Unless otherwise specified, m or n = 4.

Two substrates, 29a and 30a, bearing the same Ph group but different cycloalkyl ring sizes, were used to test whether their reactivity could be foreseen (green dots). The DFT calculations indicated that their reactivity should follow the red line, in agreement with subsequent experiments. Comparison of the k data (29a vs 8a, 30a vs 7a) showed that Ph substitution strongly promoted reactivity regardless of whether Ph was on the same side as the larger ring or on the opposite side (m/n = 5, 6). Two sets of kinetic experiments with substrates bearing an active C3-MeO or BocHN group failed to give a rate constant k. In the case of R1 = Me, R2 = OMe, m = n = 4, the reaction gave overlapping in situ IR signals. When R2 = BocHN, the substrate was too unreactive under the standard conditions to give detectable signals within the measurable region of the IR spectrometer, possibly because association of the Lewis acid with the active N atom of BocHN deactivated the catalyst.

Moreover, the stability of allylic carbocations e was evaluated by their formation energy ΔG2. A linear correlation between ΔG1 and ΔG2 (R-squared = 0.874) was found (Fig. 3b), indicating that a more stable allylic carbocation intermediate e was more likely to form and proceed to the subsequent ring expansion, which to some extent favored higher reactivity in the 4π-electrocyclization. Generally, the weak electron-donating effect of aryl and alkyl groups could stabilize allylic carbocation e, while the σ←π electron-withdrawing effect of heteroatomic groups (blue curve in Fig. 3b) might reduce the reactivity of substrates by destabilizing the allylic carbocations. This accounts for the two different fitted lines found in Fig. 3a.

Prediction of cycloexpansion priority

To predict the reaction regioselectivity (cycloexpansion priority of m vs. n), DFT calculations were combined with ML modeling to construct a prediction model, which was then validated by experiment. As shown in Fig. 4a, the cycloexpansion of allylic carbocation intermediate e could be initiated from either the left ring (m) or the right ring (n), leading to two different regioisomeric products (in blue or dark red, respectively). Training sets including the features ΔΔG-m, ΔΔG-n, ΔΔG-R1 and ΔΔG-R2 (or the features m-rse, m-ra, n-rse, n-ra, R1p, R2p; see SI for details) of these allylic carbocation intermediates were therefore obtained by DFT calculations. For the substituent effect of R1, provided that R2 = Me and m = n = 4, the energy barrier ΔΔG-R1 (= ΔG-R1 – ΔG-Me, Fig. 4b) would determine whether its nearest ring m expands first. Similarly, for the ring-strain effect of m, provided that R1 = R2 = Me and n = 4, the energy barrier ΔΔG-m (= ΔG-m – ΔG-4) would determine whether the m ring expands first. Random Forest (RF), Neural Network (NN), Support Vector Regression (SVR) and other algorithms were then trained on the training sets, which contained 120 virtual examples from DFT calculations, to construct a mathematical model that predicts the more favorable product. When an algorithm output a positive (or negative) value, the first cycloexpansion was predicted to take place at the right n (or left m) ring. As shown in Fig. 4c, 5-fold cross-validation of all the ML algorithms suggested that the SVR method performed best, with the highest R-squared (0.979) and the smallest MAE (0.691). SVR was therefore used to predict the priority of the first cycloexpansion. In Fig. 4d, six substrates (15a, 28a–32a) were validated experimentally as a test set to examine the reliability of the prediction model. Fortunately, all the outputs predicted by the SVR algorithm matched the experimental results well, as the directions of the horizontal bars suggest (Fig. 4d). Additionally, the results predicted by the SVR algorithm were consistent with those calculated by DFT. A bar pointing left (or right) meant that the left ring m (or right ring n) expanded first (Fig. 4d). The prediction was particularly useful for 30a, which has the favorable Me and cyclobutyl groups at the competing left and right sites, respectively, so that its cycloexpansion priority was hard to predict from general experience.
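The evaluation step described above (5-fold cross-validation scored by R-squared and MAE, followed by reading the sign of the output) can be sketched without external libraries. Two points are assumptions of this sketch rather than the authors' implementation: a plain least-squares linear model stands in for SVR, which would normally come from an ML library, and the 120 examples are synthetic, with arbitrary feature values and weights in place of the DFT-derived ΔΔG-m, ΔΔG-n, ΔΔG-R1 and ΔΔG-R2. All function names are illustrative.

```python
import random


def fit_linear(X, y):
    """Least squares with intercept, solved by Gauss-Jordan elimination."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented normal equations [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda k: abs(M[k][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict(coef, X):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], row)) for row in X]


def r2_and_mae(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - q) ** 2 for t, q in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    mae = sum(abs(t - q) for t, q in zip(y_true, y_pred)) / len(y_true)
    return 1.0 - ss_res / ss_tot, mae


def cross_validate(X, y, k=5, seed=0):
    """k-fold CV: every example is predicted by a model that never saw it."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    y_pred = [0.0] * len(X)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        coef = fit_linear([X[i] for i in train], [y[i] for i in train])
        for i, q in zip(fold, predict(coef, [X[i] for i in fold])):
            y_pred[i] = q
    return r2_and_mae(y, y_pred)


def first_ring_to_expand(output):
    """Sign convention from the text: positive means n, negative means m."""
    return "n" if output > 0 else "m"


# Synthetic stand-in for the training set (not DFT data).
rng = random.Random(1)
X = [[rng.uniform(-5.0, 5.0) for _ in range(4)] for _ in range(120)]
y = [-r[0] + r[1] - r[2] + r[3] + rng.gauss(0.0, 0.3) for r in X]
r2, mae = cross_validate(X, y)
print(f"5-fold CV: R-squared = {r2:.3f}, MAE = {mae:.3f}")
```

Swapping in another regressor only requires replacing `fit_linear` and `predict`; the fold construction and the two metrics stay the same.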

Fig. 4: Establishment and validation of regioselectivity prediction model.

a Framework for predicting the priority of cycloexpansion (m vs n) by DFT calculations combined with ML algorithms; b Explanation of how the features ΔΔG-R1/R2, ΔΔG-m/n and ΔΔG are generated, LA = In(SbF6)3; c Performance of ML algorithms for regioselectivity prediction; d Reliability testing of the SVR algorithm outputs.

Discussion

Based on the evidence obtained above using in situ IR technology, DFT calculations and machine learning algorithms, together with some of our previously reported results, we have established linear structure–reactivity relationships and a structure–regioselectivity prediction model for the title cascade reaction generating angular tricycles. The structural effects mainly involve the σ-π, n-π and/or π-π electronic effects of the C1 and C3 substituents (R1, R2) on the 4π-system, as well as the ring strain (or size) of the two rings (m, n) to be expanded. The combination of in situ IR measurements and DFT calculations reveals that the reactivity of this 4π-electrocyclization process is mainly affected by the electronic properties of R1/R2 and the steric hindrance of m/n, while machine learning modeling based on DFT calculations suggests that the cycloexpansion priority is primarily determined by competition between the electronic properties of R1/R2 and the ring strain of m/n. In fact, when m, n = 6, 7 or 8, the reactivity of the title cascade reaction is rather low and no expected product can be detected19. For these ketones substituted with large rings, 4π-electrocyclization might not be the rate-determining step, and the cycloexpansion processes hardly occur. When a smaller ring is present (m or n = 4, 5), the reaction is sufficiently active. When a cyclopropyl group (m or n = 3) is designed to induce the first cycloexpansion, no reaction is observed, as expansion from a 3- to a 4-membered ring releases only a small amount of energy and the latter is still an unfavorably strained ring system. Fortunately, the relationships of lnk/(ΔG1/T), ΔG1/ΔG2, ln(k/kH)/σp and ΔΔG/(ΔΔG-R1, ΔΔG-R2, ΔΔG-m, ΔΔG-n) or ΔΔG/(m-rse, m-ra, n-rse, n-ra, R1p, R2p) are able to explain and predict the reactivity and regioselectivity of the commonly used 4–6-membered ring systems. Further study of this project is ongoing in our group.

Methods

General information

The reactions were performed in oven-dried glassware equipped with a magnetic stir bar under an atmosphere of argon unless otherwise noted. Reagents purchased from commercial suppliers were used directly without further purification. Except for commercially available extra-dry solvents, the dry solvents used to prepare starting materials were obtained by standard methods: toluene, tetrahydrofuran and diethyl ether (Et2O) were distilled from sodium, and dichloromethane (DCM) was distilled from calcium hydride. Analytically pure chloroform (CHCl3) was treated with concentrated sulfuric acid (V(CHCl3): V(H2SO4) = 20: 1) to remove the stabilizer, dried over anhydrous CaCl2 grains overnight, distilled under argon and stored in the dark to give freshly prepared chloroform for the in situ IR measurements. Thin-layer chromatography was performed on EMD silica gel 60 F254 plates eluted with the solvents indicated, visualized with a 254 nm UV lamp and stained with phosphomolybdic acid (PMA). 1H NMR and 13C NMR spectra were obtained on a Bruker AM-400, Bruker AM-500 or Bruker AM-600 instrument. Chemical shifts (δ) are quoted in ppm relative to tetramethylsilane or residual un-deuterated solvent as internal standard (CDCl3: 7.26 ppm for 1H NMR, 77.00 ppm for 13C NMR), and multiplicities are indicated as follows: s = singlet, d = doublet, t = triplet, q = quartet, m = multiplet, dd = doublet of doublets, td = triplet of doublets, ddd = doublet of doublet of doublets. The IR spectra were recorded on a Nicolet FT-170SX spectrometer. High-resolution mass spectrometry (HRMS) data were measured on a Bruker ApexII using the ESI technique. The in situ IR experiments were recorded on a Mettler Toledo React IR 4000 spectrometer using a 6.3 mm AgX diamond comb.

General procedure for the in situ IR experiments

Purified starting material a (0.20 mmol, 1.0 equiv) was loaded into the reaction tube without solvent; the tube was filled with pure argon and cooled to 0 °C in an ice bath, and the background spectrum was then collected with no contact between the starting material and the probe. CHCl3 (0.20 mL) was injected into the reaction tube, and a freshly prepared In(SbF6)3 solution (4.4 mg InCl3 and 20.6 mg AgSbF6 dissolved in 0.80 mL CHCl3, 0.02 mmol, 0.1 equiv) was cooled in an ice bath for 10 min. The In(SbF6)3 solution was then rapidly injected into the solution of a under argon, and recording was started at the same time. The measurement was finished when the formation of b reached chemical equilibrium. The reaction mixture was filtered in air to remove the solid phase and washed with CH2Cl2 (5 × 10 mL), and 1,3,5-trimethoxybenzene (16.8 mg, 0.10 mmol) was added as internal standard. The reaction mixture was concentrated under vacuum, and the content of b was determined by the quantitative NMR method (γ was obtained from the 1H qNMR yield31,32,33 and k was calculated from Supplementary Equation (2)).
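The amounts in this procedure and the internal-standard yield calculation can be checked with a short sketch. The molar masses are standard values; the qNMR relation is the usual internal-standard formula (moles of product = ratio of per-proton integrals × moles of standard), and the integrals and the choice of a one-proton product signal are hypothetical. How k is then derived from γ is given in Supplementary Equation (2) and is not reproduced here.

```python
MW_INCL3 = 221.18    # g/mol
MW_AGSBF6 = 343.62   # g/mol
MW_TMB = 168.19      # g/mol, 1,3,5-trimethoxybenzene


def mmol(mass_mg, molar_mass):
    """Amount in mmol from a mass in mg and a molar mass in g/mol."""
    return mass_mg / molar_mass


def qnmr_yield(int_product, n_h_product, int_std, n_h_std,
               mmol_std, mmol_substrate):
    """Fractional yield from integrals normalized per proton."""
    per_h_product = int_product / n_h_product
    per_h_std = int_std / n_h_std
    mmol_product = per_h_product / per_h_std * mmol_std
    return mmol_product / mmol_substrate


# Amounts quoted in the procedure.
incl3 = mmol(4.4, MW_INCL3)        # about 0.02 mmol, 0.1 equiv
agsbf6 = mmol(20.6, MW_AGSBF6)     # about 0.06 mmol, 3 equiv to InCl3
standard = mmol(16.8, MW_TMB)      # about 0.10 mmol
total_volume_ml = 0.20 + 0.80      # CHCl3 for substrate + catalyst solution
catalyst_molar = incl3 / total_volume_ml   # mmol/mL equals mol/L
substrate_molar = 0.20 / total_volume_ml

# Hypothetical integrals: aromatic 3H signal of the standard set to 3.00,
# a 1H signal of product b integrating to 1.90.
gamma = qnmr_yield(1.90, 1, 3.00, 3, mmol_std=0.10, mmol_substrate=0.20)
print(f"[catalyst]0 = {catalyst_molar:.3f} M, qNMR yield = {gamma:.2f}")
```

With these amounts the initial catalyst concentration comes out at about 0.02 M in a total volume of 1.00 mL, which matches the [In(SbF6)3]0 value used in the reaction-order experiments.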