Fig. 1: The integrated framework navigating polymorph generation and distilled potential development guided by entropy-symmetry landscapes.

a Schematic diagram of the entropy-symmetry landscape, using instantaneous pair entropy (sS) and the sixth-order Steinhardt parameter (Q6) as global and local order parameters, respectively, for topological analysis of polymorphs. Using snapshots of silicon melting from cubic diamond (CD) to liquid as an example, the progression CD → CD + first-neighbour CD (1st CD) → CD + 1st CD + second-neighbour CD (2nd CD) → CD + 1st CD + 2nd CD + liquid → liquid is visualized based on energy states. Dark red shades indicate low-energy crystalline phases, while light red shades represent high-energy disordered states. Inset atomic structures are coloured according to diamond-structure type: blue for cubic diamond, cyan for cubic diamond (1st neighbour), light green for cubic diamond (2nd neighbour), orange for hexagonal diamond, yellow for hexagonal diamond (1st neighbour), lime green for hexagonal diamond (2nd neighbour), and light grey for other. b Schematic diagram of PolymorphGen, which enables efficient construction of configuration libraries on sS-Q6 landscapes and incorporates polymorph generation inspired by genetic mutation. PolymorphGen bypasses the reliance on time-consecutive integration for structure generation, moving from the conventional structure-temperature-pressure (S-T-P) ab initio molecular dynamics (AIMD) framework to the entropy-symmetry-displacement-volume-shape (sS-Q6-D-V-S) framework. c Schematic diagram of the machine learning potential knowledge distillation (MLPKD) framework, which enables multi-resolution classification and, when combined with the PolymorphGen configuration library, facilitates the transfer of cross-scale accuracy from density functional theory (DFT) through a message passing neural network (MPNN) to a deep neural network (DNN), thereby achieving a 10⁶-fold speed enhancement. The Auto-DFT scheduling platform is detailed in Supplementary Fig. 4.
The number of standard thermodynamics configurations after multi-resolution classification ranges from 10¹ to 10³, while the number of supplementary configurations ranges from 10² to 10⁴. d Comparison between previous active-learning-based structure-property relationship frameworks and our integrated PolymorphGen-MLPKD framework, illustrating the non-iterative nature of our streamlined approach, which overcomes the accuracy-efficiency limitations. n, number of iterations; TPre, initial data preparation time; TMLP, machine learning potential training time; TExp, configuration exploration time; TDFT, DFT computation time; TPG, PolymorphGen processing time; TMPNN, MPNN model training time; TDNN-KD, training time of the DNN model trained via knowledge distillation (DNN-KD). Source data are provided as a Source Data file.
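For readers unfamiliar with the local order parameter named in panel a, the following is a minimal sketch of the textbook sixth-order Steinhardt parameter for a single atom, computed from the vectors to its neighbours. It is not the authors' implementation: the function names (`legendre_p6`, `steinhardt_q6`) are hypothetical, the choice of which neighbours to include (here, an ideal first shell) is an assumption, and the spherical-harmonic sum is replaced by the equivalent Legendre-polynomial form that follows from the addition theorem, Q6² = (1/N²) Σᵢⱼ P6(cos γᵢⱼ).

```python
import math


def legendre_p6(x):
    """Legendre polynomial of degree 6 evaluated at x."""
    return (231 * x**6 - 315 * x**4 + 105 * x**2 - 5) / 16


def steinhardt_q6(bonds):
    """Steinhardt Q6 of one atom from the vectors to its neighbours.

    By the spherical-harmonic addition theorem,
    Q6^2 = (1/N^2) * sum over bond pairs (i, j) of P6(cos(angle_ij)),
    so no explicit spherical harmonics are needed.
    """
    n = len(bonds)
    if n == 0:
        raise ValueError("at least one bond vector is required")

    # Normalise each bond vector to unit length.
    units = []
    for x, y, z in bonds:
        norm = math.sqrt(x * x + y * y + z * z)
        units.append((x / norm, y / norm, z / norm))

    # Sum P6 over all ordered pairs, including i == j (P6(1) = 1).
    total = 0.0
    for a in units:
        for b in units:
            cos_gamma = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
            cos_gamma = max(-1.0, min(1.0, cos_gamma))  # guard rounding
            total += legendre_p6(cos_gamma)

    return math.sqrt(max(total, 0.0)) / n


# Assumed ideal first-shell environments (illustrative inputs only).
TETRAHEDRAL = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
SIMPLE_CUBIC = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                (0, -1, 0), (0, 0, 1), (0, 0, -1)]

if __name__ == "__main__":
    print(round(steinhardt_q6(TETRAHEDRAL), 4))
    print(round(steinhardt_q6(SIMPLE_CUBIC), 4))
```

For the four ideal tetrahedral bonds of a diamond-type site this evaluates to 4√2/9 ≈ 0.6285, and for six simple-cubic bonds to √(1/8) ≈ 0.3536; thermally distorted or liquid-like environments give lower and more scattered values.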