Figure 1

The generic workflow for classifying point groups from the chemical formula. The workflow starts by generating the material space from the common and uncommon oxidation states of the constituent elements. All elements of the periodic table up to atomic number 85 are considered, except H, Au, Pt, and the noble gases. The material space is then matched against structural data harvested from the open-access NOMAD repository [48]. Next, the stoichiometric coefficient, oxidation number, ionic radius, and first ionization energy of each constituent element are processed into features for learning. The effect of the imbalanced data distribution is then mitigated using minority oversampling, and the model is trained using a one-vs-rest classifier. Finally, the point groups of the chemical formulae are predicted; a single formula can carry more than one label.
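The first step of the workflow, generating a material space from oxidation states, can be illustrated with a minimal sketch. It assumes that candidate formulae are kept only when they are charge-neutral, which the caption does not state explicitly, and it is restricted to binary compounds for brevity. The table `OXIDATION_STATES` and the helpers `neutral_binary_entries` and `formula_string` are hypothetical names introduced for illustration; the oxidation states listed are a small hand-picked subset, not the data used in the paper.

```python
from itertools import combinations, product
from math import gcd

# Hypothetical, heavily abbreviated oxidation-state table (illustration only).
OXIDATION_STATES = {
    "Na": [1],
    "Fe": [2, 3],
    "O": [-2],
    "Cl": [-1],
}


def neutral_binary_entries(states):
    """Enumerate charge-neutral binary compounds with the smallest integer coefficients.

    Assumption: charge neutrality is the filter for the material space.
    Each entry is a pair of (element, coefficient, oxidation number) tuples,
    with the cation listed first.
    """
    entries = []
    for el_a, el_b in combinations(sorted(states), 2):
        for q_a, q_b in product(states[el_a], states[el_b]):
            if q_a * q_b >= 0:
                # Need exactly one positive and one negative oxidation state.
                continue
            g = gcd(abs(q_a), abs(q_b))
            # Cross the charges and reduce: n_a * q_a + n_b * q_b == 0.
            n_a, n_b = abs(q_b) // g, abs(q_a) // g
            pair = [(el_a, n_a, q_a), (el_b, n_b, q_b)]
            pair.sort(key=lambda t: -t[2])  # cation (positive charge) first
            entries.append(tuple(pair))
    return entries


def formula_string(entry):
    """Render an entry as a formula, omitting coefficients equal to 1."""
    return "".join(f"{el}{n if n > 1 else ''}" for el, n, _ in entry)


entries = neutral_binary_entries(OXIDATION_STATES)
formulas = [formula_string(e) for e in entries]
print(formulas)
```

Each entry already carries two of the per-element features named in the caption (coefficient and oxidation number); ionic radius and first ionization energy would have to be looked up from a separate table keyed by element and oxidation state.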