Fig. 1: Workflow for the construction and analysis of a Magnetic Materials Database. | Nature Communications

From: The northeast materials database for magnetic materials

Scientific articles are processed via one of three pathways, depending on their format. Articles retrieved in XML format through the journal’s API are parsed with both a text parser and a table parser. Standard PDF documents are handled by a PDF parser, which converts the content into markdown text. For older scanned or image-based PDFs and historical handbooks, we use Google Gemini’s OCR capabilities to extract text and tables accurately. Longer documents such as handbooks are processed page by page and converted into markdown format. All markdown outputs are then converted into CSV files, which are passed to GPT-4o with structured prompts to extract the relevant materials data in a consistent JSON format. After cleaning and standardizing the extracted information, we compile it into the NEMAD database. The curated dataset is used to train machine learning models for classification and for predicting Curie and Néel temperatures. The trained models are then applied to screen for high-performance magnetic compounds.
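
The routing and markdown-to-CSV steps described in the caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Pathway`, `choose_pathway`, and `markdown_table_to_csv` are hypothetical, the routing rule is an assumption based only on the caption, and the parsers, OCR calls, and GPT-4o prompts themselves are left out.

```python
import csv
import io
from enum import Enum


class Pathway(Enum):
    """The three processing pathways named in the caption (labels are illustrative)."""
    XML_PARSERS = "XML text parser + table parser"
    PDF_PARSER = "PDF parser producing markdown"
    OCR = "OCR producing markdown, page by page for long documents"


def choose_pathway(fmt: str, is_scanned: bool = False) -> Pathway:
    """Pick a pathway from the document format (assumed rule, not the authors' code)."""
    fmt = fmt.lower()
    if fmt == "xml":
        return Pathway.XML_PARSERS
    if fmt == "pdf":
        # Scanned or image-based PDFs and handbooks go through OCR.
        return Pathway.OCR if is_scanned else Pathway.PDF_PARSER
    raise ValueError(f"unsupported format: {fmt!r}")


def markdown_table_to_csv(markdown: str) -> str:
    """Convert pipe-delimited markdown table rows into CSV text.

    Lines that are not table rows are ignored. The header separator row
    (cells made only of '-' and ':') is dropped. Escaped pipes inside
    cells are not handled in this sketch.
    """
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue  # separator row such as | --- | :---: |
        rows.append(cells)
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

For example, `choose_pathway("pdf", is_scanned=True)` returns `Pathway.OCR`, and a two-column markdown table of compounds and Curie temperatures becomes CSV text with one line per table row, ready to be passed on to the extraction step.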
