Background & Summary

The term ‘cryogenics’ is defined for applications at T < 120 K1. A wide spectrum of technologies is deployed in low-temperature (low-T) environments (Fig. 1), such as quantum technologies (e.g., quantum computing, quantum communication, quantum sensing), superconducting-enabled applications (e.g., magnetic resonance imaging or MRI, magnetic levitation or maglev) (1–4.2 K), aerospace or transportation technologies powered by liquid hydrogen (LH2, 14–20 K) or oxygen (LOX, 54–90 K), and liquefied natural gas (LNG, 91–112 K) (Fig. 1a). In these low and discrete temperature ranges, metallic alloys may behave differently than they do at room temperature (RT), where material behavior is better understood (Fig. 1b,c).

Fig. 1
figure 1

Low-temperature (low-T) technology and applications. (a) Cryogenic engineering applications, which include a quantum computer (https://commons.wikimedia.org/w/index.php?search=Dmitrmipt&title=Special:MediaSearch&type=image), magnetic resonance imaging (MRI) scanner employing NbTi superconducting magnets (https://commons.wikimedia.org/wiki/File:MRI-Philips.jpg), liquid hydrogen storage tank for rockets (https://commons.wikimedia.org/wiki/File:SLS_Liquid_Hydrogen_Tank_Test_Article_(NASA_20181029-SLS_LH2_STA_lift).jpg), high-temperature superconducting maglev trains (https://commons.wikimedia.org/wiki/File:Shanghai_Maglev_2.jpg), high-temperature superconducting cables (https://commons.wikimedia.org/wiki/File:High-Temperature_Superconducting_Cables_(5884863158).jpg), and liquefied natural gas storage tank (https://commons.wikimedia.org/wiki/File:National_Grid_LNG_Tank.jpg). The temperature range is not to scale. (b) Literature data map (1991–2023) ranked by the alloy types and temperatures21. (c) Crystal structures, strength and Charpy impact tests, and representative mechanical properties6. (d) Refrigerants and their temperature ranges21. The lengths correspond to the number of data points in the dataset.

Achieving precise control of low temperatures poses various challenges for refrigeration techniques. In low-T experiments, materials are generally either directly immersed in refrigerants or cooled by refrigeration systems. The use of refrigeration systems provides more precise temperature control and can attain lower temperatures. Dilution refrigerators can achieve base temperatures in the milli-Kelvin range2. The effective temperature range of these refrigeration methods is largely determined by the working medium and is not continuous, such as liquid helium (1–4.2 K), liquid hydrogen (14–20 K), and liquid nitrogen (63–77 K) (Fig. 1).
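The discrete working windows above can be captured in a small lookup. A minimal sketch in Python, using only the refrigerant ranges quoted in this section (the function and table names are ours, for illustration):

```python
# Refrigerant working windows (K) as quoted in the text above.
REFRIGERANT_RANGES = {
    "liquid helium": (1.0, 4.2),
    "liquid hydrogen": (14.0, 20.0),
    "liquid nitrogen": (63.0, 77.0),
}

def refrigerants_for(temperature_k):
    """Return the refrigerants whose working window covers a target temperature."""
    return [name for name, (lo, hi) in REFRIGERANT_RANGES.items()
            if lo <= temperature_k <= hi]
```

The gaps in this mapping (e.g., 4.2–14 K without a cryocooler) illustrate why the temperature coverage of literature data is discontinuous.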

Extreme cold presents significant challenges for maintaining certain material performance, such as toughness, ductility, and fracture resistance, making material selection critical in the design of low-T technologies3,4 (Fig. 1d). At low temperatures, materials generally exhibit increased strength but reduced ductility and toughness5,6. A ductile-to-brittle transition (DBT) may occur, which limits the applicability of materials and is sensitive to the atomic-level structure7. This transition is more pronounced in body-centered cubic (BCC) materials, which are less ductile than hexagonal close-packed (HCP) and face-centered cubic (FCC) materials8 (Fig. 1c). Multi-principal element alloys (MPEAs) such as medium-entropy alloys (MEAs) and high-entropy alloys (HEAs) exhibit strength-ductility and strength-toughness balances comparable to those of conventional low-T alloys (such as FCC-structured steels)9,10,11,12. Owing to their vast compositional design space, these complex alloys demonstrate remarkable potential for material selection. A standardized dataset containing the key material properties over a wide temperature range, covering the full, discontinuous spectrum of the aforementioned refrigeration techniques, could effectively guide the development of low-T applications.

For instance, hydrogen energy has recently drawn notable attention for its high energy density and sustainability13,14. LH2 is a key propellant in space rockets, often combined with liquid oxygen, for its high energy density and specific impulse; its high specific heat also allows it to effectively cool rocket engine nozzles and other critical parts15. LH2 is also used in civil applications such as long-range vehicles, ships, and aircraft for its high energy density, low working pressure, and reduced CO2 emissions16,17,18,19. The storage and transportation of LH2 in metallic containers can lead to hydrogen embrittlement, which poses potential risks of catastrophic failure under load. The same material and interface challenges arise in superconductors and quantum technology, where the reliability and packaging of quantum circuits at low temperatures are of critical concern20. To address these issues, a large number of experimental studies have been conducted at high cost in low-T environments. As a representative example, openly released data demonstrate that 300-series austenitic stainless steels show minimal hydrogen embrittlement at LH2 temperatures21, supporting their safe use as LH2 containers22,23.

Due to the varied temperature ranges of specific refrigeration techniques, the scientific studies and technical reports on the mechanical properties of metallic alloys are quite diverse and the reported data are often heterogeneous. Each temperature regime exhibits distinct characteristics. For instance, research at temperatures between 1–4.2 K typically investigates the low-T mechanical performance of structural materials used in superconducting systems. In contrast, studies focusing on the temperature range of 14–20 K often explore hydrogen effects. Collecting these research data could broaden our understanding of the composition-processing-microstructure-properties relationship of materials in current cryogenic applications and highlight potential challenges in the advancement of future technologies.

The FAIR principles (Findability, Accessibility, Interoperability, and Reusability)24 are increasingly vital in materials science and engineering. Standardized reporting formats for experimental reports and datasets are necessary to produce comprehensive, clean, and usable data, preventing the waste of resources and advancing material research in the data science and machine intelligence era25,26. Following these concepts, we construct a comprehensive dataset21 on the mechanical properties of metallic alloys at low-T conditions. The codes and scripts used to construct and manage the dataset21 are also openly released. Our dataset21 includes tensile and impact test data reported in 715 scientific articles and serves as a reference for the development and deployment of future low-T technologies.

Methods

The workflow follows our previous work on fatigue data of advanced alloys26,27,28, which includes three key steps of content acquisition, data extraction, and dataset construction. In this study, we enhance the pipelines by incorporating a feature to extract textual content from Portable Document Format (PDF) files and utilizing advanced language models to process the full text in a single request, thereby improving both efficiency and quality. The product dataset21 consists of both research metadata and scientific data. Metadata includes information such as titles, authors, the sources of publication, the years of publication, and digital object identifiers (DOIs). Scientific data includes research data such as material types, chemical compositions, processing and test conditions, and mechanical properties. Owing to the lack of a unified data description for low-T mechanical properties reported in figures and tables, manual extraction and correction are needed. The final data are published in standardized form for reuse.

Content acquisition

A search formula compiled from the keywords of ‘Temperature’, ‘Property’, and ‘Alloy’ (Table 1) returned 8,439 records of journal articles from Web of Science (WoS, https://www.webofscience.com/wos). The metadata are obtained through the ‘export’ function. WoS applies stemming rules to the search queries. Stemming removes suffixes such as ‘-ing’ and ‘-es’ from words in a search query to expand the search and retrieve additional, relevant records. The articles are classified through the natural language processing (NLP) model RoBERTa (Robustly Optimized BERT Pretraining Approach)29, according to their abstracts. After manual inspection and correction, the valid results for the current study include 715 articles. The original documents are collected through their DOIs in Extensible Markup Language (XML), Hypertext Markup Language (HTML), and PDF formats under valid licenses through Tsinghua University Library (https://lib.tsinghua.edu.cn). The full-text articles from Elsevier are retrieved using its Developer Portal via an official Application Programming Interface (API, https://dev.elsevier.com/documentation/FullTextRetrievalAPI.wadl). Articles from other publishers are automatically downloaded using the open-source code article-downloader30 (https://github.com/olivettigroup/article-downloader), or manually downloaded from their official websites (e.g., https://www.mdpi.com, https://onlinelibrary.wiley.com, https://link.springer.com). XML/HTML documents can be automatically parsed and converted into structured text through computer codes, while PDFs largely require manual processing.
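The three keyword groups combine into a single boolean topic search. A minimal sketch of the assembly step, where the example terms are placeholders (the actual keyword lists are given in Table 1):

```python
def build_wos_query(temperature_terms, property_terms, alloy_terms):
    """Combine keyword groups into a topic-search formula: terms within a
    group are OR-ed, and the 'Temperature', 'Property', and 'Alloy' groups
    are AND-ed, mirroring the structure described in the text."""
    def group(terms):
        return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"
    groups = (temperature_terms, property_terms, alloy_terms)
    return "TS=(" + " AND ".join(group(g) for g in groups) + ")"

# Illustrative terms only -- not the study's actual keyword lists.
query = build_wos_query(["cryogenic", "low temperature"],
                        ["tensile", "impact"],
                        ["alloy", "steel"])
```

Note that WoS stemming further expands each quoted term at query time, so the literal formula retrieves more records than an exact-match search would.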

Table 1 Keywords used for article search in the citation dataset.

Data extraction

Images in the articles are directly collected from the XML/HTML documents or extracted from the PDF documents using PyMuPDF31 (https://pypi.org/project/PyMuPDF/). Figures with multiple panels are automatically segmented using a rule-based code. Figures presenting mechanical properties are screened by the convolutional neural network (CNN) model, ResNet32 (https://pytorch.org/hub/pytorch_vision_resnet/). Strength and fracture elongation data in images are extracted by using our in-house MATLAB code, IMageEXtractor28 (https://github.com/xuzpgroup/ZianZhang/tree/main/FatigueData-AM2022/IMEX).
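The rule-based panel segmentation relies on detecting blank gutters between sub-figures. A simplified one-axis sketch under that assumption (the production code also handles the horizontal axis and noise; function name and threshold are ours):

```python
def split_panels_by_rows(gray, white=250):
    """Rule-based vertical segmentation of a multi-panel figure: rows that
    are entirely (near-)white are treated as gutters, and the image is
    split into the content bands between them. `gray` is a 2D list of
    0-255 intensities; returns (start, stop) row ranges of each panel."""
    blank = [all(px >= white for px in row) for row in gray]
    panels, start = [], None
    for i, is_blank in enumerate(blank):
        if not is_blank and start is None:
            start = i                      # a panel begins
        elif is_blank and start is not None:
            panels.append((start, i))      # a panel ends at a gutter
            start = None
    if start is not None:
        panels.append((start, len(gray)))  # panel runs to the image edge
    return panels
```

Applying the same rule along both axes yields the rectangular panel grid that is then passed to the ResNet screening step.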

Table data in XML/HTML files are collected in their original forms by using table extractor33 (https://github.com/olivettigroup/table_extractor) and saved as JavaScript Object Notation (JSON)34 files, while table data in PDF files are extracted manually. Only the relevant data, screened using Generative Pre-Trained Transformers (GPT-3.535, https://platform.openai.com/docs/models/gpt-3.5-turbo), Generative Language Model (GLM-436, https://github.com/THUDM/GLM-4), and manual efforts, are retained in the product dataset21.

Text in the XML/HTML and PDF documents is extracted using TEXTract28 (https://github.com/xuzpgroup/ZianZhang/tree/main/FatigueData-AM2022/TEXTract) and PDFDataExtractor37 (https://github.com/cat-lemonade/PDFDataExtractor), respectively. For literature prior to 1991, only scanned-image PDFs are available; these are converted into TXT files, and the text mining is performed using two large-scale pre-trained language models, GPT-3.535 and GLM-436, for comparative study. The tasks are batch processed through their official APIs.

The detailed steps of text mining using GPT-3.5 are explained as follows. The section titles are filtered by the keywords (Table 2) to retrieve paragraphs related to the experimental methods and data. GPT-3.5 is used to extract the information of materials, processing and testing conditions, and mechanical properties from relevant paragraphs. The GPT prompts include task descriptions, examples, and the text to be processed. The task description asks GPT to extract data from the text and return them in the JSON format. Each example is a text-completion pair to inform the content (the paragraphs) and format (JSON) of the data to be extracted. The text to be processed is placed at the end of the prompt. Limited by the maximum number of tokens (4,096) in GPT-3.5, we provide only two examples per request and extract data from only one paragraph at a time. The data extracted from each paragraph are manually matched into aligned material-processing-experiment entries, and associated with figure and table data.
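The prompt assembly described above can be sketched as follows. The task wording, example pair, and JSON keys here are illustrative stand-ins, not the study's actual prompts or schema:

```python
import json

TASK = ("Extract the material, processing and testing conditions, and "
        "mechanical properties from the paragraph and return them as JSON.")

# Hypothetical text-completion example; the real examples follow the
# unified data description developed in this paper.
EXAMPLES = [
    ("The alloy was tested at 77 K with a yield strength of 1100 MPa.",
     {"test_temperature_K": 77, "yield_strength_MPa": 1100}),
]

def build_prompt(paragraph, examples=EXAMPLES, max_examples=2):
    """Assemble a few-shot prompt: task description, up to two
    text-completion examples (the 4,096-token limit of GPT-3.5 allows
    only two per request), and the paragraph to process at the end."""
    parts = [TASK]
    for text, completion in examples[:max_examples]:
        parts.append(f"Text: {text}\nJSON: {json.dumps(completion)}")
    parts.append(f"Text: {paragraph}\nJSON:")
    return "\n\n".join(parts)
```

The model's completion is then parsed with `json.loads` and manually matched into aligned material-processing-experiment entries.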

Table 2 Keywords used for paragraph screening.

The same prompts are used for text mining with GLM-4. GLM-4 supports significantly more input and output tokens than GPT-3.5, allowing it to process the entire article text in a single request. The text and table data can then be automatically aligned, simplifying the subsequent steps of dataset integration. However, the association with figure data must still be performed manually.

This two-model approach demonstrates the generalizability of our methodology, which can be incorporated into automated platforms with further development. However, large language model (LLM) based text processing faces challenges such as incorrect or missing content, diverse data formats, and the inability to extract the composition-processing-testing-performance relationship. These issues cannot be fully resolved by rule-based processing in the current study. To mitigate them, we manually process the data through content proofreading and data formatting, a step that is time-intensive and requires specialized domain knowledge. In the future, prompt engineering38 and fine-tuning39 techniques in the scientific and engineering domains should be developed to enhance efficiency.

Dataset integration and data correction

In the construction of the product dataset21, the mechanical data extracted from text, figures and tables are paired with the type and composition of materials, processing and test conditions (e.g., temperatures, sample geometries, refrigeration techniques), and stored as distinct data entries. To ensure data quality that is essential for scientific and engineering uses, manual inspection and correction are conducted on the raw data generated by GPT-3.5 and GLM-4. A unified language for ‘low-T alloys’ (ULLTA) is proposed to standardize the data representation (Fig. 2). The product dataset21 is exported to a JSON file and proofread by comparing it to the PDF files of all source documents.
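The pairing step above produces one standardized entry per experimental condition. A minimal sketch of such an entry under the ULLTA convention that unreported fields are empty strings or lists; the key names here are illustrative, and the authoritative key list is given in Fig. 2 and Table 5:

```python
def make_data_record(material, processing, testing, mech_prop):
    """Assemble one data record with the four sub-structs described in
    Table 5; fields not supplied default to empty, matching the dataset
    convention for unreported entries."""
    return {
        "materials": {"alloy_type": "", "composition": [], **material},
        "processing": {"proc_type": "", "proc_para": [], **processing},
        "testing": {"test_temperature": [], "refrigeration": "", **testing},
        "mech_prop": {"yield_strength": [], "elongation": [], **mech_prop},
    }

record = make_data_record(
    {"alloy_type": "austenitic stainless steel"},
    {},
    {"test_temperature": [20.0], "refrigeration": "liquid hydrogen"},
    {"yield_strength": [1100.0]},
)
```

Keeping unreported fields explicitly empty, rather than omitting them, is what allows the completeness rating described below to be computed uniformly across records.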

Fig. 2
figure 2

Dataset structures21. The dataset of low-T alloys is formatted into a hierarchical tree structure. The name of each tree node is highlighted in yellow. Keys are defined for easy access by scripts. Each node has its specific data type.

Data Records

The AlloyData-2024LT dataset is available as a JSON file at figshare (https://doi.org/10.6084/m9.figshare.25912267)21, serving as a valuable resource for the understanding and advancement of low-T alloys. The JSON file is formatted into a hierarchical tree structure (Fig. 2). The tree node where the data value is stored is called the data entry. Data entries include string and numeric data types. Text data is stored as a string. The release years of publications, rating scores and mechanical properties are defined as numbers, and other numeric data such as processing parameters, ingot size, and experimental temperature are stored in the form of a numeric array. The tree nodes used to group data entries are called data structures. Numerous structures, encompassing articles and data records, are compiled into a structure array. To facilitate programming implementation and data acquisition, keys are defined for data entries, structures, and structure arrays (Fig. 2, Tables 3, 4, and 5).

Table 3 Evaluation metrics of automated data processing.
Table 4 Contents of the ‘metadata’ struct.
Table 5 Contents of the ‘datasets’ struct array.

The root node of the top-level data structures is AlloyData-2024LT, containing child nodes of articles and a default unit system (e.g., MPa for stress, K for temperature). Raw numeric data are converted to the default units of data entries. Articles are stored in a struct array. Each of them contains two structs, which are the metadata and scientific data. Metadata contains data entries such as the publication information of the articles. Scientific data stores a struct array of data records. Each data record corresponds to a specific condition of experimental tests. A scientific data record contains 4 sub-structs (‘Materials’, ‘Processing’, ‘Testing’, and ‘Mechanical properties’) as defined in Table 5. Incomplete data specifications in literature hinder data mining and reduce data credibility and reusability. A rating score, based on the weighted sum of non-empty entries, quantifies the completeness of data records in source documents27. The processing parameters are organized as the ‘proc_para’ sub-struct array, and the content of each sub-struct depends on the types of material processing. For surface treatments, the array ‘surf_para’ is defined in the same way as processing parameters.
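The rating score can be sketched as a normalized weighted sum over non-empty entries, following ref. 27. The weights and keys below are illustrative only; the actual weighting scheme is defined in the released scripts21:

```python
def rating_score(record, weights):
    """Completeness rating: weighted sum over non-empty data entries,
    normalized by the total weight. Empty strings and empty lists count
    as unreported, per the dataset convention."""
    total = sum(weights.values())
    filled = sum(w for key, w in weights.items()
                 if record.get(key) not in ("", [], None))
    return filled / total  # in [0, 1]

# Illustrative weights and record (not the study's actual scheme).
weights = {"composition": 2, "proc_type": 1, "test_temperature": 2,
           "yield_strength": 2, "elongation": 1}
record = {"composition": [["Fe", 70.0], ["Cr", 18.0], ["Ni", 8.0]],
          "proc_type": "", "test_temperature": [77.0],
          "yield_strength": [250.0], "elongation": []}
score = rating_score(record, weights)  # (2 + 2 + 2) / 8 = 0.75
```

A low score flags records whose source documents under-specified the experiment, which lowers their credibility for reuse.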

The terminology of data types is largely inherited from MATLAB40 (https://www.mathworks.com/help/matlab/data-types.html). For the JSON file, the struct is defined as a dictionary, and all types of arrays are defined as lists. With the dataset structures outlined above, the data entries are explained here in more detail. The sub-struct array ‘mech_prop’ stores the mechanical properties of the alloy, including yield strength, ultimate tensile strength, elongation at fracture, and Charpy impact energy. ‘ingot_desc’, ‘spec_desc’, and ‘spec_shape’ describe the shape and cross-section of the ingot and specimen. ‘spec_size’ defines the specimen dimensions, including the longitudinal length, diameter for round specimens, outer and inner diameters for annular specimens, and width and thickness for rectangular cross-sections. The ingot dimensions are defined in the same way and stored in ‘ingot_size’. In the numeric arrays of other data entries, a single value stands for a specific value or the mean, and two values stand for the lower and upper bounds, respectively. For the convenience of comparison between the string data, a unified nomenclature is used for data entries such as types of processing, materials, and machines. In our dataset21, data entries not reported explicitly are recorded as empty lists and strings.
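The numeric-array convention described above (empty for unreported, one value for a specific value or mean, two values for bounds) can be decoded uniformly. A minimal sketch; the function name is ours:

```python
def interpret_numeric(values):
    """Decode the dataset's numeric-array convention: an empty list means
    unreported, one value is a specific value or the mean, and two values
    are the lower and upper bounds."""
    if not values:
        return {"reported": False}
    if len(values) == 1:
        return {"reported": True, "value": values[0]}
    lo, hi = values
    return {"reported": True, "lower": lo, "upper": hi, "mid": (lo + hi) / 2}
```

A consumer of the dataset can apply this helper to entries such as test temperature or yield strength before statistical analysis, without special-casing ranges.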

Technical Validation

For technical validation of the dataset quality, we take a two-step strategy combining automated processing and manual correction. First, data in the source text and tables are extracted using LLMs35,36 for comparative study, and data in the figures are collected using our in-house tool28. The performance metrics of figure, text, and table processing show that the F1 scores of automated extraction are 55–92% (Table 3). GPT-3.5 achieves an F1 score of 89%, while GLM-4 achieves an F1 score of 91%, indicating that the LLMs are capable of understanding most of the text and can help reduce the human effort required to construct extraction rules or build NLP models. The F1 score for multi-panel image segmentation is 92%, but the precision of automatic image classification is low (39%), suggesting that its results are unreliable. Therefore, the classification results are manually reviewed and corrected prior to image data extraction to ensure the reliability of the outcomes.
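The metrics above follow the standard definitions for extraction tasks; for completeness, a short sketch (the counts in the example are illustrative, not the study's actual tallies):

```python
def precision_recall_f1(tp, fp, fn):
    """Standard extraction metrics: precision = TP/(TP+FP),
    recall = TP/(TP+FN), and F1 is their harmonic mean."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Illustrative counts only.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=12)
```

Because F1 penalizes both spurious extractions (FP) and missed entries (FN), it is a stricter summary of pipeline quality than accuracy alone.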

The low precision in image classification arises from the lack of standardized image formats for the mechanical properties of low-T alloys. This variability in image content and features degrades the performance of ResNet32, particularly on small datasets. The use of IMageEXtractor28 allows automated processing and assisted calibration of the coordinate axes and data points. To ensure full accuracy of the dataset for end uses in low-T alloy applications, the data records extracted from figures, tables, and texts are manually checked and corrected by two individuals in two rounds. The accuracy after the first round of verification reached 97%; after the second round, 50 randomly selected documents are re-checked, confirming that the extraction accuracy improved to 100%. The manual checking and correction process requires substantial human effort, averaging ~30 minutes per article.

Our dataset21 compiles literature data from WoS up to May 31, 2024 (Fig. 1b). Compared to materials handbooks6,41 and commercial databases, our dataset21 offers several key advantages. It is freely accessible, adheres to the FAIR principles24 for data sharing, and contains comprehensive metadata and scientific data from the literature that are often omitted in other sources. Notably, data in materials handbooks typically present only fitted curves rather than original data points6,41, restricting their use in engineering references and data-centric research. In contrast, our dataset21 includes original experimental data points along with processing and testing conditions, enabling better statistical analysis of the composition-processing-microstructure-performance relationship. Moreover, unlike published materials handbooks, our dataset21 includes the latest research developments, such as those involving multi-principal element alloys, and is dynamic and updatable using the codes and scripts openly released with this paper21. This comprehensive dataset21 captures the temporal evolution of research on alloys at low temperatures, associated cooling technologies, and temperature ranges of exploration, thereby offering macro-level guidance for researchers in setting research directions, optimizing resource allocation, and improving research and development efficiency. These capabilities further facilitate advanced material screening, design, discovery, and automated laboratory studies enabled by state-of-the-art artificial intelligence techniques25,26.