Introduction

Analyzing research trends is a critical first step in scientific research because it enables researchers to identify ongoing research activities in a specific field and trace the history of that field. By analyzing a research field, researchers can expand their background knowledge and identify its challenges and opportunities1,2. However, with millions of scientific papers published annually3, analyzing research trends accurately and efficiently has become a challenging task. To address this issue, diverse approaches have been developed to structure research fields and analyze their trends.

Traditionally, literature reviews have been conducted manually by collecting, interpreting, and organizing journal papers related to a research field. There are two approaches to conducting a literature review: the narrative review (NR) and the systematic review (SR). NR aims to identify and summarize previously published works while avoiding duplication and identifying new areas for study4. Researchers often prefer this method for writing review papers because it imposes no restrictions on literature selection or the review process. However, this lack of restrictions can weaken the reliability of NR, which motivated the development of the SR method5. SR follows a rigorous, organized review process with specific research objectives in mind. This method has been successfully employed to provide an integrated, synthesized overview of the historical development and future research directions in diverse fields, including marketing, medicine, and sociology6. However, both approaches are time-consuming and prone to bias because researchers judge literature selection subjectively7. To overcome these limitations, bibliometrics, which analyzes research trends quantitatively based on bibliographic information, has been developed.

Bibliometrics is a data-driven statistical method that analyzes the quantity and temporal changes of scientific publications. It first collects publication and citation information from bibliographic databases such as Web of Science, Scopus, and Google Scholar. Researchers then analyze research trends using two approaches: performance analysis and science mapping analysis. Performance analysis statistically evaluates scientific activities such as investigating a research field, funding researchers, or publishing articles8,9. In contrast, science mapping analysis focuses on the topological relationships between scientific constituents10,11. Typically, both approaches are employed together to provide a comprehensive analysis of research trends over time and across scientific networks12,13. Bibliometrics is a cost-effective, quantitative method for identifying high-impact research activities and structuring a research field because it relies on existing bibliographic databases of journals12. However, bibliometrics is usually weak at understanding and classifying research structures within specific fields because it mainly evaluates the importance of articles based on citations14. Co-word analysis can compensate for this weakness by identifying research subfields15,16, but the identified words are often not specific to the field, because bibliographic databases provide only a few representative keywords per article. With recent advances in machine learning (ML), researchers are now developing ML-based approaches to better interpret research structures and trends.

ML-based methods predict research trends by interpreting correlations between words in massive amounts of literature data. For example, Tshitoyan et al. identified high correlations between materials and properties by embedding words from the abstracts of millions of materials science papers, and suggested new candidate materials for various applications based on the predicted relationships17. Krenn and Zeilinger developed a neural network model that predicts links between previously unconnected keywords, using a semantic network built from human-verified keyword lists drawn from 750,000 titles and abstracts in quantum physics. Using this model, they predicted keywords that might emerge in quantum physics18. ML-based methodologies are also being extended to predict trends in various research fields19,20,21 and to assist other research trend analysis methods22,23. ML-based methods provide unique insights and high-level predictions by interpreting textual context. However, they lack generality across research fields, because a model can only be applied to the specific field whose text data it was trained on. As an enormous number of research papers are now published every day, there is a growing need for a general method that can analyze complex interdisciplinary research and is widely applicable to various research fields.

In this study, we present a new keyword-based research trend analysis method that automatically classifies research trends in a specific field using a systematic approach. Our method collects research articles and extracts keywords using a natural language processing (NLP)-based tokenizer. A keyword network is then constructed to structure the research field and select the representative keywords. We verified our method by applying it to resistive random-access memory (ReRAM) research. ReRAM is a candidate for next-generation in-memory devices, and its research spans materials science, electrical and electronic engineering, and computer science. Our method successfully categorizes ReRAM keywords into research communities, and the annual trends it predicts align with the analyses presented in ReRAM review papers. We therefore believe that our methodology is a cost-effective, quantitative approach for analyzing research trends that is widely applicable to various research fields.

Methods

Our method automatically structures a research field through three sequential processes: article collection, keyword extraction, and research structuring.

Fig. 1

(a) Article collection process from the bibliographic databases. (b) Keyword extraction process from the collected article titles. (c) Research structuring process using the extracted keywords.

Article collection is the process of searching for journal articles in the research field and saving their bibliographic information. Figure 1a describes the article collection process used in this study. First, we collect bibliographic data on ReRAM-related articles by searching for the device name and the switching mechanism through the application programming interfaces (APIs) of Crossref and Web of Science. Next, we keep only papers, excluding other document types such as books and reports, and restrict the publication year to 1971 onward, the year in which the term “memristor” was first proposed. Finally, we remove duplicates by comparing article titles and exclude irrelevant articles that contain stopwords. As a result, 12,025 ReRAM articles were collected; the search keywords and constraints used during article collection are detailed in Supplementary Table S1 online.
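The filtering and deduplication steps can be illustrated with a short Python sketch that operates on records assumed to have already been retrieved from the APIs. The `Record` schema, the `"journal-article"` type label (Crossref's name for journal articles), the `EXCLUDE_TERMS` list, and the title normalization rule are all hypothetical choices for illustration, not the constraints listed in Supplementary Table S1.

```python
from dataclasses import dataclass

@dataclass
class Record:
    """Minimal bibliographic record (hypothetical schema)."""
    title: str
    doc_type: str
    year: int

# Hypothetical exclusion terms; the study's actual constraints are in Supplementary Table S1.
EXCLUDE_TERMS = {"erratum", "correction", "retraction"}

def normalize_title(title: str) -> str:
    """Lower-case and collapse punctuation/whitespace so near-identical titles match."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in title.lower())
    return " ".join(cleaned.split())

def filter_records(records, start_year=1971, allowed_types=("journal-article",)):
    """Keep papers from start_year on, drop titles with exclusion terms, then deduplicate."""
    seen = set()
    kept = []
    for rec in records:
        if rec.doc_type not in allowed_types:  # e.g. drop books and reports
            continue
        if rec.year < start_year:
            continue
        key = normalize_title(rec.title)
        if EXCLUDE_TERMS & set(key.split()):  # title contains an exclusion term
            continue
        if key in seen:  # same title already collected (e.g. from the other API)
            continue
        seen.add(key)
        kept.append(rec)
    return kept

# Made-up records for illustration.
records = [
    Record("Resistive switching in HfO2 thin films", "journal-article", 2010),
    Record("Resistive Switching in HfO2 Thin Films.", "journal-article", 2010),
    Record("Memristive devices: a monograph", "book", 2015),
    Record("Erratum: Resistive switching in TiO2", "journal-article", 2012),
    Record("Early negative resistance study", "journal-article", 1965),
]
kept = filter_records(records)
```

Here only the first record survives: the second is a case and punctuation variant of the same title, and the others are removed by the document-type, exclusion-term, and year filters.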

Next, we perform keyword extraction, the process of selecting meaningful words from article titles. Figure 1b shows an example of the keyword extraction process. For keyword extraction, we use the NLP pipeline “en_core_web_trf”, a RoBERTa-based pre-trained model implemented in spaCy24. First, we tokenize each article title into words using spaCy. Next, we use spaCy's lemmatizer to convert the tokens to their base forms and spaCy's Universal Part-of-Speech (UPOS) tagger to keep only adjectives, nouns, pronouns, and verbs as keywords. As a result, 122,981 words and 6,763 keywords were extracted from the titles of the ReRAM articles, and each keyword was labeled with the publication year of its article.
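The POS-based filtering and year labeling can be sketched as follows. To keep the example self-contained, it takes (lemma, UPOS) pairs as input instead of calling spaCy; with spaCy installed, such pairs could be built as `[(tok.lemma_, tok.pos_) for tok in nlp(title)]` after `nlp = spacy.load("en_core_web_trf")`. The tags in the example are hand-written, not actual spaCy output, and mapping "pronouns" to the UPOS tag `PRON` and lower-casing the lemmas are our assumptions.

```python
from collections import Counter

# UPOS tags for the parts of speech named in the text (adjectives, nouns,
# pronouns, verbs). Reading "pronouns" as PRON is an assumption.
KEYWORD_UPOS = {"ADJ", "NOUN", "PRON", "VERB"}

def extract_keywords(tagged_tokens):
    """Return lower-cased lemmas whose UPOS tag is in KEYWORD_UPOS.

    tagged_tokens is a list of (lemma, upos) pairs, e.g. from a spaCy Doc.
    Lower-casing is an assumption made so that title casing does not split counts.
    """
    return [lemma.lower() for lemma, upos in tagged_tokens if upos in KEYWORD_UPOS]

def count_keywords_by_year(tagged_titles):
    """Count (keyword, year) occurrences over (year, tagged_tokens) pairs."""
    counts = Counter()
    for year, tokens in tagged_titles:
        for kw in extract_keywords(tokens):
            counts[(kw, year)] += 1
    return counts

# Hand-written tags for illustration only (not actual spaCy output).
title_tokens = [("bipolar", "ADJ"), ("resistive", "ADJ"), ("switching", "NOUN"),
                ("in", "ADP"), ("HfO2", "NOUN"), ("film", "NOUN")]
keywords = extract_keywords(title_tokens)  # the preposition "in" is dropped
counts = count_keywords_by_year([(2010, title_tokens), (2011, title_tokens)])
```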

Lastly, the research structuring process classifies the research field by building and modularizing a keyword network. Figure 1c shows an example of how we build and modularize the keyword network. We first form all possible keyword pairs within each article title and count their frequencies. Repeating this process for all titles yields a keyword co-occurrence matrix whose rows and columns are keywords and whose elements are the frequencies of keyword pairs. The graph analyzer Gephi is then used to transform the matrix into a keyword network in which nodes are keywords and edge weights are the co-occurrence counts of keyword pairs25. To simplify the keyword network, we use the weighted PageRank scores of the nodes26 to select 516 representative keywords that together account for 80% of the total word frequency. The network is then partitioned with the Louvain modularity algorithm, taking edge weights and a resolution parameter into account27. Figure 2 shows the result of modularization, which divides the keyword network into three communities. Details of the keyword network construction are given in Supplementary Figs. S1 and S2 online.
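A minimal pure-Python sketch of the pair counting, weighted PageRank, and keyword selection is given below; it does not reproduce Gephi's implementation. Counting each distinct pair once per title, treating the network as undirected with co-occurrence counts as edge weights, the damping factor 0.85, and reading the selection rule as "take keywords in descending PageRank order until they cover 80% of all keyword occurrences" are our assumptions.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(titles_keywords):
    """Count unordered keyword pairs; each distinct pair is counted once per title."""
    pairs = Counter()
    for kws in titles_keywords:
        for a, b in combinations(sorted(set(kws)), 2):
            pairs[(a, b)] += 1
    return pairs

def weighted_pagerank(pairs, damping=0.85, iters=200, tol=1e-12):
    """Power-iteration PageRank on the undirected graph weighted by pair counts."""
    nbrs = {}
    for (a, b), w in pairs.items():
        nbrs.setdefault(a, {})[b] = w
        nbrs.setdefault(b, {})[a] = w
    nodes = list(nbrs)
    n = len(nodes)
    strength = {u: sum(nbrs[u].values()) for u in nodes}  # weighted degree
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            # u passes its rank to neighbours in proportion to edge weight
            for v, w in nbrs[u].items():
                new[v] += damping * rank[u] * w / strength[u]
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank

def select_representative(rank, freq, coverage=0.8):
    """Take keywords by descending PageRank until they cover `coverage` of all occurrences."""
    total = sum(freq.values())
    chosen, covered = [], 0
    for kw in sorted(rank, key=rank.get, reverse=True):
        if covered >= coverage * total:
            break
        chosen.append(kw)
        covered += freq[kw]
    return chosen

# Made-up keyword lists for four titles.
titles = [["resistive", "switching", "hfo2"],
          ["resistive", "switching", "memristor"],
          ["neuromorphic", "computing", "memristor"],
          ["resistive", "switching", "tio2"]]
pairs = cooccurrence(titles)
rank = weighted_pagerank(pairs)
freq = Counter(kw for kws in titles for kw in kws)
chosen = select_representative(rank, freq)
```

In this toy example, "resistive" and "switching" co-occur three times and receive the highest scores, followed by "memristor"; five of the seven keywords are needed to reach 80% of the 12 keyword occurrences.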

Results and discussion

Fig. 2

The three research communities of the ReRAM field. Node size indicates the PageRank score, which represents the importance of a keyword, and node color indicates the community assigned automatically by the Louvain modularity algorithm.
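The Louvain algorithm searches for the partition that maximizes modularity by repeatedly moving nodes between communities and then aggregating communities into super-nodes. The sketch below covers only the scoring step: it computes a common weighted form of modularity with a resolution parameter, Q = (1/2m) Σ_ij [A_ij − γ k_i k_j / (2m)] δ(c_i, c_j), where `resolution` plays the role of γ. It is an illustration under that assumed formulation, not Gephi's code.

```python
def modularity(pairs, community, resolution=1.0):
    """Weighted modularity Q of a node -> community partition of an undirected graph.

    pairs maps (a, b) -> edge weight, each edge listed once, with no self-loops.
    Q = (1/2m) * sum_ij [A_ij - resolution * k_i * k_j / (2m)] * delta(c_i, c_j)
    """
    strength = {}
    for (a, b), w in pairs.items():
        strength[a] = strength.get(a, 0) + w
        strength[b] = strength.get(b, 0) + w
    two_m = sum(strength.values())  # equals twice the total edge weight
    # Each intra-community edge appears twice in the ordered sum over (i, j).
    intra = sum(w for (a, b), w in pairs.items() if community[a] == community[b])
    tot = {}  # total strength per community
    for node, s in strength.items():
        c = community[node]
        tot[c] = tot.get(c, 0) + s
    return 2 * intra / two_m - resolution * sum(t * t for t in tot.values()) / two_m ** 2

# Two triangles joined by a single edge (c, d).
edges = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
         ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1, ("c", "d"): 1}
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {node: 0 for node in split}
q_split = modularity(edges, split)    # 5/14, about 0.357
q_merged = modularity(edges, merged)  # 0.0
```

Splitting the two triangles scores higher than placing every node in one community, which is the kind of comparison the Louvain search makes at each move. In this formulation, raising `resolution` above 1 favors more, smaller communities. For the full optimization, NetworkX 2.8 or later provides `networkx.community.louvain_communities(G, weight="weight", resolution=1)`.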

To further structure the modularized network, we categorize keywords according to their meaning in ReRAM research. We first select the top 20 keywords from each of the three communities (Fig. 2) and merge synonyms: ReRAM (RRAM), Resistive (Resistance), Switching (Switch), and Memristor (Memristive) are each merged into a single keyword. Although Filament and Bridge differ by definition, they are merged into a single keyword, Filament (Bridge), because they have the same meaning in ReRAM research. Next, we combine keywords with similar yearly frequency trends. For example, Neuromorphic and Computing have the same annual trend, so we combine them into Neuromorphic computing. Resistive (Resistance) switching (switch), Conductive filament (bridge), Random access, Metal oxide, Hybrid perovskite, and Neural network are combined in the same way.
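Both merging steps can be sketched in Python. The synonym map follows the merges listed above; the trend-similarity test is a hypothetical stand-in, because the text does not state how "the same annual trend" was judged. Here we assume a Pearson correlation of yearly counts at or above an arbitrary threshold of 0.95, and all counts are made up.

```python
import math

# Synonym -> canonical keyword, following the merges listed in the text.
SYNONYMS = {"rram": "reram", "resistance": "resistive", "switch": "switching",
            "memristive": "memristor", "bridge": "filament"}

def merge_synonyms(freq_by_year):
    """Sum the yearly counts of synonyms into their canonical keyword."""
    merged = {}
    for kw, series in freq_by_year.items():
        target = merged.setdefault(SYNONYMS.get(kw, kw), {})
        for year, count in series.items():
            target[year] = target.get(year, 0) + count
    return merged

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def similar_trend(series_a, series_b, years, threshold=0.95):
    """Hypothetical criterion: yearly counts correlate at or above `threshold`."""
    xa = [series_a.get(y, 0) for y in years]
    xb = [series_b.get(y, 0) for y in years]
    return pearson(xa, xb) >= threshold

# Made-up yearly counts for illustration.
merged = merge_synonyms({"rram": {2010: 1}, "reram": {2010: 2, 2011: 1}})
years = [2015, 2016, 2017]
neuromorphic = {2015: 2, 2016: 4, 2017: 8}
computing = {2015: 1, 2016: 2, 2017: 4}
thin_film = {2015: 8, 2016: 4, 2017: 2}
```

With these numbers, Neuromorphic and Computing rise in proportion and would be combined, whereas a declining keyword such as Thin film would not.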

Table 1 PSPP + M classification of the top 20 keywords in the three research communities.

Next, we classify the combined keywords into the processing-structure-properties-performance (PSPP) categories, the four general components of materials science28,29. In addition to PSPP, we add a Material (M) category to distinguish studies of materials with different chemical compositions or names. Keywords that are meaningless or too broad to categorize are assigned to a Stopwords category. Finally, we determine the main research focus of each community by examining how its keywords are distributed across the PSPP + M categories. Table 1 lists the keyword classification results for the three communities.
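The distribution step amounts to tallying keywords per category. The sketch below uses only the SIP keywords and categories named in the following paragraph (Table 1 holds the full top-20 list), so its counts are partial; naming the community from the distribution remains an expert judgment that the code does not capture.

```python
from collections import Counter

# Category assignments for the SIP keywords named in the text (partial; see Table 1).
SIP_KEYWORDS = {
    "Pt": "Material", "HfO2": "Material", "TiO2": "Material", "ZnO": "Material",
    "Thin film": "Structure", "Layer": "Structure", "Structure": "Structure",
    "Electrode": "Structure",
    "Resistive (Resistance) switching (switch)": "Performance",
    "Bipolar": "Performance", "Oxygen": "Performance",
}

def category_distribution(assignments):
    """Count keywords per PSPP + M category, ignoring the Stopwords category."""
    return Counter(cat for cat in assignments.values() if cat != "Stopwords")

dist = category_distribution(SIP_KEYWORDS)
```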

The first (yellow) community consists of keywords on ReRAM performance derived from various structures of traditional materials. In the Materials category, the electrode metal Pt and the oxides HfO2, TiO2, and ZnO represent traditional materials commonly used in ReRAM devices. In the Structure category, Thin film, Layer, Structure, and Electrode represent structural variations of these materials. In the Performance category, Resistive (Resistance) switching (switch), Bipolar, and Oxygen represent components related to the switching characteristics of ReRAM devices. We therefore conclude that the yellow community focuses on improving ReRAM performance by modifying the structure of materials commonly used in ReRAM devices, and we name it Structure-induced performance (SIP).

The second (green) community comprises keywords about ReRAM performance induced by changes in materials. Flexible in the Properties category, together with Conductive filament (bridge), Random access, Nonvolatile, and Volatile in the Performance category, represents the different device characteristics required by diverse application areas. In the Materials category, Graphene, Organic, and Hybrid perovskite represent materials that differ from traditional metal oxides. The green community thus focuses on extending the application areas of ReRAM devices by using different types of materials, and we name it Material-induced performance (MIP).

The third (purple) community consists of keywords about ReRAM circuit components and neuromorphic computation. Because all of these keywords relate to the performance of ReRAM devices for neuromorphic applications, we assign them to the Performance category and name this community Neuromorphic application (NA). Detailed descriptions of each keyword and the rationale for its classification are provided in Supplementary Table S2 online.

Fig. 3

The yearly trend of keyword frequencies in the three research communities as a cumulative sum. The gray brackets at the top indicate the year ranges of the research stages, and the gray dotted lines mark the boundaries between stages. The colored areas show the yearly keyword frequencies of the Neuromorphic application (NA), Material-induced performance (MIP), and Structure-induced performance (SIP) communities.

Temporal analysis of keywords reveals how ReRAM research has progressed so far, which in turn helps anticipate how it will proceed. We therefore examine ReRAM research trends by analyzing the yearly variation of research keywords. Figure 3 shows the yearly trend of keyword frequencies in the three communities as a cumulative sum. The total frequency represents the overall trend of the ReRAM research field and follows a half-bell-shaped curve. Assuming that keyword frequencies reflect academic demand, these yearly trends can be interpreted in terms of the product life cycle (PLC) model30. Accordingly, we divide ReRAM research into four stages (Development, Introduction, Growth, and Maturity) and find that the ReRAM research field is currently in the Maturity stage.

During the Development stage (1971–2005), ReRAM received little attention because resistive switching was examined only as an academic phenomenon. ReRAM research then increased rapidly in the Introduction stage with the expectation that ReRAM would replace 2-D NAND flash devices, and this expectation kept rising through the Growth stage. However, as 3-D NAND technology came to dominate the storage market, ReRAM research turned to storage class memory (SCM), which sits between DRAM and NAND, and to computing in memory (CiM) for machine learning. As a result, the upward trend levels off in the Maturity stage. Several reviews of ReRAM research also agree well with this division of research stages (see Supplementary Table S3 online)31,32,37,40.

Fig. 4

(a) Yearly percentage of keyword frequencies among the three research communities. Each year's total is the sum of keyword frequencies in all three communities. Percentage change of keyword frequencies from the Introduction stage to the Maturity stage in the (b) SIP, (c) MIP, and (d) NA communities.

Although the total keyword frequencies match the PLC model well, the yearly variations differ among the three communities. We therefore examine the yearly trends of keyword frequencies in each community separately. Because the number of articles in the Development stage is quite limited, we focus on the yearly trends from 2006 to 2021.

Figure 4a shows the yearly percentage of keyword frequencies in the three communities. The SIP community accounts for the largest share in the Introduction stage but gradually declines over the years. The MIP community consistently maintains around 25% of the annual keyword frequency. In contrast, the NA community has the smallest share in the Introduction stage but gradually grows, overtaking the MIP community in 2013 and the SIP community in 2019. We therefore expect research in the NA community to keep growing relative to the other two communities.
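The yearly shares and crossover years can be computed as sketched below. The input layout ({community: {year: count}}) and the counts are made up for illustration and are not the study's data; "overtaking" is read as the first year with a strictly larger share.

```python
def yearly_shares(freq):
    """Convert {community: {year: count}} into {year: {community: percent of that year}}."""
    years = sorted({y for series in freq.values() for y in series})
    shares = {}
    for y in years:
        total = sum(series.get(y, 0) for series in freq.values())
        if total == 0:
            continue  # no keywords recorded that year
        shares[y] = {c: 100.0 * series.get(y, 0) / total for c, series in freq.items()}
    return shares

def first_year_exceeding(shares, rising, other):
    """First year in which community `rising` has a strictly larger share than `other`."""
    for y in sorted(shares):
        if shares[y][rising] > shares[y][other]:
            return y
    return None

# Made-up counts for illustration, not the study's data.
freq = {"SIP": {2012: 60, 2013: 45, 2014: 35},
        "MIP": {2012: 25, 2013: 25, 2014: 25},
        "NA":  {2012: 15, 2013: 30, 2014: 40}}
shares = yearly_shares(freq)
na_over_mip = first_year_exceeding(shares, "NA", "MIP")
na_over_sip = first_year_exceeding(shares, "NA", "SIP")
```

With these toy numbers, NA overtakes MIP in 2013 and SIP in 2014.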

To understand why the three communities follow different yearly trends, we examine how the percentages of individual keywords vary within each community. Figures 4b, 4c, and 4d show the change in keyword frequency from the Introduction stage to the Maturity stage in the SIP, MIP, and NA communities, respectively. Keywords in the left (right) column are those whose frequency decreased (increased) from the Introduction stage to the Maturity stage.
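One way to compute such stage-level changes is sketched below, assuming that "percentage change" means the difference, in percentage points, between a keyword's share of its community's keyword frequency in the two stages. This reading, the stage year ranges, and the counts are all assumptions for illustration.

```python
def stage_shares(freq, years):
    """Percent share of each keyword within one community, pooled over `years`."""
    totals = {kw: sum(series.get(y, 0) for y in years) for kw, series in freq.items()}
    grand = sum(totals.values())
    if grand == 0:
        return {kw: 0.0 for kw in totals}  # no data in this stage
    return {kw: 100.0 * t / grand for kw, t in totals.items()}

def share_change(freq, stage_from, stage_to):
    """Change in percentage points of each keyword's share between two stages."""
    before = stage_shares(freq, stage_from)
    after = stage_shares(freq, stage_to)
    return {kw: after[kw] - before[kw] for kw in freq}

# Placeholder stage ranges and made-up counts for one community.
introduction = range(2006, 2010)
maturity = range(2018, 2022)
mip = {"metal oxide": {2007: 30, 2020: 10},
       "nonvolatile": {2008: 50, 2019: 30},
       "hybrid perovskite": {2009: 20, 2021: 60}}
change = share_change(mip, introduction, maturity)
decreased = sorted(kw for kw, d in change.items() if d < 0)  # left column
increased = sorted(kw for kw, d in change.items() if d > 0)  # right column
```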

In the SIP community, the frequencies of Resistive switching and Thin film decreased greatly (Fig. 4b), and other keywords related to conventional device structures also decreased. By contrast, no keyword in the SIP community increased significantly in frequency. This indicates that studies revealing the effect of device structure on resistive switching were conducted mainly in the Introduction stage (see Supplementary Table S3 online)31,32,37,40. As a result, the SIP community shows a gradual decrease in frequency over the years.

In the MIP community, the frequencies of Metal oxide and Nonvolatile decreased significantly (Fig. 4c), whereas keywords such as Device, Graphene, Organic, and Hybrid perovskite appeared and increased greatly in frequency. This change reflects a shift in ReRAM research from using conventional metal oxides as active layers to finding superior active-layer materials (see Supplementary Table S3 online)33,34,38,39,40. Consequently, the research share of the MIP community remains roughly constant as new active-layer materials are developed.

In the NA community, the frequencies of Neuromorphic computing and Neural network increased significantly (Fig. 4d). The growth of the NA community correlates with the recent surge of interest in deep learning systems. Because ReRAM is considered a core element of neuromorphic computing, many researchers focus on optimizing ReRAM performance for neural network tasks (see Supplementary Table S3 online)33,35,36,39,40. As a result, the share of the NA community grows continuously, and a significant portion of ReRAM research now involves keywords in the NA community.

Fig. 5

Publication percentages of the top 5 journals in the Introduction stage and the Maturity stage.

Finally, we list the top 5 journals with the most publications in the Introduction and Maturity stages (Fig. 5). In the Introduction stage, the top 5 journals account for 36% of publications, and Appl. Phys. Lett. is the leading journal, with nearly 20% of all publications. This suggests that ReRAM research in the Introduction stage focused on elucidating the physics of ReRAM operation. In the Maturity stage, however, ReRAM research is so diversified that no single journal accounts for a significant portion of publications. Instead, the journals show similar percentages, so the top 5 journals account for only 11% of all publications. In addition, the proportion of device-related journals increases in the Maturity stage, which suggests that research has become more focused on the device level as the NA community has grown. Thus, analyzing changes in publication patterns helps researchers anticipate changes in research trends, emerging research fields, and popular journals.
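The top-5 share for a stage reduces to a frequency count over the journal names of that stage's articles. The sketch below uses placeholder journal names and made-up counts; it assumes one journal name per collected article.

```python
from collections import Counter

def top_journal_share(journals, k=5):
    """Return the k most frequent journals and their combined share (%) of all publications."""
    counts = Counter(journals)
    top = counts.most_common(k)
    share = 100.0 * sum(c for _, c in top) / len(journals)
    return top, share

# Placeholder journal names with made-up counts (one entry per article).
journals = (["Journal A"] * 4 + ["Journal B"] * 2 + ["Journal C"] * 2
            + ["Journal D", "Journal E", "Journal F", "Journal G"])
top, share = top_journal_share(journals)
```

Here the top 5 journals hold 10 of 12 articles, about 83%; a more diversified stage spreads articles over many journals and lowers this share.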

Conclusion

In this study, we developed a keyword-based research trend analysis method that systematically classifies a specific research field using bibliographic data. Our method automatically collects articles through APIs, extracts keywords from article titles using a RoBERTa-based NLP pipeline, and classifies the keywords into communities using a keyword co-occurrence network. We applied this method to ReRAM research both to validate its reliability and to understand changes in ReRAM research trends. Our study shows that ReRAM research can be structured into three research communities based on PSPP + M classification: SIP, MIP, and NA. Further analysis of the yearly variation in research keywords shows that ReRAM research is in the Maturity stage and that the NA community is on an uptrend owing to growing interest in using ReRAM for neural network systems. Our results are consistent with recent review papers on ReRAM research trends31,32,33,34,35,36,37,38,39,40.

Various tools already provide information on research fields by analyzing published journal records. However, most existing tools cannot offer detailed analyses that track variations in research trends and predict future research directions by classifying journal articles. Our method efficiently tracks detailed changes in research trends and predicts future research directions by automatically classifying and categorizing research keywords collected from journal articles. We believe that this method can be applied to various research fields and can provide valuable insights into the current state of research and its future directions.