Fig. 5: A schematic showing the application of the skip-gram variation of Word2vec for predicting context words.

From: Recent advances and applications of deep learning methods in materials science

a Network for training word embeddings for natural language processing applications. The one-hot encoded vector at left represents each distinct word in the corpus; the role of the hidden layer is to predict the probability of neighboring words in the corpus. This network structure trains a relatively small hidden layer of 100–200 neurons to contain information on the context of words in the entire corpus, with the result that similar words end up with similar hidden-layer weights (word embeddings). Such word embeddings can transform words in text form into numerical vectors that may be useful for a variety of applications. b Projection of word embeddings for various materials science words, as trained on a corpus of scientific abstracts, into two dimensions using principal component analysis. Without any explicit training, the word embeddings naturally preserve relationships between chemical formulas, their common oxides, and their ground-state structures. [Reprinted according to the terms of the CC-BY license ref. 259].
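As a rough illustration of the pipeline the caption describes, the sketch below trains skip-gram word embeddings on a toy corpus and projects a few of them into two dimensions with PCA, mirroring panels a and b. This is a minimal sketch, not the authors' code: the toy corpus, the chosen words, and the use of the gensim and scikit-learn libraries are assumptions for illustration (the original work trained on a large corpus of materials science abstracts).

```python
# Minimal sketch: skip-gram Word2vec embeddings plus a 2D PCA projection.
# Assumes gensim and scikit-learn; the corpus below is invented.
from gensim.models import Word2Vec
from sklearn.decomposition import PCA

# Toy stand-in for a corpus of tokenized scientific abstracts.
corpus = [
    ["LiCoO2", "is", "a", "layered", "cathode", "material"],
    ["Fe2O3", "is", "a", "common", "iron", "oxide"],
    ["the", "ground", "state", "structure", "of", "Fe", "is", "bcc"],
]

# sg=1 selects the skip-gram variant; vector_size corresponds to the
# 100-200-neuron hidden layer described in the caption.
model = Word2Vec(corpus, vector_size=100, window=5, sg=1, min_count=1)

# Project a few embeddings into two dimensions, as in panel b.
words = ["LiCoO2", "Fe2O3", "Fe"]
coords = PCA(n_components=2).fit_transform([model.wv[w] for w in words])
for word, (x, y) in zip(words, coords):
    print(f"{word}: ({x:.3f}, {y:.3f})")
```

With a realistically large corpus, nearby points in such a projection would reflect chemical relationships (e.g., a metal, its common oxide, and its ground-state structure), as shown in panel b.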
