Fig. 2: Conceptual comparison of spectral embedding generation strategies.

(A) Schematic representation of the Spec2Vec method. A Word2vec model is trained using relationships derived from a mass spectral (MS) library to learn the spectral embeddings. (B) Schematic representation of the proposed LLM4MS approach. A large language model, pre-trained on extensive and diverse knowledge domains, undergoes fine-tuning using the MS library. This process leverages the LLM's embedded knowledge to generate chemically informed spectral embeddings for improved mass spectral matching.