A universal language for finding mass spectrometry data patterns

Damiani, Tito; Jarmusch, Alan K.; Aron, Allegra T.; Petras, Daniel; Phelan, Vanessa V.; Zhao, Haoqi Nina; Bittremieux, Wout; Acharya, Deepa D.; Ahmed, Mohammed M. A.; Bauermeister, Anelize; Bertin, Matthew J.; Boudreau, Paul D.; Borges, Ricardo M.; Bowen, Benjamin P.; Brown, Christopher J.; Chagas, Fernanda O.; Clevenger, Kenneth D.; Correia, Mario S. P.; Crandall, William J.; Crüsemann, Max; Fahy, Eoin; Fiehn, Oliver; Garg, Neha; Gerwick, William H.; Gilbert, Jeffrey R.; Globisch, Daniel; Gomes, Paulo Wender P.; Heuckeroth, Steffen; James, C. Andrew; Jarmusch, Scott A.; Kakhkhorov, Sarvar A.; Kang, Kyo Bin; Kessler, Nikolas; Kersten, Roland D.; Kim, Hyunwoo; Kirk, Riley D.; Kohlbacher, Oliver; Kontou, Eftychia E.; Liu, Ken; Lizama-Chamu, Itzel; Luu, Gordon T.; Luzzatto Knaan, Tal; Mannochio-Russo, Helena; Marty, Michael T.; Matsuzawa, Yuki; McAvoy, Andrew C.; McCall, Laura-Isobel; Mohamed, Osama G.; Nahor, Omri; Neuweger, Heiko; Niedermeyer, Timo H. J.; Nishida, Kozo; Northen, Trent R.; Overdahl, Kirsten E.; Rainer, Johannes; Reher, Raphael; Rodriguez, Elys; Sachsenberg, Timo T.; Sanchez, Laura M.; Schmid, Robin; Stevens, Cole; Subramaniam, Shankar; Tian, Zhenyu; Tripathi, Ashootosh; Tsugawa, Hiroshi; van der Hooft, Justin J. J.; Vicini, Andrea; Walter, Axel; Weber, Tilmann; Xiong, Quanbo; Xu, Tao; Pluskal, Tomáš; Dorrestein, Pieter C.; Wang, Mingxun

doi:10.1038/s41592-025-02660-z

Download PDF

Article
Open access
Published: 12 May 2025

A universal language for finding mass spectrometry data patterns

Nature Methods volume 22, pages 1247–1254 (2025)Cite this article

12k Accesses
16 Citations
125 Altmetric
Metrics details

Subjects

An Author Correction to this article was published on 08 August 2025

This article has been updated

Abstract

Despite being information rich, the vast majority of untargeted mass spectrometry data are underutilized; most analytes are not used for downstream interpretation or reanalysis after publication. The inability to dive into these rich raw mass spectrometry datasets is due to the limited flexibility and scalability of existing software tools. Here we introduce a new language, the Mass Spectrometry Query Language (MassQL), and an accompanying software ecosystem that addresses these issues by enabling the community to directly query mass spectrometry data with an expressive set of user-defined mass spectrometry patterns. Illustrated by real-world examples, MassQL provides a data-driven definition of chemical diversity by enabling the reanalysis of all public untargeted metabolomics data, empowering scientists across many disciplines to make new discoveries. MassQL has been widely implemented in multiple open-source and commercial mass spectrometry analysis tools, which enhances the ability, interoperability and reproducibility of mining of mass spectrometry data for the research community.

MS2Query: reliable and scalable MS² mass spectra-based analogue search

Article Open access 29 March 2023

MSFC: a new feature construction method for accurate diagnosis of mass spectrometry data

Article Open access 21 September 2023

Trackable and scalable LC-MS metabolomics data processing using asari

Article Open access 11 July 2023

Main

Innovation in mass spectrometry (MS) has enabled tremendous progress in life sciences, and advances in MS instrumentation have led to the widespread adoption of omics disciplines (for example, metabolomics, lipidomics and proteomics). Despite the broad application of MS to characterize proteins, peptides, polymers, small molecules and nucleic acids across research disciplines, the ability for scientists to flexibly search for known chemical classes within and across MS datasets remains a challenge. The interrogation of MS data for the presence of specific chemicals or classes of molecules utilizes patterns in MS peaks representing intact analytes¹ isotopic signatures or characteristic mass differences (MS1), associated fragmentation patterns in tandem MS data (MS/MS), chromatographic retention time, collisional cross-section or combinations thereof. This search for specific patterns is usually performed through a slow and error-prone manual inspection of the data. Alternatively, specialized software tools have been developed for this purpose but are often limited to search for a specific compound¹ or a limited set of class-specific MS patterns². Although bespoke one-off scripts provide the necessary flexibility to search for specific MS data patterns³, most noncomputational researchers and laboratories lack the computational skills to develop or customize them^4,5. This skill gap limits biologists and chemists from effectively searching across MS datasets, potentially leaving many biologically important molecules hidden and undiscovered in the data. To address this gap, here we introduce the Mass Spectrometry Query Language (MassQL), an open-source language for flexible and mass spectrometer manufacturer-independent searching. MassQL aims to enable noncomputational researchers to easily search their MS (across MS1 and MS/MS) data for patterns of interest without the need for programming skills or a dedicated computational collaborator. In this Article, we describe the MassQL language, showcase its accompanying computational ecosystem to increase accessibility and highlight two application examples that demonstrate how MassQL can be used on entire public repositories (such as Global Natural Products Social Molecular Networking (GNPS)/MassIVE⁶, Metabolomics Workbench⁷ and MetaboLights⁸).

Results

The versatility of MS to capture unique characteristics of chemical structures, such as isotopic patterns (for example, bromination), diagnostic fragmentation (for example, product ion of sulfur trioxide) and neutral losses (for example, loss of sugar moieties), makes it a powerful analytical tool but also presents challenges to effectively interpret and utilize the richness of the data. Specifically, the ability to simultaneously utilize some or all of these different dimensions of MS data in an integrated fashion is currently out of reach. To complement the versatility of MS data, the MassQL language implements a succinct and expressive grammar to search for chemically and biologically relevant molecules in the MS data by leveraging these patterns (Fig. 1a,b). The MassQL language enables searching for patterns in MS1 data (for example, isotopic patterns and adduct mass shift) and MS/MS fragmentation spectra (for example, presence/absence of fragments and neutral losses), as well as applying chromatographic and ion mobility constraints. In addition, MassQL provides language support for user-defined tolerances, such as ion intensity and mass accuracy (Fig. 1b). Moreover, each of these query elements can be combined with Boolean operators (for example, AND, OR and NOT) to form more complex queries. These properties and patterns are common to nearly all MS data types, thus making MassQL agnostic to the instrument vendor, mass detector (for example, Orbitrap and quadrupole time-of-flight), ionization source (for example, electrospray ionization and matrix assisted laser desorption/ionization) and separation method (for example, liquid chromatography, gas chromatography and ion mobility). Together, the MassQL language provides users the flexibility and expressiveness to query simple and complex MS patterns regardless of their computational expertise, thus lowering the barriers of entry to MS data interrogation. Finally, as a language, new MassQL terms can be defined, which enables grammar and syntax evolution to maintain compatibility of queries to advancing MS technologies.

**Fig. 1: Schematic representation of the MassQL ecosystem.**

The MassQL computational ecosystem is made of several key components that ensure its extensibility and usability for the community. These components are a formal definition of the MassQL grammar, a MassQL language parser, a reference implementation of the MassQL query engine, interactive web interfaces to enhance accessibility for noncomputational users and a NextFlow computational workflow for parallelized querying of very large datasets (Methods). Together, this enables MassQL searches on a single MS data file, within and across whole MS datasets, up to entire data repositories, including GNPS/MassIVE⁶, Metabolomics Workbench⁷ and MetaboLights⁸.

The MassQL formal grammar builds upon common MS terminology (Methods), which makes MassQL queries easy to write and alter for scientists with basic familiarity with MS. To further help new users, we have created extensive documentation (https://mwang87.github.io/MassQueryLanguage_Documentation/), instructional videos (https://www.youtube.com/playlist?list=PLkDps_-pcYZ5D3rhas208dsMg66lCGmcs) and an interactive MassQL sandbox (https://massql.gnps2.org/). The sandbox enables users to interactively develop and test MassQL queries on demonstration data to check for queries’ correctness before applying to their own data. Moreover, the sandbox automatically translates each query into English, Portuguese, Spanish, German, French, Mandarin Chinese, Japanese, Korean and Russian, which helps the interpretability of queries in manuscripts and grants. In addition, we have developed and deployed a large-language-model-powered conversational assistant (https://massql-analysis.gnps2.org/MassQL_Chatbot) to support new users with real-time MassQL query writing and troubleshooting. Finally, as a community effort, we created a wiki-like compendium of 35 applications of MassQL (https://massql.gnps2.org/compendium/; Fig. 1d) that serves as a reference and inspiration for new users. Queries in the compendium can be (re)used as they are, or as a starting point to develop queries tailored to the chemicals or compound classes of interest. The compendium is regularly updated with new examples of MassQL queries successfully used by community members in published works and will function as an ‘app store’ for a centralized MassQL query deposition and sharing.

While MassQL was originally implemented within the GNPS environment^6,9, this was limiting for the accessibility to a broader audience. To further enhance usability, we have engaged with the wider metabolomics software community and, through these efforts, the MassQL language has been adopted and natively supported in a variety of MS data analysis software and infrastructure (Fig. 1c), both open-source (MZmine¹⁰, pyOpenMS¹¹, MS-DIAL¹², UniDec¹³ and Metabolomics Workbench⁷) and commercial (Bruker’s MetaboScape). It must be noted that these software tools use the same MassQL language grammar but implement their own MassQL query engine backend. This provides the possibility to develop optimized query engines for improved query performance, while maintaining query semantic consistency across tools. Finally, to facilitate the integration of MassQL into other platforms and pipelines, MassQL is available as Python and R (ref. ¹⁴) libraries and as a web application programming interface (API).

Discovery of siderophores at repository scale

Iron-binding small molecules play essential roles across biology, including microbial or mammalian siderophores that facilitate iron homeostasis^15,16. We recently developed a native metabolomics method that includes infusion of iron to identify iron-binding compounds from complex samples based on the identification of retention time and peak shape correlations along with an Fe³⁺-characteristic m/z delta of 52.91 (ref. ¹⁷). As a complementary strategy to our native metabolomics method, which requires hardware modification in instrument setup and data acquisition, we mined all existing public metabolomics data, encompassing over 230 million analytes, to discover putative iron-binding compounds using MassQL. Although iron is often stripped from iron-binding compounds in liquid chromatography, we hypothesized that some iron-binding compounds will remain bound to iron at detectable levels in the MS data.

To develop the MassQL query, we used an MS dataset collected from Eutypa lata supernatant extracts that were treated with a post-liquid chromatography iron addition, as described by Aron et al.¹⁷ (Supplementary Notes 3.3.1 and 3.3.2). The refined MassQL query searches MS1 spectra for the characteristic isotopic pattern of iron along with a distinctive iron-binding mass shift (Fig. 2a). Specifically, the MassQL query searches for MS1 precursor ions with an m/z of x, m/z of x − 1.993 at 0.063% intensity of x (the stable isotope ratio of ⁵⁴Fe to ⁵⁶Fe), and m/z of x + 1.0034 (the ¹³C peak), in addition to a proton-bound adduct (apo) peak at m/z of x − 52.91. Combined MassQL queries for apo and bound MS2 spectra identified seven out of the eight putative siderophores identified using ion-identity molecular networking (IIMN)¹⁸ in the published analysis of the post-liquid chromatography iron addition of E. lata extracts. The unique compound that was found using IIMN but not by the MassQL query was missed because the ⁵⁴Fe peak intensity fell outside of the expected intensity tolerance of 25%, which is probably due to the low intensity of this peak. We used strict m/z (10 ppm) and expected intensity percentage (25%) tolerances to minimize false-positive retrieval. Using MassQL, an additional four molecules were found that were not found using IIMN (Supplementary Fig. 3.3.2-1); these molecules are probably iron-binding, as manual inspection revealed that they exhibit the expected iron-bound isotopic pattern. The published IIMN analysis may have missed these molecules owing to requirements for peak shape and retention time correlations or owing to low-intensity peaks falling below feature finding thresholds.

**Fig. 2: Insights from siderophores at a repository scale.**

After validation of the siderophore query on the E. lata dataset, we extended the search for iron-binding compounds to all public high-resolution Thermo Fisher Q Exactive data available in the GNPS/MassIVE repository⁶ (Supplementary Note 3.3.3). In searching over 230 million MS/MS spectra in 97,109 public data files, we retrieved 26,944 MS/MS spectra associated with the iron-characteristic isotope pattern in their MS1 data. We used MS-Cluster¹⁹ on the retrieved MS/MS spectra to collapse redundant observations of the candidate iron-binding molecules. This resulted in 7,504 consensus MS/MS spectra. Using these consensus spectra, we created a molecular network in GNPS. We could putatively identify 441 (5%) of the consensus spectra by spectral library search against the public MS/MS libraries in GNPS⁶. The putatively identified known compounds were further filtered to remove duplicates and adducts of ethylenediaminetetraacetic acid (a common anticoagulant). After filtering, 52% of annotated spectra are known iron binders, while an additional 25% are lipids, bile acids, polyphenols and peptide/amino acids—classes that have been shown to bind iron and other divalent metals^20,21,22,23. Notably, because such a large fraction (>95%) of the analytes in the molecular network could not be annotated to known molecules (Fig. 2d), this molecular network is probably a rich resource for the discovery of new siderophores.

OPEs in the environments

Organophosphate esters (OPEs) are widely used flame retardants and plasticizers and are ubiquitously detected in environmental samples^24,25,26. Recent studies in environmental chemistry have reported several novel OPEs by searching for a conserved, unique fragmentation pattern (O = P(OR)₃) in MS/MS data^27,28,29. The characteristic fragment, frequently leveraged in literature, is the phosphate product ion (H₄O₄P⁺, m/z 98.9842). The traditional low-throughput methodologies, which includes manual analysis and targeted lists of known OPEs, limits the ability to discover novel potentially harmful chemicals. Here, we demonstrate how MassQL can enable high-throughput screening and discovery of novel OPE without a predefined suspect list.

Utilizing the characteristic phosphate product ion, we formulated a MassQL query to search for an MS/MS peak at m/z 98.9847 with 50 ppm mass error tolerance and a peak intensity >50% of the base peak (Supplementary Notes 3.2.3 and 3.2.4). Similarly to siderophores example, a MassQL query was developed on a test marine water dataset to verify the utility of the MassQL for identifying putative OPEs in complex samples (Supplementary Note 3.2.3). In this test dataset, where three OPEs were previously identified by manual analysis³⁰, MassQL returned 589 MS/MS spectra belonging to ~60 unique molecular features. MS/MS library search against the GNPS spectral library of the MassQL retrieved MS/MS spectra putatively annotated four OPE molecules, including all three previously identified OPEs, and one new putatively identified OPE. We additionally putatively identified two non-OPE molecules. The first molecule contained a phosphate group that resulted in the characteristic phosphate fragment in the MassQL query. The second putative non-OPE molecule did not contain a phosphate group, but further investigation revealed a potentially false-positive library match due to large mass errors in the library MS/MS peaks. These results suggested that the MassQL query was useful in retrieving OPEs but may also capture a broader range of phosphate-containing compounds. For this reason, when scaling up to repository searches (see below), we leverage the ability of molecular networking to group together similar structures and segregate OPEs from phosphate-containing compounds more generally.

To identify OPEs in public data, we scaled the MassQL query to all Q Exactive data in the GNPS/MassIVE⁶ data repository (which included >230 million MS/MS spectra). The MassQL query found 338,439 MS/MS matching the query criteria. Only 15% (51,310) of the MS/MS found by MassQL could be explained (precursor m/z match with 20 ppm mass error) by known OPEs based on a comprehensive OPEs list (n = 95) compiled by Ye et al.²⁹. We extracted all MS/MS spectra and created consensus MS/MS spectra using Falcon-MS³⁰, resulting in 2,777 consensus spectra. We used these consensus spectra to create a molecular network. Combining the library annotation results, we propose one additional OPE that was not included in the GNPS library and the comprehensive OPE list by Ye et al.²⁹ (Fig. 3). It is important to reemphasize that, in the search for OPEs, the MassQL query was not designed to specifically look for OPEs but phosphate-containing molecules more generally. The molecular networking strategy complemented the MassQL results to organize OPE molecules in their own families. This combination of MassQL and molecular networking was critical because MassQL greatly reduced the data size (from 230 million to ~338,000 MS/MS, making molecular networking computationally tractable), while molecular networking helped to focus attention on families of specifically organophosphate molecules.

**Fig. 3: Discovering OPEs at a repository scale.**

False discovery rate estimation and query validation strategies

Overall, the key challenge when using MassQL is to define queries that are sensitive toward the target compound(s) (that is, effectively retrieve the desired spectra) but do not retrieve too many false-positive hits. Due to the flexibility and broad applicability of MassQL, a universal method for false discovery rate (FDR) estimation is difficult to establish. Rather, tailored strategies for query validation can be designed case by case by the user, depending on the research context.

As demonstrated in a recent publication that used MassQL to mine liquid chromatography–MS/MS raw data in the public domain and discover new, unreported bile acids³¹, one possible strategy to estimate the FDR of MassQL queries over MS/MS spectra is to putatively identify the MS/MS spectra retrieved by MassQL using reference MS/MS spectral libraries. In their study, Mohanty et al. first used the GNPS spectral libraries (which contained 4,533 reference spectra of bile acids) to design and refine MassQL queries for bile acid spectra and estimate the queries’ selectivity. Such selectivity was measured by counting the number of retrieved bile acids and the number of retrieved non-bile acids (false positives). Thereafter, when using the refined MassQL queries to search repository data, more than 594,000 putative bile acids MS/MS spectra were retrieved. Among these, 270,437 MS/MS spectra were putatively identified by MS/MS library search, with 726 MS/MS matching to non-bile acids (0.27%). It is important to highlight that this FDR estimation approach is limited to the compounds that are deposited in the reference MS/MS libraries.

It should be noted that the same validation approach cannot be universally applied, for example, to the repository-scale discovery of siderophores described in the present manuscript (‘Discovery of siderophores at repository scale’). Siderophore molecules do not belong to a single compound class and can exhibit very diverse chemical structures. Therefore, queries for diagnostic MS/MS fragment ions cannot be used, and the search was instead performed on the MS1 level (‘Discovery of siderophores at repository scale’ section). The adopted strategy was to first develop and refine the query on a reference dataset of E. lata extracts known to contain iron-binding molecules¹⁷. After satisfactory results were obtained on the reference dataset (Supplementary Note 3.3.2), the query was performed on a repository scale. In this specific example, we used a strategy based on a ‘decoy’ query to get a sense for potential false discoveries (Supplementary Note 2), estimated at 21.7% for this repository scale query. Although a relatively large number of false-positive hits may be expected when performing searches at a repository scale, these false positives can be mitigated by utilizing additional computational tools in a downstream analysis (for example, molecular networking and spectral library search) to validate the query results. In these cases, MassQL acts more as a prefilter to reduce the data to a more tractable size (from hundreds of millions of spectra to a few thousands) and ‘enrich’ them with putative leads for further investigation and confirmatory experiments (‘Discovery of siderophores at repository scale’ and ‘OPEs in the environments’ sections). Overall, we encourage users to critically inspect query results and develop tailored validation strategies that are fit for their research purposes.

Discussion

Here, we introduced MassQL, a platform- and manufacturer-independent query language to search for MS data patterns within and across MS datasets. The main goal of MassQL is to provide noncomputational scientists with the flexibility to encode complex MS patterns into concise and expressive queries, reducing the need to write bespoke programming scripts. Together with the accompanying software infrastructure created through community efforts, the MassQL ecosystem lowers the barrier of entry to MS data interrogation for chemists and biologists lacking programming skills. This goal is being realized by the adoption of MassQL by the community. Of note, MassQL has already been used by researchers for MS data mining³², microbiome research³¹, exposomic and biomonitoring³³, infectious disease research³⁴ and natural product discovery^{35,36,37,38,39,40,41,42}.

While traditional MS/MS similarity/search tools are a powerful technique for matching related or similar compounds within libraries or repositories (for example, spectral library search⁴³ and MASST⁴⁴), MassQL provides a complementary set of capabilities with enhanced flexibility and precision. Specifically, traditional MS/MS search tools rely on a full MS/MS similarity measure to retrieve a match, whereas MassQL queries are a set of user-defined constraints to retrieve a spectrum. This provides the flexibility to search for more specific and complex patterns (for example, combine MS1 and MS/MS patterns, retention and drift time constraints; Supplementary Notes 3.3 and 3.4) and empowers scientists to leverage their domain knowledge of the chemical or compound class under investigation. A specific example where MassQL can complement MS/MS similarity is when small structural modifications can result in large changes in the overall fragmentation patterns. This situation can cause relevant analog molecules to evade discovery by MS/MS similarity-based search. MassQL has been shown to complement MS/MS comparison strategies by enabling the searching for conserved key fragments or neutral losses in the MS/MS spectrum without requiring a full MS/MS similarity match (for example, ref. ⁴²).

While MassQL-based querying of small and large datasets can be an effective way to prioritize data, the utility of this querying paradigm can be enhanced when paired with complementary analysis tools (before or after MassQL). First, in this Article, we showcased how discovery can be enhanced by combining upstream MassQL searches with downstream molecular networking and spectral library analysis (see ‘Discovery of siderophores at repository scale’ and ‘OPEs in the environments’ sections in the Results). The use of MassQL as a prefiltering tool was essential in making the analysis possible, both from a computational tractability and data interpretability/prioritization perspective. Second, MassQL has been shown in the literature as a downstream tool to enhance the analysis of molecular networking—specifically, to aid in the prioritization of relevant compounds and, as highlighted above, to overcome shortcomings in MS/MS alignment^{35,37,38,39,41,42}. We do acknowledge limitations with MassQL as available today—specifically, MassQL has limited capabilities to leverage more than a handful of MS spectra, for example, consecutive MS spectra arising from the elution of chromatographic peaks that can be grouped as a chromatographic feature.

We envision that the MassQL computational ecosystem will grow in adoption and capability. We recognize areas for improvement within MassQL, such as enhancing the query performance of MassQL and expanding the expressiveness of the language. We have designed and scaffolded the MassQL ecosystem to be extensible in both of these respects. We have defined a format context-free grammar for MassQL that is separate from any query engine that implements the MassQL semantics (Methods). Even though the reference implementation of the MassQL query engine is not explicitly optimized for speed (Supplementary Note 1), this architecture enables the wider community to develop new query algorithms to improve search speed, while using the same formal grammar. This paradigm has already been demonstrated in the popular software tools that implemented their own query engine: MS-DIAL¹², Mzmine¹⁰ and Bruker’s MetaboScape. Finally, MassQL derives its strength from a vibrant user open-source community, and we expect this community-guided evolution of the language to continue in the future. As both the language and software ecosystem evolve, MassQL will become more capable and versatile to meet the scientific community’s growing needs in mining MS data.

Methods

MassQL language description

MassQL condition description

The following table gives a short summary of the most common ways that we can set a condition to find the data we want. Each of the following conditions may have more than one way to qualify each condition, to modulate the tolerance, intensity and so on to help improve the specificity a user desires. Further details can be found in the official documentation at https://mwang87.github.io/MassQueryLanguage_Documentation/.

Data type	MassQL syntax	Example
MS1 peak m/z	MS1MZ=<m/z value>	MS1MZ=163.1
MS2 precursor m/z	MS2PREC=<m/z>	MS2PREC=488.1
MS2 precursor charge	CHARGE=<Value>	CHARGE=2
Fragmentation product ion m/z	MS2PROD=<m/z>	MS2PROD=163.1
Ionization polarity	POLARITY=<Value>	POLARITY=Positive
Retention time (minimum)	RTMIN=<Value in Minutes>	RTMIN=5
Retention time (maximum)	RTMAX=<Value in Minutes>	RTMAX=10
Scan number (minimum)	SCANMIN=<Value>	SCANMIN=5
Scan number (maximum	SCANMAX=<Value>	SCANMAX=5
Ion mobility	MOBILITY=range (min=<min>, max=<max>)	MOBILITY=range (min=1, max=2)

MassQL qualifier description

The following gives a short summary of the most common qualifiers for conditions that users can specify in MassQL. Further details can be found in the official documentation at https://mwang87.github.io/MassQueryLanguage_Documentation/.

Qualifier type	Condition	Example
m/z tolerance	MS2PROD, MS1MZ, MS2PREC	MS2PROD=163.1:TOLERANCEMZ=0.1
m/z ppm tolerance	MS2PROD, MS1MZ, MS2PREC	MS2PROD=163.1:TOLERANCEPPM=50
Peak intensity minimum	MS2PROD, MS1MZ	MS2PROD=163.1:INTENSITYVALUE=1000
Peak intensity minimum percent of base peak	MS2PROD, MS1MZ	MS2PROD=163.1:INTENSITYPERCENT=10
Peak intensity minimum percent of TIC	MS2PROD, MS1MZ	MS2PROD=163.1:INTENSITYTICPERCENT=10
Mass defect of peak	MS2PROD, MS1MZ, MS2PREC	MS2PROD=ANY:MASSDEFECT=massdefect (min=0.1, max=0.2)
Exclusion of condition	MS2PROD, MS1MZ, MS2PREC	MS2PROD=163.1:EXCLUDED

MassQL reference implementation

The reference implementation is a fully working version of the MassQL software ecosystem for the community to use. It also serves as a guide for future MassQL implementations that may optimize speed and/or introduce new functions in other systems. Specifically, the reference implementation includes the following pieces:

MassQL formal grammar. The grammar is defined using the extended backus-naur form and builds upon common MS terminology for improved expressiveness and interpretability (see ‘MassQL language description’ section). During the development, community input and feedback shaped the vocabulary and capabilities of the language.
MassQL parser. The parser transforms a query into an internal data structure that can be used by any programming language. The parsing is done by using the lark Python library (https://github.com/lark-parser/lark) and specific Python code to transform a MassQL query to a parse tree and into the internal data structure that organizes all query conditions and qualifiers.
MassQL query engine. The MassQL reference query engine is written in Python and utilizes pyteomics⁴⁵ to read open MS data files from mzML, mzXML and MGF formats into data frames. Such MS spectra in data frame format can optionally be saved as Apache feather files to cache data for repeated querying. The query engine itself processes the query over these data frames using the Python pandas library to perform data filtering and manipulations. Output results are data frames that can be exported as a tabular format. Optionally, the retrieved MS spectra can be exported in JSON format, MGF and mzML⁴⁶.

The MassQL reference NextFlow⁴⁷ workflow is designed as an automated high-throughput tool for querying multiple files simultaneously on a computational cluster. This workflow utilizes the reference query engine and parallelizes the querying of multiple files across a multicore processor or a batch cluster, depending on the compute resources available. All results are then merged together, including extracted MS spectra in JSON format, MGF and mzML format, if desired.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All data used in this paper are deposited at MassIVE (massive.ucsd.edu). The relevant dataset accessions are provided together with the relevant description in the Supplementary Notes.

Code availability

Reference Engine Implementation (Python), language formal grammar, GNPS Workflow, NextFlow Workflow and interactive web interface are available via GitHub at https://github.com/mwang87/MassQueryLanguage and Zenodo at https://doi.org/10.5281/zenodo.14419767 (ref. ⁴⁸). MassQL is also available via Python API (https://pypi.org/project/massql/), R API (https://github.com/rformassspectrometry/SpectraQL), mzmine (https://github.com/mzmine/mzmine), OpenMS (https://pyopenms.readthedocs.io/en/latest/massql.html), MS-DIAL 5 (http://prime.psc.riken.jp/compms/index.html) and UniDec (https://github.com/michaelmarty/UniDec). The official online documentation for MassQL is available at https://mwang87.github.io/MassQueryLanguage_Documentation/.

Change history

08 August 2025
A Correction to this paper has been published: https://doi.org/10.1038/s41592-025-02785-1

References

Stein, S. E. & Scott, D. R. Optimization and testing of mass spectral library search algorithms for compound identification. J. Am. Soc. Mass. Spectrom. 5, 859–866 (1994).
Article CAS PubMed Google Scholar
Baars, O., Morel, F. M. M. & Perlman, D. H. ChelomEx: isotope-assisted discovery of metal chelates in complex media using high-resolution LC–MS. Anal. Chem. 86, 11298–11305 (2014).
Article CAS PubMed Google Scholar
Huber, F. et al. matchms—processing and similarity evaluation of mass spectrometry data. J. Open Source Softw. 5, 2411 (2020).
Article Google Scholar
Chang, H.-Y. et al. A practical guide to metabolomics software development. Anal. Chem. 93, 1912–1923 (2021).
Article CAS PubMed PubMed Central Google Scholar
Matsuda, F. Regular expressions of MS/MS spectra for partial annotation of metabolite features. Metabolomics 12, 113 (2016).
Article Google Scholar
Wang, M. et al. Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking. Nat. Biotechnol. 34, 828–837 (2016).
Article CAS PubMed PubMed Central Google Scholar
Sud, M. et al. Metabolomics Workbench: an international repository for metabolomics data and metadata, metabolite standards, protocols, tutorials and training, and analysis tools. Nucleic Acids Res. 44, D463–D470 (2016).
Article CAS PubMed Google Scholar
Haug, K. et al. MetaboLights: a resource evolving in response to the needs of its scientific community. Nucleic Acids Res. 48, D440–D444 (2020).
CAS PubMed Google Scholar
Petras, D. et al. GNPS Dashboard: collaborative exploration of mass spectrometry data in the web browser. Nat. Methods 19, 134–136 (2022).
Article CAS PubMed PubMed Central Google Scholar
Schmid, R. et al. Integrative analysis of multimodal mass spectrometry data in MZmine 3. Nat. Biotechnol. 41, 447–449 (2023).
Article CAS PubMed PubMed Central Google Scholar
Pfeuffer, J. et al. OpenMS 3 enables reproducible analysis of large-scale mass spectrometry data. Nat. Methods 21, 365–367 (2024).
Article CAS PubMed Google Scholar
Tsugawa, H. et al. MS-DIAL: data-independent MS/MS deconvolution for comprehensive metabolome analysis. Nat. Methods 12, 523–526 (2015).
Article CAS PubMed PubMed Central Google Scholar
Kostelic, M. M., & Marty, M. T. Deconvolving native and intact protein mass spectra with UniDec. Methods Mol. Biol. https://doi.org/10.1007/978-1-0716-2325-1_12 (2022).
Rainer, J. et al. A modular and expandable ecosystem for metabolomics data annotation in R. Metabolites 12, 173 (2022).
Article CAS PubMed PubMed Central Google Scholar
Hider, R. C. & Kong, X. Chemistry and biology of siderophores. Nat. Prod. Rep. 27, 637–657 (2010).
Article CAS PubMed Google Scholar
Sandy, M. & Butler, A. Microbial iron acquisition: marine and terrestrial siderophores. Chem. Rev. 109, 4580–4595 (2009).
Article CAS PubMed PubMed Central Google Scholar
Aron, A. T. et al. Native mass spectrometry-based metabolomics identifies metal-binding compounds. Nat. Chem. 14, 100–109 (2022).
Article CAS PubMed Google Scholar
Schmid, R. et al. Ion identity molecular networking for mass spectrometry-based metabolomics in the GNPS environment. Nat. Commun. 12, 3832 (2021).
Article CAS PubMed PubMed Central Google Scholar
Frank, A. M. et al. Clustering millions of tandem mass spectra. J. Proteome Res. 7, 113–122 (2008).
Article CAS PubMed Google Scholar
Cruz-Huerta, E. et al. Short communication: identification of iron-binding peptides from whey protein hydrolysates using iron (III)-immobilized metal ion affinity chromatography and reversed phase-HPLC-tandem mass spectrometry. J. Dairy Sci. 99, 77–82 (2016).
Article CAS PubMed Google Scholar
Nalini, S. & Balasubramanian, K. A. Studies on iron binding by free fatty acids. Indian J. Biochem. Biophys. 30, 224–228 (1993).
CAS PubMed Google Scholar
Sanyal, A. J., Hirsch, J. I. & Moore, E. W. Premicellar taurocholate avidly binds ferrous (Fe⁺⁺) iron: a potential physiologic role for bile salts in iron absorption. J. Lab. Clin. Med. 116, 76–86 (1990).
CAS PubMed Google Scholar
Tamilmani, P. & Pandey, M. C. Iron binding efficiency of polyphenols: comparison of effect of ascorbic acid and ethylenediaminetetraacetic acid on catechol and galloyl groups. Food Chem. 197, 1275–1279 (2016).
Article CAS PubMed Google Scholar
Reemtsma, T., Quintana, J. B., Rodil, R., García-López, M. & Rodríguez, I. Organophosphorus flame retardants and plasticizers in water and air I. Occurrence and fate. Trends Anal. Chem. 27, 727–737 (2008).
Article CAS Google Scholar
van der Veen, I. & de Boer, J. Phosphorus flame retardants: properties, production, environmental occurrence, toxicity and analysis. Chemosphere 88, 1119–1153 (2012).
Article PubMed Google Scholar
Yao, C., Yang, H. & Li, Y. A review on organophosphate flame retardants in the environment: occurrence, accumulation, metabolism and toxicity. Sci. Total Environ. 795, 148837 (2021).
Article CAS PubMed Google Scholar
Meng, W. et al. Functional group-dependent screening of organophosphate esters (OPEs) and discovery of an abundant OPE bis-(2-ethylhexyl)-phenyl phosphate in indoor dust. Environ. Sci. Technol. 54, 4455–4464 (2020).
Article CAS PubMed Google Scholar
Wang, L., Jia, Y. & Hu, J. Nine alkyl organophosphate triesters newly identified in house dust. Environ. Int. 165, 107333 (2022).
Article CAS PubMed Google Scholar
Ye, L., Meng, W., Huang, J., Li, J. & Su, G. Establishment of a target, suspect, and functional group-dependent screening strategy for organophosphate esters (OPEs): “into the unknown” of OPEs in the sediment of Taihu Lake, China. Environ. Sci. Technol. 55, 5836–5847 (2021).
Article CAS PubMed Google Scholar
Bittremieux, W., Laukens, K., Noble, W. S. & Dorrestein, P. C. Large-scale tandem mass spectrum clustering using fast nearest neighbor searching. Rapid Commun. Mass Spectrom. https://doi.org/10.1002/rcm.9153 (2021).
Mohanty, I. et al. The underappreciated diversity of bile acid modifications. Cell 187, 1801–1818 (2024).
Article CAS PubMed PubMed Central Google Scholar
El Abiead, Y. et al. Heterogeneous multimeric metabolite ion species observed in LC–MS based metabolomics data sets. Anal. Chim. Acta 1229, 340352 (2022).
Article CAS PubMed Google Scholar
Oesterle, I. et al. Exposomic biomonitoring of polyphenols by non-targeted analysis and suspect screening. Anal. Chem. 95, 10686–10694 (2023).
Article CAS PubMed PubMed Central Google Scholar
Liu, Z. et al. Localized cardiac small molecule trajectories and persistent chemical sequelae in experimental Chagas disease. Nat. Commun. 14, 6769 (2023).
Article CAS PubMed PubMed Central Google Scholar
Ahmed, M. M. A., Tripathi, S. K. & Boudreau, P. D. Comparative metabolomic profiling of Cupriavidus necator B-4383 revealed production of cupriachelin siderophores, one with activity against Cryptococcus neoformans. Front. Chem. 11, 1256962 (2023).
Article CAS PubMed PubMed Central Google Scholar
Ahmed, M. M. A. & Boudreau, P. D. LCMS-metabolomic profiling and genome mining of Delftia lacustris DSM 21246 revealed lipophilic delftibactin metallophores. J. Nat. Prod. 87, 1384–1393 (2024).
Article CAS PubMed PubMed Central Google Scholar
Allard, P.-M. et al. Open and reusable annotated mass spectrometry dataset of a chemodiverse collection of 1,600 plant extracts. GigaScience 12, giac124 (2023).
Article PubMed Central Google Scholar
Berger, T. et al. A MassQL-integrated molecular networking approach for the discovery and substructure annotation of bioactive. Cycl. Pept. J. Nat. Prod. 87, 692–704 (2024).
Article CAS Google Scholar
Gaudry, A. et al. A sample-centric and knowledge-driven computational framework for natural products drug discovery. ACS Cent. Sci. 10, 494–510 (2024).
Article CAS PubMed PubMed Central Google Scholar
Leão, T. F. et al. NPOmix: a machine learning classifier to connect mass spectrometry fragmentation data to biosynthetic gene clusters. PNAS Nexus 1, pgac257 (2022).
Article PubMed PubMed Central Google Scholar
Quiros-Guerrero, L.-M. et al. Comprehensive mass spectrometric metabolomic profiling of a chemically diverse collection of plants of the Celastraceae family. Sci. Data 11, 415 (2024).
Article CAS PubMed PubMed Central Google Scholar
Selegato, D. M., Zanatta, A. C., Pilon, A. C., Veloso, J. H. & Castro-Gamboa, I. Application of feature-based molecular networking and MassQL for the MS/MS fragmentation study of depsipeptides. Front. Mol. Biosci. 10, 1238475 (2023).
Article CAS PubMed PubMed Central Google Scholar
Bittremieux, W. et al. Comparison of cosine, modified cosine, and neutral loss based spectrum alignment for discovery of structurally related molecules. J. Am. Soc. Mass. Spectrom. 33, 1733–1744 (2022).
Article CAS PubMed Google Scholar
Wang, M. et al. Mass spectrometry searches using MASST. Nat. Biotechnol. 38, 23–26 (2020).
Article PubMed PubMed Central Google Scholar
Goloborodko, A. A., Levitsky, L. I., Ivanov, M. V. & Gorshkov, M. V. Pyteomics—a Python framework for exploratory data analysis and rapid software prototyping in proteomics. J. Am. Soc. Mass. Spectrom. 24, 301–304 (2013).
Article CAS PubMed Google Scholar
Martens, L. et al. mzML—a community standard for mass spectrometry data. Mol. Cell. Proteom. 10, R110.000133 (2011).
Article Google Scholar
Di Tommaso, P. et al. Nextflow enables reproducible computational workflows. Nat. Biotechnol. 35, 316–319 (2017).
Article PubMed Google Scholar
Wang, M. et al. mwang87/MassQueryLanguage: release 2024.12.12. Zenodo https://doi.org/10.5281/zenodo.14419767 (2024).

Download references

Acknowledgements

We thank A. Leung for initial discussions and guidance on language design. This research was supported in part by the BBSRC-NSF award 2152526 (Dorrestein), the Intramural Research Program of National Institute of Environmental Health Sciences of the NIH (ES103363-01, Jarmusch), the National Institute of General Medical Sciences of the NIH (R01 GM125943, Sanchez; R01 GM107550, Gerwick/Dorrestein; R35GM128690, Phelan), the National Institute of Allergy and Infectious Diseases of the NIH (R21AI156669, McCall; R15AI137996, Stevens), National Science Foundation (2128044, Sanchez; CHE-1845230, Marty), NSF CAREER Award (2047235 Garg), the Burroughs Wellcome Fund (1021280, McCall), Fundação de Amparo à Pesquisa do Estado de São Paulo (2018/24865-4), Fundação de Amparo à Pesquisa do Estado do Rio de Janeiro (E-26/201.260/2021, Borges; E-26/211.314/2019, Chagas), NIH (R35GM146934, Kersten), National Research Foundation of Korea (NRF-2020R1C1C1004046, RS-2022-NR068419, RS-2022-NR070845, and RS-2024-00436674, Kang), NIH U2C-DK119886 and NIH U24-DK141185 (Subramaniam), NIH R01 GM155383 (Fiehn), the German Research Foundation (EXC 2124, Petras), the Swedish Research Council (VR 2020-04707, Globisch), Fund for Financing Science and Innovation Support under the Ministry of Innovative Development of the Republic of Uzbekistan (Kakhkhorov), the National Research Foundation of Korea (NRF-2020R1C1C1004046, Kang; NRF-RS-2023-00211868, Kim), the German Research Foundation (DFG) TRR 261 (project 398967434, Walter), Biological Sciences Initiative at the University of Michigan (Tripathi), Heisenberg program (project 495740318, Crüsemann), the German Ministry for Education and Research de.NBI, BMBF FKZ031 A 534A and EPIC-XS, project number 823839, Federal Ministry of Education and Research in the frame of de.NBI/ELIXIR-DE (W-de.NBI-022), the Ministry of Science, Research and Arts Baden-Württemberg and EPIC-XS, project number 823839, funded by the Horizon 2020 program of the European Union (Kohlbacher, Sachsenberg), the Czech Science Foundation (GA CR, project number 21-11563M, Pluskal), the European Regional Development Fund (P JAC, project number CZ.02.01.01/00/22_010/0002733, Damiani), US Department of Energy Joint Genome Institute (https://ror.org/04xm1d337; a DOE Office of Science User Facility, is supported by the Office of Science of the US Department of Energy operated under contract number DE-AC02-05CH11231 (Northen and Bowen) and subcontract number 7601660, Wang), JSPS KAKENHI (21K18216, Tsugawa), the National Cancer Center Research and Development Fund (2020-A-9, Tsugawa), JST ERATO Grant (JPMJER2101, Tsugawa), AMED Japan Program for Infectious Diseases Research and Infrastructure (21wm0325036h0001, Tsugawa), JST NBDC (JPMJND2305, Tsugawa), the Novo Nordisk Foundation, Denmark (NNF20CC0035580, NNF16OC0021746, Weber), Betty and Gordon Moore Foundation (Aron), NIH (R35GM155026-01, Aron), DNRF137 (S. Jarmusch), NIH (5U24DK133658-02, Wang) and NIH (R35GM128690, Phelan) and the ALSAM Foundation (Phelan). M.W. was partially supported by the US Department of Energy Joint Genome Institute (https://ror.org/04xm1d337), a DOE Office of Science User Facility, and is supported by the Office of Science of the US Department of Energy under contract no. DE-AC02-05CH11231.

Author information

These authors contributed equally: Tito Damiani, Alan K. Jarmusch, Allegra T. Aron, Daniel Petras, Vanessa V. Phelan, Haoqi Nina Zhao.

Authors and Affiliations

Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
Tito Damiani, Robin Schmid & Tomáš Pluskal
Metabolomics Core Facility, Immunity, Inflammation, and Disease Laboratory, Division of Intramural Research, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, USA
Alan K. Jarmusch & Kirsten E. Overdahl
Department of Chemistry and Biochemistry, University of Denver, Denver, CO, USA
Allegra T. Aron
Functional Metabolomics Lab, CMFI Cluster of Excellence, University of Tuebingen, Tuebingen, Germany
Daniel Petras
Department of Biochemistry, University of California Riverside, Riverside, CA, USA
Daniel Petras
Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
Vanessa V. Phelan
Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
Haoqi Nina Zhao, Wout Bittremieux, William H. Gerwick, Helena Mannochio-Russo, Robin Schmid & Pieter C. Dorrestein
Biologicals and Natural Products Discovery, Crop Protection R&D, Corteva Agrisciences, Indianapolis, IN, USA
Deepa D. Acharya
BioMolecular Sciences, School of Pharmacy, University of Mississippi, Oxford, MS, USA
Mohammed M. A. Ahmed & Paul D. Boudreau
Pharmacognosy, Faculty of Pharmacy, Al-Azhar University, Nasr City, Egypt
Mohammed M. A. Ahmed
Department of Fundamental Chemistry, Institute of Chemistry, University of São Paulo, São Paulo, Brazil
Anelize Bauermeister
Department of Chemistry, Case Western Reserve University, Cleveland, OH, USA
Matthew J. Bertin
Walter Mors Institute of Research on Natural Products, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
Ricardo M. Borges & Fernanda O. Chagas
Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Lab, Berkeley, CA, USA
Benjamin P. Bowen & Trent R. Northen
The Joint Genome Institute, Lawrence Berkeley National Lab, Berkeley, CA, USA
Benjamin P. Bowen & Trent R. Northen
Mass Spectrometry Center of Expertise, Regulatory and Stewardship, Corteva Agrisciences, Indianapolis, IN, USA
Christopher J. Brown & Jeffrey R. Gilbert
Biologicals and Natural Products, Crop Protection R&D, Corteva Agrisciences, Indianapolis, IN, USA
Kenneth D. Clevenger
Department of Chemistry – BMC, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
Mario S. P. Correia & Daniel Globisch
Clinical Biomarkers Laboratory, School of Medicine, Emory University, Atlanta, GA, USA
William J. Crandall & Ken Liu
Institute of Pharmaceutical Biology, University of Bonn, Bonn, Germany
Max Crüsemann
Institute of Pharmaceutical Biology, Goethe University Frankfurt, Frankfurt, Germany
Max Crüsemann
Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
Eoin Fahy & Shankar Subramaniam
West Coast Metabolomics Center, University of California Davis, Davis, CA, USA
Oliver Fiehn & Elys Rodriguez
School of Chemistry and Biochemistry, Center for Microbial Dynamics and Infection, Georgia Institute of Technology, Atlanta, GA, USA
Neha Garg
Scripps Institution of Oceanography and Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
William H. Gerwick
Faculty of Chemistry, Institute of Exact and Natural Science, Federal University of Para, Belem, Brazil
Paulo Wender P. Gomes
Institute of Inorganic and Analytical Chemistry, University of Münster, Münster, Germany
Steffen Heuckeroth
Center for Urban Waters, University of Washington, Tacoma, WA, USA
C. Andrew James
Department of Biotechnology and Biomedicine, Technical University of Denmark, Kongens Lyngby, Denmark
Scott A. Jarmusch
Laboratory of Physical and Chemical Methods of Research, Center for Advanced Technologies, Tashkent, Uzbekistan
Sarvar A. Kakhkhorov
College of Pharmacy, Sookmyung Women’s University, Seoul, Republic of Korea
Kyo Bin Kang
SW R&D Bioinformatics, Life Science Mass Spectrometry, Bruker Daltonics GmbH & Co. KG, Bremen, Germany
Nikolas Kessler & Heiko Neuweger
Department of Medicinal Chemistry, College of Pharmacy, University of Michigan, Ann Arbor, MI, USA
Roland D. Kersten & Ashootosh Tripathi
College of Pharmacy and Integrated Research Institute for Drug Development, Dongguk University-Seoul, Goyang, Republic of Korea
Hyunwoo Kim
College of Pharmacy, University of Rhode Island, Kingston, RI, USA
Riley D. Kirk
Applied Bioinformatics, Department of Computer Science, University of Tuebingen, University of Tuebingen; Institute for Bioinformatics and Medical Informatics, University of Tuebingen; Institute for Translational Bioinformatics, University Hospital Tuebingen, Tübingen, Germany
Oliver Kohlbacher
The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
Eftychia E. Kontou & Tilmann Weber
Department of Chemistry and Biochemistry, UC Santa Cruz, Santa Cruz, CA, USA
Itzel Lizama-Chamu, Gordon T. Luu & Laura M. Sanchez
Department of Marine Biology, The Leon H. Charney School of Marine Sciences, University of Haifa, Haifa, Israel
Tal Luzzatto Knaan & Omri Nahor
Department of Chemistry and Biochemistry, University of Arizona, Tucson, AZ, USA
Michael T. Marty
Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, Koganei, Japan
Yuki Matsuzawa, Kozo Nishida & Hiroshi Tsugawa
School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA
Andrew C. McAvoy
Department of Chemistry and Biochemistry, San Diego State University, San Diego, CA, USA
Laura-Isobel McCall
Pharmacognosy Department, Faculty of Pharmacy, Cairo University, Cairo, Egypt
Osama G. Mohamed
Natural Products Discovery Core, Life Sciences Institute, University of Michigan, Ann Arbor, MI, USA
Osama G. Mohamed & Ashootosh Tripathi
Institute of Pharmacy, Freie Universität Berlin, Berlin, Germany
Timo H. J. Niedermeyer
Institute for Biomedicine, Eurac Research, Bolzano, Italy
Johannes Rainer & Andrea Vicini
Department of Pharmacy, University of Marburg, Marburg, Germany
Raphael Reher
Applied Bioinformatics, Department of Computer Science, University of Tuebingen, University of Tuebingen, Tübingen, Germany
Timo T. Sachsenberg & Axel Walter
Department of BioMolecular Sciences, School of Pharmacy, University of Mississippi, Oxford, MS, USA
Cole Stevens
Chemistry and Chemical Biology, Northeastern University, Boston, MA, USA
Zhenyu Tian
RIKEN Center for Integrative Medical Sciences, Tsurumi-ku, Japan
Hiroshi Tsugawa
RIKEN Center for Sustainable Resource Science, Tsurumi-ku, Japan
Hiroshi Tsugawa
Bioinformatics Group, Wageningen University & Research, Wageningen, the Netherlands
Justin J. J. van der Hooft
Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa
Justin J. J. van der Hooft
Crop Protection R&D, Corteva Agrisciences, Indianapolis, IN, USA
Quanbo Xiong
Data Science and Bioinformatics, Corteva Agrisciences, Dublin, OH, USA
Tao Xu
Department of Computer Science, University of California Riverside, Riverside, CA, USA
Mingxun Wang

Authors

Tito Damiani
View author publications
Search author on:PubMed Google Scholar
Alan K. Jarmusch
View author publications
Search author on:PubMed Google Scholar
Allegra T. Aron
View author publications
Search author on:PubMed Google Scholar
Daniel Petras
View author publications
Search author on:PubMed Google Scholar
Vanessa V. Phelan
View author publications
Search author on:PubMed Google Scholar
Haoqi Nina Zhao
View author publications
Search author on:PubMed Google Scholar
Wout Bittremieux
View author publications
Search author on:PubMed Google Scholar
Deepa D. Acharya
View author publications
Search author on:PubMed Google Scholar
Mohammed M. A. Ahmed
View author publications
Search author on:PubMed Google Scholar
Anelize Bauermeister
View author publications
Search author on:PubMed Google Scholar
Matthew J. Bertin
View author publications
Search author on:PubMed Google Scholar
Paul D. Boudreau
View author publications
Search author on:PubMed Google Scholar
Ricardo M. Borges
View author publications
Search author on:PubMed Google Scholar
Benjamin P. Bowen
View author publications
Search author on:PubMed Google Scholar
Christopher J. Brown
View author publications
Search author on:PubMed Google Scholar
Fernanda O. Chagas
View author publications
Search author on:PubMed Google Scholar
Kenneth D. Clevenger
View author publications
Search author on:PubMed Google Scholar
Mario S. P. Correia
View author publications
Search author on:PubMed Google Scholar
William J. Crandall
View author publications
Search author on:PubMed Google Scholar
Max Crüsemann
View author publications
Search author on:PubMed Google Scholar
Eoin Fahy
View author publications
Search author on:PubMed Google Scholar
Oliver Fiehn
View author publications
Search author on:PubMed Google Scholar
Neha Garg
View author publications
Search author on:PubMed Google Scholar
William H. Gerwick
View author publications
Search author on:PubMed Google Scholar
Jeffrey R. Gilbert
View author publications
Search author on:PubMed Google Scholar
Daniel Globisch
View author publications
Search author on:PubMed Google Scholar
Paulo Wender P. Gomes
View author publications
Search author on:PubMed Google Scholar
Steffen Heuckeroth
View author publications
Search author on:PubMed Google Scholar
C. Andrew James
View author publications
Search author on:PubMed Google Scholar
Scott A. Jarmusch
View author publications
Search author on:PubMed Google Scholar
Sarvar A. Kakhkhorov
View author publications
Search author on:PubMed Google Scholar
Kyo Bin Kang
View author publications
Search author on:PubMed Google Scholar
Nikolas Kessler
View author publications
Search author on:PubMed Google Scholar
Roland D. Kersten
View author publications
Search author on:PubMed Google Scholar
Hyunwoo Kim
View author publications
Search author on:PubMed Google Scholar
Riley D. Kirk
View author publications
Search author on:PubMed Google Scholar
Oliver Kohlbacher
View author publications
Search author on:PubMed Google Scholar
Eftychia E. Kontou
View author publications
Search author on:PubMed Google Scholar
Ken Liu
View author publications
Search author on:PubMed Google Scholar
Itzel Lizama-Chamu
View author publications
Search author on:PubMed Google Scholar
Gordon T. Luu
View author publications
Search author on:PubMed Google Scholar
Tal Luzzatto Knaan
View author publications
Search author on:PubMed Google Scholar
Helena Mannochio-Russo
View author publications
Search author on:PubMed Google Scholar
Michael T. Marty
View author publications
Search author on:PubMed Google Scholar
Yuki Matsuzawa
View author publications
Search author on:PubMed Google Scholar
Andrew C. McAvoy
View author publications
Search author on:PubMed Google Scholar
Laura-Isobel McCall
View author publications
Search author on:PubMed Google Scholar
Osama G. Mohamed
View author publications
Search author on:PubMed Google Scholar
Omri Nahor
View author publications
Search author on:PubMed Google Scholar
Heiko Neuweger
View author publications
Search author on:PubMed Google Scholar
Timo H. J. Niedermeyer
View author publications
Search author on:PubMed Google Scholar
Kozo Nishida
View author publications
Search author on:PubMed Google Scholar
Trent R. Northen
View author publications
Search author on:PubMed Google Scholar
Kirsten E. Overdahl
View author publications
Search author on:PubMed Google Scholar
Johannes Rainer
View author publications
Search author on:PubMed Google Scholar
Raphael Reher
View author publications
Search author on:PubMed Google Scholar
Elys Rodriguez
View author publications
Search author on:PubMed Google Scholar
Timo T. Sachsenberg
View author publications
Search author on:PubMed Google Scholar
Laura M. Sanchez
View author publications
Search author on:PubMed Google Scholar
Robin Schmid
View author publications
Search author on:PubMed Google Scholar
Cole Stevens
View author publications
Search author on:PubMed Google Scholar
Shankar Subramaniam
View author publications
Search author on:PubMed Google Scholar
Zhenyu Tian
View author publications
Search author on:PubMed Google Scholar
Ashootosh Tripathi
View author publications
Search author on:PubMed Google Scholar
Hiroshi Tsugawa
View author publications
Search author on:PubMed Google Scholar
Justin J. J. van der Hooft
View author publications
Search author on:PubMed Google Scholar
Andrea Vicini
View author publications
Search author on:PubMed Google Scholar
Axel Walter
View author publications
Search author on:PubMed Google Scholar
Tilmann Weber
View author publications
Search author on:PubMed Google Scholar
Quanbo Xiong
View author publications
Search author on:PubMed Google Scholar
Tao Xu
View author publications
Search author on:PubMed Google Scholar
Tomáš Pluskal
View author publications
Search author on:PubMed Google Scholar
Pieter C. Dorrestein
View author publications
Search author on:PubMed Google Scholar
Mingxun Wang
View author publications
Search author on:PubMed Google Scholar

Contributions

M.W. conceived the project. M.W. and A.K.J. designed the language. M.W., R.S., J.R., A.V., H.T., M.T.M., T.T.S., N.K. and H.N. developed the software. M.W. and P.C.D. supervised the development of the project. D.P., A.T.A., A.K.J., A.B., M.T.M., W.B., N.G., V.V.P. and P.C.D. provided feedback. A.B., J.J.J.v.d.H., K.B.K., L.I.M., R.S., M.T.M., S.A.K., T.P., W.B., R.M.B., F.O.C. and Q.X. tested the software. A.B., J.J.J.v.d.H., K.B.K., K.L., L.I.M., W.C., T.D., T.P., M.S.P.C., D.G., A.C.M., N.G., M.C., M.J.B., R.M.B., F.O.C., O.G.M., A.T., E.E.K., T.W., M.M.A.A., P.D.B., S.H., Q.X., A.T.A., D.D.A., E.F., O.F., W.H.G., J.R.G., C.A.J., R.D.K., H.K., I.L.C., G.T.L., T.L.K., H.M.R., Y.M., O.N., T.H.J.N., K.N., T.R.N., K.E.O., R.R., E.R., L.M.S., C.S., S.S., Z.T., H.N.Z., and A.W. contributed a use case. M.W., A.T.A. and D.P. wrote the documentation. T.D., M.W., A.T.A., H.N.Z., A.K.J., V.V.P. and P.C.D. wrote the paper. All authors edited and approved of the paper.

Corresponding author

Correspondence to Mingxun Wang.

Ethics declarations

Competing interests

P.C.D. is an advisor to Cybele and a co-founder and scientific advisor to Ometa and Enveda with prior approval by UC San Diego. M.W. is a co-founder of Ometa Labs LLC. R.S., S.H. and T.P. are co-founders of mzio GmbH. T.R.N. is an advisor of Brightseed Bio. J.J.J.v.d.H. is a member of the Scientific Advisory Board of NAICONS Srl., Milano, Italy, and is consulting for Corteva Agriscience, Indianapolis, IN, USA. O.K. and T.S. are officers in OpenMS Inc., a non-profit foundation that manages the international coordination of OpenMS development. The other authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Gary Patti and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Arunima Singh, in collaboration with the Nature Methods team. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Notes 1–4 for highlighting the speed of MassQL, FDR and application examples and Supplementary Figures and Tables.

Reporting Summary

Peer Review File

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Damiani, T., Jarmusch, A.K., Aron, A.T. et al. A universal language for finding mass spectrometry data patterns. Nat Methods 22, 1247–1254 (2025). https://doi.org/10.1038/s41592-025-02660-z

Download citation

Received: 07 April 2023
Accepted: 03 March 2025
Published: 12 May 2025
Issue date: June 2025
DOI: https://doi.org/10.1038/s41592-025-02660-z

This article is cited by

Enabling pan-repository reanalysis for big data science of public metabolomics data
- Yasin El Abiead
- Michael Strobel
- Mingxun Wang
Nature Communications (2025)