Abstract
The domestic dog, Canis lupus familiaris, plays a vital role as companion or service animal, and the well-being and healthy aging of dogs is gaining importance. Proteins are key actors in cells, tissues and body fluids, and the ability to measure their composition and abundance is crucial to understand biological processes. We report a mass spectrometry-based proteomics approach analyzing canine tissues, plasma and urine obtained from Labrador retrievers to develop two comprehensive resources to advance the study of the dog proteome. Firstly, we developed a Labrador PeptideAtlas covering 49% of the predicted UniProtKB proteome and secondly, a Labrador spectral assay library for targeted applications by data-independent acquisition that enables the identification and quantification of 11,792 proteins (11,564 protein groups) of the dog proteome (56%). We demonstrate the performance of the library with gradients of different length and quantify 10,140 proteins in tissues and 385 proteins in plasma.
Similar content being viewed by others
Background & Summary
The domestic dog, Canis lupus familiaris, plays an important role in many people’s lives. Dogs are companion animals for families or individuals, and a dog and its owner often develop a special relationship. Beyond companionship, dogs have several other important roles including traditional herding, hunting, assisting the police and military, and increasingly as specifically trained service dogs. Service dogs support people with impaired vision or hearing, assist individuals in wheelchairs, help autistic people or veterans with post-traumatic stress disorders, or even may alert their owners of an arising medical condition such as a seizure. Given these diverse and significant tasks over their average lifespan of 10–13 years1, it is not surprising that the well-being and healthy aging of dogs, including their optimal nutrition, is gaining importance. Dog breeds have the most diverse body size of any vertebrate2 and a differential life expectancy, with larger breeds having shorter lifespans3. Dogs are susceptible to a range of lifestyle-related diseases such as tick-borne diseases or obesity, but also age-related health conditions such as cancer, strokes and cognitive decline1,4,5,6. As canines reside in the same environment as their human companions, they are also considered excellent model animals to study diseases, discover biomarkers and to find avenues for early intervention4,5,7. Several recent studies investigated various aspects of aging in canines8,9 including efforts to establish a better dog-to-human aging rate10, and initiatives to establish canine biobanks collecting genetic, behavior and health data are underway to study long term transitions from health to disease11,12,13. While the dog genome has been studied extensively14,15 and proteomics studies for dogs are only slowly emerging16,17,18, we clearly lack a detailed and comprehensive analysis of the dog proteome.
Proteins are the key actors in a cell. Originating from a relatively simple code, proteins constitute complex macromolecules that carry out a vast array of functions: they provide structural support, act as enzymes, take part in cell signaling, enable wound healing or get secreted into the extracellular matrix. Collectively, proteins define the phenotype of cells, tissues and organs19. The ability to measure the composition and abundance of proteins in a tissue or a body fluid is crucial to understand biological processes and their alterations, which may indicate the onset of a disease. Consequently, proteins and protein signatures have been increasingly proposed as clinically useful diagnostic predictors across a range of common and rare diseases, including myeloma, non-Hodgkin lymphoma, motor neuron disease, pulmonary fibrosis and dilated cardiomyopathy20.
Mass spectrometry (MS)-based proteomics is a continuously advancing, sensitive technique that supports the system-wide identification and quantification of peptides and proteins. MS Data-Dependent Acquisition (DDA) is the proteomics method of choice to discover proteins in a biological sample. This acquisition approach offers a profound proteome coverage while the identification of proteins is generally attained by searching the data against a reference proteome. However, the stochastic selection of precursor ions for peptide fragmentation and the dynamic range of proteins in complex matrices may lead to incomplete peptide measurements and particularly low abundant peptides may remain undetected.
Data-Independent Acquisition (DIA, also known as Sequential Window Acquisition of all Theoretical Mass Spectra (SWATH)) provides a more comprehensive coverage by fragmenting all precursor ions within a specified mass isolation window in an unbiased manner and independent of their abundance. DIA delivers accurate quantification and high reproducibility, and its performance has been successfully benchmarked in a multi-laboratory study21. The highly complex MS/MS spectra acquired in DIA measurements need to be decoded with targeted data analysis approaches that typically rely on peptide-centric scoring methods. This is primarily achieved with spectral ion libraries, also referred to as DIA or spectral assay libraries, which are used to extract specific ion chromatograms. The quality and proteome coverage of a library is crucial to obtain accurate results22. Experimentally-derived, comprehensive spectral ion libraries have previously been developed for several small and larger organisms23,24,25,26,27.
The relatively small number of canine proteomic studies reported so far focused on serum, plasma, urine and saliva samples, biospecimens that can be collected through minimal-invasive procedures and that were typically studied in context of a specific disease16,28,29,30,31,32. However, there is a noticeable gap in the analysis of tissue samples, which remain largely unexplored. Further, while raw data from various studies are frequently accessible, there is currently no comprehensive or centralized resource that consolidates the dog proteome. A similar limitation exists for spectral assay libraries, only few canine SWATH-MS data have been described to date and these studies used project-specific libraries that are not always accessible to the broader research community.
Here, we report two resources to study the canine proteome. First, we developed a Labrador PeptideAtlas from different tissue, plasma and urine samples (Fig. 1a,b). This resource provides a first map of the dog proteome from more than 13.7 million MS spectra that represent 10,264 proteins (49%) of the annotated proteome in UniProtKB and 23,377 proteins (54%) of the Ensembl proteome. The Labrador PeptideAtlas provides a system-wide overview of the dog proteome, it serves as a reference and can be mined for targeted proteomics and numerous other studies. Second, we developed a spectral ion library that supports the quantification of 128,850 peptides representing 11,792 proteins (11,564 protein groups) or 56% of the predicted canine proteome (UniProtKB) (Fig. 1a,c). This Labrador library has been evaluated with DIALib-QC33, a quality assessment tool for spectral ion libraries (Fig. 1c). We demonstrate the performance of this library with gradients of different length and quantified 10,140 proteins in Labrador retriever tissues and 385 proteins in plasma (Fig. 1d). Both resources and associated data are publicly available at PeptideAtlas (https://peptideatlas.org/builds/canine), SWATHAtlas (www.swathatlas.org, SAL00328-SAL00330) as well as the PRIDE data repository (PXD05293234, PXD05290235).
Schematic overview. (a) Preparation of canine tissue, plasma and urine samples followed by mass spectrometry analysis in data-dependent (DDA) and data-independent acquisition (DIA) modes. (b) Data analysis steps to generate a Labrador PeptideAtlas build from DDA data. Raw data were converted, searched with Comet and MSFragger, and further analyzed with the Trans-Proteomic Pipeline including PeptideProphet, iProphet, reSpect and ProteinProphet. (c) DIA spectral assay library generation with Spectronaut, quality control with DIALib-QC, a software tool to evaluate libraries for shortcomings, and sharing of the DIA library through the SWATHAtlas resource. (d) Application of the Labrador DIA library. Analysis of tissue and plasma employing DIA-MS and targeted data extraction with the developed Labrador DIA library.
Methods
Donor information and ethics statement
Thirteen different canine tissues (adipose tissue, brain, heart, kidney, large intestine, liver, lung, muscle, pancreas, skin, small intestine, spleen, and stomach) from two Labrador dogs (26 samples total) were provided from post-mortem material banked at the Waltham Petcare Science Institute. The samples derived from a female stillborn and a male 6 months old, euthanized on welfare grounds due to untreatable hip dysplasia. Plasma from eleven dogs and urine from one dog were derived from remnant material from healthy, adult Labrador retrievers (4 males, 7 females, one to eight years old) housed at the WALTHAM Petcare Science Institute, in housing and with care consistent with Home Office regulations, from a study approved by the WALTHAM Animal Welfare and Ethical Review Body and performed under authority of the UK Regulation of Animals (Scientific Procedures) Act of 1986.
Tissue sample preparation for DDA and DIA MS analysis
Excised tissues were placed in RNAlater (Thermo Fisher Scientific), stored at 4°C for 24 hours to allow penetration through tissue prior to storage at −80 °C. For proteomics analysis, tissues were thawed and rinsed five times with 1 mL 0.1 M triethylammonium bicarbonate (TEAB, pH 8.5). Each tissue was homogenized in 200 µL 0.1 M TEAB using three 1.4 mm ceramic beads in a Precellys 24 homogenizer (Bertin Corp) at 8500 rpm for 9 × 20 s with resting for 30 s between cycles. Protein content was determined by bicinchoninic acid assay (BCA, Thermo Fisher Scientific). An aliquot of 200 µg protein was added to 5% sodium dodecyl sulfate (SDS), 0.1 M TEAB to make a final volume of 50 µL in a KingFisher deep well plate (Thermo Fisher Scientific), samples heated for 2 min at 90 °C and allowed to cool. Proteins were reduced with 5 mM tris(2-carboxyethyl)phosphine (TCEP, 15 min, 55 °C) and alkylated with 15 mM iodoacetamide (IAM, 15 min, 21 °C, darkness), shaking at 800 rpm on a Thermomixer (Eppendorf). Each lysate was diluted with 450 µL 80% ethanol followed by protein aggregation capture (PAC)36 using a KingFisher purification instrument (Thermo Fisher Scientific). Briefly, magnetic carboxylate beads (hydrophobic to hydrophilic bead ratio of 1:1; final bead to protein ratio of 4:1) were washed in 800 µL 80% ethanol for 5 min and collected. Lysate and beads were mixed for 15 min, followed by three wash steps in 500 µL 80% ethanol for 2 min per wash and with beads collected at each step. Beads were mixed in 300 µL 50 mM TEAB for 10 min and samples with beads transferred to 2 mL microcentrifuge tubes (Eppendorf). Proteins were digested with trypsin (Trypsin-Gold, Pierce Thermo Fisher) at a ratio of 1:50 trypsin to protein for 16 h at 37 °C. The digestion was stopped by adding formic acid (FA) to a final concentration of 1% and the supernatant was separated from the beads. Peptides were dried under centrifugal evaporation (Savant, Thermo Fisher Scientific) and reconstituted in 2% ACN/0.1% FA for MS analysis.
In addition, a subset of tissue samples including heart, kidney, liver, pancreas, stomach, brain and adipose tissue was processed using the S-Trap™ mini spin column digestion protocol (ProtiFi, USA). Briefly, tissue (~50 mg) was homogenized in 5% SDS, 50 mM TEAB, pH 8.5 and samples clarified by centrifugation (25,000 rpm, 30 min, 4 °C). Protein content was determined by BCA. Proteins were reduced with 5 mM TCEP (60 min, 20 °C) and alkylated with 10 mM IAM (30 min, 20 °C, darkness). An aliquot of each tissue sample was acidified with 12% phosphoric acid in water (pH ≤ 1) and binding/wash buffer (100 mM TEAB (final) in 90% methanol) was added to each sample at 7 x of the sample volume. Samples were vortexed and transferred to S-Trap spin columns. After sample loading, S-Trap columns were centrifuged for 1 min at 4,000 g. Samples were cleaned by adding 3 × 400 µL binding/wash buffer and centrifugation after each wash (1 min, 4,000 g). Proteins were digested on the S-Trap columns for 16 h at 37 °C by adding trypsin (Trypsin-Gold) in 125 µL 50 mM TEAB. Peptides were eluted sequentially with 80 µL each of 50 mM TEAB in water, 0.2% formic acid in water, and 50% acetonitrile in water. Peptide eluates were pooled and dried under centrifugal evaporation (Savant). Peptides were purified using C-18 spin columns (Thermo Fisher Scientific) per manufacturer’s instructions. Purified peptides were dried under centrifugal evaporation (Savant).
Plasma sample preparation
Plasma obtained from Labrador retrievers was thawed on ice, 200 µL were centrifuged at 4000 × g for 10 min, and the pellet of tissue debris was discarded. The protein content of the plasma samples was determined by BCA. The supernatant was split into two aliquots. 100 µL of each sample were depleted of the 14 most abundant canine plasma orthologue proteins using the human multiple affinity removal system (MARS Hu-14, 4.6 × 100 mm, Agilent Technologies) by HPLC according to the manufacturer’s protocol to exploit any human to dog protein cross-reactivity of the antibodies within the MARS Hu-14 column. The depleted fraction was collected in 1.25 mL of MARS Buffer A (Agilent). The undepleted aliquots from two samples were diluted with MARS Buffer A to a final volume of 1.25 mL. Each sample was denatured by adding 600 mg urea to 8 M final concentration. Samples were reduced with 5 mM dithiothreitol (DDT, 30 min, 55 °C) and alkylated with 14 mM IAM (30 min, room temperature, darkness). Samples were desalted using a GE HiPrep 26/10 desalting column (GE HealthCare Life Sciences) and 1100 HPLC system (Agilent Technologies), and collected in 2 mL of 1% ammonium bicarbonate. The protein concentration of the desalted samples was determined by BCA. The depleted samples were digested with 4 µg of trypsin (Cat no. TRTPCK, Worthington) and undepleted samples with 20 µg of trypsin at 37 °C for 16 h. Digested samples were acidified with 5 µL 1% formic acid and dried under centrifugal vacuum evaporation.
Urine sample preparation
Urine was centrifuged at 4,000 x g for 10 min at room temperature, the pellet was discarded. The supernatant was concentrated and divided into two aliquots. One aliquot remained undepleted and one aliquot was depleted from the 14 most abundant plasma orthologue proteins using the MARS Hu-14 column according to the manufacturer’s protocol. The depleted fraction was collected in 1.25 mL of MARS Buffer A and the undepleted aliquot diluted to 1.25 mL with MARS Buffer A. Both samples were denatured, reduced, alkylated and desalted as described above for the plasma samples. The protein content of the desalted samples was determined by BCA. Samples were digested with sequencing grade modified trypsin (Promega) at 1:50 enzyme to substrate ratio at 37 °C for 16 h. The digest was dried under centrifugal evaporation (Savant).
DDA of tissue and plasma for PeptideAtlas and spectral assay library generation using a timsTOF Pro mass spectrometer
Tissue and plasma samples were reconstituted in 0.1% formic acid, spiked with iRT standard peptides37 (Biognosys AG, Schlieren, Switzerland) and analyzed using a Vanquish Neo HPLC system (Thermo Fisher Scientific) in nanoflow mode coupled to a CaptiveSpray (CS) source (Bruker) and a timsTOF Pro mass spectrometer (Bruker). The system was operated with 99.9% water, 0.1% formic acid (v/v, buffer A), and 99.9% ACN, 0.1% formic acid (v/v, buffer B). Peptides were loaded onto a 0.5 cm × 0.3 mm trap cartridge Chrom XP C18, 3 μm (Thermo Fisher Scientific) at 10 μL/min and separated on a C18 UHP 15 cm × 0.15 mm × 1.5 μm dp column (Bruker/PepSep) at 45 °C and 600nL/min using a 90 min stepwise gradient from 3% to 25% B in 70 min, 25% to 35% B in 20 min, 35% to 80% B in 1 min and isocratic flow at 80% B for 4 min followed by fast washing and equilibration for three column volumes. The CS ion source was equipped with a 20 μm emitter (Bruker), and the parameters were as follows: 1700 V capillary voltage, 3.0 L/min dry gas, and temperature set to 180 °C.
For all samples, three technical replicates were acquired in DDA-Parallel Accumulation Serial Fragmentation (PASEF) scan mode, covering 100–1700 m/z with 10 PASEF ramps. The trapped ion mobility spectrometry (tims) settings were 100 ms for ramp and accumulation time (100% duty cycle), resulting in 1.38 s of total cycle time. Active exclusion was enabled with a 0.3 min release. The default collision energy with a base of 0.6 1/K0 [V s/cm2] was set at 20 eV and 1.6 1/K0 [V s/cm2] at 63 eV was used. Isolation widths were set at 2 m/z at <700 m/z and 3 m/z at >800 m/z.
DIA of tissue and plasma for spectral assay library generation using a timsTOF Pro mass spectrometer
To achieve a more comprehensive coverage in the assay library, all samples were acquired utilizing the aforementioned LC-MS setup in dia-PASEF mode encompassing an m/z range of 400–1200 and 1/K0 range of 0.57 to 1.57. This was achieved through 32 × 25 Da windows with a mass overlap of 1 Da (Bruker py3 dia-PASEF scheme) resulting in a total cycle time of 1.8 s. Selected tissue samples were also acquired using the dia-PASEF preformed py5 scheme for 22 min and 45 min gradient length and the py8 scheme for 11 min gradient length.
DDA of plasma and urine for PeptideAtlas generation using a Q Exactive HF mass spectrometer
Plasma and urine samples were reconstituted in 0.1% formic acid and analyzed on a high-resolution Q Exactive HF mass spectrometer (Thermo Fisher Scientific) coupled with an Easy-nLC 1100 system. Peptides were loaded on an Acclaim PepMap 100 trap (2 cm, 75 μm ID, C18 3 μm, Thermo Fisher Scientific) and washed with 0.1% formic acid in water for 10 min. Peptides were separated on analytical column (PicoChip, 105 mm, 1.9 µm, REPROSIL Pur C-18-AQ, 120 Å, New Objective, USA) with 0.1% formic acid in water (A), 0.1% formic acid in acetonitrile (B) (v/v) using a gradient from 5% to 35% B in 120 min, followed by 80% B for 5 min and equilibration of the column for 10 min at a flow rate of 500 nL/min. Survey full-scan MS spectra were acquired in DDA mode in the mass range 375–1375 m/z at 120,000 resolution, automatic gain control (AGC) target set to 3e6 with a 50 ms maximum ion injection time (IT). Peptides were fragmented above a threshold of 4e4 by collision induced dissociation (CID) at a resolution of 15,000, AGC target 1e5, maximum IT 25 ms, TopN of 20, an isolation width of 1.8 m/z, and a normalized collision energy of 28%. Charge state z = 1, unassigned charges and z ≥ 6 were rejected; dynamic exclusion was set to 45 s. A spray voltage of 1900 V in positive mode and an RF lens at 30% were used.
DDA Data Analysis and PeptideAtlas Assembly
Instrument-native raw data acquired in DDA mode were converted to mzML format using the tdf2mzmL converter (version 0.3, https://github.com/mafreitas/tdf2mzml) for Bruker timsTOF Pro data files (.d.) and ProteoWizard msconvert (version 3.0.19205)38,39 for Thermo Fisher Scientific Q Exactive HF files (.raw). MS/MS spectra were associated with peptide sequences using Comet (version 2023.01 rev. 2)40 and MSFragger (version 4.0)41 as search engines. In addition, MS/MS spectra acquired on the Q Exactive HF were Monocle42 modified for improved monoisotopic peak and accurate precursor m/z detection, and then searched with Comet and MSFragger. In total 138 files were searched (Table 1) and the protein search space included (i) the UniProtKB Canis lupus familiaris (dog) proteome (UP000805418, Labrador retriever, downloaded January 2024) covering 42,621 protein sequences (including isoforms) of which 20,991 entries represent one protein sequence per gene, (ii) the Ensembl ROS_Cfam_1.0 proteome with 43,525 sequences (NCBI RefSeq assembly GCF_014441545.1, Labrador retriever, Ensembl version 110, September 2020), (iii) 499 contaminant protein sequences frequently observed in proteome samples (e.g., keratins, trypsin, BSA), and (iv) a sequence-shuffled decoy counterpart. Searched peptides were allowed to be semi-tryptic with up to two missed cleavages. The search parameters included a fixed modification of +57.021464 for carbamidomethylated cysteine and variable modifications of +15.9949 for oxidized methionine and tryptophan, +42.0106 for protein n-terminal acetylation, −17.02650 for pyroglumatic acid formation from glutamine and for cyclization of n-terminal S-carbamoylmethyl-cysteine, −18.01060 for pyroglumatic acid formation from glutamic acid and +0.984016 deamidation of aspargine. A monoisotopic precursor mass error tolerance of 20 ppm for Q Exactive HF and 50 ppm for timsTOF Pro data were used in Comet40 and MSFragger41 with the isotope error setting activated.
The search results were processed and statistically validated with the open-source proteomics data analysis package Trans-Proteomic Pipeline (TPP, version v6.4.x Pillar, Build 202401040133-9088)43,44 including PeptideProphet45 and iProphet46 (Fig. 1b). Peptide spectrum matches (PSM) generated by each search engine were analyzed with PeptideProphet to assign each PSM a probability of being correct. PeptideProphet was run using accurate mass, variable modification count and semi-parametric models. The expectation values were used as the only contributor to the f-value for modeling. The ppm mass errors instead of Daltons were used for mass modeling. Decoy hits were reported with a probability based on the model learned. PeptideProphet results were further processed with iProphet to refine the PSM-level probabilities and compute peptide-level probabilities based on corroborating information from the ensemble of identifications, and to combine the results from the same and then different search engines. All data were further processed using two rounds of reSpect47 within the TPP. For the first round of reSpect, the minimum probability was set to 0 and for the second round the minimum probability was set to 0.5. The new set of.mzML files generated by both rounds of reSpect were searched using MSFragger with precursor mass tolerance set to 3.0 Daltons and isotope_error off, and processed using PeptideProphet as described above except for the accurate mass option. iProphet was used to process the PeptideProphet validated reSpect results. ProteinProphet48 was run after iProphet. To assemble a Labrador PeptideAtlas build from the different experiments, each experiment was thresholded at an iProphet probability that yielded a model-based PSM FDR of 0.001. Decoy identifications were retained and used to compute final decoy-based FDRs. Contaminant protein and peptide identifications were included in the FDR calculation.
DIA spectral assay library generation
A comprehensive hybrid DIA library was generated from a total of 300 DDA and DIA raw data folders (.d format) encompassing all tissue and plasma runs acquired on the timsTOF Pro (Table 2) using the Spectronaut software package (version 18.4.231017.55695, Biognosys). The raw data were searched against the UniProtKB Canis lupus familiaris proteome using one protein sequence per gene (UP000805418, 20,991 entries, January 2024) with common contaminant proteins and iRT peptides (Biognosys) appended. Carbamidomethyl was used as fixed modification (C) and acetyl (protein N-term) and oxidation (M) as variable modification. Enzyme digestion trypsin/P with up to two missed cleavages were allowed, and PSMs with a precursor charge outside minimum 2 and maximum 5 were removed. Mass tolerances were automatically determined by Spectronaut, otherwise default settings were used. Identification search results were filtered at a 1% false discovery rate on the PSM, peptide, and protein levels, and a consensus assay library was constructed and exported.
Spectral assay library quality assessment
The quality of the Labrador spectral assay library was evaluated with DIALib-QC (version 1.2, http://www.swathatlas.org/DIALibQC.php)33, a software tool that considers various parameters with regard to a library’s complexity, characteristics, modifications, completeness and correctness.
DIA data analysis
DIA data were analyzed with Spectronaut (version 18.4.231017.55695, Biognosys). The developed library and default settings were used for the targeted data extraction of three analytical replicates of each tissue sample and respective gradient length as well as for the plasma runs. Briefly, data extraction mass tolerance (MS1 and MS2) was set to dynamic with a correction factor of 1, dynamic extracted ion chromatogram (XIC) RT window was enabled with a correction factor of 1 and local (non-linear) RT regression used. Decoy assays were dynamically generated using the mutated decoy method with a set size of 1 as a fraction of the input library size. Identification was performed using the kernel density distribution estimator with precursor and protein identification results filtered with a q-value (FDR) of <0.01. For quantification, MS2 ion peak areas of quantified peptides were averaged to estimate the protein peak areas and area under the curve within integration boundaries as quantity type selected. Identified precursor quantities were normalized using the cross-run normalization function built-in Spectronaut.
Principal component analysis
Sample grouping and reproducibility were determined by Principal Component Analysis (PCA) using the Spectronaut processed data and the ‘prcomp’ function in the R statistical software (version 4.1.2) with scaling and centering set to TRUE. The PCA plot was generated from plasma and tissue samples acquired in replicates with the 90-minute gradient and analyzed with the developed spectral assay library using the same processing settings as described above.
Data Records
Data record 1
The raw mass spectrometry DDA files acquired on the timsTOF Pro (.d) and on the Q Exactive HF (.raw), converted files (.mzML), results from the database searches with Comet and MSFragger (.pep.xml), Comet and MSFragger search results on the Monocle modified Q Exactive HF data (pep.xml), the respect-derived (.mzML) and MSFragger searches (.pep.xml) files on timsTOF Pro and Q Exactive HF raw data, and the search database (.fasta) used to generate the Labrador PeptideAtlas have been deposited with a data overview (.excel) to the ProteomeXchange Consortium49 (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository50 with dataset identifier PXD05293234. The Labrador PeptideAtlas is available under https://peptideatlas.org/builds/canine.
Data record 2
The raw mass spectrometry DDA and DIA timsTOF Pro (.d) data, the search database (.fasta), the DIA assay library (Canine-Tissues-Plasma-T6-Spectral-Library-Max20-SN_18_4v_2024.tsv) and DIALib-QC report (.tsv) are uploaded together with the identified peptides and proteins in the tissue and plasma samples as Spectronaut reports (.tsv) to the ProteomeXchange Consortium49 (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository50 with dataset identifier PXD05290235. The spectral assay library and the DIALib-QC report are also available at www.swathatlas.org (identifier SAL00328-SAL00330).
Technical Validation
Assembly and validation of a Labrador PeptideAtlas
PeptideAtlas is a multi-organism compendium of peptides and their respective proteins identified in discovery MS experiments, and a valuable resource for data-mining, experiment planning, refinement of genome annotation and the development of targeted proteomic assays51,52. Large data sets from an organism are analyzed in a consistent manner and the results are shared in the form of PeptideAtlas builds with the community as previously shown for bovine53, Candida albicans54 and other organisms. To develop a proteome map for Canis lupus familiaris, we collected 13 different tissue types (adipose tissue, brain, heart, kidney, large intestine, liver, lung, muscle, pancreas, skin, small intestine, spleen, and stomach) from two Labrador retrievers (26 samples total), urine from one and plasma from eleven Labrador retrievers (Table 1, Fig. 1a). These samples were processed and acquired in DDA mode on a Q Exactive HF and a timsTOF Pro mass spectrometer in replicates resulting in 138 MS runs. The acquired data were searched against a reference database that included the UniProtKB Canis lupus familiaris and the Ensembl ROS_Cfam_1.0 proteome to provide added proteome coverage in the PeptideAtlas build (Methods, Data Record 1). The search results were processed with the Trans-Proteomic Pipeline43,44, a standardized suite of software tools for the analysis of MS/MS data, including PeptideProphet45, iProphet46 and reSpect47 for statistical validation of peptide spectrum matches (PSMs) and distinct peptide identifications (Fig. 1b). Reported proteins were inferred from confident peptide identifications using ProteinProphet45. The data were grouped by tissue type, with each tissue type being defined as an experiment, and stringent false discovery rates (FDR) were applied to each experiment on the spectrum, peptide and protein level considering model-based and decoy count-based FDRs to report identified peptides and proteins. The Labrador PeptideAtlas was assembled from more than 13.7 million identified spectra (0.0001 FDR) that map to 245,884 modified peptides denoting 148,211 distinct stripped peptides (0.001 FDR) which represent 10,264 proteins (0.01 FDR, 48.9%) of the predicted proteome based on 20,991 unique genes and their representative protein sequences in UniProtKB. Considering all 43,621 protein entries in UniProtKB, which include redundant entries, we obtained a proteome coverage of 23,679 proteins (54.3% of the proteome). In addition, using the Ensembl ROS_Cfam_1.0 proteome, we obtained similar results with 23,377 detected proteins (54.2% of the predicted proteome).
We denote here the 20,991 protein sequences in UniProtKB (one representative protein per gene) as the core proteome and further classified the identified 10,264 proteins based on different measures of confidence and their relationship to other proteins as described in van Wijk et al.55 Canonical proteins are protein identifications at the highest confidence level and require the identification of at least two unique, non-nested peptides, each with a minimum of nine amino acids. For the Labrador PeptideAtlas, we report 6,423 canonical proteins (0.0002 FDR, decoy-based). The lowest level are non-detected proteins, and a tiered system describes the levels in-between55. The second and third largest groups in the Labrador PeptideAtlas were 1,254 proteins classified as weak and 584 proteins classified as marginally distinguished. Briefly, proteins classified as weak have at least one unique peptide with a minimum of nine amino acids and potentially other peptides of seven or eight amino acids length or shared peptides but lack a second unique peptide with nine amino acids that is non-nesting with the first peptide identification for the protein. Marginally distinguished proteins are defined as proteins that share several peptides with a canonical protein, but also have one unique peptide with less than nine amino acids that distinguishes it from a canonical protein identification55. Among the 10,264 identified proteins were also 245 indistinguishable representatives based on the identification of peptides that map to several proteins and the lack of unique peptides. In other words, the peptide evidence shows that at least one protein of a group of proteins is present, however, we are not able to determine which one.
It is well known that the proteome is highly complex and not only defined by the 20,991 core protein sequences but also by numerous post-translational modifications (PTMs). For the resources provided here, we have considered several modifications that occur spontaneously during sample handling such as the oxidation of methionine and tryptophan or the pyroglumatic acid formation from glutamine, as well as the carbamdiomethylation of Cysteine which was introduced chemically to cleave any disulfide bonds and prevent any disulfide reformation during sample preparation (see Methods for all considered PTMs). In addition, there are numerous other PTMs such as phosphorylation and ubiquitination which play key roles in cell signaling and regulation. Many of these PTMs are present at low stoichiometry and require specific sample preparation and enrichment steps. Phosphorylation on serine, threonine and tyrosine for example requires enrichment through immobilized metal affinity chromatography (IMAC) approaches56 or phosho-tyrosine, which is even less abundant, can be specifically enriched with pTyr-antibiodies57 or engineered pTyr SuperBinders58. The current resources provide a strong basis to study the core proteome and account for the most common modifications occurring during sample handling, and can be expanded to include other PTMs as data become available.
An overview of the Labrador PeptideAtlas build and the contribution of each experiment are shown in Fig. 2 and Supplementary Table 1. The total number of PSMs for each tissue type (experiment) that contributed to the Labrador PeptideAtlas build is depicted in Fig. 2a. Of note, the number of MS runs that contributed to each experiment varies (see Supplementary Table 1) which is to a certain extent reflected in the bar graph with plasma showing the most PSMs from 28 MS runs followed by brain, heart and kidney tissue with each 12 MS runs. Urine instead showed the lowest number of PSMs from two acquired MS runs. However, the number of PSMs per experiment does not necessarily correlate with peptide and protein identifications. Figure 2b depicts the number of distinct (non-redundant) peptides contributed by each experiment and their cumulative count while Fig. 2c shows the number of canonical proteins and their cumulative count. For each dataset, the height of the orange and red bar depicts the number of distinct peptides and canonical proteins, respectively, and the total height (blue bars) represents the cumulative number of peptides (Fig. 2b) and proteins (Fig. 2c). Experiment 1–13 refers to the tissue type and its data acquired on the timsTOF Pro mass spectrometer in the same order as depicted in Fig. 2a while experiment 14–18 provides a more granular view on plasma and urine data acquired on the timsTOF Pro and Q Exactive HF MS (see Supplementary Table 1). The most canonical proteins were identified in kidney, brain and small intestine tissue with 3,528, 3,440, and 3,418 proteins, respectively (Supplementary Table 1). We further observed in our data set that brain tissue contributed with 333 proteins the most unique canonical proteins across all analyzed tissues while other tissue categories showed between 1 to 61 unique canonical proteins.
Labrador PeptideAtlas overview. (a) Analyzed samples grouped by tissue type and the total number of PSMs in each category that contributed to the Labrador PeptideAtlas build. Experiment grouping is indicated in parenthesis. Experiments 1–13 denote the samples grouped by tissue type, experiments 14–18 represent the different plasma (14,15,16) and urine (17,18) analyses (Table 1, Supplementary Table 1). The graphs further depict (b) the number of identified distinct peptides in each experiment (orange) and their cumulative count (blue) and (c) the number of canonical proteins in each experiment (red) and the cumulative number of proteins in each experiment (blue). Experiment categories are as described above. (d) The graph depicts the peptide length distribution of observed distinct peptides.
For urine, we aimed to explore the depletion efficiency using cross-reactivity of human protein-directed antibodies with the MARS Hu-14 depletion column, and we kept one aliquot of the urine sample undepleted and subjected one aliquot to depletion from the 14 most abundant plasma orthologue proteins. We detected 42 proteins in the undepleted and 87 proteins in the depleted sample, indicating some advantage of this depletion strategy. We pursued a similar approach for plasma, even though the difference between depleted and undepleted was marginal with the detection of 136 and 125 proteins (Q Exactive HF data), respectively. Acquisition of six depleted plasma samples on the timsTOF Pro resulted in the detection of 208 plasma proteins. The relatively small number of urine and plasma samples have pilot character as these samples contribute protein information to the current Labrador PeptideAtlas, however, identifications in these biospecimens need to be expanded by using larger sample cohorts and exploring other depletion strategies for a deeper proteome coverage. Further, more than 97% of spectra matched to peptides of 7–35 amino acids length with the majority of peptide spectra (77%) being between ten and 21 amino acids, and with seven amino acids being the required minimum peptide length to be considered for a confident identification (Fig. 2d).
Examining the data using Apolipoprotein E (P18649) as an example, a protein involved in lipid metabolism and transport, PeptideAtlas reports the identification status as canonical with a ProteinProphet probability of 1.0 (highest confidence) based on 6,513 spectra matching to 43 distinct peptides (Fig. 3a). PeptideAtlas further returns all observed peptides and their sequence position (blue) as well as all peptides that are expected to be observed based on physiochemical characteristics and peptides that are unlikely to be observed due to their hydrophobicity (orange) or length (yellow). A sequence coverage of 80.4% by all detected peptides or 100% of the likely observable regions of P18649 was achieved by our MS analysis. In addition, all distinct observed peptide sequences of P18649 are listed with several attributes. Verifying one of these peptides, VGALATQPLLER, returns in which samples and MS runs the peptide was detected together with several peptide metrics (best spectrum probability of 1.00, 409 times observed, etc.) and displays the annotated spectrum from each MS run (Fig. 3b), which is helpful for spectra validation in other samples and to select optimal fragment ions for targeted assay development.
Protein and peptide query in PeptideAtlas. Proteins and peptides of interest can be queried in PeptideAtlas and results are returned as tables and graphical displays. A selection is shown for Apolipoprotein E (P18649). (a) Color-coded summary of the sequence coverage of P18649 with a table listing all distinct observed peptides and its attributes representing this protein. (b) Peptide attributes and annotated spectrum for peptide VGALATQPLLER from P18649.
We share here through a publicly available resource a first Labrador PeptideAtlas from tissue, plasma and urine which covers ~50% of the predicted proteome based on stringent analysis criteria. The Labrador PeptideAtlas serves as a resource for veterinary research by providing a reference set of high-confidence peptides and MS spectra. It enables researchers to verify whether detected peptides or proteins have been previously observed in dogs and in which tissues, offering critical context for new findings. Additionally, it supports biomarker discovery by establishing a baseline of protein expression across sample types and helps select and prioritize peptides that are reliably detectable in specific tissues or biofluids for targeted proteomic studies.
This PeptideAtlas can be expanded by adding data sets from the scientific community, e.g., data from other tissues or body fluids, fractionated samples etc., and by reprocessing all data in a consistent manner as described here to provide an updated build with increased proteome coverage (while maintaining this first version).
Generation of a comprehensive Labrador spectral assay library
We developed a spectral assay library to support the accurate identification and quantitation of canine peptides and their respective proteins in targeted DIA measurements. The Labrador spectral assay library was developed from DDA data of thirteen different tissues from two dogs and six plasma samples acquired on the timsTOF Pro mass spectrometer as described above for the Labrador PeptideAtlas. In addition, we acquired data of the different tissue and four of the plasma samples in DIA mode (on the same instrument) aiming for increased proteome coverage by considering both acquisition modes (Table 2). Finally, we measured six tissue samples in DIA mode with liquid chromatography gradients of different length to assess the performance of the library. We used Spectronaut, a widely used commercial software program, to assemble the DIA spectral assay library from a total of 300 MS injections (Table 2). We included a maximum of 20 peptides per protein and used the top six fragment ions for targeted data extraction (Methods). Next, we evaluated the quality of the library with DIALib-QC33, a software tool that systematically examines more than 60 parameters assessing a library’s characteristics, completeness, and correctness (Fig. 1c, Supplementary Table 2). The evaluation with DIALib QC revealed no issues.
The developed spectral assay library encompasses 145,363 peptide ions, 128,850 modified peptides and 121,563 distinct peptides that allow to measure and quantify 11,792 proteins (11,564 protein groups) or 56% of the annotated canine proteome (Table 3, Supplementary Table 2). Assessing the proteome coverage in terms of peptides per protein (or protein group), the DIA library includes 9,769 proteins (46.5% of the proteome) that are represented by three or more distinct peptides while 8,731 proteins (41.6%) are represented by five or more peptides. Only a relatively small number of 1,170 proteins (5.6%) are represented by one distinct peptide. Overall, the majority of proteins in the DIA library can be targeted by two or more peptides supporting solid quantitation results (Fig. 4a).
Coverage and characteristics of the Labrador DIA spectral assay library. The graphs show (a) the number of peptides per protein (or protein groups) in the Labrador spectral assay library, ≥5 specifies five or more peptides; (b) the distribution of precursor ion charge; (c) distribution of precursor m/z in the acquired mass range; (d) distribution of peptide length; (e) occurrence of observed peptide modifications in the assay library including N-terminal acetylation (N-term[Ac]), carbamidomethylation (C[Cam]) and oxidation (M[Oxi]); (f) frequency of observed b- and y- ion fragments; and (g) distribution of fragment ion charge.
Moreover, we assessed various library characteristics. As expected, the majority (88%) of precursor charge states were observed in charge state z = 2 (47%) and z = 3 (41%) (Fig. 4b). Charge state z = 5 plays usually a minor role and was observed for only 0.5% of the peptides. The distribution of precursor ions in the acquired mass range of 100 to 1,700 m/z is depicted in Fig. 4c with nearly half of the precursor ions peaking in the range of 550–750 m/z and 98.3% between 400 to 1,200 m/z, the mass range we also chose for DIA acquisitions. Peptide length ranges from 7 to 52 amino acids with 84% of all peptides having a length of 7–20 and 98% a length of 7–30 amino acids (Fig. 4d) which is very similar to what has been seen in other organisms23,24. With regard to peptide modifications, we observed carbamidomethylated cysteines (C[Cam]) introduced during sample preparation, followed to a smaller extent by oxidized methionine (M[Oxi]), and we observed for 1,821 proteins N-terminal protein acetylation N-term[Ac]) (Fig. 4e). In agreement with collision induced dissociation (CID) fragmentation we observed more y-ions (61%) than b-ions (39%) (Fig. 4f), and peptide fragment ions were observed in charge state one (65.5%), two (30.0%), three (4.5%) and four (0.04%) (Fig. 4g).
Assessing the performance of the spectral assay library
To demonstrate the performance of the Labrador spectral assay library, we measured six different tissues (heart, kidney, liver, large intestine, small intestine, stomach) with four gradients of different length (11, 22, 45, and 90 min) and in technical replicates (Fig. 1d). Using the developed Labrador spectral assay library and Spectronaut, we quantified 10,910 to 51,147 peptides (1% FDR, Fig. 5a) and 3,709 to 8,191 proteins (1% FDR, Fig. 5b) per tissue type and replicate with the respective gradient, and 94,490 peptides and 10,140 proteins across all tissues. The highest number of peptide and protein identifications was obtained in kidney tissue across all gradients followed by small intestine tissue with similar results. The quantified proteins showed a dynamic range of four to five orders of magnitude as shown for kidney tissue in Fig. 5c and, in agreement with previous studies, shorter gradients displayed a loss of sensitivity in terms of protein identifications compared to longer gradients. The 90 minutes gradient resulted in the most peptide and protein identifications per tissue type and exhibited with five orders of magnitude the widest dynamic range. However, with the 11 minutes and shortest gradient we were able to quantify 72% of the proteins detected in kidney tissue with the 90 minutes gradient while using only 12% of instrument time (Fig. 5a,b). With the 22 minutes gradient we quantified 81% of the proteins and with the 45 minutes gradient we quantified 92% of the proteins in comparison to the 90 minutes acquisition of kidney tissue. These results demonstrate that the spectral assay library performs well over different gradient lengths and that the shortest gradient clearly provides an advantage when sample throughput is key.
Performance of the Labrador spectral assay library measuring tissue and plasma. Number of (a) identified peptides and (b) identified proteins obtained from six different canine tissues acquired by DIA with gradients of different lengths (11, 22, 45, 90 min). Depicted is the median value of three technical replicates for each tissue and its standard variation. (c) The plot depicts for each gradient length the identified protein groups (1%FDR) in kidney tissue ranked by their abundance highlighting the dynamic range of proteins that can be quantified. Each gradient length (11, 22, 45, 90 min) is represented by a different color. (d) Distribution of the coefficient of variation (CV) of proteins identified in three replicates of kidney tissue (1%FDR) acquired with gradients of different length. Different gradient lengths are color coded. The first and third quartile are marked by a box with whisker indicating the minimum / maximum value and the solid line indicating the median % CV of 10.6% for the 11 minutes, 7.1% for 22 minutes, 6.7% for 45 minutes and 7.0% CV for the 90 minutes gradient. (e) The plot depicts the number of proteins in the annotated canine proteome and in the developed DIA library as well as the number of proteins identified in the DIA analysis in tissues, in kidney tissue and in plasma using a 90 minutes gradient. (f) Principal Component Analysis (PCA) of six tissue and one plasma sample acquired by DIA in three replicates with a 90 minutes gradient. The axes show the variability explained by each component with PC1 representing the most variation in the dataset followed by PC2.
To further assess the performance, we correlated protein abundance measured with the different gradient lengths. As depicted for kidney tissue in Supplementary Fig. 1, the correlation of protein abundance of the 45 minutes with the 90 minutes gradient and the 22 minutes with the 90 minutes gradient resulted in a high positive correlation of r2 = 0.991 and r2 = 0.977, respectively. Even the shortest 11 minutes vs. the 90 minutes gradient showed a high correlation of measured protein abundance with r2 = 0.924. As expected, technical replicates correlated best resulting in a Pearson Correlation of r2 = 0.997. This high correlation highlights the robustness and consistency in quantitation using the Labrador spectral assay library. Examining the coefficient of variation (CV) of quantified proteins (1% FDR) in three technical replicates of kidney tissue, we obtained a median CV of 7% at the 22, 45, and 90 minutes gradient and a CV of 11% at the 11 minutes gradient which exhibits good quantitative precision and coherence with technical variation (Fig. 5d).
In addition to different tissue samples, we included six plasma samples when developing the DIA library, hence, we explored the library performance for plasma analysis. While we quantified 10,140 proteins across tissues and 8,606 proteins in kidney tissue with the developed spectral assay library, we quantified in plasma 1,482 peptides and 385 proteins with 265 proteins being detected in all three replicates (Fig. 5e). These results agree well with other studies reporting the identification of about 200 to 400 proteins in canine plasma based on the respective MS approach and reference proteome28,31,59. Plasma is rich in albumin and immunoglobulins which make up a large proportion of the protein content and mask low-abundant proteins which results in an overall lower number of protein identifications in plasma compared to tissue. This emphasizes the need for the development of effective canine-specific reagents which in combination with other methods (e.g., fractionation) will allow to expand the identification of proteins in canine plasma. Lastly, we evaluated the quality of our data in terms of different sample types and replicate coherence among proteins that are common in all samples using PCA. We observed that the three replicates of each tissue sample clustered tightly showing great and consistent reproducibility, except for plasma where slight variations were observed among replicates. The different tissue types separate overall with liver and kidney tissue being clearly distinct from other tissues, while stomach and large intestine clustering tighter as well as heart and small intestine (Fig. 5f). Plasma proteins in common with all tissues contributed the most variation along principal component 1, accounting for 79% of the variation among tissues and plasma.
In summary, we developed a comprehensive Labrador PeptideAtlas to explore the dog proteome and a spectral assay library for targeted quantitative applications. As previously stated, the ability to accurately measure the composition and abundance of proteins in a tissue or a body fluid is crucial to understand biological processes, including protein abundance changes resulting from physiological and/or psychological stress, ageing or the onset of a disease. With growing interest in the use of proteomics as a platform to identify clinically useful diagnostic markers of health and disease, these resources will provide a foundation to enhance and accelerate proteomics research in canines.
Usage Notes
PeptideAtlas
PeptideAtlas is a web-accessible, publicly available database that offers a comprehensive picture of the Labrador proteome detected in tissue, plasma and urine. The processed data can be explored in different tables and graphs, or peptides and proteins of interest can be specifically queried. The Labrador PeptideAtlas can be explored via https://peptideatlas.org/builds/canine. ‘Full 2024-01 Build Summary’ links to an interactive webpage that provides an overview and a summary of the results per experiment. The Protein and Peptide View pages (click ‘Current Build’ tab → ‘Protein’ or ‘Current Build’ → ‘Peptide’) allow to explore details regarding a specific protein or peptide. The information icons provide explanations, and more step-by-step details on navigating PeptideAtlas are described in Kusebauch et al.60.
Different chromatography setups and LC gradients
In this study all tissue and plasma samples acquired on the timsTOF Pro mass spectrometer were spiked with iRT standard peptides (Biognosys) for optimal retention-time alignment in the spectral assay library development process and for data analysis purposes of samples with various gradient length. When a different chromatography set-up or gradients of different length will be used in combination with the developed library, retention times can be aligned as demonstrated here by spiking each sample with iRT peptides. Alternatively, data analysis programs that support retention time alignment by iRT prediction (e.g., Spectronaut) when spike-in peptides are not present in each sample can be used.
Adjusting the mass isolation width
We used Bruker recommended default PASEF window schemes to acquire the samples for library generation as these provided good results in our hands. If needed, the mass isolation width can be customized based on the denseness of precursors in a sample to further optimize sensitivity and specificity.
Customizing the DIA spectral assay library
The developed DIA spectral assay library can be customized in various ways in SWATHAtlas or in Spectronaut, for example by applying a certain m/z range or considering isotope labeled samples. We recommend to assess a modified spectral assay library for conflict assays with DIALib-QC33. Of note, the developed library provides all fragment ions while we used the top six fragment ions for data analysis. This parameter can be changed in Spectronaut if less or more fragment ions should be desired.
Software compatibility for DIA data analysis
The DIA spectral assay library in this study is provided in Spectronaut format which is compatible with several other software programs with their recommended data analysis workflows. Other DIA software programs may require a different format of the library which can be user generated as typically the same input content is required such as precursor and fragment mass over charge, retention time, relative fragment intensity etc.
Applicability
To develop a PeptideAtlas and DIA spectral assay library for canines, we focused on samples from Labrador retrievers as these provided the best match for the available reference proteome data base from UniProtKB (Canis lupus familiaris, dog, UP000805418) which is based on Labrador dogs, and likewise is the Ensembl Canis lupus familiaris reference based on the Labrador breed. The resources as they currently are do not specifically account for breed-specific variations which may have implication in the context of precision medicine or the effectiveness in the development of vaccines. In general, sequence variants can be detected if they are included in the search space used to analyze the data and if the detection of variants is supported by the analysis tools used for data analysis, an option that is possible but not a great many software tools currently support. While breed specific genetic differences (as well as inter-individual differences) exist among dogs, the dog proteome is largely conserved across breeds. Most proteins involved in essential cellular functions are for example highly conserved across breeds. Furthermore, differences in protein sequences are typically less pronounced than variations in protein expression levels. The established resources provide a solid foundation and can be used to detect and quantify the vast majority of canine proteins beyond the analysis of Labrador specimens.
Data availability
All Labrador PeptideAtlas related data have been deposited to the ProteomeXchange Consortium49 (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository50 and are available under dataset identifier PXD05293234. The Labrador PeptideAtlas is available under https://peptideatlas.org/builds/canine. All spectral assay library related data are available via the PRIDE partner repository50 with dataset identifier PXD05290235. The spectral assay library is also available at www.swathatlas.org (identifier SAL00328-SAL00330). Data Records provides further details.
Code availability
No custom code was used for this study. Data were analyzed with the software programs described in Methods.
References
Salt, C., Saito, E. K., O’Flynn, C. & Allaway, D. Stratification of Companion Animal Life Stages from Electronic Medical Record Diagnosis Data. The Journals of Gerontology: Series A 78, 579–586, https://doi.org/10.1093/gerona/glac220 (2022).
Sutter, N. B. et al. A single IGF1 allele is a major determinant of small size in dogs. Science 316, 112–115, https://doi.org/10.1126/science.1137045 (2007).
Speakman, J. R., van Acker, A. & Harper, E. J. Age-related changes in the metabolism and body composition of three dog breeds and their relationship to life expectancy. Aging Cell 2, 265–275, https://doi.org/10.1046/j.1474-9728.2003.00061.x (2003).
LeBlanc, A. K. et al. Creation of an NCI comparative brain tumor consortium: informing the translation of new knowledge from canine to human brain tumor patients. Neuro Oncol 18, 1209–1218, https://doi.org/10.1093/neuonc/now051 (2016).
Hoffman, J. M., Creevy, K. E., Franks, A., O’Neill, D. G. & Promislow, D. E. L. The companion dog as a model for human aging and mortality. Aging Cell 17, e12737, https://doi.org/10.1111/acel.12737 (2018).
Ehrenzweig, J. & Hunter, R. P. Canine cognitive decline and Alzheimer disease: clinical insights to solve a shared one-health problem. J Am Vet Med Assoc, 1-8, https://doi.org/10.2460/javma.23.02.0095 (2023).
Hayward, J. J. et al. Complex disease and phenotype mapping in the domestic dog. Nature Communications 7, 10460, https://doi.org/10.1038/ncomms10460 (2016).
Creevy, K. E., Akey, J. M., Kaeberlein, M. & Promislow, D. E. L. An open science study of ageing in companion dogs. Nature 602, 51–57, https://doi.org/10.1038/s41586-021-04282-9 (2022).
Kaeberlein, M., Creevy, K. E. & Promislow, D. E. The dog aging project: translational geroscience in companion animals. Mamm Genome 27, 279–288, https://doi.org/10.1007/s00335-016-9638-7 (2016).
Wang, T. et al. Quantitative Translation of Dog-to-Human Aging by Conserved Remodeling of the DNA Methylome. Cell Syst 11, 176–185.e176, https://doi.org/10.1016/j.cels.2020.06.006 (2020).
Sándor, S., Urfer, S. & Kubinyi, E. Toward establishing a worldwide net of canine biobanks. Aging (Albany NY) 14, 2436–2437, https://doi.org/10.18632/aging.203961 (2022).
Mouttham, L. & Castelhano, M. G. Purpose, Partnership, and Possibilities: The Implementation of the Dog Aging Project Biobank. Biomark Insights 17, 11772719221137217, https://doi.org/10.1177/11772719221137217 (2022).
Alexander, J. E. et al. The MARS PETCARE BIOBANK protocol: establishing a longitudinal study of health and disease in dogs and cats. BMC Vet Res 19, 125, https://doi.org/10.1186/s12917-023-03691-4 (2023).
Lindblad-Toh, K. et al. Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature 438, 803–819, https://doi.org/10.1038/nature04338 (2005).
Schoenebeck, J. J. & Ostrander, E. A. Insights into Morphology and Disease from the Dog Genome Project. Annual Review of Cell and Developmental Biology 30, 535–560, https://doi.org/10.1146/annurev-cellbio-100913-012927 (2014).
González-Arostegui, L. G., Rubio, C. P., Cerón, J. J., Tvarijonaviciute, A. & Muñoz-Prieto, A. Proteomics in dogs: a systematic review. Res Vet Sci 143, 107–114, https://doi.org/10.1016/j.rvsc.2021.12.026 (2022).
Ahn, H. S. et al. Generating Detailed Spectral Libraries for Canine Proteomes Obtained from Serum and Urine. Sci Data 10, 241, https://doi.org/10.1038/s41597-023-02139-6 (2023).
Miller, I. et al. Some more about dogs: Proteomics of neglected biological fluids. J Proteomics 218, 103724, https://doi.org/10.1016/j.jprot.2020.103724 (2020).
Aebersold, R. & Mann, M. Mass-spectrometric exploration of proteome structure and function. Nature 537, 347–355, https://doi.org/10.1038/nature19949 (2016).
Carrasco-Zanini, J. et al. Proteomic signatures improve risk prediction for common and rare diseases. Nat Med 30, 2489–2498, https://doi.org/10.1038/s41591-024-03142-z (2024).
Collins, B. C. et al. Multi-laboratory assessment of reproducibility, qualitative and quantitative performance of SWATH-mass spectrometry. Nature Communications 8, 291, https://doi.org/10.1038/s41467-017-00249-5 (2017).
Schubert, O. T. et al. Building high-quality assay libraries for targeted analysis of SWATH MS data. Nat. Protocols 10, 426–441, https://doi.org/10.1038/nprot.2015.015 (2015).
Kusebauch, U. et al. A comprehensive spectral assay library to quantify the Halobacterium salinarum NRC-1 proteome by DIA/SWATH-MS. Sci Data 10, 697, https://doi.org/10.1038/s41597-023-02590-5 (2023).
Midha, M. K. et al. A comprehensive spectral assay library to quantify the Escherichia coli proteome by DIA/SWATH-MS. Scientific Data 7, 389, https://doi.org/10.1038/s41597-020-00724-7 (2020).
Blattmann, P. et al. Generation of a zebrafish SWATH-MS spectral library to quantify 10,000 proteins. Scientific Data 6, 190011, https://doi.org/10.1038/sdata.2019.11 (2019).
Zhong, C.-Q. et al. Generation of a murine SWATH-MS spectral library to quantify more than 11,000 proteins. Scientific Data 7, 104, https://doi.org/10.1038/s41597-020-0449-z (2020).
Rosenberger, G. et al. A repository of assays to quantify 10,000 human proteins by SWATH-MS. Scientific Data 1, 140031, https://doi.org/10.1038/sdata.2014.31 (2014).
Doulidis, P. G., Kuropka, B., Frizzo Ramos, C., Rodríguez-Rojas, A. & Burgener, I. A. Characterization of the plasma proteome from healthy adult dogs. Front Vet Sci 11, 1356318, https://doi.org/10.3389/fvets.2024.1356318 (2024).
Doulidis, P. G., Ramos, C. F., Rodriguez-Rojas, A., Roth-Walter, F. & Burgener, I. A. Properdin, transcortin and HGFAC are novel plasma biomarkers in canine chronic inflammatory enteropathies from active disease to remission. Sci Rep 15, 25443, https://doi.org/10.1038/s41598-025-11474-0 (2025).
Gaither, C., Popp, R., Borchers, C. H., Beaudry, F. & Desmarchelier, M. Plasma Proteomic Profiling of a Group of Anxious Dogs by LC-MS/MS: A Case-Control Study. Proteomics Clin Appl 19, e70014, https://doi.org/10.1002/prca.70014 (2025).
Huber, L. et al. Plasma proteome signature of canine acute haemorrhagic diarrhoea syndrome (AHDS). PLoS One 19, e0297924, https://doi.org/10.1371/journal.pone.0297924 (2024).
Martinković, F. et al. Data Independent Acquisition Reveals In-Depth Serum Proteome Changes in Canine Leishmaniosis. Metabolites 13, https://doi.org/10.3390/metabo13030365 (2023).
Midha, M. K. et al. DIALib-QC an assessment tool for spectral libraries in data-independent acquisition proteomics. Nature Communications 11, 5251, https://doi.org/10.1038/s41467-020-18901-y (2020).
Kusebauch, U. E. A. PRIDE Archive https://identifiers.org/pride.project:PXD052932 (2024).
Kusebauch, U. E. A. PRIDE Archive https://identifiers.org/pride.project:PXD052902 (2024).
Leutert, M., Rodríguez-Mias, R. A., Fukuda, N. K. & Villén, J. R2-P2 rapid-robotic phosphoproteomics enables multidimensional cell signaling studies. Mol Syst Biol 15, e9021, https://doi.org/10.15252/msb.20199021 (2019).
Escher, C. et al. Using iRT, a normalized retention time for more targeted measurement of peptides. Proteomics 12, 1111–1121, https://doi.org/10.1002/pmic.201100463 (2012).
Martens, L. et al. mzML - a Community Standard for Mass Spectrometry Data. Molecular & Cellular Proteomics 10, R110.000133, https://doi.org/10.1074/mcp.R110.000133 (2011).
Kessner, D., Chambers, M., Burke, R., Agus, D. & Mallick, P. ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics 24, 2534–2536, https://doi.org/10.1093/bioinformatics/btn323 (2008).
Eng, J. K., Jahan, T. A. & Hoopmann, M. R. Comet: An open-source MS/MS sequence database search tool. PROTEOMICS 13, 22–24, https://doi.org/10.1002/pmic.201200439 (2013).
Kong, A. T., Leprevost, F. V., Avtonomov, D. M., Mellacheruvu, D. & Nesvizhskii, A. I. MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nat Methods 14, 513–520, https://doi.org/10.1038/nmeth.4256 (2017).
Rad, R. et al. Improved Monoisotopic Mass Estimation for Deeper Proteome Coverage. J Proteome Res 20, 591–598, https://doi.org/10.1021/acs.jproteome.0c00563 (2021).
Deutsch, E. W. et al. Trans-Proteomic Pipeline, a standardized data processing pipeline for large-scale reproducible proteomics informatics. PROTEOMICS – Clinical Applications 9, 745–754, https://doi.org/10.1002/prca.201400164 (2015).
Deutsch, E. W. et al. Trans-Proteomic Pipeline: Robust Mass Spectrometry-Based Proteomics Data Analysis Suite. J Proteome Res 22, 615–624, https://doi.org/10.1021/acs.jproteome.2c00624 (2023).
Keller, A., Nesvizhskii, A., Kolker, E. & Aebersold, R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal Chem 74, 5383–5392, https://doi.org/10.1021/ac025747h (2002).
Shteynberg, D. et al. iProphet: Multi-level Integrative Analysis of Shotgun Proteomic Data Improves Peptide and Protein Identification Rates and Error Estimates. Molecular & Cellular Proteomics 10, M111.007690, https://doi.org/10.1074/mcp.M111.007690 (2011).
Shteynberg, D. et al. reSpect: software for identification of high and low abundance ion species in chimeric tandem mass spectra. J Am Soc Mass Spectrom 26, 1837–1847, https://doi.org/10.1007/s13361-015-1252-5 (2015).
Nesvizhskii, A., Keller, A., Kolker, E. & Aebersold, R. A statistical model for identifying proteins by tandem mass spectrometry. Anal Chem 75, 4646–4658, https://doi.org/10.1021/ac0341261 (2003).
Vizcaino, J. A. et al. ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat Biotech 32, 223–226, https://doi.org/10.1038/nbt.2839 (2014).
Vizcaíno, J. A. et al. The PRoteomics IDEntifications (PRIDE) database and associated tools: status in 2013. Nucleic Acids Res 41, D1063–1069, https://doi.org/10.1093/nar/gks1262 (2013).
Desiere, F. et al. The PeptideAtlas project. Nucleic Acids Research 34, D655–D658, https://doi.org/10.1093/nar/gkj040 (2006).
Deutsch, E. W., Lam, H. & Aebersold, R. PeptideAtlas: a resource for target selection for emerging targeted proteomics workflows. EMBO Rep 9, 429–434, https://doi.org/10.1038/embor.2008.56 (2008).
Bislev, S. L. et al. A Bovine PeptideAtlas of milk and mammary gland proteomes. PROTEOMICS 12, 2895–2899, https://doi.org/10.1002/pmic.201200057 (2012).
Vialas, V. et al. A Candida albicans PeptideAtlas. J Proteomics 97, 62–68, https://doi.org/10.1016/j.jprot.2013.06.020 (2014).
van Wijk, K. J. et al. The Arabidopsis PeptideAtlas: Harnessing worldwide proteomics data to create a comprehensive community proteomics resource. Plant Cell 33, 3421–3453, https://doi.org/10.1093/plcell/koab211 (2021).
Saleem, R. A. et al. Integrated phosphoproteomics analysis of a signaling network governing nutrient response and peroxisome induction. Molecular & cellular proteomics: MCP 9, 2076–2088, https://doi.org/10.1074/mcp.M000116-MCP201 (2010).
Bergström Lind, S. et al. Immunoaffinity enrichments followed by mass spectrometric detection for studying global protein tyrosine phosphorylation. J Proteome Res 7, 2897–2910, https://doi.org/10.1021/pr8000546 (2008).
Martyn, G. D. et al. Engineered SH2 Domains for Targeted Phosphoproteomics. ACS Chem Biol 17, 1472–1484, https://doi.org/10.1021/acschembio.2c00051 (2022).
Zheng, W. B. et al. Proteomic alterations in the plasma of Beagle dogs induced by Toxocara canis infection. J Proteomics 232, 104049, https://doi.org/10.1016/j.jprot.2020.104049 (2021).
Kusebauch, U. et al. in Current Protocols in Bioinformatics 46:13.25.41-13.25.28. (John Wiley & Sons, Inc., 2014).
Acknowledgements
This project was supported financially in part by Mars Advanced Research Institute and Mars Petcare, both subsidiaries of Mars Inc., by the National Institutes of Health grants from the National Institute for General Medical Sciences (NIGMS) R01GM087221 (RLM, EWD), the Office of the Director S10OD026936 (RLM), the National Science Foundation awards 1920268 (RLM) and DBI-2324882 (EWD). We thank Christopher Moss (Institute for Systems Biology) for technical assistance and Luis Mendoza (Institute for Systems Biology) for supporting the canine PeptideAtlas website.
Author information
Authors and Affiliations
Contributions
U.K., D.A., P.W. and R.L.M. conceptualized study, D.A. provided samples, M.M., D.H.B., S.R.M. and P.J.R. prepared samples, M.K.M. and P.J.R. acquired data, U.K., Z.S., M.K.M., C.K., C.O.F. and E.W.D. analyzed and interpreted data, Z.S. generated PeptideAtlas build, M.K.M. generated spectral assay library, C.K. performed DIALib-QC assessment, U.K. wrote manuscript with contributions from all authors.
Corresponding authors
Ethics declarations
Competing interests
C.O.F., P.W. and D.A. are employees of Waltham Petcare Science Institute, a part of Mars Petcare. The funders had no role in the design, analysis or writing of this article.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
41597_2026_6647_MOESM1_ESM.pdf (download PDF )
Supplementary information: A Labrador PeptideAtlas and DIA spectral assay library - resources for proteomics research in dogs
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Kusebauch, U., Sun, Z., Midha, M.K. et al. A Labrador PeptideAtlas and DIA spectral assay library - resources for proteomics research in dogs. Sci Data 13, 524 (2026). https://doi.org/10.1038/s41597-026-06647-z
Received:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1038/s41597-026-06647-z







