Introduction

Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related death worldwide and a leading cause of cancer-related death in patients with cirrhosis1. Prognosis differs by tumor stage, with curative treatments facilitating long-term survival in patients with early-stage disease2. Ultrasound-based surveillance misses over one-third of HCC at an early stage, particularly in patients with non-viral etiologies, such as metabolic dysfunction-associated steatotic liver disease (MASLD)3,4. Alpha-fetoprotein (AFP) is the only biomarker to complete all five phases of biomarker validation but has insufficient sensitivity for early-stage HCC detection, highlighting a need for novel biomarkers5,6.

N-glycosylation of glycoproteins is known to be dysregulated in many cancer types and has been shown to be aberrant in both tissue and serum of patients with HCC7,8,9,10,11,12,13,14,15,16,17,18. N-glycosylation on the tumor surface has been demonstrated to impact both HCC tumor proliferation and metastasis through the upregulation of growth factor receptor signaling19. Previous work has shown an increase in branched and fucosylated N-glycans in HCC tissue when compared to cirrhotic or normal liver tissue18. Additionally, abnormal N-glycosylation has been observed in HCC serum, which includes the differential N-glycosylation of several identified biomarkers of HCC, such as AFP-L313,14. However, prior studies have not connected tissue N-glycosylation with serum N-glycosylation. Aberrant N-glycosylation is often heterogeneous between tumors, and there is limited understanding of how tumor-specific N-glycosylation translates to serum glycoproteins, even though many of those glycoproteins originate from the liver. Recent advancements have been made in the analysis of N-glycans via spatial omics matrix-assisted laser desorption-ionization (MALDI) imaging mass spectrometry (IMS) for both tissue and serum samples, including for the specific analysis of serum glycoproteins through antibody arrays18,20,21,22.

To address the disconnect between HCC-correlated tissue N-glycosylation and HCC-correlated serum N-glycosylation, and to find more cancer-related biomarkers of HCC, we utilized a sample cohort of 53 early-stage HCC tissue samples from 48 patients, including 23 cases with matched tumor tissue, background liver tissue, and serum samples. These samples were analyzed with MALDI-IMS N-glycan imaging workflows, which identified specific glycans that were altered in tissue and serum. Subsequently, glycoproteomics identified the serum glycoproteins that contained these glycan changes. These biomarkers were then analyzed using a high-throughput slide-based glycan analysis method called the GlycoTyper23. This allowed for the simultaneous glycan analysis of the identified glycoproteins in 201 patients, 100 of whom had HCC in the background of cirrhosis and 101 of whom had cirrhosis alone. Combinations of altered glycans on the identified glycoproteins were incorporated into a machine-learning algorithm that also included age, gender, and AFP to develop biomarkers of HCC.

Materials and methods

Patient tissues and serum

Formalin-fixed paraffin-embedded (FFPE, 5 µm thick) tissues were sectioned from 53 HCC tissue blocks that were obtained through surgical resection of HCC in the background of liver cirrhosis. Samples were provided by the UT Southwestern Medical Center under the UT Southwestern Medical Center-approved IRB 102010-051, and patients consented to research use. Additionally, 23 serum samples matched to patients in the tissue cohort were provided by the UT Southwestern Medical Center under the same IRB, and patients consented to research use. All N-glycan imaging work at MUSC was performed under MUSC-approved IRB Pro00079936.

HCC tissues were subtyped according to the Hoshida classification system, as done previously24. All tissues were H&E-stained following MALDI-IMS analysis, and tumor regions were annotated by a liver pathologist. Patient characteristics of the 48 patients (who provided the 53 tumors) and the 23 cases with matching serum are found in Table 1.

Table 1 Demographic/clinical information of tissue sample cohort

A second set of serum samples was obtained from the University of California at San Diego25,26,27 and consisted of 100 samples from patients with HCC in the background of cirrhosis and 101 samples from patients with cirrhosis alone (Table 2). In all cases, serum samples were obtained via a study protocol approved by the appropriate Institutional Review Board and written informed consent was obtained from each subject. Blood samples for HCC patients were drawn prior to initiation of HCC treatment.

Table 2 Patients used for biomarker analysis

For all patients, diagnosis of cirrhosis was based on liver histology or on clinical, laboratory, and imaging evidence of hepatic decompensation or portal hypertension. Each non-HCC patient had a normal ultrasound; if serum AFP was elevated, a CT or MRI showed no liver mass. For HCC patients, the diagnosis of HCC was made per AASLD guidelines28 using histopathology or characteristic diagnostic imaging. Early-stage HCC was defined using the Milan Criteria29. As previously described, demographic and clinical information were obtained, and a blood sample was collected from each subject prior to initiation of any HCC treatment27. A 20-ml blood sample was drawn from each subject, spun, and aliquoted, and serum was stored at –80 °C until testing.

FFPE tissue preparation for MALDI-IMS

HPLC-grade methanol, ethanol, acetonitrile, xylene, and water were obtained from Fisher Scientific (Pittsburgh, PA). Trifluoroacetic acid and α-cyano-4-hydroxycinnamic acid were obtained from Sigma-Aldrich (St. Louis, MO). Peptide-N-glycosidase F (PNGase F) Prime and Sialidase Prime were obtained from N-Zyme Scientific (Doylestown, PA).

FFPE tissues were prepared according to a previously published protocol30. Tissue Tack microscope slides were purchased from Polysciences Inc (Warrington, PA, Catalog No. 24216). Slides were taken through dewaxing and wash steps before being processed by antigen retrieval in a Decloaking Chamber in 10 mM citraconic anhydride buffer, pH 3. PNGase F Prime and Sialidase Prime were applied simultaneously using an M5 TM-Sprayer Tissue MALDI Sample Preparation System (HTX Technologies, LLC, Chapel Hill, NC), and enzymes were incubated for 2 h at 37 °C. MALDI matrix α-cyano-4-hydroxycinnamic acid (0.042 g CHCA in 6 mL 50% acetonitrile/49.9% water/0.1% TFA) was sprayed by the M5 TM-Sprayer.

N-Glycomic total serum preparation

Serum samples were prepared for total serum N-glycan MALDI-IMS analysis through a previously published protocol21. Hydrogel-coated slides (Nexterion Slide H) were obtained from Applied Microarrays (Tempe, AZ). Serum samples were spotted and immobilized on the slide surface and then washed to remove salts and lipids from the sample. Each sample was spotted in triplicate. N-glycans were then enzymatically released through the same methodology of spraying/incubating PNGase F Prime/Sialidase Prime as tissue samples.

GlycoTyper

Serum samples were prepared for glycoprotein-specific N-glycan MALDI-IMS analysis through a previously published antibody array protocol20. Glycerol-free antibodies were tested for their ability to bind to their target glycoprotein, but not any other glycoproteins, using methods described elsewhere20. Only antibodies specific to their target were used for analysis. Antibodies were spotted on nitrocellulose-coated microscope slides (Grace Bio-Labs, Bend, OR). The following antibodies were used for the GlycoTyper experiment and purchased from Abcam (Cambridge, MA): Rabbit anti-Alpha 1 Antitrypsin (#ab240375), Rabbit anti-Ceruloplasmin (#ab249323), and Rabbit anti-Clusterin (#ab229445). Mouse anti-Human Alpha 1B-Glycoprotein (#MAB7757), Goat anti-Human alpha 1-Acid Glycoprotein (#AF3694), Mouse anti-Human Apolipoprotein H (#MAB5087), Mouse anti-Human Fetuin A/AHSG (#MAB1184), Mouse anti-Human HPRG (#MAB1869), and Mouse anti-Human Vitamin D Binding Protein (#MAB3778) antibodies were obtained from R&D Systems (Minneapolis, MN). Mouse anti-Alpha-2-Macroglobulin (#2-MM3-P1) antibody was obtained from NeoBiotechnologies (Union City, CA). Mouse anti-angiotensinogen II/III (#NB100-62346) antibody was obtained from Novus Biologicals (Centennial, CO). Mouse anti-Apolipoprotein D (#10R-A137b) and Goat anti-Haptoglobin (#70R-7558) antibodies were obtained from Fitzgerald (Acton, MA). Mouse anti-Hemopexin (#ABA-133202) antibody was obtained from Fisher Scientific (Hampton, NH). Goat anti-Human IgG (#A80-104A) and Goat anti-Human Transferrin (#A80-128A) antibodies were obtained from Bethyl Laboratories (Montgomery, TX).

The antibodies were first diluted in 1× phosphate-buffered saline (PBS 10×, BP39920 from Fisher Scientific) and HPLC-grade water (W5-4 from Fisher Scientific) to a final concentration of 40 ng/µl. Using a Picus Electronic Single Channel Pipette from Sartorius (Goettingen, Germany), 1 µl of each antibody was spotted on nitrocellulose slides (SKU470639 from Grace Bio-Labs), on top of which were clipped ProPlate 8-well slide modules (Path Protein Microarray Slides, 25 × 25 × 1 mm, #805025 from Grace Bio-Labs, Bend, OR). The well chambers were mounted to the slide to create separated regions for serum incubation. Each region had 8 unique antibodies spotted at specified positions, with 16 total unique antibodies for analysis. Serum samples were incubated in duplicate in each of the two antibody arrays. N-glycans were enzymatically released via the same protocol as for tissue and total serum analysis.

MALDI-IMS N-glycan imaging

Slides were imaged on a MALDI-TOF mass spectrometer (timsTOF flex, Bruker Daltonics) in positive ion mode (m/z 600–5000 for tissue and m/z 700–4000 for serum assays). Images were collected with 200 laser shots per pixel, at a 150 µm raster for tissue and a 100 µm raster for serum assays. Data were visualized and analyzed using SCiLS Lab 2022 for tissue and SCiLS Lab 2024a for serum (Bruker). Supplementary Data File 1 contains the raw intensities for all glycans in all tissue samples, and Supplementary Data File 2 contains the raw intensities for all glycans in all serum samples. These raw values were used to generate the heat maps shown in the figures. Peaks were assigned to N-glycan structures utilizing a previously developed database with consideration for the biosynthetic pathways of N-glycans18,31. Not all N-glycans observed in tissue were also seen in serum samples. Proposed N-glycan structures and corresponding m/z values can be found in Supplementary Data File 3, and those glycans whose levels correlated in tissue and serum are found in Supplementary Table S1.
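As an illustration of how a composition such as Hex7HexNAc6Fuc1 relates to an observed peak, the sketch below computes a theoretical m/z from standard monoisotopic residue masses. It assumes singly charged sodium adducts ([M+Na]+) of the free, underivatized glycan; that assumption is ours and is not stated in the methods, although the resulting values fall within about 0.01 of the m/z values reported for these compositions. The function name `sodiated_mz` is hypothetical, and this is not the database lookup used in the study.

```python
import re

# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {"Hex": 162.05282, "HexNAc": 203.07937, "Fuc": 146.05791}
WATER = 18.01056       # added once for the free reducing-end glycan
SODIUM_ION = 22.98922  # ASSUMPTION: singly charged [M+Na]+ adduct


def sodiated_mz(composition):
    """Theoretical [M+Na]+ m/z for a composition string like 'Hex5HexNAc2'."""
    total = WATER + SODIUM_ION
    # 'HexNAc' is listed before 'Hex' so the longer residue name is tried first.
    for residue, count in re.findall(r"(HexNAc|Hex|Fuc)(\d+)", composition):
        total += RESIDUE_MASS[residue] * int(count)
    return total


for name in ("Hex5HexNAc2", "Hex6HexNAc5Fuc1", "Hex7HexNAc6Fuc1"):
    print(name, round(sodiated_mz(name), 3))
```

A peak would then be assigned to a composition when the observed m/z falls within an instrument-appropriate tolerance of the theoretical value, with biosynthetic plausibility used to choose among isobaric candidates.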

GlycoProteomics for identification of fucosylated peptides

Serum samples were prepared using the EasyPep 96 MS Sample Prep Kit (Thermo Fisher Scientific, Waltham, MA) with minor modifications to the manufacturer’s instructions. Briefly, 20 µL of serum was denatured with 80 µL of lysis solution. Cysteine residues were reduced and alkylated with iodoacetamide at 50 °C for 30 min. Proteins were incubated with Endoglycosidase F3 (Endo F3; N-Zyme Scientific) for 1 h at 37 °C. Subsequently, samples were incubated with PNGase F (N-Zyme Scientific) for 2 h at 37 °C32. One µg of trypsin-Lys C endoprotease digestion mixture was added to each sample, and proteins were digested overnight at 37 °C with shaking at 300 rpm. The reaction was quenched with a stop solution, and the samples were transferred to C18 spin columns to remove salts and detergents.

Eluted peptides were dried by vacuum centrifugation and resuspended in 1× PBS. HexNAcFuc-modified peptides were enriched with a recombinant Aleuria aurantia lectin (AAL N224Q) coupled to nickel beads33. Following enrichment, samples were acidified and desalted by solid-phase extraction using ZipTip with 0.6 µL C18 resin (Millipore, Burlington, MA). Enriched peptides were dried under vacuum and resolubilized in 5% acetonitrile, 0.2% formic acid in LC-MS/MS grade water immediately prior to LC-MS/MS analysis.

LC-MS/MS acquisition

Peptides were separated and analyzed with an Easy nLC 1200 in-line with the Orbitrap Exploris 480 mass spectrometer (Thermo Scientific, Waltham, MA) with instrument control software v. 4.2.28.14. Two µg of peptides were pressure loaded onto a C18 reversed-phase column (Acclaim PepMap RSLC, 75 µm × 25 cm (2 µm, 100 Å) Thermo Fisher cat. # 164941) and separated using a gradient of 0–35% B in 90 min at a flow rate of 300 nL/min. Solvent A was 5% acetonitrile, 0.2% formic acid and solvent B was 80% acetonitrile, 0.2% formic acid. Mass spectra were acquired in data-dependent mode with a high resolution (60,000) FTMS survey scan, a mass range of m/z 375–1575, followed by tandem mass spectra (MS/MS) of the most intense precursors with a cycle time of 3 sec. The automatic gain control was set to 300% for the survey MS scan and 100% for the MS/MS scan. HCD fragmentation was performed with a precursor isolation window of 1.4 m/z, a maximum injection time of 40 ms, and HCD collision energy of 33%. The MS/MS scan was acquired at 15,000 resolution. Monoisotopic-precursor selection was set to “peptide”. Precursors within 10 ppm mass tolerance were dynamically excluded from resequencing for 20 sec. Advanced peak determination was enabled. Precursor ions with charge states that were undetermined, 1, or >6 were excluded.

Database searching and quantitation

Raw files were searched using MaxQuant v2.4.2.0 (Max Planck Institute) against a human UniProt protein database (20,422 sequences, updated March 2023) and a reversed, decoy database. Fixed modification of cysteine with carboxyamidomethylation and variable oxidation of methionine, phosphorylation of serine or threonine, and protein N-terminal acetylation were included. Custom modifications for O-HexNAcylated serine or threonine (OGlcNAc_NL_D) and N-linked HexNAc fucose modification of asparagine (HexNAcFuc) were created in Andromeda. The O-linked HexNAc included neutral loss of C8H13NO5 (203.0794 Th) and diagnostic HexNAc ions. The N-linked HexNAcFuc modification had a composition of C14H23NO9 (349.1373 Th), a neutral loss composition of C6H10O4 (146.0579 Th), and diagnostic HexNAc ions. HexNAc ion masses are listed in Supplementary Data File 4: Tab 2. Two missed cleavages were permitted. The final precursor and fragment mass tolerances were less than 4.5 ppm. For identification, a 1% false discovery rate (FDR) threshold was required at the spectrum, peptide, and protein levels. Matching between runs was enabled. Data were processed in Perseus v1.6.15.0 (Max Planck Institute). Peptide intensity values from the modification-specific peptide table were log2 transformed and median normalized. After normalization, a value of 15 was added to each intensity to avoid negative intensity values. The list of peptides was filtered to remove reversed database hits, peptides without HexNAc Fucose, and peptides with more than 3 modifications. The fold change in mean log2 peptide intensities from HCC and cirrhotic serum samples was compared and visualized by plotting against the Student’s t test −log10 p value. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository34 with the dataset identifier PXD047546.
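The intensity processing described above (log2 transformation, median normalization, a constant offset of 15, and a fold change in mean log2 intensity) can be sketched as follows. This is a minimal illustration rather than the Perseus workflow itself: it assumes that median normalization means subtracting each sample's median in log2 space, and the function names, sample names, and intensities are invented. The t test that supplies the volcano plot's −log10 p value is omitted.

```python
import math
import statistics


def normalize_intensities(samples, shift=15.0):
    """log2-transform, median-center each sample, then add a constant shift.

    `samples` maps sample name -> {peptide: raw intensity}. Zero intensities
    are skipped because log2 is undefined for them.
    ASSUMPTION: median normalization = subtracting the per-sample median.
    """
    normalized = {}
    for name, peptides in samples.items():
        logged = {p: math.log2(v) for p, v in peptides.items() if v > 0}
        center = statistics.median(logged.values())
        normalized[name] = {p: v - center + shift for p, v in logged.items()}
    return normalized


def log2_fold_change(normalized, cases, controls, peptide):
    """Difference in mean normalized log2 intensity (cases minus controls)."""
    case_vals = [normalized[s][peptide] for s in cases if peptide in normalized[s]]
    ctrl_vals = [normalized[s][peptide] for s in controls if peptide in normalized[s]]
    return statistics.mean(case_vals) - statistics.mean(ctrl_vals)


# Toy data, invented for illustration only.
raw = {
    "hcc_1": {"pepA": 4096.0, "pepB": 1024.0, "pepC": 256.0},
    "cir_1": {"pepA": 1024.0, "pepB": 1024.0, "pepC": 1024.0},
}
norm = normalize_intensities(raw)
fc = log2_fold_change(norm, ["hcc_1"], ["cir_1"], "pepA")
print(fc)
```

Because the offset is added to every value, it cancels in the fold change and only serves to keep the normalized intensities positive.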

Statistical data analysis

In both tissue and serum, all N-glycans in the peak list were normalized to relative abundance to yield a relative quantification; this normalization enhanced robustness.
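A minimal sketch of relative-abundance normalization, assuming it means dividing each peak intensity by the summed intensity of all peaks in the list for that sample. The function name and the intensities are hypothetical.

```python
def relative_abundance(peak_intensities):
    """Scale a {m/z label: intensity} mapping so that the values sum to 1."""
    total = sum(peak_intensities.values())
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return {mz: value / total for mz, value in peak_intensities.items()}


# Invented intensities for three N-glycan peaks in one sample.
spectrum = {"2174.774": 50.0, "2393.848": 30.0, "2539.907": 20.0}
rel = relative_abundance(spectrum)
print(rel)
```

Expressing each glycan as a fraction of the total signal removes sample-to-sample differences in overall intensity, at the cost of making the values compositional (an increase in one glycan lowers the fractions of the others).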

The data distribution of each N-glycan was visualized, outliers were identified, and outlying values were re-evaluated if required.

Univariate statistical methods were applied to each N-glycan: the distribution of each N-glycan, the change in mean and median between HCC and cirrhosis, and the corresponding AUC were evaluated. The associations between the “interesting” glycans were analyzed by clustering and correlation analysis.

The data distributions of each glycan from matched tissue and serum were tested with the Shapiro-Wilk test. If the results indicated that they were not significantly different from a normal distribution, we derived Pearson correlation coefficients and obtained the p value; otherwise, the Spearman rank correlation test was applied.
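The decision rule above can be sketched as follows. To keep the example free of third-party dependencies, the normality check is passed in as a caller-supplied function (in practice, a wrapper around a Shapiro-Wilk implementation such as `scipy.stats.shapiro` would be a natural choice), and only the correlation coefficient is computed, not its p value. All function names and the toy values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def tissue_serum_correlation(tissue, serum, looks_normal):
    """Use Pearson when both series pass the normality check, else Spearman."""
    if looks_normal(tissue) and looks_normal(serum):
        return "pearson", pearson(tissue, serum)
    return "spearman", spearman(tissue, serum)


# Invented values for one glycan in five matched tissue/serum pairs.
tissue = [1.0, 2.0, 3.0, 4.0, 50.0]
serum = [2.0, 3.0, 5.0, 7.0, 11.0]
method, rho = tissue_serum_correlation(tissue, serum, lambda v: False)
print(method, rho)
```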

Predictive modeling with machine learning algorithm

To avoid bias caused by the differing scales of the N-glycans when building machine learning models, z-transformation was applied to each variable. The data used for the development of the Spatial Omics associated Machine Learning (SOML) algorithms were considered high-dimensional, with 16 proteins, each with 83 N-glycans, for a total of 1328 variables (Supplementary Data File 5). We applied feature selection to combat issues often related to the high dimensionality of omics data. Filter techniques, such as Information Gain, Importance, Joint Mutual Information Maximization, and the AUC of each N-glycan, were evaluated; redundant, non-informative, and highly correlated variables were removed from analysis.

Subsequently, we applied wrapper methods to search for subsets of glycans that optimized discrimination and prediction (internal cross-validation) performance, using AUC as the criterion. Sensitivity, specificity, and partial ROC-9027 were the criteria for further search.

We prepared a set of candidate machine learning algorithms based on excellent performance in our previous glycan studies, such as Support Vector Machines, Gradient Boosting Machine, Logistic Regression, Random Forest, Bayesian Additive Regression Tree, Gradient Boosting with Component-wise Linear Model, Gradient Boosting for Additive Model, Multivariate Adaptive Regression Splines, Extreme Gradient Boosting, Logistic Model Tree, etc.35,36,37.

Next, we analyzed each candidate machine learning algorithm in turn to explore the feature spaces established by a random feature generation algorithm, a sequential selection algorithm, a genetic algorithm, and a recursive feature elimination algorithm in threefold cross-validation search settings. Features were selected on the basis of top predictive performance in the threefold cross-validation.

We further explored the performance of the optimal feature subsets derived from the preceding feature selection with their corresponding machine learning algorithms. Discriminant ability was explored by apparent validation (AP), and predictive ability was evaluated by leave-one-out cross-validation (LOOCV), 3-fold cross-validation, and 5-fold cross-validation, as we have done previously37,38.

To ensure the robustness of the final algorithms, we randomly shuffled the samples 200 times for the 3-fold and 5-fold CV (repeated k-fold cross-validation).
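Repeated k-fold cross-validation can be sketched as below: the samples are reshuffled before each repeat, every sample is scored by a model trained without its fold, and the AUC is summarized across repeats. To stay dependency-free, a simple nearest-centroid projection stands in for the learners named above; it is not the classifier used in the study, and the simulated data, function names, and seeds are all hypothetical.

```python
import random
import statistics


def auc(scores, labels):
    """Probability that a random case outscores a random control (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid_scores(train_x, train_y, test_x):
    """STAND-IN classifier: project each test sample onto the difference of
    the class means, centered on the midpoint between the two means."""
    n_feat = len(train_x[0])

    def mean_vec(label):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        return [statistics.fmean(r[j] for r in rows) for j in range(n_feat)]

    mu1, mu0 = mean_vec(1), mean_vec(0)
    return [sum((a - b) * (xj - (a + b) / 2) for a, b, xj in zip(mu1, mu0, x))
            for x in test_x]


def repeated_kfold_auc(x, y, k=5, repeats=200, seed=0):
    """Mean and SD of the pooled out-of-fold AUC over reshuffled k-fold splits."""
    rng = random.Random(seed)
    indices = list(range(len(x)))
    aucs = []
    for _ in range(repeats):
        rng.shuffle(indices)                      # new random partition
        folds = [indices[i::k] for i in range(k)]
        scores = [0.0] * len(x)
        for fold in folds:
            held_out = set(fold)
            train = [i for i in indices if i not in held_out]
            fold_scores = centroid_scores([x[i] for i in train],
                                          [y[i] for i in train],
                                          [x[i] for i in fold])
            for i, s in zip(fold, fold_scores):
                scores[i] = s
        aucs.append(auc(scores, y))
    return statistics.mean(aucs), statistics.stdev(aucs)


# Simulated data: 30 cases shifted by one SD on three features, 30 controls.
data_rng = random.Random(1)
x = ([[data_rng.gauss(1.0, 1.0) for _ in range(3)] for _ in range(30)]
     + [[data_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(30)])
y = [1] * 30 + [0] * 30
for k in (3, 5):
    mean_auc, sd_auc = repeated_kfold_auc(x, y, k=k, repeats=200, seed=0)
    print(k, round(mean_auc, 3), round(sd_auc, 3))
```

The spread of the AUC across the 200 reshuffles indicates how much a single cross-validation estimate depends on the particular partition of the samples. The splits here are not stratified by class, which is adequate for this balanced toy set but would be worth adding for unbalanced data.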

Generally, each statistical test was chosen as appropriate for the corresponding data type and data distribution. Two-sided tests were used unless stated otherwise, and a p value < 0.05 was considered statistically significant. All analyses were conducted using R (version 4.3.0), GraphPad Prism (10.1.0), and Python (3.12).

Inclusion and ethics statement

This study involved researchers in the United States of America, primarily in the states of Texas and South Carolina. Many of the individuals in both these states were involved in the study design, study implementation, data ownership, intellectual property, and authorship of the publications. For the work presented here, roles and responsibilities were agreed amongst collaborators ahead of the research. The use of these human tissues and samples was approved by the local ethics review committee at MUSC. No animals were used in this study. This research will not result in stigmatization, incrimination, discrimination, or otherwise personal risk to anyone. Liver cancer is a major worldwide cancer and one that is increasing dramatically in the state of South Carolina.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Results

We analyzed 53 HCC tissue sections by spatial MALDI glycan imaging (Table 1). Nearly all patients (98.1%) had early-stage HCC, with a median of 1.0 lesions (25%: 1.0, 75%: 1.0; SD: 0.28) and a median tumor size of 2.8 cm (25%: 2.1, 75%: 4.2; SD: 2.5 cm). Most patients (89%) had Child-Pugh A cirrhosis, and the most common etiologies were hepatitis C infection, alcohol-induced liver disease, and NASH. As expected, HCC tissues were associated with significant glycan heterogeneity. Tumors had a mean of 15.7 (SD: 9.7) N-glycan alterations with a 1.5-fold or greater change per tumor. All tumors had at least one alteration in N-linked glycosylation, with the mean of the most altered N-glycan being 11.8-fold increased (range 1.6 to 220.9).

Two of the most common N-glycan alterations in HCC tissue were increases in fucosylation and increases in branching. In Fig. 1, IMS data are shown with a color intensity scale for all images (red is most abundant, and blue is least abundant). Fig. 1a–e highlights five representative HCC tissues that were associated with alterations in fucosylation, along with the other N-glycan changes found concomitantly in these specific tumors. For each panel, the H&E-stained tissue is shown with four representative images of altered glycans in the tumor tissue as compared to the surrounding non-transformed tissue. In all cases, the tumor tissue is highlighted on the H&E-stained image. As Fig. 1a–e shows, pathology-annotated HCC tissues were associated with increased amounts of fucosylated N-glycans at m/z 2174.774 (Hex6HexNAc5Fuc1), m/z 2320.829 (Hex6HexNAc5Fuc2), m/z 2539.907 (Hex7HexNAc6Fuc1), m/z 2685.964 (Hex7HexNAc6Fuc2), and m/z 2905.040 (Hex8HexNAc7Fuc1). Based on the compositions, these are presumed to be branched (tri-antennary and higher), fucosylated N-glycans, but other fucosylated N-glycan changes were also observed (Supplementary Figs. S1–S53). Often, this tissue also had alterations in high-mannose N-glycans (m/z 1257.424, Hex5HexNAc2; and m/z 1419.479, Hex6HexNAc2), and these are shown in Fig. 1a–7d. Of the 53 HCC tissues, increased levels of branched N-glycans (tri-antennary or higher) containing fucose were found in 52% of the tissue samples examined. The spatial distribution and abundance of the top 3–5 N-glycans in all 53 HCC tissues are shown in Supplementary Figs. S1–S53. The complete list of glycans identified in all tissue sections is also provided in Supplementary Data file 3.

Fig. 1: N-glycan heterogeneity in HCC tissue.

a–j H&E stain of ten individual tissue sections with the pathology-annotated cancer lesion highlighted, along with four MALDI images showing specific alterations in N-linked glycosylation. m/z values are given above or below each image. a–e Five individual tissues with predominantly changes in fucosylation; f–j five individual tissues with changes in branching but not fucosylation. k Heat map showing N-glycan clusters in different patient groups. l Correlation plot showing that N-glycan alteration occurs in clusters. a–j A color scale is used to represent the intensity of spectra, with blue being the weakest and red being the strongest. k, l The color and size scale associated with each panel is indicated.

Tissues that lacked major changes in fucosylated N-glycans were associated primarily with increased levels of an N-glycan at m/z 2393.848 (Hex7HexNAc6), presumed to be a tetra-antennary N-glycan devoid of fucosylation. Figure 1f–j highlights five representative HCC tissues that fell into this category. While all of these have increased levels of the m/z 2393.848 N-glycan within the HCC tissue, very few other N-glycan alterations were observed. Increased levels of the m/z 2393.848 N-glycan were found in 53% of the tissue samples examined. The most abundant secondary glycans were high-mannose glycans (m/z 1257.424, Hex5HexNAc2; m/z 1419.479, Hex6HexNAc2; and m/z 1743.587, Hex8HexNAc2), but there was significant heterogeneity in the other glycans altered in tissue (Supplementary Figs. S1–S53).

Figure 1k is a heat map of an abbreviated list of glycans that were observed in all tissues examined (the full glycan heat map is presented in Supplementary Fig. S54). Every patient had at least one N-glycan that was increased by 1.5-fold in the HCC tissue as compared to the adjacent normal tissue. When all tissues were examined, the most abundant type of N-glycan alteration was fucosylation (on any core structure), which was observed in the tumor regions of 60% of the tissue samples examined. N-glycan alterations were weakly associated with tumor subtype. Increased fucosylated N-glycan was found in 71% of the Hoshida S1 type tumors and 64% of the S2 type tumors but only 31% of the S3 type tumors. In contrast, S3 type tumors had a greater level of a tetra-antennary branched N-glycan devoid of fucosylation (50%) as compared to the S1 (21%) or S2 (18%) tumors. Fig. 1l highlights that alterations in specific glycans, such as fucosylation of specific structures, were associated with broad overall alterations in that class of glycosylation (such as fucosylation). Supplementary Fig. S55 shows the relationship between all glycans identified.

Identification of cancer-specific N-linked glycans in matching tissue and serum

Paired serum samples were available for 23 patients, allowing correlation between the glycans found in tissue and those found in serum. Characteristics of these 23 patients were similar to those of the entire HCC cohort (Table 1). Specifically, over 90% of patients had Child-Pugh A cirrhosis and early-stage HCC. Serum samples from the 23 patients with matching tissues were analyzed for total N-glycosylation content using a recently developed MALDI-IMS serum N-glycan profiling method (Fig. 2a)21. The goal was to determine which glycans in serum correlated with the level of N-glycan in the matched tissue. Fig. 2b–d shows an example of the N-glycan heterogeneity observed in 14 representative HCC patients, highlighting the abundance of the N-glycans at m/z 2320.829, 2393.848, and 2539.907 in these patients. As these figures show, patients varied in the glycans expressed (patients were analyzed in triplicate). For example, patient 1 has a low level of the N-glycan at m/z 2320.829, very high expression of the N-glycan at m/z 2393.848, and no expression of the N-glycan at m/z 2539.907. In contrast, patient 12 has high expression of the N-glycan at m/z 2539.907 but lower levels of the glycans at m/z 2320.829 and 2393.848. The Pearson correlation or Spearman rank correlation was determined for all N-glycans in the m/z peak list, which allowed for the correlation between total serum N-glycosylation and tumor-specific tissue N-glycosylation. Fig. 2e and Supplementary Table S1 show the relationship between the glycans whose levels in HCC tissue and in serum were correlated. Results showed that 13/73 (23%) N-glycans were statistically positively correlated between tumor and serum. The majority of these (11/13) were fucosylated N-glycans (indicated with asterisks). All of these were positively correlated, which suggests that HCC tumor and serum N-glycosylation patterns are related and, importantly, that trends in tumor tissue are represented in serum.

Fig. 2: Glycans observed in tissue are also observed in matching serum.

a Method of serum N-glycan analysis. b–d Serum N-linked glycan analysis from 14 representative HCC patients for whom matched tissue N-glycan analysis was performed. Samples were analyzed in triplicate, and sample numbers are shown either above, below, or adjacent to spots. b is for the N-glycan at m/z 2320.829, c for the N-glycan at m/z 2393.848, and d for the N-glycan at m/z 2539.907. The control sample (purchased healthy serum sample) is shown in triplicate under sample #4 in the red box. e Graph showing the relationship between positively correlated tissue and serum glycans. In all figures, m/z values are included. Asterisks indicate those glycans that are fucosylated. b–d A color scale is used to represent the intensity of spectra, with blue being weakest and red being strongest.

Glycoproteomics to identify altered glycoproteins

The major N-glycan changes observed in both tissue and matching serum included fucosylation and, given the nature of the N-glycans (on agalactosylated structures or multiply fucosylated), it was presumed that many, if not most, were core fucosylated. To identify those serum proteins that contained these N-glycan changes, we used a glycoproteomic method (Fig. 3a) that utilized sequential enzymatic digestion of glycoproteins with Endoglycosidase F3 (Endo F3) followed by PNGase F32. A limited number of serum samples were analyzed, and Supplementary Table S2 contains the information regarding the identified fucosylated glycopeptides and glycoproteins. A volcano plot shows all the glycopeptides that were altered in HCC, and as Fig. 3b shows, the number of fucosylated glycopeptides increased in the HCC samples. Indeed, 541 glycopeptides (peptides with an attached N-acetylglucosamine and fucose residue) were identified, and 360 of these were increased in the HCC samples (67%). Fig. 3c–e highlights some of the glycopeptides from the proteins identified, including glycopeptides from clusterin (Fig. 3c), alpha-2-macroglobulin (Fig. 3d), and hemopexin (Fig. 3e). Supplementary Data file 4 contains the source data for Fig. 3b–e. Of these, the fucosylated glycopeptides from hemopexin were found to be decreased in the HCC samples as compared to the cirrhotic samples. Changes in the relative abundance of the glycopeptides could also result from alterations in protein abundance or other modifications of the peptides, such as methionine oxidation, which was elevated in the cirrhotic samples. It is noted that many of these glycoproteins have been previously identified as part of the fucosylated proteome16,39,40,41,42,43,44.

Fig. 3: Glycoproteomics for the analysis of fucosylated glycoproteins.

a Proteomic workflow. b Volcano plot showing altered glycopeptides in HCC. c–e Three glycopeptides that were found to be increased in HCC samples: c is a glycopeptide from clusterin, d is a glycopeptide from alpha-2-macroglobulin, and e is a glycopeptide from hemopexin. f Workflow for the GlycoTyper, used to orthogonally confirm N-glycan alterations. g GlycoTyper analysis of clusterin, showing either changes in total fucosylation (left panel) or on a specific N-glycan (m/z 2539.907). h GlycoTyper analysis of alpha-2-macroglobulin (A2M), showing either changes in total fucosylation (left panel) or on a specific N-glycan (m/z 2174.772). i GlycoTyper analysis of hemopexin, showing either changes in total fucosylation (left panel) or on a specific N-glycan (m/z 2539.907). Where a statistical difference exists, the p value is provided. Graphs include the mean and the 95% CI.

To orthogonally analyze the identified glycoproteins, we analyzed the same patient samples used for glycoproteomics with a recently developed MALDI-IMS antibody array workflow, which allows for the specific capture, and subsequent N-glycan analysis, of glycoproteins of interest23, thus allowing for validation of altered glycosylation. Fig. 3f outlines the workflow of this approach, and Fig. 3g–i highlights the same three glycoproteins shown in Fig. 3c–e. For two of these, the trend is the same: increased levels of total fucosylation were observed, most notably on an N-glycan at m/z 2539.907 (Hex7HexNAc6Fuc1; Fig. 3g) or m/z 2174.772 (Hex6HexNAc5Fuc1; Fig. 3h), glycans that were also observed as being increased in tissue (Fig. 1). Surprisingly, hemopexin, which was decreased in the glycoproteomics, also had increased levels of fucosylation, which was most clearly observed with the N-glycan at m/z 2539.907 (Fig. 3i). This implies either that the fucose present on this hemopexin N-glycan was not on the core N-acetylglucosamine but attached to one of the antennary N-acetylglucosamines, or that the total level of hemopexin was reduced, which would impact the glycoproteomics but not the GlycoTyper. Supplementary Data file 5 contains the source data for Fig. 3b–e.

Development of SOML algorithms for the early detection of HCC

To examine the biomarker potential of these abundant serum glycoforms, we utilized the same method in Fig. 3f for the simultaneous examination of glycoproteins in a case-control patient set including 100 cases with HCC and cirrhosis and 101 controls with cirrhosis (Table 2). Most patients were male and had underlying hepatitis C infection. Patients were diverse regarding race/ethnicity with 45% White, 12% Black, and 39% Hispanic.

Sixteen glycoproteins were selected for analysis based on results of the glycoproteomic experiments in Fig. 3 and availability of appropriate antibodies. In total there were 83 N-linked glycans that were associated with the 16 glycoproteins analyzed. The source data for Fig. 4 is found in Supplementary Data file 6. The major N-glycans found to be altered on the examined glycoproteins included glycans such as m/z 2320.829, m/z 2539.908, and m/z 1257.424, which were also found directly on HCC tissue and in the matching serum (Fig. 4b–k). No individual glycoprotein or glycoform examined had an AUC > 0.7 (Supplementary Table S3). However, when the unique glycoforms were combined, biomarker algorithms could be obtained. Table 3 presents seven potential algorithms that were created using a support vector machine with a linear kernel. These were referred to as Spatial Omics associated Machine learning (SOML) HCC algorithms. Glycoform combinations had AUROC values ranging from 0.90 to 0.97 in apparent validation and 0.88 to 0.91 in internal 5-fold cross-validation analyses (Table 3). Combinations included 7 to 39 glycoforms from up to 15 different glycoproteins. As we have done before26,27,35,36,37,45, age, gender, and AFP were also included in the algorithm to improve diagnostic performance46,47. Supplementary Tables S4–12 show the glycoforms associated with the different algorithms and the relative role of all factors in algorithm performance. As Table 3 and Fig. 4l show, the best algorithm had an AUROC of 0.97 in all patients. This SOML algorithm increased the sensitivity of HCC detection at 90% specificity to 95% as compared to AFP, which had an AUROC of 0.78, with a sensitivity of 47% at 90% specificity.
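The distinction between apparent validation and internal 5-fold cross-validation can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic data in place of the glycoform measurements; the feature count, effect size, scaling step, and regularization constant `C` are illustrative assumptions and are not the settings used for the SOML algorithms.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the study data (NOT real measurements):
# 100 cases and 101 controls, each with 10 hypothetical glycoform intensities.
n_cases, n_controls, n_features = 100, 101, 10
X_cases = rng.normal(loc=0.5, scale=1.0, size=(n_cases, n_features))
X_controls = rng.normal(loc=0.0, scale=1.0, size=(n_controls, n_features))
X = np.vstack([X_cases, X_controls])
y = np.concatenate([np.ones(n_cases, dtype=int), np.zeros(n_controls, dtype=int)])

# Linear-kernel support vector machine; feature scaling is an assumed step.
model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))

# Apparent validation: fit and score on the same samples (optimistic).
model.fit(X, y)
apparent_auc = roc_auc_score(y, model.decision_function(X))

# Internal 5-fold cross-validation: every sample is scored by a model
# that was fitted without it.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_predict(model, X, y, cv=cv, method="decision_function")
cv_auc = roc_auc_score(y, cv_scores)

print(f"apparent AUROC: {apparent_auc:.2f}")
print(f"5-fold cross-validated AUROC: {cv_auc:.2f}")
```

Because the apparent estimate reuses the training samples, it is usually the higher of the two, which is consistent with the gap between the apparent and cross-validated AUROC ranges reported in Table 3.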

Fig. 4: GlycoTyper Method for the serum analysis of specific glycoproteins.

a Slides are spotted with the following 16 antibodies (top to bottom in columns): (1) anti-Alpha 1 Antitrypsin, (2) anti-Alpha 1B-Glycoprotein, (3) anti-Alpha 1-Acid Glycoprotein, (4) anti-Alpha-2-Macroglobulin, (5) anti-Angiotensinogen II/III, (6) anti-Apolipoprotein D, (7) anti-Apolipoprotein H (ApoH), (8) anti-Ceruloplasmin, (9) anti-Clusterin, (10) anti-Fetuin, (11) anti-Haptoglobin, (12) anti-Hemopexin, (13) anti-histidine-proline rich glycoprotein, (14) anti-IgG, (15) anti-Transferrin, (16) anti-Vitamin D Binding Protein. b–g Example of GlycoTyper data for the 16 captured proteins for N-glycan at 2539.881 (b, e), N-glycan at 2174.654 (c, f) and N-glycan at 1809.639 (d, g). b–d are from patients with HCC, while e–g are from patients with cirrhosis. h Bar graph with individual data points showing the mean (with SD) level of N-glycan at 2539.907 on haptoglobin. i Bar graph with individual data points showing the mean (with SD) level of N-glycan at 1663.581 on transferrin. j, k Similar analysis on angiotensinogen (j) and hemopexin (k). l ROC curves for Model G, AFP and AFP-L3 of discriminant ability to classify all or early-stage HCC from cirrhosis. Where statistical difference exists, the p value is provided and error bars indicate the standard deviation.

Table 3 SOML algorithms

Of the 100 patients with HCC, 41% had early-stage cancer (Table 2), and the best SOML algorithms had a similar performance in this subset of patients, as compared to all patients. As Fig. 4l shows, the AUROC of the Model G algorithm was 0.94, with a sensitivity of 88% at 90% specificity. In contrast, AFP performance was more limited in this subset with an AUROC of 0.74, with a sensitivity of 39% at 90% specificity. The best SOML algorithms improved upon this with an AUROC of 0.94, with a sensitivity of 90% at 90% specificity. The ROC curves for the fivefold cross-validation are shown in Supplementary Figs. S56–S62.
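For readers unfamiliar with the metric, "sensitivity at 90% specificity" can be computed from raw classifier scores as sketched below. The function name, the threshold convention (a sample is called positive when its score exceeds the threshold), and the toy scores are illustrative assumptions and do not come from the study data.

```python
def sensitivity_at_specificity(case_scores, control_scores, target_specificity=0.90):
    """Return (threshold, sensitivity) at the lowest threshold whose
    specificity among controls meets the target.

    A sample is called positive when its score is strictly above the threshold.
    """
    n_cases = len(case_scores)
    n_controls = len(control_scores)
    # Candidate thresholds are the observed control scores, in ascending order.
    # Specificity rises with the threshold, so the first threshold that meets
    # the target also gives the highest sensitivity.
    for threshold in sorted(set(control_scores)):
        specificity = sum(s <= threshold for s in control_scores) / n_controls
        if specificity >= target_specificity:
            sensitivity = sum(s > threshold for s in case_scores) / n_cases
            return threshold, sensitivity
    raise ValueError("target specificity cannot be reached")


# Toy scores (hypothetical): 10 controls with cirrhosis and 5 cases with HCC.
controls = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
cases = [5, 8, 10, 12, 15]

threshold, sensitivity = sensitivity_at_specificity(cases, controls, 0.90)
print(f"threshold {threshold}, sensitivity {sensitivity:.2f}")
# prints: threshold 9, sensitivity 0.60
```

Fixing the specificity first makes sensitivities comparable across markers, which is how the SOML algorithms and AFP are compared above.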

Discussion

In recent years, aberrant N-glycosylation has been a major focus of cancer biomarker development, which includes altered N-glycosylation of known biomarkers such as AFP, PSA, and CA-19-948. It has been clearly demonstrated that N-glycosylation in cancer is widely dysregulated relative to non-cancer, although much of this work has been performed on serum, with no link to what is occurring in the tissue. Recently, tissue-based N-glycan spatial omics has been used to examine the glycans found in tissue, but again, this work was independent of matched serum. To extend this work to clinically relevant biomarker development, it must be demonstrated that cancer-caused dysregulation of N-glycosylation found within the tumor is apparent in matched serum as well as tumor tissue. To accomplish this, serum MALDI-IMS techniques have been developed, aiming to analyze serum N-glycosylation and correlate it with spatial glycomic analyses of tumor tissue. In this work, we utilized these serum MALDI-IMS techniques to analyze patient-matched serum and tissue samples to connect tumor-related tissue N-glycosylation to tumor-related serum N-glycosylation and create a liquid biopsy for N-linked glycans.

Regarding serum N-glycosylation, there was a remarkable amount of correlation between total serum N-glycosylation and tumor-specific N-glycosylation, with 23% of individual N-glycans significantly positively correlated between the two. Considering that many of the glycoproteins within the serum did not originate from the tumor or even from the liver, that level of correlation suggests that there is significant HCC tumor biomarker potential contained within the serum N-glycome. It is noted that this is the first study examining both HCC tissue and serum for a protein or glycan biomarker.

However, not surprisingly, glycosylation associated with HCC displayed a great degree of heterogeneity, with no single N-glycan, or even N-glycan class, being found in more than 60% of patient samples. Many groups have examined the glycosylation of a single protein, often with a single glycan change, as biomarkers of HCC, but these have all had limited performance. Indeed, AFP-L3 is a version of AFP with a bi-antennary core fucosylated glycan43. Thus, it is not surprising that N-glycan biomarker development has been difficult, with many groups looking for changes in single glycans that could identify all cancers. That is, while alterations in glycosylation have been associated with many types of cancers16,26,39,40,41,42,43,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68, the development of N-glycan-specific biomarkers has been limited. The reason for this was that N-glycan analysis often required the purification of a protein of interest before analysis, which was not feasible for the analysis of the hundreds or thousands of samples needed for clinical evaluation or, importantly, clinical use. Additionally, this often limited the analysis to a single protein at a time. Studies that utilize just total serum are easier but lose all links to protein identification, hence diminishing clinical utility.

Our data suggest that the greatest biomarker potential comes from the analysis of specific glycans associated with specific proteins. That is, as Fig. 3c–e and g–i show, looking only at changes in glycan "class", such as alterations in total fucosylation or even just changes in core fucosylation, has limited biomarker ability when compared to specific glycan structures (Fig. 3g–i).

The method utilized here allows for the simultaneous analysis of all glycans on multiple targets. Indeed, while we captured and performed N-glycan analysis on 16 glycoproteins, that number could easily be increased. Current antibody array platforms often incorporate hundreds or thousands of antibodies, and such a platform could be used in the future to allow for targeting of more serum glycoproteins. This could include HCC glycoprotein biomarkers like AFP and GP73, whose glycoforms have shown promise as improved cancer biomarkers39,69,70.

It is noted that antibody microarrays are often used for detecting protein expression from various biofluids and have been used in combination with lectins for N-glycan information71. However, lectins only recognize a subset of N-glycan features, many of which are unknown, and their binding is mediated by several factors that can alter their binding affinity (such as protein structure). The GlycoTyper method addresses this limitation and allows for compositional information to be determined for every protein captured.

The data presented in this paper have several limitations that will have to be addressed in future studies. The first is that our analysis was restricted to N-glycans devoid of sialic acid. As conventional MALDI-TOF-based methods are often associated with a loss of sialic acid during ionization, to reduce potential variability, we treated all samples, both tissue and serum, with a recombinant sialidase before analysis72. However, aberrant sialylation is an established hallmark of several types of cancer, including breast, ovarian, pancreatic, prostate, colorectal, and lung cancers, melanoma, and, importantly, HCC73,74. Future methods will have to be developed to allow for the analysis of sialic acid in multiple MALDI-IMS workflows. One such approach that has been used with immune cell antibody array slides is an amidation reaction to stabilize sialic acids, as well as distinguish their isomeric linkages75.

Finally, while our analysis included multiple rounds of internal cross-validation and discovery and analysis in independent sample sets, with similar biomarker performance, all samples analyzed in this study were collected at the time of cancer detection. A longitudinal study will have to be performed to truly determine how this algorithm would be used in the management of patients at risk of developing HCC.

In conclusion, we have developed a workflow demonstrating connections between circulating glycoproteins and attached glycoforms that are also present in tumor. This approach, which allows for the development of a liquid N-glycan biopsy-based biomarker for HCC, should be applicable to many other cancers.