Background & Summary

Erythrocytes, with their unique discoid shape, are essential for transporting oxygen and carbon dioxide to sustain the body. Besides their major function in gas transport, these red blood cells (RBCs) also transfer GPI-linked proteins and iC3b/C3b-containing immune complexes1, and bind DNA through TLR9, a receptor on their membrane2. Moreover, RBCs carry blood group antigens that determine blood compatibility during transfusion and are associated with diseases3.

RBCs are also involved in many diseases, such as spherocytosis, elliptocytosis, stomatocytosis, sickle cell disease, and nonspherocytic hemolytic anemia, owing to alterations in cytoskeletal, metabolic, and other proteins4,5,6. In addition, RBC dysfunction plays a role in cardiovascular disease7, and RBC protein profiles are altered in breast cancer and neurodegeneration8,9,10. Investigating RBC protein profiles is therefore necessary to understand the molecular basis of these important cells and the diseases in which they are involved.

Earlier LC-MS/MS-based proteomic analyses identified hundreds of RBC proteins using in-gel or in-solution digestion, with or without peptide fractionation1,11,12,13,14,15,16,17. However, more than 2,000 distinct gene products could be annotated in human RBCs according to omics data18, and nearly 1,200 proteoforms could be directly identified in purified human RBCs by top-down proteomics19. A proteomic analysis of human erythropoiesis profiled the absolute expression of 6,130 proteins during erythroid differentiation from late burst-forming units-erythroid to orthochromatic erythroblasts, although these cells are not mature RBCs20. With more sophisticated analyzers and optimized protocols, up to nearly 3,000 proteins have been profiled in human RBCs in recent years21,22,23,24.

Major advances in the LC-MS/MS-based proteomic pipeline, such as extensive peptide fractionation25, DMSO in liquid chromatography26, and data-independent acquisition-based SWATH scans27, can be expected to increase the number of identified proteins. Here, we performed in-depth proteomic analyses of the RBC membrane and cytoplasmic fractions separately, aiming to provide a comprehensive protein profile of human RBCs for a better understanding of their physiology and pathological alterations.

Specifically, we separated human RBCs into membrane and cytoplasmic fractions and subjected each to extensive offline basic-pH RPLC fractionation (~100 fractions) for mass spectrometry-based proteomic analysis. This identified 4,777 proteins in the membrane fraction and 2,350 proteins in the cytoplasmic fraction, 5,264 proteins in total. These comprehensive RBC proteome datasets provide a resource for better understanding human RBCs and the diseases in which they are involved.

Methods

Human RBC collection

This study was approved by the Ethics Committee of Nanjing Drum Tower Hospital (IRB Review Approval #: 2022-165-01). One hundred EDTA-anticoagulated human blood samples with normal routine blood test results were collected from leftover samples in the hospital's blood bank or clinical laboratory after all prescribed clinical tests had been completed. The samples covered four blood types (A, B, AB, and O), both sexes, and a range of ages, in order to capture the general profile of human red blood cells (Fig. S3). All samples were screened and showed no indication of infectious diseases (hepatitis B and C, syphilis, and human immunodeficiency virus).

Each whole blood sample was filtered to deplete white blood cells (WBCs)1,23,24 and then centrifuged at low speed (800 × g) for three minutes. The bottom portion of the RBC pellet was collected to avoid residual WBCs and platelets, and was washed three times with phosphate-buffered saline (PBS). Purified cells were kept on ice until hypotonic lysis. For each sample, 10% of the final purified cells were resuspended either in PBS, at the same volume as the original whole blood, for a routine blood cell count, or in their own plasma for blood smear staining, to confirm that no WBCs or platelets were detected or observed.

RBC membrane and cytoplasmic fractionation

The final purified RBCs were lysed in 0.1× PBS (10 volumes relative to the starting whole blood) and checked under the microscope to confirm complete lysis. After centrifugation at 20,000 × g for 10 minutes, the top half of the supernatant was collected as the cytoplasmic fraction. After removal of the remaining supernatant, the pellet was washed three times with 0.1× PBS until no red residue was visible and was then harvested as the membrane fraction. Both the membrane and cytoplasmic fractions were aliquoted and stored at −80 °C.

LC-MS/MS-based proteomics and data analysis

Proteomics sample preparation

All procedures followed previously reported protocols with minor modifications28,29,30,31. Proteins in the frozen RBC cytoplasmic and membrane fractions were precipitated with 80% acetonitrile. The cleaned protein pellets were dissolved overnight at 4 °C in 8 M urea in 50 mM triethylammonium bicarbonate buffer (TEAB, pH 8.5). After quantification by the bicinchoninic acid (BCA) assay, 100 μg of protein was digested in solution with 1 μg LysC (Wako) for 2 hours at room temperature. Three volumes of TEAB buffer were then added to dilute the urea to 2 M, and 1 μg trypsin (Promega) was added to continue digestion for at least four hours or overnight. The digest was then reduced with 5 mM dithiothreitol for one hour at room temperature and alkylated with 15 mM iodoacetamide for 15 minutes in the dark, and the alkylation was quenched with 15 mM dithiothreitol for 15 minutes.
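The dilution and enzyme-to-substrate arithmetic in this protocol can be checked with a few lines of Python. This is a minimal sketch; the helper names are hypothetical, and it assumes that "three volumes" means three volumes of TEAB per volume of urea solution.

```python
def diluted_concentration(c_start: float, added_volumes: float) -> float:
    """Concentration after adding `added_volumes` volumes of diluent per volume of sample."""
    return c_start / (1.0 + added_volumes)


def enzyme_to_substrate(enzyme_ug: float, protein_ug: float) -> str:
    """Express an enzyme:substrate mass ratio as '1:N'."""
    return f"1:{protein_ug / enzyme_ug:g}"


# 8 M urea diluted with three volumes of TEAB buffer before trypsin is added
urea_for_trypsin = diluted_concentration(8.0, 3.0)
print(f"urea during trypsin digestion: {urea_for_trypsin:g} M")  # -> 2 M

# 1 ug LysC, and later 1 ug trypsin, per 100 ug protein
print("LysC ratio:", enzyme_to_substrate(1.0, 100.0))     # -> 1:100
print("trypsin ratio:", enzyme_to_substrate(1.0, 100.0))  # -> 1:100
```

Urea at 8 M inactivates trypsin, so the fourfold dilution to 2 M before adding trypsin is the point of this step, while LysC tolerates the higher urea concentration.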

The digested peptides were acidified with 1% formic acid (FA) and desalted on a C18 column, using 1% FA as the washing solution and 60% acetonitrile in 1% FA as the elution solution.

DDA LC-MS acquisition

Peptide samples were separated on an analytical column (50 μm ID × 30 cm, packed with 1.9 μm C18 particles) using an acetonitrile gradient of ~120 minutes (approximately 5–20% for early, less hydrophobic fractions and approximately 15–35% for later, more hydrophobic fractions) at a flow rate of 0.3 μl/min. The peptides were analyzed on a Q Exactive HF-X mass spectrometer (Thermo) in data-dependent acquisition mode. MS settings comprised an MS1 scan (60,000 resolution, 3 × 10⁶ AGC target, 30 ms maximum injection time, 350–1500 m/z scan range) followed by 20 data-dependent MS2 scans (150,000 resolution, 1 × 10⁵ AGC target, 50 ms maximum injection time, 1.6 m/z isolation window, 29% stepped HCD, 200–2000 m/z scan range, and 30 s dynamic exclusion).
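The data-dependent acquisition settings above can be illustrated with a simplified top-N precursor selector. It applies the 20-scan limit, the 350–1500 m/z MS1 range, and the 30 s dynamic exclusion. This sketch is for intuition only and does not reproduce the instrument's acquisition logic. The class name, the 10 ppm exclusion tolerance, and the example peaks are assumptions, and real instruments also take charge state, isotope patterns, and intensity thresholds into account.

```python
from dataclasses import dataclass, field


@dataclass
class TopNSelector:
    """Simplified top-N precursor selection with dynamic exclusion (illustrative only)."""
    top_n: int = 20               # data-dependent MS2 scans per cycle
    exclusion_s: float = 30.0     # dynamic exclusion duration (s)
    tolerance_ppm: float = 10.0   # assumed m/z tolerance for matching excluded precursors
    mz_min: float = 350.0         # MS1 scan range
    mz_max: float = 1500.0
    _excluded: list = field(default_factory=list, repr=False)  # (mz, expiry_time) pairs

    def _is_excluded(self, mz: float) -> bool:
        return any(abs(mz - ex_mz) / ex_mz * 1e6 <= self.tolerance_ppm
                   for ex_mz, _ in self._excluded)

    def select(self, peaks, now: float):
        """Return up to top_n precursor m/z values from (mz, intensity) MS1 peaks."""
        # forget exclusions that have expired by time `now`
        self._excluded = [(m, t) for m, t in self._excluded if t > now]
        chosen = []
        for mz, _intensity in sorted(peaks, key=lambda p: p[1], reverse=True):
            if len(chosen) == self.top_n:
                break
            if not (self.mz_min <= mz <= self.mz_max) or self._is_excluded(mz):
                continue
            chosen.append(mz)
            self._excluded.append((mz, now + self.exclusion_s))
        return chosen


# Toy example with top_n=2 so the exclusion behaviour is visible
demo = TopNSelector(top_n=2)
ms1 = [(500.25, 1e6), (600.30, 5e5), (700.35, 2e5), (1600.0, 9e6)]
print(demo.select(ms1, now=0.0))   # [500.25, 600.3]; 1600.0 is outside the scan range
print(demo.select(ms1, now=10.0))  # [700.35]; the first two are still excluded
print(demo.select(ms1, now=40.0))  # [500.25, 600.3]; their exclusions have expired
```

Dynamic exclusion prevents the most intense precursors from being fragmented repeatedly. Lower-abundance peptides therefore get MS2 time, which matters for deep profiling of a hemoglobin-dominated sample.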

For the in-depth proteomic analysis, 50 μg of protein from each of the 100 membrane samples, and separately from each of the 100 cytoplasmic samples, was pooled and digested with LysC and then trypsin, followed by reduction, alkylation, quenching, and desalting. The peptides were then fractionated offline by reversed-phase liquid chromatography (RPLC) at high pH on a C18 column (XBridge BEH, 4.6 mm × 250 mm, 3.5 μm) using an HPLC system (Shimadzu LC-20AD with UV detector). Three milligrams of peptides were loaded onto the XBridge column and separated at a flow rate of 1.0 ml/min with the following gradient: 5% solution B for 10 minutes, 5–40% B over 100 minutes, and 40–65% B over 10 minutes (A: 10 mM ammonium formate, pH 8.0; B: 90% acetonitrile, 10 mM ammonium formate, pH 8.0). About 100 fractions were collected, and each was analyzed individually without concatenation, except for the beginning and ending fractions, which were concatenated.
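The high-pH RPLC gradient and the pooling amounts can be written down as a small calculation. The sketch below interpolates %B from the stated gradient program. As a purely hypothetical collection scheme, it also divides the 120 minute run into 100 equal windows, because the actual fraction windows are not specified. It reproduces the 100 × 50 μg pooling arithmetic as well (5 mg of protein per fraction type, of which 3 mg of peptides were loaded). Function names are illustrative.

```python
# (time_min, percent_B) breakpoints of the high-pH RPLC gradient described above
GRADIENT = [(0.0, 5.0), (10.0, 5.0), (110.0, 40.0), (120.0, 65.0)]
FLOW_ML_PER_MIN = 1.0


def percent_b(t: float, program=GRADIENT) -> float:
    """Linearly interpolate %B at time t (minutes) within the gradient program."""
    if t <= program[0][0]:
        return program[0][1]
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    return program[-1][1]  # hold the final composition after the program ends


def fraction_windows(n_fractions: int, start: float = 0.0, end: float = 120.0):
    """Hypothetical equal-time collection windows as (start, end) in minutes."""
    width = (end - start) / n_fractions
    return [(start + i * width, start + (i + 1) * width) for i in range(n_fractions)]


pooled_ug = 100 * 50  # 50 ug from each of 100 samples per fraction type
windows = fraction_windows(100)
width_min = windows[0][1] - windows[0][0]
print(pooled_ug)                                 # 5000 ug, i.e. 5 mg
print(percent_b(60.0))                           # 22.5 (%B mid-way through the main ramp)
print(round(width_min, 2))                       # 1.2 min per window (assumed scheme)
print(round(width_min * FLOW_ML_PER_MIN, 2))     # 1.2 ml per window at 1.0 ml/min
```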

Proteomics data processing

The MS raw files were processed with Proteome Discoverer 2.4 (ThermoFisher) using the Sequest HT engine and the Uniprot human proteome database (UP000005640; 83,587 proteins, including 20,405 reviewed and 63,182 unreviewed entries). Search parameters were largely default: full tryptic specificity, up to 2 missed cleavages, peptide length of 6–144 residues, 10 ppm precursor mass tolerance, 0.02 Da fragment mass tolerance, b and y ions for spectrum matching, and up to 3 dynamic modifications per peptide. The dynamic modifications were methionine oxidation (+15.995 Da) and N-terminal acetylation, methionine loss, or both, and carbamidomethylation of cysteine (+57.021 Da) was set as a static modification. Peptide-spectrum matches (PSMs) were validated with the Percolator module at a q-value-based false discovery rate (FDR) of 1%. At the protein level, the strict and relaxed FDRs were set at 1% and 5%, respectively. Proteins with q-values below 1% were assigned "High" identification confidence, those between 1% and 5% "Medium", and those at 5% or higher "Low".
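The confidence labels follow directly from q-values. The sketch below shows a plain target-decoy q-value calculation followed by High/Medium/Low binning at the 1% (strict) and 5% (relaxed) thresholds. It rests on simplifying assumptions: no score ties, the decoy-to-target ratio as the FDR estimate, and toy scores. It does not reimplement Percolator, which first re-scores PSMs with a semi-supervised machine-learning model, nor Proteome Discoverer's protein-level FDR procedure.

```python
def q_values(scores, is_decoy):
    """Target-decoy q-values for a list of scores (higher score = better match).

    FDR at each threshold is estimated as decoys / targets at or above that score;
    the q-value is the minimum FDR at that threshold or any less stringent one.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    q = [0.0] * len(scores)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):  # walk from the lowest score upward
        running_min = min(running_min, fdrs[rank])
        q[order[rank]] = running_min
    return q


def confidence(q: float, strict: float = 0.01, relaxed: float = 0.05) -> str:
    """Bin a q-value as in the text: <1% High, 1% to <5% Medium, >=5% Low."""
    if q < strict:
        return "High"
    if q < relaxed:
        return "Medium"
    return "Low"


q = q_values([10, 9, 8, 7, 6], [False, False, True, False, True])
print([round(v, 3) for v in q])       # [0.0, 0.0, 0.333, 0.333, 0.667]
print([confidence(v) for v in q])     # ['High', 'High', 'Low', 'Low', 'Low']
```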

The search results were mapped to gene names using the Uniprot "ID mapping" tool. Protein and peptide entries without gene names were removed, and when several entries shared the same gene name, only the one with "Reviewed" status was kept. Cellular component, molecular function, and biological process enrichment analyses were performed online with Gene Ontology (https://www.geneontology.org/) using default settings.
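The gene-level collapsing rule can be sketched as follows. The dictionary keys and accessions are hypothetical. The tie-breaking rule is an assumption, because the text does not say how such cases were handled: when several entries are Reviewed, or none is, the sketch keeps the entry with the most PSMs.

```python
from collections import defaultdict


def collapse_by_gene(entries):
    """Keep one entry per gene name (illustrative sketch).

    entries: iterable of dicts with hypothetical keys
    'accession', 'gene', 'reviewed' (bool) and 'psms' (int).
    """
    by_gene = defaultdict(list)
    for e in entries:
        if not e.get("gene"):  # drop entries without a gene name
            continue
        by_gene[e["gene"]].append(e)

    kept = {}
    for gene, group in by_gene.items():
        reviewed = [e for e in group if e["reviewed"]]
        candidates = reviewed or group  # prefer "Reviewed" entries when any exist
        # assumption: remaining ties go to the entry with the most PSMs
        kept[gene] = max(candidates, key=lambda e: e["psms"])
    return kept


sample = [
    {"accession": "ACC1", "gene": "GENE_A", "reviewed": True, "psms": 40},
    {"accession": "ACC2", "gene": "GENE_A", "reviewed": False, "psms": 90},
    {"accession": "ACC3", "gene": "", "reviewed": False, "psms": 5},
    {"accession": "ACC4", "gene": "GENE_B", "reviewed": False, "psms": 12},
    {"accession": "ACC5", "gene": "GENE_B", "reviewed": False, "psms": 7},
]
print({g: e["accession"] for g, e in collapse_by_gene(sample).items()})
# {'GENE_A': 'ACC1', 'GENE_B': 'ACC4'}
```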

Data comparison with public databases and reported studies

Two databases and two published studies were used for comparison with the RBC proteins identified in this study. For the Uniprot database, searching UniprotKB with the term "erythrocyte" yielded 597 protein records related to red blood cells. In addition, lists of membrane proteins (Uniprot and GO), disease-associated proteins, and red blood cell antigens were downloaded from the RESPIRE database32 and merged into 737 proteins. Lists of RBC proteins were also downloaded from the supplementary data of the two published studies, comprising 1,563 and 2,653 proteins, respectively21,24. The gene names of all these proteins were retrieved for cross-comparison and merging.
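The cross-comparison reduces to set operations on gene names. In this sketch the source labels and gene sets are toy placeholders, not the actual lists. "Novel" means present in this study but absent from every reference set, which matches how the 2,348 proteins in Fig. 3A are described. Normalizing gene-name case and whitespace before comparison is an additional assumption that is often needed when merging lists from different sources.

```python
def compare_with_references(this_study: set, references: dict) -> dict:
    """Overlap of this study's gene names with each reference set and with their union."""
    merged = set().union(*references.values())
    return {
        "shared_per_source": {name: len(this_study & genes) for name, genes in references.items()},
        "in_any_reference": len(this_study & merged),
        "novel": sorted(this_study - merged),  # in this study, absent from all references
    }


def normalize(genes):
    """Assumed clean-up step: strip whitespace and upper-case gene symbols."""
    return {g.strip().upper() for g in genes}


ours = normalize({"G1", "G2", "g3 ", "G4"})
refs = {
    "Uniprot": normalize({"G1", "G2"}),
    "RESPIRE": normalize({"G1"}),
    "study_21": normalize({"G2", "G4"}),
    "study_24": normalize({"G2", "G9"}),
}
print(compare_with_references(ours, refs))
```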

Data Records

The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (https://proteomecentral.proteomexchange.org) via the iProX partner repository33,34 under the dataset identifier PXD067677. The deposit includes 82 MS raw files of the RBC membrane fraction and 94 raw files of the cytoplasmic extract, together with the corresponding .msf search files generated by Proteome Discoverer and the protein- and peptide-level search results35.

Technical Validation

RBC membrane and cytoplasmic fractionation

Hemoglobin accounts for about 98% of total RBC protein1,36,37, causing significant interference in mass spectrometry-based proteomic analyses. RBC lysates are therefore commonly separated into a membrane fraction ("white ghosts") and a cytoplasmic fraction1,12,15,16,21,22,23,24,38,39. Similarly, we purified human RBCs from anticoagulated whole blood and lysed them in hypotonic buffer for separation into membrane and cytoplasmic fractions (Fig. 1A–C). The purity of the final RBCs was confirmed under the microscope and by automated counting of WBCs and platelets (Fig. 1B and Fig. S1). SDS-PAGE showed that proteins of the whole RBC lysate were well separated into cytoplasmic and membrane extracts, with a large amount of hemoglobin at the bottom of the gel in the cytoplasmic fraction (Fig. 1D). Western blots showed that the RBC-specific protein α-spectrin was enriched in the membrane extract, whereas the general cytoplasmic marker GAPDH and hemoglobin were enriched in the cytoplasmic extract (Fig. S2).

Fig. 1

Red blood cell (RBC) fractionation and proteomic analyses. (A) Workflow of red blood cell fractionation. WBC: white blood cells. (B) Purity of isolated red blood cells assessed by staining under the microscope. (C) Isolated RBC samples during purification. (D) SDS-PAGE and Coomassie blue staining of the membrane and cytoplasmic extracts of red blood cells. (E) Proteomic analyses of the membrane and cytoplasmic extracts by LC-MS/MS. (F) Proteins identified in the membrane and cytoplasmic extracts. (G) Heatmap showing the relative levels of proteins identified in both samples; the PSM count of each protein in each sample was normalized to its average across the two samples. PSM: peptide-spectrum match, i.e., the number of times a protein is sequenced. (H) Percentages of membrane and cytoplasmic proteins in the two fractions. (I) PSMs of selected proteins previously reported as the most abundant in membrane and cytoplasmic extracts15.

LC-MS/MS analyses of the two fractions produced different chromatograms owing to their different protein contents (Fig. 1E) and identified 1,058 and 146 proteins unique to the membrane and cytoplasmic fractions, respectively, with 455 proteins in both (Fig. 1F). Most identified proteins differed markedly in level between the two fractions (Fig. 1G), suggesting that they were located mainly in either the membrane or the cytoplasm of RBCs and were well separated by fractionation.
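The normalization used for the heatmaps is a simple formula. Each protein's PSM count in a fraction is divided by the mean of its PSM counts in the two fractions. The two normalized values therefore always sum to 2, and a value near 2 marks a protein found almost exclusively in one fraction. The function name and the example counts are hypothetical. The final assertion only checks that the three Venn regions of Fig. 1F add up to 1,659, the total number of proteins mapped to the Uniprot subcellular lists.

```python
def normalize_to_average(psm_membrane: float, psm_cytoplasm: float):
    """Scale a protein's PSM counts in the two fractions by their mean (Fig. 1G style)."""
    mean = (psm_membrane + psm_cytoplasm) / 2
    if mean == 0:
        return (0.0, 0.0)
    return (psm_membrane / mean, psm_cytoplasm / mean)


# Hypothetical protein with 90 PSMs in the membrane and 10 in the cytoplasm
print(normalize_to_average(90, 10))  # (1.8, 0.2): mostly membrane-associated

# Venn regions reported for the initial analysis: membrane-only, cytoplasm-only, both
assert 1058 + 146 + 455 == 1659
```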

We downloaded lists of human membrane and cytoplasmic proteins from the Uniprot subcellular proteome and mapped all 1,659 identified proteins onto these two lists. Membrane proteins made up 41.8% of the mapped proteins in the membrane fraction but only 26.2% in the cytoplasmic fraction (Fig. 1H). Some membrane proteins were present in the cytoplasmic fraction. This is probably because these proteins occur in both the membrane and the cytoplasm, for example during intracellular trafficking or synthesis, or because their subcellular localization needs further characterization. We further examined the top RBC membrane and cytoplasmic proteins reported previously15. The reported membrane proteins were largely present in the membrane extract. The reported cytoplasmic proteins were abundant in the cytoplasmic extract but also appeared in the membrane fraction to some extent (Fig. 1I), possibly owing to their association with the RBC membrane.
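The percentages in Fig. 1H can be computed as the share of mapped proteins that appear on the membrane list. The text does not define "mapped" precisely. This sketch assumes it means present on either the membrane or the cytoplasm list, and it counts a protein on both lists as a membrane protein. The function name and example sets are hypothetical.

```python
def membrane_share(identified: set, membrane_list: set, cytoplasm_list: set) -> float:
    """Percent of mapped identified proteins that are annotated as membrane proteins.

    Assumption: 'mapped' = present on the membrane list or the cytoplasm list.
    """
    mapped = identified & (membrane_list | cytoplasm_list)
    if not mapped:
        return 0.0
    return 100.0 * len(mapped & membrane_list) / len(mapped)


identified = {"P1", "P2", "P3", "P4", "P5"}  # P5 is on neither list, so it is unmapped
membrane = {"P1", "P2"}
cytoplasm = {"P2", "P3", "P4"}               # P2 is on both lists
print(membrane_share(identified, membrane, cytoplasm))  # 50.0
```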

In-depth proteomic analyses of RBC membrane and cytoplasmic fractions

To obtain as general a human RBC protein profile as possible, we collected whole blood samples from 100 individuals covering different blood types, sexes, ages, and diseases (Fig. S3). For comprehensive proteomes, we performed extensive fractionation by basic-pH RPLC, collecting nearly 100 fractions and analyzing each individually without concatenation (Fig. 2A, Fig. S4A, B). This profiled 4,777 proteins in the membrane fraction (4,164 at FDR < 1%, 459 at 1% ≤ FDR < 5%, and 154 at 5% ≤ FDR < 8%; 3,608 Reviewed and 1,169 Unreviewed; 4,264 with PSM ≥ 2). It also profiled 2,350 proteins in the cytoplasmic fraction (1,951 at FDR < 1%, 368 at 1% ≤ FDR < 5%, and 31 at 5% ≤ FDR < 8%; 1,755 Reviewed and 595 Unreviewed; 2,127 with PSM ≥ 2) (Fig. S4C), yielding 5,264 proteins overall.
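Simple arithmetic confirms that the counts above are internally consistent. Assuming that the overall figure of 5,264 is the union of the two fraction-level lists keyed on the same identifiers, inclusion-exclusion implies that 1,863 proteins were identified in both fractions. This derived value is not reported in the text. It is also not comparable with the 1,392 shared proteins in Fig. 2C, which apply additional filters (Reviewed, PSM ≥ 2, FDR < 0.01).

```python
# Protein counts per FDR band, as reported for the in-depth analysis
membrane = {"FDR < 1%": 4164, "1% <= FDR < 5%": 459, "5% <= FDR < 8%": 154}
cytoplasm = {"FDR < 1%": 1951, "1% <= FDR < 5%": 368, "5% <= FDR < 8%": 31}

n_membrane = sum(membrane.values())
n_cytoplasm = sum(cytoplasm.values())

# Each breakdown must add up to the same total
assert n_membrane == 3608 + 1169   # Reviewed + Unreviewed
assert n_cytoplasm == 1755 + 595

n_total = 5264
# Inclusion-exclusion: |M u C| = |M| + |C| - |M n C|  (assumes a common identifier space)
n_both = n_membrane + n_cytoplasm - n_total
print(n_membrane, n_cytoplasm, n_both)  # 4777 2350 1863
```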

Fig. 2

In-depth proteomic analyses of the RBC membrane and cytoplasmic extracts. (A) Workflow and number of proteins profiled in the in-depth proteomic analyses. (B) Heatmap showing the relative levels of proteins (normalized to their average PSMs across the two fractions) profiled in the membrane and cytoplasmic RBC extracts. (C) Numbers of proteins identified in the two fractions. (D) Gene Ontology cellular component enrichment analyses of the membrane and cytoplasmic RBC fractions. (E) Cumulative percentages of proteins by sequence coverage. (F) Natural variants and mutated amino acids of G6PD, with sequence coverage by peptides identified in this in-depth proteome analysis.

Albumin, the most abundant plasma protein and a common contaminant in LC-MS/MS-based proteomic analyses, was detected with 3,363 and 2,199 PSMs in the membrane and cytoplasmic fractions, respectively. However, the other top 10 most abundant plasma proteins (α-2-macroglobulin, complement C3, serotransferrin, etc.)40 were detected with only about 100 to 1,000 PSMs in the membrane fraction and only tens to a few hundred PSMs in the cytoplasmic fraction (Table S1). In contrast, spectrins, the RBC membrane-specific proteins, were identified with nearly 70,000 PSMs. Moreover, these 10 most abundant proteins account for almost half (~46%) of all detections in plasma but made up only a small portion (~0.84%) of all identifications in the RBC membrane extract (Fig. S5). Among the most abundant platelet membrane proteins41, integrin alpha-IIb (P08514) and integrin beta-3 (P05106) were detected only in the membrane fraction, with 35 and 36 PSMs, respectively, whereas platelet glycoprotein V (P40197) was not detected, suggesting very little platelet contamination. Taken together, these data indicate that the isolated RBCs in this study were of relatively high purity.
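The contamination estimate rests on a simple proportion: the PSMs attributed to a set of marker proteins, such as the 10 most abundant plasma proteins, divided by all PSMs in the dataset. This sketch assumes that the ~46% and ~0.84% figures are PSM-based shares. The function name and the toy counts are hypothetical.

```python
def psm_share(psms: dict, subset) -> float:
    """Percentage of all PSMs in a dataset attributable to the proteins in `subset`."""
    total = sum(psms.values())
    part = sum(psms.get(p, 0) for p in subset)
    return 100.0 * part / total if total else 0.0


# Toy counts: a plasma marker, an RBC membrane protein, and everything else
toy = {"PLASMA_MARKER": 30, "RBC_MEMBRANE": 900, "OTHER": 70}
print(psm_share(toy, ["PLASMA_MARKER"]))  # 3.0
```

Comparing the same marker set's share in plasma with its share in the RBC extract, rather than looking at raw PSM counts alone, makes the estimate independent of the total depth of each dataset.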

Most of these proteins were predominantly present in either the membrane or the cytoplasmic fraction (Fig. 2B), indicating that the membrane and cytoplasmic protein populations of RBCs differ substantially. We compared the two datasets, restricted to proteins with Reviewed status in Uniprot, PSM ≥ 2, and FDR < 0.01. This showed 1,753 proteins identified only in the membrane fraction and 103 only in the cytoplasmic fraction, with 1,392 identified in both (Fig. 2C). The top 100 most abundant unique proteins in the membrane and cytoplasmic fractions are listed in Tables 1 and 2, respectively.
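The filtered comparison can be sketched as a filter followed by a three-way set partition. Field names and gene labels are hypothetical. The sketch assumes that a protein passing the filters in only one fraction counts as unique to that fraction, even if it was detected below threshold in the other. The text does not specify how such cases were treated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinHit:
    gene: str
    reviewed: bool
    psms: int
    q_value: float


def passes(hit: ProteinHit) -> bool:
    """Filters stated in the text: Reviewed status, PSM >= 2, FDR < 0.01."""
    return hit.reviewed and hit.psms >= 2 and hit.q_value < 0.01


def partition(membrane_hits, cytoplasm_hits) -> dict:
    m = {h.gene for h in membrane_hits if passes(h)}
    c = {h.gene for h in cytoplasm_hits if passes(h)}
    return {"membrane_only": m - c, "cytoplasm_only": c - m, "both": m & c}


membrane_hits = [
    ProteinHit("GENE_A", True, 500, 0.001),
    ProteinHit("GENE_B", True, 50, 0.001),
    ProteinHit("GENE_X", False, 10, 0.001),  # fails: unreviewed
    ProteinHit("GENE_Y", True, 1, 0.001),    # fails: single PSM
]
cytoplasm_hits = [
    ProteinHit("GENE_B", True, 900, 0.0001),
    ProteinHit("GENE_C", True, 30, 0.002),
    ProteinHit("GENE_Z", True, 5, 0.02),     # fails: FDR >= 1%
]
print(partition(membrane_hits, cytoplasm_hits))
```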

Table 1 Top 100 most abundant unique proteins in the membrane fraction.
Table 2 Top 100 most abundant unique proteins in the cytoplasmic fraction.

Gene Ontology cellular component analyses showed that membrane fraction proteins were enriched in the proteasome core complex, spectrin, ankyrin-1 complex, eukaryotic translation initiation factor 3 complex, membrane attack complex, etc. In contrast, cytoplasmic fraction proteins were enriched in the proteasome regulatory particle (base subcomplex), hemoglobin complex, haptoglobin-hemoglobin complex, Arp2/3 protein complex, U6 snRNP, dynactin complex, etc. (Fig. 2D). Gene Ontology molecular function and biological process analyses also showed largely different enrichments between the membrane and cytoplasmic fractions (Fig. S6). In GSEA, proteins enriched in the membrane extract were involved in ion transport by ATPases, integrin cell surface interactions, ER trafficking, etc. Proteins enriched in the cytoplasmic extract were related to RABGAPs, the pentose phosphate pathway, pyrimidine metabolism, etc. (Fig. S7).
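Over-representation analyses of this kind commonly use a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The sketch below computes that P value and the fold enrichment for a single GO term from four counts. It omits the multiple-testing correction that enrichment tools apply and does not reproduce the exact settings of the Gene Ontology web tool. GSEA (Fig. S7) uses a different, rank-based statistic that is not shown here. Function names and example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n_list: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for the overlap between a protein list and a GO term.

    N: annotated background size; K: background proteins carrying the term;
    n_list: proteins in the input list; k: input proteins carrying the term.
    """
    upper = min(K, n_list)
    total = comb(N, n_list)
    # math.comb returns 0 when the lower argument exceeds the upper one
    tail = sum(comb(K, i) * comb(N - K, n_list - i) for i in range(k, upper + 1))
    return tail / total


def fold_enrichment(k: int, n_list: int, K: int, N: int) -> float:
    """Observed fraction of the list with the term, relative to the background fraction."""
    return (k / n_list) / (K / N)


# Toy numbers: 20 background proteins, 5 with the term; 4 in the list, 3 with the term
print(hypergeom_enrichment_p(3, 4, 5, 20))  # 155/4845, about 0.032
print(fold_enrichment(3, 4, 5, 20))         # 3.0
```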

Many RBC diseases involve genetic variants that cause amino acid changes in the corresponding proteins. High sequence coverage in the proteome therefore indicates whether the affected residues might be sequenceable. Owing to the depth of the analysis, 0.9% of all identified proteins were sequenced almost completely (nearly 100% coverage) and 29.1% reached nearly 50% coverage (Fig. 2E). We then examined glucose-6-phosphate 1-dehydrogenase (G6PD), a protein associated with hemolytic anemia. G6PD deficiency is the most common metabolic disorder of RBCs and affects more than 400 million people worldwide4. In this study, 76% of the G6PD sequence was covered, including 65 of the 78 natural variant or mutant amino acid positions (Fig. 2F, Table S2). The sequence coverage of proteins involved in inherited glycolytic, redox, nucleotide metabolism, and membrane disorders, and of blood group antigens, is shown in Figs. S8 and S9.
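Sequence coverage, and whether a known variant position falls within a detected peptide, can be computed by mapping peptides back onto the protein sequence. The sketch below uses a short hypothetical sequence, peptides, and variant positions. It uses exact string matching and ignores modifications and isoleucine/leucine ambiguity. A covered position means the wild-type residue was sequenced there, so a variant at that site could in principle be detected. It does not mean that a variant peptide was observed.

```python
def covered_positions(sequence: str, peptides) -> set:
    """1-based residue positions covered by at least one peptide (exact string matches)."""
    covered = set()
    for pep in peptides:
        start = sequence.find(pep)
        while start != -1:  # a peptide may occur more than once
            covered.update(range(start + 1, start + len(pep) + 1))
            start = sequence.find(pep, start + 1)
    return covered


def coverage_report(sequence: str, peptides, variant_positions):
    """Return (percent sequence coverage, sorted variant positions that are covered)."""
    cov = covered_positions(sequence, peptides)
    pct = 100.0 * len(cov) / len(sequence)
    hit = sorted(p for p in variant_positions if p in cov)
    return pct, hit


# Hypothetical 22-residue protein, two identified peptides, four variant sites
seq = "MKTAYIAKQRQISFVKSHFSRQ"
pct, hit = coverage_report(seq, ["TAYIAK", "QISFVK"], [4, 9, 12, 20])
print(f"{pct:.1f}% coverage; covered variant positions: {hit}")
# 54.5% coverage; covered variant positions: [4, 12]
```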

Comparison with other RBC proteomics studies and protein databases

We compared our datasets with two comprehensive RBC proteomic studies21,24 and two protein databases (Uniprot and RESPIRE32). In total, 2,348 proteins identified in this study had not been reported in these studies or recorded in these databases (Fig. 3A, Table S3). The proteins profiled in this study also showed a significant correlation (R = 0.36, P = 2.33 × 10⁻²¹⁸) with those detected in the study by the group of Dr. Wiśniewski24 (Fig. 3B). The 100 most abundant novel proteins identified in this study are listed in Table 3. Notably, this list includes hemoglobin (HBE1, hemoglobin subunit epsilon), actin (ACTA1, actin, alpha skeletal muscle), and tubulins (TUBB4B, TUBB2A, TUBB1, etc.). Because these proteins are highly abundant, one would not expect previous RBC proteomic studies to have missed them. However, all of them have unique peptides (Table S1), suggesting that they are truly present. These proteins may be present in only small amounts, or their unique sequences may not be easily detected in routine proteomic analyses without in-depth approaches. Either way, most of their PSMs would then come from peptides shared with homologous proteins. For example, HBE1 (Uniprot accession: P02100) has five unique peptides, most with only a few PSMs, which would probably be missed in routine analyses. In contrast, its shared peptide LLVVYPWTQR, which also maps to HBB, HBD, HBG1, and HBG2, was sequenced 1,537 times (PSM = 1,537) (Table S2).
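The distinction between unique and shared peptides can be made explicit by splitting each protein's PSMs according to how many proteins a peptide maps to. Only the mapping of LLVVYPWTQR and its 1,537 PSMs come from the text. The two unique HBE1 peptides and their PSM counts are hypothetical placeholders. A protein can thus accumulate many PSMs through shared peptides while being supported by only a handful of unique-peptide PSMs.

```python
from collections import defaultdict


def peptide_evidence(peptide_to_proteins: dict, peptide_psms: dict) -> dict:
    """Split each protein's PSMs into those from unique and those from shared peptides."""
    evidence = defaultdict(lambda: {"unique": 0, "shared": 0, "n_unique_peptides": 0})
    for pep, proteins in peptide_to_proteins.items():
        n = peptide_psms.get(pep, 0)
        kind = "unique" if len(proteins) == 1 else "shared"
        for prot in proteins:
            evidence[prot][kind] += n
            if kind == "unique":
                evidence[prot]["n_unique_peptides"] += 1
    return dict(evidence)


mapping = {
    "LLVVYPWTQR": {"HBE1", "HBB", "HBD", "HBG1", "HBG2"},  # shared peptide (from the text)
    "UNIQ_PEP_1": {"HBE1"},                                # hypothetical unique peptides
    "UNIQ_PEP_2": {"HBE1"},
}
psms = {"LLVVYPWTQR": 1537, "UNIQ_PEP_1": 3, "UNIQ_PEP_2": 2}  # unique counts are made up
ev = peptide_evidence(mapping, psms)
print(ev["HBE1"])  # {'unique': 5, 'shared': 1537, 'n_unique_peptides': 2}
```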

Fig. 3

Comparison of this study with other databases and reports. (A) Numbers of unique and shared proteins identified in this study compared with other reports and databases. (B) Correlation between proteins profiled in this study and those reported in a previous study24.

Table 3 Top 100 novel proteins identified in the in-depth RBC membrane and cytoplasmic proteome analyses.

The main factor behind this depth is extensive offline peptide fractionation combined with the analysis of each peptide fraction individually, without concatenation. Earlier RBC proteomic studies used protein separation in polyacrylamide gels11,12,13 and identified hundreds of proteins17. Other studies used offline peptide fractionation by RPLC or strong cation exchange (SCX) but concatenated the fractions into a few tens of pooled peptide samples for analysis, identifying more than 1,000 or 2,000 proteins21,22. The study by the group of Dr. Frayne used SCX, probably without concatenation, and profiled 2,838 unique RBC proteins23. Notably, another study identified 2,650 RBC proteins using a multienzyme digestion filter-aided sample preparation approach without offline peptide fractionation24. In our study, analyzing nearly 100 peptide fractions without concatenation produced a human RBC proteome of unprecedented scale.
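Concatenation pools non-adjacent fractions so that each pooled sample spans the whole high-pH gradient. The sketch below shows one common stride-based scheme. The number of pools is a hypothetical choice, and the exact schemes used in the earlier studies21,22 may differ. Setting the number of pools equal to the number of fractions corresponds to analyzing every fraction individually, as in this study, apart from the pooled beginning and ending fractions.

```python
def concatenate(n_fractions: int, n_pools: int):
    """Stride-based concatenation with 1-based fraction numbers.

    Pool k holds fractions k, k + n_pools, k + 2 * n_pools, ..., so each pool mixes
    early-, middle- and late-eluting peptides from the high-pH separation.
    """
    return [list(range(k, n_fractions + 1, n_pools)) for k in range(1, n_pools + 1)]


pools = concatenate(100, 20)       # hypothetical: 100 fractions into 20 pooled samples
print(len(pools), pools[0])        # 20 [1, 21, 41, 61, 81]

individual = concatenate(100, 100) # no concatenation: one fraction per LC-MS/MS run
print(len(individual), individual[0], individual[-1])  # 100 [1] [100]
```

The trade-off is instrument time versus depth. Concatenation reduces the number of LC-MS/MS runs, whereas analyzing each fraction individually gives every low-complexity fraction a full gradient and more MS2 scans.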

Another reason for our expanded RBC proteome coverage is that we pooled RBC samples from 100 individuals with different blood types, sexes, ages, and clinical diagnoses. Previous RBC proteome studies used blood samples from only a few donors1,17,21,23,24 or a small subset of groups14,16. They may therefore have missed proteins whose expression varies between individuals or occurs only in certain disease states. Including a large number of donors in diverse conditions would be expected to increase the number of proteins detected.

Beyond proteins involved in the physiology and disorders of RBCs themselves, this study also provides evidence relevant to diseases of other systems. RBCs might contribute to cardiovascular disease by regulating NO-induced vasodilatation and redox status through NO synthase, arginase, guanylyl cyclase, protein kinase G, phosphodiesterase, and ATP-binding cassette (ABC) transporters7. Some of these proteins were detected in this study but others were not, indicating that these proposed pathways and molecules require further validation. The RBC protein profile has been reported to be altered in breast cancer, with LAMP2 as a prognostic biomarker8, and LAMP2 was detected in this study (Table S1).

RBCs have also been reported to be altered in neurodegenerative disorders. In Alzheimer's disease (AD), RBC membrane modifications and proteins (GLUT1 and INSR) differ from those of controls, and RBC kynurenic acid levels are reduced9,42,43. The comprehensive protein profiles in this study may provide molecular clues to these RBC alterations. APP (amyloid-beta precursor protein), the precursor of the culprit Aβ peptides in AD, has been reported in RBCs1. However, it was not detected in this study. Instead, we detected the APBB1-interacting protein APBB1IP, the APP-like protein APLP2, and BACE2, a homolog of the β-secretase BACE1 that also cleaves APP (Table S1). α-Synuclein, the hallmark protein of brain pathology in Parkinson's disease, has also been reported to be increased in heterocomplexes with Aβ peptides in RBCs of AD patients44. RBCs have further been reported to be the major source of α-synuclein in blood45. Notably, α-synuclein was detected at high abundance in this study (PSM = 921 in the cytoplasmic fraction, Table S1), consistent with these reports.

We pooled RBC samples from 100 individuals, separated them into membrane and cytoplasmic fractions, and applied extensive offline peptide fractionation followed by individual LC-MS/MS analysis of each fraction without concatenation. This in-depth analysis of human RBCs provides a comprehensive protein profile for studying these essential cells and the diseases in which they are involved.