Introduction

Histones are low molecular weight proteins that provide structural support for chromosomes and play well-characterized roles in gene expression. In addition, histones play roles in epigenetics, in part through crosstalk between their own protein post-translational modifications (PTMs) and chemical modification on the DNA itself1. Acetylation of histones at key sites leads to an open chromatin structure, which permits transcription, while deacetylation causes a closed state and the corresponding repression of transcription. In addition, histone deacetylation has long been noted as a hallmark of cancer progression2. As other characteristics of cancer progression are unchecked growth and cellular processes that occur despite damage that should stop these in normal cells, this relationship appears to be a linear one. A class of drugs loosely defined as histone deacetylase inhibitors (HDACs) has shown promise in treating not only cancer, but also metabolic diseases, viral infections and age-related neurodegeneration. While promising, the core mechanisms and overall effects of many of these drugs remain poorly understood3.

Today, multiple methods exist that can profile cis-regulatory elements to help understand epigenetics in single-cell populations. While these techniques are continuing to improve, they are largely derivatives of ChIP-seq technology that use nucleotide, aptamer, or antibody probes specific for cis-regulatory elements. Newer variations of these methods can provide information on up to three histone PTMs as well as provide information on the nucleotide sequences with which they are associated4,5. However, over 20 separate chemical PTMs have been identified on histone proteins by mass spectrometry that can exist in hundreds of possible combinations. Nearly all of these PTMs have been identified through the use of liquid chromatography mass spectrometry (LCMS)1.

Work in our lab recently described the identification and quantification of 8 classes of protein PTMs in single human cells using multiplexed single-cell mass spectrometry (SCP)6. In follow-up work aimed at understanding the KRASG12D inhibitor MRTX1133, histone PTMs were observed in nearly every one of more than 1400 single human cancer cells analyzed7. The main limitation in SCP by LCMS is the absolute protein concentration within each individual cell: more abundant proteins are exponentially easier to detect than lower abundance ones8. As histone proteins exist in millions of individual copies per cell, peptides from all main classes of histones are easily identified in nearly every SCP study of nucleated cells9. The signal quality from these proteins is so high that recent work described how the relative size of each single cell can be normalized solely by using the histone H4 protein abundance as a scaling factor10. It seems likely, therefore, that drugs affecting histones, such as HDACs, could be studied by SCP. In addition, methods with higher relative throughput than those typically in use may still provide valuable insight into the effects of HDACs in single human cells, even when proteome coverage decreases as relative throughput increases. In this study, I test this hypothesis by analyzing pseudo-randomized control and HDAC-treated single cells with multiplexed SCP methods, allowing between 210 and 3500 single cells to be analyzed per day.

Results and discussion

An HPLC system designed for clinical samples can be applied to high-throughput SCP experiments

The EvoSep One system is a fit-for-purpose LCMS system originally designed for clinical proteomics and other standardized high-throughput proteomics applications11. While limited to the validated methods provided by the vendor, the system provides value where reproducibility and throughput are a priority over method development. The method for multiplexed SCP on current time of flight instruments allows for 7 cells to be analyzed per LCMS injection, with the remaining three multiplex reagents used for blanks and controls7,12. For clarity of presentation, I will refer to the methods in this study by the number of cells analyzed per day (CPD). For example, a 700CPD method will allow, after blanks and carrier controls, 700 single human cells to be analyzed on one instrument in 24 hours. A summary of the results of analyzing single cells with various CPD methods is provided in Table 1. Surprisingly, doubling the throughput from 210CPD to 420CPD led to an overall increase in the average number of proteins identified in the study. A commercially available labeled peptide digest was used for assessing the quantitative value of each method6,13,14. The 210CPD and 420CPD methods resulted in nearly identical levels of observed isolation interference. While the 700CPD method resulted in increased relative coisolation interference compared to the other two, the results for the met6 protein were still found to be quantitatively significant (S. Fig. 1). The 1400CPD, 2100CPD and 3500CPD methods resulted in fewer than 150 protein identifications in total and coisolation interference at levels which implied that any single-cell measurements obtained were likely of no quantitative value.
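The CPD naming reduces to simple arithmetic. The minimal Python sketch below, which assumes exactly 7 single cells per injection as stated above, converts each CPD label into the implied number of LCMS injections per day and the total time budget per injection. The function names are illustrative only and are not part of any vendor software.

```python
CELLS_PER_INJECTION = 7      # single-cell channels per multiplexed injection (from the text)
MINUTES_PER_DAY = 24 * 60


def injections_per_day(cells_per_day: int) -> int:
    """Number of LCMS injections per day implied by a CPD method."""
    if cells_per_day % CELLS_PER_INJECTION:
        raise ValueError("CPD must be a multiple of the cells per injection")
    return cells_per_day // CELLS_PER_INJECTION


def minutes_per_injection(cells_per_day: int) -> float:
    """Total cycle time available per injection, overhead included."""
    return MINUTES_PER_DAY / injections_per_day(cells_per_day)


if __name__ == "__main__":
    for cpd in (210, 420, 700, 1400, 2100, 3500):
        print(cpd, injections_per_day(cpd), round(minutes_per_injection(cpd), 1))
```

Under this assumption, the 210CPD method corresponds to 30 injections per day and the 3500CPD method to 500 injections per day.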

Fig. 1: The occurrence of acetylated histone peptides in this study.

A A bar chart describing the number of histone acetylation sites identified in each cell by percentage at each relative throughput. B A bar chart representing the total percentage of PSMs in which a 126.0913 ± 0.005 fragment ion was observed, as well as the occurrence of this diagnostic ion in spectra identified in this workflow as lysine acetylated. C A Venn diagram demonstrating that all histone acetylation sites identified in single cells were observed in a recently published bulk proteomic analysis of cells treated with mocetinostat.

Table 1 A summary of proteomic and histone-specific proteomic coverage of cells analyzed at each relative number of cells per day

Increasing SCP acquisition rates has less detrimental effects on histone measurements

LC-MS-based proteomics has always been biased toward proteins of the highest relative concentrations within a complex mixture. As such, most advances in increasing proteomic depth center on avoiding repeated measurement of peptides from the highest abundance proteins15,16. Reflective of this, the number and average percent sequence coverage of histone proteins are only slightly diminished when increasing the number of samples analyzed per day. Even at 700CPD, when the number of proteins per cell has decreased by 54.9% relative to the 210CPD method, the average percent coverage of histone proteins detected only decreases from 47% to 41.7%. Impressively, this trend continues even when increasing the sample acquisition rate to 3500CPD. Nine histone proteins were still detected, although at only 22.11% sequence coverage of the histone proteins identified. As previously mentioned, isolation interference in the QC samples suggests that quantitative evaluation of these highest acquisition rate data is likely of negligible value.

Sixteen individual combinations of histone PTMs can be identified in single cells using this approach

While multiple technologies today can monitor histone PTMs in single cells, these are currently limited to a maximum of three per cell, while most can monitor only one PTM on a single histone protein per study5. In the approach described here, 16 separate histone peptides can be identified with high confidence in single H358 cancer cells. A graphical representation of the PTMs and their sites in three main histone proteins is provided as S. Fig. 2. Due to the high degree of homology in H3.1 and H3.3 in a tryptic peptide, the protein from which the PTM originated could not always be determined. It should be noted that the method applied here utilizes a fully untargeted method for identifying peptides that are fragmented, sequenced, and subsequently assigned to every single cell. Targeted approaches or data-independent analysis methods that are less biased by stochastic sampling should be applicable to epigenetic analysis of single cells, while likely resulting in fewer missing values17,18. Despite these limitations, seven histone acetylation sites could be confidently identified with these methods, as illustrated in Fig. 1A. As expected with a longer total analysis time, the 210CPD method leads to the highest identification rates of all unique sites per cell, including the only confident identifications of Histone 2B K21 sites. Surprisingly, however, a higher percentage of cells is identified with the Histone 4C1 dual acetylation on K13 and K17 with both the 420CPD and 700CPD methods than with the longer acquisition methods.

Fig. 2: Mocetinostat treatment significantly increases the signal of acetylated peptides from histones H3 and H4.

A The measured abundance of the Histone H3 protein, as a whole, demonstrates no alteration following treatment. B The H3 K24 peptide with acetylation, as well as a double acetylation (C) of K19 and K24, both demonstrate a highly significant increase in abundance following drug treatment. D The Histone H4 protein, as a whole, does have a significant increase in total abundance (p = 0.0173) following treatment. E A combined abundance plot of peptides demonstrating a combination of acetylation sites on K9, K13 and K17 is significantly increased following drug treatment. F A reanalysis of (E), after performing cell size normalization using H4 abundance. Source data are provided.

Diagnostic ions are valuable complementary data in supporting histone acetylation sites

A valuable tool in the analysis of many PTMs is the presence of distinct fragment ions that correspond to modified amino acids or peptides. In glycoproteomic analysis, the use of diagnostic oxonium ions is central to both instrument methods and to nearly all data analysis pipelines19. It appears that the well-characterized lysine acetylation diagnostic fragment ion of 126.0913 can likewise be used as a secondary metric for the relative number of acetylation sites within a histone peptide20. A peptide identified as possessing a single acetylation site appears to produce a diagnostic ion roughly proportional to the reporter ion tag from a single cell (S. Fig. 3A). As the number of identified acetylation sites in the peptide increases to two (S. Fig. 3B) or three distinct sites (S. Fig. 3C), the relative intensity of the 126.0913 diagnostic ion scales in a manner roughly proportional to the reporter ion signal from each cell. While not truly quantitative, these ions provide supporting evidence for the identities of the peptides assigned by the search engine and mirror recently described methods for analysis of intact glycopeptides21,22.

Fig. 3: Single-cell heterogeneity can be visualized by histone peptide signatures.

A A PCA plot of the protein abundance values for 580 cells demonstrates only clustering by analysis batch. B A PCA of histone peptides from these same 580 cells, with the largest contributor to each dimension labeled.

For a more quantitative analysis of the use of the 126.0913 diagnostic ion, I utilized the recently described DIDAR software to compare the number of spectra containing this ion among spectra identified as acetylated versus all spectra acquired. DIDAR reads LCMS files with MS/MS spectra and counts the number of spectra possessing user-defined fragment ions23. In the processed results from the 420 CPD method, 1523 PSMs were identified as acetylated by the search software. These spectra were exported and analyzed by DIDAR. DIDAR identified 349 processed fragmentation spectra (22.9%) possessing a fragment ion of 126.0913 ± 0.005 Da. A strikingly similar 22.7% of spectra in the 210 CPD analyzed cells was likewise observed (175/769). Conversely, of the 283,609 original MS/MS spectra obtained in total in the 420 CPD study, only 1687 spectra (0.59%) were found to contain this same ion under these same criteria (Fig. 1B). Similar numbers were found for the cells analyzed using the 210 CPD method. It should be noted that spot-checking the acetylated peptides manually, as shown in S. Fig. 3, suggests that the number of acetylated peptides with this diagnostic ion may actually be higher than the ~22% calculated by DIDAR. In order to process spectra from a TIMSTOF in Proteome Discoverer, a bin filtering system was employed to reduce the overall noise in each spectrum as one of the first steps in the analysis24. In this case, only the top 12 most intense ions from each 100 Th window of each MS/MS spectrum are retained for processing with Sequest. Due to the presence of the reporter ions in the first bin from 100 to 200 m/z, it appears likely that the 126.0913 diagnostic ion is often filtered out. In our pipeline, the reporter ion region is extracted from each spectrum and analyzed separately, with the quantification and identification values linked back together following data analysis. As the 126.0913 diagnostic ion falls within the normal reporter ion region, the reporter ion region observed in post-processing is a more accurate representation of the original mass spectra acquired. Despite this complication, it appears clear that spectra identified as acetylated by our workflow are substantially enriched in the 126.0913 acetylation diagnostic ion.
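To illustrate why a top-12-per-bin filter can remove the diagnostic ion, the following Python sketch applies such a filter to a synthetic spectrum. It is a simplified stand-in for the filtering step, not the Proteome Discoverer implementation. It assumes bins aligned at multiples of 100 m/z, and the peak list is invented for illustration.

```python
from collections import defaultdict


def top_n_per_bin(peaks, n=12, bin_width=100.0):
    """Keep the n most intense (mz, intensity) peaks in each m/z bin."""
    bins = defaultdict(list)
    for mz, intensity in peaks:
        bins[int(mz // bin_width)].append((mz, intensity))
    kept = []
    for members in bins.values():
        members.sort(key=lambda peak: peak[1], reverse=True)
        kept.extend(members[:n])
    return sorted(kept)


def has_ion(peaks, target=126.0913, tol=0.005):
    """True if any peak lies within +/- tol of the target m/z."""
    return any(abs(mz - target) <= tol for mz, _ in peaks)


# Synthetic spectrum (hypothetical values): twelve intense ions crowd the
# 100-200 m/z bin, standing in for reporter ions and low-mass fragments.
crowded = [(126.1277 + i, 1000.0 + i) for i in range(12)]
spectrum = crowded + [(126.0913, 50.0), (350.2, 400.0)]
filtered = top_n_per_bin(spectrum)

if __name__ == "__main__":
    print(has_ion(spectrum), has_ion(filtered))
```

In this toy case the weaker 126.0913 peak is the thirteenth ion in its bin and is discarded, while the peak at 350.2 m/z survives because its bin is not crowded.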

Following the preprint of the first version of this manuscript, a comprehensive high-depth proteomic analysis of 21 lysine deacetylase inhibitors was published by Chang et al.25, including an analysis of 10 concentrations of mocetinostat for 6 hours. To compare these results, I reprocessed the 48 original LC-MS files from the study using the same workflow used for the single cells in this manuscript, with minor alterations to account for mass analyzer architecture. In the bulk data, 52 unique histone acetylation sites were confidently identified, including every histone acetylation site reported in single cells in this study (Fig. 1C).

Mocetinostat-treated single cells have significantly higher levels of histone acetylation

Work in our lab has recently reported, and confirmed in a larger study, that PTMs on high-abundance proteins can be readily detected in many, if not all, single cells7. To determine if PTMs could be meaningfully quantified following drug treatment, NCI-H-358 cells (H358) were treated for 24 hours with mocetinostat, a class-specific HDAC inhibitor3. A summary of some observations made using the 210 CPD method is shown in Fig. 2. While no alteration in Histone H3 protein abundance was observed between the control and treated population (Fig. 2A), two separate peptides annotated to possess acetylations were significantly increased in abundance. Both acetylation on K24 (Fig. 2B) and a double acetylation on both K19 and K24 (Fig. 2C) were found to be significantly increased following treatment. Similarly, four peptides were observed with a combination of acetylations affecting Histone H4 on K9, K13 and K17 and are, as a group, found to be highly significantly increased following drug treatment (Fig. 2E). However, I do observe a somewhat significant increase in total H4 protein abundance following mocetinostat treatment (Fig. 2D). A recent study described the use of H4 protein levels as a proxy for normalizing between single cells of varying sizes10. To test whether the change in H4 protein abundance accounts for the increased acetylation observed, I reprocessed these data using the H4 scaling method. The increase in H4 acetylation remained highly significant (p < 0.0001) following scaling (Fig. 2F).
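The general idea of scaling by H4 abundance can be sketched as follows. This is a minimal illustration in Python, not the cited method: it assumes each cell's abundances are multiplied by the ratio of the median H4 abundance across cells to that cell's own H4 abundance. The feature names and values are hypothetical.

```python
from statistics import median


def h4_scale(cells, h4_key="H4"):
    """Scale every feature in each cell so that all cells share the same H4 abundance.

    cells: dict mapping cell id -> dict mapping feature name -> abundance.
    Cells without an H4 measurement raise KeyError; handling of missing values
    is left out of this sketch.
    """
    reference = median(features[h4_key] for features in cells.values())
    scaled = {}
    for cell_id, features in cells.items():
        factor = reference / features[h4_key]
        scaled[cell_id] = {name: value * factor for name, value in features.items()}
    return scaled


# Hypothetical abundances for three cells of different sizes.
cells = {
    "c1": {"H4": 100.0, "H4_triAc": 10.0},
    "c2": {"H4": 200.0, "H4_triAc": 40.0},
    "c3": {"H4": 400.0, "H4_triAc": 40.0},
}
scaled = h4_scale(cells)

if __name__ == "__main__":
    for cell_id, features in scaled.items():
        print(cell_id, features)
```

After scaling, differences in the acetylated peptide signal that merely track total H4 (and therefore cell size) are removed, so any remaining difference reflects a change in the acetylated fraction.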

The Chang et al. bulk proteomic analysis of histone deacetylase inhibitors lends further support to the use of the 126.0913 acetylation diagnostic ion as a confidence filter in the identification of acetylated peptides. In the reprocessing of these data for cells treated with mocetinostat, 80.7% (42/52) of histone peptides identified as acetylated possessed a clear 126.0913 fragment ion in the diagnostic region25. This is pertinent as the Orbitrap MS3-based method utilized in that study does not require prefiltering prior to analysis in this software and is therefore likely more reflective of the true prevalence of this diagnostic ion in unfiltered tandem mass spectra26.

Heterogeneity in histone PTM abundance can be observed following treatment

The proteomic alterations imparted by some drug treatments cannot be readily observed in single-cell proteomic data unless the alterations are relatively extreme. In the case of drugs where cell cycle alterations occur, such as treating KRASG12C mutant cells with a covalent inhibitor, a simple principal component analysis can readily separate control from treated cells6. However, when using a noncovalent KRASG12D inhibitor at nonlethal doses, neither PCA nor t-SNE analysis alone could stratify control and treated cells7. At the dose and time of mocetinostat treatment, this is also the case when evaluating the cells in this study at the level of whole protein abundance measurements. The only clustering readily apparent in these data is the separation of LCMS batches analyzed ~1 week apart (Fig. 3A). It should be noted that the PCA plot, as provided, displays prominent batch effects only when limited protein quality filtering is performed, and is provided to demonstrate the lack of protein level alterations observed in single cells. When filtering for proteins with more than one peptide, these batch effects are reduced to random scatter. However, when performing a PCA on the abundance of histone peptides, some stratification is apparent (Fig. 3B). The loading plots for these principal components indicated that the largest factors contributing to these separations are both histone acetylation sites, with Histone 2B K21 acetylation playing the largest role in PC1 and a tri-acetylation of histone 4 at K9, K13 and K17 playing the largest role in PC2.

Fig. 4: Mocetinostat treatment significantly increases S100A9/A8 protein in nearly all treated cells.

A A volcano plot from the 210 CPD analysis of control and treated H358 cells. B Intensity values observed for the S100-A9 protein in all cells in this study. In all cases, a Student's t test found a p value < 0.0001 for control vs. treated cells for this protein, regardless of the acquisition rate for the cells analyzed. The 700 cell per day method is labeled according to software defaults for pairwise significance. Source data are provided.

Proteomic data is obtained along with measurements of histone modifications in each cell

The use of standard proteomics statistical techniques is typically discouraged in single-cell proteomics due to the inability of bulk protein statistics to accurately capture heterogeneity26. It was therefore surprising to observe a highly significant alteration in a single protein, as seen for S100-A9 (P06702) in Fig. 4A. This calcium- and zinc-binding protein has, to the best of my knowledge, not been previously implicated in mocetinostat or other HDAC response mechanisms. Aside from keratins and proteins observed in a relatively small number of cells, the only other significantly altered protein was S100-A8. The proteins S100-A8 and S100-A9 are known to function as a hetero-tetramer complex in multiple contexts27. As shown in Fig. 4B, S100-A9 was observed as significantly increased in nearly every single cell studied at every acquisition speed. Surprisingly, the bulk proteomic study from Chang et al., on mocetinostat-treated HeLa cells, found no alterations in S100-A9 or S100-A8 expression at any dose of drug. This is curious, as the Human Protein Atlas lists detection of these transcripts in neither the HeLa nor the H358 cell line. However, both S100-A8 and S100-A9 have been previously linked to induced KRAS mutations in cell line experiments27.

In the context of histone acetylation, another protein-level observation appears of quantitative value. The primary histone deacetylase (HDAC1) is only sporadically detected in single cells in this study, for example, in 98/562 (17.1%) of 420CPD analyzed cells. When HDAC1 is above the limit of detection, the total abundance of acetylated peptides is decreased, even in the cells treated with mocetinostat. In the 420CPD example, cells with detectable levels of HDAC1 have a 2.22-fold decrease in total acetylated histone signal. This appears to be largely driven by the modifications on Histone 2B and Histone 3, which have a reduction in acetylated peptide signal of 2.28-fold and 1.89-fold, respectively, when HDAC1 is above the limit of detection. The triply acetylated version of Histone H4 is notable in that the peptide was never detected in a single cell in which HDAC1 was detected. Preliminary work in my lab has demonstrated that the drug concentration in every single cell can be highly variable among clonal cells in a population, and that the proteome of every single cell is affected by the relative concentration of internalized drug28. An analysis of mocetinostat activity in the context of HDAC1 expression is beyond the scope of this article and the expertise of this author. However, the literature suggests that these are proteins that are primary targets of HDAC1, and it is tempting to conjecture that this protein may simply be able to overcome the activity of the inhibitor at the concentration used in this study2.

Conclusions

Single-cell approaches to interrogate epigenetic control mechanisms are a major focus of research today. There is no shortage of technologies that enable one to measure epigenetic interactions, with most focusing on determining the site of histone/DNA interactions. The first description of ChIP-seq was over 20 years ago29, followed shortly thereafter by single-cell ChIP-seq30. These tools have continued to evolve, most pertinent to this current study, with the simultaneous integration of more histone total protein and PTM sites31. These methods primarily use immunoprecipitation to pull down the histone version of interest as an enrichment tool prior to sequencing to determine the nucleic acid regions where histone protein versions of interest are interacting. The study presented here demonstrates the other side of the picture, where the histone and histone PTMs are analyzed in isolation from the DNA binding site information. Other technologies exist today that can quantify histones and histone PTMs in single cells. Single-cell proteins and PTMs can be targeted with antibodies and analyzed with flow cytometry32, and a larger number of antibodies can be utilized per single cell by extending this to CyTOF technology33. Today, LCMS is the clear method of choice for identifying histone PTMs in bulk samples. Nearly every known modification on these important proteins has been identified by clever use of enzymes or derivatization coupled with enzymatic digestion of these proteins34. In addition, the incredibly high intracellular abundance of these proteins means that unless your methods actively exclude them from analysis, they will nearly always be sequenced with high coverage in every SCP experiment35,36. Rather than ignore this important class of proteins, single-cell epigenetics by mass spectrometry can exploit these characteristics to advantage.
Global unbiased shotgun proteomics by data-dependent methods always results in a high relative number of missing values, which is the entire argument for targeted, semi-targeted and data-independent analysis methods37. By simply exchanging the method described here for any of these other common LC-MS proteomics techniques, the data in this study should be reproducible and likely improved in terms of data completeness. In addition, while the highest CPD methods in this study were found to suffer from too much background interference, this may be almost entirely due to the mismatch of chromatography and methods. The 4 cm analytical columns used on this clinical HPLC system are intended for high-speed, data-independent methods utilizing wide windows to complement narrow chromatographic peaks38. The use of the relatively slow data-dependent methods necessary here is simply a mismatch of two technologies. Multiple new methods have recently been described that allow high-speed chromatography with higher relative chromatographic resolution through the application of multi-pump systems, which largely rely on the preparation of the next sample while the current sample is being analyzed39,40. The application of these chromatographic methods may allow accurate quantification of histones and histone PTMs in single cells at a throughput >700CPD. Finally, while this study was intended to focus almost entirely on the histone PTMs themselves, no changes were made to our methods for multiplexed SCP, and a new protein level response to this drug appears to be revealed by that angle of these same data. SCP is still a field of study in relative infancy compared to nearly any other -omics technique today, but the results presented here suggest that single-cell epigenetics may be a fruitful avenue of future investigation.
Ultimately, it is up to epigenetic researchers to determine the data that are of value to their research. This study is meant to describe how single-cell epigenetic insights can be a consequence of acquiring SCP data, and to show that these histone PTMs do appear to provide new insights in the drug treatment model described here.

Methods

Cell culture

The NCI-H-358 cell line (H358) was obtained from ATCC (CRL-5807) and cultured according to vendor instructions in RPMI-1640 (ATCC 30-2001) with 10% fetal bovine serum (ATCC 30-2020) and 10 mg/mL Pen Strep antibiotic solution (ATCC 30-2300). Control cells were treated with DMSO, and treated cells with 5 µM mocetinostat (Selleckchem S1122) prepared in DMSO. Treated cells were cultured alongside control cells for 24 hours prior to rapid washing of cells with ice-cold magnesium and calcium-free PBS (Fisher). The adherent cells were briefly rinsed in 3 mL of 0.05% Trypsin plus EDTA solution (ATCC 30-2001). This solution was rapidly aspirated off and replaced with 3 mL of the same solution. The cells were examined by bright field microscopy and incubated at 37 °C with multiple examinations until the adherent cells had lifted off the plate surface. The active trypsin was then quenched by the addition of 7 mL of the original culture media. The 10 mL solution was transferred to sterile 15 mL Falcon tubes (Fisher) and centrifuged at 300 × g for 3 minutes to pellet the cells. The supernatant was gently aspirated off, and the cells were resuspended in PBS solution without calcium or magnesium with 0.1% BSA (both, Fisher Scientific) at 1 million cells per mL as estimated by bright field microscopy. Cells for single cell aliquoting were gently dissociated from clumps by slowly pipetting a solution of ~1 million cells through a Falcon cell strainer (Fisher, 353420), and the cells were placed on wet ice and immediately transported to the JHU Public Health sorting core. Non-viable cells were labeled with a propidium iodide (PI) solution provided by the core facility and briefly vortexed prior to cell isolation and aliquoting. Isolated single cells were deposited directly onto microwell plates containing two microliters of LC-MS grade acetonitrile per well. A carrier channel of 150 single cells from control or treated cells was aliquoted into the first well in each row. The second well in each row was used as a method blank control well in which 20 nanoliters of the sorting buffer were aliquoted without transferring a single cell. Aliquoted plates were immediately transferred to dry ice prior to −80 °C storage.

Single-cell isolation and preparation

Plates containing single cells were removed from −80 °C in small batches and were immediately unsealed and placed onto a 95 °C heatblock for ~90 seconds to fully lyse the cells and to dry off the remaining acetonitrile. The plates were cooled to room temperature, and the lysates were digested with a solution of 100 mM TEAB, 0.1% DDM, and 2 ng/µL sequencing grade trypsin. Two µL were added to each blank and single cell well, and 4 µL were added to each carrier channel well. The plates were briefly centrifuged to remove air bubbles and were tightly sealed with plate sealing tape. The digestion was allowed to proceed at room temperature (~18 °C) overnight. Following digestion, the plates were thoroughly centrifuged and labeled with TMTPro reagent. For this experiment, the carrier channel was labeled with TMTPro 135, the method blank control was labeled with 126, and the single cells were labeled with 127c, 128c, 129c, 130c, 131c, 132c, 133c and 134c. Due to impurities from the carrier channel, the 134c channel was ignored in all downstream analyses. Control and treated plates were combined using a control-to-treatment swap method for pseudo-randomization7. The result was that in each LCMS experiment, the carrier channel contained an equal mixture of peptides from control and treated single cells (~75 cellular volumes of each), and each injection contained both control and treated cells. The loading method ensures that in each subsequent injection, the channel assignments of control and treated cells are reversed. For example, in injection 1, channels 127–130 would contain control cells, whereas 131–134 would contain treated cells. In the next injection, these would be reversed, in that 127–130 would contain treated cells and 131–134 control cells. Each injection was labeled in a manner to allow original cellular identities to be deconvoluted in downstream data analysis.
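The swap scheme described above can be expressed as a small lookup. The Python sketch below is illustrative only: it assumes injections are numbered from 1 and that odd-numbered injections follow the "injection 1" layout from the text. The function name is hypothetical.

```python
# Single-cell channels in label order, as listed in the text.
CELL_CHANNELS = ["127c", "128c", "129c", "130c", "131c", "132c", "133c", "134c"]


def channel_layout(injection_number: int) -> dict:
    """Map each single-cell channel to 'control' or 'treated' for one injection.

    Assumes 1-based injection numbers; odd injections place control cells in
    channels 127-130 and treated cells in 131-134, even injections the reverse.
    """
    if injection_number % 2 == 1:
        first_half, second_half = "control", "treated"
    else:
        first_half, second_half = "treated", "control"
    half = len(CELL_CHANNELS) // 2
    return {
        channel: (first_half if index < half else second_half)
        for index, channel in enumerate(CELL_CHANNELS)
    }


if __name__ == "__main__":
    print(channel_layout(1))
    print(channel_layout(2))
```

Because the 134c channel was ignored downstream, one of the two groups contributes only three usable cells per injection, and the swap alternates which group that is.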

LCMS instrument parameters

An EvoSep One (EvoSep) and TIMSTOF SCP (Bruker Daltonics) were used for all analyses. EvoTips were prepared identically for all experiments, resulting in each tip containing ~30 nanograms of total labeled cells if making the ludicrous assumption that no sample loss occurred in sample preparation. The EvoSep One was operated using PepSep columns coupled to a 10 µm CaptiveSpray emitter through a “Zero Dead Volume” steel union. The TIMSTOF SCP was operated in ddaPASEF mode using a 75 millisecond ramp time, with eight ramps per cycle. A target of 20,000 ions was used with a minimum threshold of 2500 counts. TIMS Stepping was employed, in which the low mass fragment ions were obtained using a collision energy dependent on the 1/k0 value, ranging from 45 to 90. Peptide sequencing data were obtained using a CE of 21–60. A 1/k0 isolation window of 0.7–1.45 was used, and a custom polygon was used to attempt to collect only +2 or greater ions with an m/z of 380 or higher.

Data analysis

To reduce co-isolation interference, I utilize a single-point recalibration method using the 135n carrier channel signal to adjust the reporter ion region using an in-house developed tool (pasefRiQ Calibrator). This secondary calibration allows tighter mass accuracy tolerances to be used during the final data analysis, resulting in reduced background noise6. For all control and treated cells, a linear mass shift of −0.00564 was applied from 100–136 m/z. The recalibrated output files were processed in Proteome Discoverer 2.4SP1 using SequestHT and Percolator41. Briefly, the MS/MS spectra were binned into 100 Da segments and filtered so that only the top 12 most abundant ions from each bin were retained. The resulting cleaned spectra were searched with a 15 ppm MS1 tolerance and 0.03 Da MS/MS tolerance. TMTPro labels were considered static on the N-terminus and dynamic on lysines to allow for the search for lysine PTMs. Methionine oxidation was the only additional dynamic modification. The default cutoffs for Percolator PSM validation and peptide and protein FDR determination were employed in all analyses. Reporter ions were integrated using a 35 ppm mass tolerance window, and quantification values were only used for unique peptides. A sum-based normalization of all PSM signals for TMTPro channels 127–133 was applied, and the raw non-normalized abundance values were also retained. For cell size normalization, the H4 protein was used as the sole scaling factor for a secondary analysis of the 420CPD data. For re-analysis of the Chang et al. bulk proteomics data, the same pipeline was utilized, with the exception that 10 ppm and 0.6 Da MS1 and MS2 tolerances, respectively, were utilized, and the TMT6-plex label was used as the modification. All 48 offline fractions were analyzed as fractions for the sake of quantification, and ratios were produced by comparing each dosage point to the mock-treated cells.
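For scale, the tolerances quoted above can be converted into absolute mass units. The Python sketch below is an illustration under stated assumptions, not the pasefRiQ Calibrator code: it treats the 35 ppm value as a symmetric ± tolerance and the −0.00564 shift as a constant offset applied only between 100 and 136 m/z. Both function names are hypothetical.

```python
def ppm_window(mz: float, ppm: float) -> float:
    """Half-width, in m/z units, of a +/- ppm tolerance at a given m/z."""
    return mz * ppm / 1e6


def recalibrate(mz: float, shift: float = -0.00564,
                low: float = 100.0, high: float = 136.0) -> float:
    """Apply a constant mass shift inside the reporter region only (assumed behaviour)."""
    return mz + shift if low <= mz <= high else mz


if __name__ == "__main__":
    # Width of a 35 ppm tolerance near the acetylation diagnostic ion.
    print(round(ppm_window(126.0913, 35.0), 5))
    # A peak inside the reporter region is shifted; one outside is untouched.
    print(recalibrate(126.1334), recalibrate(350.0))
```

At m/z 126 a 35 ppm tolerance corresponds to roughly 0.0044 m/z units, which is slightly smaller than the magnitude of the applied shift. This illustrates why recalibrating first makes the tighter integration window usable.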
To improve the overall confidence in histone peptide identifications, a method was employed in which SequestHT, MSAmanda 2.0 and MSFragger were combined within the Proteome Discoverer workflow. For SequestHT a 15 ppm MS1 tolerance and 0.03 Da MS/MS tolerance were employed. For the other two engines, 15 ppm MS1 and 20 ppm MS/MS tolerance were used, as these both use ppm-based mass tolerance rather than static. Peptides containing PTMs were considered as high confidence when separately identified by at least 2 of these very different search tools. For further confidence, all single-cell histone peptides were manually evaluated for sequence coverage and quality using the Proteome Discoverer viewer interface, with fragment identifications considered a match within a 0.03 Da tolerance window. All histone acetylation sites featured in Fig. 1 have been provided as Supplemental Figures with color-matched fragments and fragmentation maps.
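The two-of-three agreement rule described above amounts to a simple vote across search engines. The following Python sketch shows the logic on placeholder identifiers; it is not the Proteome Discoverer workflow itself, and the peptide labels are invented.

```python
from collections import Counter


def high_confidence(ids_by_engine: dict, min_engines: int = 2) -> set:
    """Return identifications reported by at least min_engines search engines.

    ids_by_engine maps an engine name to an iterable of modified-peptide
    identifiers (for example, sequence plus site annotations as one string).
    """
    votes = Counter()
    for identifications in ids_by_engine.values():
        votes.update(set(identifications))  # one vote per engine per peptide
    return {peptide for peptide, count in votes.items() if count >= min_engines}


# Placeholder identifiers, not real search results.
example = {
    "SequestHT": ["pepA", "pepB", "pepC"],
    "MSAmanda": ["pepA", "pepC", "pepD"],
    "MSFragger": ["pepA", "pepE"],
}

if __name__ == "__main__":
    print(sorted(high_confidence(example)))
```

Deduplicating within each engine before counting ensures that a peptide matched by many spectra in one engine still counts as a single vote.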

Diagnostic ion data analysis

Recent work in my group has described a method for diagnostic ion data analysis reduction (DIDAR). This Python script or GUI interface will remove and enumerate spectra that contain specific fragment ions based on user-entered criteria. For analysis of the occurrence of the lysine acetylation diagnostic fragment ion, 126.0913, all peptide spectral matches were exported from each throughput experiment in Proteome Discoverer as a single file in the Mascot Generic text format (.MGF). In addition, all peptide spectral matches identified as acetylated in Proteome Discoverer were separately exported in the same manner. It should be noted that the former files do include the latter files in their entirety. The six files were then processed with DIDAR to create new files that only contain spectra with an ion detected within 126.0913 ± 0.005.
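The counting step can be illustrated with a few lines of Python. The sketch below is a simplified stand-in written for this explanation and is not the DIDAR source code. It assumes a plain MGF layout in which each spectrum sits between BEGIN IONS and END IONS and each peak line starts with the m/z value followed by the intensity.

```python
def iter_mgf_spectra(lines):
    """Yield a list of (mz, intensity) peaks for each BEGIN IONS/END IONS block."""
    peaks, in_spectrum = [], False
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN IONS":
            peaks, in_spectrum = [], True
        elif line == "END IONS":
            if in_spectrum:
                yield peaks
            in_spectrum = False
        elif in_spectrum and line and line[0].isdigit():
            parts = line.split()
            if len(parts) >= 2:  # header lines such as TITLE= are skipped above
                peaks.append((float(parts[0]), float(parts[1])))


def count_with_ion(lines, target=126.0913, tol=0.005):
    """Return (spectra containing the target ion, total spectra)."""
    hits = total = 0
    for peaks in iter_mgf_spectra(lines):
        total += 1
        hits += any(abs(mz - target) <= tol for mz, _ in peaks)
    return hits, total


# Two invented spectra; only the first contains the diagnostic ion.
EXAMPLE_MGF = """BEGIN IONS
TITLE=spectrum_1
PEPMASS=500.25
CHARGE=2+
126.0915 150.0
300.1 20.0
END IONS
BEGIN IONS
TITLE=spectrum_2
PEPMASS=610.30
CHARGE=2+
126.1277 900.0
END IONS
"""

if __name__ == "__main__":
    print(count_with_ion(EXAMPLE_MGF.splitlines()))
```

For a file on disk, the open file handle can be passed directly in place of `EXAMPLE_MGF.splitlines()`, since the parser only iterates over lines.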

Statistics and reproducibility

Single-cell proteomics data are intrinsically difficult to reproduce because all material is destructively analyzed. In all stages of this study, control and treated cells were pseudo-randomized in an attempt to reduce batch-related effects. All figures are based on cells studied with the 420CPD method, which is a composite of 562 single cells. For pseudo-bulk proteomic comparisons, p values are reported using a Student's t test. All protein level identifications are reported following filtering at an estimated false discovery rate of 0.01 at the peptide-spectral match, peptide, and protein level, with filtering at each stage.
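For reference, the test statistic behind a two-sample Student's t test can be computed as below. This Python sketch assumes the pooled-variance (equal variance) form of the test, which the text does not specify, and uses invented abundance values. Converting the statistic to a p value requires the t distribution, for which a statistics library such as SciPy (`scipy.stats.ttest_ind`) would normally be used.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Return (t statistic, degrees of freedom) for a pooled-variance two-sample t test."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Invented abundance values for a control and a treated group.
control = [1.0, 2.0, 3.0]
treated = [2.0, 3.0, 4.0]

if __name__ == "__main__":
    print(student_t(control, treated))
```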

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.