Fig. 5: Protein identification.
From: Nanopore-based massively parallel sensing for peptide profiling and protein identification

a Confusion matrix for the test set, based on the CNN-DM filtered dataset. The label hp1_1 denotes the first LysC-derived peptide fragment from the hp1 protein, and other labels follow the same pattern. Entries with an error rate below 1% are not shown. b Schematic of the single-blind protein identification workflow. Anonymously labeled protein samples (Protein 1, 2, or 3) are individually digested with LysC. The resulting peptide mixtures are azidated with FSO2N3 and then subjected to OPO library preparation. Nanopore sensing and CNN-DM analysis are then used to assess the distribution of the resulting reads and to predict each protein’s identity as one of three candidates (hp1, hp2, or hp3). c Distribution of predicted OPO reads for the three model proteins, generated from CNN-DM-based classification of protein 1 (dark green), protein 2 (yellow), and protein 3 (light green), which were identified as hp1 (n = 9 peptides), hp2 (n = 6 peptides), and hp3 (n = 9 peptides), respectively. Tukey box plots summarize the distribution of OPO reads: the box spans the interquartile range (IQR) from the 25th to the 75th percentile, the central line indicates the median, the whiskers extend to the highest and lowest values within 1.5 times the IQR of the box edges, and data points beyond the whiskers are shown as outliers.
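
The Tukey box plot convention described in panel c can be illustrated with a minimal Python sketch. This is not the authors' analysis code. The function name `tukey_box_stats` and the input values are hypothetical, and the quartiles use linear interpolation (`method="inclusive"`), which is an assumption because the caption does not state which quantile method was used.

```python
from statistics import quantiles


def tukey_box_stats(values):
    """Summarize values using the Tukey box plot convention.

    Hypothetical helper for illustration only; needs at least two values.
    """
    data = sorted(float(v) for v in values)

    # 25th, 50th and 75th percentiles (linear interpolation; an assumption).
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1

    # Fences sit 1.5 * IQR beyond the box edges.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr

    # Whiskers end at the most extreme data points still inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Made-up read counts per peptide, not data from the paper.
example_reads = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
stats = tukey_box_stats(example_reads)
print(stats)
```

In this made-up example the value 100 lies above the upper fence, so it would be drawn as an outlier and the upper whisker would stop at 9.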