Fig. 2: Design of biomarkers using a biological embedding. | Nature Communications


From: Immunodiagnostic plasma amino acid residue biomarkers detect cancer early and predict treatment response


a Embeddings are frequently used to process complex systems with many targets for machine learning applications, such as in natural language processing. Such embeddings represent each target as a vector of component parts within a high-dimensional space and combine the component vectors in a dimensionality reduction step, so that similar targets cluster together in the reduced n-dimensional space. b In contrast to traditional embeddings, which derive the relative contributions of the dataset dimensions so as to maximise captured variation, our biological embedding is determined by the proportional molar concentration of components in blood plasma. The immunoglobulin fraction, approximately 38% of proteins, is further divided into immunoglobulin classes, shown with their proportional contributions in healthy individuals. c We performed bioinformatic analysis to determine the optimal biological embedding dimensions to capture this variability. Possible dimensions include any of the N = 20 major amino acid types that make up the major plasma protein fractions. We identified N = 5 amino acid types whose numbers changed substantially across the proportion-weighted fractions, such that they would be detectable in an average biological embedding signature.
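The proportion-weighted idea in panels b and c can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the "other" fraction, the residue counts, and the spread-based ranking are all hypothetical choices made for illustration. Only the roughly 38% immunoglobulin proportion is taken from the caption.

```python
# Toy input: each plasma protein fraction has a molar proportion and
# per-protein residue counts for a few amino acid types (one-letter codes).
# ASSUMPTION: every count below is invented; only 0.38 comes from the caption.
TOY_FRACTIONS = {
    "immunoglobulin": {"proportion": 0.38, "counts": {"K": 10, "C": 4, "W": 2}},
    "other": {"proportion": 0.62, "counts": {"K": 20, "C": 10, "W": 0}},
}


def weighted_signature(fractions):
    """Return the molar-proportion-weighted mean residue count per amino acid."""
    total = sum(f["proportion"] for f in fractions.values())
    signature = {}
    for f in fractions.values():
        weight = f["proportion"] / total
        for aa, count in f["counts"].items():
            signature[aa] = signature.get(aa, 0.0) + weight * count
    return signature


def most_variable(fractions, k):
    """Return the k amino acids whose weighted counts differ most across fractions.

    ASSUMPTION: "changed substantially" is modelled here as max minus min of
    the weighted contributions; the source does not state the criterion used.
    """
    total = sum(f["proportion"] for f in fractions.values())
    amino_acids = {aa for f in fractions.values() for aa in f["counts"]}
    spread = {}
    for aa in amino_acids:
        contributions = [
            f["proportion"] / total * f["counts"].get(aa, 0)
            for f in fractions.values()
        ]
        spread[aa] = max(contributions) - min(contributions)
    # Sort by descending spread, breaking ties alphabetically for determinism.
    return sorted(spread, key=lambda aa: (-spread[aa], aa))[:k]


signature = weighted_signature(TOY_FRACTIONS)
top_two = most_variable(TOY_FRACTIONS, 2)
```

With real data, the fractions would be the major plasma protein fractions and the counts would span all 20 amino acid types; the selection step would then keep the five most informative types as embedding dimensions.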
