Fig. 1: GENBAIT workflow and evaluation.
From: Computational design and evaluation of optimal bait sets for scalable proximity proteomics

a Proximity labeling data are acquired by MS. Interaction scoring is performed to generate a matrix of baits and high-confidence preys D. NMF is used to soft-assign preys into a predefined number of components B based on GO:CC terms. b We compared GENBAIT's performance to 10 feature selection methods and random selection (as a baseline) using 15 different metrics. c Schematic of the evaluation procedure (fitness function). Bait selection generates a subset D′ of the original dataset D. NMF is used to soft assign the subset's preys into components B′, and the Hungarian algorithm is used to create a matrix B* in which the components of B′ are aligned with those in the full dataset B. Then, Pearson correlations between corresponding components are calculated. The mean of the diagonal values is used as the fitness score in the genetic algorithm. d Workflow of the genetic algorithm for optimizing BioID bait subsets. Randomly selected initial bait subsets undergo mutation and crossover operations, followed by fitness evaluation. High-scoring subsets are used in the next generation to iteratively define the optimal subset for scalable BioID profiling.
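
The fitness evaluation in panel c and the search loop in panel d can be illustrated with a minimal Python sketch. It is not the GENBAIT implementation, and several details are assumptions made for illustration: D is taken to be a non-negative baits × preys array, NMF is a plain multiplicative-update routine, the Hungarian step uses `scipy.optimize.linear_sum_assignment` on the component correlation matrix, and the function names (`nmf_prey_scores`, `fitness`, `evolve`), the crossover and mutation rules, and all default parameters are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def nmf_prey_scores(D, n_components, n_iter=300, seed=0):
    """Factorise D (baits x preys, non-negative) as W @ H with multiplicative
    updates and return H.T, a preys x components soft-assignment matrix."""
    rng = np.random.default_rng(seed)
    n_baits, n_preys = D.shape
    W = rng.random((n_baits, n_components)) + 1e-3
    H = rng.random((n_components, n_preys)) + 1e-3
    eps = 1e-9  # avoids division by zero
    for _ in range(n_iter):
        H *= (W.T @ D) / (W.T @ W @ H + eps)
        W *= (D @ H.T) / (W @ H @ H.T + eps)
    return H.T


def fitness(D, bait_idx, B, n_components):
    """Mean Pearson correlation between components of the full data (B)
    and aligned components of the bait subset (B*)."""
    B_sub = nmf_prey_scores(D[bait_idx, :], n_components)  # B' from D'
    # Pearson correlation of every column of B with every column of B'.
    corr = np.corrcoef(B.T, B_sub.T)[:n_components, n_components:]
    corr = np.nan_to_num(corr)  # constant columns give NaN; treat as 0
    # Hungarian-style assignment: pair components to maximise total correlation.
    rows, cols = linear_sum_assignment(corr, maximize=True)
    # B* would be B_sub[:, cols]; its diagonal correlations are corr[rows, cols].
    return float(corr[rows, cols].mean())


def evolve(D, n_select, n_components, pop_size=20, n_gen=10,
           mutation_rate=0.3, seed=0):
    """Toy genetic algorithm over bait subsets (assumed operators).
    Requires pop_size >= 4 so that at least two parents survive."""
    rng = np.random.default_rng(seed)
    n_baits = D.shape[0]
    B = nmf_prey_scores(D, n_components)

    def score(subset):
        return fitness(D, subset, B, n_components)

    # Random initial subsets.
    pop = [rng.choice(n_baits, n_select, replace=False) for _ in range(pop_size)]
    for _ in range(n_gen):
        # Keep the top half as parents (elitist selection).
        parents = sorted(pop, key=score, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            i, j = rng.choice(len(parents), 2, replace=False)
            # Crossover: sample the child from the union of two parents.
            pool = np.union1d(parents[i], parents[j])
            child = rng.choice(pool, n_select, replace=False)
            # Mutation: swap one bait for a bait not yet in the child.
            outside = np.setdiff1d(np.arange(n_baits), child)
            if outside.size and rng.random() < mutation_rate:
                child[rng.integers(n_select)] = rng.choice(outside)
            children.append(child)
        pop = parents + children
    best = max(pop, key=score)
    return best, score(best)
```

In this sketch, `evolve(D, n_select=5, n_components=3)` returns the indices of the selected baits together with their fitness score. Because the same random seed is reused for every factorisation, scoring the full bait set against itself yields a fitness of 1, which is a convenient sanity check.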