Communications Biology

Table 1 Technical terms.

From: Systematic auditing is essential to debiasing machine learning in biology

Term	Explanation
Training sets	Data examples we feed ML models to learn from.
Features	Information extracted to describe entities, telling the ML models which characteristics they should learn from.
ML generalization	Ability of ML models to perform well on datasets independent of those from which their training examples were sampled.
ML auditor	A system where an ML model of interest is compared to another ML model that is tailored to examine a specific hypothesis about the initial model.
ML auditing	Examining biases of ML frameworks by building ad-hoc ML auditors.
Representational bias	Imbalance or inequality in how different entities are represented in the data due to inherent or experimental conditions.
Paired-input prediction	A class of ML prediction methods where the goal is to predict the relationships between two entities. The ML models are thus trained on pairs of entities to learn their relationships.
In-network prediction	In paired-input prediction problems, the prediction for the pair (A,B) is in-network if the training data for the predictor contains relationships in which A and B are separately involved.
Out-of-network prediction	In paired-input prediction problems, the prediction for the pair (A,B) is out-of-network if the training data for the predictor does not contain relationships for A, B, or both.
AUC	The area under an ROC (receiver operating characteristic) curve, a classification quality measure where an AUC of 1 represents perfect prediction performance and an AUC of 0.5 indicates random prediction.
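
The in-network and out-of-network definitions, and the AUC scale, can be illustrated with a minimal Python sketch. It is not taken from the article: the function names (`network_status`, `auc`) and the toy entity identifiers are hypothetical, and the AUC is computed through the standard pairwise-ranking identity (the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as one half) rather than by tracing the ROC curve.

```python
from typing import Iterable, Sequence, Set, Tuple

Pair = Tuple[str, str]


def entities_in_training(train_pairs: Iterable[Pair]) -> Set[str]:
    """Collect every entity that appears in at least one training pair."""
    seen: Set[str] = set()
    for a, b in train_pairs:
        seen.add(a)
        seen.add(b)
    return seen


def network_status(pair: Pair, train_pairs: Iterable[Pair]) -> str:
    """Label a test pair (A, B) following the table's definitions.

    In-network: both A and B each appear in some training relationship.
    Out-of-network: A, B, or both never appear in the training data.
    """
    seen = entities_in_training(train_pairs)
    a, b = pair
    return "in-network" if a in seen and b in seen else "out-of-network"


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise ranking: share of (positive, negative) pairs
    in which the positive scores higher; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical training relationships between toy entities.
    train = [("P1", "P2"), ("P3", "P4")]
    print(network_status(("P1", "P3"), train))  # both entities seen in training
    print(network_status(("P1", "P5"), train))  # P5 never seen in training
    print(auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))  # perfectly separated scores
```

Note that a pair such as (P1, P3) counts as in-network even though that exact pair is absent from the training data: what matters is that each entity is separately involved in some training relationship.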
