Table 1 Technical terms.

From: Systematic auditing is essential to debiasing machine learning in biology

Term

Explanation

Training sets

Data examples we feed ML models to learn from.

Features

Information extracted to describe entities; it conveys to the ML models the characteristics from which they should learn.

ML generalization

Ability of ML models to perform well on datasets independent of those from which their training examples were sampled.

ML auditor

A system in which an ML model of interest is compared to another ML model tailored to examine a specific hypothesis about the initial model.

ML auditing

Examining the biases of ML frameworks by building ad hoc ML auditors.

Representational bias

Imbalance or inequality in how different entities are represented in the data due to inherent or experimental conditions.

Paired-input prediction

A class of ML prediction methods where the goal is to predict the relationships between two entities. The ML models are thus trained on pairs of entities to learn their relationships.

In-network prediction

In paired-input prediction problems, the prediction for the pair (A,B) is in-network if the training data for the predictor contains relationships involving A and, separately, relationships involving B.

Out-of-network prediction

In paired-input prediction problems, the prediction for the pair (A,B) is out-of-network if the training data for the predictor does not contain relationships for A, B, or both.

AUC

Area under a receiver operating characteristic (ROC) curve, a classification quality measure for which an AUC of 1 represents perfect prediction performance and an AUC of 0.5 indicates random prediction.
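
As an illustration of the in-network, out-of-network, and AUC entries above, the following minimal Python sketch labels test pairs by whether both members appear in the training pairs and computes AUC as the fraction of positive-negative score pairs ranked correctly. The function names (`seen_entities`, `network_status`, `auc`) and the protein-style identifiers are hypothetical and chosen only for this example. The sketch also assumes that the test pair itself does not occur in the training data.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def seen_entities(training_pairs: Iterable[Pair]) -> Set[str]:
    """Collect every entity that appears in at least one training pair."""
    seen: Set[str] = set()
    for a, b in training_pairs:
        seen.add(a)
        seen.add(b)
    return seen


def network_status(pair: Pair, seen: Set[str]) -> str:
    """Label a test pair: in-network only if both members occur in training.

    Assumption: the test pair itself is not part of the training data.
    """
    a, b = pair
    return "in-network" if a in seen and b in seen else "out-of-network"


def auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical training pairs; identifiers are illustrative only.
train = [("P1", "P2"), ("P2", "P3"), ("P4", "P1")]
seen = seen_entities(train)

print(network_status(("P1", "P3"), seen))  # in-network
print(network_status(("P1", "P9"), seen))  # out-of-network (P9 unseen)
print(auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # 1.0, perfect ranking
print(auc([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]))  # 0.5, uninformative scores
```

Reporting AUC separately for the in-network and out-of-network subsets of a test set is one way to check whether a paired-input predictor generalizes beyond entities it has already seen.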