Table 1 Technical terms.
From: Systematic auditing is essential to debiasing machine learning in biology
| Term | Explanation |
|---|---|
| Training sets | Data examples fed to ML models to learn from. |
| Features | Information extracted to describe entities, informing the ML models about the entities' characteristics from which the models should learn. |
| ML generalization | The ability of ML models to perform well on datasets independent of the data from which their training examples were sampled. |
| ML auditor | A system in which an ML model of interest is compared with another ML model tailored to examine a specific hypothesis about the original model. |
| ML auditing | Examining the biases of ML frameworks by building ad hoc ML auditors. |
| Representational bias | Imbalance or inequality in how different entities are represented in the data, arising from inherent or experimental conditions. |
| Paired-input prediction | A class of ML prediction methods whose goal is to predict the relationship between two entities. The ML models are thus trained on pairs of entities to learn their relationships. |
| In-network prediction | In paired-input prediction problems, the prediction for the pair (A,B) is in-network if the predictor's training data contains at least one relationship involving A and at least one relationship involving B. |
| Out-of-network prediction | In paired-input prediction problems, the prediction for the pair (A,B) is out-of-network if the predictor's training data lacks relationships involving A, involving B, or involving both. |
| AUC | Area under the ROC (receiver operating characteristic) curve, a measure of classification quality in which an AUC of 1 represents perfect prediction and an AUC of 0.5 corresponds to random prediction. |
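The in-network and out-of-network definitions in the table reduce to a membership test on the entities seen during training. The following is a minimal Python sketch under a few assumptions. Relationships are stored as (A, B) tuples of entity identifiers. The function names and the protein-like identifiers are hypothetical. The sketch checks only whether each entity appears somewhere in the training data, not whether the pair (A,B) itself does.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]


def entities_in_training(training_pairs: Iterable[Pair]) -> Set[str]:
    """Collect every entity that appears in at least one training relationship."""
    seen: Set[str] = set()
    for a, b in training_pairs:
        seen.add(a)
        seen.add(b)
    return seen


def network_status(pair: Pair, seen: Set[str]) -> str:
    """Label a test pair following the table's definitions.

    In-network: both A and B occur in some training relationship.
    Out-of-network: A, B, or both never occur in the training relationships.
    """
    a, b = pair
    if a in seen and b in seen:
        return "in-network"
    return "out-of-network"


if __name__ == "__main__":
    # Hypothetical training relationships between entities P1..P5.
    training = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
    seen = entities_in_training(training)
    print(network_status(("P1", "P3"), seen))  # in-network: both seen
    print(network_status(("P1", "P6"), seen))  # out-of-network: P6 unseen
    print(network_status(("P6", "P7"), seen))  # out-of-network: both unseen
```

Splitting a test set this way makes it possible to report performance separately for in-network and out-of-network pairs rather than as a single aggregate figure.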
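The AUC values in the table can be read through an equivalent formulation. The AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. This is why 1 means perfect ranking and 0.5 means random ranking. The following is a minimal, quadratic-time Python sketch of that rank-based formulation. It assumes binary labels (1 = positive, 0 = negative). The function name `roc_auc` is hypothetical. Library implementations such as scikit-learn's `roc_auc_score` should give the same value for binary labels.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Each pair where the positive scores higher counts 1; ties count 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # 1.0: perfect ranking
    print(roc_auc([1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5]))  # 0.5: uninformative scores
    print(roc_auc([1, 0, 1, 0], [0.8, 0.6, 0.4, 0.2]))  # 0.75: 3 of 4 pairs correct
```

The double loop keeps the definition visible. For large test sets, a sort-based version computes the same quantity in O(n log n).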