Fig. 1: Clinical perspective on the use of AI/ML in precision oncology.

Analytical tools must be adapted to the intended goals and the available data. Innovations in artificial intelligence (AI), machine learning (ML) analytical techniques, and new modalities for deep measurement of disease hold great promise for advancing precision oncology. To derive maximal benefit from these innovations, researchers and practitioners must clearly articulate (a) the goals they seek to achieve and (b) the sources of data available for analysis; these in turn dictate (c) the choice of analytical tools. Inferential statistics is a “data model” approach that seeks to understand or infer the relationships between independent variables (covariates) and dependent variables (outcomes) based on prior assumptions about the structure of the data. In contrast, machine learning is an “algorithmic model” approach that makes few assumptions about the data and instead relies on algorithms that take direct measurements or derived variables as input, transform them through the mathematical workings of the algorithm into “features,” and ultimately “learn” to predict the dependent variable (label). Inference is to statistics as prediction is to machine learning, and moving forward, we will need to use all the tools in our analytical toolkit. Interventional statisticians (i.e., clinical trialists) often use the entire sample for the primary analysis to maximize statistical power and less frequently use training and validation sets, whereas data scientists and observational statisticians (i.e., epidemiologists) divide patient samples into training, validation, and test sets to demonstrate predictive ability on the “unseen” test set based on analysis of the training and validation sets. Both utilize models, but their primary objectives differ [187]. Inferential statistics, using “data models,” addresses the relationships between the independent variables and the dependent (outcome) variables within a dataset in three fashions: exploratory (inductive), hypothesis-testing (deductive), and explanatory (abductive). In all cases, a model that makes assumptions about the structure of the data (e.g., a normal distribution or proportional hazards between groups) is applied to the dataset to characterize the relationship between prespecified independent input variables (“x”) and dependent outcome variables (“y”) and to draw population inferences from a sample [188]. ML “algorithmic models” often make fewer assumptions than inferential statistics about the structure of the data or the nature of the relationships between variables. Their flexibility lies in the ability to adapt these assumptions to the chosen model and application, making them applicable to a wide range of predictive tasks [189]. Because ML and deep learning (DL) are forms of “representation learning,” in which the machine is fed raw data and develops its own internal representations for pattern recognition [190], the resulting models can be used to make predictions on independent or “unseen” data. Created with BioRender.com.
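The contrast drawn above between the inferential workflow (fit one prespecified model to the full sample and interpret its coefficients) and the predictive workflow (train, tune, and then evaluate on an unseen test set) can be made concrete with a minimal sketch. The example below is illustrative only: the synthetic dataset, the choice of scikit-learn, the 60/20/20 split proportions, and the small hyperparameter grid are all assumptions introduced for demonstration and are not part of the figure.

```python
# Minimal sketch contrasting the two analytical workflows on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic cohort: X = prespecified covariates ("x"), y = binary outcome (label, "y").
X, y = make_classification(n_samples=1000, n_features=10, n_informative=4,
                           random_state=0)

# --- Inferential ("data model") style: fit one prespecified model to the
# entire sample and inspect the estimated covariate-outcome relationships.
inferential_model = LogisticRegression(max_iter=1000).fit(X, y)
print("Estimated coefficients (covariate effects):", inferential_model.coef_)

# --- Predictive ("algorithmic model") style: divide the sample into training,
# validation, and test sets; tune on the validation set; report performance
# only on the unseen test set.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)  # 60/20/20 overall

best_auc, best_model = -np.inf, None
for C in (0.01, 0.1, 1.0, 10.0):  # illustrative hyperparameter sweep
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
    if auc > best_auc:
        best_auc, best_model = auc, model

# Final check on data the model never "saw" during training or tuning.
print("Test-set AUC:", roc_auc_score(y_test, best_model.predict_proba(X_test)[:, 1]))
```

Note that the same model class can serve either goal; what differs is the objective, i.e., interpreting the fitted relationships versus demonstrating predictive accuracy on held-out data.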