Table 7 Comparison of machine learning models trained with and without missing data handling.

From: HMLA: A hybrid machine learning approach for enhancing stroke prediction models with missing data imputation techniques

| Aspect | With missing data handling | Without missing data handling |
|---|---|---|
| Data size and completeness | Imputing missing values preserves the sample size, so all available records are used for training | Rows containing missing values are often dropped, which reduces the sample size and the information available |
| Bias and variance | Lower bias, since imputed data helps keep the model parameters stable and undistorted | Excluding rows or columns can introduce significant bias by producing an unrepresentative sample |
| Impact on feature relationships | Imputation preserves relationships between features, resulting in more robust and consistent models | Correlations are distorted when important features have missing values, resulting in unreliable predictions |
| Algorithm compatibility | Most machine learning methods can be applied directly to imputed input | Some approaches (e.g., linear models, neural networks) cannot handle missing values directly |
| Computational efficiency | Imputation methods such as KNN and MICE can be computationally expensive, which limits scalability | Models may train faster but show less stable performance |
| Practical application | Appropriate for sensitive domains (e.g., healthcare) where data integrity is essential for safety | Unsuitable for sensitive applications; skewed models may produce significant inaccuracies |
| Model interpretability | Models remain interpretable when accurate imputation preserves the structure of the data | Interpretability suffers from missing context and incomplete correlations |
| Overall model performance | Generally better precision, reliability, and stability | Unreliable and inconsistent performance resulting from insufficient learning and bias |
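
The contrast between dropping incomplete rows and imputing them can be illustrated with a minimal Python sketch. It is not the paper's pipeline: the toy records, the column meanings (age, glucose, BMI), and the function names are hypothetical, and the KNN step is a simplified stand-in that skips feature scaling, which a real workflow would normally include. MICE is not shown.

```python
from math import sqrt
from statistics import mean

# Hypothetical toy records: (age, glucose, bmi); None marks a missing value.
ROWS = [
    (67.0, 228.0, 36.0),
    (61.0, 202.0, None),
    (80.0, 106.0, 32.0),
    (49.0, None, 34.0),
    (79.0, 174.0, 24.0),
]


def drop_incomplete(rows):
    """Listwise deletion: keep only rows with no missing value."""
    return [r for r in rows if None not in r]


def mean_impute(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    col_means = [mean(v for v in col if v is not None) for col in zip(*rows)]
    return [tuple(m if v is None else v for v, m in zip(r, col_means)) for r in rows]


def _distance(a, b):
    """Root-mean-square difference over the columns observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    return sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows, k=2):
    """Fill each missing value with the column mean over the k nearest donor rows."""
    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is None:
                # Donors are the other rows that have an observed value in column j.
                donors = [r for idx, r in enumerate(rows) if idx != i and r[j] is not None]
                donors.sort(key=lambda r: _distance(row, r))
                new_row[j] = mean(r[j] for r in donors[:k])
        filled.append(tuple(new_row))
    return filled


if __name__ == "__main__":
    print("rows after deletion:", len(drop_incomplete(ROWS)))
    print("rows after mean imputation:", len(mean_impute(ROWS)))
    print("rows after KNN imputation:", len(knn_impute(ROWS)))
```

On this toy data, deletion keeps 3 of the 5 rows, while both imputers keep all 5. The KNN variant compares every incomplete row against every other row, which hints at why the table lists computational cost as a drawback of imputation.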