Table 7 Performance comparison between machine learning models that handle missing data and those that do not.
| Aspect | With missing data handling | Without missing data handling |
|---|---|---|
| Data size and completeness | Preserves the dataset size by imputing missing values, so the complete set of information is available for training | Rows containing missing values are often dropped, reducing the sample size and discarding information |
| Bias and variance | Lower bias, since imputed data helps preserve stability and prevents distortion of model parameters | Excluding rows or columns can introduce substantial bias, producing an unrepresentative sample |
| Impact on feature relationships | Imputation maintains inter-feature relationships, yielding stronger and more consistent models | Correlations become distorted when important features have missing values, leading to unreliable predictions |
| Algorithm compatibility | Most machine learning methods can be applied efficiently to imputed input | Some approaches (e.g., linear models, neural networks) cannot directly accommodate missing values |
| Computational efficiency | Imputation methods such as KNN and MICE can be computationally expensive, affecting scalability (see the sketch after this table) | Models may run faster but lack stable performance |
| Practical application | Appropriate for sensitive domains (e.g., healthcare) where data integrity is essential for safety | Unsuitable for sensitive applications; biased systems may produce significant errors |
| Model interpretability | Models remain interpretable because accurate imputation preserves the structure of the data | Interpretability is compromised by missing context and incomplete correlations |
| Overall model performance | Generally superior performance in terms of precision, reliability, and stability | Unreliable and inconsistent performance due to incomplete learning and bias |
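As a minimal sketch of the contrast summarized above (not taken from the studies reviewed), the snippet below compares listwise deletion with KNN and MICE-style imputation using scikit-learn's `KNNImputer` and `IterativeImputer`; the column names and data values are illustrative assumptions only.

```python
# Illustrative comparison: row deletion vs. KNN and MICE-style imputation.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import KNNImputer, IterativeImputer

# Hypothetical clinical-style data with missing entries.
df = pd.DataFrame({
    "age":     [63, 45, np.nan, 71, 52],
    "bp":      [130, np.nan, 118, 145, 122],
    "glucose": [np.nan, 98, 110, 160, np.nan],
})

# Without handling: listwise deletion shrinks the sample.
dropped = df.dropna()
print(f"Rows kept after deletion: {len(dropped)} of {len(df)}")

# With handling: KNN imputation fills each gap from the k nearest rows.
knn_filled = pd.DataFrame(
    KNNImputer(n_neighbors=2).fit_transform(df), columns=df.columns
)

# MICE-style multivariate imputation via chained regressions.
mice_filled = pd.DataFrame(
    IterativeImputer(random_state=0).fit_transform(df), columns=df.columns
)

print(knn_filled.round(1))
print(mice_filled.round(1))
```

In this toy case, deletion keeps only the fully observed rows, whereas both imputers return a table of the original size, at the cost of the extra computation noted in the table.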