Table 11 Summary of the results and discussion.

From: A practical evaluation of AutoML tools for binary, multiclass, and multilabel classification

| Scenario | Key findings | Challenges | Recommendations |
| --- | --- | --- | --- |
| Binary | Advanced HPO approaches yielded top performance but required more time. Some faster frameworks or minimal pipelines struggled on complex or missing data and occasionally failed on certain tasks. | Rigid preprocessing and insufficient handling of missing data or class imbalance limited performance and led to failures. | Adopt robust data encoding, improve imbalance mitigation (see the class-weight sketch below), and enable adaptive model selection to address diverse complexities and avoid execution errors. |
| Multiclass | A few frameworks consistently achieved strong results, while others showed unexpected drops on simpler data. No framework failed to run, though performance variability was substantial. | Maintaining stable accuracy across varied distributions was difficult, and certain frameworks always used the full time budget or saw large accuracy swings despite quick runs. | Incorporate adaptive ensembling or selective search to handle different data complexities effectively without monopolizing runtime or suffering marked performance drops. |
| Multilabel (Native) | Only a limited set of frameworks supported native multilabel (multi-output) classification; one generally excelled in accuracy, while another was faster but less accurate. | Sparse label sets, limited native support, and inconsistent training times reduced reliability, with many frameworks providing no results at all. | Enhance native multilabel capabilities (see the indicator-matrix sketch below) to cope with label sparsity and ensure consistent optimization loops for stable performance. |
| Multilabel (Powerset) | More exhaustive pipelines or ensembling achieved higher scores but demanded longer training. Some frameworks finished rapidly but showed significantly lower accuracy or failed under extreme label inflation. | Label powerset transformations (see the sketch below) exposed imbalance and sparse label combinations, causing pipeline instability and partial failures in certain tools. | Adopt specialized balancing or meta-label methods to handle expanded label sets and refine search algorithms to stay robust under label inflation. |
| General | No single tool dominated all tasks. Comprehensive search approaches delivered higher accuracy but often used the entire time limit, while faster methods risked significant degradation on challenging data. | Handling real-world data characteristics, such as missing features and label imbalance, remained a common obstacle, and traditional complexity metrics did not fully capture domain-level issues. | Employ resilient pipelines that combine flexible search with advanced preprocessing and domain-aware strategies, balancing thoroughness against strict time constraints. |
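The imbalance mitigation recommended for the binary scenario can be as simple as reweighting the loss by inverse class frequency. A minimal sketch using scikit-learn follows; the estimator and the toy dataset are illustrative assumptions, not tools or data from the evaluation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Toy imbalanced binary problem: 90 negatives vs. 10 positives.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.array([0] * 90 + [1] * 10)

# Inverse-frequency weights: n_samples / (n_classes * count(class)).
classes = np.unique(y)
weights = compute_class_weight("balanced", classes=classes, y=y)
# -> roughly {0: 0.56, 1: 5.0}: errors on the rare class cost ~9x more.

# Pass the weights to the estimator (class_weight="balanced" is the
# built-in shortcut for the same computation).
clf = LogisticRegression(class_weight=dict(zip(classes, weights)))
clf.fit(X, y)
```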
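For the Multilabel (Native) row, "native support" means the model accepts a 2-D label indicator matrix directly, with no problem transformation. A hedged sketch, again with scikit-learn; the random forest is a stand-in for whatever each AutoML tool uses internally:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Y is an n_samples x n_labels 0/1 indicator matrix; rows with few
# 1s illustrate the label sparsity the evaluation highlights.
X = np.random.default_rng(1).normal(size=(6, 4))
Y = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
    [0, 0, 0],
    [0, 1, 0],
])

# Tree ensembles in scikit-learn accept the 2-D target directly,
# so no label transformation is needed.
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, Y)
print(clf.predict(X[:2]))  # one 0/1 label vector per sample
```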
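The powerset findings follow directly from how the transformation works: every distinct label combination becomes one multiclass target, so k labels can inflate into up to 2^k classes, many of them rare; this is the "label inflation" and sparsity the table refers to. A minimal sketch of the transformation itself in plain NumPy (real pipelines would typically use a library such as scikit-multilearn):

```python
import numpy as np

# Five samples over three labels.
Y = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 0],
    [0, 1, 1],
])

# Map each distinct row (label combination) to one multiclass id.
combos, y_powerset = np.unique(Y, axis=0, return_inverse=True)
print(len(combos))   # 3 distinct combinations -> 3 classes here,
                     # but up to 2**n_labels in the worst case
print(y_powerset)    # [1 1 0 2 0]: the multiclass training target

# Inverse mapping: recover the label vector from a predicted class.
print(combos[y_powerset[0]])  # -> [1 0 0]
```

Combinations that occur only once become singleton classes, which is why the powerset transformation exposes imbalance that native multilabel handling would not.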