Table 2 Techniques used in prior works along with their merits and demerits.
From: An efficient learning based approach for automatic record deduplication with benchmark datasets
| Reference | Technique | Dataset | Advantages | Limitations |
|---|---|---|---|---|
| | ML-based methods | Real-world datasets | High accuracy, scalability and efficiency | Usability is low |
| | A statistical approach to treat missing values | Custom datasets | High accuracy, scalability and efficiency | Usability is low |
| | ML for generation of match classifiers | Custom dataset | High scalability and efficiency | Usability and accuracy are low |
| | Ensemble with noise elimination | Smart Dataset | High scalability and efficiency | Usability and accuracy are low; complexity is high |
| | Bayesian generative process model | Car sales dataset | High scalability, accuracy and efficiency | Complexity is high and usability is low |
| | Classification rules configuration | RDD | High accuracy and efficiency | Usability and scalability are low |