Table 2 Techniques used in prior works along with their merits and demerits.

From: An efficient learning based approach for automatic record deduplication with benchmark datasets

| Reference | Technique | Dataset | Advantages | Limitations |
| --- | --- | --- | --- | --- |
| 71 | ML-based methods | Real-world datasets | High accuracy, scalability and efficiency | Low usability |
| 10 | A statistical approach to treat missing values | Custom datasets | High accuracy, scalability and efficiency | Low usability |
| 9 | ML for generation of match classifiers | Custom dataset | High scalability and efficiency | Low usability and accuracy |
| 12 | Ensemble with noise elimination | Smart Dataset | High scalability and efficiency | Low usability and accuracy; high complexity |
| 8 | Bayesian generative process model | Car sales dataset | High scalability, accuracy and efficiency | High complexity; low usability |
| 72 | Classification rules configuration | RDD | High accuracy and efficiency | Low usability and scalability |