Table 1 Fitness of self-contained classifiers to address characteristics and issues in enterprise data.

From: Table2Vec-automated universal representation learning of enterprise data DNA for benchmarkable and explainable enterprise data science

| Classifiers | Imbalance | Mixed features | Heterogeneity | Sparsity | Inconsistency | Dynamics | Data quality issues |
| --- | --- | --- | --- | --- | --- | --- | --- |
| KNN | \(\checkmark\) | | | | | | |
| Naive Bayes | \(\checkmark\) | \(\checkmark\) | | | | | |
| SVM | \(\checkmark\) | | \(\checkmark\) | \(\checkmark\) | | | |
| Decision Tree | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | | | |
| Random Forest | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | | | |
| XGBoost | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | | | |
| DNN | | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | | | |
| Table2Vec | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |

  1. Typical classifiers focus only on classification, so each analyst must separately invest heavy, duplicated effort to address similar data quality issues. Table2Vec instead addresses both data quality issues and representation learning in one pass, enabling end-to-end, automated enterprise data science.