Table 6. Pricing performances under different input features.

From: How to price a dataset: a deep learning framework for data monetization with alternative data

| Methods | Features | 80–20% MSE | 80–20% RMSE | 80–20% MAE | 70–30% MSE | 70–30% RMSE | 70–30% MAE | 90–10% MSE | 90–10% RMSE | 90–10% MAE |
|---|---|---|---|---|---|---|---|---|---|---|
| LGBM | TF | 2.7226 | 1.6500 | 1.2457 | 2.6122 | 1.6162 | 1.2317 | 2.5761 | 1.6050 | 1.2520 |
| | TF + Text (data title) | 1.2715 | 1.1276 | 0.7684 | 1.5692 | 1.2527 | 0.8702 | 1.5022 | 1.2257 | 0.8198 |
| | TF + Text (target user) | 2.6050 | 1.6140 | 1.1421 | 2.4328 | 1.5597 | 1.1199 | 2.7867 | 1.6694 | 1.2409 |
| | TF + Text (data function) | 2.4041 | 1.5505 | 1.0854 | 2.2813 | 1.5104 | 1.0778 | 2.4970 | 1.5802 | 1.1401 |
| | TF + Text (data descriptions) | 0.8016 | 0.8953 | 0.6248 | 0.9492 | 0.9743 | 0.6809 | 0.8116 | 0.9009 | 0.6173 |
| | TF + Text (total) | 0.9941 | 0.9970 | 0.6489 | 1.1497 | 1.0722 | 0.7266 | 0.8526 | 0.9234 | 0.6389 |
| MLP | TF | 5.3025 | 2.3027 | 1.7867 | 5.4665 | 2.3381 | 1.8287 | 4.7131 | 2.1710 | 1.7092 |
| | TF + Text (data title) | 2.6124 | 1.6163 | 1.1653 | 2.8345 | 1.6836 | 1.1803 | 3.2863 | 1.8128 | 1.2544 |
| | TF + Text (target user) | 4.0210 | 2.0052 | 1.4659 | 3.8034 | 1.9502 | 1.4093 | 3.5362 | 1.8805 | 1.3539 |
| | TF + Text (data function) | 4.4767 | 2.1158 | 1.6672 | 4.5859 | 2.1415 | 1.6115 | 3.2846 | 1.8124 | 1.3204 |
| | TF + Text (data descriptions) | 1.0651 | 1.0320 | 0.6483 | 1.1049 | 1.0511 | 0.7034 | 0.9007 | 0.9491 | 0.6676 |
| | TF + Text (total) | 1.7074 | 1.3067 | 0.9235 | 1.5713 | 1.2535 | 0.8629 | 0.9355 | 0.9672 | 0.7165 |
| DT | TF | 5.5779 | 2.3618 | 1.5295 | 5.2832 | 2.2985 | 1.5048 | 5.5555 | 2.3570 | 1.5112 |
| | TF + Text (data title) | 4.8629 | 2.2052 | 1.4432 | 5.0400 | 2.2450 | 1.5140 | 4.2060 | 2.0509 | 1.2653 |
| | TF + Text (target user) | 5.6352 | 2.3739 | 1.5087 | 4.5270 | 2.1277 | 1.3550 | 5.7507 | 2.3981 | 1.5182 |
| | TF + Text (data function) | 4.1051 | 2.0261 | 1.2680 | 4.7251 | 2.1737 | 1.3668 | 5.2672 | 2.2950 | 1.5241 |
| | TF + Text (data descriptions) | 2.9486 | 1.7171 | 1.0972 | 3.1107 | 1.7637 | 1.1673 | 3.1418 | 1.7725 | 1.1185 |
| | TF + Text (total) | 3.1783 | 1.7828 | 1.1199 | 3.4990 | 1.8706 | 1.1850 | 3.1492 | 1.7746 | 1.0538 |
| GBDT | TF | 2.8153 | 1.6779 | 1.2922 | 2.8486 | 1.6878 | 1.2866 | 2.6176 | 1.6179 | 1.3153 |
| | TF + Text (data title) | 1.8530 | 1.3612 | 1.0152 | 2.0159 | 1.4198 | 1.0273 | 1.8415 | 1.3570 | 0.9990 |
| | TF + Text (target user) | 2.8905 | 1.7002 | 1.2572 | 2.7421 | 1.6559 | 1.2455 | 2.7182 | 1.6487 | 1.2887 |
| | TF + Text (data function) | 2.8044 | 1.6746 | 1.2172 | 2.4539 | 1.5665 | 1.1528 | 2.3408 | 1.5300 | 1.1697 |
| | TF + Text (data descriptions) | 1.1892 | 1.0905 | 0.8086 | 1.3966 | 1.1818 | 0.8818 | 1.2838 | 1.1331 | 0.8391 |
| | TF + Text (total) | 1.8630 | 1.3649 | 1.0129 | 1.8482 | 1.3595 | 1.0199 | 1.5875 | 1.2600 | 0.9567 |
| RF | TF | 2.4185 | 1.5552 | 1.1429 | 2.4799 | 1.5748 | 1.1575 | 2.6446 | 1.6262 | 1.2125 |
| | TF + Text (data title) | 1.7186 | 1.3110 | 0.9076 | 1.9694 | 1.4033 | 0.9913 | 1.7925 | 1.3388 | 0.9372 |
| | TF + Text (target user) | 2.5307 | 1.5908 | 1.1290 | 2.3857 | 1.5446 | 1.0973 | 2.8026 | 1.6741 | 1.1928 |
| | TF + Text (data function) | 2.3676 | 1.5387 | 1.0450 | 2.3092 | 1.5196 | 1.0264 | 2.5204 | 1.5876 | 1.0876 |
| | TF + Text (data descriptions) | 1.1402 | 1.0678 | 0.7340 | 1.2443 | 1.1155 | 0.8120 | 1.0117 | 1.0058 | 0.6861 |
| | TF + Text (total) | 1.1682 | 1.0808 | 0.7661 | 1.3727 | 1.1716 | 0.8350 | 1.0718 | 1.0353 | 0.7284 |

  1. TF means traditional features.
  2. The bold values indicate the best performance for each evaluation metric.
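
The three metrics reported in the table can be illustrated with a minimal sketch. It assumes the standard textbook definitions of MSE, RMSE and MAE; the evaluation code behind the table is not shown in this section, so the function names and the toy prices below are hypothetical. Because RMSE is the square root of MSE, any reported pair can be cross-checked, which the helper `rmse_matches_mse` (also an illustrative name) does for the LGBM/TF entry of the 80–20% column.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    """Mean absolute error: average of absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse_matches_mse(mse_value, rmse_value, tol=1e-3):
    """Check that a reported RMSE agrees with sqrt(reported MSE).

    The tolerance is an assumption chosen to absorb four-decimal rounding.
    """
    return abs(math.sqrt(mse_value) - rmse_value) <= tol

# Toy prices for illustration only; these are NOT values from the paper.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.0, 5.0]

print("MSE :", mse(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))
print("MAE :", mae(y_true, y_pred))

# Cross-check one reported pair: LGBM with TF only, 80-20% (MSE 2.7226, RMSE 1.6500).
print("consistent:", rmse_matches_mse(2.7226, 1.6500))
```

Since MSE squares each residual, a few large pricing errors move it more than they move MAE, which is why the two metrics can rank feature sets differently.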