Table 13 Comparison of Off‑policy PPO FS with classical, advanced, and RL-based methods on the CelebA dataset.

From: Reinforcement learning-driven feature selection enhanced by an evolutionary approach tuning for criminal suspect identification

| Model | Accuracy (%) | F-measure (%) | G-means (%) | AUC |
|---|---|---|---|---|
| LASSO | 68.511 ± 0.027 | 70.211 ± 0.048 | 71.881 ± 0.041 | 0.613 ± 0.013 |
| mRMR | 69.609 ± 0.074 | 71.678 ± 0.082 | 72.314 ± 0.085 | 0.628 ± 0.086 |
| PCA | 67.511 ± 0.043 | 69.202 ± 0.019 | 71.929 ± 0.070 | 0.603 ± 0.080 |
| MI | 70.772 ± 0.049 | 72.270 ± 0.066 | 73.909 ± 0.088 | 0.632 ± 0.016 |
| Cross-Attention | 71.582 ± 0.074 | 74.345 ± 0.005 | 75.995 ± 0.084 | 0.648 ± 0.062 |
| AE | 72.948 ± 0.012 | 74.437 ± 0.005 | 75.138 ± 0.085 | 0.652 ± 0.066 |
| TSFS | 74.223 ± 0.058 | 76.531 ± 0.072 | 77.186 ± 0.069 | 0.658 ± 0.088 |
| A-SFS | 75.209 ± 0.089 | 77.730 ± 0.062 | 78.395 ± 0.073 | 0.661 ± 0.010 |
| FaceNet | 75.987 ± 0.077 | 78.495 ± 0.072 | 79.157 ± 0.082 | 0.667 ± 0.059 |
| VGG-Face | 77.220 ± 0.003 | 78.351 ± 0.018 | 79.048 ± 0.078 | 0.681 ± 0.070 |
| DeepFace | 77.810 ± 0.063 | 79.956 ± 0.026 | 80.655 ± 0.062 | 0.689 ± 0.083 |
| RL | 80.860 ± 0.054 | 82.556 ± 0.029 | 83.198 ± 0.090 | 0.782 ± 0.023 |
| SAC | 83.579 ± 0.076 | 84.102 ± 0.012 | 84.631 ± 0.026 | 0.798 ± 0.053 |
| PPO | 81.991 ± 0.052 | 82.887 ± 0.067 | 83.449 ± 0.069 | 0.792 ± 0.095 |
| Off-policy PPO | 87.951 ± 0.096 | 89.409 ± 0.087 | 90.193 ± 0.030 | 0.829 ± 0.073 |
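The four columns above are standard classification metrics. As a rough guide to what each one measures, the following is a minimal sketch that computes them for a binary task. It assumes 0/1 labels, F-measure taken as F1, G-mean taken as the geometric mean of sensitivity and specificity, and AUC computed with the pairwise Mann-Whitney formulation. The source does not state how the metrics were averaged for the CelebA identification task, so this is not the authors' evaluation code, and every helper name (`confusion_counts`, `f_measure`, `g_mean`, `auc`) is hypothetical. The functions return fractions; multiply by 100 to compare with the percentage columns.

```python
import math
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary 0/1 labels (assumed encoding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def f_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1: harmonic mean of precision and recall (assumed beta = 1)."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def g_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Geometric mean of sensitivity (TPR) and specificity (TNR)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the paper.
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(f"Accuracy (%):  {100 * accuracy(y_true, y_pred):.3f}")
    print(f"F-measure (%): {100 * f_measure(y_true, y_pred):.3f}")
    print(f"G-mean (%):    {100 * g_mean(y_true, y_pred):.3f}")
    print(f"AUC:           {auc(y_true, scores):.3f}")
```

Note that AUC is computed from continuous scores while the other three use thresholded predictions, which is one reason the AUC column can rank methods differently from the other columns.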