Table 1 Summary of the studies.

From: Automated detection of corruption reports in text via deep reinforcement learning

| References | Year | Research Goal | Method | Limitation/Key finding |
| --- | --- | --- | --- | --- |
| Lima et al. [9] | 2020 | Predict corruption perception across countries | Machine Learning (Random Forest) | Country-level prediction; not text-based |
| Liu [12] | 2024 | Detect corrupt textual data | Unsupervised learning | Unsupervised text detection; not supervised classification |
| Utomo et al. [19] | 2022 | Predict Anti-Corruption Disclosure (ACD) from firm data | Deep Neural Network | ACD prediction from structured firm data; not raw text classification from reports |
| Li et al. [20] | 2020 | Identify self-reported corruption on Twitter | Unsupervised Machine Learning (Biterm topic model) | Unsupervised topic modeling of tweets; no multi-class classification of specific types |
| Umer et al. [21] | 2023 | Investigate FastText with CNNs for text classification | CNN with FastText embeddings | General text classification; highlights FastText + CNN efficacy |
| Mohammed and Kora [22] | 2022 | Develop effective ensemble for text classification | Ensemble Deep Learning (meta-classifier) | Ensemble improves accuracy but increases computational cost |
| Ash et al. [23] | 2021 | Analyze and support anti-corruption policy | Machine Learning (tree-based gradient-boosted classifier) | Corruption detection from administrative/structured data; not raw text classification |
| Li [24] | 2023 | Textual data mining for financial fraud detection | Deep Learning (Neural Network models) | Financial fraud in regulatory texts; not multi-class corruption in social media |
| Muco [26] | 2024 | Assess corruption from text data | NLP methods (using human-coded data) | Corruption assessment using human-coded text; specific method details unclear in review |
| Chen et al. [27] | 2022 | Automated legal text classification | Random Forest with domain concepts vs. Deep Learning | Feature engineering (Random Forest) outperformed DL in specific legal domain |
| Dogra et al. [28] | 2022 | Review state-of-the-art NLP models for text classification | Review paper (various ML/DL models) | Comprehensive survey of text classification methods |
| Mittal et al. [29] | 2021 | Multi-label text classification | Deep Graph-LSTM | Graph-based LSTM for multi-label text; not social media |
| Köksal and Akgül [30] | 2022 | Comparative study of deep learning for text classification | DNN, CNN, LSTM, GRU with hyperparameter tuning | Improvements with word embeddings and hyperparameter tuning in general text classification |
| Soni et al. [31] | 2023 | Develop CNN-based architecture for text classification | TextConvoNet (2D multi-scale CNN) | Novel 2D CNN (TextConvoNet) captures inter/intra-sentence features for general text classification |
| Bangyal et al. [32] | 2021 | Detect fake news text (COVID-19) | Various ML/DL (CNN, LSTM, RNN, GRU) with TF–IDF | Fake news detection in microblogs; applies various DL models |
| Xiong et al. [33] | 2023 | Extreme multi-label text classification (XMTC) | XRR (Retrieving and Deep Ranking with Transformers) | Two-stage transformer for extreme multi-label classification |
| Abarna et al. [34] | 2022 | Idiom/literal text classification | Ensemble K-BERT (with knowledge graphs) | Advanced semantic classification using knowledge-enhanced BERT |