Table 1 Analysis of existing studies (results in accuracy %).


| References | Model | Dataset | Feature | Results (acc %) | Strength | Limitation |
|---|---|---|---|---|---|---|
| 19 | RNN, Markov model | Reddit, Yahoo Answers | N-gram, TF-IDF | 88 | Combines sequential and probabilistic models for robust pattern detection | Limited contextual understanding; struggles with long-range dependencies |
| 20 | SVM, KNN, Decision Tree | Research paper content | POS tagging | 85 | Leverages syntactic cues via POS features for clear interpretability | Classical classifiers may overfit on limited syntactic patterns; poor semantic generalization |
| 24 | CNN, RNN | Research paper content | N-gram, POS | 85 | Uses convolutional layers for local pattern extraction and recurrent memory for context | High computational cost; sensitive to hyperparameter tuning |
| 35 | Naïve Bayes, LSTM | WordNet and PAN human-written texts | Textual feature sets | 90 | Combines a probabilistic baseline with deep sequence modeling for balanced performance | Naïve Bayes is overly simplistic; LSTM requires extensive training data |
| 22 | RoBERTa | Yelp user reviews | Default transformer encoding | 91 | State-of-the-art contextual embeddings capture nuanced sentiment and style | Large model size leads to high inference latency and resource demands |
| 25 | RoBERTa with WordNet ontology | Tweets, Reddit comments, Yahoo Answers, Yelp user reviews | Default feature encoding | 91 | Subword-level embeddings handle misspellings and rare words effectively | Ontology reliance may introduce bias; limited to covered semantic relations |
| 17 | BERT | Essays | TF-IDF | 79 | Fine-tuned transformer demonstrates baseline applicability to structured essay texts | TF-IDF lacks semantic depth; model underperforms on free-form or noisy inputs |
| 18 | BERT | Essays | Default feature encoding | 66 | Leverages pretrained contextual knowledge | Low accuracy indicates overfitting to the training domain; limited feature adaptation |
| 27 | GRU | Essays, Tweets, Yelp | Count vectorization | 87 | Gated units capture sequence dynamics with moderate resource usage | Simpler than LSTM; may miss very long-range dependencies |
| 16 | SVM, Logistic Regression, RF, DT | BBC News | Textual features | 89 | Comprehensive comparison of multiple shallow classifiers highlights the best performer | Shallow methods struggle with semantic nuance; inconsistent performance across topics |
| 28 | SVM, GBM, DT | Online text corpus (AI vs. human) | Linguistic feature fusion | 87 | Comparison of ensemble and single models on the AI-vs-human task shows versatility | Marginal performance gains; feature-engineering intensive |
| 29 | Random Forest, SVM | 1,500 human texts | Word embeddings | 92 | Embedding-based features significantly boost classical models | Small dataset limits generalization; embedding quality depends on pretraining data |
| 30 | SVM + LSTM | Mixed-genre corpus | Word embeddings | 85 | Hybrid approach balances interpretability and sequence modeling | Combining models adds complexity; tuning both components is challenging |
| 31 | Hybrid detection framework | GPT vs. human news articles | Word embeddings | 88 | Unified pipeline demonstrates end-to-end applicability across news domains | Framework complexity may hinder real-time deployment |
| 33 | RoBERTa | Human vs. LLM text corpus | Pretrained embeddings | 93 | Achieves state-of-the-art results with minimal feature engineering | Large model footprint; sensitive to domain shift without further fine-tuning |
| 34 | BERT | Essays | Pretrained embeddings | 90 | Strong baseline with deep contextual representations | High inference time; less efficient than distilled variants |
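For orientation, the sketch below shows the general shape of the classical feature pipelines that several of the studies above rely on: N-gram/TF-IDF features feeding a shallow classifier such as an SVM. It is a hypothetical illustration using scikit-learn; the toy `texts` and `labels` are placeholders, not the corpora used in the cited works.

```python
# Minimal, illustrative TF-IDF + linear-SVM baseline (not taken from any cited study).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Placeholder corpus: 0 = human-written, 1 = AI-generated (toy labels for illustration).
texts = [
    "The committee reviewed the draft and raised concerns about the sampling method.",
    "Overall, it is important to note that these factors play a crucial role.",
    "I rewrote the ending twice because the pacing felt off in the second chapter.",
    "In conclusion, the aforementioned aspects collectively contribute to the outcome.",
]
labels = [0, 1, 0, 1]

# Unigram/bigram TF-IDF features feeding a linear SVM, mirroring the
# N-gram + TF-IDF feature sets reported for the classical models in the table.
detector = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LinearSVC(),
)
detector.fit(texts, labels)
print(detector.predict(["It is worth noting that several key factors are crucial."]))
```

In a real experiment, the pipeline would be trained and evaluated on held-out splits of an AI-vs-human corpus; the sketch only conveys how the feature extraction and classifier are wired together.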