Table 1 Configuration parameters for NLP models

From: Aligning online images and realities beyond the hype for sustainable heritage tourism

| Model Component | Library Used | Parameter | Value | Rationale / Purpose |
|---|---|---|---|---|
| TF-IDF Vectorizer | scikit-learn | max_features | 1000 | Limits the vocabulary to the 1000 most frequent terms across the corpus. |
| | | min_df | 2 | Excludes terms that appear in only one document (filters rare words and noise). |
| | | max_df | 0.8 | Excludes overly common terms that appear in more than 80% of documents (improves distinctiveness). |
| Sentiment Analysis | SnowNLP | N/A (built-in) | N/A | Chosen for its suitability and performance on Chinese-language user-generated content (UGC). |
| LDA Topic Model | scikit-learn | n_components (K) | 5 | Optimal number of topics based on coherence, perplexity, and interpretability analysis. |
| | | max_iter | 20 | Maximum number of iterations of the batch learning algorithm, set to allow convergence. |
| | | learning_method | "batch" | Uses all data in each iteration; suitable for the dataset size. |
| | | random_state | 42 | Ensures reproducibility of results. |
| | | doc_topic_prior (α) | None (defaults to 1/K) | Uses a non-informative symmetric prior (1/K = 0.2); lets the data drive topic formation. |
| | | topic_word_prior (β) | None (defaults to 1/K) | Uses a non-informative symmetric prior (1/K = 0.2); lets the data drive topic formation. |
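
The scikit-learn settings listed in the table can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the toy corpus `docs` is a hypothetical placeholder (real Chinese UGC would first need word segmentation, which the table does not describe), and passing the TF-IDF matrix directly to the LDA model is an assumption, since the table does not say which matrix the topic model was fitted on. The SnowNLP step is omitted because it has no tunable parameters in the table.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical placeholder corpus: pre-tokenized, space-separated reviews.
docs = [
    "site ancient temple garden quiet",
    "site ancient temple crowded ticket",
    "site garden quiet lake boat",
    "site lake boat ticket price",
    "site crowded ticket price queue",
    "site temple garden lake view",
]

# TF-IDF settings from the table.
vectorizer = TfidfVectorizer(
    max_features=1000,  # keep at most the 1000 most frequent terms
    min_df=2,           # drop terms found in fewer than 2 documents
    max_df=0.8,         # drop terms found in more than 80% of documents
)
tfidf = vectorizer.fit_transform(docs)
vocab = sorted(vectorizer.vocabulary_)

# LDA settings from the table. Leaving both priors as None makes
# scikit-learn fall back to 1 / n_components, i.e. 0.2 for K = 5.
lda = LatentDirichletAllocation(
    n_components=5,
    max_iter=20,
    learning_method="batch",
    random_state=42,
    doc_topic_prior=None,
    topic_word_prior=None,
)
# Assumption: the topic model is fitted on the TF-IDF matrix.
doc_topic = lda.fit_transform(tfidf)

print(vocab)
print(doc_topic.shape)
print(lda.doc_topic_prior_, lda.topic_word_prior_)
print(np.round(doc_topic.sum(axis=1), 6))
```

On this toy corpus, `max_df=0.8` removes "site" (present in all six documents) and `min_df=2` removes "queue" and "view" (each present in one document), leaving nine terms. The fitted attributes `doc_topic_prior_` and `topic_word_prior_` expose the priors actually used, which gives a quick check that the `None` defaults resolve to 1/K.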