Table 1 Configuration parameters for NLP models
From: Aligning online images and realities beyond the hype for sustainable heritage tourism
| Model Component | Library Used | Parameter | Value | Rationale / Purpose |
|---|---|---|---|---|
| TF-IDF Vectorizer | scikit-learn | max_features | 1000 | Limits vocabulary to the 1000 most frequent terms across the corpus. |
| | | min_df | 2 | Excludes terms appearing in only one document (filters rare words/noise). |
| | | max_df | 0.8 | Excludes overly common terms appearing in >80% of documents (improves distinction). |
| Sentiment Analysis | SnowNLP | N/A (Built-in) | N/A | Chosen for specific suitability and performance on Chinese language UGC. |
| LDA Topic Model | scikit-learn | n_components (K) | 5 | Optimal number based on coherence, perplexity, and interpretability analysis. |
| | | max_iter | 20 | Number of iterations for batch learning algorithm to ensure convergence. |
| | | learning_method | "batch" | Uses all data in each iteration; suitable for the dataset size. |
| | | random_state | 42 | Ensures reproducibility of results. |
| | | doc_topic_prior (α) | None (Defaults to 1/K) | Uses a non-informative symmetric prior (0.2); lets data drive topic formation. |
| | | topic_word_prior (β) | None (Defaults to 1/K) | Uses a non-informative symmetric prior (0.2); lets data drive topic formation. |
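
The scikit-learn settings in Table 1 could be wired together roughly as follows. This is a minimal sketch rather than the study's actual pipeline: the toy corpus and variable names are hypothetical, the documents are assumed to be already segmented into space-separated tokens, and feeding the TF-IDF matrix directly into LDA is an assumption, since the table does not state which document-term matrix the topic model was fitted on.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus (pre-tokenized, space-separated); not the study's data.
docs = [
    "temple garden quiet morning walk",
    "temple crowded ticket queue long",
    "garden quiet tea house morning",
    "ticket price queue crowded entrance",
    "old street food snack crowded",
    "old street tea house snack",
    "museum guide history temple story",
    "museum history guide ticket price",
]

# TF-IDF settings from Table 1: cap the vocabulary at 1000 terms, drop terms
# found in fewer than 2 documents or in more than 80% of documents.
vectorizer = TfidfVectorizer(max_features=1000, min_df=2, max_df=0.8)
X = vectorizer.fit_transform(docs)

# LDA settings from Table 1. Leaving both priors as None makes scikit-learn
# use 1 / n_components, i.e. 0.2 when K = 5.
lda = LatentDirichletAllocation(
    n_components=5,
    max_iter=20,
    learning_method="batch",
    random_state=42,
    doc_topic_prior=None,
    topic_word_prior=None,
)

# Assumption: LDA is fitted on the TF-IDF matrix built above.
doc_topics = lda.fit_transform(X)  # one row per document, one column per topic

print(sorted(vectorizer.vocabulary_))  # terms that survived min_df / max_df
print(doc_topics.shape)
```

In this toy corpus, words that occur in a single document (for example "walk") are removed by `min_df=2`, while no term is frequent enough to be removed by `max_df=0.8`.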