npj Digital Medicine

Table 7 Text classification results (F1-scores) for test-class-mhr (fivefold CV; results averaged per class; av. is the unweighted mean over the six ICD-10 classes). We use the two-sample Kolmogorov–Smirnov (2S-KS) test for (a) comparisons between models trained on the same type of data, where * marks statistically significant improvements of LDA over BoW and of CNN over LDA (α = 0.05, n₁ = n₂ = 30); and (b) comparisons within a model trained on different types of data (column KS test). Models trained on fewer than all key phrases whose results were closest to those obtained with real (genuine) data are highlighted in bold. We also report the results of our ablation experiments, in which the training data (real or generated) contain only the context of the key phrases.

From: Generation and evaluation of artificial mental health records for Natural Language Processing

	ICD-10 code
Training data	F20	F32	F60	F31	F25	F10	av.	KS test (D, p-value)
BoW
genuine	0.47	0.31	0.32	0.20	0.14	0.24	0.28
all	0.47	0.33	0.27	0.23	0.17	0.23	0.28	0.07, 0.88
top+meta	0.48	0.36	0.29	0.20	0.14	0.26	0.29	0.09, 0.61
*one+meta*	0.46	0.34	0.29	0.23	0.14	0.26	0.29	0.07, 0.80
key	0.47	0.27	0.26	0.11	0.12	0.23	0.24	0.17, 0.02
LDA
genuine*	0.55	0.47	0.35	0.32	0.25	0.40	0.39
all*	0.55	0.44	0.35	0.31	0.26	0.37	0.38	0.11, 0.35
*top+meta**	0.52	0.43	0.37	0.29	0.25	0.40	0.38	0.09, 0.51
one+meta*	0.50	0.45	0.36	0.28	0.23	0.39	0.37	0.14, 0.10
key*	0.54	0.45	0.38	0.30	0.24	0.40	0.39	0.07, 0.88
CNN
genuine*	0.66	0.59	0.51	0.37	0.23	0.53	0.48
all*	0.65	0.57	0.47	0.27	0.24	0.50	0.45	0.14, 0.10
*top+meta**	0.63	0.55	0.45	0.31	0.23	0.42	0.43	0.20, 4e−3
one+meta*	0.59	0.52	0.42	0.25	0.15	0.43	0.39	0.22, 1e−3
key	0.57	0.34	0.33	0.23	0.20	0.35	0.34	0.37, 1.9e−09
No key phrases (ablation)
CNN
genuine	0.48	0.34	0.22	0.22	0.15	0.12	0.25
top+meta	0.30	0.30	0.09	0.25	0.09	0.03	0.18	0.24, 2.7e−04
LDA
genuine*	0.41	0.40	0.32	0.22	0.20	0.26	0.30
top+meta*	0.29	0.37	0.28	0.23	0.14	0.25	0.26	0.23, 4.4e−04
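The KS test column above reports the two-sample Kolmogorov–Smirnov statistic D (the largest vertical gap between two empirical CDFs) together with its p-value. The sketch below shows how D and a large-sample p-value can be computed. It is a minimal illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, the p-value uses the asymptotic Kolmogorov distribution Q(√nₑ · D) with nₑ = n₁n₂/(n₁ + n₂) rather than an exact small-sample computation, and the caption does not say exactly which observations make up each sample, so the code does not assume any.

```python
import math
from bisect import bisect_right
from typing import Sequence, Tuple


def ks_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample KS statistic D = max_t |F_x(t) - F_y(t)| (empirical CDFs)."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    # Both empirical CDFs are step functions that only jump at observed values,
    # so the supremum is attained at one of the pooled data points.
    for t in xs + ys:
        fx = bisect_right(xs, t) / n  # fraction of x <= t
        fy = bisect_right(ys, t) / m  # fraction of y <= t
        d = max(d, abs(fx - fy))
    return d


def kolmogorov_sf(lam: float, terms: int = 100) -> float:
    """Asymptotic tail Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    if lam < 0.2:
        # Q(lam) differs from 1 by less than 1e-12 here, and the alternating
        # series would need many more terms to converge.
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def ks_2samp_asymptotic(x: Sequence[float],
                        y: Sequence[float]) -> Tuple[float, float]:
    """Return (D, p) using the large-sample approximation p ~ Q(sqrt(n_e) * D)."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    d = ks_statistic(x, y)
    n_e = len(x) * len(y) / (len(x) + len(y))  # effective sample size
    return d, kolmogorov_sf(math.sqrt(n_e) * d)
```

For real analyses, `scipy.stats.ks_2samp` is the usual choice; with its default settings it computes an exact p-value for small samples, so its p-values can differ noticeably from this asymptotic approximation when n₁ and n₂ are small.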
