Table 3 Model performance evaluated against the legacy dictionary method.

Topic	True Pos.	False Pos.	True Neg.	False Neg.	Precision (%)	Recall (%)	F₁ Score (%)
Tax Policy	75,058	26,288	1,222,541	37,107	74	67	70
Foreign Trade	4,188	3,573	1,352,506	727	54	85	66
Transportation	28,825	25,922	1,293,531	12,716	53	69	60
Police and Fire	32,875	31,386	1,285,187	11,546	51	74	60
Labor and Employment	36,349	37,616	1,265,878	21,151	49	63	55
Local Government	26,830	29,715	1,281,215	23,234	47	54	50
Health	70,380	96,796	1,178,041	15,777	42	82	56
Agriculture	17,401	25,952	1,313,319	4,322	40	80	53
Insurance	33,912	49,878	1,269,258	7,946	40	81	54
Education	52,843	87,123	1,194,764	26,264	38	67	48
Natural Resource	9,616	19,233	1,322,858	9,287	33	51	40
Utilities	12,632	28,405	1,318,188	1,769	31	88	46
Military	36,180	81,761	1,238,984	4,069	31	90	46
Environment	13,009	30,954	1,310,959	6,072	30	68	41
Religion	6,229	16,709	1,337,153	903	27	87	41
Construction	15,286	42,622	1,298,526	4,560	26	77	39
Public Lands and Water Management	3,240	9,183	1,347,280	1,291	26	72	38
Bank	7,392	25,410	1,325,243	2,949	23	71	34
Small Business	10,169	34,779	1,311,516	4,530	23	69	34
Fiscal and Economic Issues	20,419	79,291	1,253,791	7,493	20	73	32
Sports	5,986	29,289	1,322,176	3,543	17	63	27
Immigration	3,935	19,594	1,337,063	402	17	91	28
Civil Rights	5,089	33,152	1,321,595	1,158	13	81	23
Welfare	4,900	34,107	1,320,355	1,632	13	75	22
Manufacturing	6,006	38,707	1,314,548	1,733	13	78	23
Communication	3,625	24,991	1,330,391	1,987	13	65	21
Law	11,438	191,236	1,149,966	8,354	6	58	10
International Affairs and Foreign Aid	216	4,322	1,356,431	25	5	90	9
Macro-average					31	74	40
Micro-average					34	72	43

Notes: Model performance is evaluated against the dictionary method, which serves as the reference labeling. A “true positive” (TP) is an instance where a keyword is present and the model also predicts the topic. A “false positive” (FP) is an instance where no keyword is present but the model predicts the topic. A “true negative” (TN) is an instance where no keyword is present and the model does not predict the topic. A “false negative” (FN) is an instance where a keyword is present but the model does not predict the topic. The macro-average weights each policy area equally; the micro-average weights policy areas by their respective number of bills coded.
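As an illustration of how the reported columns follow from the confusion counts, the sketch below recomputes precision, recall, and F₁ for two rows of the table and then forms an equally weighted (macro) average and a weighted average. The function names are hypothetical, and the weights used for the weighted average (keyword-coded instances, TP + FN) are an assumption, since the notes do not state the exact weighting; the sketch is not claimed to reproduce the averages reported in the table.

```python
from typing import Sequence, Tuple

Metrics = Tuple[float, float, float]  # (precision, recall, F1), each in [0, 1]


def precision_recall_f1(tp: int, fp: int, fn: int) -> Metrics:
    """Compute precision, recall, and F1 from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def weighted_mean(metrics: Sequence[Metrics], weights: Sequence[float]) -> Metrics:
    """Average each metric across topics using the given weights."""
    total = sum(weights)
    return tuple(
        sum(m[i] * w for m, w in zip(metrics, weights)) / total for i in range(3)
    )


# (TP, FP, FN) copied from two rows of the table.
counts = {
    "Tax Policy": (75_058, 26_288, 37_107),
    "Foreign Trade": (4_188, 3_573, 727),
}

per_topic = {topic: precision_recall_f1(*c) for topic, c in counts.items()}

# Macro-average: every topic counts equally.
macro = weighted_mean(list(per_topic.values()), [1.0] * len(per_topic))

# Weighted average. ASSUMPTION: weight = TP + FN (keyword-coded instances).
support = [tp + fn for tp, _fp, fn in counts.values()]
weighted = weighted_mean(list(per_topic.values()), support)

for topic, (p, r, f) in per_topic.items():
    print(f"{topic}: P={100 * p:.0f} R={100 * r:.0f} F1={100 * f:.0f}")
```

For these two rows the rounded percentages match the table (74, 67, 70 for Tax Policy and 54, 85, 66 for Foreign Trade). True negatives do not enter any of the three metrics, which is why the very large TN counts have no effect on the scores.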
