Table 1 Number of variables used from each source, with example variables; the complete list, with distributions, is given in the supplementary material.

From: Ensemble machine learning of factors influencing COVID-19 across US counties

| Source | No. of variables | Example variables |
|---|---|---|
| USAFacts | 6 | COVID-19 outcome data, population |
| Bureau of Economic Analysis (BEA) | 1 | GDP |
| 5-Year American Community Survey (ACS), 2014–2018 | 14 | County percentages by Sex and Ethnicity, Employment, Household Income, use of Public Transportation |
| TIGER/Line Geodatabases | 7 | Latitude, longitude, land area |
| TIGER/Line Geodatabases; Federal Aviation Administration (FAA) | | Distance to Airports |
| Interactive Atlas of Heart Disease and Stroke (2014–2016) | 4 | Number of Hospitals, Stroke, Access to Parks |
| County Health Rankings and Roadmaps | 21 | Life Expectancy, Smoking, Obesity, Food Access, Mental Health, Physicians, Household Overcrowding, etc. |
| Centers for Medicare & Medicaid Services (CMS) | 15 | Drug Abuse, Hypertension, Hyperlipidemia, Osteoporosis, etc. |
| National Centers for Environmental Information | 1 | Precipitation |
| CDC’s Social Vulnerability Index (SVI) | 11 | Percentile over 65 or under 17, Minority Scores, Limited English, Low Income Housing Estimates, Number Institutionalized |
| Quarterly Census of Employment and Wages | 14 | Labor force types (farming/mining, private industry, education/healthcare, etc.) |
| MIT Election Lab | 1 | Calculated proportion voted Republican, 2016 |
| Google | 6 | Google mobility by location type (Residence, Grocery, etc.) |