Fig. 1
From: Reducing annotation burden in physical activity research using vision language models

Illustration of the computer vision approaches compared (top). Below, quartile plots\(^{12}\) show the five-number summary of per-participant F\(_1\)-scores for sedentary behaviour (SB), light intensity physical activity (LIPA), and moderate-to-vigorous physical activity (MVPA), for the best-performing vision-language model, LLaVA (squares), and the best-performing discriminative vision model, ViT (circles), each selected via hyperparameter tuning. Performance is shown for participants in the Oxfordshire study (blue) and the Sichuan study (red) who were withheld from model selection. MVPA constitutes only 8% of the training set, which is reflected in the high variance of per-participant F\(_1\)-scores.