Table 3 The development process of (sub-)classification tools for LBP using AI/ML compared with the STarT Back and McKenzie approaches.


|  | Classification accuracy^a | Internal consistency^b | Test–retest reliability^c | Intra- or inter-rater reliability^d | Construct validity^e | Discriminative validity^f | Prognosis: pain^g | Prognosis: disability^g | Treatment: pain^h | Treatment: disability^h | Treatment: costs^h |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AI/ML | 20/25 (80%) | — | — | — | — | — | — | — | — | — | — |
| STarT Back | NA | 6/9 (67%) | 9/9 (100%) | — | 5/11 (45%) | 8/8 (100%) | 2/6 (33%) | 6/8 (75%) | 1/4 (25%) | 3/4 (75%) | 0/2 (0%) |
| McKenzie | NA | — | — | 4/10 (40%) | 1/2 (50%) | — | — | — | 5/11 (45%) | 4/11 (36%) | 0/1 (0%) |

Values are reported as number (percentage) of studies meeting each criterion.

AI/ML, artificial intelligence and machine learning; —, no studies available or unable to be measured; NA, not assessed in this systematic review.

^a Number of AI/ML studies reporting ≥80% accuracy of classification into ‘low-back pain’ versus ‘healthy’.
^b Internal consistency was considered acceptable if Cronbach’s α was ≥0.7 (ref. 146).
^c Test–retest reliability was considered acceptable at an intraclass correlation coefficient (ICC) of ≥0.7 (refs. 146, 163).
^d Kappa scores for intra- and inter-rater reliability were considered good if ≥0.61 (ref. 122); a computational sketch of the α and κ criteria follows these notes.
^e Construct validity ≥0.6 was considered acceptable (refs. 146, 164).
^f Discriminative validity ≥0.7 was considered acceptable discrimination (ref. 13).
^g Prognosis prediction was considered ‘adequate’ when the classification approach gave statistically significant prediction of outcome after adjusting for baseline pain or disability in multivariate models (refs. 147–150).
^h Treatment effect was considered ‘adequate’ when the classification approach produced a statistically significant improvement in patient outcomes for pain, disability, or healthcare costs in randomised or non-randomised clinical trials.
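The α and κ cut-offs in footnotes b and d are standard psychometric quantities. The Python sketch below, which is illustrative and not from the source, shows how they could be computed and checked against the ≥0.7 and ≥0.61 thresholds used in the table; the helper names (`cronbach_alpha`, `cohens_kappa`) and all patient data are made up for the example.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_subjects, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of subjects' total scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

def cohens_kappa(r1, r2) -> float:
    """Cohen's kappa for two raters' categorical classifications."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    p_obs = np.mean(r1 == r2)                   # observed agreement
    p_exp = sum(np.mean(r1 == c) * np.mean(r2 == c)
                for c in np.union1d(r1, r2))    # agreement expected by chance
    return (p_obs - p_exp) / (1.0 - p_exp)

# Hypothetical data: six patients answering a four-item questionnaire ...
scores = np.array([[3, 4, 3, 4],
                   [1, 2, 1, 1],
                   [4, 4, 5, 4],
                   [2, 2, 3, 2],
                   [5, 4, 4, 5],
                   [2, 3, 2, 2]])
alpha = cronbach_alpha(scores)
print(f"Cronbach's alpha = {alpha:.2f}; acceptable (>=0.7): {alpha >= 0.7}")

# ... and two clinicians sub-classifying the same ten patients.
rater1 = ["derangement", "dysfunction", "derangement", "postural", "derangement",
          "dysfunction", "derangement", "postural", "derangement", "dysfunction"]
rater2 = ["derangement", "dysfunction", "derangement", "derangement", "derangement",
          "dysfunction", "postural", "postural", "derangement", "dysfunction"]
kappa = cohens_kappa(rater1, rater2)
print(f"Cohen's kappa = {kappa:.2f}; good (>=0.61): {kappa >= 0.61}")
```

On the example data this prints α ≈ 0.95 and κ ≈ 0.68, so both would pass the table’s acceptability criteria; the point is only to make the two thresholds concrete, not to reproduce any study in the reviews.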