Figure 4

Diagram of the proposed PFL-SD compared with the conventional method. (a) The existing AI method (baseline) receives the patient’s entire audio recording, \(x_{te}\), as the network input and outputs the diagnosis as a probability vector \(f_{\theta ^*}(x_{te})\). (b) The proposed PFL-SD splits the audio recording, \(x_{te}\), into \(P\) patches \(\{x^i_{te}\}_{i=1}^P\), estimates a diagnosis probability for each patch, \(\{p_{\theta ^*}(x^i_{te})\}_{i=1}^P\), using few-shot learning, and obtains the patient-level diagnosis by merging the per-patch results (Eq. (10)). Each patch is diagnosed by comparing the distance (Eq. (6)) between the target sample and the per-class prototype representations computed from the support set of the training data\(^{11}\). Because the training data are used as additional information during inference as well as during training, high diagnostic performance can be achieved even when few training samples are available.
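
The flow in panel (b) can be illustrated with a minimal sketch. It is not the authors' implementation: Eq. (6) and Eq. (10) are not reproduced in this caption, so the sketch assumes a squared Euclidean distance followed by a softmax for the per-patch estimate and a simple average for the patient-level merge. The function names and the `toy_embed` feature extractor (a stand-in for the trained network with parameters \(\theta ^*\)) are hypothetical.

```python
import numpy as np


def split_into_patches(x, num_patches):
    """Split a 1-D recording into num_patches contiguous, non-overlapping patches."""
    return np.array_split(x, num_patches)


def toy_embed(patch):
    """Hypothetical embedding (mean and standard deviation of the patch).

    A real system would use the trained network here instead.
    """
    return np.array([patch.mean(), patch.std()])


def class_prototypes(support_embeddings, support_labels, num_classes):
    """Prototype of each class: mean embedding of that class's support samples."""
    return np.stack([
        support_embeddings[support_labels == c].mean(axis=0)
        for c in range(num_classes)
    ])


def patch_probabilities(patch_embeddings, prototypes):
    """Per-patch class probabilities from distances to the prototypes.

    Assumption: squared Euclidean distance and a softmax over negative
    distances stand in for Eq. (6), which is not shown in the caption.
    """
    diffs = patch_embeddings[:, None, :] - prototypes[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=-1)            # shape (P, C)
    logits = -sq_dists
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


def patient_level_diagnosis(patch_probs):
    """Merge per-patch probabilities into one patient-level vector.

    Assumption: a plain average stands in for Eq. (10).
    """
    return patch_probs.mean(axis=0)


def diagnose(recording, num_patches, prototypes):
    """End-to-end sketch: split, embed each patch, classify, merge."""
    patches = split_into_patches(recording, num_patches)
    embeddings = np.stack([toy_embed(p) for p in patches])
    return patient_level_diagnosis(patch_probabilities(embeddings, prototypes))
```

Under these assumptions, the prototypes are computed once from the support set and reused for every test patch, which is how the training data enter the computation at inference time.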