Fig. 2
From: A text-speech multimodal Chinese named entity recognition model for crop diseases and pests

Cross-modal attention \(CA_{i}(\boldsymbol{X}_{\boldsymbol{\alpha}}, \boldsymbol{Y}_{\boldsymbol{\beta}})\) between sequences \(\boldsymbol{X}_{\boldsymbol{\alpha}}\) and \(\boldsymbol{Y}_{\boldsymbol{\beta}}\) from different modalities.
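
For readers without access to the figure, the operation named in the caption can be sketched in the standard scaled dot-product form. This is an assumption based on common cross-modal attention formulations, not a transcription of the figure: it supposes that queries are projected from \(\boldsymbol{X}_{\boldsymbol{\alpha}}\) and that keys and values are projected from \(\boldsymbol{Y}_{\boldsymbol{\beta}}\). The projection matrices \(W_{Q}\), \(W_{K}\), \(W_{V}\) and the dimensions \(T_{\alpha}\), \(T_{\beta}\), \(d_{\alpha}\), \(d_{\beta}\), \(d_{k}\), \(d_{v}\) are illustrative names that do not appear in the caption, and the meaning of the index \(i\) (for example a layer or a head) is not stated here.

\[
\begin{aligned}
Q_{\alpha} &= \boldsymbol{X}_{\boldsymbol{\alpha}} W_{Q}, \qquad
K_{\beta} = \boldsymbol{Y}_{\boldsymbol{\beta}} W_{K}, \qquad
V_{\beta} = \boldsymbol{Y}_{\boldsymbol{\beta}} W_{V}, \\
CA_{i}(\boldsymbol{X}_{\boldsymbol{\alpha}}, \boldsymbol{Y}_{\boldsymbol{\beta}})
&= \operatorname{softmax}\!\left(\frac{Q_{\alpha} K_{\beta}^{\top}}{\sqrt{d_{k}}}\right) V_{\beta}
\end{aligned}
\]

Under these assumptions, with \(\boldsymbol{X}_{\boldsymbol{\alpha}} \in \mathbb{R}^{T_{\alpha} \times d_{\alpha}}\), \(\boldsymbol{Y}_{\boldsymbol{\beta}} \in \mathbb{R}^{T_{\beta} \times d_{\beta}}\), \(W_{Q} \in \mathbb{R}^{d_{\alpha} \times d_{k}}\), \(W_{K} \in \mathbb{R}^{d_{\beta} \times d_{k}}\) and \(W_{V} \in \mathbb{R}^{d_{\beta} \times d_{v}}\), the softmax is taken row-wise over a \(T_{\alpha} \times T_{\beta}\) score matrix, so the output has shape \(T_{\alpha} \times d_{v}\): one vector per position of \(\boldsymbol{X}_{\boldsymbol{\alpha}}\), each a weighted mixture of the projected \(\boldsymbol{Y}_{\boldsymbol{\beta}}\) positions.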