Fig. 3

Workflow of the network backbone. The PVT encoder extracts feature information at four layers, and the Cross Feature Module (CFM) crosses the features of the last three layers to reduce the differences between them and avoid redundant information. To prevent fuzzy feature information from being filtered out, we then connect the first-layer features to obtain the model's prediction probability.
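
The data flow in this figure can be sketched as a shape trace. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the strides (4, 8, 16, 32) follow the usual PVT pyramid, while the channel widths, the fused channel count `out_channels`, the 352×352 input size, the reading of "connect" as channel-wise concatenation, and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)

# Assumed values: the caption does not give strides or channel widths.
STRIDES = (4, 8, 16, 32)
CHANNELS = (64, 128, 320, 512)


def encoder_shapes(height: int, width: int) -> List[Shape]:
    """Shapes of the four encoder feature maps for a height x width input."""
    return [(c, height // s, width // s) for c, s in zip(CHANNELS, STRIDES)]


def cross_fuse_shape(deep: List[Shape], out_channels: int = 32) -> Shape:
    """Shape-level stand-in for the CFM (hypothetical).

    Assumes the three deeper maps are brought to a common channel width
    and merged at the resolution of the largest of them.
    """
    assert len(deep) == 3, "the CFM takes the last three encoder layers"
    height = max(shape[1] for shape in deep)
    width = max(shape[2] for shape in deep)
    return (out_channels, height, width)


def prediction_shape(first: Shape, fused: Shape) -> Shape:
    """Shape of the probability map after connecting the first-layer features.

    Assumes the fused map is upsampled to the first layer's resolution,
    concatenated along channels, and projected to a single channel.
    """
    channels, height, width = first
    concatenated = (channels + fused[0], height, width)
    return (1, concatenated[1], concatenated[2])


def trace(height: int = 352, width: int = 352) -> Dict[str, object]:
    """Trace feature shapes through encoder, CFM, and prediction head."""
    f1, f2, f3, f4 = encoder_shapes(height, width)
    fused = cross_fuse_shape([f2, f3, f4])
    return {
        "encoder": [f1, f2, f3, f4],
        "fused": fused,
        "prediction": prediction_shape(f1, fused),
    }
```

Calling `trace()` returns the per-stage shapes, which makes it easy to check that the first-layer map and the fused map agree in resolution before they are connected.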