Abstract: To address the large variation in biological scales and the sample imbalance encountered in underwater object detection, a multi-scale object detection method for underwater organisms (UMS-YOLO v7) was proposed. First, a feature extraction module built from switchable atrous convolutions was designed. This module captured multi-scale target features across different receptive field sizes, so that feature information was extracted more comprehensively. Second, a lightweight, universal upsampling operator was employed to fuse contextual information, strengthening the model's ability to learn object features. Finally, two similarity metrics, Wise-IoU and normalized Wasserstein distance, were combined to improve the localization accuracy of targets at different scales while mitigating the impact of the uneven distribution of multi-scale samples on the model. Experimental results showed that the proposed model significantly improved detection accuracy compared with other current models, reaching average precision (AP) values of 64.5% and 68.9% on the RUOD and DUO datasets, respectively. Compared with the YOLO v7 baseline, UMS-YOLO v7 improved multi-scale object detection accuracy and achieved precise detection of underwater organisms in complex underwater environments. On the DUO dataset, the AP for large, medium, and small objects increased by 8.3, 4.8, and 12.5 percentage points, respectively, with the largest improvement for small objects. Compared with other existing models, the improved model achieved higher detection accuracy and was better suited to multi-scale underwater organism detection. It also showed good generalization, robustness, and adaptability on samples with different data distributions.
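The abstract does not state how Wise-IoU and the normalized Wasserstein distance (NWD) are combined, so the following is only a minimal Python sketch under stated assumptions. Boxes are `(cx, cy, w, h)` tuples. Wise-IoU is taken in its v1 form, where a distance-based attention factor multiplies the IoU loss and the v3 dynamic focusing is omitted. NWD follows the Gaussian-box formulation, in which each box is modelled as a 2D Gaussian and `c` is a dataset-dependent normalizing constant. The two terms are mixed with a hypothetical weight `alpha`. The function names, `alpha`, and the default `c` are illustrative and are not the authors' implementation.

```python
import math
from typing import Tuple

# A box is (cx, cy, w, h): centre coordinates, width, height.
Box = Tuple[float, float, float, float]


def _corners(b: Box) -> Tuple[float, float, float, float]:
    """Convert (cx, cy, w, h) to (x1, y1, x2, y2)."""
    cx, cy, w, h = b
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


def iou(a: Box, b: Box) -> float:
    """Plain intersection over union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = _corners(a)
    bx1, by1, bx2, by2 = _corners(b)
    inter = max(0.0, min(ax2, bx2) - max(ax1, bx1)) * max(0.0, min(ay2, by2) - max(ay1, by1))
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def wiou_v1_loss(pred: Box, gt: Box) -> float:
    """Wise-IoU v1: distance attention factor R times the IoU loss (1 - IoU)."""
    px1, py1, px2, py2 = _corners(pred)
    gx1, gy1, gx2, gy2 = _corners(gt)
    # Width and height of the smallest box enclosing both boxes.
    wg = max(px2, gx2) - min(px1, gx1)
    hg = max(py2, gy2) - min(py1, gy1)
    # In a training framework this denominator is detached from the gradient graph.
    denom = max(wg ** 2 + hg ** 2, 1e-12)
    r = math.exp(((pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2) / denom)
    return r * (1.0 - iou(pred, gt))


def nwd(a: Box, b: Box, c: float = 12.8) -> float:
    """Normalized Wasserstein distance between boxes modelled as Gaussians
    N((cx, cy), diag(w^2/4, h^2/4)); c is an assumed dataset-dependent constant."""
    w2_sq = ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
             + ((a[2] - b[2]) / 2) ** 2 + ((a[3] - b[3]) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)


def combined_box_loss(pred: Box, gt: Box, alpha: float = 0.5, c: float = 12.8) -> float:
    """Hypothetical blend: alpha * L_WIoU + (1 - alpha) * (1 - NWD)."""
    return alpha * wiou_v1_loss(pred, gt) + (1.0 - alpha) * (1.0 - nwd(pred, gt, c))


if __name__ == "__main__":
    gt = (10.0, 10.0, 4.0, 4.0)
    shifted = (13.0, 10.0, 4.0, 4.0)
    disjoint = (20.0, 10.0, 4.0, 4.0)
    print(f"shifted:  IoU={iou(shifted, gt):.3f}  NWD={nwd(shifted, gt):.3f}")
    print(f"disjoint: IoU={iou(disjoint, gt):.3f}  NWD={nwd(disjoint, gt):.3f}")
    print(f"combined loss (shifted) = {combined_box_loss(shifted, gt):.3f}")
```

For a 4×4 box shifted by 3 pixels, IoU drops to 1/7 while NWD stays near 0.79, and NWD remains positive even after the boxes stop overlapping. This smoother response on small objects is the usual motivation for blending NWD with an IoU-based loss.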