Abstract: Accurate, automated measurement of plant phenotypic traits is crucial for applications such as breeding and cultivation. To meet the need for non-destructive, precise detection of phenotypic traits in factory-grown lettuce, this study combined RGB and depth images captured by depth cameras: an improved DeepLabv3+ model segmented the images, and a dual-modal regression network estimated the phenotypic traits. In the segmentation model, the Xception backbone was replaced with MobileViTv2 to strengthen global perception and improve performance. In the regression network, a convolutional multi-modal feature fusion module (CMMCM) was proposed to estimate the phenotypic traits of lettuce. Experiments on a public dataset containing four lettuce varieties showed that the method estimated five phenotypic traits (fresh weight, dry weight, canopy diameter, leaf area, and plant height) with coefficients of determination of 0.9222, 0.9314, 0.8620, 0.9359, and 0.8875, respectively. Compared with the RGB- and depth-based baseline ResNet-10 (Dual) without the CMMCM and SE modules, the improved model raised the coefficients of determination by 2.54%, 2.54%, 1.48%, 2.99%, and 4.88%, respectively, with a detection time of 44.8 ms per image. These results demonstrate that dual-modal image fusion achieves accurate, real-time, non-destructive detection of lettuce phenotypic traits.
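The abstract does not specify the internals of the CMMCM fusion module, so the following is only a minimal NumPy sketch of the general dual-modal fusion idea it alludes to: concatenating RGB and depth feature maps along the channel axis and mixing them with a learned 1×1 convolution. All function names, shapes, and weights here are hypothetical illustrations, not the paper's actual design.

```python
import numpy as np

def conv1x1(x, w, b):
    # Pointwise (1x1) convolution: x has shape (C_in, H, W),
    # w has shape (C_out, C_in), b has shape (C_out,).
    return np.tensordot(w, x, axes=([1], [0])) + b[:, None, None]

def fuse_rgb_depth(f_rgb, f_depth, w, b):
    """Hypothetical stand-in for a dual-modal fusion step:
    stack RGB and depth feature maps channel-wise, then let a
    learned 1x1 convolution mix information across modalities."""
    fused = np.concatenate([f_rgb, f_depth], axis=0)  # (2C, H, W)
    return conv1x1(fused, w, b)                       # (C_out, H, W)

# Toy feature maps standing in for backbone outputs.
rng = np.random.default_rng(0)
f_rgb = rng.standard_normal((8, 16, 16))    # 8-channel RGB features
f_depth = rng.standard_normal((8, 16, 16))  # 8-channel depth features
w = rng.standard_normal((8, 16)) * 0.1      # maps 16 -> 8 channels
b = np.zeros(8)

out = fuse_rgb_depth(f_rgb, f_depth, w, b)
print(out.shape)  # (8, 16, 16)
```

In a full model, the fused feature map would then feed a regression head predicting the five phenotypic traits; here the example only shows the shape bookkeeping of the fusion step.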