Abstract: Plant height is an important phenotypic indicator for identifying maize germplasm traits and crop vigor. Because the genetic characteristics of maize are already evident at the seedling stage, accurate measurement of seedling plant height is of great significance for both genetic characterization and field management. Traditional plant height acquisition relies on manual measurement, which is time-consuming and prone to subjective error. To address this problem, an improved ZoeDepth monocular depth estimation model incorporating mixed attention information, named Shuffle-ZoeDepth, was proposed. The improved model added a Shuffle Attention module to each stage of the Decoder module, so that the Decoder attended more closely to the effective information in the feature maps while extracting information from low-resolution feature maps. This enhanced the model's ability to extract key information and allowed it to generate more accurate depth maps. To verify the effectiveness of the method, validation was carried out on the NYU-V2 depth dataset. The results showed that the ARE, RMSE, and LG of the Shuffle-ZoeDepth model were 0.083, 0.301 mm, and 0.036, respectively, and that its accuracy δ under the three thresholds was 93.9%, 99.1%, and 99.8%, respectively, all of which were better than those of the original ZoeDepth model on the same dataset. In addition, the Shuffle-ZoeDepth model was combined with a maize plant height measurement model to measure the height of seedling-stage maize, and height measurement experiments were carried out on images of maize seedlings collected at different distances. For maize in the three height intervals of 15 to 25 cm, 25 to 35 cm, and 35 to 45 cm, the AE was 1.41 cm, 2.21 cm, and 2.08 cm, and the PE was 8.41%, 7.54%, and 4.98%, respectively.
The experimental results showed that this method can accurately measure maize plant height at the seedling stage in complex outdoor environments using only a single RGB camera.
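The abstract does not define the reported metrics, so the following is only a minimal sketch of how such quantities are commonly computed, not the authors' evaluation code. It assumes ARE is the mean absolute relative error, LG is the mean absolute log10 error, the δ thresholds are the conventional 1.25, 1.25², and 1.25³, AE is the mean absolute height error, and PE is the mean percentage height error. The function names `depth_metrics` and `height_errors` are hypothetical.

```python
import numpy as np


def depth_metrics(pred, gt):
    """Common monocular depth metrics (assumed definitions, see lead-in)."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)

    # Keep only pixels with a valid (positive) ground-truth depth.
    valid = gt > 0
    pred, gt = pred[valid], gt[valid]

    are = np.mean(np.abs(pred - gt) / gt)                 # absolute relative error
    rmse = np.sqrt(np.mean((pred - gt) ** 2))             # root mean squared error
    lg = np.mean(np.abs(np.log10(pred) - np.log10(gt)))   # mean log10 error

    # Threshold accuracy: share of pixels with max(pred/gt, gt/pred) < 1.25**k.
    ratio = np.maximum(pred / gt, gt / pred)
    d1, d2, d3 = (float(np.mean(ratio < 1.25 ** k)) for k in (1, 2, 3))

    return {"ARE": float(are), "RMSE": float(rmse), "LG": float(lg),
            "delta1": d1, "delta2": d2, "delta3": d3}


def height_errors(measured_cm, true_cm):
    """Mean absolute error (cm) and mean percentage error (%) of plant height."""
    measured = np.asarray(measured_cm, dtype=float)
    true = np.asarray(true_cm, dtype=float)
    abs_err = np.abs(measured - true)
    ae = float(np.mean(abs_err))
    pe = float(np.mean(abs_err / true) * 100.0)
    return ae, pe
```

Under these assumed definitions, lower ARE, RMSE, LG, AE, and PE indicate better performance, while higher δ indicates better performance.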