Abstract: Multi-modal fusion technology, which combines data from multiple sources, has been widely applied in fields such as medicine, autonomous driving, and emotion recognition to overcome the limitations of any single modality. In recent years, advances in sensor and remote sensing technologies have provided richer data sources for crop monitoring, including spectral, image, radar, and thermal infrared data. With computer vision and data analysis methods, phenotypic parameters, physicochemical characteristics, and other crop information can be extracted to assess crop growth and guide agricultural production management. However, most existing studies relied on single-modal data: with only one type of input, they lacked a view of the overall information and were susceptible to noise in that modality. Even studies that employed multi-modal fusion often did not fully account for the complex interactions between modalities. To thoroughly analyze the potential of multi-modal fusion in crop monitoring, the advanced multi-modal fusion technologies and methods in the agricultural field are first outlined, with a focus on their applications in crop identification, trait analysis, yield prediction, stress analysis, and pest and disease diagnosis. The remaining challenges are then discussed and an outlook on future developments is provided, aiming to promote precision agricultural management and improve production efficiency through multi-modal fusion methods.
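As a point of reference for the fusion strategies surveyed later, the simplest form of multi-modal fusion is feature-level (early) fusion: features from each modality are normalized and concatenated into one vector before modeling. The sketch below is illustrative only and not taken from any study in this review; the function name and toy dimensions are hypothetical.

```python
import numpy as np

def early_fusion(spectral, image_feats):
    """Feature-level (early) fusion: z-score each modality, then concatenate.

    spectral    : (n_samples, d1) array, e.g. canopy reflectance bands
    image_feats : (n_samples, d2) array, e.g. image-derived descriptors
    Returns a (n_samples, d1 + d2) fused feature matrix.
    """
    def zscore(x):
        # Per-feature standardization so no modality dominates by scale
        return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)
    return np.concatenate([zscore(spectral), zscore(image_feats)], axis=1)

# Toy data: 4 field plots, 5 spectral bands, 3 image features
rng = np.random.default_rng(0)
fused = early_fusion(rng.normal(size=(4, 5)), rng.normal(size=(4, 3)))
print(fused.shape)  # (4, 8)
```

More elaborate schemes (decision-level fusion, attention-based cross-modal interaction) build on this same idea but model the dependencies between modalities rather than simply stacking them.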