Abstract: Soil available phosphorus (AP) is an important nutrient for the growth and development of crops. Hyperspectral analysis has proven to be a rapid and effective means of quantitatively predicting soil AP, and it shows good prospects owing to its narrow bandwidths and high spectral resolution. However, multicollinearity and redundancy in the spectral data tend to cause overfitting of the regression model and reduce its generalization ability. A total of 145 lime concretion black soil samples collected from the Northern Anhui Plain, China, were used to investigate the prediction performance of the back-propagation neural network (BPNN) combined with the partial least squares regression (PLS-R) algorithm. PLS-R was applied for dimensionality reduction and feature selection on the soil visible and near-infrared hyperspectral data, which covered 400-1000 nm with 339 wavelengths. Five latent variables (LVs) were obtained by leave-one-out cross-validation, and nine optimal wavelengths were selected according to the variable importance in projection (VIP) scores. BPNN regression models were then built with the five latent variables (LVs-BPNN), the nine optimal wavelengths (VIPs-BPNN), and all 339 wavelengths (Ws-BPNN) as inputs, respectively. The ratio of performance to deviation (MRPD) and the ratio of the explained sum of squared deviations to the total sum of squared deviations (MSSR/SST) were used to evaluate the prediction accuracy and the explanatory power of the regression models, respectively.
The results showed that all three BPNN models predicted soil AP significantly more accurately than the PLS-R model. On the validation set, the VIPs-BPNN model performed similarly (MRPD = 2.05, MSSR/SST = 0.79) to the Ws-BPNN model (MRPD = 2.09, MSSR/SST = 0.85), whereas on the calibration set the MRPD dropped markedly from 10.27 (Ws-BPNN) to 2.66 (VIPs-BPNN), so the gap between calibration and validation performance was much smaller. The LVs-BPNN model achieved the highest prediction accuracy on the validation set (MRPD = 2.29), although its MSSR/SST decreased slightly to 0.76. These results indicate that the PLS-BPNN models can substantially reduce overfitting and improve generalization ability; moreover, the LVs-BPNN model can improve the accuracy of soil AP prediction.
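For readers who wish to reproduce the two evaluation indices, the following is a minimal sketch of how they are commonly computed, not the authors' own code. It assumes the usual definitions: the ratio of performance to deviation is the standard deviation of the observed values divided by the root mean square error of prediction, and SSR/SST is the sum of squared deviations of the predictions from the observed mean divided by the sum of squared deviations of the observations from that mean. The function names `rpd` and `ssr_over_sst`, the use of the sample (n - 1) standard deviation, and the example values are all assumptions made for illustration.

```python
import math
import statistics


def rpd(observed, predicted):
    """Ratio of performance to deviation: SD(observed) / RMSE(observed, predicted).

    Assumption: the sample standard deviation (n - 1 denominator) is used;
    some studies use the population form instead.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(squared_errors) / len(observed))
    return statistics.stdev(observed) / rmse


def ssr_over_sst(observed, predicted):
    """Explained sum of squares divided by total sum of squares.

    Both sums are taken about the mean of the observed values.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    mean_observed = statistics.mean(observed)
    ssr = sum((p - mean_observed) ** 2 for p in predicted)
    sst = sum((o - mean_observed) ** 2 for o in observed)
    return ssr / sst


# Hypothetical AP values, for illustration only (not data from the study).
observed_ap = [10.0, 20.0, 30.0, 40.0]
predicted_ap = [12.0, 18.0, 32.0, 38.0]

example_rpd = rpd(observed_ap, predicted_ap)
example_ratio = ssr_over_sst(observed_ap, predicted_ap)
```

Computing both indices separately for the calibration and validation sets, as in the comparison above, shows how far a model's fit to the calibration data exceeds its performance on unseen samples.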