Cereal and Rapeseed Yield Forecast in Poland at Regional Level Using Machine Learning and Classical Statistical Models

Abstract
This study performed in-season yield prediction, about 2–3 months before the harvest, for cereals and rapeseed at the province level in Poland for 2009–2024. Various models were employed, including machine learning algorithms and multiple linear regression. The satellite-derived normalized difference vegetation index (NDVI) and climatic water balance (CWB), calculated using meteorological data, were treated as predictors of crop yield. The accuracy of the models was compared to identify the optimal approach. The strongest correlation coefficients with crop yield were observed for the NDVI at the beginning of March, ranging from 0.454 for rapeseed to 0.503 for rye. Depending on the crop, the highest R² values were observed for different prediction models, ranging from 0.654 for rapeseed based on the random forest model to 0.777 for basic cereals based on linear regression. The random forest model was best for rapeseed yield, while for cereal, the best prediction was observed for multiple linear regression or neural network models. For the studied crops, all models had mean absolute errors and root mean squared errors not exceeding 6 dt/ha, which is relatively small because it is under 20% of the mean yield. For the best models, in most cases, relative errors were not higher than 10% of the mean yield. The results proved that linear regression and machine learning models are characterized by similar predictions, likely due to the relatively small sample size (256 observations).
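The accuracy measures named in the abstract (mean absolute error, root mean squared error, R², and errors relative to the mean yield) can be illustrated with a minimal sketch. The function name `error_metrics` and the yield values are hypothetical and are not data from the study. R² is computed here as the coefficient of determination, which is an assumption, since the abstract does not state the exact definition used.

```python
import math

def error_metrics(observed, predicted):
    """Return MAE, RMSE, R^2 and errors relative to the mean observed yield."""
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]

    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)

    # Coefficient of determination: 1 - SS_res / SS_tot (assumed definition).
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot

    return {
        "mae": mae,
        "rmse": rmse,
        "r2": r2,
        # Relative errors as a percentage of the mean observed yield.
        "rel_mae_pct": 100.0 * mae / mean_obs,
        "rel_rmse_pct": 100.0 * rmse / mean_obs,
    }

# Hypothetical yields in dt/ha, for illustration only.
observed = [40.0, 45.0, 50.0, 55.0]
predicted = [42.0, 44.0, 49.0, 57.0]

metrics = error_metrics(observed, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Expressing MAE and RMSE as a percentage of the mean yield is what allows errors in dt/ha to be compared across crops with different yield levels.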
Keywords
grain yield, satellite data, remote sensing, random forest, neural networks
Citation
Okupska, E.; Gozdowski, D.; Pudełko, R.; Wójcik-Gront, E. Cereal and Rapeseed Yield Forecast in Poland at Regional Level Using Machine Learning and Classical Statistical Models. Agriculture 2025, 15, 984. https://doi.org/10.3390/agriculture15090984