We developed an integrated approach that couples a Chemical Transport Model (CTM) with Machine Learning (ML) techniques to produce daily NO2 and O3 concentration fields at high spatial resolution across Italy. Three years (2013-2015) of simulations at 5 km resolution, performed with the regional air quality model FARM, were used as predictors in a Random Forest (ML-RF) algorithm. Other spatiotemporal data, such as population, land use, surface greenness, and road networks, were used alongside them to produce daily concentrations at a finer resolution (1 km) across the country. The integrated approach was evaluated against NO2 and O3 observations from 530 and 293 monitoring stations across Italy, respectively. The CTM application yielded good performance for NO2 and excellent performance for O3. For NO2, however, the simulations did not capture the levels observed at urban traffic stations. This was due to the adopted horizontal resolution and the associated emission uncertainties. The ML-RF predictions improved performance: they reduced the underestimation of NO2, with fractional bias close to zero, and better captured spatial contrasts. The results of this work were used to support national exposure assessment and environmental epidemiology studies planned in the BEEP (Big Data in Environmental and Occupational Epidemiology) project. They also confirm the potential of machine learning methods to adequately predict atmospheric pollutant levels at high spatiotemporal resolution.
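The abstract reports the NO2 improvement in terms of fractional bias (FB). The following is a minimal sketch of that metric, not the authors' evaluation code. It assumes the common definition FB = 2 * (mean(M) - mean(O)) / (mean(M) + mean(O)), where M are modelled and O are observed concentrations. Under this sign convention, negative values indicate underestimation and the metric is bounded in [-2, 2]. Some studies use the opposite sign (O - M), so the convention should be checked against the paper. The station values in the usage example are hypothetical and serve only as an illustration.

```python
from statistics import fmean


def fractional_bias(modelled, observed):
    """Fractional bias FB = 2 * (mean(M) - mean(O)) / (mean(M) + mean(O)).

    Assumed sign convention: FB < 0 means the model underestimates.
    FB lies in [-2, 2] for non-negative concentrations; 0 means the
    mean modelled and mean observed concentrations agree.
    """
    if not modelled or len(modelled) != len(observed):
        raise ValueError("inputs must be non-empty and of equal length")
    m, o = fmean(modelled), fmean(observed)
    if m + o == 0:
        raise ValueError("mean concentrations sum to zero; FB is undefined")
    return 2.0 * (m - o) / (m + o)


# Hypothetical daily NO2 values (ug/m3) at one urban traffic station.
observed = [60, 55, 70, 65]
ctm_5km = [30, 28, 35, 31]  # coarse CTM output: strong underestimation
rf_1km = [58, 54, 68, 63]   # downscaled ML-RF output

print(round(fractional_bias(ctm_5km, observed), 3))  # -0.674
print(round(fractional_bias(rf_1km, observed), 3))   # -0.028
```

In this illustration, the coarse field gives a strongly negative FB and the downscaled field gives an FB close to zero. This mirrors the kind of change the abstract describes, although the numbers themselves are invented.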

Spatial-temporal Prediction of Ambient Nitrogen Dioxide and Ozone Levels Over Italy Using a Random Forest Model for Population Exposure Assessment