Mesic ecosystems occupy a very small portion of semi-arid landscapes, yet they function as ecological and hydrological keystones supporting biodiversity, wildlife, and moisture-dependent services across the sagebrush biome. Despite their importance, monitoring mesic ecosystems at broad spatial and temporal scales remains challenging because these habitats are rare, spatially fragmented, and highly variable across environmental gradients. Existing satellite products face trade-offs between spatial resolution (Sentinel-2) and temporal depth (Landsat), complicating efforts to generate long-term, high-resolution maps. Although machine learning offers new opportunities for continental-scale vegetation monitoring, performance is often constrained by limited training data and strong spatial autocorrelation. Addressing these challenges is essential for building reliable, generalizable models.
This study developed a Random Forest regression framework integrating Landsat imagery, Sentinel-2 training products, and environmental covariates to generate growing-season mesic vegetation and surface-water fractional cover from 1984 to present at 30-m resolution across the sagebrush biome. We evaluated (1) how training dataset size affects model accuracy, (2) how errors vary across ecoregions, and (3) whether incorporating spatial structure improves performance. Approximately 250,000 ecoregion-stratified training samples were derived from Sentinel-2 fractional cover data (2017–2020). Five subsets (56k–250k points) were used to test sensitivity to training size. Two model configurations were compared: a spectral-only baseline and a spatially informed model including ecoregions and geographic coordinates. Independent validation samples were stratified by Level II ecoregions.
Results show that training dataset size strongly controls prediction accuracy. Mean absolute error (MAE) exceeded 0.42 with 56k samples, declined substantially at 116k and 168k, and stabilized near 224k samples (MAE < 0.28), indicating a saturation point for continental-scale mapping. Spatial predictors yielded only modest improvements in overall error but improved ecological consistency across regions.
This study provides a scalable workflow for long-term mesic ecosystem monitoring. Using sufficiently large, ecologically representative training datasets and incorporating spatial structure are critical for accurate large-area mapping. The resulting 40-year products support conservation planning, hydrological monitoring, and long-term ecosystem assessment across the sagebrush biome.
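The ecoregion-stratified sampling and per-ecoregion error summaries described above could be sketched roughly as follows. This is a minimal illustration, not the study's implementation: the function names, the `ecoregion`, `observed`, and `predicted` field names, the toy ecoregion labels, and the proportional allocation rule are all assumptions, since the abstract does not state how samples were allocated among ecoregions.

```python
import random
from collections import Counter, defaultdict


def stratified_subset(samples, n_total, seed=0):
    """Draw roughly n_total samples, allocated in proportion to stratum size.

    Proportional allocation is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for s in samples:
        strata[s["ecoregion"]].append(s)
    subset = []
    for name in sorted(strata):
        members = strata[name]
        # Each ecoregion contributes its share, with at least one sample.
        k = round(n_total * len(members) / len(samples))
        k = min(len(members), max(1, k))
        subset.extend(rng.sample(members, k))
    return subset


def mae_by_ecoregion(records):
    """Mean absolute error of predicted vs. observed fractional cover per ecoregion."""
    errors = defaultdict(list)
    for r in records:
        errors[r["ecoregion"]].append(abs(r["observed"] - r["predicted"]))
    return {name: sum(v) / len(v) for name, v in errors.items()}


# Toy sample pool with three hypothetical ecoregions of unequal size.
pool = [
    {"ecoregion": name, "id": i}
    for name, n in (("A", 600), ("B", 300), ("C", 100))
    for i in range(n)
]
subset = stratified_subset(pool, n_total=100)
counts = Counter(s["ecoregion"] for s in subset)

# Toy validation records (values are made up for illustration only).
validation = [
    {"ecoregion": "A", "observed": 0.5, "predicted": 0.25},
    {"ecoregion": "A", "observed": 0.0, "predicted": 0.25},
    {"ecoregion": "B", "observed": 1.0, "predicted": 0.5},
]
mae = mae_by_ecoregion(validation)
```

Calling `stratified_subset` with increasing `n_total` values would give a series of training subsets for a size-sensitivity test, and `mae_by_ecoregion` on held-out samples would show how errors vary among regions.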