--- title: "h3sdm workflow for a single model" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{h3sdm workflow for a single model} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", warning = FALSE, message = FALSE ) ``` ## Introduction This vignette demonstrates a complete workflow for species distribution modeling (SDM) for a single species using `h3sdm`. We cover data preparation, model fitting, spatial cross-validation, prediction, and feature importance. ## Load packages ```{r} library(h3sdm) library(sf) library(terra) library(dplyr) library(ggplot2) library(tidymodels) library(spatialsample) library(DALEX) library(themis) ``` ## 1. Define the Area of Interest ```{r} cr <- cr_outline_c ``` ## 2. Load Environmental Predictors ```{r} bio <- terra::rast(system.file("extdata", "bioclim_current.tif", package = "h3sdm")) names(bio) <- gsub(".*bio_", "bio", names(bio)) ``` ## 3. Create the Hexagonal Grid The hexagonal grid is the backbone of the `h3sdm` workflow. All subsequent steps — predictor extraction, presence assignment, and pseudo-absence generation — are built on top of it. The resolution controls the spatial scale of the analysis. At resolution 7, each hexagon covers approximately 514 ha, which is appropriate for many vertebrate species. ```{r} h7 <- h3sdm_get_grid(cr, res = 7) ``` ## 4. Prepare Predictors Environmental variables are extracted for every hexagon in the grid. Here we use three WorldClim bioclimatic variables: Bio1 (Annual Mean Temperature), Bio12 (Annual Precipitation), and Bio15 (Precipitation Seasonality). ```{r} bio_predictors <- h3sdm_extract_num(bio, h7) predictors <- h3sdm_predictors(bio_predictors) |> dplyr::select(h3_address, bio1, bio12, bio15, geometry) ``` ```{r} ggplot() + theme_minimal() + geom_sf(data = predictors, aes(fill = bio1)) + scale_fill_viridis_c(option = "B") ``` ## 5. Species Occurrence Data Presence hexagons are generated using `h3sdm_pres()`, which downloads occurrence records and assigns them to hexagons. Multiple records within the same hexagon are consolidated into a single presence, reducing spatial sampling bias. Since each hexagon represents an area (~514 ha at resolution 7), this better reflects how organisms actually occupy space. Pseudo-absences are then generated with `h3sdm_pa()`. They are placed outside the known distribution — excluding hexagons with presences and their immediate neighbors (`buffer_k = 1`) — and are selected using k-means clustering in environmental space. This ensures pseudo-absences represent the full range of environmental conditions available in the study area, not just their geographic distribution. Since there are approximately 100 presence hexagons, we request 300 pseudo-absences (3×). ```{r} pres <- h3sdm_pres("Silverstoneia flotator", cr, res = 7, limit = 10000) ``` ```{r} records <- h3sdm_pa(pres, predictors, n_pseudoabs = 300) ``` ```{r} table(records$presence) ``` ```{r} ggplot() + theme_minimal() + geom_sf(data = records, aes(fill = presence)) + scale_fill_manual( name = "Presence/Absence", values = c("red", "blue"), labels = c("Absence", "Presence") ) ``` ## 6. Combine Records and Predictors ```{r} dat <- h3sdm_data(records, predictors) ``` ## 7. Spatial Cross-Validation ```{r} scv <- h3sdm_spatial_cv(dat, v = 5, repeats = 1) autoplot(scv) + theme_minimal() ``` ## 8. Define Recipe and Model ```{r} receta <- h3sdm_recipe(dat) |> themis::step_downsample(presence) modelo <- parsnip::logistic_reg() |> parsnip::set_engine("glm") |> parsnip::set_mode("classification") ``` ## 9. Create Workflow ```{r} wf <- h3sdm_workflow(modelo, receta) ``` ## 10. Fit the Model ```{r} presence_data <- dat |> dplyr::filter(presence == 1) f <- h3sdm_fit_model(wf, scv, presence_data) ``` ## 11. Evaluate Model Performance ```{r} evaluation_metrics <- h3sdm_eval_metrics( fitted_model = f$cv_model, presence_data = presence_data ) evaluation_metrics ``` ```{r} ggplot(evaluation_metrics, aes(.metric, mean)) + theme_minimal() + geom_col(width = 0.03, color = "dodgerblue3", fill = "dodgerblue3") + geom_point(size = 3, color = "orange") + ylim(0, 1) ``` ## 12. Make Predictions ```{r} p <- h3sdm_predict(f, predictors) ``` ```{r} ggplot() + theme_minimal() + geom_sf(data = p, aes(fill = prediction)) + scale_fill_viridis_c(name = "Habitat\nsuitability", option = "B", direction = -1) ``` ## 13. Model Interpretation ```{r} e <- h3sdm_explain(f$final_model, data = dat) predictors_to_evaluate <- setdiff(names(e$data), c("h3_address", "x", "y", "presence")) var_imp <- DALEX::model_parts( explainer = e, variables = predictors_to_evaluate ) plot(var_imp) ``` ```{r} pdp_glm <- ingredients::partial_dependence(e, variables = c("bio12", "bio1", "bio15")) plot(pdp_glm) ``` ## 14. Future Scenario Predictions ```{r} bio_future <- terra::rast(system.file("extdata", "bioclim_future.tif", package = "h3sdm")) names(bio_future) <- c("bio1", "bio12", "bio15") bio_future_predictors <- h3sdm_extract_num(bio_future, h7) predictors_future <- h3sdm_predictors(bio_future_predictors) p_future <- h3sdm_predict(f, predictors_future) ``` ```{r} ggplot() + theme_minimal() + geom_sf(data = p_future, aes(fill = prediction)) + scale_fill_viridis_c(name = "Habitat\nsuitability", option = "B", direction = -1) ``` ## Conclusions This vignette demonstrated a complete SDM workflow using `h3sdm`, including data preparation with a hexagonal grid, presence assignment, environmentally stratified pseudo-absence generation, model fitting with spatial cross-validation, performance evaluation, predictions, and variable importance analysis for both current and future climate scenarios.