| Title: | Species Distribution Modeling with H3 Grids |
|---|---|
| Description: | Provides tools for species distribution modeling using H3 hexagonal grids (Uber Technologies Inc., 2022, <https://h3geo.org>). Facilitates retrieval of species occurrence records, generation of H3 grids, computation of landscape metrics, and preparation of spatial data for modern species distribution models workflows. Designed for biodiversity and landscape ecology research. |
| Authors: | Manuel Spínola [aut, cre] |
| Maintainer: | Manuel Spínola <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.5 |
| Built: | 2026-06-05 20:04:00 UTC |
| Source: | https://github.com/ManuelSpinola/h3sdm |
A GeoTIFF with current bioclimatic variables for Costa Rica.
GeoTIFF file, readable with terra::rast().
This file is stored in inst/extdata/ and can be accessed with:
terra::rast(system.file("extdata", "bioclim_current.tif", package = "h3sdm"))
library(terra) bio <- terra::rast(system.file("extdata", "bioclim_current.tif", package = "h3sdm"))library(terra) bio <- terra::rast(system.file("extdata", "bioclim_current.tif", package = "h3sdm"))
A GeoTIFF with projected bioclimatic variables for Costa Rica.
GeoTIFF file, readable with terra::rast().
This dataset corresponds to the climate projection:
Model: INM-CM4-8
Scenario: SSP1-2.6
Period: 2021–2040
The file is stored in inst/extdata/ and can be accessed with:
terra::rast(system.file("extdata", "bioclim_future.tif", package = "h3sdm"))
library(terra) bio <- terra::rast(system.file("extdata", "bioclim_future.tif", package = "h3sdm"))library(terra) bio <- terra::rast(system.file("extdata", "bioclim_future.tif", package = "h3sdm"))
A simplified outline of Costa Rica as an sf object.
cr_outline_ccr_outline_c
An sf object containing polygon geometry of Costa Rica.
Adapted from publicly available geographic data.
library(sf) plot(cr_outline_c)library(sf) plot(cr_outline_c)
Estimates the Dissimilarity Index (DI) and the Area of Applicability
(AOA) for new data given the training data and a fitted model. This
function is designed to be applied directly to the output of
h3sdm_predict(), so that both the predicted values and the AOA
are available in a single sf object ready for mapping.
h3sdm_aoa(newdata, train, fit_object, cv = NULL, verbose = TRUE)h3sdm_aoa(newdata, train, fit_object, cv = NULL, verbose = TRUE)
newdata |
An |
train |
An |
fit_object |
The list returned by |
cv |
An |
verbose |
Logical. Should progress messages be printed?
Default |
The algorithm follows Meyer & Pebesma (2021). Predictor variables are
extracted automatically from the model formula inside
fit_object. They are standardized using z-score scaling
computed from train, then optionally weighted by variable
importance. The mean nearest-neighbor distance among training points
(trainDist_avrgmean) is used to normalize DI values. The AOA
threshold is the maximum cross-validated training DI after removing
outliers with Tukey's rule (Q3 + 1.5 * IQR). Locations with
DI <= threshold are inside the AOA; locations above the
threshold should be interpreted with caution.
Variable importance is extracted automatically for ranger and
xgboost models via vip::vi(). For GAM models, or when
importance cannot be extracted, all variables receive equal weight.
The input newdata sf object with two additional
columns:
DINumeric. Dissimilarity Index for each observation. Values near 0 indicate high similarity to the training data; larger values indicate increasing dissimilarity.
AOAInteger. 1 = inside the AOA;
0 = outside the AOA.
Meyer, H., Pebesma, E. (2021): Predicting into unknown space? Estimating the area of applicability of spatial prediction models. Methods in Ecology and Evolution 12: 1620–1633. doi:10.1111/2041-210X.13650
h3sdm_predict(), h3sdm_fit_model(), h3sdm_spatial_cv(),
h3sdm_data()
## Not run: cv <- h3sdm_spatial_cv(dat, method = "block") fit <- h3sdm_fit_model(workflow, cv) pred <- h3sdm_predict(fit, new_data = h7) result <- h3sdm_aoa(pred, train = dat, fit_object = fit, cv = cv) ## End(Not run)## Not run: cv <- h3sdm_spatial_cv(dat, method = "block") fit <- h3sdm_fit_model(workflow, cv) pred <- h3sdm_predict(fit, new_data = h7) result <- h3sdm_aoa(pred, train = dat, fit_object = fit, cv = cv) ## End(Not run)
Calculates 5 Information Theory (IT)-based landscape metrics (condent,
ent, joinent, mutinf, relmutinf) for each hexagon
in a given H3 hexagonal grid.
h3sdm_calculate_it_metrics(landscape_raster, sf_grid)h3sdm_calculate_it_metrics(landscape_raster, sf_grid)
landscape_raster |
A categorical SpatRaster containing land-cover data. |
sf_grid |
An |
This function computes landscape metrics using the landscapemetrics::sample_lsm() workflow.
The results are pivoted to a wide format for easy use.
An sf object containing the input hex grid with new columns for each calculated metric.
Hesselbarth et al., 2019. landscapemetrics: an open-source R tool to calculate landscape metrics. Ecography 42: 1648–1657.
Nowosad & Stepinski, 2019. Information theory as a consistent framework for landscape patterns. doi:10.1007/s10980-019-00830-x
library(sf) library(terra) # Create a categorical SpatRaster (land-cover map) landscape_raster <- terra::rast( nrows = 30, ncols = 30, xmin = -85.0, xmax = -83.0, ymin = 9.0, ymax = 11.0, crs = "EPSG:4326" ) terra::values(landscape_raster) <- sample(1:4, terra::ncell(landscape_raster), replace = TRUE) names(landscape_raster) <- "landcover" # Create a simple hexagon grid as sf polygons hex_grid <- sf::st_make_grid( sf::st_as_sfc(sf::st_bbox(c( xmin = -84.5, xmax = -83.5, ymin = 9.5, ymax = 10.5 ), crs = sf::st_crs(4326))), n = c(3, 3), square = FALSE ) sf_grid <- sf::st_sf(h3_address = paste0("hex_", seq_along(hex_grid)), geometry = hex_grid) # Calculate Information Theory (IT) landscape metrics per hexagon result_sf <- h3sdm_calculate_it_metrics(landscape_raster, sf_grid) head(result_sf)library(sf) library(terra) # Create a categorical SpatRaster (land-cover map) landscape_raster <- terra::rast( nrows = 30, ncols = 30, xmin = -85.0, xmax = -83.0, ymin = 9.0, ymax = 11.0, crs = "EPSG:4326" ) terra::values(landscape_raster) <- sample(1:4, terra::ncell(landscape_raster), replace = TRUE) names(landscape_raster) <- "landcover" # Create a simple hexagon grid as sf polygons hex_grid <- sf::st_make_grid( sf::st_as_sfc(sf::st_bbox(c( xmin = -84.5, xmax = -83.5, ymin = 9.5, ymax = 10.5 ), crs = sf::st_crs(4326))), n = c(3, 3), square = FALSE ) sf_grid <- sf::st_sf(h3_address = paste0("hex_", seq_along(hex_grid)), geometry = hex_grid) # Calculate Information Theory (IT) landscape metrics per hexagon result_sf <- h3sdm_calculate_it_metrics(landscape_raster, sf_grid) head(result_sf)
Converts continuous probability predictions into binary presence/absence based on a specified threshold.
h3sdm_classify(predictions_sf, threshold)h3sdm_classify(predictions_sf, threshold)
predictions_sf |
An |
threshold |
A numeric value representing the probability threshold
(e.g., |
This function is useful for converting continuous probability outputs into binary presence/absence data for mapping or model evaluation purposes.
An sf object with the same geometry and all original columns, plus a new
integer column predicted_presence with values 0 (absence) or 1 (presence).
## Not run: library(sf) library(dplyr) # Crear un sf de ejemplo df <- data.frame( id = 1:5, prediction = c(0.2, 0.6, 0.45, 0.8, 0.3), lon = c(-75, -74, -73, -72, -71), lat = c(10, 11, 12, 13, 14) ) df_sf <- st_as_sf(df, coords = c("lon", "lat"), crs = 4326) # Clasificar usando un umbral classified_sf <- h3sdm_classify(df_sf, threshold = 0.5) # Revisar resultados print(classified_sf) ## End(Not run)## Not run: library(sf) library(dplyr) # Crear un sf de ejemplo df <- data.frame( id = 1:5, prediction = c(0.2, 0.6, 0.45, 0.8, 0.3), lon = c(-75, -74, -73, -72, -71), lat = c(10, 11, 12, 13, 14) ) df_sf <- st_as_sf(df, coords = c("lon", "lat"), crs = 4326) # Clasificar usando un umbral classified_sf <- h3sdm_classify(df_sf, threshold = 0.5) # Revisar resultados print(classified_sf) ## End(Not run)
Computes and combines performance metrics for multiple species distribution models
created with h3sdm_fit_models() or similar workflows. Metrics include standard yardstick
metrics (ROC AUC, TSS, Boyce index, etc.). Returns a tibble summarizing model performance.
h3sdm_compare_models(h3sdm_results)h3sdm_compare_models(h3sdm_results)
h3sdm_results |
A list or workflow set containing fitted models with a |
A tibble with one row per model per metric, containing:
Model name
Metric name (ROC AUC, TSS, Boyce, etc.)
Metric type (usually "binary")
Metric value
# Minimal reproducible example example_metrics <- tibble::tibble( model = c("model1", "model2"), .metric = c("roc_auc", "tss_max"), .estimator = c("binary", "binary"), mean = c(0.85, 0.7) ) example_results <- list(metrics = example_metrics) h3sdm_compare_models(example_results)# Minimal reproducible example example_metrics <- tibble::tibble( model = c("model1", "model2"), .metric = c("roc_auc", "tss_max"), .estimator = c("binary", "binary"), mean = c(0.85, 0.7) ) example_results <- list(metrics = example_metrics) h3sdm_compare_models(example_results)
Takes a user-provided dataset with presence records (from Excel, fieldwork,
acoustic detections, camera traps, or any other source) and generates a
hexagonal grid with counts (species richness, total detections, or individuals)
ready for analysis with h3sdm. The input can be a data.frame with
coordinate columns or an sf object. Coordinates are assumed to be in
WGS84 (EPSG:4326).
h3sdm_count_from_records( records, aoi_sf, res = 7, expand_factor = 0.1, lon_col = "x", lat_col = "y", species_col = NULL, count_type = c("richness", "detections", "individuals"), presence_col = NULL, abundance_col = NULL, confidence_col = NULL, confidence_threshold = NULL, date_col = NULL, date_min = NULL, date_max = NULL )h3sdm_count_from_records( records, aoi_sf, res = 7, expand_factor = 0.1, lon_col = "x", lat_col = "y", species_col = NULL, count_type = c("richness", "detections", "individuals"), presence_col = NULL, abundance_col = NULL, confidence_col = NULL, confidence_threshold = NULL, date_col = NULL, date_min = NULL, date_max = NULL )
records |
|
aoi_sf |
|
res |
|
expand_factor |
|
lon_col |
|
lat_col |
|
species_col |
|
count_type |
|
presence_col |
|
abundance_col |
|
confidence_col |
|
confidence_threshold |
|
date_col |
|
date_min |
|
date_max |
|
sf object with columns:
H3 index of the hexagon.
Numeric count per hexagon (richness, detections, or individuals).
MULTIPOLYGON of each hexagon.
data(cr_outline_c, package = "h3sdm") my_records <- data.frame( x = c(-84.1, -84.2, -83.9, -84.0, -84.1), y = c(9.9, 10.1, 9.8, 9.95, 10.0), Especie = c("Ara macao", "Ara macao", "Pharomachrus mocinno", "Tapirus bairdii", "Ara macao"), Presencia = c(1, 1, 1, 1, 0) ) richness_hex <- h3sdm_count_from_records( records = my_records, aoi_sf = cr_outline_c, res = 7, lon_col = "x", lat_col = "y", species_col = "Especie", count_type = "richness", presence_col = "Presencia" )data(cr_outline_c, package = "h3sdm") my_records <- data.frame( x = c(-84.1, -84.2, -83.9, -84.0, -84.1), y = c(9.9, 10.1, 9.8, 9.95, 10.0), Especie = c("Ara macao", "Ara macao", "Pharomachrus mocinno", "Tapirus bairdii", "Ara macao"), Presencia = c(1, 1, 1, 1, 0) ) richness_hex <- h3sdm_count_from_records( records = my_records, aoi_sf = cr_outline_c, res = 7, lon_col = "x", lat_col = "y", species_col = "Especie", count_type = "richness", presence_col = "Presencia" )
Combines species presence–absence data with environmental predictors. It also calculates centroid coordinates (x and y) for each hexagon grid cell.
h3sdm_data(pa_sf, predictors_sf)h3sdm_data(pa_sf, predictors_sf)
pa_sf |
An |
predictors_sf |
An |
An sf object containing species presence–absence, environmental predictor variables,
and centroid coordinates for each hexagon cell.
## Not run: my_species_pa <- h3sdm_pa("Panthera onca", res = 6) my_predictors <- h3sdm_predictors(my_species_pa) combined_data <- h3sdm_data(my_species_pa, my_predictors) ## End(Not run)## Not run: my_species_pa <- h3sdm_pa("Panthera onca", res = 6) my_predictors <- h3sdm_predictors(my_species_pa) combined_data <- h3sdm_data(my_species_pa, my_predictors) ## End(Not run)
Computes a set of performance metrics for a single fitted species distribution model.
Includes standard yardstick metrics such as ROC AUC, accuracy, sensitivity,
specificity, F1-score, Kappa, as well as ecological metrics such as the
True Skill Statistic (TSS) and Boyce index.
This function is designed as a helper for evaluating models produced by
h3sdm_fit_model or h3sdm_fit_models.
h3sdm_eval_metrics( fitted_model, presence_data = NULL, truth_col = "presence", pred_col = ".pred_1" )h3sdm_eval_metrics( fitted_model, presence_data = NULL, truth_col = "presence", pred_col = ".pred_1" )
fitted_model |
A fitted model object, typically the output of |
presence_data |
Optional. An |
truth_col |
Character. Name of the column containing the true presence/absence values
(default |
pred_col |
Character. Name of the column containing predicted probabilities
(default |
This function centralizes model evaluation for a single fitted H3SDM model, combining both general classification metrics and ecological indices. It is especially useful for systematically comparing model performance across species or modeling approaches.
A tibble with one row per metric, containing:
Metric name (e.g., "roc_auc", "tss", "boyce").
Estimator type (usually "binary").
Metric value.
Standard error (NA for TSS and Boyce).
Lower bound of the 95% confidence interval (NA for TSS and Boyce).
Upper bound of the 95% confidence interval (NA for TSS and Boyce).
## Not run: # Assuming 'fitted' is the result of h3sdm_fit_model() metrics <- h3sdm_eval_metrics( fitted_model = fitted, presence_data = presence_sf, truth_col = "presence", pred_col = ".pred_1" ) print(metrics) ## End(Not run)## Not run: # Assuming 'fitted' is the result of h3sdm_fit_model() metrics <- h3sdm_eval_metrics( fitted_model = fitted, presence_data = presence_sf, truth_col = "presence", pred_col = ".pred_1" ) print(metrics) ## End(Not run)
Creates a DALEX explainer for a species distribution model fitted
with h3sdm_fit_model(). Prepares response and predictor variables,
ensuring that all columns used during model training (including h3_address
and coordinates) are included. The explainer can be used for feature
importance, model residuals, and other DALEX diagnostics.
h3sdm_explain(model, data, response = "presence", label = "h3sdm workflow")h3sdm_explain(model, data, response = "presence", label = "h3sdm workflow")
model |
A fitted workflow returned by |
data |
A |
response |
Character string specifying the name of the response column.
Must be a binary factor or numeric vector (0/1). Defaults to |
label |
Character string specifying a label for the explainer. Defaults
to |
An object of class explainer from the DALEX package, ready to be
used with feature_importance(), model_performance(), predict_parts(),
and other DALEX functions.
library(h3sdm) library(DALEX) library(parsnip) dat <- data.frame( x1 = rnorm(20), x2 = rnorm(20), presence = factor(sample(0:1, 20, replace = TRUE)) ) model <- logistic_reg() |> fit(presence ~ x1 + x2, data = dat) explainer <- h3sdm_explain(model, data = dat, response = "presence") feature_importance(explainer)library(h3sdm) library(DALEX) library(parsnip) dat <- data.frame( x1 = rnorm(20), x2 = rnorm(20), presence = factor(sample(0:1, 20, replace = TRUE)) ) model <- logistic_reg() |> fit(presence ~ x1 + x2, data = dat) explainer <- h3sdm_explain(model, data = dat, response = "presence") feature_importance(explainer)
Extracts and calculates the area proportion of each land-use/land-cover (LULC)
category found within each input polygon of the sf_hex_grid. This function
is tailored for categorical rasters and ensures accurate, sub-pixel weighted statistics.
h3sdm_extract_cat(spat_raster_cat, sf_hex_grid, proportion = TRUE)h3sdm_extract_cat(spat_raster_cat, sf_hex_grid, proportion = TRUE)
spat_raster_cat |
A single-layer |
sf_hex_grid |
An |
proportion |
Logical. If |
The function uses a custom function with exactextractr::exact_extract to
perform three critical steps:
Filtering NA/NaN: Raster cells with missing values (NA) are explicitly
excluded from the calculation, preventing the creation of a _prop_NaN column.
Area Consolidation: It sums the coverage fractions for all fragments belonging to the same category within the same hexagon, which is essential when polygons have been clipped or fragmented.
Numerical Ordering: The final columns are explicitly sorted based on the
numerical value of the category (e.g., _prop_70 appears before _prop_80)
to correct the default alphanumeric sorting behavior of tidyr::pivot_wider.
An sf object identical to sf_hex_grid, but with new columns
appended for each categorical value found in the raster. Column names follow the
pattern <layer_name>_prop_<category_value>. Columns are numerically ordered
by the category value.
library(sf) library(terra) # Create a simple categorical SpatRaster lulc <- terra::rast( nrows = 20, ncols = 20, xmin = -85.0, xmax = -83.0, ymin = 9.0, ymax = 11.0, crs = "EPSG:4326" ) terra::values(lulc) <- sample(1:4, terra::ncell(lulc), replace = TRUE) names(lulc) <- "landuse" # Define categorical levels explicitly levels(lulc) <- data.frame( value = 1:4, class = c("forest", "grassland", "urban", "water") ) # Create a simple hexagon grid as sf polygons (smaller than raster extent) hex_grid <- sf::st_make_grid( sf::st_as_sfc(sf::st_bbox(c( xmin = -84.5, xmax = -83.5, ymin = 9.5, ymax = 10.5 ), crs = sf::st_crs(4326))), n = c(3, 3), square = FALSE ) h7 <- sf::st_sf(h3_address = paste0("hex_", seq_along(hex_grid)), geometry = hex_grid) # Extract categorical raster values by hexagon lulc_p <- h3sdm_extract_cat(lulc, h7, proportion = TRUE) head(lulc_p)library(sf) library(terra) # Create a simple categorical SpatRaster lulc <- terra::rast( nrows = 20, ncols = 20, xmin = -85.0, xmax = -83.0, ymin = 9.0, ymax = 11.0, crs = "EPSG:4326" ) terra::values(lulc) <- sample(1:4, terra::ncell(lulc), replace = TRUE) names(lulc) <- "landuse" # Define categorical levels explicitly levels(lulc) <- data.frame( value = 1:4, class = c("forest", "grassland", "urban", "water") ) # Create a simple hexagon grid as sf polygons (smaller than raster extent) hex_grid <- sf::st_make_grid( sf::st_as_sfc(sf::st_bbox(c( xmin = -84.5, xmax = -83.5, ymin = 9.5, ymax = 10.5 ), crs = sf::st_crs(4326))), n = c(3, 3), square = FALSE ) h7 <- sf::st_sf(h3_address = paste0("hex_", seq_along(hex_grid)), geometry = hex_grid) # Extract categorical raster values by hexagon lulc_p <- h3sdm_extract_cat(lulc, h7, proportion = TRUE) head(lulc_p)
Calculates the area-weighted mean value for each layer in a numeric
SpatRaster (or single layer) within each polygon feature of an sf object.
This function is designed to efficiently summarize continuous environmental variables
(such as bioclimatic data) for predefined spatial units (e.g., H3 hexagons).
It utilizes exactextractr to ensure highly precise zonal statistics by
accounting for sub-pixel coverage fractions.
h3sdm_extract_num(spat_raster_multi, sf_hex_grid)h3sdm_extract_num(spat_raster_multi, sf_hex_grid)
spat_raster_multi |
A |
sf_hex_grid |
An |
The function relies on exactextractr::exact_extract with fun = "weighted_mean"
and weights = "area". This methodology is crucial for maintaining spatial
accuracy when polygons are irregular or small relative to the raster resolution.
A critical check (nrow match) is performed before binding columns to ensure
data integrity
and prevent misalignment errors.
An sf object identical to sf_hex_grid, but with new columns
appended. The new column names match the original SpatRaster layer names.
The values represent the area-weighted mean for that variable within each polygon.
library(sf) library(terra) # Create a SpatRaster stack with two numeric layers (e.g., bioclimatic variables) bio1 <- terra::rast( nrows = 10, ncols = 10, xmin = -84.5, xmax = -83.5, ymin = 9.5, ymax = 10.5, crs = "EPSG:4326" ) bio2 <- bio1 terra::values(bio1) <- runif(terra::ncell(bio1), 15, 30) terra::values(bio2) <- runif(terra::ncell(bio2), 500, 3000) names(bio1) <- "bio1_temp" names(bio2) <- "bio12_precip" bio <- c(bio1, bio2) # Create a simple hexagon grid as sf polygons hex_grid <- sf::st_make_grid( sf::st_as_sfc(sf::st_bbox(c( xmin = -84.5, xmax = -83.5, ymin = 9.5, ymax = 10.5 ), crs = sf::st_crs(4326))), n = c(3, 3), square = FALSE ) h7 <- sf::st_sf(h3_address = paste0("hex_", seq_along(hex_grid)), geometry = hex_grid) # Extract numeric raster values by hexagon (mean per cell) bio_p <- h3sdm_extract_num(bio, h7) head(bio_p)library(sf) library(terra) # Create a SpatRaster stack with two numeric layers (e.g., bioclimatic variables) bio1 <- terra::rast( nrows = 10, ncols = 10, xmin = -84.5, xmax = -83.5, ymin = 9.5, ymax = 10.5, crs = "EPSG:4326" ) bio2 <- bio1 terra::values(bio1) <- runif(terra::ncell(bio1), 15, 30) terra::values(bio2) <- runif(terra::ncell(bio2), 500, 3000) names(bio1) <- "bio1_temp" names(bio2) <- "bio12_precip" bio <- c(bio1, bio2) # Create a simple hexagon grid as sf polygons hex_grid <- sf::st_make_grid( sf::st_as_sfc(sf::st_bbox(c( xmin = -84.5, xmax = -83.5, ymin = 9.5, ymax = 10.5 ), crs = sf::st_crs(4326))), n = c(3, 3), square = FALSE ) h7 <- sf::st_sf(h3_address = paste0("hex_", seq_along(hex_grid)), geometry = hex_grid) # Extract numeric raster values by hexagon (mean per cell) bio_p <- h3sdm_extract_num(bio, h7) head(bio_p)
Fits a Species Distribution Model (SDM) workflow to resampling data (cross-validation). This function is the main training step and optionally configures the results to be used with the 'stacks' package. Supports both classification (presence/absence) and regression (count-based) models, detected automatically from the workflow mode.
h3sdm_fit_model( workflow, data_split, presence_data = NULL, truth_col = NULL, pred_col = NULL, for_stacking = FALSE, ... )h3sdm_fit_model( workflow, data_split, presence_data = NULL, truth_col = NULL, pred_col = NULL, for_stacking = FALSE, ... )
workflow |
A 'workflow' object from tidymodels (e.g., GAM or Random Forest). |
data_split |
An 'rsplit' or 'rset' object (e.g., result of vfold_cv or spatial_block_cv). |
presence_data |
(Optional) Original presence data (used for extended metrics). |
truth_col |
Column name of the response variable. Defaults to |
pred_col |
Column name for the prediction of the class of interest. Defaults to
|
for_stacking |
Logical. If |
... |
Arguments passed on to other functions (e.g., to |
A list with three elements:
cv_model: The result of fit_resamples().
final_model: The model fitted to the entire training set (first split).
metrics: Extended evaluation metrics (if presence_data is provided).
Fits one or more species distribution models using tidymodels workflows and a specified resampling scheme, then computes standard metrics (ROC AUC, accuracy, sensitivity, specificity, F1-score, Kappa) along with TSS (True Skill Statistic) and the Boyce index for model evaluation. Returns both the fitted models and a comparative metrics table.
h3sdm_fit_models( workflows, data_split, presence_data = NULL, truth_col = "presence", pred_col = ".pred_1" )h3sdm_fit_models( workflows, data_split, presence_data = NULL, truth_col = "presence", pred_col = ".pred_1" )
workflows |
A named list of tidymodels workflows created with |
data_split |
A resampling object (e.g., from |
presence_data |
An |
truth_col |
Character. Name of the column containing true presence/absence values (default |
pred_col |
Character. Name of the column containing predicted probabilities (default |
A list with two elements:
A list of fitted models returned by h3sdm_fit_model().
A tibble with one row per model per metric, including standard yardstick metrics, TSS, and Boyce index.
## Not run: # Example requires prepared recipes and resampling objects mod_log <- logistic_reg() %>% set_engine("glm") %>% set_mode("classification") mod_rf <- rand_forest() %>% set_engine("ranger") %>% set_mode("classification") workflows_list <- list( logistic = h3sdm_workflow(mod_log, my_recipe), rf = h3sdm_workflow(mod_rf, my_recipe) ) results <- h3sdm_fit_models( workflows = workflows_list, data_split = my_cv_folds, presence_data = presence_sf ) metrics_table <- results$metrics ## End(Not run)## Not run: # Example requires prepared recipes and resampling objects mod_log <- logistic_reg() %>% set_engine("glm") %>% set_mode("classification") mod_rf <- rand_forest() %>% set_engine("ranger") %>% set_mode("classification") workflows_list <- list( logistic = h3sdm_workflow(mod_log, my_recipe), rf = h3sdm_workflow(mod_rf, my_recipe) ) results <- h3sdm_fit_models( workflows = workflows_list, data_split = my_cv_folds, presence_data = presence_sf ) metrics_table <- results$metrics ## End(Not run)
Crea una cuadrícula de hexágonos H3 que cubre un área de interés (sf_object),
asegurando que las celdas se ajusten a la extensión del área y se recorten
opcionalmente al contorno del AOI.
Esta función es equivalente a la usada en los módulos de paisaje de h3sdm,
pero con el nombre estandarizado para mantener consistencia en el paquete.
h3sdm_get_grid(sf_object, res = 6, expand_factor = 0.1, clip_to_aoi = TRUE)h3sdm_get_grid(sf_object, res = 6, expand_factor = 0.1, clip_to_aoi = TRUE)
sf_object |
Objeto |
res |
Entero entre 1 y 16. Define la resolución del índice H3. Valores mayores producen hexágonos más pequeños. |
expand_factor |
Valor numérico que amplía ligeramente el bounding box
del AOI antes de generar los hexágonos. Por defecto |
clip_to_aoi |
Lógico ( |
Un objeto sf con los hexágonos H3 correspondientes al área de interés,
con geometrías válidas (MULTIPOLYGON).
## Not run: library(sf) # Crear un polígono de ejemplo cr <- st_as_sf(data.frame( lon = c(-85, -85, -83, -83, -85), lat = c(9, 11, 11, 9, 9) ), coords = c("lon", "lat"), crs = 4326) |> summarise(geometry = st_combine(geometry)) |> st_cast("POLYGON") # Generar cuadrícula H3 h5 <- h3sdm_get_grid(cr, res = 5) plot(st_geometry(h5)) ## End(Not run)## Not run: library(sf) # Crear un polígono de ejemplo cr <- st_as_sf(data.frame( lon = c(-85, -85, -83, -83, -85), lat = c(9, 11, 11, 9, 9) ), coords = c("lon", "lat"), crs = 4326) |> summarise(geometry = st_combine(geometry)) |> st_cast("POLYGON") # Generar cuadrícula H3 h5 <- h3sdm_get_grid(cr, res = 5) plot(st_geometry(h5)) ## End(Not run)
Downloads species occurrence records from providers (e.g., GBIF, iNaturalist,
BiodataCR) and filters them by the exact polygonal boundary of the Area of
Interest (AOI). Providers supported by spocc (e.g., "gbif",
"inat") are queried via spocc::occ(). "biodatacr" is
queried via rbiodatacr::bdcr_occurrences() and its output is
standardized to the same sf format.
h3sdm_get_records( species, aoi_sf, providers = NULL, limit = 500, remove_duplicates = FALSE, date = NULL )h3sdm_get_records( species, aoi_sf, providers = NULL, limit = 500, remove_duplicates = FALSE, date = NULL )
species |
Character string specifying the species name to query
(e.g., |
aoi_sf |
An |
providers |
Character vector of data providers to query.
Accepted values: any provider supported by |
limit |
Numeric. Maximum number of records to retrieve per provider. Default is 500. |
remove_duplicates |
Logical. If |
date |
Character vector specifying a date range
(e.g., |
When "biodatacr" is included in providers, the function calls
rbiodatacr::bdcr_occurrences() and standardizes its output
(decimalLatitude/decimalLongitude) to the same sf
geometry format used by the spocc providers. Records from all
providers are then combined and clipped to the AOI.
An sf object of points with the filtered occurrence records
whose geometry falls strictly within the aoi_sf boundary. If no
records are found, an empty sf object with the expected structure
is returned.
library(sf) aoi_sf <- sf::st_as_sf( data.frame( lon = c(-84.5, -83.5, -83.5, -84.5, -84.5), lat = c(9.5, 9.5, 10.5, 10.5, 9.5) ) |> {\(d) sf::st_sfc(sf::st_polygon(list(as.matrix(d))), crs = 4326)}(), data.frame(id = 1) ) # GBIF only records <- h3sdm_get_records( species = "Puma concolor", aoi_sf = aoi_sf, providers = "gbif", limit = 100 ) # GBIF + BiodataCR (Costa Rica) records_cr <- h3sdm_get_records( species = "Agalychnis callidryas", aoi_sf = aoi_sf, providers = c("gbif", "biodatacr"), limit = 200 )library(sf) aoi_sf <- sf::st_as_sf( data.frame( lon = c(-84.5, -83.5, -83.5, -84.5, -84.5), lat = c(9.5, 9.5, 10.5, 10.5, 9.5) ) |> {\(d) sf::st_sfc(sf::st_polygon(list(as.matrix(d))), crs = 4326)}(), data.frame(id = 1) ) # GBIF only records <- h3sdm_get_records( species = "Puma concolor", aoi_sf = aoi_sf, providers = "gbif", limit = 100 ) # GBIF + BiodataCR (Costa Rica) records_cr <- h3sdm_get_records( species = "Agalychnis callidryas", aoi_sf = aoi_sf, providers = c("gbif", "biodatacr"), limit = 200 )
This function downloads occurrence records for one or more species and counts the number of records falling inside each H3 hexagon covering the specified Area of Interest (AOI).
h3sdm_get_records_by_hexagon( species, aoi_sf, res = 6, providers = NULL, remove_duplicates = FALSE, date = NULL, expand_factor = 0.1, limit = 500 )h3sdm_get_records_by_hexagon( species, aoi_sf, res = 6, providers = NULL, remove_duplicates = FALSE, date = NULL, expand_factor = 0.1, limit = 500 )
species |
Character vector of species names to query (e.g., |
aoi_sf |
An |
res |
Numeric. H3 resolution level (default 6), determining hexagon size. |
providers |
Character vector of data providers (e.g., "gbif"). If |
remove_duplicates |
Logical. If |
date |
Character vector specifying a date range (e.g., |
expand_factor |
Numeric. Factor to expand the AOI bounding box before generating the H3 grid. Default is 0.1. |
limit |
Numeric. Maximum number of records to retrieve per species per provider. Default is 500. |
Download Species Records and Count Occurrences per H3 Hexagon
For each species:
An H3 grid is generated across the AOI using h3sdm_get_grid().
Occurrence records are downloaded using h3sdm_get_records().
Points are joined to the hexagonal grid with sf::st_join().
Counts of points per hexagon are calculated.
Counts are merged into the main hex grid.
The function ensures column names derived from species names are safe in R by replacing spaces with underscores and handles API failures gracefully.
An sf object containing the H3 hexagonal grid (MULTIPOLYGON) with
additional integer columns for each species (spaces replaced by underscores) showing
the count of occurrence records in each hexagon. Hexagons with no records have 0.
h3sdm_get_grid, h3sdm_get_records
library(sf) # Create a simple AOI polygon in Costa Rica aoi_sf <- sf::st_as_sf( data.frame(id = 1), geometry = sf::st_sfc( sf::st_polygon(list(matrix( c(-84.5, 9.5, -83.5, 9.5, -83.5, 10.5, -84.5, 10.5, -84.5, 9.5), ncol = 2, byrow = TRUE ))), crs = 4326 ) ) hex_counts <- h3sdm_get_records_by_hexagon( species = c("Agalychnis callidryas", "Smilisca baudinii"), aoi_sf = aoi_sf, res = 7, providers = "gbif", limit = 100 ) print(hex_counts)library(sf) # Create a simple AOI polygon in Costa Rica aoi_sf <- sf::st_as_sf( data.frame(id = 1), geometry = sf::st_sfc( sf::st_polygon(list(matrix( c(-84.5, 9.5, -83.5, 9.5, -83.5, 10.5, -84.5, 10.5, -84.5, 9.5), ncol = 2, byrow = TRUE ))), crs = 4326 ) ) hex_counts <- h3sdm_get_records_by_hexagon( species = c("Agalychnis callidryas", "Smilisca baudinii"), aoi_sf = aoi_sf, res = 7, providers = "gbif", limit = 100 ) print(hex_counts)
Combines presence hexagons with pseudo-absences sampled in environmental space. Pseudo-absences are selected by clustering the environmental conditions of hexagons without presence records using k-means, then choosing the hexagon closest to each cluster centroid. This ensures pseudo-absences cover the full range of environmental conditions available in the AOI, reducing bias from spatially clustered occurrence records.
h3sdm_pa(pres_sf, predictors_sf, n_pseudoabs = 500, buffer_k = 1L)h3sdm_pa(pres_sf, predictors_sf, n_pseudoabs = 500, buffer_k = 1L)
pres_sf |
|
predictors_sf |
|
n_pseudoabs |
|
buffer_k |
|
The function scales all numeric predictor columns before clustering. Non-numeric columns and columns with zero variance are excluded from clustering. Pseudo-absences are selected as the hexagon nearest to each k-means centroid in scaled environmental space (Euclidean distance).
This function is designed to be used after h3sdm_pres() and
h3sdm_predictors() in the following workflow:
pres <- h3sdm_pres("Species name", aoi_sf, res = 7)
num_vars <- h3sdm_extract_num(raster_stack, grid)
predictors <- h3sdm_predictors(num_vars)
pa <- h3sdm_pa(pres, predictors, n_pseudoabs = 500)
sf object with columns:
h3_address: H3 index of the hexagon.
presence: factor with levels "0" (pseudo-absence) and
"1" (presence).
geometry: MULTIPOLYGON of each hexagon.
## Not run: data(cr_outline_c, package = "h3sdm") pres <- h3sdm_pres("Agalychnis callidryas", cr_outline_c, res = 7) grid <- h3sdm_get_grid(cr_outline_c, res = 7) num_vars <- h3sdm_extract_num(bio, grid) predictors <- h3sdm_predictors(num_vars) pa <- h3sdm_pa(pres, predictors, n_pseudoabs = 300) ## End(Not run)## Not run: data(cr_outline_c, package = "h3sdm") pres <- h3sdm_pres("Agalychnis callidryas", cr_outline_c, res = 7) grid <- h3sdm_get_grid(cr_outline_c, res = 7) num_vars <- h3sdm_extract_num(bio, grid) predictors <- h3sdm_predictors(num_vars) pa <- h3sdm_pa(pres, predictors, n_pseudoabs = 300) ## End(Not run)
Adapts a user-provided dataset with presence records (from personal fieldwork,
BiodataCR, or any other source) into a hexagonal presence/pseudo-absence dataset
ready for analysis with h3sdm. The input can be a data.frame with coordinate
columns or an sf object. Coordinates are assumed to be in WGS84 (EPSG:4326).
h3sdm_pa_from_records( records, aoi_sf, res = 6, n_pseudoabs = 500, expand_factor = 0.1, lon_col = "lon", lat_col = "lat", species_col = NULL, predictors_sf = NULL, geospatial_filter = TRUE, buffer_k = 1L )h3sdm_pa_from_records( records, aoi_sf, res = 6, n_pseudoabs = 500, expand_factor = 0.1, lon_col = "lon", lat_col = "lat", species_col = NULL, predictors_sf = NULL, geospatial_filter = TRUE, buffer_k = 1L )
records |
|
aoi_sf |
|
res |
|
n_pseudoabs |
|
expand_factor |
|
lon_col |
|
lat_col |
|
species_col |
|
predictors_sf |
|
geospatial_filter |
|
buffer_k |
|
sf object with columns:
H3 index of the hexagon.
Factor with levels "0" (pseudo-absence) and "1" (presence).
Species name (only if species_col is provided).
MULTIPOLYGON of each hexagon.
data(cr_outline_c, package = "h3sdm") my_records <- data.frame( lon = c(-84.1, -84.2, -83.9), lat = c(9.9, 10.1, 9.8), species = "Agalychnis callidryas" ) dataset <- h3sdm_pa_from_records( records = my_records, aoi_sf = cr_outline_c, res = 7, n_pseudoabs = 100, lon_col = "lon", lat_col = "lat", species_col = "species" )data(cr_outline_c, package = "h3sdm") my_records <- data.frame( lon = c(-84.1, -84.2, -83.9), lat = c(9.9, 10.1, 9.8), species = "Agalychnis callidryas" ) dataset <- h3sdm_pa_from_records( records = my_records, aoi_sf = cr_outline_c, res = 7, n_pseudoabs = 100, lon_col = "lon", lat_col = "lat", species_col = "species" )
Uses a fitted tidymodels workflow (from h3sdm_fit_model or a standalone workflow)
to predict species presence probabilities or counts on a new spatial H3 grid.
Automatically generates centroid coordinates (x and y) if missing.
The new_data must contain the same predictor variables as used in model training.
Model mode (classification or regression) is detected automatically.
h3sdm_predict(fit_object, new_data)h3sdm_predict(fit_object, new_data)
fit_object |
A fitted |
new_data |
An |
An sf object with the original geometry and a new column prediction containing
the predicted probability of presence (classification) or predicted count (regression)
for each hexagon.
h3sdm_fit_model(), h3sdm_aoa()
## Not run: # Predict presence probabilities on a new hex grid predictions_sf <- h3sdm_predict( fit_object = fitted_model, new_data = grid_sf ) ## End(Not run)## Not run: # Predict presence probabilities on a new hex grid predictions_sf <- h3sdm_predict( fit_object = fitted_model, new_data = grid_sf ) ## End(Not run)
This function merges predictor variables from multiple sf objects
into a single sf object. It preserves the geometry from the first
input and joins columns from the other sf objects using a common
key (h3_address or ID).
h3sdm_predictors(...)h3sdm_predictors(...)
... |
Two or more |
The function uses a left join based on the h3_address column if present,
otherwise it falls back to ID. Geometries from the right-hand side sf
objects are dropped to avoid conflicts, and the final geometry is cast
to MULTIPOLYGON.
An sf object containing the geometry of the first input and
all predictor columns from all provided sf objects.
## Not run: # Combine sf objects with different predictor types into one combined <- h3sdm_predictors(num_sf, cat_sf, it_sf) head(combined) ## End(Not run)## Not run: # Combine sf objects with different predictor types into one combined <- h3sdm_predictors(num_sf, cat_sf, it_sf) head(combined) ## End(Not run)
Generates a hexagonal grid over the AOI, downloads species occurrence records,
and assigns them to hexagons. Returns only hexagons with at least one presence
record. This is the first step of a two-stage workflow where pseudo-absences
are generated later using h3sdm_pa() after environmental variables
have been extracted with h3sdm_extract_num() and related functions.
h3sdm_pres( species, aoi_sf, res = 6, providers = NULL, remove_duplicates = FALSE, date = NULL, limit = 500, expand_factor = 0.1 )h3sdm_pres( species, aoi_sf, res = 6, providers = NULL, remove_duplicates = FALSE, date = NULL, limit = 500, expand_factor = 0.1 )
species |
|
aoi_sf |
|
res |
|
providers |
|
remove_duplicates |
|
date |
|
limit |
|
expand_factor |
|
Unlike h3sdm_pa(), this function does not sample pseudo-absences.
It is intended to be used as the first step of a workflow where environmental
variables are extracted for the full hexagonal grid before pseudo-absences
are selected in environmental space using h3sdm_pa().
sf object with columns:
h3_address: H3 index of the hexagon.
presence: integer column with value 1 for all returned hexagons.
geometry: MULTIPOLYGON of each hexagon.
## Not run: data(cr_outline_c, package = "h3sdm") pres <- h3sdm_pres("Agalychnis callidryas", cr_outline_c, res = 7) ## End(Not run)## Not run: data(cr_outline_c, package = "h3sdm") pres <- h3sdm_pres("Agalychnis callidryas", cr_outline_c, res = 7) ## End(Not run)
Prepares an sf object with H3 hexagonal data for modeling with the
tidymodels ecosystem. Extracts centroid coordinates, assigns appropriate
roles to the variables automatically, and returns a ready-to-use recipe for
modeling species distributions.
h3sdm_recipe(data, response_col = "presence")h3sdm_recipe(data, response_col = "presence")
data |
An |
response_col |
|
This function prepares spatial H3 grid data for species distribution modeling:
Extracts centroid coordinates (x and y) from MULTIPOLYGON geometries.
Removes the geometry column to create a purely tabular dataset for tidymodels.
Assigns roles to columns:
response_col → "outcome" (target variable)
h3_address → "id" (used for joining predictions later)
x and y → "spatial_predictor"
All other columns are assigned "predictor" role.
A tidymodels recipe object (class "h3sdm_recipe") ready for modeling.
## Not run: # Presence/absence model (default) sdm_recipe <- h3sdm_recipe(combined_data) # Count-based model sdm_recipe <- h3sdm_recipe(combined_data, response_col = "count") ## End(Not run)## Not run: # Presence/absence model (default) sdm_recipe <- h3sdm_recipe(combined_data) # Count-based model sdm_recipe <- h3sdm_recipe(combined_data, response_col = "count") ## End(Not run)
This function prepares an sf object for use in a Species Distribution
Model (SDM) workflow with the 'mgcv' GAM engine within the 'tidymodels'
ecosystem. Extracts centroid coordinates and assigns appropriate roles to
all variables, including the response variable and spatial coordinates.
h3sdm_recipe_gam(data, response_col = "presence")h3sdm_recipe_gam(data, response_col = "presence")
data |
An |
response_col |
|
Assigned Roles:
outcome: the column specified in response_col.
id: "h3_address" (cell identifier, not used for modeling).
predictor: all other variables, including x and y
for the GAM spatial smooth term (s(x, y, bs = "tp")).
A recipe object of class h3sdm_recipe_gam,
ready to be chained with additional preprocessing steps.
Other h3sdm_tools:
h3sdm_stack_fit(),
h3sdm_workflow_gam()
library(sf) library(recipes) set.seed(42) n <- 20 pts <- sf::st_as_sf( data.frame( h3_address = paste0("hex_", seq_len(n)), presence = sample(0:1, n, replace = TRUE), count = sample(0:9, n, replace = TRUE), bio1_temp = runif(n, 15, 30), bio12_precip = runif(n, 500, 3000) ), geometry = sf::st_sfc( lapply(seq_len(n), function(i) { sf::st_point(c(runif(1, -84.5, -83.5), runif(1, 9.5, 10.5))) }), crs = 4326 ) ) # Presence/absence model (default) gam_rec <- h3sdm_recipe_gam(pts) # Count-based model gam_rec <- h3sdm_recipe_gam(pts, response_col = "count")library(sf) library(recipes) set.seed(42) n <- 20 pts <- sf::st_as_sf( data.frame( h3_address = paste0("hex_", seq_len(n)), presence = sample(0:1, n, replace = TRUE), count = sample(0:9, n, replace = TRUE), bio1_temp = runif(n, 15, 30), bio12_precip = runif(n, 500, 3000) ), geometry = sf::st_sfc( lapply(seq_len(n), function(i) { sf::st_point(c(runif(1, -84.5, -83.5), runif(1, 9.5, 10.5))) }), crs = 4326 ) ) # Presence/absence model (default) gam_rec <- h3sdm_recipe_gam(pts) # Count-based model gam_rec <- h3sdm_recipe_gam(pts, response_col = "count")
Generates a spatially aware cross-validation split for species distribution modeling using H3 hexagonal grids. This helps avoid inflated model performance estimates caused by spatial autocorrelation, producing more robust model evaluation.
h3sdm_spatial_cv(data, method = "block", v = 5, ...)h3sdm_spatial_cv(data, method = "block", v = 5, ...)
data |
An |
method |
Character. The spatial resampling method to use:
|
v |
Integer. Number of folds (default = 5). |
... |
Additional arguments passed to the underlying |
Spatial cross-validation avoids overly optimistic performance estimates by ensuring that training and testing data are spatially separated.
"block": Divides the spatial domain into contiguous blocks.
"cluster": Groups locations into spatial clusters before splitting.
An rsplit object (from rsample) representing the spatial CV folds.
## Not run: # Example: Create spatial cross-validation splits for H3 data # Block spatial CV with default folds spatial_cv_block <- h3sdm_spatial_cv(combined_data, method = "block") # Cluster spatial CV with 10 folds spatial_cv_cluster <- h3sdm_spatial_cv(combined_data, method = "cluster", v = 10) ## End(Not run)## Not run: # Example: Create spatial cross-validation splits for H3 data # Block spatial CV with default folds spatial_cv_block <- h3sdm_spatial_cv(combined_data, method = "block") # Cluster spatial CV with 10 folds spatial_cv_cluster <- h3sdm_spatial_cv(combined_data, method = "cluster", v = 10) ## End(Not run)
This function combines the process of creating the model stack, optimizing the
weights (blend_predictions), and fitting the base models to the complete
training set (fit_members()) into a single step.
Warning: It does not follow the canonical tidymodels flow but is convenient.
It requires that the fitting results were generated using h3sdm_fit_model(..., for_stacking = TRUE).
h3sdm_stack_fit(..., non_negative = TRUE, metric = NULL)h3sdm_stack_fit(..., non_negative = TRUE, metric = NULL)
... |
List objects that are the result of |
non_negative |
Logical. If |
metric |
The metric used to optimize the combination of weights. |
A list containing two elements: blended_model (the stack after blending)
and final_model (a fully fitted model_stack object).
The final_model is ready for direct prediction with predict().
Other h3sdm_tools:
h3sdm_recipe_gam(),
h3sdm_workflow_gam()
Combines a model specification and a prepared recipe into a single tidymodels
workflow. This workflow is suitable for species distribution modeling using H3
hexagonal grids and can be directly fitted or cross-validated.
h3sdm_workflow(model_spec, recipe)h3sdm_workflow(model_spec, recipe)
model_spec |
A |
recipe |
A |
The function creates a workflow that combines preprocessing and modeling
steps. This encapsulation allows consistent model training and evaluation
with tidymodels functions like fit() or fit_resamples(), and is
particularly useful when applying multiple models in parallel.
Choosing the model mode:
For presence/absence data: use set_mode("classification").
For count data (species richness, detections, individuals):
use set_mode("regression").
A workflow object ready to be used for model fitting with fit()
or cross-validation.
## Not run: library(parsnip) # --- Presence/absence model --- rf_spec_pa <- rand_forest() %>% set_engine("ranger") %>% set_mode("classification") rec_pa <- h3sdm_recipe(combined_data) wf_pa <- h3sdm_workflow(model_spec = rf_spec_pa, recipe = rec_pa) # --- Count-based model --- rf_spec_count <- rand_forest() %>% set_engine("ranger") %>% set_mode("regression") rec_count <- h3sdm_recipe(combined_data, response_col = "count") wf_count <- h3sdm_workflow(model_spec = rf_spec_count, recipe = rec_count) ## End(Not run)## Not run: library(parsnip) # --- Presence/absence model --- rf_spec_pa <- rand_forest() %>% set_engine("ranger") %>% set_mode("classification") rec_pa <- h3sdm_recipe(combined_data) wf_pa <- h3sdm_workflow(model_spec = rf_spec_pa, recipe = rec_pa) # --- Count-based model --- rf_spec_count <- rand_forest() %>% set_engine("ranger") %>% set_mode("regression") rec_count <- h3sdm_recipe(combined_data, response_col = "count") wf_count <- h3sdm_workflow(model_spec = rf_spec_count, recipe = rec_count) ## End(Not run)
This function constructs a workflow object by combining a GAM model
specification (gen_additive_mod with the mgcv engine) with either
a recipe object or an explicit model formula.
It is optimized for Species Distribution Models (SDM) that use smooth splines,
ensuring that the specialized GAM formula (containing s() terms) is
correctly passed to the model, even when a recipe is provided for general
data preprocessing.
h3sdm_workflow_gam(gam_spec, recipe = NULL, formula = NULL)h3sdm_workflow_gam(gam_spec, recipe = NULL, formula = NULL)
gam_spec |
A |
recipe |
(Optional) A |
formula |
(Optional) A |
Formula Priority:
If only recipe is provided, the workflow uses the
recipe's implicit formula (e.g., outcome ~ .).
If recipe and formula are provided, the workflow
uses the recipe for data preprocessing but explicitly passes the
formula to the mgcv engine for fitting, enabling the use
of specialized terms like s(x, y).
Choosing the model mode and family:
For presence/absence data: use set_mode("classification").
The mgcv engine uses binomial() family by default.
For count data (species richness, detections, individuals):
use set_mode("regression") and specify
set_engine("mgcv", family = poisson()).
A workflow object, ready for fitting with fit() or
resampling with fit_resamples() or tune_grid().
Other h3sdm_tools:
h3sdm_recipe_gam(),
h3sdm_stack_fit()
## Not run: library(parsnip) # --- Presence/absence model (binomial) --- gam_spec_pa <- gen_additive_mod() %>% set_engine("mgcv") %>% set_mode("classification") gam_formula_pa <- presence ~ s(bio1) + s(bio12) + s(x, y, bs = "tp") base_rec_pa <- h3sdm_recipe_gam(data) h3sdm_wf_pa <- h3sdm_workflow_gam( gam_spec = gam_spec_pa, recipe = base_rec_pa, formula = gam_formula_pa ) # --- Count-based model (Poisson) --- gam_spec_count <- gen_additive_mod() %>% set_engine("mgcv", family = poisson()) %>% set_mode("regression") gam_formula_count <- count ~ s(bio1) + s(bio12) + s(x, y, bs = "tp") base_rec_count <- h3sdm_recipe_gam(data, response_col = "count") h3sdm_wf_count <- h3sdm_workflow_gam( gam_spec = gam_spec_count, recipe = base_rec_count, formula = gam_formula_count ) ## End(Not run)## Not run: library(parsnip) # --- Presence/absence model (binomial) --- gam_spec_pa <- gen_additive_mod() %>% set_engine("mgcv") %>% set_mode("classification") gam_formula_pa <- presence ~ s(bio1) + s(bio12) + s(x, y, bs = "tp") base_rec_pa <- h3sdm_recipe_gam(data) h3sdm_wf_pa <- h3sdm_workflow_gam( gam_spec = gam_spec_pa, recipe = base_rec_pa, formula = gam_formula_pa ) # --- Count-based model (Poisson) --- gam_spec_count <- gen_additive_mod() %>% set_engine("mgcv", family = poisson()) %>% set_mode("regression") gam_formula_count <- count ~ s(bio1) + s(bio12) + s(x, y, bs = "tp") base_rec_count <- h3sdm_recipe_gam(data, response_col = "count") h3sdm_wf_count <- h3sdm_workflow_gam( gam_spec = gam_spec_count, recipe = base_rec_count, formula = gam_formula_count ) ## End(Not run)
Creates a list of tidymodels workflows from multiple model specifications and a prepared recipe. This is useful for comparing different modeling approaches in species distribution modeling using H3 hexagonal grids.
h3sdm_workflows(model_specs, recipe)h3sdm_workflows(model_specs, recipe)
model_specs |
A named list of |
recipe |
A |
This function automates the creation of workflows for multiple model specifications. Each workflow combines the same preprocessing steps (recipe) with a different modeling method, facilitating systematic comparison of models.
Choosing the model mode:
For presence/absence data: use set_mode("classification")
for all model specifications.
For count data (species richness, detections, individuals):
use set_mode("regression") for all model specifications.
A named list of workflow objects, one per model specification.
## Not run: library(parsnip) # --- Presence/absence models --- specs_pa <- list( rf = rand_forest() %>% set_engine("ranger") %>% set_mode("classification"), glm = logistic_reg() %>% set_engine("glm") %>% set_mode("classification") ) rec_pa <- h3sdm_recipe(combined_data) wfs_pa <- h3sdm_workflows(model_specs = specs_pa, recipe = rec_pa) # --- Count-based models --- specs_count <- list( rf = rand_forest() %>% set_engine("ranger") %>% set_mode("regression"), xgb = boost_tree() %>% set_engine("xgboost") %>% set_mode("regression") ) rec_count <- h3sdm_recipe(combined_data, response_col = "count") wfs_count <- h3sdm_workflows(model_specs = specs_count, recipe = rec_count) ## End(Not run)## Not run: library(parsnip) # --- Presence/absence models --- specs_pa <- list( rf = rand_forest() %>% set_engine("ranger") %>% set_mode("classification"), glm = logistic_reg() %>% set_engine("glm") %>% set_mode("classification") ) rec_pa <- h3sdm_recipe(combined_data) wfs_pa <- h3sdm_workflows(model_specs = specs_pa, recipe = rec_pa) # --- Count-based models --- specs_count <- list( rf = rand_forest() %>% set_engine("ranger") %>% set_mode("regression"), xgb = boost_tree() %>% set_engine("xgboost") %>% set_mode("regression") ) rec_count <- h3sdm_recipe(combined_data, response_col = "count") wfs_count <- h3sdm_workflows(model_specs = specs_count, recipe = rec_count) ## End(Not run)
A dataset containing presence and pseudo-absence records for the species Silverstoneia flotator in Costa Rica, generated using H3 hexagonal grids at resolution 7.
recordsrecords
An sf object with columns:
H3 index of the hexagon
factor with levels "0" (pseudo-absence) and "1" (presence)
MULTIPOLYGON of each hexagon
Generated using h3sdm_pa() with occurrence data from GBIF
(https://www.gbif.org).
data(records) head(records) table(records$presence)data(records) head(records) table(records$presence)