
Multi-pedon workflow: fitting retention curves at scale
Source:vignettes/multi-pedon-workflow.Rmd
multi-pedon-workflow.RmdOverview
In landscape-scale soil hydrology studies, Van Genuchten parameters must be fitted independently at each sampling location (pedon). With hundreds or thousands of pedons, sequential NLS fitting becomes a throughput bottleneck. This vignette demonstrates:
- Constructing a realistic multi-pedon dataset from synthetic field observations.
- Running
fit_swrc()sequentially and in parallel withworkers = parallel::detectCores(logical = FALSE). - Mapping the fitted saturated water content (θ_s) and shape parameter (n) back to pedon locations.
Setup
library(tidysoilwater)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(ggplot2)1 Generate a synthetic pedon network
We simulate a grid of 200 pedons spread across a hypothetical study area. Each pedon is assigned one of four dominant soil textures (loam, sandy loam, clay loam, or silt loam) based on its position, mimicking the kind of spatial structure you would expect from a soil survey.
set.seed(2047)
# Van Genuchten parameters from Van Genuchten (1980, Table 1)
# Van Genuchten parameters for four textures with n > 1.5 — these are
# well-conditioned for NLS fitting. Soils with n close to 1 (e.g. clay,
# silty clay loam) have flat retention curves that make NLS poorly
# conditioned; a real workflow would require more robust starting values
# or a Bayesian approach for those classes.
vg_reference <- tribble(
~texture, ~theta_r, ~theta_s, ~alpha, ~n,
"sand", 0.045, 0.430, 0.145, 2.68,
"loamy sand", 0.057, 0.410, 0.124, 2.28,
"sandy loam", 0.065, 0.410, 0.075, 1.89,
"loam", 0.078, 0.430, 0.036, 1.56
)
# 200 pedons on a 20 × 10 spatial grid (arbitrary units)
pedons <- expand.grid(x = seq(1, 20), y = seq(1, 10)) |>
as_tibble() |>
mutate(
pedon_id = row_number(),
# Texture follows a simple catena: coarser soils in the west,
# finer (loam) in the east — a typical hillslope pattern
texture = case_when(
x <= 5 ~ "sand",
x <= 10 ~ "loamy sand",
x <= 15 ~ "sandy loam",
TRUE ~ "loam"
)
) |>
left_join(vg_reference, by = "texture")
glimpse(pedons)
#> Rows: 200
#> Columns: 8
#> $ x <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
#> $ y <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2…
#> $ pedon_id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
#> $ texture <chr> "sand", "sand", "sand", "sand", "sand", "loamy sand", "loamy …
#> $ theta_r <dbl> 0.045, 0.045, 0.045, 0.045, 0.045, 0.057, 0.057, 0.057, 0.057…
#> $ theta_s <dbl> 0.43, 0.43, 0.43, 0.43, 0.43, 0.41, 0.41, 0.41, 0.41, 0.41, 0…
#> $ alpha <dbl> 0.145, 0.145, 0.145, 0.145, 0.145, 0.124, 0.124, 0.124, 0.124…
#> $ n <dbl> 2.68, 2.68, 2.68, 2.68, 2.68, 2.28, 2.28, 2.28, 2.28, 2.28, 1…2 Simulate field observations
Laboratory or field tensiometer measurements typically cover 8–12 pressure heads from saturation (h = 0) to permanent wilting point (h = 15 000 cm). We generate synthetic θ(h) observations by evaluating the VG model at each pedon’s true parameters, then adding realistic measurement noise (σ = 0.005 m³/m³).
h_obs <- c(0, 1, 5, 10, 50, 100, 300, 500, 1000, 3000, 5000, 15000)
# Each row of pedons becomes a set of (h, theta) observations
obs_data <- pedons |>
rowwise() |>
mutate(
# Slight spatial heterogeneity: jitter parameters within ±10 %
alpha_loc = alpha * runif(1, 0.90, 1.10),
n_loc = max(1.05, n * runif(1, 0.90, 1.10)),
theta_r_loc = theta_r * runif(1, 0.90, 1.10),
theta_s_loc = theta_s * runif(1, 0.90, 1.10),
obs = list({
theta_true <- swrc_van_genuchten(
tibble(h = h_obs),
theta_r = theta_r_loc, theta_s = theta_s_loc,
alpha = alpha_loc, n = n_loc,
h = h
)$.theta
tibble(
h = h_obs,
theta = pmax(0, theta_true + rnorm(length(h_obs), sd = 0.005))
)
})
) |>
tidyr::unnest(obs) |>
ungroup()
cat(sprintf("%d pedons × %d pressure heads = %d observations total\n",
n_distinct(obs_data$pedon_id),
length(h_obs),
nrow(obs_data)))
#> 200 pedons × 12 pressure heads = 2400 observations totalSpot-check two pedons from contrasting texture zones:
filter(obs_data, pedon_id %in% c(3, 178)) |>
mutate(label = paste0("Pedon ", pedon_id, " (", texture, ")")) |>
ggplot(aes(h + 0.1, theta)) +
geom_point(size = 2) +
scale_x_log10(labels = scales::label_log()) +
facet_wrap(~label) +
labs(
x = "Pressure head h (cm, log scale)",
y = expression(theta~(m^3/m^3)),
title = "Synthetic retention observations — two example pedons"
)
3 Fit Van Genuchten parameters per pedon
3.1 Sequential baseline
t_seq <- system.time(
fits_seq <- suppressWarnings(
obs_data |>
group_by(pedon_id, x, y, texture) |>
fit_swrc(theta_col = theta, h_col = h, workers = 1L)
)
)["elapsed"]
cat(sprintf("Sequential: %.2f s for %d pedons (%.1f ms / pedon)\n",
t_seq, nrow(fits_seq), t_seq / nrow(fits_seq) * 1000))
#> Sequential: 0.86 s for 200 pedons (4.3 ms / pedon)
cat(sprintf("Converged: %d / %d (%.0f %%)\n",
sum(fits_seq$convergence),
nrow(fits_seq),
mean(fits_seq$convergence) * 100))
#> Converged: 200 / 200 (100 %)3.2 Parallel fit with all physical cores
# R CMD check restricts parallel forking; cap to 2 in that environment
in_check <- isTRUE(as.logical(Sys.getenv("_R_CHECK_LIMIT_CORES_", "FALSE")))
n_cores <- if (in_check) 2L else parallel::detectCores(logical = FALSE)
cat(sprintf("Workers used: %d\n", n_cores))
#> Workers used: 4
t_par <- system.time(
fits_par <- suppressWarnings(
obs_data |>
group_by(pedon_id, x, y, texture) |>
fit_swrc(theta_col = theta, h_col = h, workers = n_cores)
)
)["elapsed"]
cat(sprintf("Parallel (%d workers): %.2f s — speedup %.1fx\n",
n_cores, t_par, t_seq / t_par))
#> Parallel (4 workers): 0.54 s — speedup 1.6x3.3 Verify consistency
The sequential and parallel paths run identical NLS fits; parameter estimates must match to machine precision.
fits_seq_ord <- arrange(fits_seq, pedon_id)
fits_par_ord <- arrange(fits_par, pedon_id)
max_diff_theta_s <- max(abs(fits_seq_ord$theta_s - fits_par_ord$theta_s),
na.rm = TRUE)
max_diff_alpha <- max(abs(fits_seq_ord$alpha - fits_par_ord$alpha),
na.rm = TRUE)
cat(sprintf("Max |Δθ_s| seq vs par: %.2e\n", max_diff_theta_s))
#> Max |Δθ_s| seq vs par: 0.00e+00
cat(sprintf("Max |Δα| seq vs par: %.2e\n", max_diff_alpha))
#> Max |Δα| seq vs par: 0.00e+00
stopifnot(max_diff_theta_s < 1e-8, max_diff_alpha < 1e-8)4 Map fitted parameters
With 200 pedons on a regular grid, we can visualise how the fitted θ_s and shape parameter n vary across the landscape.
fits_seq |>
filter(convergence) |>
ggplot(aes(x, y, fill = theta_s)) +
geom_tile(width = 0.95, height = 0.95) +
scale_fill_distiller(palette = "Blues", direction = 1,
name = expression(theta[s]~(m^3/m^3))) +
coord_equal() +
labs(
title = "Fitted saturated water content (θ_s) across the study area",
x = "Easting (arbitrary)", y = "Northing (arbitrary)"
) +
theme_minimal()
fits_seq |>
filter(convergence) |>
ggplot(aes(x, y, fill = n)) +
geom_tile(width = 0.95, height = 0.95) +
scale_fill_distiller(palette = "RdYlBu", direction = 1,
name = "n (−)") +
coord_equal() +
labs(
title = "Fitted Van Genuchten shape parameter n",
subtitle = "Sandy soils (left) have higher n than clay soils (centre-right)",
x = "Easting (arbitrary)", y = "Northing (arbitrary)"
) +
theme_minimal()
5 Spatial integration with sf (optional)
When pedon coordinates are in a real CRS, the fitted parameters can
be attached directly to an sf point layer for further
geostatistical analysis or export as GeoPackage / Shapefile.
pedons_sf <- fits_seq |>
filter(convergence) |>
sf::st_as_sf(coords = c("x", "y"), crs = NA_integer_)
# Inspect: each feature carries fitted VG parameters
print(pedons_sf[1:4, c("texture", "theta_r", "theta_s", "alpha", "n")])
#> Simple feature collection with 4 features and 5 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 1 ymin: 1 xmax: 4 ymax: 1
#> CRS: NA
#> # A tibble: 4 × 6
#> texture theta_r theta_s alpha n geometry
#> <chr> <dbl> <dbl> <dbl> <dbl> <POINT>
#> 1 sand 0.0471 0.429 0.157 2.52 (1 1)
#> 2 sand 0.0431 0.393 0.148 2.49 (2 1)
#> 3 sand 0.0450 0.456 0.142 2.71 (3 1)
#> 4 sand 0.0446 0.445 0.143 2.83 (4 1)With a real CRS you can then write:
sf::st_write(pedons_sf, "fitted_vg_params.gpkg")6 Real-world data sources
In practice, the obs_data tibble would be built from one
of these sources:
| Source | Format | Typical h range | θ unit |
|---|---|---|---|
| Hypres / UNSODA | CSV | 0–15 000 cm | m³/m³ |
| SoilGrids point extraction | JSON / raster | — | volumetric fraction |
| Field tensiometer loggers | CSV / NetCDF | 0–800 hPa | m³/m³ or % |
| Laboratory pressure-plate | CSV | 0–15 000 cm | m³/m³ |
Convert pressure in hPa to cm water head with the approximation 1 hPa ≈ 1.02 cm H₂O.
Summary
| Step | Code |
|---|---|
| Generate observations | swrc_van_genuchten() per pedon |
| Sequential fit (baseline) | group_by(pedon_id) |> fit_swrc(workers = 1) |
| Parallel fit | fit_swrc(workers = parallel::detectCores(logical = FALSE)) |
| Attach to spatial layer | sf::st_as_sf(fits, coords = c(‘x’,‘y’), crs = …) |
| Export | sf::st_write(pedons_sf, ‘fitted_vg.gpkg’) |