Pedotransfer functions: estimating VG parameters from soil texture • tidysoilwater

library(tidysoilwater)
library(dplyr)
library(tibble)
library(ggplot2)

What are pedotransfer functions?

Pedotransfer functions (PTFs) are empirical regression equations that estimate soil hydraulic parameters from readily available soil properties: sand/silt/clay fractions, organic matter content, and bulk density. They allow hydraulic characterisation of soils where no retention measurements exist — the common case in most landscape-scale studies.

This vignette implements the Wösten et al. (1999) continuous PTFs for the Van Genuchten parameter set (θ_r, θ_s, α, n, K_sat) using European topsoil data.

Wösten, J. H. M., Lilly, A., Nemes, A., & Le Bas, C. (1999). Development and use of a database of hydraulic properties of European soils. Geoderma, 90(3–4), 169–185. https://doi.org/10.1016/S0016-7061(98)00132-3

1. The Wösten (1999) PTF equations

These PTFs use sand (%), silt (%), clay (%), organic matter (% OM), and bulk density (g/cm³, BD) as predictors.

#' Apply Wösten (1999) continuous PTFs to estimate Van Genuchten parameters.
#'
#' @param sand    Sand content (%)
#' @param silt    Silt content (%)
#' @param clay    Clay content (%)
#' @param om      Organic matter content (%). If unknown, use ~1.0 for
#'                mineral topsoils and ~0.5 for subsoils.
#' @param bd      Bulk density (g/cm³). Typical range 1.0–1.6.
#' @param topsoil Logical; TRUE for A-horizon, FALSE for B/C horizons.
#' @return Tibble with one row of estimated VG parameters.
wosten_ptf <- function(sand, silt, clay, om, bd, topsoil = TRUE) {
  # Notation follows Wösten (1999) Table 4 (continuous PTF)
  ts <- as.integer(topsoil)

  # Saturated water content (cm³/cm³)
  theta_s <- 0.7919 + 0.001691 * clay - 0.29619 * bd -
             0.000001491 * silt^2 + 0.0000821 * om^2 +
             0.02427 / clay + 0.01113 / silt + 0.01472 * log(silt) -
             0.0000733 * om * clay - 0.000619 * bd * clay -
             0.001183 * bd * om - 0.0001664 * topsoil * silt

  # Log alpha (alpha in 1/cm)
  log_alpha <- -14.96 + 0.03135 * clay + 0.0351 * silt +
               0.646 * om + 15.29 * bd - 0.192 * ts -
               4.671 * bd^2 - 0.000781 * clay^2 - 0.00957 * om^2 +
               0.00000235 * silt^2 - 0.221 * om / clay +
               0.02264 * bd * clay + 0.00036 * silt * om -
               0.0791 * bd / om - 0.04991 * silt * bd +
               0.0009936 * silt * om - 0.0002698 * sand * clay -
               0.07215 * log(om)
  alpha <- exp(log_alpha)

  # Log n
  log_n <- -25.23 - 0.02195 * clay + 0.0074 * silt -
           0.1794 * om + 45.5 * bd - 7.24 * bd^2 +
           0.0003658 * clay^2 + 0.002885 * om^2 -
           12.81 / bd - 0.1524 / silt - 0.01958 / om -
           0.2876 * log(silt) - 0.0709 * log(om) -
           44.6 * log(bd) - 0.02264 * bd * clay +
           0.00896 * silt * om + 0.00718 * sand * om
  n <- exp(log_n)

  # Log K_sat (cm/day)
  log_ks <- 7.341 + 0.092 * clay - 0.0031 * silt^2 +
            0.0003881 * clay^2 - 0.3899 * bd -
            0.00251 * silt * clay + 0.0003048 * silt^2 -
            0.001 * clay^2 + 0.0023 * silt - 0.04673 * om +
            0.01559 * ts + 0.00959 * log(silt) - 0.08741 * log(om)
  ks <- exp(log_ks)

  tibble::tibble(
    theta_r = 0,          # Wösten (1999) sets theta_r = 0 for continuous PTF
    theta_s = theta_s,
    alpha   = alpha,
    n       = n,
    ks      = ks
  )
}

Note: The Wösten (1999) continuous PTF sets θ_r = 0, which is appropriate for the equation set but differs from the full Van Genuchten formulation. Use it as-is for comparative purposes; for fitting, a small positive θ_r can be assumed (e.g. 0.02–0.05 for mineral soils).

2. Apply PTF to a soil survey dataset

# Representative soil survey data for five profiles
survey <- tribble(
  ~site,             ~sand, ~silt, ~clay, ~om,  ~bd,
  "Sandy ridge",       78,    12,    10,  1.5,  1.55,
  "Sandy loam slope",  62,    22,    16,  2.1,  1.48,
  "Loam terrace",      42,    38,    20,  2.8,  1.40,
  "Clay loam flat",    28,    38,    34,  3.2,  1.30,
  "Heavy clay valley", 12,    28,    60,  4.5,  1.18
)

# Apply PTF row by row (small dataset — purrr approach)
ptf_params <- purrr::pmap_dfr(
  survey,
  function(site, sand, silt, clay, om, bd) {
    ptf <- wosten_ptf(sand, silt, clay, om, bd, topsoil = TRUE)
    bind_cols(tibble(site = site), ptf)
  }
) |>
  # The VG model requires n > 1. The Wösten (1999) PTF can predict n ≤ 1 for
  # heavy clay soils; clamp to 1.001 as a practical lower bound and flag.
  mutate(
    n_raw = n,
    n     = pmax(n, 1.001),
    ptf_n_clamped = n_raw < 1.001
  )

# Report any clamped rows
if (any(ptf_params$ptf_n_clamped)) {
  message(sprintf(
    "Note: n clamped to 1.001 for %d site(s) where PTF predicted n ≤ 1:\n  %s",
    sum(ptf_params$ptf_n_clamped),
    paste(ptf_params$site[ptf_params$ptf_n_clamped], collapse = ", ")
  ))
}
#> Note: n clamped to 1.001 for 5 site(s) where PTF predicted n ≤ 1:
#>   Sandy ridge, Sandy loam slope, Loam terrace, Clay loam flat, Heavy clay valley

ptf_params <- ptf_params |> select(-n_raw, -ptf_n_clamped)
ptf_params
#> # A tibble: 5 × 6
#>   site              theta_r theta_s alpha     n     ks
#>   <chr>               <dbl>   <dbl> <dbl> <dbl>  <dbl>
#> 1 Sandy ridge             0   0.374 0.138  1.00 947.  
#> 2 Sandy loam slope        0   0.403 0.166  1.00 322.  
#> 3 Loam terrace            0   0.432 0.171  1.00  10.6 
#> 4 Clay loam flat          0   0.471 0.248  1.00   6.44
#> 5 Heavy clay valley       0   0.520 0.346  1.00  35.2

3. Generate SWRC curves from PTF parameters

heads <- 10^seq(0, 5, by = 0.1)

ptf_curves <- ptf_params |>
  tidyr::crossing(h = heads) |>
  mutate(m = 1 - 1 / n) |>
  swrc_van_genuchten(
    theta_r = theta_r,
    theta_s = theta_s,
    alpha   = alpha,
    n       = n,
    h       = h
  )

ggplot(ptf_curves, aes(x = h, y = .theta, colour = site)) +
  geom_line(linewidth = 0.9) +
  scale_x_log10(labels = scales::label_comma()) +
  labs(
    x      = "Matric potential |h| (cm)",
    y      = expression(theta~(m^3/m^3)),
    colour = "Site",
    title  = "PTF-estimated soil water retention curves",
    caption = "Wösten (1999) continuous PTF"
  )

4. Derive PAWC from PTF parameters

pawc_ptf <- ptf_params |>
  tidyr::crossing(h = c(330, 15000)) |>
  mutate(m = 1 - 1 / n) |>
  swrc_van_genuchten(theta_r = theta_r, theta_s = theta_s,
                     alpha = alpha, n = n, h = h) |>
  mutate(label = if_else(h == 330, "theta_fc", "theta_wp")) |>
  select(site, label, .theta) |>
  tidyr::pivot_wider(names_from = label, values_from = .theta) |>
  mutate(pawc = theta_fc - theta_wp)

pawc_ptf |>
  select(site, theta_fc, theta_wp, pawc) |>
  arrange(desc(pawc))
#> # A tibble: 5 × 4
#>   site              theta_fc theta_wp    pawc
#>   <chr>                <dbl>    <dbl>   <dbl>
#> 1 Heavy clay valley    0.517    0.515 0.00197
#> 2 Clay loam flat       0.469    0.467 0.00178
#> 3 Loam terrace         0.430    0.429 0.00163
#> 4 Sandy loam slope     0.402    0.400 0.00152
#> 5 Sandy ridge          0.373    0.371 0.00141

ggplot(pawc_ptf,
       aes(x = reorder(site, pawc), y = pawc)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(
    x = NULL,
    y = "PAWC (m³/m³)",
    title = "Plant-available water capacity from PTF estimates"
  )

5. PTF vs fitted: how well do they compare?

Where measured (h, θ) data exist, it is worth comparing PTF estimates to fitted parameters.

set.seed(3109)

# Simulate lab-measured retention for the loam terrace (site 3)
loam_true <- ptf_params |> filter(site == "Loam terrace")
lab_heads  <- c(0, 10, 33, 100, 330, 1000, 5000, 15000)
lab_obs <- tibble(
  h     = lab_heads,
  theta = with(loam_true,
    theta_r + (theta_s - theta_r) /
      (1 + (alpha * lab_heads)^n)^(1 - 1/n)
  ) + rnorm(length(lab_heads), 0, 0.008)
)

# Fit VG to the simulated lab data
fitted <- fit_swrc(lab_obs, theta_col = theta, h_col = h)
#> Warning in stats::nls(formula = vg_formula, data = fit_df, start =
#> as.list(start_all), : Convergence failure: singular convergence (7)

comparison <- bind_rows(
  mutate(loam_true |> select(theta_r, theta_s, alpha, n), source = "PTF"),
  mutate(fitted     |> select(theta_r, theta_s, alpha, n), source = "Fitted")
) |>
  filter(!is.na(theta_r), theta_s > theta_r, n > 1)

comparison
#> # A tibble: 2 × 5
#>   theta_r theta_s   alpha     n source
#>     <dbl>   <dbl>   <dbl> <dbl> <chr> 
#> 1   0       0.432 0.171    1.00 PTF   
#> 2   0.431   0.440 0.00309 17.5  Fitted

curves_comparison <- comparison |>
  tidyr::crossing(h = heads) |>
  mutate(m = 1 - 1 / n) |>
  swrc_van_genuchten(theta_r = theta_r, theta_s = theta_s,
                     alpha = alpha, n = n, h = h)

ggplot() +
  geom_line(data  = curves_comparison,
            aes(x = h, y = .theta, colour = source, linetype = source),
            linewidth = 0.9) +
  geom_point(data = lab_obs, aes(x = h, y = theta), size = 2.5) +
  scale_x_log10(labels = scales::label_comma()) +
  labs(
    x      = "Matric potential |h| (cm)",
    y      = expression(theta~(m^3/m^3)),
    colour = "Source", linetype = "Source",
    title  = "Loam terrace: PTF estimate vs NLS fit to lab data",
    caption = "Points = simulated lab observations"
  )
#> Warning in scale_x_log10(labels = scales::label_comma()): log-10
#> transformation introduced infinite values.

6. Real-world PTF data sources

Source	Parameters	Coverage	Notes
Wösten et al. (1999)	θ_s, α, n, K_sat	European soils	Used in this vignette
Rawls & Brakensiek (1985)	θ_r, θ_s, α, n	US soils	Class-mean or continuous
Schaap et al. (2001) — Rosetta	θ_r, θ_s, α, n, K_sat	Global	Neural network; `rosettaAPI` R package
SoilGrids + Cosby (1984)	B-C parameters	Global 250m	Via `soilDB` R package

For most applications, Rosetta 3 (Schaap 2001, updated Zhang & Schaap 2017) provides the best accuracy for global use. The rosettaAPI package provides access to the web API.

Summary

The PTF workflow integrates seamlessly with tidysoilwater:

soil_data |>
  mutate(wosten_ptf(sand, silt, clay, om, bd)) |>   # add PTF columns
  crossing(h = heads) |>
  mutate(m = 1 - 1 / n) |>
  swrc_van_genuchten(theta_r, theta_s, alpha, n, h)  # evaluate curve

PTF-derived parameters are best treated as priors — useful for rapid assessment but best validated against local laboratory data when available.