GSbench fits and compares genomic-selection models — GBLUP, ridge marker effects, and machine-learning methods — through one interface, using breeding-relevant cross-validation. This vignette walks through a full workflow.
For a reproducible example we simulate a population: 300 lines, 2000 SNPs, 50 QTL, narrow-sense heritability 0.5.
Filter low-call-rate, low-MAF and monomorphic markers, and impute the rest.
gblup() estimates variance components by REML. (Its
GEBVs match rrBLUP::mixed.solve to within 6e-5
— see the package tests.)
fit <- gblup(sim$pheno, geno)
fit
#> <gblup>
#> Vu = 50.27, Ve = 37.31, h2 = 0.574
#> 300 individuals; intercept = -0.4076; marker effects available
cor(fit$gebv, sim$bv) # accuracy against the true breeding values
#> [1] 0.6941135Because geno was supplied, the fit also carries ridge
marker effects, so it can predict new genotypes:
gs_fit() exposes the same API for every model family.
Which are available depends on which optional packages you have
installed:
gs_cv() runs breeding-relevant cross-validation. Random
k-fold predicts untested lines (CV1); leave_group_out holds
out whole families or environments.
cv <- gs_cv(sim$pheno, geno, models = "gblup", k = 5, reps = 1, seed = 1)
cv
#> <gs_cv: kfold>
#> 5-fold x 1 rep(s)
#> model mean sd n_folds
#> gblup 0.159 0.076 5
#> (accuracy = predictive ability, cor(pred, observed) on held-out data)
# leave-one-family-out, with a toy family structure
fam <- rep(1:6, length.out = nrow(geno))
gs_cv(sim$pheno, geno, models = "gblup",
scheme = "leave_group_out", groups = fam)
#> <gs_cv: leave_group_out>
#> model mean sd n_folds
#> gblup 0.196 0.166 6
#> (accuracy = predictive ability, cor(pred, observed) on held-out data)gs_ensemble() combines base models into a super-learner:
each model’s out-of-fold predictions are used to learn non-negative
weights (summing to one), and the final predictor is that weighted
blend.
With several base models installed, the weights show how much each contributes.
gs_benchmark() runs every available model — including
the ensemble — under one cross-validation and reports predictive ability
so you can compare them fairly.
bench <- gs_benchmark(sim$pheno, geno, models = c("gblup", "ensemble"),
k = 5, seed = 1)
bench$accuracy
#> model mean sd n_folds
#> 1 ensemble 0.2837168 0.04406776 5
#> 2 gblup 0.1590768 0.07638873 5For an additive trait like this one, GBLUP and elastic net are typically strongest; tree methods trail; the ensemble lands near the top without you having to pick the winner in advance.