The .632 estimator of the log loss error rate is calculated for a given classifier. The .632+ estimator is an extension that corrects for the optimism of the apparent error under overfitting, and is computed by default.
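For reference, the two estimators take the following form in the notation of Efron and Tibshirani (1997), where \overline{err} is the apparent (training) error, \widehat{Err}^{(1)} is the leave-one-out bootstrap error, \hat{\gamma} is the no-information error rate, and \hat{R} is the relative overfitting rate:

\widehat{Err}^{(.632)} = 0.368\,\overline{err} + 0.632\,\widehat{Err}^{(1)}

\hat{R} = \frac{\widehat{Err}^{(1)} - \overline{err}}{\hat{\gamma} - \overline{err}}, \qquad \hat{w} = \frac{0.632}{1 - 0.368\,\hat{R}}, \qquad \widehat{Err}^{(.632+)} = (1 - \hat{w})\,\overline{err} + \hat{w}\,\widehat{Err}^{(1)}

As overfitting increases (\hat{R} approaches 1), the weight \hat{w} grows from 0.632 toward 1, pulling the .632+ estimate toward the out-of-bag error.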
Arguments
- data
data frame with rows as samples, columns as features
- class
true/reference class vector used for supervised learning
- algorithm
character string for classifier. See splendid for possible options.
- pred
vector of OOB predictions using the same classifier as algorithm.
- test.id
vector of test set indices for each bootstrap replicate
- train.id
vector of training set indices for each bootstrap replicate
- plus
logical; if TRUE (default), the .632+ estimator is calculated. Otherwise, the .632 estimator is calculated.
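To make the interplay of pred, test.id, and train.id concrete, here is a minimal hand-rolled sketch of the .632 combination. It is not the package's implementation: it substitutes misclassification error for log loss, and it assumes pred is a list of class-label vectors (one per bootstrap replicate, aligned with test.id) along with a hypothetical full_pred vector of predicted classes for all samples, used for the apparent error.

# Hypothetical sketch of the .632 combination, not splendid's code
misclass <- function(truth, predicted) mean(truth != predicted)

# Apparent error: the classifier fit to, and evaluated on, all samples;
# `full_pred` is an assumed vector of predicted classes for every row
err_apparent <- misclass(class, full_pred)

# Leave-one-out bootstrap error: average error over the out-of-bag
# samples of each replicate, using `pred` and `test.id` as documented
err_oob <- mean(purrr::map2_dbl(pred, test.id, ~ misclass(class[.y], .x)))

# .632 estimate: a fixed-weight compromise between the optimistic
# apparent error and the pessimistic out-of-bag error
err_632 <- 0.368 * err_apparent + 0.632 * err_oob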
References
Friedman, Jerome, Trevor Hastie, and Robert Tibshirani (2001). The Elements of Statistical Learning. Vol. 1. New York: Springer Series in Statistics.
Efron, Bradley, and Robert Tibshirani (1997). "Improvements on Cross-Validation: The .632+ Bootstrap Method." Journal of the American Statistical Association, 92(438), 548-560.
Examples
if (FALSE) { # \dontrun{
data(hgsc)
class <- as.factor(attr(hgsc, "class.true"))
set.seed(1)

# generate 5 bootstrap training sets and their out-of-bag test sets
train.id <- boot_train(data = hgsc, class = class, n = 5)
test.id <- boot_test(train.id = train.id)

# fit an xgboost classifier on each bootstrap training set
mod <- purrr::map(train.id, ~ classification(hgsc[., ], class[.], "xgboost"))

# predict on the out-of-bag samples of each replicate
pred <- purrr::pmap(list(mod = mod, test.id = test.id, train.id = train.id),
                    prediction, data = hgsc, class = class)

# .632 and .632+ error estimates
error_632(hgsc, class, "xgboost", pred, test.id, train.id, plus = FALSE)
error_632(hgsc, class, "xgboost", pred, test.id, train.id, plus = TRUE)
} # }