The .632 estimator of the log loss error rate is calculated for a given classifier. The .632+ estimator is an extension that corrects for the optimism of the apparent error under overfitting, and is computed by default.
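For reference, the two estimators take the following form in the notation of Efron and Tibshirani (1997), where \overline{err} is the apparent (training) error, \widehat{Err}^{(1)} is the leave-one-out bootstrap error, \hat{\gamma} is the no-information error rate, and \hat{R} is the relative overfitting rate:

\widehat{Err}^{(.632)} = 0.368\,\overline{err} + 0.632\,\widehat{Err}^{(1)}

\hat{R} = \frac{\widehat{Err}^{(1)} - \overline{err}}{\hat{\gamma} - \overline{err}}, \qquad \hat{w} = \frac{0.632}{1 - 0.368\,\hat{R}}, \qquad \widehat{Err}^{(.632+)} = (1 - \hat{w})\,\overline{err} + \hat{w}\,\widehat{Err}^{(1)}

As overfitting increases (\hat{R} approaches 1), the weight \hat{w} grows from 0.632 toward 1, pulling the .632+ estimate toward the out-of-bag error.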
Arguments
- data
data frame with rows as samples, columns as features
- class
true/reference class vector used for supervised learning
- algorithm
character string for classifier. See splendid for possible options.
- pred
vector of OOB predictions using the same classifier as algorithm.
- test.id
vector of test set indices for each bootstrap replicate
- train.id
vector of training set indices for each bootstrap replicate
- plus
logical; if TRUE (default), the .632+ estimator is calculated. Otherwise, the .632 estimator is calculated.
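To make the interplay of pred, test.id, and train.id concrete, here is a minimal hand-rolled sketch of the .632 combination. It is not the package's implementation: it substitutes misclassification error for log loss, and it assumes pred is a list of class-label vectors (one per bootstrap replicate, aligned with test.id) along with a hypothetical full_pred vector of predicted classes for all samples, used for the apparent error.

# Hypothetical sketch of the .632 combination, not splendid's code
misclass <- function(truth, predicted) mean(truth != predicted)

# Apparent error: the classifier fit to, and evaluated on, all samples;
# `full_pred` is an assumed vector of predicted classes for every row
err_apparent <- misclass(class, full_pred)

# Leave-one-out bootstrap error: average error over the out-of-bag
# samples of each replicate, using `pred` and `test.id` as documented
err_oob <- mean(purrr::map2_dbl(pred, test.id, ~ misclass(class[.y], .x)))

# .632 estimate: a fixed-weight compromise between the optimistic
# apparent error and the pessimistic out-of-bag error
err_632 <- 0.368 * err_apparent + 0.632 * err_oob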
References
Friedman, Jerome, Trevor Hastie, and Robert Tibshirani (2001). The Elements of Statistical Learning. Vol. 1. New York: Springer Series in Statistics.
Efron, Bradley, and Robert Tibshirani (1997). "Improvements on Cross-Validation: The .632+ Bootstrap Method." Journal of the American Statistical Association, 92(438), 548-560.
Examples
if (FALSE) { # \dontrun{
data(hgsc)
class <- as.factor(attr(hgsc, "class.true"))
set.seed(1)

# generate 5 bootstrap training sets and their out-of-bag test sets
train.id <- boot_train(data = hgsc, class = class, n = 5)
test.id <- boot_test(train.id = train.id)

# fit an xgboost classifier on each bootstrap training set
mod <- purrr::map(train.id, ~ classification(hgsc[., ], class[.], "xgboost"))

# predict on the out-of-bag samples of each replicate
pred <- purrr::pmap(list(mod = mod, test.id = test.id, train.id = train.id),
                    prediction, data = hgsc, class = class)

# .632 and .632+ error estimates
error_632(hgsc, class, "xgboost", pred, test.id, train.id, plus = FALSE)
error_632(hgsc, class, "xgboost", pred, test.id, train.id, plus = TRUE)
} # }