External validity indices compare a predicted clustering result with a reference class or gold standard.
Arguments
- pred.lab
predicted labels generated by classifier
- ref.lab
reference labels for the observations
- method
method used to estimate the entropy. One of "emp" (empirical), "mm" (Miller-Madow), "shrink" (shrinkage), or "sg" (Schurmann-Grassberger).
Value
ev_nmi
returns the normalized mutual information.
ev_confmat
returns a tibble of the following summary statistics using yardstick::summary.conf_mat():
- accuracy: Accuracy
- kap: Cohen's kappa
- sens: Sensitivity
- spec: Specificity
- ppv: Positive predictive value
- npv: Negative predictive value
- mcc: Matthews correlation coefficient
- j_index: Youden's J statistic
- bal_accuracy: Balanced accuracy
- detection_prevalence: Detection prevalence
- precision: alias for ppv
- recall: alias for sens
- f_meas: F Measure
Details
ev_nmi
calculates the normalized mutual information between the predicted and reference labels.
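As a rough illustration of the quantity involved, the sketch below computes an empirical normalized mutual information in base R. It assumes the geometric-mean normalization I(X, Y) / sqrt(H(X) H(Y)) described by Strehl and Ghosh (2002); the helper name nmi_sketch is hypothetical and not part of the package, and the package's own implementation may differ in its details (for example in the entropy estimator selected by method).

```r
# Hypothetical helper, not part of the package: empirical NMI in base R.
# Assumes the normalization I(X, Y) / sqrt(H(X) * H(Y)).
nmi_sketch <- function(pred.lab, ref.lab) {
  # Plug-in Shannon entropy (in nats) from a vector of counts
  entropy <- function(counts) {
    p <- counts[counts > 0] / sum(counts)
    -sum(p * log(p))
  }
  joint <- table(pred.lab, ref.lab)        # contingency table of label pairs
  h_pred <- entropy(rowSums(joint))        # marginal entropy of predictions
  h_ref <- entropy(colSums(joint))         # marginal entropy of references
  h_joint <- entropy(as.vector(joint))     # joint entropy
  mi <- max(h_pred + h_ref - h_joint, 0)   # clamp small negative rounding error
  mi / sqrt(h_pred * h_ref)
}

# Same partition under different label names: NMI is 1
nmi_same <- nmi_sketch(c(1, 1, 2, 2), c("a", "a", "b", "b"))
# Labels that carry no information about each other: NMI is (numerically) 0
nmi_indep <- nmi_sketch(c(1, 1, 2, 2), c("a", "b", "a", "b"))
```

Note that this sketch returns NaN when either labelling has a single class, because the corresponding entropy is zero.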
ev_confmat
calculates a variety of statistics associated with
confusion matrices. Accuracy, Cohen's kappa, and Matthews correlation
coefficient have direct multiclass definitions, whereas all other
metrics use macro-averaging.
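To make the distinction between direct multiclass metrics and macro-averaged metrics concrete, here is a minimal base R sketch on made-up labels. It is only an illustration of the idea, not the package's code: accuracy is read straight off the full confusion matrix, whereas sensitivity is computed one class at a time and then averaged with equal weight per class.

```r
# Hypothetical toy labels, chosen only for illustration
lv <- c("a", "b", "c")
pred <- factor(c("a", "a", "b", "b", "c", "c"), levels = lv)
ref  <- factor(c("a", "b", "b", "b", "c", "a"), levels = lv)

# Rows are predictions, columns are reference classes
cm <- table(Prediction = pred, Reference = ref)

# Direct multiclass definition: proportion of observations on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)

# Macro-averaging: per-class (one-vs-rest) sensitivity, then an unweighted mean
sens_per_class <- diag(cm) / colSums(cm)
sens_macro <- mean(sens_per_class)
```

Because every class contributes equally to a macro average, a rare class influences sens_macro as much as a common one, which is why the macro value can differ noticeably from accuracy.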
Note
ev_nmi
is adapted from infotheo::mutinformation()
References
Strehl A, Ghosh J. Cluster ensembles: a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 2002;3:583-617.
Examples
set.seed(1)
E <- matrix(sample(1:4, 1000, replace = TRUE), nrow = 100, byrow = FALSE)
x <- sample(1:4, 100, replace = TRUE)
y <- sample(1:4, 100, replace = TRUE)
ev_nmi(x, y)
#> [1] 0.05665824
ev_confmat(x, y)
#> # A tibble: 13 × 3
#>    .metric              .estimator .estimate
#>    <chr>                <chr>          <dbl>
#>  1 accuracy             multiclass     0.36
#>  2 kap                  multiclass     0.137
#>  3 sens                 macro          0.349
#>  4 spec                 macro          0.785
#>  5 ppv                  macro          0.344
#>  6 npv                  macro          0.785
#>  7 mcc                  multiclass     0.138
#>  8 j_index              macro          0.134
#>  9 bal_accuracy         macro          0.567
#> 10 detection_prevalence macro          0.25
#> 11 precision            macro          0.344
#> 12 recall               macro          0.349
#> 13 f_meas               macro          0.345