External validity indices — external

External validity indices compare a predicted clustering result with a reference class or gold standard.

Usage

ev_nmi(pred.lab, ref.lab, method = "emp")

ev_confmat(pred.lab, ref.lab)

Arguments

pred.lab: predicted labels generated by a classifier
ref.lab: reference labels for the observations
method: method of computing the entropy. One of "emp" (empirical), "mm" (Miller-Madow), "shrink" (shrinkage), or "sg" (Schurmann-Grassberger).

Value

ev_nmi returns the normalized mutual information.

ev_confmat returns a tibble of the following summary statistics using yardstick::summary.conf_mat():

accuracy: Accuracy
kap: Cohen's kappa
sens: Sensitivity
spec: Specificity
ppv: Positive predictive value
npv: Negative predictive value
mcc: Matthews correlation coefficient
j_index: Youden's J statistic
bal_accuracy: Balanced accuracy
detection_prevalence: Detection prevalence
precision: alias for ppv
recall: alias for sens
f_meas: F Measure

Details

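ev_nmi calculates the normalized mutual information between the predicted and reference labels. Higher values indicate closer agreement between the two labelings.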
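
As a rough illustration of the quantity involved, the sketch below computes a normalized mutual information in base R. It assumes the normalization described by Strehl and Ghosh (mutual information divided by the geometric mean of the two entropies) and empirical, plug-in entropies. The helpers entropy_from_counts and nmi_sketch are hypothetical names used only for this illustration; they are not part of the package and are not the ev_nmi source.

```r
# Hypothetical helper: empirical (plug-in) entropy, in nats, from a table of counts.
entropy_from_counts <- function(counts) {
  p <- counts[counts > 0] / sum(counts)  # drop empty cells, convert to proportions
  -sum(p * log(p))
}

# Hypothetical sketch of NMI, assuming the Strehl and Ghosh normalization:
# I(pred; ref) / sqrt(H(pred) * H(ref)).
nmi_sketch <- function(pred.lab, ref.lab) {
  stopifnot(length(pred.lab) == length(ref.lab))
  h_pred  <- entropy_from_counts(table(pred.lab))
  h_ref   <- entropy_from_counts(table(ref.lab))
  h_joint <- entropy_from_counts(table(pred.lab, ref.lab))
  # Mutual information via I = H(pred) + H(ref) - H(pred, ref),
  # clamped at 0 to guard against small negative rounding errors.
  mi <- max(h_pred + h_ref - h_joint, 0)
  # Undefined (NaN) when either labeling has a single class, since its entropy is 0.
  mi / sqrt(h_pred * h_ref)
}

# Same partition with swapped label names versus statistically independent labels.
same_partition <- nmi_sketch(c(1, 1, 2, 2), c(2, 2, 1, 1))
independent    <- nmi_sketch(c(1, 1, 2, 2), c(1, 2, 1, 2))
```

Because the ratio does not depend on the logarithm base, this sketch gives the same value in nats or bits. It is invariant to how the classes are named, so two identical partitions score 1 even when their label names differ, while independent labelings score 0.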

ev_confmat calculates a variety of statistics associated with confusion matrices. Accuracy, Cohen's kappa, and Matthews correlation coefficient have direct multiclass definitions, whereas all other metrics use macro-averaging.
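
To make the distinction between the two kinds of estimate concrete, here is a minimal base R sketch that contrasts a directly multiclass statistic (accuracy) with a macro-averaged one (sensitivity). It assumes macro-averaging means computing the metric one-vs-all for each class and taking the unweighted mean. The function confmat_sketch is a hypothetical name for this illustration only; it does not call yardstick and makes no attempt to reproduce its handling of edge cases such as classes absent from the reference labels.

```r
# Hypothetical sketch: accuracy (direct multiclass definition) and
# macro-averaged sensitivity (unweighted mean of per-class sensitivities).
confmat_sketch <- function(pred.lab, ref.lab) {
  stopifnot(length(pred.lab) == length(ref.lab))
  lev <- sort(union(pred.lab, ref.lab))
  # Square confusion matrix: rows are predictions, columns are reference classes.
  cm <- table(pred = factor(pred.lab, levels = lev),
              ref  = factor(ref.lab, levels = lev))
  accuracy <- sum(diag(cm)) / sum(cm)
  # One-vs-all sensitivity per class: TP / (TP + FN), i.e. diagonal over column totals.
  sens_per_class <- diag(cm) / colSums(cm)
  c(accuracy = accuracy, macro_sens = mean(sens_per_class))
}

pred <- c(1, 1, 2, 2, 3, 3)
ref  <- c(1, 1, 2, 3, 3, 3)
res  <- confmat_sketch(pred, ref)
```

In this toy input five of six observations are labeled correctly, so accuracy is 5/6. The per-class sensitivities are 1, 1, and 2/3, so the macro average is 8/9. The two numbers differ because macro-averaging weights every class equally regardless of its size.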

Note

ev_nmi is adapted from infotheo::mutinformation().

References

Strehl A, Ghosh J. Cluster ensembles: a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 2002;3:583-617.

Author

Johnson Liu, Derek Chiu

Examples

set.seed(1)
E <- matrix(rep(sample(1:4, 1000, replace = TRUE)), nrow = 100,
            byrow = FALSE)
x <- sample(1:4, 100, replace = TRUE)
y <- sample(1:4, 100, replace = TRUE)
ev_nmi(x, y)
#> [1] 0.05665824
ev_confmat(x, y)
#> # A tibble: 13 × 3
#>    .metric              .estimator .estimate
#>    <chr>                <chr>          <dbl>
#>  1 accuracy             multiclass     0.36 
#>  2 kap                  multiclass     0.137
#>  3 sens                 macro          0.349
#>  4 spec                 macro          0.785
#>  5 ppv                  macro          0.344
#>  6 npv                  macro          0.785
#>  7 mcc                  multiclass     0.138
#>  8 j_index              macro          0.134
#>  9 bal_accuracy         macro          0.567
#> 10 detection_prevalence macro          0.25 
#> 11 precision            macro          0.344
#> 12 recall               macro          0.349
#> 13 f_meas               macro          0.345