External validity indices compare a predicted clustering result with a reference classification (gold standard).

ev_nmi(pred.lab, ref.lab, method = "emp")

ev_confmat(pred.lab, ref.lab)

Arguments

pred.lab

predicted labels generated by the clustering algorithm or classifier

ref.lab

reference labels for the observations

method

method of computing the entropy. One of "emp" (empirical, the default), "mm" (Miller-Madow), "shrink" (shrinkage), or "sg" (Schurmann-Grassberger).

Value

ev_nmi returns the normalized mutual information. ev_confmat returns a tibble of the following summary statistics using yardstick::summary.conf_mat():

  • accuracy: Accuracy

  • kap: Cohen's kappa

  • sens: Sensitivity

  • spec: Specificity

  • ppv: Positive predictive value

  • npv: Negative predictive value

  • mcc: Matthews correlation coefficient

  • j_index: Youden's J statistic

  • bal_accuracy: Balanced accuracy

  • detection_prevalence: Detection prevalence

  • precision: alias for ppv

  • recall: alias for sens

  • f_meas: F Measure

Details

ev_nmi calculates the normalized mutual information between the predicted and reference labels.
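To make the quantity concrete, here is a minimal base R sketch of normalized mutual information. It assumes the empirical ("emp") entropy estimator and the geometric-mean normalization of Strehl and Ghosh (see References); the helper names emp_entropy and nmi_sketch are hypothetical and are not part of the package, so the actual ev_nmi implementation may differ in detail.

```r
# Empirical (plug-in) entropy, in nats, from a table of counts.
emp_entropy <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Hypothetical helper: NMI = I(pred; ref) / sqrt(H(pred) * H(ref)).
nmi_sketch <- function(pred.lab, ref.lab) {
  h_pred  <- emp_entropy(table(pred.lab))
  h_ref   <- emp_entropy(table(ref.lab))
  h_joint <- emp_entropy(table(pred.lab, ref.lab))
  # Mutual information, clamped at zero to absorb rounding error.
  mi <- max(h_pred + h_ref - h_joint, 0)
  mi / sqrt(h_pred * h_ref)
}

# Same partition under different label names: NMI is (numerically) 1.
nmi_same <- nmi_sketch(c(1, 1, 2, 2, 3, 3), c("a", "a", "b", "b", "c", "c"))

# Statistically independent labels: NMI is (numerically) 0.
nmi_indep <- nmi_sketch(c(1, 1, 2, 2), c("a", "b", "a", "b"))
```

Because the measure depends only on how observations are grouped, relabelling the clusters does not change the result.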

ev_confmat calculates a variety of statistics associated with confusion matrices. Accuracy, Cohen's kappa, and Matthews correlation coefficient have direct multiclass definitions, whereas all other metrics use macro-averaging.
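The difference between a direct multiclass definition and macro-averaging can be sketched in base R. This is only an illustration under the usual one-vs-rest convention, not the yardstick implementation; the toy labels and variable names are made up for the example.

```r
# Toy labels with a shared set of levels (illustrative data only).
lev  <- c("a", "b", "c")
pred <- factor(c("a", "a", "b", "b", "c", "c", "a", "b"), levels = lev)
ref  <- factor(c("a", "b", "b", "b", "c", "a", "a", "c"), levels = lev)

# Confusion matrix: rows are predictions, columns are reference classes.
cm <- table(pred = pred, ref = ref)

# Accuracy has a direct multiclass definition: the share of the diagonal.
accuracy <- sum(diag(cm)) / sum(cm)

# One-vs-rest counts for each class.
tp <- diag(cm)
fn <- colSums(cm) - tp       # class members predicted as something else
fp <- rowSums(cm) - tp       # other classes predicted as this class
tn <- sum(cm) - tp - fn - fp

# Macro-averaging: compute the binary metric per class, then take the
# unweighted mean across classes.
sens_macro <- mean(tp / (tp + fn))
spec_macro <- mean(tn / (tn + fp))
```

Macro-averaging weights every class equally regardless of its size, so rare classes influence the estimate as much as common ones.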

Note

ev_nmi is adapted from infotheo::mutinformation().

References

Strehl A, Ghosh J. Cluster ensembles: a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 2002;3:583-617.

Author

Johnson Liu, Derek Chiu

Examples

set.seed(1)
E <- matrix(rep(sample(1:4, 1000, replace = TRUE)), nrow = 100, byrow =
              FALSE)
x <- sample(1:4, 100, replace = TRUE)
y <- sample(1:4, 100, replace = TRUE)
ev_nmi(x, y)
#> [1] 0.05665824
ev_confmat(x, y)
#> # A tibble: 13 × 3
#>    .metric              .estimator .estimate
#>    <chr>                <chr>          <dbl>
#>  1 accuracy             multiclass     0.36 
#>  2 kap                  multiclass     0.137
#>  3 sens                 macro          0.349
#>  4 spec                 macro          0.785
#>  5 ppv                  macro          0.344
#>  6 npv                  macro          0.785
#>  7 mcc                  multiclass     0.138
#>  8 j_index              macro          0.134
#>  9 bal_accuracy         macro          0.567
#> 10 detection_prevalence macro          0.25 
#> 11 precision            macro          0.344
#> 12 recall               macro          0.349
#> 13 f_meas               macro          0.345