Evaluates algorithms on internal and external validation indices. Poorly performing algorithms can be trimmed from the ensemble, and the remaining algorithms can be given weights before use in consensus functions.
Usage
consensus_evaluate(
  data,
  ...,
  cons.cl = NULL,
  ref.cl = NULL,
  k.method = NULL,
  plot = FALSE,
  trim = FALSE,
  reweigh = FALSE,
  n = 5,
  lower = 0,
  upper = 1
)
Arguments
- data
data matrix with rows as samples and columns as variables
- ...
any number of objects output by consensus_cluster()
- cons.cl
matrix of cluster assignments from consensus functions such as kmodes and majority_voting
- ref.cl
reference class
- k.method
determines the method to choose k when no reference class is given. When ref.cl is not NULL, k is the number of distinct classes of ref.cl. Otherwise the input from k.method chooses k. The default is to use the PAC to choose the best k(s). Specifying an integer as a user-desired k will override the best k chosen by PAC. Finally, specifying "all" will produce consensus results for all k. The "all" method is implicitly performed when there is only one k used.
- plot
logical; if TRUE, graph_all is called
- trim
logical; if TRUE, algorithms that score low on internal indices will be trimmed out
- reweigh
logical; if TRUE, after trimming out poorly performing algorithms, each algorithm is reweighed depending on its internal indices
- n
an integer specifying the top n algorithms to keep after trimming off the poorly performing ones using Rank Aggregation. If the total number of algorithms is less than n, no trimming is done.
- lower
the lower bound that determines what is ambiguous
- upper
the upper bound that determines what is ambiguous
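The lower and upper bounds define which consensus values count as ambiguous when the PAC (proportion of ambiguous clustering) is computed. As a rough illustration only, and not the package's own implementation, the idea can be sketched in base R as the fraction of sample pairs whose consensus value lies between the two bounds. The helper name pac_sketch, the toy consensus matrix, and the use of strict inequalities at both bounds are assumptions made for this sketch.

```r
# Hypothetical helper: fraction of sample pairs with an ambiguous consensus value.
# Assumes `cm` is a symmetric consensus matrix with entries in [0, 1] and that
# values strictly between `lower` and `upper` count as ambiguous.
pac_sketch <- function(cm, lower = 0, upper = 1) {
  vals <- cm[lower.tri(cm)]          # one value per unordered pair of samples
  mean(vals > lower & vals < upper)  # proportion of ambiguous pairs
}

# Toy 4 x 4 consensus matrix (made-up values, for illustration only)
cm <- matrix(c(1.0, 1.0, 0.5, 0.0,
               1.0, 1.0, 0.2, 0.0,
               0.5, 0.2, 1.0, 1.0,
               0.0, 0.0, 1.0, 1.0), nrow = 4, byrow = TRUE)

pac_default <- pac_sketch(cm)                            # bounds 0 and 1
pac_narrow  <- pac_sketch(cm, lower = 0.3, upper = 0.7)  # narrower ambiguity band
```

Narrowing the band between lower and upper can only keep or reduce the number of pairs counted as ambiguous, so the choice of bounds directly affects which k looks best under PAC.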
Value
consensus_evaluate returns a list with the following elements
- k: if ref.cl is not NULL, this is the number of distinct classes in the reference; otherwise the chosen k is determined by the one giving the largest mean PAC across algorithms
- pac: a data frame showing the PAC for each combination of algorithm and cluster size
- ii: a list of data frames for all k showing internal evaluation indices
- ei: a data frame showing external evaluation indices for k
- trim.obj: a list with 5 elements
  - alg.keep: algorithms kept
  - alg.remove: algorithms removed
  - rank.matrix: a matrix of ranked algorithms for every internal evaluation index
  - top.list: final order of ranked algorithms
  - E.new: a new version of a consensus_cluster data object
Details
This function always returns internal indices. If ref.cl is not NULL, external indices are additionally shown. Relevant graphical displays are also output when plot = TRUE. Algorithms are ranked across internal indices using Rank Aggregation. Only the top n algorithms are kept; the rest are trimmed.
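The ranking and trimming step described above can be pictured with a small base R sketch. This is not the package's procedure: a simple mean-rank (Borda-style) aggregation stands in for whatever rank aggregation method the package applies, the score table is made up, and every index is assumed to be "higher is better". The variable names only loosely mirror the elements of trim.obj.

```r
# Made-up internal index scores: rows are algorithms, columns are indices.
# Assumption: higher is better for every index in this toy table.
scores <- matrix(c(0.62, 0.40, 0.55,
                   0.58, 0.35, 0.60,
                   0.30, 0.20, 0.25),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("A", "B", "C"), c("idx1", "idx2", "idx3")))

# Rank algorithms within each index (rank 1 = best)
ranks <- apply(-scores, 2, rank)

# Aggregate by mean rank (a stand-in for the real rank aggregation),
# then order algorithms from best to worst
top_list <- names(sort(rowMeans(ranks)))

n <- 2                                     # number of algorithms to keep
alg_keep   <- head(top_list, n)            # keeps everything if n exceeds the count
alg_remove <- setdiff(top_list, alg_keep)  # the trimmed algorithms
```

With these toy scores, algorithm C ranks last on every index and is the one trimmed when n = 2.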
Examples
# Consensus clustering for multiple algorithms
set.seed(911)
x <- matrix(rnorm(500), ncol = 10)
CC <- consensus_cluster(x, nk = 3:4, reps = 10, algorithms = c("ap", "km"),
                        progress = FALSE)
# Evaluate algorithms on internal/external indices and trim algorithms:
# remove those ranking low on internal indices
set.seed(1)
ref.cl <- sample(1:4, 50, replace = TRUE)
z <- consensus_evaluate(x, CC, ref.cl = ref.cl, n = 1, trim = TRUE)
str(z, max.level = 2)
#> List of 5
#> $ k : int 4
#> $ pac :'data.frame': 2 obs. of 3 variables:
#> ..$ k : chr [1:2] "3" "4"
#> ..$ AP: num [1:2] 0.505 0.48
#> ..$ KM: num [1:2] 0.514 0.498
#> $ ii :List of 2
#> ..$ 3:'data.frame': 2 obs. of 11 variables:
#> ..$ 4:'data.frame': 2 obs. of 11 variables:
#> $ ei :List of 1
#> ..$ 4:'data.frame': 2 obs. of 19 variables:
#> $ trim.obj:List of 5
#> ..$ alg.keep : chr "KM"
#> ..$ alg.remove : chr "AP"
#> ..$ rank.matrix:List of 1
#> ..$ top.list :List of 1
#> ..$ E.new :List of 1