Functions to predict class labels on the Out-Of-Bag (test) set for different classifiers.
prediction(
mod,
data,
class = NULL,
test.id = NULL,
train.id = NULL,
threshold = 0,
standardize = FALSE,
...
)
# S3 method for default
prediction(
mod,
data,
class = NULL,
test.id = NULL,
train.id = NULL,
threshold = 0,
standardize = FALSE,
...
)
# S3 method for pamrtrained
prediction(
mod,
data,
class = NULL,
test.id = NULL,
train.id = NULL,
threshold = 0,
standardize = FALSE,
...
)
# S3 method for knn
prediction(
mod,
data,
class = NULL,
test.id = NULL,
train.id = NULL,
threshold = 0,
standardize = FALSE,
...
)
mod: model object from classification()
data: data frame with rows as samples, columns as features
class: true/reference class vector used for supervised learning
test.id: integer vector of indices for test set. If NULL (default), all samples are used.
train.id: integer vector of indices for training set. If NULL (default), all samples are used.
threshold: a number between 0 and 1 indicating the lowest maximum class probability below which a sample will be unclassified.
standardize: logical; if TRUE, the training sets are standardized on features to have mean zero and unit variance. The test sets are standardized using the vectors of centers and standard deviations used in corresponding training sets.
...: additional arguments to be passed to or from methods
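The standardization described for standardize can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's own code: the toy data frame x and its columns a and b are hypothetical, and scale() is used here as one convenient way to keep the training centers and standard deviations.

```r
# Hypothetical toy data: 6 samples (rows), 2 features (columns)
x <- data.frame(
  a = c(1, 2, 3, 4, 5, 6),
  b = c(10, 20, 30, 40, 50, 60)
)
train.id <- 1:4
test.id <- 5:6

# Standardize the training rows; scale() stores the centers and
# standard deviations it used as attributes of the result
train_scaled <- scale(x[train.id, ])
centers <- attr(train_scaled, "scaled:center")
sds <- attr(train_scaled, "scaled:scale")

# Standardize the test rows with the training centers and SDs,
# so no information from the test set leaks into the scaling
test_scaled <- scale(x[test.id, ], center = centers, scale = sds)
```

Reusing the training statistics keeps the test set on the same scale the classifier was fitted on, which is why the test rows are generally not centered at exactly zero.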
A factor of predicted classes with labels in the same order as true class. If mod is a "pamr" classifier, the return value is a list of length 2: the predicted class, and the threshold value.
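How the threshold argument can shape the returned factor is sketched below in base R. The probability matrix prob, the class names A, B, C, and the label "unclassified" are all assumptions made for illustration; the package may store probabilities and mark unclassified samples differently.

```r
# Hypothetical class probabilities: rows are samples, columns are classes
prob <- matrix(
  c(0.90, 0.05, 0.05,
    0.40, 0.35, 0.25,
    0.10, 0.70, 0.20),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("A", "B", "C"))
)
threshold <- 0.5

# Highest class probability per sample, and the class that attains it
max_prob <- apply(prob, 1, max)
pred <- colnames(prob)[max.col(prob, ties.method = "first")]

# Samples whose highest probability is below the threshold are left unclassified
# (the "unclassified" label is an assumption of this sketch)
pred[max_prob < threshold] <- "unclassified"
pred <- factor(pred, levels = c(colnames(prob), "unclassified"))
```

With the default threshold = 0, no maximum probability can fall below the cutoff, so every sample keeps its most probable class.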
The knn and pamr prediction methods use the train.id and class arguments for additional modelling steps before prediction. For knn, the modelling and prediction are performed in one step, so the function takes in both training and test set identifiers. For pamr, the classifier needs to be cross-validated on the training set in order to find a shrinkage threshold with the minimum CV error to use in prediction on the test set. The other prediction methods make use of the default method.
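The one-step nature of k-nearest neighbours can be seen in a minimal sketch using knn() from the class package, which is assumed here for illustration and may not match what the knn method calls internally. The iris data and k = 5 are arbitrary choices, not values taken from this page.

```r
library(class)  # provides knn(); a recommended package in standard R installations

data(iris)
set.seed(1)
x <- iris[, 1:4]
cl <- iris$Species

# Bootstrap sample as the training set; the out-of-bag rows form the test set
train.id <- sample(seq_along(cl), replace = TRUE)
test.id <- which(!seq_along(cl) %in% train.id)

# knn() has no separate fitting step: training data, training labels,
# and test data are all supplied in a single call
pred <- knn(train = x[train.id, ], test = x[test.id, ], cl = cl[train.id], k = 5)
table(true = cl[test.id], pred)
```

Because there is no fitted model object to carry the training data forward, a prediction function for this classifier needs both train.id and test.id at prediction time.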
data(hgsc)
class <- attr(hgsc, "class.true")
set.seed(1)
training.id <- sample(seq_along(class), replace = TRUE)
test.id <- which(!seq_along(class) %in% training.id)
mod <- classification(hgsc[training.id, ], class[training.id], "slda")
pred <- prediction(mod, hgsc, class, test.id)
table(true = class[test.id], pred)
#>         pred
#> true     DIF.C4 IMM.C2 MES.C1 PRO.C5
#>   DIF.C4     43      4      3      3
#>   IMM.C2      5     33      3      0
#>   MES.C1      3      2     38      1
#>   PRO.C5     17      6      2     24