Train, predict, and evaluate classification models

Usage

splendid_model(
  data,
  class,
  algorithms = NULL,
  n = 1,
  seed_boot = NULL,
  seed_samp = NULL,
  seed_alg = NULL,
  convert = FALSE,
  rfe = FALSE,
  ova = FALSE,
  standardize = FALSE,
  sampling = c("none", "up", "down", "smote"),
  stratify = FALSE,
  plus = TRUE,
  threshold = 0,
  trees = 100,
  tune = FALSE,
  vi = FALSE
)

Arguments

data: data frame with rows as samples, columns as features
class: true/reference class vector used for supervised learning
algorithms: character vector of algorithms to use for supervised learning. See the Algorithms section for possible options. The default is NULL, in which case all algorithms are used.
n: number of bootstrap replicates to generate
seed_boot: random seed used for reproducibility in bootstrapping training sets for model generation
seed_samp: random seed used for reproducibility in subsampling training sets for model generation
seed_alg: random seed used for reproducibility when running algorithms with an intrinsic random element (e.g. random forests)
convert: logical; if TRUE, converts all categorical variables in data to dummy variables. Some algorithms (e.g. LDA) require this conversion.
rfe: logical; if TRUE, run Recursive Feature Elimination as a feature selection method for "lda", "rf", and "svm" algorithms.
ova: logical; if TRUE, a One-Vs-All classification approach is performed for every algorithm in algorithms. The relevant results are prefixed with the string ova_.
standardize: logical; if TRUE, the training sets are standardized on features to have mean zero and unit variance. The test sets are standardized using the vectors of centers and standard deviations used in corresponding training sets.
sampling: the default is "none", in which case no subsampling is performed. Other options are "up" (up-sampling the minority class), "down" (down-sampling the majority class), and "smote" (generating synthetic points for the minority class and down-sampling the majority class). Subsampling applies only to the training set.
stratify: logical; if TRUE, the bootstrap resampling is performed within each stratum of class so that the bootstrap sample contains the same proportion of each stratum as the original data.
plus: logical; if TRUE (default), the .632+ estimator is calculated. Otherwise, the .632 estimator is calculated.
threshold: a number between 0 and 1. A sample whose maximum class probability falls below this value is left unclassified.
trees: number of trees to use in "rf"
tune: logical; if TRUE, algorithms with hyperparameters are tuned
vi: logical; if TRUE, variable importance (VI) scores are returned for each algorithm. Model-based scores are used where available; otherwise, SHAP-based VI scores are calculated.
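
The behaviour described for stratify, standardize, and threshold can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper names stratified_boot() and apply_threshold(), the toy data, and the "unclassified" label are assumptions made for illustration only.

```r
# Toy data: 20 samples, 2 features, imbalanced classes (illustrative only)
set.seed(1)
x <- data.frame(f1 = rnorm(20, mean = 5, sd = 2),
                f2 = rnorm(20, mean = -1, sd = 0.5))
y <- factor(rep(c("a", "b"), times = c(15, 5)))

# stratify: resample row indices with replacement within each class,
# so the bootstrap sample keeps the original class proportions.
# (hypothetical helper, not part of the package)
stratified_boot <- function(y) {
  unlist(lapply(split(seq_along(y), y), function(idx) {
    idx[sample.int(length(idx), replace = TRUE)]
  }), use.names = FALSE)
}
train_id <- stratified_boot(y)
test_id <- setdiff(seq_along(y), train_id)  # rows not drawn (out-of-bag)

# standardize: centre and scale the training set, then reuse the
# training centres and standard deviations on the test set.
train_scaled <- scale(x[train_id, , drop = FALSE])
centers <- attr(train_scaled, "scaled:center")
sds <- attr(train_scaled, "scaled:scale")
test_scaled <- scale(x[test_id, , drop = FALSE], center = centers, scale = sds)

# threshold: predict the most probable class, but leave a sample
# unclassified when its maximum class probability is below the threshold.
# (hypothetical helper; the "unclassified" label is an assumption)
apply_threshold <- function(prob, threshold = 0) {
  pred <- colnames(prob)[max.col(prob, ties.method = "first")]
  pred[apply(prob, 1, max) < threshold] <- "unclassified"
  pred
}
prob <- matrix(c(0.90, 0.10,
                 0.55, 0.45),
               ncol = 2, byrow = TRUE,
               dimnames = list(NULL, c("a", "b")))
pred <- apply_threshold(prob, threshold = 0.6)
```

With threshold = 0 (the default), no maximum probability can fall below the cutoff, so every sample receives a class label.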

Algorithms

The classification algorithms currently supported are:

Prediction Analysis for Microarrays ("pam")
Support Vector Machines ("svm")
Random Forests ("rf")
Linear Discriminant Analysis ("lda")
Shrinkage Linear Discriminant Analysis ("slda")
Shrinkage Diagonal Discriminant Analysis ("sdda")
Multinomial Logistic Regression using
- Generalized Linear Model with no penalization ("mlr_glm")
- GLM with LASSO penalty ("mlr_lasso")
- GLM with ridge penalty ("mlr_ridge")
- GLM with elastic net penalty ("mlr_enet")
- Neural Networks ("mlr_nnet")
Neural Networks ("nnet")
Naive Bayes ("nbayes")
AdaBoost.M1 ("adaboost_m1")
Extreme Gradient Boosting ("xgboost")
K-Nearest Neighbours ("knn")

Examples

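# Load the example data; the reference classes are stored as an attribute
data(hgsc)
class <- attr(hgsc, "class.true")
# Fit one bootstrap replicate using extreme gradient boosting only
sl_result <- splendid_model(hgsc, class, n = 1, algorithms = "xgboost")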
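
A further sketch showing how the seed and stratification arguments from the Usage section might be combined for a reproducible run. It assumes the function and the hgsc data are provided by an installed package named splendid; the call is skipped if that package is not available, and the argument values are arbitrary choices for illustration.

```r
# Assumption: splendid_model() and hgsc come from a package named "splendid"
if (requireNamespace("splendid", quietly = TRUE)) {
  data(hgsc, package = "splendid")
  class <- attr(hgsc, "class.true")
  fit <- splendid::splendid_model(
    hgsc, class,
    n = 2,                   # two bootstrap replicates
    seed_boot = 1,           # fix the bootstrap resampling
    seed_alg = 1,            # fix any randomness inside the algorithm
    algorithms = "xgboost",
    stratify = TRUE          # resample within each class
  )
  str(fit, max.level = 1)    # inspect the top-level structure of the result
}
```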