Run a grid search in a nested cross validation.
nrcv_rusranger(
    x,
    y,
    searchspace,
    nouterfolds = 5,
    ninnerfolds = 5,
    nrepcv = 2,
    ...
)
x matrix/data.frame, feature matrix, see ranger() for
details.
y numeric/factor, classification labels, see ranger() for
details.
searchspace data.frame, hyperparameters to tune. Column names have
to match the argument names of ranger()/rusranger().
nouterfolds integer(1), number of outer cross validation folds.
ninnerfolds integer(1), number of inner cross validation folds.
nrepcv integer(1), number of repeats of the inner cross validation.
... further arguments passed to gs_rusranger().
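The column-name requirement for searchspace can be checked before starting a long run. The following is only a sketch: the helper unknown_columns() is hypothetical and not part of the package, and the comparison against ranger::ranger() runs only if the ranger package is installed. It does not cover arguments that only rusranger() accepts.

```r
## Build a small hyperparameter grid; one row per combination to evaluate.
searchspace <- expand.grid(
    mtry = c(2, 3),
    num.trees = c(500, 1000)
)
nrow(searchspace)

## Hypothetical helper (not part of the package): returns the columns of a
## search space that are not formal arguments of the given function.
unknown_columns <- function(searchspace, fun) {
    setdiff(colnames(searchspace), names(formals(fun)))
}

## Only compare against ranger() if the package is available.
if (requireNamespace("ranger", quietly = TRUE)) {
    unknown_columns(searchspace, ranger::ranger)
}
```

An empty character vector means every column maps to an argument of the function that was checked.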
list with one element per outer fold (nouterfolds elements in total), each
containing the following subelements:
model selected ranger model.
indextrain indices of the items used for training.
indextest indices of the items used for testing.
prediction prediction results.
truth original labels/classes.
performance resulting performance (AUC).
selectedparams selected hyperparameters.
gridsearch data.frame, results of the grid search.
nouterfolds integer(1).
ninnerfolds integer(1).
nrepcv integer(1).
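Because the result is a plain list with one element per outer fold, it can be summarised with base R. The sketch below does not call nrcv_rusranger(); it uses a mock list res that contains only the performance and selectedparams subelements, filled with the values printed in the example output of this page. The names res, perf and params are chosen for illustration.

```r
## Mock of the documented return structure (only two of the subelements),
## filled with the values from the example output of this page.
res <- list(
    list(performance = 0.01557093,
         selectedparams = data.frame(mtry = 3, num.trees = 500)),
    list(performance = 0,
         selectedparams = data.frame(mtry = 2, num.trees = 1000)),
    list(performance = 0.0234375,
         selectedparams = data.frame(mtry = 2, num.trees = 500))
)

## One performance value per outer fold.
perf <- vapply(res, function(fold) fold$performance, numeric(1))
summary(perf)

## Selected hyperparameters of all outer folds in one data.frame.
params <- do.call(rbind, lapply(res, function(fold) fold$selectedparams))
cbind(fold = seq_along(res), params, performance = perf)
```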
The reported performance may differ slightly from the median performance
in the reported gridsearch: after the grid search, rusranger() is trained again
with the best hyperparameters, which results in a new subsampling.
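To see the size of this difference for a single outer fold, the grid search row of the selected hyperparameters can be looked up and compared with performance. This is a sketch on mock data: the list fold holds values copied from the first outer fold of the example output of this page (only the Median column of gridsearch is kept), and the names fold, selected and delta are chosen for illustration.

```r
## Values copied from the first outer fold of the example output.
fold <- list(
    performance = 0.01557093,
    selectedparams = data.frame(mtry = 3, num.trees = 500),
    gridsearch = data.frame(
        mtry = c(2, 3, 2, 3),
        num.trees = c(500, 500, 1000, 1000),
        Median = c(0.008333333, 0.017857143, 0, 0)
    )
)

## Look up the grid search row that belongs to the selected hyperparameters;
## merge() joins on the shared columns mtry and num.trees.
selected <- merge(fold$gridsearch, fold$selectedparams)

## Difference between the refitted model and the grid search median.
delta <- fold$performance - selected$Median
delta
```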
set.seed(20220324)
iris <- subset(iris, Species != "setosa")
searchspace <- expand.grid(
    mtry = c(2, 3),
    num.trees = c(500, 1000)
)
## n(outer|inner) folds and nrepcv are too low for real world applications,
## and are just used for demonstration and to keep the run time of the examples
## low
nrcv_rusranger(
    iris[-5], as.numeric(iris$Species == "versicolor"),
    searchspace = searchspace, nouterfolds = 3, ninnerfolds = 3, nrepcv = 1
)
#> [[1]]
#> [[1]]$model
#> Ranger result
#>
#> Call:
#> ranger(x = as.data.frame(x), y = y, probability = probability, classification = classification, min.node.size = min.node.size, replace = replace, case.weights = .caseweights(y, replace = replace), sample.fraction = .samplefraction(y), ..., keep.inbag = FALSE)
#>
#> Type: Probability estimation
#> Number of trees: 500
#> Sample size: 66
#> Number of independent variables: 4
#> Mtry: 3
#> Target node size: 10
#> Variable importance mode: none
#> Splitrule: gini
#> OOB prediction error (Brier s.): NaN
#>
#> [[1]]$indextrain
#>  21  22  23  24  25  26  27  28  29 210 211 212 213 214 215 216 217 218 219 220
#>   4   8  13  14  15  19  20  26  29  30  34  35  38  39  43  44  49  51  55  58
#> 221 222 223 224 225 226 227 228 229 230 231 232 233 234  31  32  33  34  35  36
#>  60  61  63  67  69  74  75  81  83  88  91  94  99 100   1   3   5   9  12  16
#>  37  38  39 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326
#>  18  22  25  27  28  36  37  47  48  50  62  65  70  71  73  76  77  82  84  85
#> 327 328 329 330 331 332
#>  89  90  92  96  97  98
#>
#> [[1]]$indextest
#>  11  12  13  14  15  16  17  18  19 110 111 112 113 114 115 116 117 118 119 120
#>   2   6   7  10  11  17  21  23  24  31  32  33  40  41  42  45  46  52  53  54
#> 121 122 123 124 125 126 127 128 129 130 131 132 133 134
#>  56  57  59  64  66  68  72  78  79  80  86  87  93  95
#>
#> [[1]]$prediction
#> [1] 0.0000000 0.0000000 0.0000000 0.0025000 0.0000000 0.0000000 0.9737500
#> [8] 0.1420357 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
#> [15] 0.0000000 0.0000000 0.0000000 0.9913333 1.0000000 1.0000000 1.0000000
#> [22] 0.0160000 1.0000000 0.9737500 1.0000000 1.0000000 0.9737500 0.9737500
#> [29] 1.0000000 0.5730190 1.0000000 1.0000000 0.9913333 1.0000000
#>
#> [[1]]$truth
#> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#>
#> [[1]]$performance
#> [1] 0.01557093
#>
#> [[1]]$selectedparams
#>   mtry num.trees
#> 2    3       500
#>
#> [[1]]$gridsearch
#>   mtry num.trees         Min          Q1      Median          Q3         Max
#> 1    2       500 0.008333333 0.008333333 0.008333333 0.008333333 0.008333333
#> 2    3       500 0.017857143 0.017857143 0.017857143 0.017857143 0.017857143
#> 3    2      1000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000
#> 4    3      1000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000
#>
#> [[1]]$nouterfolds
#> [1] 3
#>
#> [[1]]$ninnerfolds
#> [1] 3
#>
#> [[1]]$nrepcv
#> [1] 1
#>
#>
#> [[2]]
#> [[2]]$model
#> Ranger result
#>
#> Call:
#> ranger(x = as.data.frame(x), y = y, probability = probability, classification = classification, min.node.size = min.node.size, replace = replace, case.weights = .caseweights(y, replace = replace), sample.fraction = .samplefraction(y), ..., keep.inbag = FALSE)
#>
#> Type: Probability estimation
#> Number of trees: 1000
#> Sample size: 66
#> Number of independent variables: 4
#> Mtry: 2
#> Target node size: 10
#> Variable importance mode: none
#> Splitrule: gini
#> OOB prediction error (Brier s.): NaN
#>
#> [[2]]$indextrain
#>  11  12  13  14  15  16  17  18  19 110 111 112 113 114 115 116 117 118 119 120
#>   2   6   7  10  11  17  21  23  24  31  32  33  40  41  42  45  46  52  53  54
#> 121 122 123 124 125 126 127 128 129 130 131 132 133 134  31  32  33  34  35  36
#>  56  57  59  64  66  68  72  78  79  80  86  87  93  95   1   3   5   9  12  16
#>  37  38  39 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326
#>  18  22  25  27  28  36  37  47  48  50  62  65  70  71  73  76  77  82  84  85
#> 327 328 329 330 331 332
#>  89  90  92  96  97  98
#>
#> [[2]]$indextest
#>  21  22  23  24  25  26  27  28  29 210 211 212 213 214 215 216 217 218 219 220
#>   4   8  13  14  15  19  20  26  29  30  34  35  38  39  43  44  49  51  55  58
#> 221 222 223 224 225 226 227 228 229 230 231 232 233 234
#>  60  61  63  67  69  74  75  81  83  88  91  94  99 100
#>
#> [[2]]$prediction
#> [1] 0.045900000 0.494400000 0.056433333 0.007100794 0.000250000 0.090016667
#> [7] 0.026650000 0.001555556 0.007411905 0.003250000 0.776325397 0.004017460
#> [13] 0.056433333 0.000250000 0.013533333 0.045900000 0.026400000 0.988466667
#> [19] 1.000000000 0.987714286 1.000000000 0.988466667 0.992714286 0.983325397
#> [25] 1.000000000 0.919846032 0.981514286 1.000000000 1.000000000 0.983325397
#> [31] 0.992714286 0.981514286 0.988466667 0.951325397
#>
#> [[2]]$truth
#> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#>
#> [[2]]$performance
#> [1] 0
#>
#> [[2]]$selectedparams
#>   mtry num.trees
#> 3    2      1000
#>
#> [[2]]$gridsearch
#>   mtry num.trees         Min          Q1      Median          Q3         Max
#> 1    2       500 0.033333333 0.033333333 0.033333333 0.033333333 0.033333333
#> 2    3       500 0.020661157 0.020661157 0.020661157 0.020661157 0.020661157
#> 3    2      1000 0.057851240 0.057851240 0.057851240 0.057851240 0.057851240
#> 4    3      1000 0.008547009 0.008547009 0.008547009 0.008547009 0.008547009
#>
#> [[2]]$nouterfolds
#> [1] 3
#>
#> [[2]]$ninnerfolds
#> [1] 3
#>
#> [[2]]$nrepcv
#> [1] 1
#>
#>
#> [[3]]
#> [[3]]$model
#> Ranger result
#>
#> Call:
#> ranger(x = as.data.frame(x), y = y, probability = probability, classification = classification, min.node.size = min.node.size, replace = replace, case.weights = .caseweights(y, replace = replace), sample.fraction = .samplefraction(y), ..., keep.inbag = FALSE)
#>
#> Type: Probability estimation
#> Number of trees: 500
#> Sample size: 68
#> Number of independent variables: 4
#> Mtry: 2
#> Target node size: 10
#> Variable importance mode: none
#> Splitrule: gini
#> OOB prediction error (Brier s.): NaN
#>
#> [[3]]$indextrain
#>  11  12  13  14  15  16  17  18  19 110 111 112 113 114 115 116 117 118 119 120
#>   2   6   7  10  11  17  21  23  24  31  32  33  40  41  42  45  46  52  53  54
#> 121 122 123 124 125 126 127 128 129 130 131 132 133 134  21  22  23  24  25  26
#>  56  57  59  64  66  68  72  78  79  80  86  87  93  95   4   8  13  14  15  19
#>  27  28  29 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226
#>  20  26  29  30  34  35  38  39  43  44  49  51  55  58  60  61  63  67  69  74
#> 227 228 229 230 231 232 233 234
#>  75  81  83  88  91  94  99 100
#>
#> [[3]]$indextest
#>  31  32  33  34  35  36  37  38  39 310 311 312 313 314 315 316 317 318 319 320
#>   1   3   5   9  12  16  18  22  25  27  28  36  37  47  48  50  62  65  70  71
#> 321 322 323 324 325 326 327 328 329 330 331 332
#>  73  76  77  82  84  85  89  90  92  96  97  98
#>
#> [[3]]$prediction
#> [1] 0.22633333 0.24873333 0.02433333 0.02433333 0.00000000 0.02633333
#> [7] 0.00000000 0.00000000 0.02433333 0.02633333 0.72173333 0.05220000
#> [13] 0.02633333 0.00000000 0.00000000 0.00000000 0.97300000 0.95382857
#> [19] 0.24197143 1.00000000 1.00000000 0.99100000 0.52140000 1.00000000
#> [25] 0.31340000 0.45557143 0.50225714 1.00000000 0.99040000 1.00000000
#> [31] 0.88473333 1.00000000
#>
#> [[3]]$truth
#> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#>
#> [[3]]$performance
#> [1] 0.0234375
#>
#> [[3]]$selectedparams
#>   mtry num.trees
#> 1    2       500
#>
#> [[3]]$gridsearch
#>   mtry num.trees Min Q1 Median Q3 Max
#> 1    2       500   0  0      0  0   0
#> 2    3       500   0  0      0  0   0
#> 3    2      1000   0  0      0  0   0
#> 4    3      1000   0  0      0  0   0
#>
#> [[3]]$nouterfolds
#> [1] 3
#>
#> [[3]]$ninnerfolds
#> [1] 3
#>
#> [[3]]$nrepcv
#> [1] 1
#>
#>