Run a grid search in a nested cross validation.
nested_gridsearch(
  x,
  y,
  searchspace,
  FUN,
  nouterfolds = 5,
  ninnerfolds = 5,
  nrepcv = 2,
  ...
)
x: matrix/data.frame, feature matrix, see ranger() for details.
y: numeric/factor, classification labels, see ranger() for details.
searchspace: data.frame, hyperparameters to tune. Column names have to match the argument names of FUN.
FUN: function, the model function whose hyperparameters are tuned; it is trained with the hyperparameters of each row of searchspace.
nouterfolds: integer(1), number of outer cross validation folds.
ninnerfolds: integer(1), number of inner cross validation folds.
nrepcv: integer(1), number of repeats of the inner cross validation.
...: further arguments passed to gs_rusranger().
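The requirement that the column names of searchspace match the argument names of FUN can be illustrated with do.call(), where each row of the grid becomes a named argument list. This is a sketch only: toy_fun is a hypothetical stand-in for FUN, and it is an assumption that nested_gridsearch() forwards the rows in this way.

```r
# Hypothetical stand-in for FUN: it only returns the hyperparameters it received.
toy_fun <- function(x, y, mtry, num.trees) {
  c(mtry = mtry, num.trees = num.trees)
}

searchspace <- expand.grid(mtry = c(2, 3), num.trees = c(500, 1000))

# Call toy_fun once per row; each column is matched to the argument of the same name.
calls <- lapply(seq_len(nrow(searchspace)), function(i) {
  params <- as.list(searchspace[i, , drop = FALSE])
  do.call(toy_fun, c(list(x = NULL, y = NULL), params))
})

# A column without a matching argument makes the call fail with an
# "unused argument" error, e.g.:
# do.call(toy_fun, list(x = NULL, y = NULL, ntree = 500))
```

expand.grid() varies the first column fastest, which gives the same row order (2/500, 3/500, 2/1000, 3/1000) as the gridsearch tables in the example output.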
list, with one element per outer fold (nouterfolds elements), each containing the following subelements:
indextrain: indices of the training items used.
indextest: indices of the test items used.
performance: resulting performance (AUC) on the test items.
selectedparams: selected hyperparameters.
gridsearch: data.frame, results of the grid search.
nouterfolds: integer(1), number of outer folds.
ninnerfolds: integer(1), number of inner folds.
nrepcv: integer(1), number of repeats of the inner cross validation.
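The performance element is an AUC. As an illustration, the rank-based (Mann-Whitney) form below counts tied scores as 1/2; it is an assumption that the package computes the AUC this way. Applied to [[1]]$prediction and [[1]]$truth from the example output, it gives 4.5/289, which is the 0.01557093 reported in [[1]]$performance. An AUC near 0 means that the scores rank the items with truth 1 below those with truth 0.

```r
# Rank-based (Mann-Whitney) AUC; tied scores count as 1/2.
# A sketch for illustration, not necessarily the package's implementation.
auc <- function(prediction, truth) {
  r <- rank(prediction)  # average ranks for ties
  npos <- sum(truth == 1)
  nneg <- sum(truth == 0)
  (sum(r[truth == 1]) - npos * (npos + 1) / 2) / (npos * nneg)
}

# [[1]]$prediction and [[1]]$truth as printed in the example output
prediction <- c(
  0, 0, 0, 0.0025, 0, 0, 0.97375,
  0.1420357, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0.9913333, 1, 1, 1,
  0.016, 1, 0.97375, 1, 1, 0.97375, 0.97375,
  1, 0.573019, 1, 1, 0.9913333, 1
)
truth <- rep(c(1, 0), each = 17)
auc(prediction, truth)  # 4.5 / 289, printed as 0.01557093
```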
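The reported performance can differ from the median performance of the selected row in the reported gridsearch: after the grid search, FUN is trained again with the best hyperparameters on the outer training items (indextrain), which involves a new random subsampling, and the performance is then computed on the outer test items (indextest).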
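A self-contained toy version of the procedure in base R may make the nesting easier to follow. It is a sketch under stated assumptions, not the package's implementation: knn_prob, auc, make_folds, inner_cv and nested_cv are hypothetical helpers, the model is a k-nearest-neighbour vote share instead of ranger, and the only hyperparameter is k. Each inner cross validation pools its out-of-fold predictions into one AUC, which is consistent with (but not confirmed by) the identical Min to Max columns in the example output when nrepcv = 1. As in the example output, the row with the largest median is selected; the model is then refitted on the whole outer training part and scored on the outer test fold.

```r
# Hypothetical model: share of class 1 among the k nearest training items.
knn_prob <- function(xtrain, ytrain, xtest, k) {
  apply(xtest, 1, function(row) {
    d <- sqrt(colSums((t(xtrain) - row)^2))  # Euclidean distances to row
    mean(ytrain[order(d)[seq_len(k)]])
  })
}

# Rank-based (Mann-Whitney) AUC; ties count as 1/2.
auc <- function(prediction, truth) {
  r <- rank(prediction)
  npos <- sum(truth == 1)
  nneg <- sum(truth == 0)
  (sum(r[truth == 1]) - npos * (npos + 1) / 2) / (npos * nneg)
}

# Stratified fold ids: each class is spread evenly over the folds.
make_folds <- function(y, nfolds) {
  folds <- integer(length(y))
  for (cl in unique(y)) {
    idx <- which(y == cl)
    ids <- rep_len(seq_len(nfolds), length(idx))
    folds[idx] <- ids[sample.int(length(ids))]
  }
  folds
}

# One inner cross validation: pooled out-of-fold predictions give one AUC.
inner_cv <- function(x, y, k, nfolds) {
  folds <- make_folds(y, nfolds)
  pred <- numeric(length(y))
  for (f in seq_len(nfolds)) {
    test <- folds == f
    pred[test] <- knn_prob(x[!test, , drop = FALSE], y[!test],
                           x[test, , drop = FALSE], k)
  }
  auc(pred, y)
}

nested_cv <- function(x, y, grid, nouterfolds = 5, ninnerfolds = 5, nrepcv = 2) {
  outer <- make_folds(y, nouterfolds)
  lapply(seq_len(nouterfolds), function(f) {
    train <- which(outer != f)
    test <- which(outer == f)
    xtr <- x[train, , drop = FALSE]
    # grid search: nrepcv inner cross validations per row, outer training part only
    q <- t(vapply(grid$k, function(k)
      quantile(replicate(nrepcv, inner_cv(xtr, y[train], k, ninnerfolds))),
      numeric(5)))
    colnames(q) <- c("Min", "Q1", "Median", "Q3", "Max")
    best <- which.max(q[, "Median"])
    # refit with the best k on the whole outer training part, score the test fold
    pred <- knn_prob(xtr, y[train], x[test, , drop = FALSE], grid$k[best])
    list(indextrain = train, indextest = test,
         performance = auc(pred, y[test]),
         selectedparams = grid[best, , drop = FALSE],
         gridsearch = cbind(grid, q))
  })
}

# Usage on the same two-class iris subset as the example
set.seed(1)
keep <- iris$Species != "setosa"
x <- as.matrix(iris[keep, 1:4])
y <- as.numeric(iris$Species[keep] == "versicolor")
res <- nested_cv(x, y, grid = data.frame(k = c(1, 5, 15)),
                 nouterfolds = 3, ninnerfolds = 3, nrepcv = 2)
sapply(res, `[[`, "performance")
```

The outer performance is computed on items never used during the grid search of that fold, so it need not equal the median of the selected gridsearch row.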
library("rusranger")
set.seed(20220324)
iris <- subset(iris, Species != "setosa")
searchspace <- expand.grid(
mtry = c(2, 3),
num.trees = c(500, 1000)
)
## n(outer|inner) folds and nrepcv are too low for real world applications,
## and are just used for demonstration and to keep the run time of the examples
## low
nrcv_rusranger(
iris[-5], as.numeric(iris$Species == "versicolor"),
searchspace = searchspace, nouterfolds = 3, ninnerfolds = 3, nrepcv = 1
)
#> [[1]]
#> [[1]]$model
#> Ranger result
#>
#> Call:
#> ranger(x = as.data.frame(x), y = y, probability = probability, classification = classification, min.node.size = min.node.size, replace = replace, case.weights = .caseweights(y, replace = replace), sample.fraction = .samplefraction(y), ..., keep.inbag = FALSE)
#>
#> Type: Probability estimation
#> Number of trees: 500
#> Sample size: 66
#> Number of independent variables: 4
#> Mtry: 3
#> Target node size: 10
#> Variable importance mode: none
#> Splitrule: gini
#> OOB prediction error (Brier s.): NaN
#>
#> [[1]]$indextrain
#> 21 22 23 24 25 26 27 28 29 210 211 212 213 214 215 216 217 218 219 220
#> 4 8 13 14 15 19 20 26 29 30 34 35 38 39 43 44 49 51 55 58
#> 221 222 223 224 225 226 227 228 229 230 231 232 233 234 31 32 33 34 35 36
#> 60 61 63 67 69 74 75 81 83 88 91 94 99 100 1 3 5 9 12 16
#> 37 38 39 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326
#> 18 22 25 27 28 36 37 47 48 50 62 65 70 71 73 76 77 82 84 85
#> 327 328 329 330 331 332
#> 89 90 92 96 97 98
#>
#> [[1]]$indextest
#> 11 12 13 14 15 16 17 18 19 110 111 112 113 114 115 116 117 118 119 120
#> 2 6 7 10 11 17 21 23 24 31 32 33 40 41 42 45 46 52 53 54
#> 121 122 123 124 125 126 127 128 129 130 131 132 133 134
#> 56 57 59 64 66 68 72 78 79 80 86 87 93 95
#>
#> [[1]]$prediction
#> [1] 0.0000000 0.0000000 0.0000000 0.0025000 0.0000000 0.0000000 0.9737500
#> [8] 0.1420357 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
#> [15] 0.0000000 0.0000000 0.0000000 0.9913333 1.0000000 1.0000000 1.0000000
#> [22] 0.0160000 1.0000000 0.9737500 1.0000000 1.0000000 0.9737500 0.9737500
#> [29] 1.0000000 0.5730190 1.0000000 1.0000000 0.9913333 1.0000000
#>
#> [[1]]$truth
#> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#>
#> [[1]]$performance
#> [1] 0.01557093
#>
#> [[1]]$selectedparams
#> mtry num.trees
#> 2 3 500
#>
#> [[1]]$gridsearch
#> mtry num.trees Min Q1 Median Q3 Max
#> 1 2 500 0.008333333 0.008333333 0.008333333 0.008333333 0.008333333
#> 2 3 500 0.017857143 0.017857143 0.017857143 0.017857143 0.017857143
#> 3 2 1000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000
#> 4 3 1000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000
#>
#> [[1]]$nouterfolds
#> [1] 3
#>
#> [[1]]$ninnerfolds
#> [1] 3
#>
#> [[1]]$nrepcv
#> [1] 1
#>
#>
#> [[2]]
#> [[2]]$model
#> Ranger result
#>
#> Call:
#> ranger(x = as.data.frame(x), y = y, probability = probability, classification = classification, min.node.size = min.node.size, replace = replace, case.weights = .caseweights(y, replace = replace), sample.fraction = .samplefraction(y), ..., keep.inbag = FALSE)
#>
#> Type: Probability estimation
#> Number of trees: 1000
#> Sample size: 66
#> Number of independent variables: 4
#> Mtry: 2
#> Target node size: 10
#> Variable importance mode: none
#> Splitrule: gini
#> OOB prediction error (Brier s.): NaN
#>
#> [[2]]$indextrain
#> 11 12 13 14 15 16 17 18 19 110 111 112 113 114 115 116 117 118 119 120
#> 2 6 7 10 11 17 21 23 24 31 32 33 40 41 42 45 46 52 53 54
#> 121 122 123 124 125 126 127 128 129 130 131 132 133 134 31 32 33 34 35 36
#> 56 57 59 64 66 68 72 78 79 80 86 87 93 95 1 3 5 9 12 16
#> 37 38 39 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326
#> 18 22 25 27 28 36 37 47 48 50 62 65 70 71 73 76 77 82 84 85
#> 327 328 329 330 331 332
#> 89 90 92 96 97 98
#>
#> [[2]]$indextest
#> 21 22 23 24 25 26 27 28 29 210 211 212 213 214 215 216 217 218 219 220
#> 4 8 13 14 15 19 20 26 29 30 34 35 38 39 43 44 49 51 55 58
#> 221 222 223 224 225 226 227 228 229 230 231 232 233 234
#> 60 61 63 67 69 74 75 81 83 88 91 94 99 100
#>
#> [[2]]$prediction
#> [1] 0.045900000 0.494400000 0.056433333 0.007100794 0.000250000 0.090016667
#> [7] 0.026650000 0.001555556 0.007411905 0.003250000 0.776325397 0.004017460
#> [13] 0.056433333 0.000250000 0.013533333 0.045900000 0.026400000 0.988466667
#> [19] 1.000000000 0.987714286 1.000000000 0.988466667 0.992714286 0.983325397
#> [25] 1.000000000 0.919846032 0.981514286 1.000000000 1.000000000 0.983325397
#> [31] 0.992714286 0.981514286 0.988466667 0.951325397
#>
#> [[2]]$truth
#> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#>
#> [[2]]$performance
#> [1] 0
#>
#> [[2]]$selectedparams
#> mtry num.trees
#> 3 2 1000
#>
#> [[2]]$gridsearch
#> mtry num.trees Min Q1 Median Q3 Max
#> 1 2 500 0.033333333 0.033333333 0.033333333 0.033333333 0.033333333
#> 2 3 500 0.020661157 0.020661157 0.020661157 0.020661157 0.020661157
#> 3 2 1000 0.057851240 0.057851240 0.057851240 0.057851240 0.057851240
#> 4 3 1000 0.008547009 0.008547009 0.008547009 0.008547009 0.008547009
#>
#> [[2]]$nouterfolds
#> [1] 3
#>
#> [[2]]$ninnerfolds
#> [1] 3
#>
#> [[2]]$nrepcv
#> [1] 1
#>
#>
#> [[3]]
#> [[3]]$model
#> Ranger result
#>
#> Call:
#> ranger(x = as.data.frame(x), y = y, probability = probability, classification = classification, min.node.size = min.node.size, replace = replace, case.weights = .caseweights(y, replace = replace), sample.fraction = .samplefraction(y), ..., keep.inbag = FALSE)
#>
#> Type: Probability estimation
#> Number of trees: 500
#> Sample size: 68
#> Number of independent variables: 4
#> Mtry: 2
#> Target node size: 10
#> Variable importance mode: none
#> Splitrule: gini
#> OOB prediction error (Brier s.): NaN
#>
#> [[3]]$indextrain
#> 11 12 13 14 15 16 17 18 19 110 111 112 113 114 115 116 117 118 119 120
#> 2 6 7 10 11 17 21 23 24 31 32 33 40 41 42 45 46 52 53 54
#> 121 122 123 124 125 126 127 128 129 130 131 132 133 134 21 22 23 24 25 26
#> 56 57 59 64 66 68 72 78 79 80 86 87 93 95 4 8 13 14 15 19
#> 27 28 29 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226
#> 20 26 29 30 34 35 38 39 43 44 49 51 55 58 60 61 63 67 69 74
#> 227 228 229 230 231 232 233 234
#> 75 81 83 88 91 94 99 100
#>
#> [[3]]$indextest
#> 31 32 33 34 35 36 37 38 39 310 311 312 313 314 315 316 317 318 319 320
#> 1 3 5 9 12 16 18 22 25 27 28 36 37 47 48 50 62 65 70 71
#> 321 322 323 324 325 326 327 328 329 330 331 332
#> 73 76 77 82 84 85 89 90 92 96 97 98
#>
#> [[3]]$prediction
#> [1] 0.22633333 0.24873333 0.02433333 0.02433333 0.00000000 0.02633333
#> [7] 0.00000000 0.00000000 0.02433333 0.02633333 0.72173333 0.05220000
#> [13] 0.02633333 0.00000000 0.00000000 0.00000000 0.97300000 0.95382857
#> [19] 0.24197143 1.00000000 1.00000000 0.99100000 0.52140000 1.00000000
#> [25] 0.31340000 0.45557143 0.50225714 1.00000000 0.99040000 1.00000000
#> [31] 0.88473333 1.00000000
#>
#> [[3]]$truth
#> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#>
#> [[3]]$performance
#> [1] 0.0234375
#>
#> [[3]]$selectedparams
#> mtry num.trees
#> 1 2 500
#>
#> [[3]]$gridsearch
#> mtry num.trees Min Q1 Median Q3 Max
#> 1 2 500 0 0 0 0 0
#> 2 3 500 0 0 0 0 0
#> 3 2 1000 0 0 0 0 0
#> 4 3 1000 0 0 0 0 0
#>
#> [[3]]$nouterfolds
#> [1] 3
#>
#> [[3]]$ninnerfolds
#> [1] 3
#>
#> [[3]]$nrepcv
#> [1] 1
#>
#>
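The returned list can be checked and summarised with a few lines of base R. check_outer_folds() and outer_performance() are hypothetical helpers, not package functions; they rely only on the indextrain, indextest and performance elements of each outer fold.

```r
# Hypothetical helper: checks that the outer folds partition the rows 1..n.
check_outer_folds <- function(res, n) {
  test_all <- unlist(lapply(res, function(fold) fold$indextest), use.names = FALSE)
  # every row is a test item in exactly one outer fold
  stopifnot(anyDuplicated(test_all) == 0L, setequal(test_all, seq_len(n)))
  for (fold in res) {
    # training and test items of a fold are disjoint and together cover all rows
    stopifnot(
      length(intersect(fold$indextrain, fold$indextest)) == 0L,
      setequal(union(fold$indextrain, fold$indextest), seq_len(n))
    )
  }
  invisible(TRUE)
}

# Hypothetical helper: collects the outer-fold performances into one vector.
outer_performance <- function(res) {
  vapply(res, function(fold) fold$performance, numeric(1))
}
```

With the result of the example stored as res, check_outer_folds(res, nrow(iris)) should pass, and outer_performance(res) collects the three values printed above (0.01557093, 0 and 0.0234375), e.g. for summary() or median().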