Impute NA values with the logmean, mean, minimal or maximal reference value.

impute_df(x, limits, method = c("logmean", "mean", "min", "max"))

Arguments

x

data.frame with the columns "age" (numeric), "sex" (factor) and further user-defined numeric columns that should be imputed.

limits

data.frame, reference table, has to have the columns: "age", numeric (same units as age in x, e.g. days or years; an age of 0 matches all ages), "sex", factor (same levels for male and female as sex in x, plus a special level "both"), "param", character with the laboratory parameter names that have to match the column names in x, "lower" and "upper", numeric for the lower and upper reference limits.

method

character, imputation method. method = "logmean" (default) replaces all NA with the corresponding logged mean of the reference table limits (suitable for subsequent use of the zlog score; use method = "mean" for z score calculation). For method = "min" or method = "max" the lower or the upper limits are used, respectively.
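The value that each method fills in can be sketched in base R for a single pair of reference limits. This is only an illustration, not the package's implementation: impute_value is a hypothetical helper, "logmean" is assumed to be the back-transformed mean of the logged limits (which agrees with the imputed values shown in the Examples), and "mean" is assumed to be the arithmetic mean of the limits.

```r
# Hypothetical helper, not part of the package: the value that would be
# imputed for one pair of reference limits under each method.
impute_value <- function(lower, upper,
                         method = c("logmean", "mean", "min", "max")) {
    method <- match.arg(method)
    switch(method,
        # assumed: mean of the logged limits, back on the original scale
        logmean = exp((log(lower) + log(upper)) / 2),
        # assumed: arithmetic mean of the limits
        mean = (lower + upper) / 2,
        min = lower,
        max = upper
    )
}

impute_value(35, 52)          # default "logmean", about 42.66
impute_value(35, 52, "mean")  # 43.5
impute_value(2, 21, "min")    # 2
impute_value(2, 21, "max")    # 21
```

Under these assumptions the logmean is the geometric mean of the two limits, sqrt(lower * upper).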

Value

data.frame, the same as x but missing values are replaced by the corresponding logmean, mean, minimal or maximal reference values depending on the chosen method.

Note

Imputation should be done prior to the z()/zlog() transformation. Alternatively, the NA values could be replaced by zero after the transformation (equivalent to mean-imputation) via d[is.na(d)] <- 0.
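The equivalence of the two routes can be checked with a small base R sketch. The function z_sketch below is a hypothetical stand-in for z(), assuming that the reference limits are the 2.5 % and 97.5 % quantiles of a normal distribution; it is not the package's own code.

```r
# Hypothetical stand-in for z(): the limits are assumed to be the
# 2.5 % and 97.5 % quantiles of a normal distribution.
z_sketch <- function(x, lower, upper) {
    mu <- (lower + upper) / 2
    sigma <- (upper - lower) / (2 * qnorm(0.975))
    (x - mu) / sigma
}

alb <- c(42, NA, 38, NA, 50)
lower <- 35
upper <- 52

# Route 1: impute the mean of the limits first, then transform.
imputed <- alb
imputed[is.na(imputed)] <- (lower + upper) / 2
route1 <- z_sketch(imputed, lower, upper)

# Route 2: transform first, then replace the remaining NA by zero.
route2 <- z_sketch(alb, lower, upper)
route2[is.na(route2)] <- 0

all.equal(route1, route2)
```

Both routes give the same result because the imputed mean sits exactly at the centre of the reference interval, which maps to a z score of zero.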

Author

Sebastian Gibb

Examples

l <- data.frame(
    param = c("alb", "bili"),
    age = c(0, 0),
    sex = c("both", "both"),
    units = c("mg/l", "µmol/l"),
    lower = c(35, 2),
    upper = c(52, 21)
)
x <- data.frame(
    age = 40:48,
    sex = rep(c("female", "male"), c(5, 4)),
    # from Hoffmann et al. 2017
    alb = c(42, NA, 38, NA, 50, 42, 27, 31, 24),
    bili = c(11, 9, NA, NA, 22, 42, NA, 200, 20)
)
impute_df(x, l)
#>   age    sex      alb       bili
#> 1  40 female 42.00000  11.000000
#> 2  41 female 42.66146   9.000000
#> 3  42 female 38.00000   6.480741
#> 4  43 female 42.66146   6.480741
#> 5  44 female 50.00000  22.000000
#> 6  45   male 42.00000  42.000000
#> 7  46   male 27.00000   6.480741
#> 8  47   male 31.00000 200.000000
#> 9  48   male 24.00000  20.000000
impute_df(x, l, method = "min")
#>   age    sex alb bili
#> 1  40 female  42   11
#> 2  41 female  35    9
#> 3  42 female  38    2
#> 4  43 female  35    2
#> 5  44 female  50   22
#> 6  45   male  42   42
#> 7  46   male  27    2
#> 8  47   male  31  200
#> 9  48   male  24   20
zlog_df(impute_df(x, l), l)
#>   age    sex        alb      bili
#> 1  40 female -0.1547222 0.8819855
#> 2  41 female  0.0000000 0.5474516
#> 3  42 female -1.1456903 0.0000000
#> 4  43 female  0.0000000 0.0000000
#> 5  44 female  1.5716234 2.0375165
#> 6  45   male -0.1547222 3.1154950
#> 7  46   male -4.5294925 0.0000000
#> 8  47   male -3.1616084 5.7172179
#> 9  48   male -5.6957115 1.8786269