This dataset includes laboratory diagnostics (complete blood count without differential, C-reactive protein and procalcitonin) for patients admitted to the University Hospital Leipzig from 2014 to 2019 (Training set) and from 2020 to 2021 (Validation set). For external validation, the same laboratory values were taken from 2015 to 2020 at the University Hospital Greifswald.
sbcdata
A data.table with 18 columns/variables:
Id: integer, identification number of the case, unique within each center.
Age: integer, age of the patient in years.
Sex: character, "W" for female and "M" for male.
Diagnosis: character, one of "Control", "SIRS" or "Sepsis". See below for details.
Center: character, center, one of "Greifswald" or "Leipzig".
Set: character, data set, one of "Training" or "Validation".
Sender: character, sender/origin that sent the blood sample to the laboratory. See sendercodes for a description of all possible codes.
Episode: integer, counter for episodes on intensive care units; it is incremented by one after each discharge from an intensive care unit.
Time: integer, (relative) time when the blood was analysed. The first time point for each case is always set to zero.
TargetIcu: character, the name/type of the intensive care unit the patient/case is admitted to. See sendercodes for a description of all possible codes.
SecToIcu: integer, time in seconds until the patient/case is admitted to the TargetIcu intensive care unit. This time is negative if the patient/case is already on the intensive care unit TargetIcu.
CRP: double, C-reactive protein in mg/l.
HGB: double, hemoglobin in mmol/l.
MCV: double, mean corpuscular volume in fl.
PCT: double, procalcitonin in ng/ml.
PLT: double, platelets in Gpt/l.
RBC: double, red blood cell count in Tpt/l.
WBC: double, white blood cell count in Gpt/l.
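The sign convention of SecToIcu can be illustrated with a minimal base R sketch. It assumes that the time of a blood sample and the ICU admission time are both given in seconds on the same (relative) time axis; `sec_to_icu()` is a hypothetical helper for illustration, not part of the package.

```r
## Hypothetical helper (not part of sbcdata): seconds from a blood sample
## until ICU admission. Assumes both times are in seconds on the same axis.
sec_to_icu <- function(time, admission) admission - time

admission <- 7200                  # admission 2 h after the first sample
time <- c(0, 3600, 7200, 10800)    # analysis times of four blood samples
sec_to_icu(time, admission)
#> [1]  7200  3600     0 -3600
```

Samples analysed before admission get a positive value, the sample at admission gets zero, and samples analysed while the patient is already on the ICU get a negative value.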
The Diagnosis is based on ICD-10-GM codes. "Sepsis" was assigned for any of the following codes:
A02.1
A20.7
A22.7
A24.1
A26.7
A32.7
A39.2, A39.3, A39.4
A40.0, A40.1, A40.2, A40.3, A40.8, A40.9
A41.0, A41.1, A41.2, A41.3, A41.4, A41.51, A41.52, A41.58, A41.8, A41.9
A42.7
B37.7
R57.2
R65.1
If a case had an R65.x code (except R65.2) but none of the sepsis-related codes above, the Diagnosis "SIRS" was used.
Everything else is labeled as "Control".
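The labeling rule above can be written as a small R function. This is a sketch of the rule as stated, not the code used to build the dataset: `sepsis_codes` and `label_diagnosis()` are hypothetical names, and the sketch assumes that all ICD-10-GM codes of one case are available as a character vector in dotted notation (e.g. "A41.9").

```r
## Hypothetical lookup table: the sepsis-related ICD-10-GM codes listed above.
sepsis_codes <- c(
    "A02.1", "A20.7", "A22.7", "A24.1", "A26.7", "A32.7",
    "A39.2", "A39.3", "A39.4",
    "A40.0", "A40.1", "A40.2", "A40.3", "A40.8", "A40.9",
    "A41.0", "A41.1", "A41.2", "A41.3", "A41.4", "A41.51", "A41.52",
    "A41.58", "A41.8", "A41.9",
    "A42.7", "B37.7", "R57.2", "R65.1"
)

## Hypothetical helper: derive the Diagnosis label of one case from its codes.
label_diagnosis <- function(codes) {
    if (any(codes %in% sepsis_codes)) {
        "Sepsis"                                   # any sepsis-related code
    } else if (any(startsWith(codes, "R65.") & codes != "R65.2")) {
        "SIRS"                                     # R65.x, except R65.2
    } else {
        "Control"                                  # everything else
    }
}

label_diagnosis(c("I10", "A41.9"))   # "Sepsis"
label_diagnosis("R65.0")             # "SIRS"
label_diagnosis(c("R65.2", "I10"))   # "Control"
```

The sepsis check comes first, so a case with both R65.x and a sepsis-related code is labeled "Sepsis".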
For the Center "Greifswald" there are a few entries with duplicated time points (Time) for the same Id and Sender but different laboratory values. This happens when multiple blood samples from the same patient are analysed at the same time in the same run of the analyser.
Because it cannot be decided which one is the correct/better one, removing these entries is suggested. An example can be found below.
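Why the removal example combines two calls to `duplicated()` may be easier to see on a tiny made-up table (the values are illustrative only, not taken from sbcdata). Base R is used so the snippet runs without data.table; the column names mirror the documented ones.

```r
## Toy data, made up for illustration: case 1 has two rows with the same Time.
d <- data.frame(
    Id   = c(1L, 1L, 1L, 2L),
    Time = c(0, 3600, 3600, 0),
    WBC  = c(8.4, 9.1, 12.3, 6.2)
)

## duplicated() alone flags only the later occurrence; adding the
## fromLast = TRUE pass also flags the earlier one, so all conflicting rows
## are removed instead of arbitrarily keeping one of them.
key <- d[, c("Id", "Time")]
dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
dup
#> [1] FALSE  TRUE  TRUE FALSE
d[!dup, ]
```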
In the Center "Greifswald" the sender codes are not as detailed as in "Leipzig". That's why "OTHER" occurs more often and "AMB" less often than in "Leipzig".
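One way to compare the sender codes between centers is to tabulate their share within each center. The sketch below uses base R on a made-up toy table (not real sbcdata counts); with the package data attached, the analogous call would be `prop.table(table(sbcdata$Center, sbcdata$Sender), margin = 1)`.

```r
## Toy data, made up for illustration only (not real sbcdata counts).
d <- data.frame(
    Center = rep(c("Greifswald", "Leipzig"), each = 4),
    Sender = c("OTHER", "OTHER", "AMB", "GEN",
               "AMB", "AMB", "GEN", "OTHER")
)

## Share of each sender code within each center (each row sums to 1).
shares <- prop.table(table(d$Center, d$Sender), margin = 1)
round(shares, 2)
```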
At the Center "Greifswald" the admission/discharge time point was recorded in, and extracted from, the clinical information system. These data were not available for the Center "Leipzig". There, the first/last blood sample taken on an intensive care unit was used as the time point for admission/discharge (this blood sample is not necessarily part of the dataset).
That's why the first blood sample on an intensive care unit in "Leipzig" can have a SecToIcu of zero, in contrast to a negative value for Center "Greifswald". If needed, this can be harmonized by subtracting the SecToIcu of the first blood sample on an intensive care unit from all SecToIcu values of the same Episode in the "Greifswald" and/or "Leipzig" subset.
An example can be found below.
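The harmonization step can be shown on a single made-up case with two ICU episodes. This base R sketch looks up the first ICU sample of each Id/Episode pair with `match()` instead of the data.table grouping used in the example code; the toy values and the `isIcuAdmission` flag are illustrative assumptions.

```r
## Toy case, made up for illustration: two ICU episodes; isIcuAdmission marks
## the first blood sample on the ICU in each episode.
d <- data.frame(
    Id             = c(1L, 1L, 1L, 1L, 1L),
    Episode        = c(1L, 1L, 1L, 2L, 2L),
    SecToIcu       = c(-600, -4200, -7800, -300, -3900),
    isIcuAdmission = c(TRUE, FALSE, FALSE, TRUE, FALSE)
)

## For every row, find the SecToIcu of the first ICU sample of its episode
## and subtract it, so that this sample ends up with SecToIcu == 0.
key    <- paste(d$Id, d$Episode)
adm    <- d$isIcuAdmission
offset <- d$SecToIcu[adm][match(key, key[adm])]
d$SecToIcu <- d$SecToIcu - offset
d$SecToIcu
#> [1]     0 -3600 -7200     0 -3600
```

Episodes without a flagged first ICU sample get `NA` from `match()` and therefore an `NA` SecToIcu.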
In a few cases there is a mismatch between the time point of admission/discharge extracted from the clinical information system at Center "Greifswald" and the Sender entry: the Sender can be a non-ICU ward while the SecToIcu time is negative. According to the admission data the patient was already on an ICU, but the laboratory order came from a non-ICU ward. Usually this time difference is only a few minutes (the blood sample was taken on the non-ICU ward, but the analysis in the laboratory started after the transfer to the ICU).
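Such rows can be flagged by combining a non-ICU Sender with a negative SecToIcu. The base R sketch below uses made-up rows; "ICU" is a hypothetical sender code, and treating every code that contains "ICU" as an intensive care unit is an assumption borrowed from the `grepl("ICU", Sender)` test in the example code.

```r
## Toy rows, made up for illustration; "ICU" is a hypothetical sender code.
d <- data.frame(
    Id       = c(1L, 1L, 2L),
    Sender   = c("GEN", "ICU", "GEN"),
    SecToIcu = c(-120, -900, 3600)
)

## Non-ICU sender although SecToIcu < 0, i.e. according to the admission
## data the patient was already on the ICU when the sample was analysed.
mismatch <- !grepl("ICU", d$Sender) & !is.na(d$SecToIcu) & d$SecToIcu < 0
d[mismatch, ]

## size of the discrepancy in minutes
-d$SecToIcu[mismatch] / 60
#> [1] 2
```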
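## remove duplicate laboratory entries (see text above for explanation)
library("data.table")
greifswald <- subset(sbcdata, Center == "Greifswald")
## flag all rows whose Id/Time combination occurs more than once
## (fromLast = TRUE also marks the first of the conflicting rows)
dup <- duplicated(greifswald[, .(Id, Time)]) |
    duplicated(greifswald[, .(Id, Time)], fromLast = TRUE)
## fraction of affected rows
mean(dup)
#> [1] 0.0008293431
greifswald <- greifswald[!dup, ]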
## adjust SecToIcu for subset Greifswald (see text above for explanation)
greifswald <- subset(sbcdata, Center == "Greifswald")
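## create helper columns
## isNewWard: same case as the previous row but a different Sender
## (this relies on the rows being ordered by Id and Time)
greifswald[, isNewWard := (
    c(FALSE, Id[-1] == Id[-.N]) &         # same case
    c(FALSE, Sender[-1] != Sender[-.N])   # new ward
)]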
#> Id Age Sex Diagnosis Center Set Sender Episode
#> <int> <int> <char> <char> <char> <char> <char> <int>
#> 1: 1 25 W Control Greifswald Validation AMB 1
#> 2: 2 75 M Control Greifswald Validation GEN 1
#> 3: 3 77 W Sepsis Greifswald Validation OTHER 1
#> 4: 3 77 W Sepsis Greifswald Validation OTHER 1
#> 5: 3 77 W Sepsis Greifswald Validation OTHER 1
#> ---
#> 665583: 169140 56 M Control Greifswald Validation GEN 1
#> 665584: 169140 56 M Control Greifswald Validation GEN 1
#> 665585: 169140 56 M Control Greifswald Validation GEN 1
#> 665586: 169141 60 W Control Greifswald Validation ED 1
#> 665587: 169141 60 W Control Greifswald Validation GEN 1
#> Time TargetIcu SecToIcu CRP HGB MCV PCT PLT RBC WBC
#> <num> <char> <num> <num> <num> <num> <num> <int> <num> <num>
#> 1: 0 <NA> NA 15.5 7.0 80.5 NA 264 4.2 8.40
#> 2: 0 <NA> NA 7.4 8.4 87.9 NA 260 4.8 8.47
#> 3: 0 <NA> NA 96.1 4.8 81.7 NA 385 3.0 13.20
#> 4: 318840 <NA> NA 57.0 4.4 82.2 NA 416 2.8 14.20
#> 5: 578640 <NA> NA 93.4 5.7 82.0 0.22 437 3.5 13.80
#> ---
#> 665583: 118380 <NA> NA NA 8.7 88.1 NA 200 4.7 6.20
#> 665584: 168660 <NA> NA 95.0 8.7 88.4 NA 233 4.7 6.92
#> 665585: 340440 <NA> NA 63.6 8.1 87.6 NA 225 4.5 3.70
#> 665586: 0 <NA> NA NA 9.1 90.0 NA 337 4.8 10.80
#> 665587: 70740 <NA> NA 4.7 9.7 91.7 NA 371 5.2 11.60
#> Excluded Label isNewWard
#> <lgcl> <char> <lgcl>
#> 1: FALSE Control FALSE
#> 2: FALSE Control FALSE
#> 3: TRUE Control FALSE
#> 4: TRUE Control FALSE
#> 5: TRUE Control FALSE
#> ---
#> 665583: FALSE Control FALSE
#> 665584: FALSE Control FALSE
#> 665585: FALSE Control FALSE
#> 665586: FALSE Control FALSE
#> 665587: FALSE Control TRUE
## isIcuAdmission: first blood sample after moving to a ward whose code contains "ICU"
greifswald[, isIcuAdmission := isNewWard & grepl("ICU", Sender)]
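## recalculate SecToIcu relative to the first blood sample of each ICU episode;
## grouping by Id ensures that each case only uses its own admission rows
greifswald[, SecToIcu := SecToIcu - SecToIcu[isIcuAdmission][Episode], by = Id]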
## drop helper columns
greifswald[, `:=` (isNewWard = NULL, isIcuAdmission = NULL)]