This approach reports a contradiction if impossible combinations of data are observed in one participant. For example, if the age of a participant is recorded repeatedly, the value of age is (unfortunately) not able to decline. Most contradiction checks rest on the comparison of two variables.
Importantly, each value used in a comparison may be admissible on its own, while the combination of the two values is considered impossible. The approach does not consider implausible or inadmissible values.
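The idea of comparing two variables can be sketched in a few lines of base R. This is only an illustration of the age example, not the dataquieR implementation; the data frame visits and its columns age_baseline and age_followup are hypothetical.

```r
# Toy data: names and values are illustrative and not taken from dataquieR.
visits <- data.frame(
  id = c("A", "B", "C"),
  age_baseline = c(49, 47, 56),
  age_followup = c(51, 45, 56)
)

# Each value alone is admissible; only the combination can be impossible.
# Rule: age at follow-up must not be lower than age at baseline.
visits$age_declined <- visits$age_followup < visits$age_baseline

# IDs of participants violating the rule
visits$id[visits$age_declined]
```

Here only participant B is flagged, although both recorded ages would pass a check of each value on its own.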
ALGORITHM OF THIS IMPLEMENTATION:
CONTRADICTIONS
Data from the package dataquieR are loaded as shown below:
load(system.file("extdata", "study_data.RData", package = "dataquieR"))
sd1 <- study_data
This example study data set has N = 3000 observations. Study data variables have abstract, non-interpretable names; appropriate labels must be mapped from the metadata. Nonetheless, the study data comprise the following characteristics:
v00000 | v00001 | v00002 | v00003 | v00004 | v00005 | v01003 | v01002 | v00103 | v00006 |
---|---|---|---|---|---|---|---|---|---|
3 | LEIIX715 | 0 | 49 | 127 | 77 | 49 | 0 | 40-49 | 3.8 |
1 | QHNKM456 | 0 | 47 | 114 | 76 | 47 | 0 | 40-49 | 1.9 |
1 | HTAOB589 | 0 | 50 | 114 | 71 | 50 | 0 | 50-59 | 0.8 |
5 | HNHFV585 | 0 | 48 | 120 | 65 | 48 | 0 | 40-49 | 3.8 |
1 | UTDLS949 | 0 | 56 | 119 | 78 | 56 | 0 | 50-59 | 4.1 |
5 | YQFGE692 | 1 | 47 | 133 | 81 | 47 | 1 | 40-49 | 9.5 |
1 | AVAEH932 | 0 | 53 | 114 | 78 | 53 | 0 | 50-59 | 5.0 |
3 | QDOPT378 | 1 | 48 | 116 | 86 | 48 | 1 | 40-49 | 9.6 |
3 | BMOAK786 | 0 | 44 | 115 | 71 | 44 | 0 | 40-49 | 2.0 |
5 | ZDKNF462 | 0 | 50 | 116 | 74 | 50 | 0 | 50-59 | 2.4 |
The corresponding metadata from the package dataquieR are loaded as shown below:
load(system.file("extdata", "meta_data.RData", package = "dataquieR"))
md1 <- meta_data
Information corresponding to the study data is kept in the table of static metadata, which also attaches an interpretable label to each variable. Besides the data type and label of every variable, further expected characteristics are stored in the metadata.
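How labels might be mapped onto the abstract variable names can be sketched with base R alone. The two small data frames below are hypothetical excerpts; they assume only the metadata columns VAR_NAMES and LABEL and the two labels SEX_0 and AGE_0 that appear in this document.

```r
# Hypothetical excerpts, not the full study data or metadata.
study_excerpt <- data.frame(v00002 = c(0, 0, 1), v00003 = c(49, 47, 47))
meta_excerpt <- data.frame(
  VAR_NAMES = c("v00002", "v00003"),
  LABEL = c("SEX_0", "AGE_0")
)

# Look up each column name in VAR_NAMES and replace it with its LABEL.
idx <- match(names(study_excerpt), meta_excerpt$VAR_NAMES)
found <- !is.na(idx)
labelled <- study_excerpt
names(labelled)[found] <- as.character(meta_excerpt$LABEL[idx[found]])

names(labelled)
```

Columns without a metadata entry keep their original names, so the mapping does not fail on partially documented data.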
Regarding the following implementation, the columns CONTRADICTIONS as well as MISSING_LIST, VALUE_LABELS, and HARD_LIMITS in the metadata are particularly relevant.
The column CONTRADICTIONS contains only IDs for explicit contradictions. These can be defined directly in the metadata, but we recommend using the associated ShinyApp (Chang et al. 2018, Potter et al. 2016). See also [Definition of contradictions].
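Several of these metadata columns pack more than one entry into a single string. As a rough sketch of how such a cell could be read, the following base R code splits a VALUE_LABELS entry into codes and labels. It assumes that entries are separated by "|" and that each entry has the form "code = label"; it is not the parsing code used by dataquieR.

```r
# Assumption: entries are separated by "|" and each entry reads "code = label".
value_labels <- "0 = females | 1 = males"

entries <- trimws(unlist(strsplit(value_labels, "|", fixed = TRUE)))

# Split each entry at the first "=" only, so labels may themselves contain "=".
codes  <- trimws(sub("=.*$", "", entries))
labels <- trimws(sub("^[^=]*=", "", entries))

# Named lookup vector: code -> label
lookup <- setNames(labels, codes)
lookup[["1"]]
```

The same splitting step could be applied to other multi-entry cells, for example a MISSING_LIST with several missing codes.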
| | VAR_NAMES | LABEL | MISSING_LIST | VALUE_LABELS | HARD_LIMITS | CONTRADICTIONS |
|---|---|---|---|---|---|---|
| 3 | v00002 | SEX_0 | NA | 0 = females \| 1 = males | NA | 1002 |
| 4 | v00003 | AGE_0 | NA | NA | [18;Inf) | … |
| 16 | v00010 | ARM_CUFF_0 | 99980 \| 99987 | 1 = (-Inf,20] \| 2 = (20,30] \| 3 = (30, Inf] | [1;3] | 1009 |
| 19 | v00013 | EXAM_DT_0 | NA | NA | [2018-01-01 00:00:00 CET;) | … |

  a_lev <- unlist(strsplit(a_lev, SPLIT_CHAR, fixed = TRUE))
  a_lev <- trimws(a_lev)
}
# Never called, just for documentation.
return(list( # nocov start
  FlaggedStudyData = summary_df1,
  SummaryTable = st1,
  SummaryData = summary_df2,
  SummaryPlot = p
)) # nocov end
}
OUTPUT
Output 1: FlaggedStudyData
This implementation returns four objects. The data frame FlaggedStudyData flags each observation in the study data that has one or more contradictions between different variables. For each check applied to the variables, an additional column (named with the ID of the check) is added. The object can be accessed via AnyContradictions$FlaggedStudyData.
Output 2: Summary table 1
The second output of the contradiction function is a data frame which summarizes the number of contradictions for each variable that has been examined. This object is primarily used by the dataquieR-function
Output 3: Summary table 2
The third output summarizes this information in a similar way but also names the applied checks. This output can be used to provide an executive overview of the number of contradictions.
Output 4: Summary plot
The fourth output visualizes the summarized information of outputs 2 and 3.
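To make the relation between the flagged data and the summaries more concrete, the following base R sketch derives per-check counts from a flagged data frame. The object flagged is a hypothetical stand-in with one logical column per check ID; the actual column layout of FlaggedStudyData may differ.

```r
# Hypothetical flagged data: one logical column per check ID.
flagged <- data.frame(
  id = c("P1", "P2", "P3", "P4"),
  `1002` = c(FALSE, TRUE, FALSE, TRUE),
  `1009` = c(FALSE, FALSE, FALSE, TRUE),
  check.names = FALSE
)
check_cols <- c("1002", "1009")

# Number and percentage of contradictions per check
per_check <- data.frame(
  check = check_cols,
  n = colSums(flagged[check_cols]),
  percent = 100 * colMeans(flagged[check_cols]),
  row.names = NULL
)

# Observations with one or more contradictions
flagged$any_contradiction <- rowSums(flagged[check_cols]) > 0
```

In this toy example, check 1002 is violated by two of four observations and check 1009 by one, while two observations carry at least one contradiction.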
INTERPRETATION
Any contradiction in the study data should be resolved by appropriate data curation steps.
Concept relations
Chang, W., Cheng, J., Allaire, J., Xie, Y., McPherson, J., et al. (2018). Shiny: Web application framework for R, 2015. R Package Version 1, 14.
Potter, G., Wong, J., Alcaraz, I., Chi, P., et al. (2016). Web application teaching tools for statistics using R and Shiny. Technology Innovations in Statistics Education 9.