The data type mismatch indicator can be calculated using int_datatype_matrix in the following way:

# Load dataquieR
library(dataquieR)

# Load data
sd1 <- prep_get_data_frame("ship")
# sd1 <- as.data.frame(sd1) # "untibble" it 

# Load metadata
file_name <- system.file("extdata", "ship_meta_v2.xlsx", package = "dataquieR")
prep_load_workbook_like_file(file_name)
meta_data_item <- prep_get_data_frame("item_level") # item_level is a sheet in ship_meta_v2.xlsx

# Apply indicator function
datatype_res <- int_datatype_matrix(
  study_data = sd1, 
  meta_data = meta_data_item, 
  label_col = "LONG_LABEL"
)
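Conceptually, a datatype check compares the storage type observed for each study data column with the type expected in the metadata. The following is a minimal sketch of that idea in base R only. It is not dataquieR's implementation, whose rules may be more nuanced, and the toy data frame, the `expected` vector and the `type_checks` list are hypothetical names introduced for illustration.

```r
# Hypothetical toy study data: waist is stored as character by mistake
toy_data <- data.frame(
  age   = c(34L, 51L, 47L),
  waist = c("85.5", "92,3", "101.0"),
  stringsAsFactors = FALSE
)

# Hypothetical expected types, mimicking a datatype column in the metadata
expected <- c(age = "integer", waist = "float")

# Map each expected type to a predicate on the observed column (assumption)
type_checks <- list(
  integer = is.integer,
  float   = is.double,
  string  = is.character
)

# TRUE where the observed storage type matches the expectation
matches <- vapply(
  names(expected),
  function(v) type_checks[[expected[[v]]]](toy_data[[v]]),
  logical(1)
)
matches
```

In this sketch, `waist` is flagged because a character column does not satisfy the predicate for the expected type float.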

A plot and a table are provided to view the results:

datatype_res$SummaryPlot

datatype_res$SummaryData
   Variables                       MATCH                  STUDY_SEGMENT
22 Participant ID                  Matching datatype      INTRO
14 Examination date and time       Matching datatype      INTRO
23 Sex                             Matching datatype      INTRO
 1 Age                             Matching datatype      INTRO
 4 Blood pressure examiner         Matching datatype      SOMATOMETRY
 3 Blood pressure device ID        Matching datatype      SOMATOMETRY
26 Systolic blood pressure 1       Matching datatype      SOMATOMETRY
27 Systolic blood pressure 2       Matching datatype      SOMATOMETRY
 9 Diastolic blood pressure 1      Matching datatype      SOMATOMETRY
10 Diastolic blood pressure 2      Matching datatype      SOMATOMETRY
25 Somatometry examiner            Matching datatype      SOMATOMETRY
 5 Body height                     Matching datatype      SOMATOMETRY
 6 Body height scale ID            Matching datatype      SOMATOMETRY
 7 Body weight                     Matching datatype      SOMATOMETRY
 8 Body weight scale ID            Matching datatype      SOMATOMETRY
29 Waist circumference             Non-matching datatype  SOMATOMETRY
17 Interview examiner              Matching datatype      INTERVIEW
16 Highest educational level       Matching datatype      INTERVIEW
20 Marital status                  Matching datatype      INTERVIEW
24 Smoking status                  Matching datatype      INTERVIEW
12 Ever had stroke                 Matching datatype      INTERVIEW
11 Ever had myocardial infarction  Matching datatype      INTERVIEW
18 Known diabetes                  Matching datatype      INTERVIEW
 2 Age of diabetes onset           Matching datatype      INTERVIEW
13 Ever taken birth control pills  Matching datatype      INTERVIEW
21 Monthly household income        Matching datatype      INTERVIEW
15 HDL-cholesterol                 Matching datatype      LABORATORY
19 LDL-cholesterol                 Matching datatype      LABORATORY
28 Total cholesterol               Matching datatype      LABORATORY
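With many variables, it can help to reduce the table to the rows that need attention. The sketch below filters a small stand-in data frame with the same three columns as the table above; `summary_toy` is a hypothetical object, and applying the same `subset()` call to datatype_res$SummaryData assumes that its columns are named exactly as printed.

```r
# Hypothetical stand-in with the column layout of the table above
summary_toy <- data.frame(
  Variables     = c("Body weight", "Waist circumference", "Sex"),
  MATCH         = c("Matching datatype", "Non-matching datatype",
                    "Matching datatype"),
  STUDY_SEGMENT = c("SOMATOMETRY", "SOMATOMETRY", "INTRO"),
  stringsAsFactors = FALSE
)

# Keep only the variables whose observed datatype differs from the metadata
mismatches <- subset(summary_toy, MATCH == "Non-matching datatype")
mismatches
```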


All datatype issues found by int_datatype_matrix should be checked data element by data element. For instance, a major issue was found in the variable WAIST_CIRC_0 (waist circumference, column waist in the study data). This variable is stored in the study data with datatype character, which differs from the expected datatype float defined in the metadata. A basic check of the characters it contains shows that some values use a comma as the decimal delimiter.

int_inspect_char(sd1$waist)
Character Count
, 3
. 2144
0 933
1 1443
2 908
3 884
4 889
5 898
6 1018
7 1279
8 1355
9 1409
NA 3
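A comparable character tally can be sketched in base R, which also makes it easy to list the offending entries themselves. This is only an illustration on hypothetical toy values, not the implementation behind the output above; the vector `x` and the regular expression are assumptions.

```r
# Hypothetical toy values; in the tutorial the real column is sd1$waist
x <- c("85.5", "92,3", "101.0", NA)

# Tally every single character across all non-missing entries
char_counts <- table(unlist(strsplit(x[!is.na(x)], "", fixed = TRUE)))
char_counts

# List the entries containing anything other than digits and a decimal point
suspicious <- grep("[^0-9.]", x, value = TRUE)
suspicious
```

Listing the suspicious entries before changing anything helps confirm that the comma is the only unexpected character.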


Directly converting WAIST_CIRC_0 to datatype numeric would coerce the values containing a comma to NA, which should be avoided. Hence, we first replace the comma with the correct delimiter and then convert the datatype, so that no data values are lost. The resulting applicability plot shows no more issues.

# replace comma with the correct delimiter
sd1$waist <- as.numeric(gsub(",", ".", sd1$waist))

int_datatype_matrix(
  study_data = sd1, 
  meta_data = meta_data_item, 
  label_col = "LONG_LABEL"
)$SummaryPlot
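The difference between converting directly and replacing the delimiter first can be checked on a small example. The sketch below uses a hypothetical toy vector, not the ship data, and counts how many values each approach turns into NA beyond those already missing.

```r
# Hypothetical toy vector mimicking the waist values
waist_chr <- c("85.5", "92,3", "101.0", NA)

# Direct conversion: the comma-delimited entry becomes NA (R warns about this)
direct <- suppressWarnings(as.numeric(waist_chr))

# Replace the comma first, then convert
fixed <- as.numeric(gsub(",", ".", waist_chr, fixed = TRUE))

# Values lost by each approach, beyond the NA already present
lost_direct <- sum(is.na(direct)) - sum(is.na(waist_chr))
lost_fixed  <- sum(is.na(fixed)) - sum(is.na(waist_chr))
c(direct = lost_direct, fixed = lost_fixed)
```

Comparing the NA counts before and after a conversion is a simple safeguard that can be applied to the real column in the same way.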