Evaluating data quality involves different steps depending on the complexity of the study data. In large observational studies, for instance, data may be collected across several databases, and participation may vary across the many examinations conducted. Depending on the formal data structure, different analysis options may apply. To illustrate such cases, we use data from the baseline examination of the Study of Health in Pomerania, START cohort (SHIP-START-0, 1997-2001). For further information on SHIP, see Völzke et al. 2010. The example data and metadata are available here. See the introductory tutorial for instructions on importing these files into R, as well as details on their structure and contents.

Reporting data quality for SHIP can be divided into:

Integrity
- Structural data set error
- Data set combination error
  - Data record mismatch
  - Data element mismatch
- Value format error
  - Data type mismatch
  - Inhomogeneous value formats
  - Uncertain missingness status
Completeness
- Crude missingness
  - Missing values
- Qualified missingness
  - Missing due to specified reason
  - Missingness rates (non-response, refusal, and drop-out)
Consistency
- Range and value violations
  - Inadmissible or uncertain numerical or time-date values
  - Inadmissible categorical values
  - Inadmissible standardized vocabulary
  - Inadmissible precision
- Contradictions
  - Logical or empirical contradictions
Accuracy
- Unexpected distributions
- Unexpected associations
  - Unexpected association strength
  - Unexpected association direction
  - Unexpected association form
- Disagreement of repeated measurements
  - Intra-class reliability
  - Inter-class reliability
  - Disagreement with gold standard

See the metadata tutorial for an overview of the needed metadata in each case.
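
To make a few of the indicators listed above more concrete, the following minimal sketch computes crude missingness, missingness by specified reason, and inadmissible numerical values for a single variable. It is written in plain Python for illustration only and is not the R workflow used in the tutorials. The records are toy values, not SHIP data, and the missing code `99980`, its label, the limits, and the function name `summarize` are all hypothetical. Here, crude missingness is assumed to mean the share of records without a usable measurement, whether empty or missing-coded.

```python
# Toy records, NOT SHIP data. None marks an empty cell; 99980 is a
# hypothetical missing code standing for "participant refused".
MISSING_CODES = {99980: "refusal"}
HARD_LIMITS = (0, 300)  # hypothetical admissible range for the measurement

values = [120, None, 99980, 135, 412, 128, None, 99980]


def summarize(values, missing_codes, limits):
    """Return simple completeness and consistency summaries for one variable."""
    low, high = limits
    n = len(values)

    # Completeness: empty cells and cells holding a missing code.
    empty = sum(v is None for v in values)
    by_reason = {}
    for v in values:
        if v in missing_codes:
            reason = missing_codes[v]
            by_reason[reason] = by_reason.get(reason, 0) + 1

    # Consistency: only real measurements are checked against the limits.
    measured = [v for v in values if v is not None and v not in missing_codes]
    inadmissible = [v for v in measured if not low <= v <= high]

    return {
        "crude_missing_rate": (empty + sum(by_reason.values())) / n,
        "missing_by_reason": by_reason,
        "inadmissible_values": inadmissible,
    }


result = summarize(values, MISSING_CODES, HARD_LIMITS)
print(result)
```

The sketch excludes missing codes before the range check. Otherwise a code such as `99980` would be counted as an inadmissible value instead of as a missing value with a specified reason. This is why the metadata need to declare the missing codes and the admissible limits for each variable.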

Example data quality assessment of SHIP data
