
Definition

Observed data values do not comply with admissible data values or value ranges.

Explanation

Checks related to range and value violations compare observed single data values against any defined set of admissible values or a range of values. Any variable type may be targeted by related checks.

The notion of admissibility/inadmissibility is preferred over possibility/impossibility of values because the latter may be too strong. For example, in history there may have been adults weighing more than 500 kg. In a study, however, the range of admissible values may instead reflect what seems an appropriate limit in the context of that study, and a much lower limit may be chosen. For example, the upper inadmissibility limit for weight is currently set to 250 kg in the population-based German Study of Health in Pomerania.

Example

Example 1: Systolic blood pressure

For a systolic blood pressure measurement, no values below 10 mmHg or above 300 mmHg are permitted. The permitted range [10;300] is annotated in the metadata, against which data values are compared. Implementations of the related “Inadmissible numerical value” indicator count the number and percentage of violations of this rule.
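The range check in this example can be sketched in a few lines of Python. This is only an illustration, not the implementation of the “Inadmissible numerical value” indicator: the function name `count_range_violations`, the sample readings, and the choice to compute the percentage over non-missing values are all assumptions made here.

```python
from typing import Iterable, Optional, Tuple


def count_range_violations(
    values: Iterable[Optional[float]], lower: float, upper: float
) -> Tuple[int, float]:
    """Count non-missing values outside the closed interval [lower, upper].

    Returns the number of violations and their percentage among the
    non-missing values (the denominator is an assumption of this sketch).
    """
    observed = [v for v in values if v is not None]
    violations = [v for v in observed if v < lower or v > upper]
    n = len(violations)
    percent = 100.0 * n / len(observed) if observed else 0.0
    return n, percent


# Hypothetical systolic blood pressure readings in mmHg; None marks a missing value.
sbp = [120, 135, 8, 142, 305, None, 118, 10, 300]

# Limits taken from the example: [10;300], both bounds admissible.
n_violations, pct_violations = count_range_violations(sbp, 10, 300)
print(n_violations, round(pct_violations, 1))  # prints: 2 25.0
```

The boundary values 10 and 300 are treated as admissible, matching the closed interval notation used in the example.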

Example 2: Categorical variable back pain yes/no

For a binary variable on “back pain today”, the answers 0=no; 1=yes; 9=no response are allowed.

However, in 5 out of 1000 observations, a “2” was coded.
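A check of this kind amounts to comparing each observed code against the set of admissible codes. The following Python sketch assumes a hypothetical data set that reproduces the counts above; the names `ADMISSIBLE_CODES` and `inadmissible_categories` and the distribution of the admissible answers are invented for illustration.

```python
from collections import Counter

# Admissible codes from the example metadata: 0=no; 1=yes; 9=no response.
ADMISSIBLE_CODES = {0: "no", 1: "yes", 9: "no response"}


def inadmissible_categories(values, admissible):
    """Return a Counter of observed codes that are not in the admissible set."""
    return Counter(v for v in values if v not in admissible)


# Hypothetical data: 1000 observations, 5 of which carry the inadmissible code 2.
back_pain = [0] * 600 + [1] * 380 + [9] * 15 + [2] * 5

violations = inadmissible_categories(back_pain, ADMISSIBLE_CODES)
n_bad = sum(violations.values())
pct_bad = 100.0 * n_bad / len(back_pain)
print(dict(violations), n_bad, pct_bad)  # prints: {2: 5} 5 0.5
```

Reporting the violations per code, rather than only a total, makes it easier to spot systematic miscoding such as a single unexpected category.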

Guidance

Any violation of an admissibility rule triggers a data cleaning process with the intention of replacing the wrong value with the correct one. If this is not possible, affected observations should at least be flagged so that they can be treated adequately during analyses.
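Where a correct value cannot be recovered, flagging can be as simple as carrying a boolean marker alongside each value. The sketch below is one possible approach, not a prescribed procedure. The function `flag_inadmissible`, the sample weights, and the lower limit of 0 kg are assumptions; only the upper limit of 250 kg comes from the text above.

```python
def flag_inadmissible(values, lower, upper):
    """Pair each value with a flag; True marks an inadmissible value.

    Missing values (None) are not flagged here, since missingness is a
    separate issue from inadmissibility.
    """
    return [(v, v is not None and not (lower <= v <= upper)) for v in values]


# Hypothetical body weights in kg; the lower limit of 0 is an assumption.
weights_kg = [72.5, 260.0, 81.0, None]
flagged = flag_inadmissible(weights_kg, 0, 250)

# The original values stay untouched; analyses can filter on the flag.
analysis_values = [v for v, bad in flagged if v is not None and not bad]
print(flagged)
print(analysis_values)  # prints: [72.5, 81.0]
```

Keeping the original value next to its flag preserves the raw data, so the decision on how to treat flagged observations can be made separately for each analysis.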

Performing inadmissibility checks does not safeguard against any errors within the allowed range. Such issues are addressed by indicators within the accuracy dimension.

Literature

  • Stausberg, J., D. Nasseh and M. Nonnemacher (2015). “Measuring data quality: A review of the literature between 2005 and 2013.” Stud Health Technol Inform 210: 712-716.

  • Weiskopf, N. G. and C. Weng (2013). “Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research.” J Am Med Inform Assoc 20(1): 144-151.

  • Weiskopf NG, Bakken S, Hripcsak G, Weng C. A Data Quality Assessment Guideline for Electronic Health Record Data Reuse. EGEMS (Wash DC). 2017;5(1):14.

  • Lee K, Weiskopf N, Pathak J. A framework for data quality assessment in clinical research datasets. AMIA Annu Symp Proc 2017;2017:1080-9.

  • Kahn MG, Callahan TJ, Barnard J, et al. A Harmonized Data Quality Assessment Terminology and Framework for the Secondary Use of Electronic Health Record Data. EGEMS (Wash DC). 2016;4(1):1244.

  • https://www.ibm.com/support/knowledgecenter/SSQNUZ_2.5.0/cpd/organize/quality_violations.html#quality_violations__class