Introduction

In an epidemiological study, data may be grouped according to the different examinations, such as laboratory, blood pressure, or ultrasound measurements. Each such group is a study segment, and the metadata describing individual segments is termed segment level metadata.

How `dataquieR` uses segment level metadata

To analyze data quality at the segment level, the item level metadata must state, in the column labelled STUDY_SEGMENT, which segment each variable belongs to.
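The link between variables and segments can be illustrated with a minimal sketch. It is not dataquieR's implementation: the column name VAR_NAMES and the variable names are assumptions made for illustration, and only STUDY_SEGMENT comes from the text above.

```python
from collections import defaultdict

# Hypothetical item level metadata: one row per variable.
# VAR_NAMES and the variable names are illustrative assumptions.
item_level = [
    {"VAR_NAMES": "v00001", "STUDY_SEGMENT": "STUDY"},
    {"VAR_NAMES": "sbp_0", "STUDY_SEGMENT": "PHYS_EXAM"},
    {"VAR_NAMES": "dbp_0", "STUDY_SEGMENT": "PHYS_EXAM"},
    {"VAR_NAMES": "crp_0", "STUDY_SEGMENT": "LAB"},
]


def variables_by_segment(rows):
    """Group variable names by the segment named in STUDY_SEGMENT."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["STUDY_SEGMENT"]].append(row["VAR_NAMES"])
    return dict(grouped)


segments = variables_by_segment(item_level)
```

Once the variables are grouped this way, any segment level check can be applied to the set of variables belonging to one segment.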

Segment level metadata for data quality reporting

STUDY_SEGMENT

This column contains the name of the study segment (as a string), as assigned to each variable in the item level metadata.

SEGMENT_RECORD_COUNT

Specifies the number of expected data records in each study segment. The value must be an integer. The check will only be conducted if a number is entered.

For example, the segment level record count metadata may be:

STUDY_SEGMENT	SEGMENT_RECORD_COUNT
STUDY	3000
PHYS_EXAM	2000
LAB	1990
INTERVIEW	3000
QUESTIONNAIRE	2981
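The record count check can be sketched as follows. This is an illustration under stated assumptions, not dataquieR's code: the function name check_record_count and the observed counts are hypothetical, while the expected counts are taken from the table above. The check is skipped (returns None) when no count is given.

```python
def check_record_count(observed, expected):
    """Compare an observed record count with SEGMENT_RECORD_COUNT.

    Returns None when no expected count was entered (check skipped),
    otherwise True if the counts agree and False if they differ.
    """
    if expected is None:
        return None
    if isinstance(expected, bool) or not isinstance(expected, int):
        raise ValueError("SEGMENT_RECORD_COUNT must be an integer")
    return observed == expected


# Expected counts from the example table.
expected_counts = {
    "STUDY": 3000,
    "PHYS_EXAM": 2000,
    "LAB": 1990,
    "INTERVIEW": 3000,
    "QUESTIONNAIRE": 2981,
}

# Hypothetical observed counts, for illustration only.
observed_counts = {"STUDY": 3000, "PHYS_EXAM": 1998, "LAB": 1990}

results = {
    segment: check_record_count(count, expected_counts.get(segment))
    for segment, count in observed_counts.items()
}
```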

SEGMENT_ID_TABLE

The name of the table containing the reference IDs to be compared with the IDs in the targeted segment. The input must be a string and can refer to a sheet in the same workbook, a sheet in another workbook, or a URL.

In the example below, the IDs for the first four segments are specified in the sheet called expected_id of the same workbook. In contrast, the IDs for QUESTIONNAIRE are provided in the pseudo_id sheet of the questionnaire_data.xlsx workbook. Since this is a different workbook, its path must be specified, followed by a pipe character and the sheet name.

STUDY_SEGMENT	SEGMENT_ID_TABLE
STUDY	expected_id
PHYS_EXAM	expected_id
LAB	expected_id
INTERVIEW	expected_id
QUESTIONNAIRE	d:/data/questionnaire_data.xlsx | pseudo_id
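How a table reference of the form "workbook | sheet" could be taken apart is sketched below. The helper parse_table_reference is hypothetical and only mirrors the two forms shown in the example; it is not how dataquieR resolves tables internally.

```python
def parse_table_reference(reference):
    """Split a SEGMENT_ID_TABLE entry into (workbook, sheet).

    A bare sheet name refers to the current workbook, so the
    workbook part is None in that case.
    """
    parts = [part.strip() for part in reference.split("|")]
    if len(parts) == 1:
        return (None, parts[0])
    if len(parts) == 2:
        return (parts[0], parts[1])
    raise ValueError(f"unexpected table reference: {reference!r}")


same_workbook = parse_table_reference("expected_id")
other_workbook = parse_table_reference(
    "d:/data/questionnaire_data.xlsx | pseudo_id"
)
```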

SEGMENT_RECORD_CHECK

A string that sets the type of check to be conducted when comparing the reference ID table with the IDs in a segment. Two checks are possible:

exact: tests for an exact match between the reference IDs given in SEGMENT_ID_TABLE and the IDs in the segment, or
subset: expects that the IDs in the segment are a subset of the reference IDs given in SEGMENT_ID_TABLE.

For instance, the segments STUDY, INTERVIEW and QUESTIONNAIRE may comprise all participants of a study, while particular segments, such as PHYS_EXAM and LAB, may have been collected only from a smaller participant sample:

STUDY_SEGMENT	SEGMENT_RECORD_CHECK
STUDY	exact
PHYS_EXAM	subset
LAB	subset
INTERVIEW	exact
QUESTIONNAIRE	exact
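The difference between the two check types comes down to set equality versus set inclusion. The sketch below shows this with hypothetical IDs; check_segment_ids is an illustrative helper, not a dataquieR function.

```python
def check_segment_ids(segment_ids, reference_ids, check):
    """Compare the IDs found in a segment with the reference IDs.

    check == "exact":  both ID sets must be identical.
    check == "subset": every segment ID must occur in the reference.
    """
    segment, reference = set(segment_ids), set(reference_ids)
    if check == "exact":
        return segment == reference
    if check == "subset":
        return segment <= reference
    raise ValueError(f"unknown SEGMENT_RECORD_CHECK: {check!r}")


# Hypothetical IDs, for illustration only.
reference_ids = ["id1", "id2", "id3", "id4"]
lab_ids = ["id1", "id3"]

lab_as_subset = check_segment_ids(lab_ids, reference_ids, "subset")
lab_as_exact = check_segment_ids(lab_ids, reference_ids, "exact")
```

With these values the subset check passes and the exact check fails, because two reference IDs are absent from the segment.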

SEGMENT_ID_VARS

Defines all variables to be used as one single ID variable (a combined key) in a segment. The list of variables must be a string in which each variable is separated by a pipe character (|).

For example, the ID for PHYS_EXAM is defined by a combined key that consists of the “PSEUDO_ID” and “CENTER_0” variables. For the rest of the segments, the ID is specified by the variable “v00001”:

STUDY_SEGMENT	SEGMENT_ID_VARS
STUDY	v00001
PHYS_EXAM	PSEUDO_ID | CENTER_0
LAB	v00001
INTERVIEW	v00001
QUESTIONNAIRE	v00001
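A combined key can be thought of as a tuple built from the listed variables. The following sketch assumes a record is a simple mapping from variable names to values; parse_id_vars, combined_key, and the record contents are hypothetical.

```python
def parse_id_vars(spec):
    """Split a pipe-separated SEGMENT_ID_VARS entry into variable names."""
    return [name.strip() for name in spec.split("|") if name.strip()]


def combined_key(record, id_vars):
    """Build one key from the values of all ID variables, in order."""
    return tuple(record[name] for name in id_vars)


id_vars = parse_id_vars("PSEUDO_ID | CENTER_0")

# Hypothetical study data record, for illustration only.
record = {"PSEUDO_ID": "A1", "CENTER_0": 3, "sbp_0": 120}
key = combined_key(record, id_vars)
```

Two records then refer to the same observational unit only if all components of their keys agree.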

SEGMENT_UNIQUE_ROWS

Specifies whether identical data is permitted across rows in a segment (excluding ID variables). The input is a Boolean, meaning:

false: allow repeated rows, or
true: rows must be unique.

For instance, repeated rows may be allowed for PHYS_EXAM and LAB but not for the rest of the segments:

STUDY_SEGMENT	SEGMENT_UNIQUE_ROWS
STUDY	true
PHYS_EXAM	false
LAB	false
INTERVIEW	true
QUESTIONNAIRE	true
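The uniqueness rule compares rows after leaving out the ID variables. A minimal sketch under that reading is shown below; the helper repeated_rows and the records are hypothetical, and dataquieR may implement the check differently.

```python
from collections import Counter


def repeated_rows(records, id_vars):
    """Return the row contents (ID variables excluded) that occur more than once."""

    def content(record):
        # Sort by variable name so that column order does not matter.
        return tuple(
            sorted((name, value) for name, value in record.items()
                   if name not in id_vars)
        )

    counts = Counter(content(record) for record in records)
    return [row for row, n in counts.items() if n > 1]


# Hypothetical segment data: the first two rows differ only in their ID.
records = [
    {"v00001": "id1", "sbp_0": 120, "dbp_0": 80},
    {"v00001": "id2", "sbp_0": 120, "dbp_0": 80},
    {"v00001": "id3", "sbp_0": 135, "dbp_0": 85},
]

repeated = repeated_rows(records, id_vars={"v00001"})
rows_are_unique = not repeated
```

For a segment with SEGMENT_UNIQUE_ROWS set to true, a non-empty result would indicate a violation.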

SEGMENT_PART_VARS

Provides the name of the variable that indicates participation in the respective segment. For instance:

STUDY_SEGMENT	SEGMENT_PART_VARS
STUDY	seg_study_part
PHYS_EXAM	seg_phys_exam_part
LAB	seg_lab_part
INTERVIEW	seg_interview_part
QUESTIONNAIRE	seg_questionnaire_part

In the study data, each segment participation variable contains participation and missing codes (e.g., -10000, 99980, 99981). If interpretation codes are provided in a separate table (e.g., segment_missing_table), the participation codes allow the calculation of qualified missingness rates per segment.
