Univariate outliers are assessed using statistical criteria. The function acc_robust_univariate_outlier identifies outliers using the approaches of Tukey, 3SD (the column labelled 6-Sigma in the output below), and Hubert, as well as the heuristic SigmaGap approach. It may be called as follows:

# Load dataquieR
library(dataquieR)

# Load data
sd1 <- prep_get_data_frame("ship")

# Load metadata
file_name <- system.file("extdata", "ship_meta_v2.xlsx", package = "dataquieR")
prep_load_workbook_like_file(file_name)
meta_data_item <- prep_get_data_frame("item_level") # item_level is a sheet in ship_meta_v2.xlsx

# Apply indicator function
UnivariateOutlier <- acc_robust_univariate_outlier(
  study_data = sd1,
  meta_data = meta_data_item,
  label_col = "LABEL"
)
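
Before turning to the output, the first two criteria can be made concrete with a minimal base R sketch. It uses the textbook definitions (Tukey fences at 1.5 times the interquartile range beyond the quartiles, and mean ± 3 SD); the helper names tukey_outliers and three_sd_outliers are hypothetical and not part of dataquieR, whose internal implementation may differ in detail. The example values are made up and are not taken from the SHIP data.

```r
# Hypothetical helpers for illustration only; not part of dataquieR.

# Tukey: flag values more than 1.5 * IQR below Q1 or above Q3
tukey_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  !is.na(x) & (x < q[1] - fence | x > q[2] + fence)
}

# 3SD: flag values more than three standard deviations from the mean
three_sd_outliers <- function(x) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  !is.na(x) & abs(x - m) > 3 * s
}

# Made-up example values (not SHIP data)
x <- c(rep(48:52, 4), 56, 120)

print(x[tukey_outliers(x)])     # values outside the Tukey fences
print(x[three_sd_outliers(x)])  # values beyond mean +/- 3 SD
```

In this example the Tukey fences flag both 56 and 120, whereas the 3SD rule flags only 120, because the extreme value inflates the standard deviation. This illustrates why the Tukey criterion tends to flag at least as many observations as the 3SD criterion.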

The first output is a table that provides descriptive statistics and detected outliers according to the different criteria:

UnivariateOutlier$SummaryTable
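| Variables | Mean | SD | Median | Skewness | Tukey (N) | 6-Sigma (N) | Hubert (N) | Sigma-gap (N) | Most likely (N) | To low (N) | To high (N) | GRADING |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ID | 5431.06 | 1236.17 | 5428.50 | 0.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| AGE_0 | 49.87 | 16.18 | 50.00 | -0.02 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| SBP_0.1 | 138.25 | 21.25 | 137.00 | 0.06 | 8 | 0 | 4 | 0 | 0 | 0 | 0 | 0 |
| SBP_0.2 | 135.87 | 20.89 | 134.00 | 0.09 | 10 | 5 | 3 | 0 | 0 | 0 | 0 | 0 |
| DBP_0.1 | 84.43 | 11.43 | 84.00 | 0.00 | 17 | 12 | 15 | 1 | 1 | 0 | 1 | 1 |
| DBP_0.2 | 83.52 | 11.52 | 83.00 | 0.04 | 17 | 10 | 10 | 1 | 1 | 0 | 1 | 1 |
| BODY_HEIGHT_0 | 168.22 | 9.25 | 168.00 | 0.00 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| BODY_WEIGHT_0 | 77.63 | 15.08 | 77.04 | 0.01 | 17 | 10 | 15 | 0 | 0 | 0 | 0 | 0 |
| DIAB_AGE_ONSET_0 | 53.68 | 13.33 | 55.00 | 0.00 | 5 | 3 | 5 | 0 | 0 | 0 | 0 | 0 |
| CHOLES_HDL_0 | 1.45 | 0.44 | 1.39 | 0.13 | 33 | 17 | 18 | 2 | 2 | 0 | 2 | 1 |
| CHOLES_LDL_0 | 3.58 | 1.13 | 3.52 | 0.02 | 21 | 13 | 18 | 0 | 0 | 0 | 0 | 0 |
| CHOLES_ALL_0 | 5.76 | 1.20 | 5.68 | 0.06 | 23 | 12 | 17 | 0 | 0 | 0 | 0 | 0 |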


Most variables contain outliers according to at least two criteria. The Sigma-gap criterion, however, flags observations in only three variables: one each in the diastolic blood pressure variables (DBP_0.1 and DBP_0.2) and two in CHOLES_HDL_0. These are also the only variables with a GRADING of 1.
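
Such a reading of the table can also be done programmatically. The sketch below rebuilds three rows of the printed table as a plain data frame, so that it runs without dataquieR, and keeps the variables with at least one Sigma-gap outlier. Applying the same filter to UnivariateOutlier$SummaryTable rests on the assumption that its column names and types match the printed header, which may vary between package versions.

```r
# Three rows transcribed from the printed summary table.
# check.names = FALSE keeps the column labels exactly as printed.
summary_tab <- data.frame(
  Variables = c("SBP_0.1", "DBP_0.1", "CHOLES_HDL_0"),
  `Tukey (N)` = c(8, 17, 33),
  `Sigma-gap (N)` = c(0, 1, 2),
  GRADING = c(0, 1, 1),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Keep only the variables with at least one Sigma-gap outlier
flagged <- summary_tab[summary_tab[["Sigma-gap (N)"]] > 0, ]
print(flagged$Variables)
```

Column labels that contain spaces or parentheses are accessed with [[ ]] or backticks rather than with $.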

To gain better insight into the univariate distributions, a list of plots is provided in UnivariateOutlier$SummaryPlotList. Each plot highlights the observations of one variable according to the number of violated rules (only the first four plots are shown here):

pl <- UnivariateOutlier$SummaryPlotList

invisible(lapply(head(pl, 4), print))