TY - JOUR
T1 - Comparison of 8 methods for univariate statistical exclusion of pathological subpopulations for indirect reference intervals and biological variation studies
AU - Tan, Rui Zhen
AU - Markus, Corey
AU - Vasikaran, Samuel
AU - Loh, Tze Ping
AU - for the APFCB Harmonization of Reference Intervals Working Group
PY - 2022/5
Y1 - 2022/5
N2 - Background: Indirect reference intervals and biological variation studies heavily rely on statistical methods to separate pathological and non-pathological subpopulations within the same dataset. In recognition of this, we compare the performance of eight univariate statistical methods for identification and exclusion of values originating from pathological subpopulations. Methods: The eight approaches examined were: Tukey's rule with and without Box-Cox transformation; median absolute deviation; double median absolute deviation; Gaussian mixture models; van der Loo (Vdl) methods 1 and 2; and the Kosmic approach. Using four scenarios including lognormal distributions and varying the conditions through the number of pathological populations, central location, spread and proportion for a total of 256 simulated mixed populations. A performance criterion of ± 0.05 fractional error from the true underlying lower and upper reference interval was chosen. Results: Overall, the Kosmic method was a standout with the highest number of scenarios lying within the acceptable error, followed by Vdl method 1 and Tukey's rule. Kosmic and Vdl method 1 appears to discriminate better the non-pathological reference population in the case of log-normal distributed data. When the proportion and spread of pathological subpopulations is high, the performance of statistical exclusion deteriorated considerably. Discussions: It is important that laboratories use a priori defined clinical criteria to minimise the proportion of pathological subpopulation in a dataset prior to analysis. The curated dataset should then be carefully examined so that the appropriate statistical method can be applied.
AB - Background: Indirect reference intervals and biological variation studies heavily rely on statistical methods to separate pathological and non-pathological subpopulations within the same dataset. In recognition of this, we compare the performance of eight univariate statistical methods for identification and exclusion of values originating from pathological subpopulations. Methods: The eight approaches examined were: Tukey's rule with and without Box-Cox transformation; median absolute deviation; double median absolute deviation; Gaussian mixture models; van der Loo (Vdl) methods 1 and 2; and the Kosmic approach. Using four scenarios including lognormal distributions and varying the conditions through the number of pathological populations, central location, spread and proportion for a total of 256 simulated mixed populations. A performance criterion of ± 0.05 fractional error from the true underlying lower and upper reference interval was chosen. Results: Overall, the Kosmic method was a standout with the highest number of scenarios lying within the acceptable error, followed by Vdl method 1 and Tukey's rule. Kosmic and Vdl method 1 appears to discriminate better the non-pathological reference population in the case of log-normal distributed data. When the proportion and spread of pathological subpopulations is high, the performance of statistical exclusion deteriorated considerably. Discussions: It is important that laboratories use a priori defined clinical criteria to minimise the proportion of pathological subpopulation in a dataset prior to analysis. The curated dataset should then be carefully examined so that the appropriate statistical method can be applied.
KW - Biological variation
KW - Data mining
KW - Indirect approach
KW - Outlier
KW - Outlier exclusion
KW - Reference intervals
UR - http://www.scopus.com/inward/record.url?scp=85124820863&partnerID=8YFLogxK
U2 - 10.1016/j.clinbiochem.2022.02.006
DO - 10.1016/j.clinbiochem.2022.02.006
M3 - Article
AN - SCOPUS:85124820863
SN - 0009-9120
VL - 103
SP - 16
EP - 24
JO - Clinical Biochemistry
JF - Clinical Biochemistry
ER -