TY - JOUR
T1 - Machine learning approaches for automatic cleaning of investigative drilling data
AU - Huang, Fei
AU - Qin, Hongyu
AU - Manafi, Masoud
AU - Juett, Ben
AU - Evans, Ben
PY - 2025/9/26
Y1 - 2025/9/26
N2 - Investigative drilling (ID) is an innovative measurement while drilling (MWD) technique implemented in various site investigation projects across Australia. While the automated drilling feature of ID substantially reduces noise in drilling data streams, data cleaning remains essential to remove anomalies for accurate strata classification and prediction of soil and rock properties. This study employed three machine learning algorithms – IsoForest, one-class SVM and DBSCAN – to automate the data cleaning process for ID rock drilling data. Two contexts were examined: (1) removing anomalies in rock drilling data, and (2) removing both anomalies and soil data in mixed rock drilling data. The analysis revealed that all three algorithms outperformed traditional statistical methods (the 3σ rule and IQR method) in both tasks, achieving a good balance between true and false positive rates, though hyperparameter tuning was required for one-class SVM and DBSCAN. Among them, IsoForest proved to be the best-performing algorithm, effectively removing anomalies without hyperparameter adjustment. Furthermore, IsoForest, combined with two-cluster K-means, eliminated both soil data and anomalies while preserving nearly all normal data. This strategy provides an efficient solution to reduce manual cleaning effort and enable the creation of large-scale, high-quality datasets for machine learning analysis of ID data.
AB - Investigative drilling (ID) is an innovative measurement while drilling (MWD) technique implemented in various site investigation projects across Australia. While the automated drilling feature of ID substantially reduces noise in drilling data streams, data cleaning remains essential to remove anomalies for accurate strata classification and prediction of soil and rock properties. This study employed three machine learning algorithms – IsoForest, one-class SVM and DBSCAN – to automate the data cleaning process for ID rock drilling data. Two contexts were examined: (1) removing anomalies in rock drilling data, and (2) removing both anomalies and soil data in mixed rock drilling data. The analysis revealed that all three algorithms outperformed traditional statistical methods (the 3σ rule and IQR method) in both tasks, achieving a good balance between true and false positive rates, though hyperparameter tuning was required for one-class SVM and DBSCAN. Among them, IsoForest proved to be the best-performing algorithm, effectively removing anomalies without hyperparameter adjustment. Furthermore, IsoForest, combined with two-cluster K-means, eliminated both soil data and anomalies while preserving nearly all normal data. This strategy provides an efficient solution to reduce manual cleaning effort and enable the creation of large-scale, high-quality datasets for machine learning analysis of ID data.
KW - DBSCAN
KW - Investigative drilling
KW - IsoForest
KW - machine learning
KW - measurement while drilling
KW - one-class SVM
UR - http://www.scopus.com/inward/record.url?scp=105017896153&partnerID=8YFLogxK
U2 - 10.1080/17486025.2025.2566311
DO - 10.1080/17486025.2025.2566311
M3 - Article
AN - SCOPUS:105017896153
SN - 1748-6025
JO - Geomechanics and Geoengineering
JF - Geomechanics and Geoengineering
ER -