Machine learning approaches for automatic cleaning of investigative drilling data

Fei Huang, Hongyu Qin, Masoud Manafi, Ben Juett, Ben Evans

Research output: Contribution to journal, Article, peer-review

Abstract

Investigative drilling (ID) is an innovative measurement while drilling (MWD) technique implemented in various site investigation projects across Australia. While the automated drilling feature of ID substantially reduces noise in drilling data streams, data cleaning remains essential to remove anomalies for accurate strata classification and prediction of soil and rock properties. This study employed three machine learning algorithms – IsoForest, one-class SVM and DBSCAN – to automate the data cleaning process for ID rock drilling data. Two contexts were examined: (1) removing anomalies in rock drilling data, and (2) removing both anomalies and soil data in mixed rock drilling data. The analysis revealed that all three algorithms outperformed traditional statistical methods (the 3σ rule and IQR method) in both tasks, achieving a good balance between true and false positive rates, though hyperparameter tuning was required for one-class SVM and DBSCAN. Among them, IsoForest proved to be the best-performing algorithm, effectively removing anomalies without hyperparameter adjustment. Furthermore, IsoForest, combined with two-cluster K-means, eliminated both soil data and anomalies while preserving nearly all normal data. This strategy provides an efficient solution to reduce manual cleaning effort and enable the creation of large-scale, high-quality datasets for machine learning analysis of ID data.
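
The two statistical baselines named in the abstract, the 3σ rule and the IQR method, can be sketched roughly as below. This is only an illustrative sketch and not the authors' implementation. The function names, the population standard deviation, the 1.5 × IQR fences, the "inclusive" quantile method, and the sample readings are all assumptions made here for illustration.

```python
import statistics


def three_sigma_mask(values, k=3.0):
    """Return True for each value within k standard deviations of the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std; an assumption of this sketch
    return [abs(v - mean) <= k * std for v in values]


def iqr_mask(values, whisker=1.5):
    """Return True for each value inside [Q1 - whisker*IQR, Q3 + whisker*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [low <= v <= high for v in values]


# Hypothetical single-channel readings (made-up numbers, not ID data);
# the final value plays the role of an anomalous spike.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.1, 9.7, 10.0, 10.2,
            9.9, 10.0, 10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.0, 55.0]

kept_sigma = [v for v, ok in zip(readings, three_sigma_mask(readings)) if ok]
kept_iqr = [v for v, ok in zip(readings, iqr_mask(readings)) if ok]
```

Both rules treat each channel independently and rely on a fixed threshold, whereas the three algorithms studied in the paper operate on the multivariate drilling record.
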
Original language: English
Number of pages: 19
Journal: Geomechanics and Geoengineering
Publication status: E-pub ahead of print - 26 Sept 2025

Keywords

  • DBSCAN
  • Investigative drilling
  • IsoForest
  • machine learning
  • measurement while drilling
  • one-class SVM
