Rule-Based Knowledge Discovery via Anomaly Detection in Tabular Data

Asara Senaratne, Peter Christen, Graham Williams, Pouya Ghiasnezhad Omran

Research output: Contribution to journal › Conference article › peer-review

Abstract

In this paper, we propose a novel approach to unsupervised detection of abnormal records in tabular data. We first characterize records in a tabular dataset using a set of features and then employ a one-class support vector machine classifier to classify records as either normal or abnormal. We select the features that are most relevant in characterizing normal and abnormal records and apply clustering to identify groups of records that have similar characteristics according to these features. In the final step, we use information-based measures to identify the purest abnormal clusters, which provide a descriptive representation that allows a user to better understand and identify abnormal records in the dataset. We evaluate our approach on datasets from three different domains: historical birth certificates, social network posts, and COVID-19 data. This evaluation demonstrates that our approach is well suited to identifying anomalies in tabular data in an unsupervised manner while outperforming the baseline.
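The final step described in the abstract, selecting the purest abnormal clusters with an information-based measure, can be illustrated with a small sketch. The abstract does not name the exact measure, so Shannon entropy is assumed here; the function names, the `max_entropy` threshold, and the toy data are all hypothetical, and the cluster assignments and normal/abnormal labels are taken as given (in the described approach they would come from clustering and the one-class classifier).

```python
from collections import Counter
from math import log2


def cluster_entropy(labels):
    """Shannon entropy (in bits) of the label distribution in one cluster."""
    counts = Counter(labels)
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in counts.values())


def purest_abnormal_clusters(cluster_ids, labels, max_entropy=0.5):
    """Return (cluster_id, entropy, abnormal_fraction) tuples for clusters
    that are majority abnormal and have entropy <= max_entropy (an assumed
    threshold), ordered from purest to least pure."""
    by_cluster = {}
    for cid, label in zip(cluster_ids, labels):
        by_cluster.setdefault(cid, []).append(label)

    selected = []
    for cid, members in by_cluster.items():
        abnormal_fraction = members.count("abnormal") / len(members)
        entropy = cluster_entropy(members)
        if abnormal_fraction > 0.5 and entropy <= max_entropy:
            selected.append((cid, entropy, abnormal_fraction))
    return sorted(selected, key=lambda item: item[1])


# Hypothetical toy input: three clusters of four records each.
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
labels = (
    ["normal"] * 4                                   # cluster 0: all normal
    + ["abnormal"] * 4                               # cluster 1: all abnormal
    + ["abnormal", "abnormal", "normal", "normal"]   # cluster 2: mixed
)
result = purest_abnormal_clusters(cluster_ids, labels)
```

In this toy input only cluster 1 is selected: cluster 0 is pure but normal, and cluster 2 is evenly mixed, which gives it the maximum entropy for two classes.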

Original language: English
Number of pages: 16
Journal: CEUR Workshop Proceedings
Volume: 3433
Publication status: Published - 2023
Externally published: Yes
Event: AAAI 2023 Spring Symposium on Challenges Requiring the Combination of Machine Learning and Knowledge Engineering, AAAI-MAKE 2023 - San Francisco, United States
Duration: 27 Mar 2023 to 29 Mar 2023

Keywords

  • data quality enhancement
  • k-means clustering
  • one-class support vector machine
  • unsupervised learning
