A Weak-Region Enhanced Bayesian Classification for Spam Content-Based Filtering

Vahid Nosrati, Mohsen Rahmani, Alireza Jolfaei, Sattar Seifollahi

Research output: Contribution to journal › Article › peer-review

8 Citations (Scopus)

Abstract

This article proposes an improved Bayesian scheme that focuses on the region in which a Bayesian classifier may fail to identify labels correctly, and improves classification performance by handling those errors. The Bayesian method, as a probabilistic classifier, uses Bayes' theorem to calculate the probability that an instance belongs to each class, and assigns the instance the class label with the maximum probability. In a spam detection problem, the prediction of the Bayesian classifier can be considered weak when the probabilities obtained for the spam and non-spam classes are close to each other. Therefore, we define a threshold to distinguish weak predictions from strong predictions. A hybrid strategy using a two-layer Bayesian approach is presented: basic Bayesian (BBayes) and corrected weak region Bayesian (CWRBayes), which handle strong and weak predictions, respectively. The two techniques share the same classification mechanism but use different feature selection mechanisms. The proposed methods are implemented and evaluated on two datasets of spam e-mails, and the results show that the proposed method outperforms the naïve Bayesian baseline and some other Bayesian variants.
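
The routing idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `is_weak`, `two_layer_predict`, `toy_bbayes`, and `toy_cwrbayes` are hypothetical, the weak region is assumed to be defined by the absolute difference between the two class posteriors, and the threshold value 0.2 is arbitrary. The two scorers stand in for classifiers that share one mechanism but were trained on different feature subsets.

```python
from typing import Callable, Sequence

# Hypothetical type: a scorer maps a tokenized e-mail to P(spam | e-mail).
Scorer = Callable[[Sequence[str]], float]


def is_weak(p_spam: float, threshold: float) -> bool:
    """Assumed weak-region test: the two class posteriors are close."""
    p_non_spam = 1.0 - p_spam
    return abs(p_spam - p_non_spam) < threshold


def two_layer_predict(
    tokens: Sequence[str],
    bbayes: Scorer,
    cwrbayes: Scorer,
    threshold: float = 0.2,  # arbitrary illustrative value
) -> str:
    """Use the first-layer score when it is strong, else defer to the second layer."""
    p_spam = bbayes(tokens)
    if is_weak(p_spam, threshold):
        # Weak prediction: re-score with the classifier built on other features.
        p_spam = cwrbayes(tokens)
    return "spam" if p_spam >= 0.5 else "non-spam"


# Toy stand-ins for trained classifiers (made-up scores, for illustration only).
def toy_bbayes(tokens: Sequence[str]) -> float:
    return 0.55 if "offer" in tokens else 0.05


def toy_cwrbayes(tokens: Sequence[str]) -> float:
    return 0.9 if "free" in tokens else 0.1


if __name__ == "__main__":
    for mail in (["free", "offer"], ["offer", "meeting"], ["hello"]):
        print(mail, two_layer_predict(mail, toy_bbayes, toy_cwrbayes))
```

In this sketch the second layer is consulted only for instances that fall inside the weak region, so strong first-layer predictions are left unchanged.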

Original language: English
Article number: 72
Number of pages: 18
Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Volume: 22
Issue number: 3
Early online date: 11 Jul 2022
Publication status: Published - 2 Apr 2023

Keywords

  • Bayesian
  • feature selection
  • spam detection
  • text classification
