Abstract
This article proposes an improved Bayesian scheme by focusing on the region in which Bayesian may fail to correctly identify labels and improve classification performance by handling those errors. Bayesian method, as a probabilistic classifier, uses Bayes' theorem to calculate the probability of an instance belonging to a class, where the class label with a maximum probability is assigned to the instance. In a spam detection problem, it can be considered that the prediction of the Bayesian classifier is weak when the probability obtained for classes spam and non-spam are close to each other. Therefore, we define a threshold to determine weak prediction against strong prediction. A hybrid strategy using a two-layer Bayesian approach is presented: basic Bayesian (BBayes) and corrected weak region Bayesian (CWRBayes), which are concerned with strong and weak predictions, respectively. Both techniques, BBayes and CWRBayes, have the same classification mechanism, but they use different feature selection mechanisms. The proposed methods are implemented and evaluated over two datasets of spam e-mails, and the results show that the proposed method has better performance than the baseline of the naïve Bayesian and some other Bayesian variants.
Original language | English |
---|---|
Article number | 72 |
Number of pages | 18 |
Journal | ACM Transactions on Asian and Low-Resource Language Information Processing |
Volume | 22 |
Issue number | 3 |
Early online date | 11 Jul 2022 |
DOIs | |
Publication status | Published - 2 Apr 2023 |
Keywords
- Bayesian
- feature selection
- spam detection
- text classification