Bayesian Method with Clustering Algorithm for Credit Card Transaction Fraud Detection

Luis Jose S. Santos (lixsnx@gmail.com)
De La Salle University, Philippines
Shirlee R. Ocampo (shirlee.ocampo@dlsu.edu.ph)
De La Salle University, Manila, Philippines

Abstract

Card transaction fraud is prevalent worldwide. Even with current preventive measures such as Europay, MasterCard and Visa (EMV) chips, fraudsters can still exploit weaknesses and loopholes. This paper explores the Naïve Bayes classifier combined with clustering algorithms to detect fraud in credit card transactions. Fraud labels and transaction arrival times were simulated using a Markov Modulated Poisson Process. Transaction amounts for genuine and fraudulent transactions were simulated from two Gaussian distributions. The simulation parameters were based on kinds of spenders and types of fraudsters. Using the simulated data, the EM clustering algorithm with three different initializations and K-means were applied to cluster transaction amounts into high, medium, and low. The Naïve Bayes classifier was then applied to classify the transactions as genuine or fraudulent for the simulated data of nine types of fraudsters across all clustering algorithms. Simulations and analyses were done using R software. Results include comparisons of true positive rates, false positive rates, and detection accuracies among the nine types of fraudsters across all clustering algorithms. With three clusters (high, medium, and low transaction amounts), the Naïve Bayes method with clustering algorithms resulted in an average true positive (TP) rate of 76% and an average false positive (FP) rate of 18%, with an overall accuracy of 81%. The same averages of TP, FP, and overall accuracy were obtained using two clusters (high and low). The EM clustering algorithm yielded TP, FP, and overall accuracy of 80%, 16%, and 83%, respectively.

Keywords: credit card transaction, fraud detection, Naïve Bayes Classifier, clustering algorithms, Markov Modulated Poisson Process data simulation
JEL Classification: C11, C39
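
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper used R, while Python is used here for illustration only. All parameter values (arrival rates, switching rates, Gaussian means and standard deviations) are assumptions chosen for the example. Only K-means is shown, not EM. The second feature (a short or long gap since the previous transaction) is a hypothetical addition so that the classifier has more than one attribute; the abstract does not state which features were used.

```python
import math
import random

def simulate_mmpp(n, rates, switch, amount_params, rng):
    """Two-state MMPP (0 = genuine, 1 = fraud) with Gaussian amounts."""
    state, t, rows = 0, 0.0, []
    while len(rows) < n:
        # Competing exponentials: next arrival vs. next state switch.
        total = rates[state] + switch[state]
        t += rng.expovariate(total)
        if rng.random() < rates[state] / total:
            mu, sigma = amount_params[state]
            rows.append((t, max(0.01, rng.gauss(mu, sigma)), state))
        else:
            state = 1 - state
    return rows

def assign(v, centres):
    """Index of the nearest centre (0 = low amount, last = high)."""
    return min(range(len(centres)), key=lambda j: abs(v - centres[j]))

def kmeans_1d(values, k, iters=50):
    """Plain Lloyd iterations in one dimension, quantile initialisation."""
    s = sorted(values)
    centres = [s[int((i + 0.5) * len(s) / k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[assign(v, centres)].append(v)
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return sorted(centres)

def fit_nb(features, labels, n_values):
    """Categorical Naive Bayes with Laplace (add-one) smoothing."""
    classes = sorted(set(labels))
    prior = {c: labels.count(c) / len(labels) for c in classes}
    cond = {}
    for c in classes:
        rows = [x for x, y in zip(features, labels) if y == c]
        for f, m in enumerate(n_values):
            for v in range(m):
                count = sum(1 for x in rows if x[f] == v)
                cond[(c, f, v)] = (count + 1) / (len(rows) + m)
    return prior, cond

def predict_nb(model, x):
    """Class with the highest log posterior under the independence assumption."""
    prior, cond = model
    def log_post(c):
        return math.log(prior[c]) + sum(
            math.log(cond[(c, f, v)]) for f, v in enumerate(x))
    return max(prior, key=log_post)

# Assumed parameters: (genuine, fraud) arrival rates, switching rates,
# and (mean, sd) of the transaction amount in each state.
rng = random.Random(42)
data = simulate_mmpp(4000, rates=(2.0, 4.0), switch=(0.02, 0.2),
                     amount_params=((50.0, 15.0), (200.0, 60.0)), rng=rng)

times = [r[0] for r in data]
gaps = [t - p for t, p in zip(times, [0.0] + times[:-1])]
amounts = [r[1] for r in data]
labels = [r[2] for r in data]

split = 3000  # first 3000 transactions train, the rest test
centres = kmeans_1d(amounts[:split], k=3)        # low / medium / high
gap_cut = sorted(gaps[:split])[split // 2]       # median training gap
feats = [(assign(a, centres), int(g < gap_cut)) for a, g in zip(amounts, gaps)]

model = fit_nb(feats[:split], labels[:split], n_values=(3, 2))
pred = [predict_nb(model, x) for x in feats[split:]]
truth = labels[split:]

tp = sum(p == 1 and y == 1 for p, y in zip(pred, truth))
fp = sum(p == 1 and y == 0 for p, y in zip(pred, truth))
n_fraud = sum(truth)
tpr = tp / max(1, n_fraud)
fpr = fp / max(1, len(truth) - n_fraud)
acc = sum(p == y for p, y in zip(pred, truth)) / len(truth)
print(f"TP rate={tpr:.2f}  FP rate={fpr:.2f}  accuracy={acc:.2f}")
```

The numbers this sketch prints depend entirely on the assumed parameters and should not be compared with the rates reported in the abstract.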

Romanian Statistical Review 1/2018