Predicting Credit Card Fraud using Support Vector Machine

Introduction

Credit card fraud remains one of the most serious threats in digital payments, resulting in billions of dollars in annual losses worldwide. As fraudsters continually adopt more sophisticated techniques, traditional rule-based systems struggle to detect novel and subtle attack patterns.

Machine learning, particularly classification algorithms, has become essential for proactive fraud detection by learning complex patterns from historical transaction data.
This experiment investigates the application of the Support Vector Machine (SVM), a supervised learning algorithm known for its effectiveness in high-dimensional spaces and its strong theoretical foundation, to the task of credit card fraud detection.

Support Vector Machine (SVM)

An SVM constructs the hyperplane that separates the classes with the largest possible margin, that is, the distance from the hyperplane to the nearest points of each class (the support vectors).

Key Concepts
  • The data points closest to the hyperplane are called support vectors. Only these points influence the position and orientation of the final hyperplane.
  • Maximizing the margin improves generalization performance on unseen data.
  • For datasets that are not perfectly separable, SVM introduces slack variables (ξᵢ) and a regularization parameter C to allow controlled misclassifications (soft margin SVM).
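The margin and support-vector ideas above can be checked on a toy example. The following is a minimal sketch, assuming a linear SVM in two dimensions; the four points and the hyperplane x₁ + x₂ − 3 = 0 are hypothetical values chosen by hand, not the output of a trained model.

```python
import math

def signed_distance(w, b, x):
    """Signed geometric distance from point x to the hyperplane w.x + b = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w

# Hypothetical toy data: (features, label) with labels in {-1, +1}.
points = [([0.0, 0.0], -1), ([1.0, 1.0], -1), ([2.0, 2.0], +1), ([3.0, 3.0], +1)]

# Hand-picked separating hyperplane x1 + x2 - 3 = 0 (an assumption for illustration).
w, b = [1.0, 1.0], -3.0

# Multiplying by the label makes the distance positive for correctly classified points.
distances = [y * signed_distance(w, b, x) for x, y in points]

# The margin is set by the closest points; those are the support-vector candidates.
margin = min(distances)
closest = [x for (x, _), d in zip(points, distances) if math.isclose(d, margin)]

print("margin:", margin)
print("closest points:", closest)
```

Moving either of the two outer points further away leaves `margin` and `closest` unchanged, which is the sense in which only the support vectors determine the hyperplane.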

Primal Optimization Problem (Soft Margin SVM):

min (1/2) ||w||² + C Σ(i=1 to n) ξᵢ

Subject to:
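yᵢ (wᵀ φ(xᵢ) + b) ≥ 1 − ξᵢ and ξᵢ ≥ 0 ∀i

Here yᵢ ∈ {−1, +1} is the class label of transaction xᵢ, φ is the feature map (the identity for a linear SVM), and w and b define the hyperplane.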
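To make the objective concrete, the sketch below evaluates it for a fixed (w, b) in the linear case, assuming φ is the identity. It uses the fact that, for given w and b, the smallest slack satisfying both constraints is ξᵢ = max(0, 1 − yᵢ(wᵀxᵢ + b)). The function name, data, and parameter values are hypothetical.

```python
def soft_margin_objective(w, b, C, X, y):
    """Return (objective value, slacks) of the soft-margin primal for fixed w, b.

    Assumes a linear SVM (phi is the identity). Each slack is the smallest
    value allowed by the constraints: max(0, 1 - y_i * (w.x_i + b)).
    """
    slacks = []
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        slacks.append(max(0.0, 1.0 - yi * score))
    regularizer = 0.5 * sum(wj * wj for wj in w)
    return regularizer + C * sum(slacks), slacks

# Hypothetical toy data: the second point lies inside the margin,
# and the fourth is on the wrong side of the hyperplane.
X = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0], [0.5, 0.0]]
y = [1, 1, -1, -1]
w, b = [1.0, 0.0], 0.0

for C in (1.0, 10.0):
    value, slacks = soft_margin_objective(w, b, C, X, y)
    print(f"C={C}: objective={value}, slacks={slacks}")
```

The slacks do not depend on C, but their weight in the objective does: raising C makes the same violations more expensive, which pushes the optimizer toward a harder margin.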

Why SVM is Effective for Fraud Detection
  • Naturally handles high-dimensional feature spaces (common in anonymized transaction datasets)
  • Robust to noise and outliers when properly regularized
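  • Can perform well with relatively few positive (fraud) examples, provided the class imbalance is accounted for during training and evaluation
  • Rests on margin-based generalization theory: a larger margin tends to generalize better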

Linear vs. Non-Linear SVM

| Type | When to Use | Decision Boundary |
|---|---|---|
| Linear SVM | Classes nearly separable by a straight line | Straight hyperplane |
| Non-Linear SVM | Real-world fraud: complex, overlapping patterns | Curved/flexible boundary |

Fraud data is rarely linearly separable, so a kernel (non-linear) SVM is usually the better fit.
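A one-dimensional toy case illustrates why a non-linear mapping helps. This is a minimal sketch with hypothetical data: the "fraud" label (+1) sits at both extremes of a single feature, so no threshold on the raw feature separates the classes, while the mapped feature x² does.

```python
def separable_by_threshold(values, labels):
    """True if a single threshold on a 1-D feature splits the classes perfectly.

    After sorting by value, a perfect split means the label changes at most
    once. Assumes no two samples from different classes share the same value.
    """
    ordered = [label for _, label in sorted(zip(values, labels))]
    changes = sum(1 for a, b in zip(ordered, ordered[1:]) if a != b)
    return changes <= 1

# Hypothetical feature values; fraud (+1) occurs at both extremes.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [1, -1, -1, 1]

print(separable_by_threshold(xs, ys))                    # False
print(separable_by_threshold([x * x for x in xs], ys))   # True
```

A kernel SVM achieves the same effect without constructing the mapped features explicitly.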

Kernel Functions (The Kernel Trick)

| Kernel | Formula | Best For / Characteristics |
|---|---|---|
| Linear | K(xᵢ, xⱼ) = xᵢᵀ xⱼ | Fast, good when data is almost linearly separable |
| Polynomial | K(xᵢ, xⱼ) = (γ xᵢᵀ xⱼ + r)ᵈ | Captures polynomial patterns; d = degree |
| RBF / Gaussian | K(xᵢ, xⱼ) = exp(−γ ‖xᵢ − xⱼ‖²) | Flexible non-linear boundary; γ controls how far each point's influence reaches |
| Sigmoid | K(xᵢ, xⱼ) = tanh(γ xᵢᵀ xⱼ + r) | Rarely used in practice |
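The four kernels can be written directly as functions of two feature vectors. The sketch below uses plain Python lists; the function names and default parameter values are illustrative assumptions, and the RBF kernel uses the standard form exp(−γ ‖xᵢ − xⱼ‖²).

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def linear_kernel(xi, xj):
    return dot(xi, xj)

def polynomial_kernel(xi, xj, gamma=1.0, r=1.0, d=3):
    return (gamma * dot(xi, xj) + r) ** d

def rbf_kernel(xi, xj, gamma=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * squared_distance)

def sigmoid_kernel(xi, xj, gamma=1.0, r=0.0):
    return math.tanh(gamma * dot(xi, xj) + r)

# Hypothetical feature vectors for two transactions.
a, b = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(a, b))
print(polynomial_kernel(a, b, d=2))
print(rbf_kernel(a, b, gamma=0.1))
print(sigmoid_kernel(a, b, gamma=0.01))
```

Note that the RBF kernel of a point with itself is always 1 and decays toward 0 as the two points move apart, which is why it behaves like a similarity score.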

In practice, the RBF kernel is often the best-performing choice for credit card fraud tasks.

Important Hyperparameters

| Parameter | Role | Effect of Values |
|---|---|---|
| C | Controls penalty for misclassification | Small C → softer margin, more tolerant; large C → harder margin, less tolerant (can overfit) |
| γ (gamma) | How far the influence of a single training point reaches | Small γ → smooth boundary; large γ → tight, complex boundary (overfitting risk) |
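The effect of γ can be seen by evaluating the RBF kernel at a few distances. This is a small illustrative sketch; the γ values and distances are arbitrary choices, not recommended settings.

```python
import math

def rbf_similarity(distance, gamma):
    """RBF kernel value for two points separated by the given Euclidean distance."""
    return math.exp(-gamma * distance ** 2)

distances = (0.1, 1.0, 3.0)
for gamma in (0.01, 1.0, 100.0):
    row = [rbf_similarity(d, gamma) for d in distances]
    print(f"gamma={gamma:>6}: " + ", ".join(f"{v:.3g}" for v in row))
```

With a small γ, even distant points still look similar, so each training point influences a wide region and the boundary stays smooth. With a large γ, similarity collapses to nearly zero beyond a very short distance, so the boundary can wrap tightly around individual training points.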

Good values of C and γ are usually found via grid search or random search with cross-validation.
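The search procedure itself is independent of the SVM implementation. The sketch below shows one possible structure, assuming a logarithmic grid and stratified folds so that every fold contains some fraud cases. All names are hypothetical, and `score_fn` is a placeholder callback standing in for "train an SVM on the training indices and score it on the validation indices"; the dummy scorer at the end exists only to make the example runnable.

```python
import itertools
import random

def log_grid(low_exp, high_exp):
    """Powers of ten from 10**low_exp to 10**high_exp, inclusive."""
    return [10.0 ** e for e in range(low_exp, high_exp + 1)]

def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with similar class ratios in each."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        for position, i in enumerate(indices):
            folds[position % k].append(i)
    return folds

def grid_search(labels, k, score_fn, C_values, gamma_values):
    """Return ((C, gamma), mean score) for the best cross-validated pair.

    score_fn(train_idx, val_idx, C, gamma) is a hypothetical callback that
    should fit a model on train_idx and return a validation score.
    """
    folds = stratified_folds(labels, k)
    best_params, best_score = None, float("-inf")
    for C, gamma in itertools.product(C_values, gamma_values):
        scores = []
        for f in range(k):
            val_idx = folds[f]
            train_idx = [i for g in range(k) if g != f for i in folds[g]]
            scores.append(score_fn(train_idx, val_idx, C, gamma))
        mean_score = sum(scores) / k
        if mean_score > best_score:
            best_params, best_score = (C, gamma), mean_score
    return best_params, best_score

# Hypothetical labels: 5 fraud cases among 100 transactions.
labels = [1] * 5 + [0] * 95
C_values, gamma_values = log_grid(-1, 2), log_grid(-3, 0)

# Placeholder scorer, NOT a real model: it simply prefers one grid point.
target = (C_values[2], gamma_values[1])
def dummy_score(train_idx, val_idx, C, gamma):
    return 1.0 if (C, gamma) == target else 0.0

print(grid_search(labels, 5, dummy_score, C_values, gamma_values))
```

In a real experiment the scorer would return a fraud-oriented metric such as F1 or PR-AUC rather than accuracy, for the reasons given in the evaluation discussion.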

Challenges in Credit Card Fraud Detection

  • Extreme class imbalance (fraud is typically below 1% of transactions, often near 0.1%)
  • Concept drift (fraud patterns change over time)
  • Need for very high recall on fraud class while keeping false positives low

Because of imbalance, accuracy is misleading. Preferred metrics are:

  • Precision, Recall, F1-score (especially on the fraud class)
  • Precision-Recall AUC (PR-AUC), which is more informative than ROC-AUC in highly imbalanced settings
  • Confusion matrix and cost-sensitive evaluation
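A short calculation shows why accuracy misleads here. The sketch below assumes binary labels with 1 = fraud and 0 = legitimate; the 1,000-transaction dataset and both sets of predictions are hypothetical.

```python
def fraud_metrics(y_true, y_pred):
    """Confusion-matrix counts and derived metrics for the fraud class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision, "recall": recall, "f1": f1,
    }

# Hypothetical data: 2 fraud cases among 1,000 transactions.
y_true = [1, 1] + [0] * 998

# Model A never flags anything.
always_legit = [0] * 1000
# Model B catches one fraud, misses one, and raises one false alarm.
catches_one = [1, 0, 1] + [0] * 997

print(fraud_metrics(y_true, always_legit))
print(fraud_metrics(y_true, catches_one))
```

Both models reach 99.8% accuracy, yet the first has zero recall on fraud while the second has precision, recall, and F1 of 0.5. Only the fraud-class metrics expose the difference.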