This project analyzes credit card transaction data to detect fraudulent activities using rule-based methods in SQL.
- Source: Credit Card Transactions Dataset
- Size used: sample of 50,000 transactions
- Includes customer ID, transaction, amount, location, and fraud label
Fraud Distribution
- Fraud accounts for 0.9% of all transactions
- Highly imbalanced dataset
- Fraudulent transactions have significantly higher amounts than legitimate ones
- The average fraudulent transaction is ~8x the average legitimate transaction
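The two summary figures above (fraud share and the ratio of average amounts) could be computed along these lines. This is a minimal sketch on hypothetical toy rows, not the project's data or its actual SQL; the name `fraud_summary` and the `(amount, is_fraud)` row layout are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical toy rows: (amount, is_fraud). Not taken from the project's dataset.
transactions = [
    (20.0, 0), (35.0, 0), (15.0, 0), (50.0, 0), (30.0, 0),
    (25.0, 0), (40.0, 0), (10.0, 0), (45.0, 0), (240.0, 1),
]

def fraud_summary(rows):
    """Return (fraud_rate, average fraud amount / average normal amount)."""
    fraud = [amount for amount, label in rows if label == 1]
    normal = [amount for amount, label in rows if label == 0]
    fraud_rate = len(fraud) / len(rows)
    amount_ratio = mean(fraud) / mean(normal)
    return fraud_rate, amount_ratio

rate, ratio = fraud_summary(transactions)
print(f"fraud rate: {rate:.1%}, fraud/normal average amount: {ratio:.1f}x")
```

On a dataset as imbalanced as the one described, the fraud average rests on very few rows, so the ratio is best read as indicative rather than precise.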
Rule 1: Amount Anomaly
- Flag transactions whose amount exceeds 3x the customer's average transaction amount
- Recall ≈ 46%
- Precision ≈ 9%
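To make Rule 1 and its two metrics concrete, here is a hedged sketch in plain Python. It assumes the customer average is taken over all of that customer's transactions, including the one being tested (the source does not say), and it runs on hypothetical toy rows. The function names `flag_amount_anomalies` and `precision_recall` are illustrative, not part of the project.

```python
from collections import defaultdict

# Hypothetical toy rows: (customer_id, amount, is_fraud). Not the project's data.
transactions = [
    ("c1", 20.0, 0), ("c1", 25.0, 0), ("c1", 15.0, 0), ("c1", 30.0, 0), ("c1", 400.0, 1),
    ("c2", 100.0, 0), ("c2", 120.0, 0), ("c2", 110.0, 0), ("c2", 90.0, 1),
    ("c3", 10.0, 0), ("c3", 10.0, 0), ("c3", 10.0, 0), ("c3", 10.0, 0), ("c3", 200.0, 0),
]

def flag_amount_anomalies(rows, multiplier=3.0):
    """Flag each row whose amount exceeds multiplier x that customer's average."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for customer_id, amount, _ in rows:
        totals[customer_id] += amount
        counts[customer_id] += 1
    return [
        amount > multiplier * totals[customer_id] / counts[customer_id]
        for customer_id, amount, _ in rows
    ]

def precision_recall(flags, labels):
    """Precision = TP / flagged, recall = TP / actual fraud."""
    true_positives = sum(1 for flag, label in zip(flags, labels) if flag and label == 1)
    flagged = sum(1 for flag in flags if flag)
    actual_fraud = sum(1 for label in labels if label == 1)
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actual_fraud if actual_fraud else 0.0
    return precision, recall

flags = flag_amount_anomalies(transactions)
labels = [label for _, _, label in transactions]
precision, recall = precision_recall(flags, labels)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In the toy data the rule catches the large fraud on `c1`, misses the ordinary-sized fraud on `c2`, and raises a false alarm on `c3`. That is the same pattern of moderate recall and low precision reported above.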
Rule 2: Rapid Transactions
- Flag transactions that occur within a short time interval of one another
- Minimal impact on fraud detection
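A rapid-transaction rule of this kind might look like the sketch below. The 60-second window, the per-customer grouping, and the `(txn_id, customer_id, unix_ts)` row layout are all assumptions; the source only says "short time intervals". The name `flag_rapid` is hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # assumed threshold; the source does not give a value

def flag_rapid(rows, window=WINDOW_SECONDS):
    """rows: (txn_id, customer_id, unix_ts). Return the set of flagged txn_ids.

    A transaction is flagged when the same customer's previous transaction
    happened no more than `window` seconds earlier.
    """
    by_customer = defaultdict(list)
    for txn_id, customer_id, ts in rows:
        by_customer[customer_id].append((ts, txn_id))

    flagged = set()
    for events in by_customer.values():
        events.sort()  # chronological order within each customer
        for (prev_ts, _), (ts, txn_id) in zip(events, events[1:]):
            if ts - prev_ts <= window:
                flagged.add(txn_id)
    return flagged

# Hypothetical toy rows, not the project's data.
rows = [
    (1, "c1", 0), (2, "c1", 30), (3, "c1", 500),
    (4, "c2", 10), (5, "c2", 1000),
]
rapid_flags = flag_rapid(rows)
print(sorted(rapid_flags))
```

Only the later transaction of each close pair is flagged here. Flagging both members of the pair is an equally reasonable choice and would change the counts.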
Combined Rules
- Recall ≈ 47%
- Precision ≈ 9%
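One plausible way to combine the rules is a logical OR, flagging a transaction if either rule fires. That would be consistent with recall rising slightly, but the source does not state how the rules were combined, so treat it as an assumption. The sketch uses hypothetical transaction IDs, and `combine_flags` and `precision_recall` are illustrative names.

```python
def combine_flags(*flag_sets):
    """Union of flagged transaction IDs: a transaction is flagged if any rule fires."""
    return set().union(*flag_sets)

def precision_recall(flagged, fraud_ids):
    """Precision = TP / flagged, recall = TP / actual fraud, on sets of IDs."""
    true_positives = len(flagged & fraud_ids)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(fraud_ids) if fraud_ids else 0.0
    return precision, recall

# Hypothetical IDs, not the project's results.
amount_flags = {1, 2, 3, 4}   # flagged by the amount rule
rapid_flags = {4, 5}          # flagged by the rapid-transaction rule
fraud_ids = {1, 5, 9, 10}     # transactions labelled as fraud

combined = combine_flags(amount_flags, rapid_flags)
print("amount only:", precision_recall(amount_flags, fraud_ids))
print("combined:   ", precision_recall(combined, fraud_ids))
```

With an OR combination, recall can only stay the same or rise, while precision may move in either direction depending on how many of the extra flags are true fraud.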
Key Findings
- Fraud is strongly associated with unusually high transaction amounts
- Transaction frequency is not a strong fraud indicator in this dataset
- Rule-based detection produces many false positives
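The scale of the false-positive problem can be estimated from the figures already reported: 50,000 transactions, 0.9% fraud, and the combined recall and precision of roughly 47% and 9%. Because those percentages are rounded, the results below are approximations, not counts taken from the project.

```python
# Figures as reported in this README (rounded percentages).
total_transactions = 50_000
fraud_rate = 0.009
recall = 0.47
precision = 0.09

fraud_count = total_transactions * fraud_rate   # actual fraud cases
true_positives = fraud_count * recall           # fraud cases the rules catch
flagged = true_positives / precision            # all transactions the rules flag
false_positives = flagged - true_positives      # legitimate transactions flagged

print(f"fraud cases:     ~{fraud_count:.0f}")
print(f"caught:          ~{true_positives:.0f}")
print(f"flagged:         ~{flagged:.0f}")
print(f"false positives: ~{false_positives:.0f}")
```

Under these figures, roughly 2,350 transactions would be flagged to catch about 210 of the 450 fraud cases, so around nine in ten alerts would be false alarms.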
Recommendations
- Monitor high-value transactions in real time
- Combine multiple behavioral signals
- Implement machine learning models for improved accuracy
Tools
- SQL (MySQL)
- Python (Pandas)
Project Structure
├── data/
├── sql/
│   └── fraud_detection.sql
└── README.md