AI/ML-powered URL and email phishing detection
Real-time phishing URL detection · Machine learning classification · Sub-10-second analysis
A full-stack AI-powered phishing detection system that analyses URLs in real time and classifies them as phishing or legitimate using a trained machine learning model. Built with a FastAPI backend, React frontend, and an ML pipeline trained on a 500,000-URL dataset.
Designed as a practical defensive security tool — the kind a SOC analyst or security engineer would actually use to triage suspicious links.
| Metric | Result |
|---|---|
| Training dataset size | 500,000 URLs |
| Detection accuracy | 85% |
| End-to-end analysis time | < 10 seconds |
| False positive rate | Not reported (reduced through feature engineering) |
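
For reference, the two rate metrics in the table can be computed from a labelled evaluation set roughly as sketched below. The function name `accuracy_and_fpr` and the toy labels are illustrative assumptions, not the project's `evaluate.py` or its results; phishing is treated as the positive class.

```python
from typing import Sequence, Tuple


def accuracy_and_fpr(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, false positive rate); 1 = phishing, 0 = legitimate."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # phishing caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # phishing missed

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    negatives = fp + tn
    fpr = fp / negatives if negatives else 0.0
    return accuracy, fpr


# Toy labels for illustration only (not project data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
accuracy, fpr = accuracy_and_fpr(y_true, y_pred)
print(f"accuracy={accuracy:.2f} fpr={fpr:.2f}")  # accuracy=0.75 fpr=0.25
```

The false positive rate is the share of legitimate URLs wrongly flagged as phishing, which matters to an analyst because every false alarm costs triage time.
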
- Real-time URL analysis: paste any URL and get a phishing/legitimate verdict in under 10 seconds
- ML classification: trained on 500K URLs with feature extraction (URL length, special characters, domain age indicators, subdomain depth, HTTPS presence, suspicious keywords)
- FastAPI backend: RESTful API with a clean `/predict` endpoint
- React frontend: clean UI for URL submission and result display
- Confidence scoring: the model outputs a probability alongside the binary classification
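
As a rough illustration of confidence scoring, the sketch below turns a model probability into a verdict with `url`, `prediction`, and `confidence` fields. The helper name `build_verdict`, the 0.5 threshold, and the choice to report the probability of the predicted class as the confidence are all assumptions, not the project's confirmed behaviour.

```python
from typing import Dict, Union


def build_verdict(
    url: str, phishing_probability: float, threshold: float = 0.5
) -> Dict[str, Union[str, float]]:
    """Map P(phishing) to a binary label plus a confidence score."""
    if not 0.0 <= phishing_probability <= 1.0:
        raise ValueError("phishing_probability must be between 0 and 1")

    is_phishing = phishing_probability >= threshold  # assumed cut-off
    # Assumption: confidence is the probability of whichever class was chosen.
    confidence = phishing_probability if is_phishing else 1.0 - phishing_probability
    return {
        "url": url,
        "prediction": "phishing" if is_phishing else "legitimate",
        "confidence": round(confidence, 2),
    }


print(build_verdict("http://suspicious-login.xyz/paypal/verify", 0.94))
```

Raising the threshold would trade missed phishing URLs for fewer false alarms; the right value depends on how the tool is used for triage.
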
Phishing-Detection-Project/
├── backend/ # FastAPI application
│ ├── main.py # API endpoints
│ ├── model/ # Trained ML model files
│ └── utils/ # Feature extraction logic
├── ml_model/ # Model training pipeline
│ ├── train.py # Training script
│ ├── features.py # Feature engineering
│ └── evaluate.py # Model evaluation
├── phishing-detection-frontend/ # React application
│ ├── src/
│ │ ├── App.jsx # Main component
│ │ └── components/ # UI components
│ └── public/
├── requirements.txt # Python dependencies
└── .gitignore
| Layer | Technology |
|---|---|
| ML Model | Scikit-learn / Python |
| Feature Engineering | URL parsing, regex, custom extractors |
| Backend API | FastAPI (Python) |
| Frontend | React.js |
| Dataset | 500,000 labelled URLs |
# Clone the repo
git clone https://github.com/ANIMAALS/Phishing-Detection-Project.git
cd Phishing-Detection-Project
# Install dependencies
pip install -r requirements.txt
# Start FastAPI server
cd backend
uvicorn main:app --reload
The API will be running at http://localhost:8000
# Start the React frontend (in a new terminal, from the repository root)
cd phishing-detection-frontend
npm install
npm start
The frontend will be running at http://localhost:3000
Endpoint: POST /predict
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"url": "http://suspicious-login.xyz/paypal/verify"}'
Response:
{
  "url": "http://suspicious-login.xyz/paypal/verify",
  "prediction": "phishing",
  "confidence": 0.94
}
User submits URL
↓
Feature Extraction
· URL length
· Special character count (@, -, //)
· Subdomain depth
· HTTPS presence
· Suspicious keyword match
· Domain structure analysis
↓
ML Model Inference
· Trained on 500K URLs
· Binary classification
· Confidence score output
↓
Result returned to UI (< 10 seconds)
The model was trained on a balanced dataset of 500,000 URLs — 250,000 phishing, 250,000 legitimate. Feature engineering extracts 15+ URL-based characteristics without making external DNS or WHOIS calls, keeping inference fast and offline-capable.
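
A minimal sketch of what offline, URL-only feature extraction could look like, using only the Python standard library. The function name `extract_features`, the keyword list, the `host_is_ip` feature, and the naive subdomain rule (which assumes a single-label TLD) are hypothetical; the project's real `features.py` extracts 15+ features and may differ.

```python
import re
from typing import Dict
from urllib.parse import urlparse

# Hypothetical keyword list; the project's actual list is not given here.
SUSPICIOUS_KEYWORDS = ("login", "verify", "secure", "account", "update", "paypal")


def extract_features(url: str) -> Dict[str, int]:
    """Derive numeric features from the URL string alone (no DNS or WHOIS)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    is_ip = bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host))

    # Labels left of the registered domain, naively assuming a one-label TLD.
    labels = [] if is_ip else [part for part in host.split(".") if part]
    subdomain_depth = max(len(labels) - 2, 0)

    after_scheme = url.split("://", 1)[-1]
    lowered = url.lower()
    return {
        "url_length": len(url),
        "at_count": url.count("@"),
        "hyphen_count": url.count("-"),
        "extra_double_slash": int("//" in after_scheme),
        "subdomain_depth": subdomain_depth,
        "uses_https": int(parsed.scheme == "https"),
        "host_is_ip": int(is_ip),
        "keyword_hits": sum(kw in lowered for kw in SUSPICIOUS_KEYWORDS),
    }


print(extract_features("http://suspicious-login.xyz/paypal/verify"))
```

Because every value comes from string parsing, the extractor makes no network calls, which is what keeps inference fast and usable offline.
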
# Retrain the model
cd ml_model
python train.py

Anirudh N.S., Cybersecurity Student, Dayananda Sagar University, Bengaluru
Part of a cybersecurity project portfolio alongside WatchDog 2.4 and KeyForge.