AI TraceFinder is a forensic machine learning platform that identifies the source scanner device used to digitize a document or image. Each scanner (brand/model) introduces unique noise, texture, and compression artifacts that serve as a fingerprint. By analyzing these patterns, AI TraceFinder enables fraud detection, authentication, and forensic validation in scanned documents.
- Collect and label scanned document datasets from multiple scanners
- Robust preprocessing (resize, grayscale, normalize, denoise)
- Extract scanner-specific features (noise, FFT, PRNU, texture descriptors)
- Train classification models (ML + CNN)
- Apply explainability tools (Grad-CAM, SHAP)
- Deploy an interactive app for scanner source identification
- Deliver accurate, interpretable results for forensic and legal use cases
-
Data Collection & Labeling
- Gather scans from 3–5 scanner models/brands
- Create a structured, labeled dataset
-
Preprocessing
- Resize, grayscale, normalize
- Optional: denoise to highlight artifacts
-
Feature Extraction
- PRNU patterns, FFT, texture descriptors (LBP, edge features)
-
Model Training
- Baseline ML: SVM, Random Forest, Logistic Regression
- Deep Learning: CNN with augmentation
-
Evaluation & Explainability
- Metrics: Accuracy, F1-score, Confusion Matrix
- Interpretability: Grad-CAM, SHAP feature maps
-
Deployment
- Streamlit app → upload scanned image → predict scanner model
- Display confidence score and key feature regions
- Source Attribution: Identify which scanner created a scanned copy of a document.
- Fraud Detection: Detect forgeries where unauthorized scanners were used.
- Legal Verification: Validate whether scanned evidence originated from approved devices.
- Tamper Resistance: Differentiate between authentic vs. tampered scans.
- Explainability: Provide visual evidence of how classification was made.
Input ➜ Preprocessing ➜ Feature Extraction + Modeling ➜ Evaluation & Explainability ➜ Prediction App
- W1: Dataset collection (min. 3–5 scanners), labeling, metadata analysis
- W2: Preprocessing pipeline (resize, grayscale, normalize, optional denoise)
- W3: Feature extraction (noise maps, FFT, LBP, texture descriptors)
- W4: Baseline ML models (SVM, RF, Logistic Regression) + evaluation
- W5: CNN model training with augmentation, hyperparameter tuning
- W6: Model evaluation (accuracy, F1, confusion matrix) + Grad-CAM/SHAP analysis
- W7: Streamlit app development → image upload, prediction, confidence output
- W8: Final documentation, results, presentation, and demo handover
ai-tracefinder/
├─ app.py
├─ src/
│ ├─ ingest/
│ ├─ preprocess/
│ ├─ features/
│ ├─ models/
│ ├─ explain/
│ └─ utils/
├─ data/
├─ notebooks/
├─ reports/
└─ README.md