- Team ID: T-0823769
- Team Name: ASHSUM
- Team Members:
- Sumit Pandey : 25-222056
- Ashirwad Sinha : 25-412666
This repository presents our solution to the NSSC 2025 Data Analytics Challenge, focused on analyzing the HLS4ML LHC Jet dataset.
Dataset: Drive Link
Our project is built around two major objectives:
-
Jet Classification
We classify particle jets into one of five categories using:- Deep Learning Approach: A fine-tuned ResNet-18 model trained on jet image data.
- Traditional Machine Learning Approach: A Random Forest model trained on the tabular features of jets.
-
Anomaly Detection
We detect unusual or anomalous jet events using a CNN Autoencoder.- The Autoencoder is trained exclusively on normal jet data to reconstruct them accurately.
- Significant reconstruction error signals potential anomalies.
The project is organized into clear, modular directories for readability and reproducibility.
NSSC_DataAnalytics_Team_ASHSUM143/
├── 📂 Notebook/
│ ├── 1_Data_Preprocessing.ipynb
│ ├── 2_CNN_Model.ipynb
│ ├── 3_Tabular_Models.ipynb
│ ├── 4_Model_Comparison.ipynb
│ └── 5_Anomaly_Detection.ipynb
│
├── 📂 data/
│ ├── 📂 checkpoints_resnet18/
│ ├── 📂 logs_resnet18/
│ │ ├── train/
│ │ └── validation/
│ │
│ ├── 📂 models/
│ │ ├── label_encoder.joblib
│ │ ├── pca.joblib
│ │ ├── resnet18_jet_classifier_best_model.keras
│ │ ├── rf_baseline.joblib
│ │ ├── rf_pca.joblib
│ │ └── scaler.joblib
│ │
│ ├── 📂 results/
│ │ ├── 📂 plots/
│ │ │ ├── anomaly_error_distribution.png
│ │ │ ├── cnn_accuracy_loss_curves.png
│ │ │ ├── cnn_confusion_matrix.png
│ │ │ ├── cnn_misclassified_jets.png
│ │ │ ├── confusion_matrix_pca.png
│ │ │ ├── confusion_matrix_raw.png
│ │ │ ├── model_comparison.png
│ │ │ ├── pca_scree_plot.png
│ │ │ └── top_5_anomalies.png
│ │ │
│ │ ├── anomaly_scores.csv
│ │ ├── cnn_summary.csv
│ │ ├── sample_images.npy
│ │ ├── tabular_feature_summary.csv
│ │ └── tabular_summary.csv
│ │
│ ├── 📂 train_data/ (Contains 36 .h5 files)
│ └── 📂 validation_data/ (Contains 27 .h5 files)
|
|── 📁 Report/
| ├── NSSC_Report_T-0823769.docx
|
├── README.md
└── requirements.txt
- Handled missing values, normalization, and reshaping for image-based models.
- Ensured class balance using stratified sampling techniques.
- Visualized jet distributions, feature importance, and correlations.
- Compared tabular and image-based feature representations.
- Fine-tuned on jet images.
- Implemented using PyTorch.
- Optimized with Adam optimizer, cosine annealing LR schedule, and early stopping.
- Used for interpretable classification on structured features.
- Tuned with GridSearchCV for best performance.
- Built a CNN Autoencoder to reconstruct normal jet events.
- Reconstruction error used as the anomaly score.
- Evaluated using ROC AUC and precision-recall metrics.
- Compared metrics such as Accuracy, F1-Score, ROC-AUC, and Inference Time.
- Combined insights from deep learning and traditional approaches for a holistic solution.
| Model | Acc | F1 | ROC AUC | Notes |
|---|---|---|---|---|
| RF | 0.8225 | 0.8198 | 0.9012 | Best overall |
| PCA RF | 0.7939 | 0.7901 | 0.8834 | Fast and efficient |
| CNN | 0.7437 | 0.7392 | 0.8645 | Learns jet structure |
- Python 3.10+
- PyTorch
- Scikit-learn
- Matplotlib, Seaborn
- NumPy, Pandas
- TensorBoard (for training visualization)
To reproduce our results:
# Clone the repository
git clone https://github.com/5umitpandey/IIT_Kh_NSSC_Data_Analytics_Hackathon.git
cd IIT_Kh_NSSC_Data_Analytics_Hackathon
# Install dependencies
pip install -r requirements.txt
# Run notebooks
jupyter notebook Notebook/




