- This project is a machine-learning web application that classifies Email or SMS messages as Spam or Not Spam.
- It uses Natural Language Processing (NLP) techniques to preprocess the text and a Multinomial Naive Bayes model to detect spam messages.
- The application is deployed using Streamlit, making it easy for anyone to use without coding.
- Spam messages are unwanted and can cause security risks. This project aims to automatically classify messages as Spam or Not Spam, helping users filter unwanted messages and improve communication efficiency.
- Python: Programming language
- Pandas: Data manipulation
- NumPy: Numerical operations
- NLTK: Natural Language Processing (tokenization, stopword removal, stemming)
- Scikit-learn: Machine Learning library
- TF-IDF Vectorizer: Feature extraction from text
- Multinomial Naive Bayes: Machine Learning model for classification
- Streamlit: Web app deployment
Text Preprocessing:
- Convert text to lowercase
- Tokenization (splitting text into words)
- Stopword removal (removing common words like "is", "the")
- Stemming (reducing words to their root form)
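The four preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `preprocess` function name, the regex tokenizer, and the short inline stopword list are assumptions chosen so the example runs without downloading NLTK data (the project presumably uses NLTK's own tokenizer and stopwords corpus). NLTK's `PorterStemmer` is used when NLTK is installed; otherwise a crude suffix stripper stands in for it.

```python
import re

try:
    # NLTK's Porter stemmer needs no extra corpus downloads.
    from nltk.stem import PorterStemmer
    _stem = PorterStemmer().stem
except ImportError:
    # Crude stand-in (assumption) so the sketch still runs without NLTK.
    def _stem(word):
        for suffix in ("ing", "ed", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word

# Small illustrative stopword list (assumption), not NLTK's full corpus.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to", "you", "your"}


def preprocess(text):
    """Lowercase, tokenize, drop stopwords, and stem a single message."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())   # lowercase + tokenize
    kept = [t for t in tokens if t not in STOPWORDS]  # stopword removal
    return " ".join(_stem(t) for t in kept)           # stemming


print(preprocess("You are WINNING the free prizes!"))
```

Returning a single space-joined string keeps the output ready for a text vectorizer, which typically expects one string per message.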
Feature Extraction:
- Using TF-IDF Vectorizer to convert text into numerical features
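To make the numerical features concrete, here is a small pure-Python sketch of the weighting that scikit-learn's `TfidfVectorizer` applies under its default settings (smoothed IDF, then L2 normalization of each row). Whether the project uses those defaults is an assumption, and the `tfidf` helper, the whitespace tokenization, and the three toy messages are hypothetical, written only for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, rows) of smoothed-IDF, L2-normalized weights."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(docs)

    # Document frequency: in how many messages each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency: rarer terms get larger weights.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        raw = [counts[t] * idf[t] for t in vocab]          # tf * idf
        norm = math.sqrt(sum(w * w for w in raw)) or 1.0   # L2 length
        rows.append([w / norm for w in raw])
    return vocab, rows


# Toy, already-preprocessed messages (hypothetical).
docs = ["free prize win", "meet lunch today", "win free cash"]
vocab, rows = tfidf(docs)

# Show the non-zero weights of the first message.
for term, weight in zip(vocab, rows[0]):
    if weight:
        print(f"{term}: {weight:.3f}")
```

In the first message, "prize" appears in only one message while "free" and "win" appear in two, so "prize" receives the largest weight. That is the effect TF-IDF is meant to have: terms that are distinctive for a message count for more than terms shared across many messages.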
Model Training:
- Tested multiple models
Best Model:
- Multinomial Naive Bayes
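A minimal sketch of training the selected model with scikit-learn might look like the following. It chains `TfidfVectorizer` and `MultinomialNB` in a pipeline and fits it on a handful of hand-written messages. The messages, the 1/0 label encoding, and the use of default hyperparameters are all assumptions for illustration; the project's real dataset and settings are not shown here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy, hand-written messages (hypothetical; not the project's dataset).
messages = [
    "win a free prize now",
    "free cash claim your prize",
    "urgent you won free tickets",
    "are we still meeting for lunch",
    "see you at the office tomorrow",
    "can you call me after lunch",
]
labels = [1, 1, 1, 0, 0, 0]  # assumed encoding: 1 = Spam, 0 = Not Spam

# Vectorize the text and fit the classifier in one step.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(messages, labels)

prediction = model.predict(["claim your free prize"])[0]
print("Spam" if prediction == 1 else "Not Spam")
```

Wrapping both steps in one pipeline means new messages are vectorized with the vocabulary learned during training, so the same object can be reused for prediction.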
Evaluation Metrics:
- Accuracy: Measures overall correctness
- Precision: Measures how many predicted spam messages are actually spam
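As a worked example of the two metrics, the sketch below computes them by hand on made-up labels. The `accuracy` and `precision` helpers and the label lists are hypothetical; in scikit-learn the equivalent functions are `accuracy_score` and `precision_score` from `sklearn.metrics`.

```python
def accuracy(y_true, y_pred):
    """Fraction of all messages that were labeled correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision(y_true, y_pred, positive=1):
    """Fraction of messages predicted as spam that really are spam."""
    predicted_pos = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not predicted_pos:
        return 0.0  # no spam predictions, so nothing to be precise about
    return sum(t == positive for t in predicted_pos) / len(predicted_pos)


# Made-up labels for illustration: 1 = Spam, 0 = Not Spam.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

print(f"Accuracy:  {accuracy(y_true, y_pred):.3f}")   # 5 of 8 correct
print(f"Precision: {precision(y_true, y_pred):.3f}")  # 3 of 5 spam predictions correct
```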
How to Run:
- Clone the repository or download the ZIP file
- Install required packages:
  - pip install pandas numpy nltk scikit-learn streamlit
- Run the Streamlit app:
  - streamlit run app.py
- Open the web app in your browser
- Enter any Email or SMS text and click Predict to see if it's Spam or Not Spam