Team Pakka Nerds

SecureParse

A secure file redaction service that automatically detects and redacts sensitive information from various file types.

Prerequisites

Python 3.8 or higher
Node.js 14 or higher
pip3
System dependencies (installed automatically by setup script):
- tesseract-ocr
- python3-dev

Manual Setup

If the setup script doesn't work for your system, you can install dependencies manually:

Install system dependencies:
- For Ubuntu/Debian: sudo apt-get install tesseract-ocr python3-dev
- For Arch Linux: sudo pacman -S tesseract tesseract-data-eng
- For Fedora: sudo dnf install tesseract
- For macOS: brew install tesseract

Create and activate a Python virtual environment:

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install Python dependencies:

pip install --upgrade pip
pip install -r redaction/requirements.txt
python -m spacy download en_core_web_lg
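
After installing, it can help to confirm that the virtual environment actually contains the main Python packages. The check below is a minimal sketch: the import names (`spacy`, `pytesseract`, `presidio_analyzer`) are assumed from the tools this README mentions, not read from redaction/requirements.txt, so adjust the list to match that file.

```python
import importlib.util

# Import names assumed from the tools this README mentions; adjust them to
# match redaction/requirements.txt.
REQUIRED_MODULES = ["spacy", "pytesseract", "presidio_analyzer"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

Run it with the virtual environment activated; any module it reports as missing points to a failed or skipped install step.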

Install Node.js dependencies:

cd server
npm install

Start the server:

npm start

The server will be running at http://localhost:3000
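
To confirm the server came up, a quick reachability check can be scripted. This is a sketch only: the helper name `server_is_up` is hypothetical, and it assumes nothing about the routes the server exposes, so any HTTP response (including an error status) counts as "up".

```python
import urllib.error
import urllib.request


def server_is_up(url="http://localhost:3000", timeout=3.0):
    """Return True if anything answers HTTP at url, False if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with a non-2xx status (for example 404).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: nothing is listening.
        return False


if __name__ == "__main__":
    print("Server reachable:", server_is_up())
```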

Supported File Types

Images: PNG, JPEG
Documents: PDF, DOCX, PPTX, XLSX
Text: TXT, RTF, CSV, JSON, XML
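
The list above can be expressed as a simple extension lookup, for example to reject unsupported uploads before processing. This is an illustrative sketch, not the project's own validation code: the `classify` helper is hypothetical, and treating both `.jpg` and `.jpeg` as JPEG is an assumption.

```python
from pathlib import Path
from typing import Optional

# Extensions grouped as in the list above; accepting both .jpg and .jpeg
# for JPEG is an assumption.
SUPPORTED_TYPES = {
    "image": {".png", ".jpg", ".jpeg"},
    "document": {".pdf", ".docx", ".pptx", ".xlsx"},
    "text": {".txt", ".rtf", ".csv", ".json", ".xml"},
}


def classify(filename: str) -> Optional[str]:
    """Return the category for a supported file name, or None if unsupported."""
    suffix = Path(filename).suffix.lower()
    for category, extensions in SUPPORTED_TYPES.items():
        if suffix in extensions:
            return category
    return None


if __name__ == "__main__":
    for name in ("Patient Medical Report-Test.pdf", "scan.JPG", "archive.zip"):
        print(name, "->", classify(name))
```

Checking the extension is only a first filter; it does not verify that the file contents match the claimed type.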

Features

Automatic detection and redaction of sensitive information
Support for multiple file types
Real-time processing
User-friendly interface

License

See the LICENSE file for details.

🌟 Features

Smart Detection: Identifies 20+ PII types (emails, phones, IDs, etc.) using Microsoft Presidio
Accurate Redaction: Maintains content structure after redaction
Web Interface: Simple drag-and-drop UI
Secure Processing: Files processed in-memory (never stored permanently)
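
To illustrate what "maintains content structure" can mean for plain text, the sketch below masks matches with the same number of characters so that surrounding text and offsets are unchanged. It is a simplified stand-in using two hand-written regular expressions, not Microsoft Presidio and not the project's redaction code; the patterns and the `redact` helper are assumptions made for the example.

```python
import re

# Deliberately simple patterns for illustration; real PII detection
# (as done by Presidio) covers far more cases.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text, mask="*"):
    """Replace each email or phone match with mask characters of equal length."""

    def _mask(match):
        return mask * len(match.group(0))

    for pattern in (EMAIL, PHONE):
        text = pattern.sub(_mask, text)
    return text


if __name__ == "__main__":
    sample = "Contact jane@example.com or +1 555-123-4567 today."
    print(redact(sample))
```

Because every match is replaced character for character, the redacted text has the same length as the input, which keeps line breaks and column positions intact.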

🛠️ Tech Stack

Frontend:

HTML5/CSS3
JavaScript (ES6+)

Backend:

Node.js (Express)
Python 3.8+ (Flask)
Key Modules:
- Microsoft Presidio (analysis)
- Pytesseract (text extraction)

🚀 Installation

Prerequisites

Node.js v16+
Python 3.8+

# Clone the repository
git clone https://github.com/nbdevanandan/hack-the-future
cd hack-the-future
