
JobSearch Agent

Python 3.10+ · MIT License

An intelligent job search automation system with LinkedIn scraping, AI-powered CV generation, and cover letter creation. Extract detailed job data, company information, and hiring team details with advanced anonymization and proxy support.

🚀 Quick Start

1. Installation

git clone https://github.com/sreekar2858/JobSearch-Agent.git
cd JobSearch-Agent
pip install -r requirements.txt

2. Setup (Optional but Recommended)

Create a .env file for enhanced features:

# LinkedIn credentials (for better scraping results)
LINKEDIN_USERNAME=your_email@example.com
LINKEDIN_PASSWORD=your_password

# AI API key (for CV/cover letter generation)  
GOOGLE_API_KEY=your_gemini_api_key

3. Start Scraping

# LinkedIn job search
python -m src.scraper.search.linkedin_scraper "Software Engineer" "San Francisco" --max-jobs 10

# Get credentials for job sites
python -m src.scraper.buggmenot --website glassdoor.com

# Extract from specific job URL
python -m src.scraper.search.linkedin_scraper --job-url "https://linkedin.com/jobs/view/123456789"

🔧 Main Tools

πŸ” LinkedIn Scraper (Playwright)

Advanced LinkedIn job scraper with anonymization and proxy support:

# Basic search
python -m src.scraper.search.linkedin_scraper "Python Developer" "Remote" --max-jobs 5

# With browser options
python -m src.scraper.search.linkedin_scraper "Data Scientist" "NYC" --browser firefox --headless

# With anonymization disabled
python -m src.scraper.search.linkedin_scraper "DevOps Engineer" "Berlin" --no-anonymize

# With proxy
python -m src.scraper.search.linkedin_scraper "ML Engineer" "London" --proxy http://proxy:8080

Key Features:

  • ✅ Multi-browser support (Chromium, Firefox, WebKit)
  • ✅ Anonymization (random user agents, timezone, WebGL blocking)
  • ✅ Proxy support (HTTP/SOCKS5)
  • ✅ Robust data extraction (job details, company info, hiring team)
  • ✅ Rate limiting protection
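The anonymization above comes down to randomizing the browser fingerprint per session. A minimal sketch of the idea, using stdlib `random` (the pool values and the `random_fingerprint` helper are illustrative, not the project's actual API; the dict keys match Playwright's browser-context options):

```python
import random

# Illustrative user-agent pool; the scraper's real pool and rotation
# logic live in the project source -- this only sketches the technique.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/126.0",
]

TIMEZONES = ["Europe/Berlin", "America/New_York", "Asia/Tokyo"]

def random_fingerprint() -> dict:
    """Pick a random user agent and timezone for a new browser context."""
    return {
        "user_agent": random.choice(USER_AGENTS),
        "timezone_id": random.choice(TIMEZONES),
    }
```

A fresh fingerprint would be drawn once per browser context, not per request, so pages within one session stay consistent.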

πŸ” BugMeNot Scraper

Get login credentials for job sites:

# Basic usage
python -m src.scraper.buggmenot --website economist.com

# With browser visible
python -m src.scraper.buggmenot --website nytimes.com --visible

# With proxy
python -m src.scraper.buggmenot --website wsj.com --proxy socks5://proxy:1080

🤖 AI Job Processing & Pipeline

Unified job search pipeline with both synchronous and asynchronous support:

# Complete job search workflow with AI processing
python main.py search "Frontend Developer" --locations "Berlin" --generate-cv --generate-cover-letter

# Direct pipeline usage (sync mode for CLI)
python -c "from src.utils.job_search_pipeline import run_job_search; run_job_search('Python Developer', max_jobs=5)"

# Start API server (uses async pipeline for FastAPI)
python main_api.py
# Visit http://localhost:8000/docs for API documentation

Key Pipeline Features:

  • ✅ Unified codebase - Single file supports both sync and async modes
  • ✅ Database integration - SQLite storage with deduplication
  • ✅ FastAPI compatibility - Async pipeline for web services
  • ✅ CLI compatibility - Sync pipeline for scripts and standalone execution
  • ✅ Export flexibility - JSON output and database exports
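The sync/async split can be sketched as one async core plus a thin synchronous wrapper that drives it with `asyncio.run` (function names here are hypothetical, not the pipeline's real API):

```python
import asyncio

async def search_jobs_async(keywords: str, max_jobs: int = 5) -> list[str]:
    """Async core; hypothetical stand-in for the real pipeline logic."""
    await asyncio.sleep(0)  # placeholder for real I/O (HTTP, DB writes)
    return [f"{keywords} result {i}" for i in range(max_jobs)]

def search_jobs(keywords: str, max_jobs: int = 5) -> list[str]:
    """Sync wrapper for CLI use: drives the async core to completion."""
    return asyncio.run(search_jobs_async(keywords, max_jobs))
```

FastAPI route handlers would `await search_jobs_async(...)` directly, while CLI scripts call the blocking `search_jobs(...)` wrapper; the core logic exists only once.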

📖 Documentation

📚 Complete documentation is available in the docs/ directory.


⚡ Common Commands

# LinkedIn job search with 20 results
python -m src.scraper.search.linkedin_scraper "Software Engineer" "Remote" --max-jobs 20

# LinkedIn search with filters
python -m src.scraper.search.linkedin_scraper "Data Scientist" "SF" --experience-levels "mid_senior" --date-posted "past_week"

# Get job details from specific URL
python -m src.scraper.search.linkedin_scraper --job-url "https://linkedin.com/jobs/view/4243594281/"

# BugMeNot credentials
python -m src.scraper.buggmenot --website glassdoor.com --output credentials.json

# Links only (fast collection)
python -m src.scraper.search.linkedin_scraper "Python" "NYC" --links-only --max-pages 3

# Help for any tool
python -m src.scraper.search.linkedin_scraper --help
python -m src.scraper.buggmenot --help

πŸ›‘οΈ Features

  • 🔒 Anonymization: Random user agents, timezone/language randomization, WebGL/Canvas/WebRTC blocking
  • 🌐 Proxy Support: HTTP and SOCKS5 proxy configuration for both scrapers
  • 📊 Rich Data: Complete job descriptions, company info, hiring team details, related jobs
  • 🚀 Fast & Robust: Optimized selectors, retry logic, rate limiting protection
  • 🔧 Flexible: CLI arguments, module execution, programmatic usage

⚠️ Important Notes

  • LinkedIn Login: Recommended for better scraping results and fewer rate limits
  • Responsible Usage: Respect rate limits, use delays between requests
  • Browser Support: Chromium recommended for LinkedIn (best compatibility)
  • Proxy Usage: For additional anonymization and geographic flexibility


🔧 Additional Command Examples

Single Job Mode:

# Extract from specific job URL
python -m src.scraper.search.linkedin_scraper --job-url "https://linkedin.com/jobs/view/123456789"

Key Options:

  • --browser chromium|firefox|webkit - Browser choice (chromium is default)
  • --sort-by relevance|recent - Sort results
  • --links-only - Fast link collection without full details
  • --headless - Run without GUI

AI Job Processing & Pipeline

Complete Workflow:

# Unified pipeline - search + generate documents
python main.py search "Frontend Developer" --locations "Berlin" --generate-cv --generate-cover-letter

# Process existing job data
python main.py process linkedin_jobs.json --generate-cv

# Direct pipeline usage
python -c "
from src.utils.job_search_pipeline import run_job_search, run_job_search_async
# Sync version (for CLI/scripts)
result = run_job_search('Python Developer', max_jobs=5)
# Async version (for FastAPI/web services) - use with await in async context
"

Pipeline Architecture:

  • Sync mode: For CLI tools and standalone scripts
  • Async mode: For FastAPI server and event loop integration
  • Database-first: SQLite storage with JSON export options
  • Deduplication: Automatic prevention of duplicate job entries
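The deduplication step maps naturally onto a SQLite primary key plus `INSERT OR IGNORE`. This is a minimal sketch; the project's actual schema in job_database.py may differ:

```python
import sqlite3

# In-memory DB for illustration; the project persists to a file under jobs/.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (job_id TEXT PRIMARY KEY, title TEXT, company TEXT)"
)

def save_job(job_id: str, title: str, company: str) -> bool:
    """Insert a job; return False if it was already stored (deduplicated)."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO jobs VALUES (?, ?, ?)", (job_id, title, company)
    )
    conn.commit()
    return cur.rowcount == 1

save_job("123", "ML Engineer", "Acme")  # stored
save_job("123", "ML Engineer", "Acme")  # ignored: duplicate job_id
```

Keying on the scraped job ID means re-running a search never produces duplicate rows, regardless of how often the same listing is seen.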

API Server

Start server and access documentation:

python main_api.py
# Visit http://localhost:8000/docs for interactive API documentation

Key endpoints:

  • POST /search - Start job search
  • GET /search/{id} - Get results
  • POST /process - Generate CV/cover letters
  • POST /parse - Parse job descriptions
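As a sketch, a `POST /search` request can be assembled with the standard library. The payload fields below are assumptions; check the interactive docs at `/docs` for the real request schema:

```python
import json
import urllib.request

# Build (but don't send) a POST /search request; the payload fields are
# assumptions -- see http://localhost:8000/docs for the actual schema.
payload = json.dumps({"keywords": "Python Developer", "max_jobs": 5}).encode()
req = urllib.request.Request(
    "http://localhost:8000/search",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would submit it once the server is running.
```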

πŸ“ Project Structure

The project is organized for easy navigation and contribution:

JobSearch-Agent/
├── main.py                           # CLI interface
├── main_api.py                       # FastAPI server
├── test_comprehensive.py             # Consolidated test suite
├── migrate_jobs_to_db.py             # Database migration utility
├── src/
│   ├── agents/                       # AI agents (CV writer, cover letter, parser)
│   ├── scraper/                      # Web scraping modules
│   ├── prompts/                      # AI agent prompts
│   └── utils/
│       ├── job_search_pipeline.py    # 🔄 Unified sync/async pipeline
│       ├── job_database.py           # SQLite database operations
│       └── file_utils.py             # Utilities and helpers
├── config/                           # Configuration files
├── data/                             # Templates and samples
├── jobs/                             # Job database and JSON exports
├── output/                           # Generated outputs
├── docs/                             # 📚 Complete documentation
│   ├── README.md                     # Documentation index
│   ├── API.md                        # API reference
│   ├── ADVANCED_CONFIGURATION.md     # Production setup
│   ├── DEVELOPMENT.md                # Development guide
│   ├── TESTING.md                    # Testing procedures
│   ├── CHANGELOG.md                  # Version history
│   └── TODO.md                       # Roadmap
└── examples/                         # Usage examples

βš™οΈ Configuration

Basic Setup

Create .env file with your credentials:

# LinkedIn (recommended for better results)
LINKEDIN_USERNAME=your_email@example.com
LINKEDIN_PASSWORD=your_password

# AI APIs (for CV/cover letter generation)
GOOGLE_API_KEY=your_gemini_api_key
OPENAI_API_KEY=your_openai_api_key  # Optional alternative
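A .env file is just plain KEY=VALUE lines. A minimal parser sketch of the format (real projects typically use python-dotenv, which also handles quoting, `export` prefixes, and other edge cases):

```python
def parse_env(text: str) -> dict[str, str]:
    """Minimal .env parser: KEY=VALUE lines; blanks and '#' comments ignored.
    (A sketch only -- use python-dotenv in practice.)"""
    env: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env
```

The resulting dict would typically be pushed into `os.environ` via `setdefault` so that variables already set in the shell win over the file.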

Advanced Configuration

The system uses YAML configuration files in the config/ directory:

  • jobsearch_config.yaml - Main scraper and API settings
  • cv_app_agent_config.yaml - AI agent configuration
  • file_config.yaml - File paths and templates

Key settings include browser preferences, retry logic, output directories, and AI model assignments.


📊 Output & Results

File Organization

All outputs are organized in the output/ directory:

  • linkedin/ - Scraped job data in JSON format with timestamps
  • cvs/ - Generated CVs in both text and Word formats
  • cover_letters/ - Personalized cover letters
  • parsed_jobs/ - Structured job data from parsing
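The timestamped naming can be sketched like so; the `output_path` helper and the exact naming scheme are illustrative, not the scraper's actual convention:

```python
from datetime import datetime
from pathlib import Path

def output_path(prefix: str, base: str = "output/linkedin") -> Path:
    """Build a timestamped JSON path, e.g.
    output/linkedin/search_20250101_120000.json (naming is illustrative)."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return Path(base) / f"{prefix}_{stamp}.json"
```

Embedding the timestamp in the filename means repeated runs never overwrite each other's results.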

Data Quality

The scraper extracts comprehensive job information including:

  • Complete job descriptions and requirements
  • Company profiles and employee counts
  • Hiring team information and contact details
  • Related job suggestions and career insights
  • Application URLs and salary information (when available)

🚦 Best Practices & Guidelines

Scraping Guidelines

Recommended Limits:

  • Jobs per session: 25-50 for stability
  • Pages per search: 5-10 pages maximum
  • Break between searches: 10-15 minutes
  • Authentication: Always use LinkedIn login for better results

Performance Tips:

  • Use --headless mode for faster scraping
  • Choose --links-only for quick job URL collection
  • Process large datasets in smaller batches
  • Monitor for CAPTCHAs and be ready to solve them manually
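Processing in smaller batches can be as simple as chunking the collected job URLs before handing them to the pipeline. A generic helper, not project code:

```python
from itertools import islice
from typing import Iterable, Iterator

def batched(items: Iterable, size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` items each."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk
```

Each batch can then be scraped, saved, and followed by a pause before the next one starts.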

Ethical Usage

  • Personal Use: Ideal for individual job searching
  • Respect Limits: Don't overwhelm LinkedIn's servers
  • Privacy: Only collect publicly available job information
  • Compliance: Follow LinkedIn's Terms of Service
  • Responsible: Use data ethically and don't republish without permission

🔧 Troubleshooting

Common Issues & Solutions

πŸ” Browser Problems

  • Try switching browsers: --browser firefox or --browser webkit
  • Update Playwright: pip install --upgrade playwright && playwright install
  • Check browser installation and version compatibility

🔑 Authentication Issues

  • Verify credentials in .env file
  • Check for two-factor authentication requirements
  • Ensure LinkedIn account is active and in good standing

⚠️ Rate Limiting

  • Reduce job limits: --max-jobs 10 instead of larger numbers
  • Increase delays between requests
  • Take breaks between different searches
  • Use authentication to reduce rate limiting
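Increasing delays is usually done with jittered exponential backoff: wait roughly twice as long after each failed or throttled attempt, randomized so requests don't fall into a detectable rhythm. A generic sketch, not the scraper's built-in logic:

```python
import random
import time

def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Jittered exponential backoff: up to 2s, 4s, 8s ... capped at 60s."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

def polite_sleep(attempt: int) -> None:
    """Sleep before retrying request number `attempt`."""
    time.sleep(backoff_delay(attempt))
```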

📊 Empty Results

  • Broaden search terms ("Software" instead of "Senior React Developer")
  • Try different location formats ("Berlin, Germany" vs "Berlin")
  • Enable authentication for better access
  • Check if search terms are too specific

πŸ› Technical Errors

  • Enable debug mode: export DEBUG=1
  • Check log files in logs/ directory
  • Verify all dependencies are installed
  • Review screenshots in output/linkedin/ for visual debugging
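The `DEBUG=1` convention can be wired into standard logging like this (a generic sketch; the project's own logger setup may differ):

```python
import logging
import os

# DEBUG=1 in the environment switches on verbose logging.
level = logging.DEBUG if os.environ.get("DEBUG") == "1" else logging.INFO
logging.basicConfig(
    level=level, format="%(asctime)s %(levelname)s %(message)s"
)
logger = logging.getLogger("jobsearch")
```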

📚 Documentation & Support

📖 Complete Documentation

All detailed documentation is organized in the docs/ directory.

Quick Reference

  • Run Tests: python test_comprehensive.py
  • Start API Server: python main_api.py → Visit http://localhost:8000/docs
  • CLI Help: python main.py --help
  • Configuration: See config/ directory for all settings

Getting Help

  1. Check the troubleshooting section above
  2. Review the detailed documentation in docs/
  3. Search existing issues on GitHub
  4. Create a new issue with detailed information

🤝 Contributing

Contributions are welcome! Please see the Development Guide (docs/DEVELOPMENT.md) for:

  • Development environment setup
  • Code style guidelines
  • Testing procedures
  • Pull request process

Quick Start for Contributors

git clone https://github.com/sreekar2858/JobSearch-Agent.git
cd JobSearch-Agent
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Run comprehensive tests
python test_comprehensive.py

See the Testing Guide (docs/TESTING.md) for complete testing documentation.


📄 License & Disclaimer

License: MIT License - see LICENSE for details.

Disclaimer: This tool is for educational and personal use. Users are responsible for complying with LinkedIn's Terms of Service and applicable laws. The authors are not responsible for any misuse.

