An intelligent job search automation system with LinkedIn scraping, AI-powered CV generation, and cover letter creation. Extract detailed job data, company information, and hiring team details with advanced anonymization and proxy support.
- Quick Start
- Documentation
- Common Commands
- Project Structure
- Configuration
- Output & Results
- Best Practices & Guidelines
- Troubleshooting
- Documentation & Support
- Contributing
- License
```bash
git clone https://github.com/sreekar2858/JobSearch-Agent.git
cd JobSearch-Agent
pip install -r requirements.txt
```

Create a `.env` file for enhanced features:
```bash
# LinkedIn credentials (for better scraping results)
LINKEDIN_USERNAME=your_email@example.com
LINKEDIN_PASSWORD=your_password

# AI API key (for CV/cover letter generation)
GOOGLE_API_KEY=your_gemini_api_key
```

```bash
# LinkedIn job search
python -m src.scraper.search.linkedin_scraper "Software Engineer" "San Francisco" --max-jobs 10

# Get credentials for job sites
python -m src.scraper.buggmenot --website glassdoor.com

# Extract from specific job URL
python -m src.scraper.search.linkedin_scraper --job-url "https://linkedin.com/jobs/view/123456789"
```

Advanced LinkedIn job scraper with anonymization and proxy support:
```bash
# Basic search
python -m src.scraper.search.linkedin_scraper "Python Developer" "Remote" --max-jobs 5

# With browser options
python -m src.scraper.search.linkedin_scraper "Data Scientist" "NYC" --browser firefox --headless

# With anonymization disabled
python -m src.scraper.search.linkedin_scraper "DevOps Engineer" "Berlin" --no-anonymize

# With proxy
python -m src.scraper.search.linkedin_scraper "ML Engineer" "London" --proxy http://proxy:8080
```

Key Features:
- Multi-browser support (Chromium, Firefox, WebKit)
- Anonymization (random user agents, timezone, WebGL blocking)
- Proxy support (HTTP/SOCKS5)
- Robust data extraction (job details, company info, hiring team)
- Rate limiting protection
Get login credentials for job sites:
```bash
# Basic usage
python -m src.scraper.buggmenot --website economist.com

# With browser visible
python -m src.scraper.buggmenot --website nytimes.com --visible

# With proxy
python -m src.scraper.buggmenot --website wsj.com --proxy socks5://proxy:1080
```

Unified job search pipeline with both synchronous and asynchronous support:
```bash
# Complete job search workflow with AI processing
python main.py search "Frontend Developer" --locations "Berlin" --generate-cv --generate-cover-letter

# Direct pipeline usage (sync mode for CLI)
python -c "from src.utils.job_search_pipeline import run_job_search; run_job_search('Python Developer', max_jobs=5)"

# Start API server (uses async pipeline for FastAPI)
python main_api.py
# Visit http://localhost:8000/docs for API documentation
```

Key Pipeline Features:
- Unified codebase - Single file supports both sync and async modes
- Database integration - SQLite storage with deduplication
- FastAPI compatibility - Async pipeline for web services
- CLI compatibility - Sync pipeline for scripts and standalone execution
- Export flexibility - JSON output and database exports
Complete documentation is available in the `docs/` directory:
- Documentation Index - Complete overview of all documentation
- LinkedIn Scraper Guide - Complete scraper documentation
- Advanced Configuration - Production setup and optimization
- API Reference - REST API and WebSocket documentation
- Development Guide - Contributing and development setup
- Testing Guide - Testing procedures and comprehensive test suite
```bash
# LinkedIn job search with 20 results
python -m src.scraper.search.linkedin_scraper "Software Engineer" "Remote" --max-jobs 20

# LinkedIn search with filters
python -m src.scraper.search.linkedin_scraper "Data Scientist" "SF" --experience-levels "mid_senior" --date-posted "past_week"

# Get job details from specific URL
python -m src.scraper.search.linkedin_scraper --job-url "https://linkedin.com/jobs/view/4243594281/"

# BugMeNot credentials
python -m src.scraper.buggmenot --website glassdoor.com --output credentials.json

# Links only (fast collection)
python -m src.scraper.search.linkedin_scraper "Python" "NYC" --links-only --max-pages 3

# Help for any tool
python -m src.scraper.search.linkedin_scraper --help
python -m src.scraper.buggmenot --help
```

- Anonymization: Random user agents, timezone/language randomization, WebGL/Canvas/WebRTC blocking
- Proxy Support: HTTP and SOCKS5 proxy configuration for both scrapers
- Rich Data: Complete job descriptions, company info, hiring team details, related jobs
- Fast & Robust: Optimized selectors, retry logic, rate limiting protection
- Flexible: CLI arguments, module execution, programmatic usage
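For programmatic usage, the scraper modules shown above can also be driven from Python via `subprocess`. The sketch below only builds the command line from the CLI flags documented in this README; it assumes the repository root is the working directory when you actually run it:

```python
import subprocess
import sys

def linkedin_search_cmd(keywords, location, max_jobs=5, headless=True):
    """Build the CLI invocation for the LinkedIn scraper module."""
    cmd = [
        sys.executable, "-m", "src.scraper.search.linkedin_scraper",
        keywords, location, "--max-jobs", str(max_jobs),
    ]
    if headless:
        cmd.append("--headless")
    return cmd

cmd = linkedin_search_cmd("Python Developer", "Remote", max_jobs=5)
print(" ".join(cmd[1:]))
# To actually run it from the repo root:
# subprocess.run(cmd, check=True)
```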
- LinkedIn Login: Recommended for better scraping results and fewer rate limits
- Responsible Usage: Respect rate limits, use delays between requests
- Browser Support: Chromium recommended for LinkedIn (best compatibility)
- Proxy Usage: For additional anonymization and geographic flexibility
Contributions welcome! See DEVELOPMENT.md for guidelines.
Single Job Mode:
```bash
# Extract from specific job URL
python -m src.scraper.search.linkedin_scraper --job-url "https://linkedin.com/jobs/view/123456789"
```

Key Options:
- `--browser chromium|firefox|webkit` - Browser choice (chromium is default)
- `--sort-by relevance|recent` - Sort results
- `--links-only` - Fast link collection without full details
- `--headless` - Run without GUI
Complete Workflow:
```bash
# Unified pipeline - search + generate documents
python main.py search "Frontend Developer" --locations "Berlin" --generate-cv --generate-cover-letter

# Process existing job data
python main.py process linkedin_jobs.json --generate-cv

# Direct pipeline usage
python -c "
from src.utils.job_search_pipeline import run_job_search, run_job_search_async
# Sync version (for CLI/scripts)
result = run_job_search('Python Developer', max_jobs=5)
# Async version (for FastAPI/web services) - use with await in async context
"
```

Pipeline Architecture:
- Sync mode: For CLI tools and standalone scripts
- Async mode: For FastAPI server and event loop integration
- Database-first: SQLite storage with JSON export options
- Deduplication: Automatic prevention of duplicate job entries
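The deduplication idea can be sketched with plain SQLite. This is a minimal illustration, not the project's actual `job_database.py` schema: the table name, columns, and the UNIQUE constraint on the job URL are assumptions.

```python
import sqlite3

# Minimal sketch of URL-based deduplication with SQLite.
# Table name and columns are illustrative, not the real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, url TEXT UNIQUE, title TEXT)"
)

def insert_job(conn, url, title):
    """Insert a job; silently skip duplicates via the UNIQUE constraint."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO jobs (url, title) VALUES (?, ?)", (url, title)
    )
    return cur.rowcount == 1  # True only if a new row was inserted

assert insert_job(conn, "https://linkedin.com/jobs/view/1", "Python Developer")
# Re-inserting the same URL is a no-op:
assert not insert_job(conn, "https://linkedin.com/jobs/view/1", "Python Developer")
```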
Start server and access documentation:
```bash
python main_api.py
# Visit http://localhost:8000/docs for interactive API documentation
```

Key endpoints:
- `POST /search` - Start job search
- `GET /search/{id}` - Get results
- `POST /process` - Generate CV/cover letters
- `POST /parse` - Parse job descriptions
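A client call to the search endpoint might look like the sketch below. The payload fields (`keywords`, `locations`, `max_jobs`) and the response shape are assumptions; check the interactive docs at `http://localhost:8000/docs` for the actual schema. Only the standard library is used:

```python
import json
from urllib.request import Request

BASE = "http://localhost:8000"

def build_search_request(keywords, locations, max_jobs=10):
    """Build (but do not send) a POST /search request.

    Field names are illustrative; the real schema is documented
    at /docs on the running server.
    """
    payload = {"keywords": keywords, "locations": locations, "max_jobs": max_jobs}
    return Request(
        f"{BASE}/search",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_search_request("Frontend Developer", ["Berlin"], max_jobs=5)
# Send with urllib.request.urlopen(req) once the server is running.
```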
The project is organized for easy navigation and contribution:
```
JobSearch-Agent/
├── main.py                    # CLI interface
├── main_api.py                # FastAPI server
├── test_comprehensive.py      # Consolidated test suite
├── migrate_jobs_to_db.py      # Database migration utility
├── src/
│   ├── agents/                # AI agents (CV writer, cover letter, parser)
│   ├── scraper/               # Web scraping modules
│   ├── prompts/               # AI agent prompts
│   └── utils/
│       ├── job_search_pipeline.py  # Unified sync/async pipeline
│       ├── job_database.py         # SQLite database operations
│       └── file_utils.py           # Utilities and helpers
├── config/                    # Configuration files
├── data/                      # Templates and samples
├── jobs/                      # Job database and JSON exports
├── output/                    # Generated outputs
├── docs/                      # Complete documentation
│   ├── README.md              # Documentation index
│   ├── API.md                 # API reference
│   ├── ADVANCED_CONFIGURATION.md  # Production setup
│   ├── DEVELOPMENT.md         # Development guide
│   ├── TESTING.md             # Testing procedures
│   ├── CHANGELOG.md           # Version history
│   └── TODO.md                # Roadmap
└── examples/                  # Usage examples
```
Create a `.env` file with your credentials:

```bash
# LinkedIn (recommended for better results)
LINKEDIN_USERNAME=your_email@example.com
LINKEDIN_PASSWORD=your_password

# AI APIs (for CV/cover letter generation)
GOOGLE_API_KEY=your_gemini_api_key
OPENAI_API_KEY=your_openai_api_key  # Optional alternative
```

The system uses YAML configuration files in the `config/` directory:
- `jobsearch_config.yaml` - Main scraper and API settings
- `cv_app_agent_config.yaml` - AI agent configuration
- `file_config.yaml` - File paths and templates
Key settings include browser preferences, retry logic, output directories, and AI model assignments.
All outputs are organized in the `output/` directory:
- `linkedin/` - Scraped job data in JSON format with timestamps
- `cvs/` - Generated CVs in both text and Word formats
- `cover_letters/` - Personalized cover letters
- `parsed_jobs/` - Structured job data from parsing
The scraper extracts comprehensive job information including:
- Complete job descriptions and requirements
- Company profiles and employee counts
- Hiring team information and contact details
- Related job suggestions and career insights
- Application URLs and salary information (when available)
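An individual scraped record might look like the sketch below. The field names are illustrative, not the scraper's exact output schema; inspect a JSON file under `output/linkedin/` for the real format.

```python
# Hypothetical job record; field names are illustrative only.
job = {
    "title": "Software Engineer",
    "company": {"name": "Example GmbH", "employee_count": "201-500"},
    "location": "Berlin, Germany",
    "description": "Complete job description text...",
    "hiring_team": [{"name": "Jane Doe", "role": "Technical Recruiter"}],
    "related_jobs": ["https://linkedin.com/jobs/view/111"],
    "apply_url": "https://linkedin.com/jobs/view/123456789",
    "salary": None,  # only present when LinkedIn exposes it
}

# Downstream code should treat optional fields defensively:
salary = job.get("salary") or "not listed"
```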
Recommended Limits:
- Jobs per session: 25-50 for stability
- Pages per search: 5-10 pages maximum
- Break between searches: 10-15 minutes
- Authentication: Always use LinkedIn login for better results
Performance Tips:
- Use `--headless` mode for faster scraping
- Choose `--links-only` for quick job URL collection
- Process large datasets in smaller batches
- Monitor for CAPTCHAs and be ready to solve them manually
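"Process large datasets in smaller batches" can be done with a simple chunking helper like this sketch. `process_in_batches` is a hypothetical stand-in for your own per-batch scraping call; the batch size mirrors the 25-50 jobs-per-session guideline above:

```python
import time

def chunked(items, size):
    """Yield consecutive slices of `items` of at most `size` elements."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def process_in_batches(job_urls, batch_size=25, pause_seconds=0):
    """Process URLs in small batches, pausing between batches to stay
    within the recommended session limits."""
    results = []
    for batch in chunked(job_urls, batch_size):
        results.extend(batch)  # stand-in for real per-batch scraping
        time.sleep(pause_seconds)  # e.g. a 10-15 minute break in practice
    return results

urls = [f"https://linkedin.com/jobs/view/{i}" for i in range(60)]
processed = process_in_batches(urls, batch_size=25)  # 3 batches: 25 + 25 + 10
```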
- Personal Use: Ideal for individual job searching
- Respect Limits: Don't overwhelm LinkedIn's servers
- Privacy: Only collect publicly available job information
- Compliance: Follow LinkedIn's Terms of Service
- Responsible: Use data ethically and don't republish without permission
Browser Problems
- Try switching browsers: `--browser firefox` or `--browser webkit`
- Update Playwright: `pip install --upgrade playwright && playwright install`
- Check browser installation and version compatibility
Authentication Issues
- Verify credentials in `.env` file
- Check for two-factor authentication requirements
- Ensure LinkedIn account is active and in good standing

Rate Limiting
- Reduce job limits: `--max-jobs 10` instead of larger numbers
- Increase delays between requests
- Take breaks between different searches
- Use authentication to reduce rate limiting
Empty Results
- Broaden search terms ("Software" instead of "Senior React Developer")
- Try different location formats ("Berlin, Germany" vs "Berlin")
- Enable authentication for better access
- Check if search terms are too specific
Technical Errors
- Enable debug mode: `export DEBUG=1`
- Check log files in `logs/` directory
- Verify all dependencies are installed
- Review screenshots in `output/linkedin/` for visual debugging
All detailed documentation is organized in the docs/ directory:
- Documentation Index - Complete guide to all documentation
- Advanced Configuration - Production setup and optimization
- API Reference - REST API and WebSocket documentation
- Development Guide - Contributing and development setup
- Testing Guide - Testing procedures and comprehensive test suite
- Run Tests: `python test_comprehensive.py`
- Start API Server: `python main_api.py`, then visit `http://localhost:8000/docs`
- CLI Help: `python main.py --help`
- Configuration: See `config/` directory for all settings
- CHANGELOG - Version history and updates
- TODO & Roadmap - Planned features and development roadmap
- Testing Guide - Comprehensive testing documentation
- WebSocket Guide - Real-time API features
- examples/ - Sample usage and integration code
- Check the troubleshooting section above
- Review the detailed documentation in docs/
- Search existing issues on GitHub
- Create a new issue with detailed information
Contributions are welcome! Please see our Development Guide for:
- Development environment setup
- Code style guidelines
- Testing procedures
- Pull request process
```bash
git clone https://github.com/sreekar2858/JobSearch-Agent.git
cd JobSearch-Agent
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Run comprehensive tests
python test_comprehensive.py
```

See Testing Guide for complete testing documentation.
License: MIT License - see LICENSE for details.
Disclaimer: This tool is for educational and personal use. Users are responsible for complying with LinkedIn's Terms of Service and applicable laws. The authors are not responsible for any misuse.
- GitHub: @sreekar2858
- Repository: JobSearch-Agent
- Issues: Report bugs or request features
Special thanks to:
- Playwright for browser automation
- FastAPI for the API framework
- LinkedIn for providing job data