
feat: Add agriculture census scraper and tehsil matching pipeline #815

Open
neha222222 wants to merge 2 commits into core-stack-org:main from neha222222:feature/agriculture-census-scraper

@neha222222

Implements a tehsil-level agriculture census data pipeline for issue #221:

  • scraper.py: Selenium-based scraper for agcensus.da.gov.in that navigates the site's ASP.NET WebForms dropdowns (Year/Table/State/District/Tehsil) to extract tehsil-level crop area data

  • tehsil_matcher.py: Matches scraped tehsil names to CoRE Stack SOI tehsil boundaries using exact matching plus edit-distance-based fuzzy matching

  • pipeline.py: CLI pipeline that orchestrates scraping, cleaning, matching, and CSV export, and reports match statistics
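A minimal sketch of the exact-plus-fuzzy matching idea, assuming names are compared after a simple normalization step and that Levenshtein distance is the edit-distance metric. The helpers `normalize`, `edit_distance`, and `match_tehsil`, and the `max_distance=2` threshold, are hypothetical illustrations, not the actual API of tehsil_matcher.py.

```python
import unicodedata
from typing import Optional


def normalize(name: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(text.split())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a single rolling row of the DP table."""
    if len(a) < len(b):
        a, b = b, a  # keep the row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


def match_tehsil(
    scraped: str, candidates: list[str], max_distance: int = 2
) -> tuple[Optional[str], str]:
    """Return (SOI tehsil name or None, match type) for one scraped name.

    Hypothetical helper: tries an exact match on normalized names first,
    then the closest candidate within max_distance edits.
    """
    target = normalize(scraped)
    by_norm = {normalize(c): c for c in candidates}
    if target in by_norm:
        return by_norm[target], "exact"
    best, best_dist = None, max_distance + 1
    for norm, original in by_norm.items():
        dist = edit_distance(target, norm)
        if dist < best_dist:  # first candidate wins on ties
            best, best_dist = original, dist
    if best is not None:
        return best, "fuzzy"
    return None, "unmatched"
```

In practice the candidate list would presumably be restricted to SOI tehsils in the same state and district before fuzzy matching, since short tehsil names in different districts can sit within two edits of each other.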

Supports incremental runs via the --skip-scraping flag, which re-runs only the matching step on previously scraped data.
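The --skip-scraping flow can be sketched with argparse as follows. Only the --skip-scraping flag comes from this PR; the --raw-csv and --output flags, the `tehsil` column name, the cleaning rule, and the injected `scrape` and `match` callables are assumptions made for illustration.

```python
import argparse
import csv
from collections import Counter
from pathlib import Path
from typing import Callable, Optional

Matcher = Callable[[str], tuple[Optional[str], str]]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Tehsil agriculture census pipeline (sketch)")
    # Hypothetical flags: only --skip-scraping is named in the PR.
    parser.add_argument("--raw-csv", type=Path, default=Path("raw_census.csv"))
    parser.add_argument("--output", type=Path, default=Path("matched_census.csv"))
    parser.add_argument("--skip-scraping", action="store_true",
                        help="reuse the existing --raw-csv instead of scraping again")
    return parser


def run(args: argparse.Namespace, scrape: Callable[[Path], None], match: Matcher) -> Counter:
    if not args.skip_scraping:
        scrape(args.raw_csv)  # assumed to write scraped rows to args.raw_csv
    with args.raw_csv.open(newline="", encoding="utf-8") as f:
        # Hypothetical cleaning rule: drop rows without a tehsil name.
        rows = [r for r in csv.DictReader(f) if (r.get("tehsil") or "").strip()]
    stats: Counter = Counter()
    for row in rows:
        row["tehsil"] = row["tehsil"].strip()
        soi_name, kind = match(row["tehsil"])
        row["soi_tehsil"] = soi_name or ""
        row["match_type"] = kind
        stats[kind] += 1
    fieldnames = list(rows[0]) if rows else ["tehsil", "soi_tehsil", "match_type"]
    with args.output.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    # Match statistics: count and share of each match type.
    total = sum(stats.values())
    for kind, count in sorted(stats.items()):
        print(f"{kind}: {count}/{total} ({count / total:.0%})")
    return stats
```

Injecting `scrape` and `match` keeps the sketch testable without a browser; a real entry point would pass in the Selenium scraper and the tehsil matcher.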

Adds gee_export.py to complete the pipeline:
- Joins matched crop data with SOI tehsil boundary geometries
- Ensures the layer uses the EPSG:4326 CRS and has valid geometries
- Converts to ee.FeatureCollection and publishes as GEE vector asset
- Syncs to GeoServer and saves layer metadata to the DB
- Produces a tehsil-level vectorized crop map for downstream use
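A dependency-free sketch of the join step, assuming boundaries arrive as GeoJSON-style feature dicts that share a hypothetical `tehsil_id` key with the matched rows. The lon/lat bounds check is only a crude guard against projected coordinates; it does not replace a real reprojection to EPSG:4326 or geometry repair, and the Earth Engine upload and GeoServer sync are not shown.

```python
from typing import Any, Iterator

Feature = dict[str, Any]


def _iter_positions(coords: Any) -> Iterator[tuple[float, float]]:
    """Yield (x, y) pairs from arbitrarily nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from _iter_positions(part)


def looks_like_wgs84(geometry: Feature) -> bool:
    """Crude check that every vertex lies within lon/lat bounds (not a CRS proof)."""
    return all(-180 <= x <= 180 and -90 <= y <= 90
               for x, y in _iter_positions(geometry["coordinates"]))


def build_crop_feature_collection(
    matched_rows: list[dict], boundaries: list[Feature], key: str = "tehsil_id"
) -> tuple[Feature, list]:
    """Join matched crop rows onto boundary geometries by a shared (hypothetical) key.

    Returns a GeoJSON FeatureCollection dict plus the keys that were skipped
    because no boundary was found or the coordinates look projected.
    """
    geom_by_key = {f["properties"][key]: f["geometry"] for f in boundaries}
    features, skipped = [], []
    for row in matched_rows:
        geometry = geom_by_key.get(row.get(key))
        if geometry is None or not looks_like_wgs84(geometry):
            skipped.append(row.get(key))
            continue
        features.append({"type": "Feature", "geometry": geometry, "properties": dict(row)})
    return {"type": "FeatureCollection", "features": features}, skipped
```

Returning the skipped keys alongside the collection makes it easy to log unmatched or suspect tehsils before anything is published.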