
GEE Uploader #839

Open
amit-spatial wants to merge 2 commits into dev from feat/gee-upload

@amit-spatial
Collaborator

Added a general GEE uploader for large GeoJSON files

A general-purpose script that uploads any GeoJSON file to Google Earth Engine (GEE), using Google Cloud Storage (GCS) as a staging area. It has been tested on multiple files and can handle large inputs (tested on files of up to 6 GB).
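The tool's help text describes inputs being staged in GCS as a delimited table (tab-separated, with `"` as the quote character by default) before Earth Engine imports them as table assets. Below is a minimal sketch of that conversion step. It assumes newline-delimited GeoJSON (GeoJSONL) input that can be streamed one feature per line, a column list known up front, and geometry stored as a compact GeoJSON string; the actual script may encode geometry differently (for example as WKT), and `iter_geojsonl` / `write_staged_table` are hypothetical names, not functions from this PR.

```python
import csv
import json
from typing import Iterable, Iterator, TextIO


def iter_geojsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one GeoJSON Feature per non-empty line (GeoJSONL input)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def write_staged_table(
    features: Iterable[dict],
    out: TextIO,
    columns: list[str],
    delimiter: str = "\t",
    qualifier: str = '"',
) -> int:
    """Write features as a delimited table plus a GeoJSON-encoded geometry column.

    Assumption: geometry is serialized as a GeoJSON string. Properties missing
    from a feature become empty cells; properties not in `columns` are dropped.
    Returns the number of data rows written.
    """
    writer = csv.writer(out, delimiter=delimiter, quotechar=qualifier,
                        quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(columns + ["geometry"])
    count = 0
    for feature in features:
        props = feature.get("properties") or {}
        row = [props.get(col, "") for col in columns]
        # Compact JSON keeps the cell small; the csv writer quotes the cell
        # and doubles any embedded qualifier characters.
        row.append(json.dumps(feature["geometry"], separators=(",", ":")))
        writer.writerow(row)
        count += 1
    return count
```

Streaming line by line keeps memory roughly constant regardless of file size; a single-object `.geojson` FeatureCollection of several GB would likely need an incremental JSON parser (such as `ijson`) to get the same behaviour.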

Files are uploaded to GCS in chunks using resumable uploads (default chunk size: 64 MB, configurable with --chunk-size-mb).
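The GCS resumable-upload protocol requires every chunk except the last to be a multiple of 256 KiB, and the `google-cloud-storage` client raises an error for a `Blob.chunk_size` that is not. A minimal sketch of turning `--chunk-size-mb` into a valid byte count and reading a file in such chunks, assuming the value is rounded down to that boundary (`chunk_size_bytes` and `iter_chunks` are illustrative names, not from the PR):

```python
from typing import BinaryIO, Iterator

# Resumable-upload chunks (all but the last) must be a multiple of 256 KiB.
GCS_CHUNK_ALIGNMENT = 256 * 1024


def chunk_size_bytes(chunk_size_mb: float) -> int:
    """Convert a --chunk-size-mb value to bytes, rounded down to a 256 KiB multiple.

    Values smaller than 256 KiB are raised to the minimum allowed chunk size.
    """
    size = int(chunk_size_mb * 1024 * 1024)
    aligned = (size // GCS_CHUNK_ALIGNMENT) * GCS_CHUNK_ALIGNMENT
    return max(aligned, GCS_CHUNK_ALIGNMENT)


def iter_chunks(stream: BinaryIO, chunk_size: int) -> Iterator[bytes]:
    """Read a binary stream in fixed-size chunks; only the last may be shorter."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk
```

With `google-cloud-storage`, the computed value would typically be assigned to `blob.chunk_size` before calling `blob.upload_from_filename(...)`, which then sends the file in chunks over a resumable session. The 64 MB default is already a multiple of 256 KiB, so it passes through unchanged.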

The script is highly configurable, with CLI options parsed via argparse:

Usage examples

Single-file upload:
python -m utilities.scripts.gee_upload \
  --file data/pan_india_facilities/pan_india_facilities.geojson \
  --service-account-json data/gee_confs/core-stack-learn-818963fa8f26.json \
  --wait \
  --poll-interval 120 \
  --gcs-prefix proximity \
  --asset-id projects/corestack-datasets/assets/datasets/pan_india_facilities \
  --replace-existing
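With `--wait`, the script waits for Earth Engine ingestion to finish, checking every `--poll-interval` seconds and optionally giving up after `--timeout-seconds`. A minimal sketch of such a polling loop, assuming the task state is fetched through a caller-supplied function (`get_state` is hypothetical; in practice it might wrap `ee.data.getTaskStatus`) and that the terminal states are `COMPLETED`, `FAILED` and `CANCELLED`, as named in `ee.batch.Task.State`:

```python
import time
from typing import Callable, Optional

# Terminal task states (names as in ee.batch.Task.State).
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED"}


def wait_for_task(
    get_state: Callable[[], str],
    poll_interval: float = 120.0,
    timeout_seconds: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_state() until a terminal state is reached.

    Returns the terminal state, or raises TimeoutError once timeout_seconds
    have elapsed without one. With no timeout, polls indefinitely.
    """
    deadline = None if timeout_seconds is None else clock() + timeout_seconds
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if deadline is not None and clock() >= deadline:
            raise TimeoutError(f"task still {state} after {timeout_seconds}s")
        sleep(poll_interval)
```

Passing `sleep` and `clock` in as parameters keeps the loop testable without real waiting; a caller would still need to treat `FAILED` and `CANCELLED` as errors.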

Batch upload:
python -m utilities.scripts.gee_upload \
  --directory data/pan_india_facilities/ \
  --gee-account-id <GEE_ACCOUNT_ID> \
  --wait \
  --poll-interval 120 \
  --gcs-prefix proximity \
  --asset-parent projects/corestack-datasets/assets/datasets/facilities \
  --make-public \
  --replace-existing
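For a batch upload, the script picks up the `.geojson`/`.json`/`.geojsonl` files in `--directory` and presumably creates one asset per file under `--asset-parent`. A minimal sketch of that loop, assuming asset names are derived from the file stem (the sanitizing rule below is an assumption, not taken from the script) and that `upload` is a hypothetical per-file callable; `stop_on_error` mirrors `--stop-on-error`:

```python
import re
from pathlib import Path
from typing import Callable, Iterable

INPUT_SUFFIXES = {".geojson", ".json", ".geojsonl"}


def discover_inputs(directory: Path) -> list[Path]:
    """Return upload candidates in a directory, sorted for a stable order."""
    return sorted(p for p in directory.iterdir()
                  if p.is_file() and p.suffix.lower() in INPUT_SUFFIXES)


def asset_id_for(path: Path, asset_parent: str) -> str:
    """Derive '<asset_parent>/<name>' from the file stem.

    Assumed naming rule: anything other than letters, digits, '_' and '-'
    is replaced with '_'. The real script may sanitize differently.
    """
    name = re.sub(r"[^A-Za-z0-9_-]", "_", path.stem)
    return f"{asset_parent.rstrip('/')}/{name}"


def run_batch(paths: Iterable[Path], asset_parent: str,
              upload: Callable[[Path, str], None],
              stop_on_error: bool = False) -> dict[str, str]:
    """Upload each file and record 'ok' or the error message per asset id."""
    results: dict[str, str] = {}
    for path in paths:
        asset_id = asset_id_for(path, asset_parent)
        try:
            upload(path, asset_id)
            results[asset_id] = "ok"
        except Exception as exc:  # continue unless stop_on_error is set
            results[asset_id] = f"error: {exc}"
            if stop_on_error:
                break
    return results
```

Collecting per-file results instead of failing fast lets a long batch report every failure at the end, which is the behaviour the help text implies when `--stop-on-error` is not given.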


To list all options:

python -m utilities.scripts.gee_upload -h
usage: gee_upload.py [-h] [--file FILES] [--directory DIRECTORY]
                     [--asset-parent ASSET_PARENT] [--asset-id ASSET_ID]
                     [--asset-name ASSET_NAME]
                     [--gee-account-id GEE_ACCOUNT_ID]
                     [--service-account-json SERVICE_ACCOUNT_JSON]
                     [--gcs-bucket GCS_BUCKET] [--gcs-prefix GCS_PREFIX]
                     [--replace-existing] [--wait] [--make-public]
                     [--cleanup-gcs] [--poll-interval POLL_INTERVAL]
                     [--timeout-seconds TIMEOUT_SECONDS]
                     [--max-vertices MAX_VERTICES]
                     [--max-error-meters MAX_ERROR_METERS]
                     [--chunk-size-mb CHUNK_SIZE_MB]
                     [--csv-delimiter CSV_DELIMITER]
                     [--csv-qualifier CSV_QUALIFIER] [--property PROPERTIES]
                     [--stop-on-error] [--facilities-defaults]

Upload GeoJSON/JSON/GeoJSONL vector files into Earth Engine assets by staging
them in GCS and importing them as table assets.

options:
  -h, --help            show this help message and exit
  --file FILES          Path to an input file. Repeat the flag to upload
                        multiple files.
  --directory DIRECTORY
                        Directory containing .geojson/.json/.geojsonl files to
                        upload.
  --asset-parent ASSET_PARENT
                        Earth Engine asset folder, e.g.
                        projects/corestack-datasets/assets/facilities
  --asset-id ASSET_ID   Full Earth Engine asset id for a single-file upload.
  --asset-name ASSET_NAME
                        Optional asset name override for a single-file upload.
  --gee-account-id GEE_ACCOUNT_ID
                        Use credentials stored in the Django GEEAccount model.
  --service-account-json SERVICE_ACCOUNT_JSON
                        Path to a service-account JSON file to use instead of
                        Django GEEAccount credentials.
  --gcs-bucket GCS_BUCKET
                        GCS bucket used as the staging area. Defaults to
                        core_stack.
  --gcs-prefix GCS_PREFIX
                        Prefix inside the GCS bucket for staged CSV files.
  --replace-existing    Delete an existing EE asset with the same id before
                        re-uploading.
  --wait                Wait for Earth Engine ingestion to finish before
                        exiting.
  --make-public         After a successful upload, set the asset ACL to public
                        if allowed.
  --cleanup-gcs         Delete the staged GCS object after a successful upload.
  --poll-interval POLL_INTERVAL
                        Polling interval in seconds when waiting for ingestion
                        tasks.
  --timeout-seconds TIMEOUT_SECONDS
                        Optional timeout while waiting for ingestion tasks.
  --max-vertices MAX_VERTICES
                        Optional Earth Engine geometry split threshold.
  --max-error-meters MAX_ERROR_METERS
                        Maximum reprojection error in meters for ingestion.
  --chunk-size-mb CHUNK_SIZE_MB
                        GCS resumable upload chunk size in MB.
  --csv-delimiter CSV_DELIMITER
                        Delimiter used in the staged table for Earth Engine.
                        Default: tab.
  --csv-qualifier CSV_QUALIFIER
                        Quote character used in the staged table. Default: ".
  --property PROPERTIES
                        Asset metadata property in key=value format. Repeat as
                        needed.
  --stop-on-error       Abort the batch on the first failed file.
  --facilities-defaults
                        Shortcut for uploading
                        data/facilities/facilities_point_files to
                        projects/corestack-datasets/assets/facilities
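Several of these options (`--csv-delimiter`, `--csv-qualifier`, `--max-vertices`, `--max-error-meters`, `--property`) look like they map onto fields of an Earth Engine table-ingestion manifest. A sketch of assembling such a manifest, assuming the camelCase field names of the Earth Engine REST `TableManifest`/`TableSource` resources and a staged object at `gs://<bucket>/<prefix>/<name>.csv`; `parse_properties`, `staged_uri` and `build_table_manifest` are hypothetical helpers, not functions from this PR:

```python
from typing import Optional


def parse_properties(items: list[str]) -> dict[str, str]:
    """Turn repeated --property key=value strings into a dict."""
    props: dict[str, str] = {}
    for item in items:
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {item!r}")
        props[key.strip()] = value.strip()
    return props


def staged_uri(bucket: str, prefix: str, name: str) -> str:
    """Build the gs:// URI of a staged CSV (assumed layout: bucket/prefix/name.csv)."""
    parts = [p.strip("/") for p in (prefix, f"{name}.csv") if p and p.strip("/")]
    return f"gs://{bucket}/" + "/".join(parts)


def build_table_manifest(
    asset_id: str,
    gcs_uri: str,
    csv_delimiter: str = "\t",
    csv_qualifier: str = '"',
    max_vertices: Optional[int] = None,
    max_error_meters: Optional[float] = None,
    properties: Optional[dict[str, str]] = None,
) -> dict:
    """Assemble a table-ingestion manifest; optional fields are omitted when unset."""
    source: dict = {"uris": [gcs_uri], "csvDelimiter": csv_delimiter,
                    "csvQualifier": csv_qualifier}
    if max_vertices is not None:
        source["maxVertices"] = max_vertices
    if max_error_meters is not None:
        source["maxErrorMeters"] = max_error_meters
    manifest: dict = {"name": asset_id, "sources": [source]}
    if properties:
        manifest["properties"] = properties
    return manifest
```

The resulting dict could then be passed, for example, to `ee.data.startTableIngestion(ee.data.newTaskId()[0], manifest)`; the field names above should be checked against the Earth Engine API reference before relying on them.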
