Added General GEE Large Geojson File Uploader
A general-purpose script that uploads any GeoJSON file to GEE, using GCS as a staging area. It has been tested on multiple files and handles large ones (tested with a file of up to 6 GB).
Uploads to GCS in chunks (default chunk size 64 MB, configurable with --chunk-size-mb).
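The chunked staging step could look roughly like the sketch below. It assumes the `google-cloud-storage` client library; `chunk_size_bytes` and `stage_file` are hypothetical names, not functions from this PR. The client library expects the chunk size to be a multiple of 256 KiB, which any whole number of MB satisfies.

```python
_MIB = 1024 * 1024
_GCS_CHUNK_QUANTUM = 256 * 1024  # chunk sizes must be multiples of 256 KiB


def chunk_size_bytes(chunk_size_mb: int = 64) -> int:
    """Convert a chunk size in MB to bytes and check it is usable for GCS."""
    if chunk_size_mb <= 0:
        raise ValueError("chunk size must be positive")
    size = chunk_size_mb * _MIB
    # Whole-MB sizes always pass; the check guards against later changes.
    if size % _GCS_CHUNK_QUANTUM:
        raise ValueError("chunk size must be a multiple of 256 KiB")
    return size


def stage_file(local_path: str, bucket_name: str, object_name: str,
               chunk_size_mb: int = 64) -> str:
    """Upload local_path to gs://bucket_name/object_name in chunks (hypothetical helper)."""
    # Imported lazily so the module loads without google-cloud-storage installed.
    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(
        object_name, chunk_size=chunk_size_bytes(chunk_size_mb)
    )
    blob.upload_from_filename(local_path)
    return f"gs://{bucket_name}/{object_name}"
```

Sending the file in fixed-size chunks keeps memory use bounded for multi-gigabyte inputs, since only one chunk is in flight at a time.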
Highly configurable and uses argparse for CLI options:
Usage Example
single file upload
python -m utilities.scripts.gee_upload \
  --file data/pan_india_facilities/pan_india_facilities.geojson \
  --service-account-json data/gee_confs/core-stack-learn-818963fa8f26.json \
  --wait \
  --poll-interval 120 \
  --gcs-prefix proximity \
  --asset-id projects/corestack-datasets/assets/datasets/pan_india_facilities \
  --replace-existing
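The `--wait` and `--poll-interval` flags imply a polling loop around the ingestion task. A minimal sketch of such a loop follows; `wait_for_task` and its `get_state` callable are hypothetical, and the terminal state names are assumed to match those Earth Engine reports for batch tasks.

```python
import time
from typing import Callable, Optional

# Assumed terminal state names for an ingestion task.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED"}


def wait_for_task(get_state: Callable[[], str],
                  poll_interval: float = 120.0,
                  timeout_seconds: Optional[float] = None,
                  sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_state() until a terminal state is reported or the timeout expires."""
    deadline = None if timeout_seconds is None else time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f"task still {state} after {timeout_seconds} s")
        sleep(poll_interval)
```

Passing `sleep` as a parameter is only a convenience for testing the loop without real delays.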
batch upload
python -m utilities.scripts.gee_upload \
  --directory data/pan_india_facilities/ \
  --gee-account-id GEE_ACCOUNT_ID \
  --wait \
  --poll-interval 120 \
  --gcs-prefix proximity \
  --asset-parent projects/corestack-datasets/assets/datasets/facilities \
  --make-public \
  --replace-existing
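For a batch run, each file in the directory needs its own asset id under `--asset-parent`. The sketch below shows one way that mapping could be built. `plan_batch` is a hypothetical helper, and deriving the asset name from the file name without its extension is an assumption, not confirmed behaviour of the script.

```python
from pathlib import Path

# Extensions the help text lists for --directory.
VECTOR_SUFFIXES = {".geojson", ".json", ".geojsonl"}


def plan_batch(directory: str, asset_parent: str) -> dict[str, str]:
    """Map each vector file in directory to an asset id under asset_parent."""
    parent = asset_parent.rstrip("/")
    plan = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() in VECTOR_SUFFIXES:
            # Assumption: asset name is the file name without its extension.
            plan[str(path)] = f"{parent}/{path.stem}"
    return plan
```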
Check all usage commands:
python -m utilities.scripts.gee_upload -h

usage: gee_upload.py [-h] [--file FILES] [--directory DIRECTORY]
                     [--asset-parent ASSET_PARENT] [--asset-id ASSET_ID]
                     [--asset-name ASSET_NAME]
                     [--gee-account-id GEE_ACCOUNT_ID]
                     [--service-account-json SERVICE_ACCOUNT_JSON]
                     [--gcs-bucket GCS_BUCKET] [--gcs-prefix GCS_PREFIX]
                     [--replace-existing] [--wait] [--make-public]
                     [--cleanup-gcs] [--poll-interval POLL_INTERVAL]
                     [--timeout-seconds TIMEOUT_SECONDS]
                     [--max-vertices MAX_VERTICES]
                     [--max-error-meters MAX_ERROR_METERS]
                     [--chunk-size-mb CHUNK_SIZE_MB]
                     [--csv-delimiter CSV_DELIMITER]
                     [--csv-qualifier CSV_QUALIFIER] [--property PROPERTIES]
                     [--stop-on-error] [--facilities-defaults]

Upload GeoJSON/JSON/GeoJSONL vector files into Earth Engine assets by staging
them in GCS and importing them as table assets.

options:
  -h, --help            show this help message and exit
  --file FILES          Path to an input file. Repeat the flag to upload
                        multiple files.
  --directory DIRECTORY
                        Directory containing .geojson/.json/.geojsonl files
                        to upload.
  --asset-parent ASSET_PARENT
                        Earth Engine asset folder, e.g.
                        projects/corestack-datasets/assets/facilities
  --asset-id ASSET_ID   Full Earth Engine asset id for a single-file upload.
  --asset-name ASSET_NAME
                        Optional asset name override for a single-file
                        upload.
  --gee-account-id GEE_ACCOUNT_ID
                        Use credentials stored in the Django GEEAccount
                        model.
  --service-account-json SERVICE_ACCOUNT_JSON
                        Path to a service-account JSON file to use instead
                        of Django GEEAccount credentials.
  --gcs-bucket GCS_BUCKET
                        GCS bucket used as the staging area. Defaults to
                        core_stack.
  --gcs-prefix GCS_PREFIX
                        Prefix inside the GCS bucket for staged CSV files.
  --replace-existing    Delete an existing EE asset with the same id before
                        re-uploading.
  --wait                Wait for Earth Engine ingestion to finish before
                        exiting.
  --make-public         After a successful upload, set the asset ACL to
                        public if allowed.
  --cleanup-gcs         Delete the staged GCS object after a successful
                        upload.
  --poll-interval POLL_INTERVAL
                        Polling interval in seconds when waiting for
                        ingestion tasks.
  --timeout-seconds TIMEOUT_SECONDS
                        Optional timeout while waiting for ingestion tasks.
  --max-vertices MAX_VERTICES
                        Optional Earth Engine geometry split threshold.
  --max-error-meters MAX_ERROR_METERS
                        Maximum reprojection error in meters for ingestion.
  --chunk-size-mb CHUNK_SIZE_MB
                        GCS resumable upload chunk size in MB.
  --csv-delimiter CSV_DELIMITER
                        Delimiter used in the staged table for Earth Engine.
                        Default: tab.
  --csv-qualifier CSV_QUALIFIER
                        Quote character used in the staged table. Default: ".
  --property PROPERTIES
                        Asset metadata property in key=value format. Repeat
                        as needed.
  --stop-on-error       Abort the batch on the first failed file.
  --facilities-defaults
                        Shortcut for uploading
                        data/facilities/facilities_point_files to
                        projects/corestack-datasets/assets/facilities
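
The repeatable `--file` and `--property` flags in the help text above map naturally onto argparse's `append` action. The sketch below covers only a handful of the options; `build_parser` and `parse_properties` are hypothetical names, and the default poll interval of 120 seconds is an assumption borrowed from the usage examples rather than from the script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a reduced parser covering a few of the documented flags."""
    parser = argparse.ArgumentParser(prog="gee_upload.py")
    parser.add_argument("--file", dest="files", action="append", default=[],
                        help="Path to an input file. Repeat to upload several.")
    parser.add_argument("--property", dest="properties", action="append", default=[],
                        help="Asset metadata property in key=value format.")
    parser.add_argument("--wait", action="store_true",
                        help="Wait for ingestion to finish before exiting.")
    # Assumed default; the help output does not state one.
    parser.add_argument("--poll-interval", type=int, default=120)
    return parser


def parse_properties(pairs: list[str]) -> dict[str, str]:
    """Turn repeated key=value strings into a dict, rejecting malformed pairs."""
    props = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        props[key] = value
    return props
```

For example, `build_parser().parse_args(["--file", "a.geojson", "--property", "source=osm"])` yields `files == ["a.geojson"]`, and `parse_properties` turns the collected `properties` list into `{"source": "osm"}`.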