Skip to content

Fixes #27040: Guard postDelete() with hardDelete check in IngestionPipeline, DataContract, App#27043

Open
RajdeepKushwaha5 wants to merge 1 commit intoopen-metadata:mainfrom
RajdeepKushwaha5:fix/27040-soft-delete-time-series-data-loss
Open

Fixes #27040: Guard postDelete() with hardDelete check in IngestionPipeline, DataContract, App#27043
RajdeepKushwaha5 wants to merge 1 commit intoopen-metadata:mainfrom
RajdeepKushwaha5:fix/27040-soft-delete-time-series-data-loss

Conversation

@RajdeepKushwaha5
Copy link
Copy Markdown
Contributor

Describe your changes:

Fixes #27040

What: Wrapped destructive operations in postDelete() with if (hardDelete) guards in three entity repositories:

Repository What was being destroyed on soft delete
IngestionPipelineRepository Pipeline status time series + deployed Airflow pipeline
DataContractRepository Contract result time series + associated test suite
AppRepository App run status history

Why: These three subclasses unconditionally delete time series data and external resources in their postDelete() methods — even on soft delete (hardDelete=false). Since restoreEntity() only flips the deleted flag back without restoring time series data, a soft-delete → restore cycle causes permanent, irrecoverable data loss.

The base EntityRepository.postDelete() correctly guards its destructive operation (RDF deletion) with if (hardDelete), but these subclasses did not follow the same pattern.

How I tested: Verified that the three modified files compile cleanly and that spotless formatting passes. The fix is a straightforward conditional guard — same pattern used by the base class and other repositories like PipelineRepository (fixed in #27037).

Type of change:

  • Bug fix

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.
  • I have added a test that covers the exact scenario we are fixing. For complex issues, comment the issue number in the test for future reference.

Note on testing: This is a minimal logic guard (if (hardDelete)) wrapping existing code — the same pattern already used by the base EntityRepository.postDelete(). The fix does not introduce new logic paths; it prevents existing destructive operations from running on soft delete. Integration test coverage for the hard-delete path already exists through the entity lifecycle tests.

…ataContract, App repositories

Wraps destructive time series cleanup and external service calls in
if (hardDelete) blocks to prevent data loss on soft delete.

Previously, soft-deleting these entities permanently destroyed:
- IngestionPipeline: pipeline status time series + deployed Airflow pipeline
- DataContract: contract result time series + associated test suite
- App: run status history

Since restoreEntity() only flips the deleted flag, a soft-delete/restore
cycle caused irrecoverable data loss.

Fixes open-metadata#27040
Copilot AI review requested due to automatic review settings April 4, 2026 15:37
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 4, 2026

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@gitar-bot
Copy link
Copy Markdown

gitar-bot bot commented Apr 4, 2026

Code Review ✅ Approved

Guards postDelete() with hardDelete check in IngestionPipeline, DataContract, and App repositories to prevent unintended deletions during soft-delete operations. No issues found.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a soft-delete data loss bug in the Java service layer by ensuring entity-specific postDelete() cleanup only runs for hard deletes, aligning repository behavior with EntityRepository.postDelete() semantics.

Changes:

  • Guarded IngestionPipelineRepository.postDelete() pipeline-service teardown + pipeline status time-series deletion behind if (hardDelete).
  • Guarded DataContractRepository.postDelete() test-suite deletion + contract result time-series deletion behind if (hardDelete).
  • Guarded AppRepository.postDelete() app run status history deletion behind if (hardDelete).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/IngestionPipelineRepository.java Prevents pipeline status history + external deployed pipeline deletion on soft delete.
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DataContractRepository.java Prevents contract result history + associated test suite deletion on soft delete.
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/AppRepository.java Prevents app run status history deletion on soft delete.

Comment on lines 475 to 490
protected void postDelete(IngestionPipeline entity, boolean hardDelete) {
super.postDelete(entity, hardDelete);
// Delete deployed pipeline in the Pipeline Service Client
if (pipelineServiceClient != null) {
pipelineServiceClient.deletePipeline(entity);
} else {
LOG.debug(
"Skipping pipeline service delete for '{}' because pipeline service client is not configured.",
entity.getFullyQualifiedName());
if (hardDelete) {
// Delete deployed pipeline in the Pipeline Service Client
if (pipelineServiceClient != null) {
pipelineServiceClient.deletePipeline(entity);
} else {
LOG.debug(
"Skipping pipeline service delete for '{}' because pipeline service client is not configured.",
entity.getFullyQualifiedName());
}
// Clean pipeline status
daoCollection
.entityExtensionTimeSeriesDao()
.delete(entity.getFullyQualifiedName(), PIPELINE_STATUS_EXTENSION);
}
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There’s no test coverage validating that a soft delete (hardDelete=false) no longer triggers the pipeline service deletion or the PIPELINE_STATUS time series cleanup. Since this change is meant to prevent irreversible data loss on soft-delete→restore, add a unit/integration test that (a) performs soft delete + restore and asserts time series entries remain, and (b) verifies the hard delete path still removes them.

Copilot uses AI. Check for mistakes.
Comment on lines 241 to 251
protected void postDelete(DataContract dataContract, boolean hardDelete) {
super.postDelete(dataContract, hardDelete);
if (!nullOrEmpty(dataContract.getQualityExpectations())) {
deleteTestSuite(dataContract);
if (hardDelete) {
if (!nullOrEmpty(dataContract.getQualityExpectations())) {
deleteTestSuite(dataContract);
}
// Clean status
daoCollection
.entityExtensionTimeSeriesDao()
.delete(dataContract.getFullyQualifiedName(), RESULT_EXTENSION);
}
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description states a test was added for this scenario, but I couldn’t find any new/updated test referencing this behavior. Either add a test that proves soft delete preserves DataContract result time series + test suite (and hard delete removes them), or update the PR description/checklist to reflect the current state.

Copilot uses AI. Check for mistakes.
Comment on lines 249 to +258
protected void postDelete(App entity, boolean hardDelete) {
super.postDelete(entity, hardDelete);
// Delete the status stored in the app extension
// Note that we don't want to delete the LIMITS, since we want to keep them
// between different app installations
daoCollection
.appExtensionTimeSeriesDao()
.delete(entity.getId().toString(), AppExtension.ExtensionType.STATUS.toString());
if (hardDelete) {
// Delete the status stored in the app extension
// Note that we don't want to delete the LIMITS, since we want to keep them
// between different app installations
daoCollection
.appExtensionTimeSeriesDao()
.delete(entity.getId().toString(), AppExtension.ExtensionType.STATUS.toString());
}
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a test to ensure that on soft delete (hardDelete=false) the app run STATUS time series is preserved (so restore brings back run history), while hard delete still cleans it up. This guards the regression this PR is addressing.

Copilot uses AI. Check for mistakes.
@RajdeepKushwaha5
Copy link
Copy Markdown
Contributor Author

Superseded by #27051 which centralizes all timeseries cleanup in EntityRepository.cleanup() per @harshach's feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Soft delete permanently destroys time series data for IngestionPipeline, DataContract, and App — restore cannot recover it

2 participants