
Issue#27307 #27505

Open
karthik120710 wants to merge 5 commits into open-metadata:main from karthik120710:issue#27307



karthik120710 commented Apr 18, 2026

I improved the live search indexing pipeline to reduce database write amplification
and improve recovery behavior during Elasticsearch/OpenSearch (ES/OS) outages.

Changes made:

  • SearchIndexRetryQueue — replaced per-failure DB upsert with an in-memory
    ConcurrentLinkedQueue buffer. A daemon flusher drains it every 500 ms or when
    50 entries accumulate, writing them in a single @SqlBatch upsert. Added a new
    SEARCH_UNAVAILABLE status so entries written during ES/OS downtime are
    distinguishable from real mapping/data failures. Buffer is capped at 2000 entries
    to prevent memory exhaustion; overflow is dropped and counted via a Micrometer
    counter.

  • SearchIndexRetryWorker — drives the flusher lifecycle (startFlusher on
    start, stopFlusher on stop to ensure a final flush before shutdown). Tracks
    ES/OS availability transitions: when the cluster recovers, bulk-resets all
    SEARCH_UNAVAILABLE rows back to PENDING for immediate retry. Added
    STATUS_SEARCH_UNAVAILABLE to PURGEABLE_QUEUE_STATUSES so it is cleaned up
    when a full reindex suspends streaming.

  • ReindexingOrchestrator — added cleanupRetryQueuePreFlight() called from
    preflightFixes(). For a full reindex it deletes all purgeable statuses; for a
    partial reindex it deletes only rows matching the selected entity types. This
    prevents the retry worker from racing against the reindex job.

  • CollectionDAO.SearchIndexRetryQueueDAO — added batchUpsert() (@SqlBatch
    with MySQL/Postgres variants), resetSearchUnavailableToPending(), and
    deleteByEntityTypes().
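The buffering scheme described for SearchIndexRetryQueue, together with the startFlusher/stopFlusher lifecycle driven by SearchIndexRetryWorker, can be sketched roughly as below. This is a minimal, hypothetical illustration rather than the PR's code: BufferedRetryQueue, RetryEntry, and the batchWriter callback (standing in for the @SqlBatch batchUpsert()) are assumed names, and a plain AtomicLong stands in for the Micrometer overflow counter. Only the 500 ms interval, the 50-entry threshold, and the 2000-entry cap come from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical row shape; the real retry-queue row carries more columns.
record RetryEntry(String entityId, String entityType, String status) {}

final class BufferedRetryQueue {
  static final int MAX_BUFFER = 2000;        // overflow cap from the PR description
  static final int BATCH_SIZE = 50;          // size-based flush threshold
  static final long FLUSH_INTERVAL_MS = 500; // time-based flush interval

  private final ConcurrentLinkedQueue<RetryEntry> buffer = new ConcurrentLinkedQueue<>();
  private final AtomicInteger size = new AtomicInteger(); // queue.size() is not constant-time
  private final AtomicLong dropped = new AtomicLong();    // stand-in for the Micrometer counter
  private final AtomicBoolean flushRequested = new AtomicBoolean();
  private final Object flushLock = new Object();
  private final Consumer<List<RetryEntry>> batchWriter;   // stand-in for batchUpsert()
  private volatile ScheduledExecutorService flusher;

  BufferedRetryQueue(Consumer<List<RetryEntry>> batchWriter) {
    this.batchWriter = batchWriter;
  }

  /** Buffers an entry; returns false (and counts the drop) once the cap is reached. */
  boolean enqueue(RetryEntry entry) {
    if (size.incrementAndGet() > MAX_BUFFER) {
      size.decrementAndGet();
      dropped.incrementAndGet();
      return false;
    }
    buffer.add(entry);
    ScheduledExecutorService f = flusher;
    // Early flush once a batch has accumulated; at most one request is pending at a time.
    if (f != null && size.get() >= BATCH_SIZE && flushRequested.compareAndSet(false, true)) {
      try {
        f.execute(() -> {
          flushRequested.set(false);
          flush();
        });
      } catch (RejectedExecutionException stopping) {
        flushRequested.set(false); // flusher is shutting down; stopFlusher() flushes last
      }
    }
    return true;
  }

  /** Drains everything currently buffered and hands it to the writer as one batch. */
  void flush() {
    synchronized (flushLock) {
      List<RetryEntry> batch = new ArrayList<>();
      RetryEntry e;
      while ((e = buffer.poll()) != null) {
        size.decrementAndGet();
        batch.add(e);
      }
      if (!batch.isEmpty()) {
        batchWriter.accept(batch); // a failed write is not retried in this sketch
      }
    }
  }

  synchronized void startFlusher() {
    if (flusher != null) return;
    ScheduledExecutorService f = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "retry-queue-flusher");
      t.setDaemon(true); // must not keep the JVM alive on its own
      return t;
    });
    f.scheduleWithFixedDelay(() -> {
      try {
        flush();
      } catch (RuntimeException ex) {
        // Swallowed so the periodic task keeps running; real code would log this.
      }
    }, FLUSH_INTERVAL_MS, FLUSH_INTERVAL_MS, TimeUnit.MILLISECONDS);
    flusher = f;
  }

  synchronized void stopFlusher() {
    ScheduledExecutorService f = flusher;
    if (f == null) return;
    flusher = null;
    f.shutdown();
    try {
      f.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
    }
    flush(); // final flush so buffered entries are written before shutdown
  }

  long droppedCount() { return dropped.get(); }
  int bufferedCount() { return size.get(); }
}
```

Tracking the size in a separate AtomicInteger avoids ConcurrentLinkedQueue.size(), which traverses the queue. Because the counter is incremented before the cap check and rolled back on overflow, concurrent callers can never buffer more than MAX_BUFFER entries.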

No schema migration is needed: the status column is VARCHAR(32) with no CHECK
constraint, so SEARCH_UNAVAILABLE fits without any DDL change.
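The availability-transition handling described for SearchIndexRetryWorker, and the point that the new status fits the existing column, can be illustrated with a small sketch. AvailabilityTracker, RetryQueueStatus, and the Runnable standing in for resetSearchUnavailableToPending() are hypothetical names; how the real worker probes ES/OS health is not described in the PR and is left out here.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class RetryQueueStatus {
  static final String SEARCH_UNAVAILABLE = "SEARCH_UNAVAILABLE";
  static final int STATUS_COLUMN_WIDTH = 32; // status is VARCHAR(32) with no CHECK constraint

  static boolean fitsStatusColumn(String status) {
    return status.length() <= STATUS_COLUMN_WIDTH;
  }
}

/** Hypothetical tracker for ES/OS availability transitions observed by the retry worker. */
final class AvailabilityTracker {
  // Starts as "unavailable" so the first healthy check also resets SEARCH_UNAVAILABLE
  // rows left over from a previous run (a design choice of this sketch, not the PR).
  private final AtomicBoolean lastKnownAvailable = new AtomicBoolean(false);
  private final Runnable resetSearchUnavailableToPending; // stand-in for the DAO bulk update

  AvailabilityTracker(Runnable resetSearchUnavailableToPending) {
    this.resetSearchUnavailableToPending = resetSearchUnavailableToPending;
  }

  /** Records the latest probe; on an unavailable-to-available edge, bulk-resets rows. */
  boolean onHealthCheck(boolean availableNow) {
    boolean wasAvailable = lastKnownAvailable.getAndSet(availableNow);
    if (!wasAvailable && availableNow) {
      resetSearchUnavailableToPending.run();
      return true;
    }
    return false;
  }
}
```

Resetting only on the edge, rather than on every healthy probe, keeps the bulk UPDATE from running on each worker cycle while the cluster is stable.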

Testing: Added unit tests covering buffer enqueue/flush behavior, overflow
protection, status assignment, flusher lifecycle, availability transition resets,
and preflight cleanup logic.
#27307


Summary by Gitar

  • New RDF Indexing Infrastructure:
    • Added RdfIndexJobDAO, RdfIndexPartitionDAO, RdfReindexLockDAO, and RdfIndexServerStatsDAO to support distributed RDF index jobs.
    • Implemented necessary record types, row mappers, and connection-aware SQL methods for managing RDF job lifecycle and statistics.
  • Database Enhancements:
    • Added countByRelationType and listAllStatesForInstance queries to CollectionDAO.
    • Optimized deleteLineageBySourcePipeline to correctly handle pipeline.id and toId relations.
    • Added markRunningEntriesFailedByName for managing application status transitions.


@github-actions
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@mohityadav766 mohityadav766 added the safe to test Add this label to run secure Github workflows on PRs label Apr 20, 2026

@github-actions
Contributor

The Java checkstyle failed.

Please run mvn spotless:apply in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Java code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

@github-actions
Contributor

github-actions Bot commented Apr 20, 2026

🟡 Playwright Results — all passed (20 flaky)

✅ 3667 passed · ❌ 0 failed · 🟡 20 flaky · ⏭️ 89 skipped

Shard        Passed  Failed  Flaky  Skipped
🟡 Shard 1      480       0      1        4
🟡 Shard 2      648       0      5        7
🟡 Shard 3      652       0      7        1
🟡 Shard 4      632       0      2       27
🟡 Shard 5      610       0      1       42
🟡 Shard 6      645       0      4        8
🟡 20 flaky test(s) (passed on retry)
  • Pages/UserCreationWithPersona.spec.ts › Create user with persona and verify on profile (shard 1, 1 retry)
  • Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
  • Features/ColumnBulkOperations.spec.ts › should discard changes when closing drawer without saving (shard 2, 1 retry)
  • Features/DataProductDomainMigration.spec.ts › Data product with no assets can change domain without confirmation (shard 2, 1 retry)
  • Features/DataProductPersonaCustomization.spec.ts › Data Product - customize tab label should only render if it's customized by user (shard 2, 1 retry)
  • Features/Glossary/GlossaryHierarchy.spec.ts › should cancel move operation (shard 2, 1 retry)
  • Features/IncidentManager.spec.ts › Complete Incident lifecycle with table owner (shard 3, 2 retries)
  • Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3, 2 retries)
  • Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3, 1 retry)
  • Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3, 1 retry)
  • Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Flow/CustomizeWidgets.spec.ts › My Tasks Widget (shard 3, 1 retry)
  • Pages/Customproperties-part2.spec.ts › entityReferenceList shows item count, scrollable list, no expand toggle (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for Store Procedure (shard 4, 1 retry)
  • Pages/Glossary.spec.ts › Add and Remove Assets (shard 5, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
  • Pages/ODCSImportExport.spec.ts › Multi-object ODCS contract - object selector shows all schema objects (shard 6, 1 retry)
  • Pages/Users.spec.ts › Permissions for table details page for Data Consumer (shard 6, 1 retry)


How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

@mohityadav766
Member

@karthik120710 There are integration test failures; can you check?
Also, please add integration tests for the scenarios being fixed.

…and unit tests for search unavailable scenarios
@gitar-bot

gitar-bot Bot commented Apr 21, 2026

Code Review ✅ Approved · 1 resolved / 1 finding

Updated deleteByEntityTypes to exclude IN_PROGRESS rows, aligning it with the full-reindex path logic. No issues found.

✅ 1 resolved
Bug: deleteByEntityTypes deletes IN_PROGRESS rows, unlike full-reindex path

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:11043-11044 📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/ReindexingOrchestrator.java:179-182
For a full reindex, cleanupRetryQueuePreFlight() calls deleteByStatuses(ALL_PURGEABLE_STATUSES) which correctly preserves IN_PROGRESS and COMPLETED rows. For a partial reindex, it calls deleteByEntityTypes(...) whose SQL is DELETE FROM search_index_retry_queue WHERE entityType IN (...) — this unconditionally deletes all rows for those entity types, including IN_PROGRESS entries that the SearchIndexRetryWorker is actively processing. This inconsistency means a partial reindex can silently discard work-in-flight, leading to entities that are never indexed and never retried.
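To make the difference between the two preflight cleanup paths concrete, here is a minimal in-memory sketch of the corrected behavior. QueueRow, PreflightCleanup, and the assumption that "purgeable" means "any status except IN_PROGRESS and COMPLETED" are illustrative, not taken from the PR. The SQL string only shows the general shape of a predicate that spares in-flight rows; the real deleteByEntityTypes() statement may differ.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of a search_index_retry_queue row.
record QueueRow(String entityId, String entityType, String status) {}

final class PreflightCleanup {
  // Assumption: these are the statuses the full-reindex path preserves, per the review.
  static final Set<String> PRESERVED = Set.of("IN_PROGRESS", "COMPLETED");

  /** Full reindex: drop every purgeable row, whatever its entity type. */
  static List<QueueRow> cleanupFull(List<QueueRow> rows) {
    return rows.stream().filter(r -> PRESERVED.contains(r.status())).toList();
  }

  /** Partial reindex: drop purgeable rows for the selected types only, never in-flight work. */
  static List<QueueRow> cleanupPartial(List<QueueRow> rows, Set<String> entityTypes) {
    return rows.stream()
        .filter(r -> !entityTypes.contains(r.entityType()) || PRESERVED.contains(r.status()))
        .toList();
  }

  /** Illustrative SQL shape for the fixed partial delete (not the PR's actual statement). */
  static String partialDeleteSql(int typeCount) {
    String placeholders = String.join(", ", Collections.nCopies(typeCount, "?"));
    return "DELETE FROM search_index_retry_queue WHERE entityType IN (" + placeholders + ")"
        + " AND status NOT IN ('IN_PROGRESS', 'COMPLETED')";
  }
}
```

With both paths sharing the same PRESERVED set, a partial reindex can no longer discard rows that SearchIndexRetryWorker is actively processing.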


Labels

safe to test Add this label to run secure Github workflows on PRs

2 participants