
[Feature] Optimize Elasticsearch Ingestion: Replace Daily Indices with an ILM Rollover Alias #13383

@chj9

Search before asking

  • I had searched in the issues and found no similar feature requirement.

Description

1. Background

The current implementation writes data to Elasticsearch by creating a new index for each day (e.g., skywalking_segment-20250724). While this approach is straightforward for low data volumes, it presents significant challenges as the amount of data grows. When daily data volume is high, this strategy leads to massive single-day indices (potentially hundreds of gigabytes), causing severe issues:

  • Degraded Query Performance: Querying a massive index consumes substantial memory and CPU, resulting in slow queries or even timeouts. This negatively impacts user experience and data analysis efficiency.
  • Unbalanced Shards: Shards for high-volume days become excessively large, while shards for low-volume days remain small, leading to inefficient resource allocation.

2. Proposed Solution

We propose migrating from the daily index pattern to a strategy that leverages Elasticsearch's built-in Index Lifecycle Management (ILM) combined with a Rollover Alias.

The core concept of this strategy is:

  • Write to a single, fixed alias (e.g., skywalking_segment). Both writes and queries will target this alias.
  • Automate index management with an ILM policy. When an index meets a defined condition (e.g., its size reaches 15GB or its age reaches 2d), ILM automatically creates a new index and seamlessly switches the write alias (is_write_index: true) to it.
  • Automate data retention. The ILM policy will also automatically handle the lifecycle of old data, such as deleting it after 7 days, without any external intervention.

3. Implementation Steps

The complete implementation involves the following four key steps:

Step 1: Create an ILM Policy

Define a policy that specifies the conditions for the rollover and delete actions.

PUT _ilm/policy/skywalking_segment_ilm_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_primary_shard_size": "15gb",
            "max_age": "2d"
          }
        }
      },
      "delete": {
        "min_age": "7d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
  • Explanation: A rollover is triggered when a primary shard reaches 15GB or the index is 2 days old. Each index is then deleted automatically 7 days after it rolls over.
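Once the policy exists, it can be verified with the standard ILM APIs; the explain call is also useful later for seeing which phase each backing index is currently in:

```
GET _ilm/policy/skywalking_segment_ilm_policy

GET skywalking_segment-*/_ilm/explain
```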

Step 2: Create an Index Template

Create a template to automatically apply the ILM policy and settings to all new indices matching the skywalking_segment-* pattern.

PUT _index_template/skywalking_segment_template
{
  "index_patterns": [
    "skywalking_segment-*"
  ],
  "template": {
    "settings": {
      "index": {
        "refresh_interval": "5s",
        "number_of_shards": "5",
        "number_of_replicas": "0",
        "lifecycle": {
          "name": "skywalking_segment_ilm_policy",
          "rollover_alias": "skywalking_segment" 
        }
      }
    },
    "mappings": {
      "properties": {
        "message": {
          "type": "text"
        }
      }
    }
  }
}
  • Explanation: Any new index with a name starting with skywalking_segment- will be associated with the skywalking_segment_ilm_policy and use skywalking_segment as its rollover alias.

Step 3: Create the Bootstrap Index

Manually create the very first index and assign the alias to it, explicitly marking it as the write index.

PUT skywalking_segment-000001
{
  "aliases": {
    "skywalking_segment": {
      "is_write_index": true
    }
  }
}
  • Explanation: This is the "seed" index to start the process. All subsequent indices (skywalking_segment-000002, skywalking_segment-000003, etc.) will be created and managed automatically by ILM.
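Before wiring up the application, the rollover mechanics can be sanity-checked with a manual dry run (standard rollover API; the conditions here simply mirror the ILM policy from Step 1):

```
POST skywalking_segment/_rollover?dry_run=true
{
  "conditions": {
    "max_primary_shard_size": "15gb",
    "max_age": "2d"
  }
}
```

The response reports which conditions are currently met without actually creating the next index.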

Step 4: Modify Application Code

This is the most critical change. All logic in the application code that writes to and queries from Elasticsearch must be updated:

  • Write Operations: The write target should be changed from a dynamic, date-based index name (e.g., skywalking_segment-20250724) to the fixed alias skywalking_segment.
  • Query Operations: The query target should also be unified to the alias skywalking_segment. Since the alias points to all relevant active indices (e.g., skywalking_segment-000001, skywalking_segment-000002), querying the alias will search across all necessary data.
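As an illustration of the change at the request level (the document body here is hypothetical; the real mappings are richer than this):

```
# Before: writes target a dynamic, date-based index
POST skywalking_segment-20250724/_doc
{ "message": "..." }

# After: writes target the fixed alias; Elasticsearch routes the
# document to whichever backing index has is_write_index: true
POST skywalking_segment/_doc
{ "message": "..." }

# Queries also target the alias and fan out to all backing indices
GET skywalking_segment/_search
{ "query": { "match_all": {} } }
```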

4. Advantages

Adopting this solution will yield significant benefits:

  1. Automated Lifecycle Management: Eliminates the need for complex index creation/deletion logic in the application code, handing over responsibility to Elasticsearch and reducing maintenance costs.
  2. Balanced Shard Sizes: By controlling shard size with max_primary_shard_size, we ensure that every shard remains within a healthy and efficient size range, preventing giant shards.
  3. Improved Query Performance: Smaller, well-balanced shards lead to faster query speeds and more stable performance.
  4. Simplified Application Logic: The application code is decoupled from physical index names and timing concerns; it only needs to interact with a fixed alias.
  5. Seamless Index Rollover: The rollover action is atomic, allowing write traffic to transition smoothly from an old index to a new one with no data loss or service interruption.

5. Potential Impact

  • Data Migration: A strategy will be needed to manage existing daily indices. They can be added to a separate ILM policy that only contains a delete phase, or they can be removed manually after they expire.
  • Configuration Changes: The project's configuration files will need to be updated, replacing the old index prefix (e.g., skywalking_segment-20250724) with the new write alias (e.g., skywalking_segment).
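The delete-only policy mentioned above for existing daily indices could look like the sketch below (the policy name and index pattern are illustrative; verify that the pattern cannot match the new rollover indices before applying it):

```
PUT _ilm/policy/skywalking_segment_legacy_cleanup
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "7d",
        "actions": { "delete": {} }
      }
    }
  }
}

# Attach the policy to the old date-based indices only
# (skywalking_segment-20* matches 20250724-style names,
# not the new 000001-style rollover indices)
PUT skywalking_segment-20*/_settings
{
  "index.lifecycle.name": "skywalking_segment_legacy_cleanup"
}
```

Since these legacy indices never roll over, min_age is measured from each index's creation time.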

Conclusion:
This optimization is a critical step to ensure the system remains performant and highly available as data volumes continue to scale. We strongly recommend that the core development team evaluate and adopt this proposal.

Use case

Data Storage, Logging Module, Elasticsearch Integration

Related issues

Optimize storage

Are you willing to submit a pull request to implement this on your own?

  • Yes I am willing to submit a pull request on my own!
