
[Feature] Optimize Elasticsearch Ingestion: Replace Daily Indices with an ILM Rollover Alias #13383

@chj9

Search before asking

  • I had searched in the issues and found no similar feature requirement.

Description

1. Background

The current implementation writes data to Elasticsearch by creating a new index for each day (e.g., skywalking_segment-20250724). While this approach is straightforward for low data volumes, it presents significant challenges as the amount of data grows. When daily data volume is high, this strategy leads to massive single-day indices (potentially hundreds of gigabytes), causing severe issues:

  • Degraded Query Performance: Querying a massive index consumes substantial memory and CPU, resulting in slow queries or even timeouts. This negatively impacts user experience and data analysis efficiency.
  • Unbalanced Shards: Shards for high-volume days become excessively large, while shards for low-volume days remain small, leading to inefficient resource allocation.

2. Proposed Solution

We propose migrating from the daily index pattern to a strategy that leverages Elasticsearch's built-in Index Lifecycle Management (ILM) combined with a Rollover Alias.

The core concept of this strategy is:

  • Write to a single, fixed alias (e.g., skywalking_segment). Both writes and queries will target this alias.
  • Automate index management with an ILM policy. When an index meets a defined condition (e.g., its size reaches 15GB or its age reaches 2d), ILM automatically creates a new index and seamlessly switches the write alias (is_write_index: true) to it.
  • Automate data retention. The ILM policy will also automatically handle the lifecycle of old data, such as deleting it after 7 days, without any external intervention.

3. Implementation Steps

The complete implementation involves the following four key steps:

Step 1: Create an ILM Policy

Define a policy that specifies the conditions for the rollover and delete actions.

PUT _ilm/policy/skywalking_segment_ilm_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_primary_shard_size": "15gb",
            "max_age": "2d"
          }
        }
      },
      "delete": {
        "min_age": "7d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
  • Explanation: A rollover is triggered when a primary shard reaches 15GB or the index is 2 days old. Each index is then deleted automatically 7 days after it rolls over.
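Once the policy exists, it can be verified with the standard ILM APIs; the explain call is also useful later for seeing which phase each backing index is currently in:

```
GET _ilm/policy/skywalking_segment_ilm_policy

GET skywalking_segment-*/_ilm/explain
```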

Step 2: Create an Index Template

Create a template to automatically apply the ILM policy and settings to all new indices matching the skywalking_segment-* pattern.

PUT _index_template/skywalking_segment_template
{
  "index_patterns": [
    "skywalking_segment-*"
  ],
  "template": {
    "settings": {
      "index": {
        "refresh_interval": "5s",
        "number_of_shards": "5",
        "number_of_replicas": "0",
        "lifecycle": {
          "name": "skywalking_segment_ilm_policy",
          "rollover_alias": "skywalking_segment" 
        }
      }
    },
    "mappings": {
      "properties": {
        "message": {
          "type": "text"
        }
      }
    }
  }
}
  • Explanation: Any new index with a name starting with skywalking_segment- will be associated with the skywalking_segment_ilm_policy and use skywalking_segment as its rollover alias.

Step 3: Create the Bootstrap Index

Manually create the very first index and assign the alias to it, explicitly marking it as the write index.

PUT skywalking_segment-000001
{
  "aliases": {
    "skywalking_segment": {
      "is_write_index": true
    }
  }
}
  • Explanation: This is the "seed" index to start the process. All subsequent indices (skywalking_segment-000002, skywalking_segment-000003, etc.) will be created and managed automatically by ILM.
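Before wiring up the application, the rollover mechanics can be sanity-checked with a manual dry run (standard rollover API; the conditions here simply mirror the ILM policy from Step 1):

```
POST skywalking_segment/_rollover?dry_run=true
{
  "conditions": {
    "max_primary_shard_size": "15gb",
    "max_age": "2d"
  }
}
```

The response reports which conditions are currently met without actually creating the next index.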

Step 4: Modify Application Code

This is the most critical change. All logic in the application code that writes to and queries from Elasticsearch must be updated:

  • Write Operations: The write target should be changed from a dynamic, date-based index name (e.g., skywalking_segment-20250724) to the fixed alias skywalking_segment.
  • Query Operations: The query target should also be unified to the alias skywalking_segment. Since the alias points to all relevant active indices (e.g., skywalking_segment-000001, skywalking_segment-000002), querying the alias will search across all necessary data.
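As an illustration of the change at the request level (the document body here is hypothetical; the real mappings are richer than this):

```
# Before: writes target a dynamic, date-based index
POST skywalking_segment-20250724/_doc
{ "message": "..." }

# After: writes target the fixed alias; Elasticsearch routes the
# document to whichever backing index has is_write_index: true
POST skywalking_segment/_doc
{ "message": "..." }

# Queries also target the alias and fan out to all backing indices
GET skywalking_segment/_search
{ "query": { "match_all": {} } }
```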

4. Advantages

Adopting this solution will yield significant benefits:

  1. Automated Lifecycle Management: Eliminates the need for complex index creation/deletion logic in the application code, handing over responsibility to Elasticsearch and reducing maintenance costs.
  2. Balanced Shard Sizes: By controlling shard size with max_primary_shard_size, we ensure that every shard remains within a healthy and efficient size range, preventing giant shards.
  3. Improved Query Performance: Smaller, well-balanced shards lead to faster query speeds and more stable performance.
  4. Simplified Application Logic: The application code is decoupled from physical index names and timing concerns; it only needs to interact with a fixed alias.
  5. Seamless Index Rollover: The rollover action is atomic, allowing write traffic to transition smoothly from an old index to a new one with no data loss or service interruption.

5. Potential Impact

  • Data Migration: A strategy will be needed to manage existing daily indices. They can be added to a separate ILM policy that only contains a delete phase, or they can be removed manually after they expire.
  • Configuration Changes: The project's configuration files will need to be updated, replacing the old index prefix (e.g., skywalking_segment-20250724) with the new write alias (e.g., skywalking_segment).
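The delete-only policy mentioned above for existing daily indices could look like the sketch below (the policy name and index pattern are illustrative; verify that the pattern cannot match the new rollover indices before applying it):

```
PUT _ilm/policy/skywalking_segment_legacy_cleanup
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "7d",
        "actions": { "delete": {} }
      }
    }
  }
}

# Attach the policy to the old date-based indices only
# (skywalking_segment-20* matches 20250724-style names,
# not the new 000001-style rollover indices)
PUT skywalking_segment-20*/_settings
{
  "index.lifecycle.name": "skywalking_segment_legacy_cleanup"
}
```

Since these legacy indices never roll over, min_age is measured from each index's creation time.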

Conclusion:
This optimization is a critical step to ensure the system remains performant and highly available as data volumes continue to scale. We strongly recommend that the core development team evaluate and adopt this proposal.

Use case

Data Storage, Logging Module, Elasticsearch Integration

Related issues

Optimize storage

Are you willing to submit a pull request to implement this on your own?

  • Yes I am willing to submit a pull request on my own!
