No specific reproducer attached, but this is observable on a variety of workloads using large DFs.
Modin on Ray uses excessive amounts of memory for a wide range of operations, and we should investigate how this can be reduced. Consider this diagram from the authors of Dias [paper + GH] and PandasBench, where Modin's memory consumption is extremely high for some notebooks:
While testing an implementation for parallel writes from Ray datasets to Snowflake, we also observed that uploading a ~3GB in-memory dataframe used well over 100GB of RAM.
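Since no reproducer is attached, a first step could be a small harness that reports peak RSS growth around a workload. This is a hedged sketch using only the stdlib `resource` module (Unix-only); `run_workload` is a placeholder to be replaced with the actual Modin-on-Ray operation being investigated:

```python
import resource
import sys

def peak_rss_mb() -> float:
    """Return the process's peak resident set size in MB.

    ru_maxrss is reported in KB on Linux and in bytes on macOS.
    """
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return rss / (1024 * 1024) if sys.platform == "darwin" else rss / 1024

def run_workload():
    # Placeholder: substitute the Modin/Ray operation under test,
    # e.g. a read + groupby + write pipeline on a large DataFrame.
    data = [list(range(1_000)) for _ in range(1_000)]
    return len(data)

before = peak_rss_mb()
run_workload()
after = peak_rss_mb()
print(f"peak RSS grew by ~{after - before:.1f} MB")
```

Note that peak RSS only covers the driver process; for Modin on Ray, per-worker memory (e.g. via `ray memory` or the Ray dashboard) would also need to be tracked to see the full picture.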
Further areas of investigation:
- Is this entirely a Ray issue, or are there things Modin can do to address it?
- Is excessive memory usage also an issue for Modin on Dask? The PandasBench paper indicates that Dask DataFrames' peak memory consumption is relatively low.
See also: #5524