[FEA] Multi-node Multi-GPU Kmeans (C++) to support new out-of-core batching #1989

@cjnolet

Description

We recently added out-of-core batching to the single-GPU k-means, which allows it to accept a host matrix for training along with a batch size, so that it can break the dataset into batches that fit in device memory.

We should extend the multi-node multi-GPU C++ NCCL implementation to allow each rank to specify a host matrix.

Since most distributed systems will already have partitioned their datasets, we should consider accepting the data as partitioned and setting the batch size automatically (note that the difference here is that each batch could potentially be a different size, so we might need a small update to the single-gpu version to also support an array of batch sizes).
