We recently added out-of-core batching to the single-GPU K-means, which allows it to accept a host matrix for training, along with a batch size, so that it can break the dataset up into batches that fit in device memory.
We should extend the multi-node multi-GPU C++ NCCL implementation to allow each rank to specify a host matrix.
Since most distributed systems will have already partitioned their datasets, we should consider accepting the data pre-partitioned and setting the batch sizes automatically. The difference is that each batch could then be a different size, so the single-GPU version may need a small update to accept an array of batch sizes rather than a single value.