Transmuting pointers, using unsafe sections, and padding elements to separate cache lines is quite tricky in Rust, and not always straightforward or safe in C++ either. The following "map-reduce"-like operations can be introduced:
pub fn try_map<T, U, F>(
    &self,
    inputs: &[T],
    map: F,
) -> Result<Vec<Padding64<U>, A>, ForkUnionError>
where ...
The full set of operations may include: try_map, try_map_reduce, try_map_filter_reduce. The try_map_reduce and try_map_filter_reduce should be much cheaper to run than combining try_map with a subsequent separate reduction. In optimized versions, each thread can privately keep just one "folding reduction result", and those results can be output into a smaller container the size of the thread pool, not the size of the inputs container.
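To illustrate the "one folding result per thread" idea, here is a minimal sketch built on `std::thread::scope` rather than on ForkUnion itself. Everything in it is an assumption made for illustration: the function `map_reduce_per_thread`, its parameters, and the local `Padding64` type, which only stands in for the wrapper named in the signature above (its real definition is not shown here).

```rust
use std::thread;

/// Local stand-in for a cache-line-padded wrapper (assumed 64-byte lines).
/// Keeping each partial result on its own line avoids false sharing.
#[repr(align(64))]
#[derive(Debug, Clone, Copy, Default)]
pub struct Padding64<U>(pub U);

/// Hypothetical helper: maps every input and folds the results per thread,
/// returning one padded partial per worker instead of one value per input.
pub fn map_reduce_per_thread<T, U, M, R>(
    inputs: &[T],
    threads: usize,
    identity: U,
    map: M,
    reduce: R,
) -> Vec<Padding64<U>>
where
    T: Sync,
    U: Send + Clone,
    M: Fn(&T) -> U + Sync,
    R: Fn(U, U) -> U + Sync,
{
    let threads = threads.max(1);
    let mut partials: Vec<Padding64<U>> =
        (0..threads).map(|_| Padding64(identity.clone())).collect();
    // Ceiling division so that every input lands in some chunk.
    let chunk_len = ((inputs.len() + threads - 1) / threads).max(1);

    thread::scope(|scope| {
        // `zip` stops at the shorter side, so unused slots keep `identity`.
        for (slot, chunk) in partials.iter_mut().zip(inputs.chunks(chunk_len)) {
            let (map, reduce) = (&map, &reduce);
            scope.spawn(move || {
                // The accumulator stays thread-local until the final store.
                let mut acc = slot.0.clone();
                for item in chunk {
                    acc = reduce(acc, map(item));
                }
                slot.0 = acc;
            });
        }
    });
    partials
}
```

The caller finishes the reduction over `threads` partials, which is cheap compared to a second pass over a full-size mapped output. A real implementation would presumably reuse the pool's existing workers instead of spawning threads per call.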
Relevant considerations:
- What kinds of inputs containers can be accepted, and have sub-linear iteration cost, so that we can cheaply split the work between multiple worker threads?
- Should we use the same allocator with which the ForkUnion was created or allow passing an additional one for the exported execution results?
- For non-flat inputs, should the method be called try_flat_map?
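One possible way to frame the first question is a small trait for inputs that can be cut in two in constant time. The sketch below is purely hypothetical: `SplitInput` and `split_evenly` are illustrative names, not part of any existing API. It shows that slices and index ranges both qualify, while a linked list, for example, would not.

```rust
use std::ops::Range;

/// Hypothetical trait: an input that can be cut in two in O(1).
pub trait SplitInput: Sized {
    fn len(&self) -> usize;
    /// Splits into `[0, mid)` and `[mid, len)`; panics if `mid > len`.
    fn split_at(self, mid: usize) -> (Self, Self);
}

impl<'a, T> SplitInput for &'a [T] {
    fn len(&self) -> usize {
        <[T]>::len(self)
    }
    fn split_at(self, mid: usize) -> (Self, Self) {
        <[T]>::split_at(self, mid)
    }
}

impl SplitInput for Range<usize> {
    fn len(&self) -> usize {
        self.end.saturating_sub(self.start)
    }
    fn split_at(self, mid: usize) -> (Self, Self) {
        let cut = self.start + mid;
        assert!(cut <= self.end, "mid out of bounds");
        (self.start..cut, cut..self.end)
    }
}

/// Cuts `input` into `parts` nearly equal pieces using `parts - 1` splits.
pub fn split_evenly<I: SplitInput>(input: I, parts: usize) -> Vec<I> {
    let parts = parts.max(1);
    let mut pieces = Vec::with_capacity(parts);
    let mut rest = input;
    for remaining in (1..=parts).rev() {
        if remaining == 1 {
            pieces.push(rest);
            break;
        }
        // Ceiling division spreads any remainder over the leading pieces.
        let take = (rest.len() + remaining - 1) / remaining;
        let (head, tail) = rest.split_at(take);
        pieces.push(head);
        rest = tail;
    }
    pieces
}
```

With such a bound, the cost of distributing work depends on the number of threads rather than on the number of inputs.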