Splinter already starts with a large pre-allocated shared memory pool. On top of that, individual slots can easily be copied to GPU memory, and slots previously copied to the GPU can just as easily be brought back into Splinter. This would let us do things like keep models warm on the GPU.
There are additional memory-mapping opportunities where keys automatically get shipped to the GPU, while the signal lanes, atomic epochs, and other bookkeeping bits remain resident in host RAM.
This would allow supporting anything from USB-C bricks to whatever is in the Debian Contrib repo (I have no access to these things, but I don't need them for 99% of the work).
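
As a rough illustration of the split described above, where a slot's payload moves to the GPU and back while its epoch stays in host RAM, here is a minimal C sketch. Everything in it is an assumption made for illustration: `demo_slot`, `demo_slot_to_device`, `demo_slot_to_host`, and the `dev_*` helpers are hypothetical names, not Splinter's actual structures or API, and the "device" is simulated with `malloc`/`memcpy` so the example runs without a GPU. A CUDA build would presumably swap those stand-ins for `cudaMalloc`, `cudaMemcpy`, and `cudaFree`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical slot layout for illustration; NOT Splinter's real struct. */
typedef enum { RESIDENT_HOST, RESIDENT_DEVICE } residency_t;

typedef struct {
    _Atomic uint64_t epoch; /* bookkeeping: always lives in host RAM */
    residency_t where;      /* which copy of the payload is current */
    size_t len;             /* payload size in bytes */
    void *host_val;         /* space reserved in the pre-allocated pool */
    void *dev_val;          /* device copy, or NULL when on the host */
} demo_slot;

/* Stand-ins for a device backend (assumption: simulated with the heap).
 * With CUDA these would wrap cudaMalloc / cudaMemcpy / cudaFree. */
static void *dev_alloc(size_t n) { return malloc(n); }
static void dev_free(void *p) { free(p); }
static void copy_to_dev(void *dst, const void *src, size_t n) { memcpy(dst, src, n); }
static void copy_to_host(void *dst, const void *src, size_t n) { memcpy(dst, src, n); }

/* Copy the payload to the device; the epoch stays where it is. */
int demo_slot_to_device(demo_slot *s) {
    assert(s != NULL);
    if (s->where == RESIDENT_DEVICE) return 0; /* already there */
    if (s->len == 0) return -1;
    void *d = dev_alloc(s->len);
    if (d == NULL) return -1;
    copy_to_dev(d, s->host_val, s->len);
    s->dev_val = d;
    s->where = RESIDENT_DEVICE;
    atomic_fetch_add(&s->epoch, 1); /* signal the move to readers */
    return 0;
}

/* Bring the payload back into its reserved host space. */
int demo_slot_to_host(demo_slot *s) {
    assert(s != NULL);
    if (s->where == RESIDENT_HOST) return 0; /* already there */
    copy_to_host(s->host_val, s->dev_val, s->len);
    dev_free(s->dev_val);
    s->dev_val = NULL;
    s->where = RESIDENT_HOST;
    atomic_fetch_add(&s->epoch, 1);
    return 0;
}
```

In this sketch the host-side space stays reserved while the payload is on the device, so bringing a slot back is a plain copy with no allocation. Whether Splinter would keep that reservation or reuse the space is a design choice the sketch does not settle.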