Is your feature request related to a problem? Please describe.
Use case: I have limited memory but want to test multiple models, while still keeping certain smaller models in memory at all times.
Describe the solution you'd like
New setting: LRU eviction on model load.
The eviction should fire whenever a new model/backend is about to be loaded, and trigger only if the model to be loaded would not fit in memory (using the same calculation as the model gallery). Eviction targets should be chosen biggest model first, so that smaller models stay in memory.
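A minimal Python sketch of the requested selection logic. It assumes the loader knows each loaded model's estimated size and last-use time; `LoadedModel`, its fields, and `pick_evictions` are hypothetical names, not an existing API, and `needed_bytes` stands in for the gallery's fit calculation, which is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class LoadedModel:
    name: str
    size_bytes: int   # hypothetical: estimated memory footprint of the loaded model
    last_used: float  # hypothetical: timestamp of the model's last request


def pick_evictions(loaded, needed_bytes, free_bytes):
    """Return the models to unload so a model needing `needed_bytes` fits.

    Returns an empty list if it already fits. Raises MemoryError if even
    unloading every loaded model would not free enough memory.
    """
    if needed_bytes <= free_bytes:
        return []  # the new model fits: no eviction
    # Biggest models first; equal sizes are broken by least recently used.
    candidates = sorted(loaded, key=lambda m: (-m.size_bytes, m.last_used))
    evicted = []
    for model in candidates:
        evicted.append(model)
        free_bytes += model.size_bytes
        if needed_bytes <= free_bytes:
            return evicted
    raise MemoryError("model does not fit even after evicting all loaded models")


if __name__ == "__main__":
    loaded = [
        LoadedModel("embedder", 1_000, last_used=30.0),
        LoadedModel("llm-13b", 26_000, last_used=10.0),
        LoadedModel("llm-7b", 14_000, last_used=20.0),
    ]
    # Only the largest model is unloaded; the small embedder stays resident.
    print([m.name for m in pick_evictions(loaded, needed_bytes=20_000, free_bytes=4_000)])
```

Sorting by `(-size_bytes, last_used)` implements "biggest first" as described and uses recency only to break ties between equally sized models; a strict LRU policy would sort by `last_used` alone.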