Abstract: The memory footprint of modern applications like large language models (LLMs) far exceeds the memory capacity of the accelerators they run on and often spills over to host memory. As model sizes ...