Unfortunately that’s not really relevant to LLMs beyond inserting things into the text you feed them. For every single token they predict, they make a pass through the multi-gigabyte weights. It’s largely memory-bound, and not integrated with any kind of sane external-memory algorithm.
There are some techniques that muddy this a bit, like MoE and dynamic LoRA loading, but the principle is the same.
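The memory-bound point above can be made concrete with a back-of-envelope calculation: if each decoded token requires streaming all the weights from memory, then memory bandwidth caps tokens/sec. The figures below (7B params in fp16, ~1 TB/s of bandwidth) are illustrative assumptions, not measurements:

```python
def max_tokens_per_sec(model_bytes: float, bandwidth_bytes_per_sec: float) -> float:
    """Upper bound on decode speed when every token reads all weights."""
    return bandwidth_bytes_per_sec / model_bytes

# Hypothetical example: 7B-parameter model in fp16 ~ 14 GB of weights
model_bytes = 7e9 * 2
# Hypothetical GPU memory bandwidth: ~1 TB/s
bandwidth = 1e12

print(f"~{max_tokens_per_sec(model_bytes, bandwidth):.0f} tokens/sec upper bound")
```

MoE effectively shrinks `model_bytes` to just the active experts' weights per token, which is why it muddies the picture without changing the pass-through-the-weights principle.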
Yep. I didn’t mean to process shame you or anything, just trying to point out obscure but potentially useful projects most don’t know about :P