• Pennomi@lemmy.world · 3 upvotes · 25 days ago

      It depends. A lot of LLMs are memory-constrained. If you're constantly thrashing GPU memory, it can be both slower and less efficient.
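
      To put rough numbers on the memory-constrained point, here is a back-of-envelope sketch. It assumes a simplified model where generating one token reads every weight once and costs about 2 FLOPs per parameter. The function name and all hardware figures are hypothetical placeholders, not measurements of any real GPU or model.

```python
def decode_step_times(params, bytes_per_param, mem_bandwidth, flops):
    """Rough per-token time estimates in seconds: (memory_time, compute_time).

    Assumptions (simplified): every weight is read once per generated token,
    and each parameter costs about 2 FLOPs (one multiply, one add).
    """
    weight_bytes = params * bytes_per_param
    memory_time = weight_bytes / mem_bandwidth  # time to stream the weights
    compute_time = 2 * params / flops           # time to do the arithmetic
    return memory_time, compute_time


# Hypothetical figures: 7e9 parameters at 2 bytes each,
# 1e12 bytes/s of memory bandwidth, 1e14 FLOP/s of compute.
mem_t, comp_t = decode_step_times(7e9, 2, 1e12, 1e14)
print(f"memory: {mem_t * 1e3:.1f} ms, compute: {comp_t * 1e3:.2f} ms")
```

      With these made-up figures the memory term is about 100 times the compute term, so the GPU would spend most of each step waiting on weights. Plugging in a lower bandwidth, as you might if weights keep getting swapped in and out of GPU memory, makes that gap wider.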