Apologies for the basic question, but what’s the difference between GGML and GPTQ? Do these just refer to different compression methods? Which would you choose if you’re using a 3090 Ti GPU?

  • markon@lemmy.world
    1 year ago

    Also, llama.cpp offers very fast performance with GGML models compared to using transformers, and it is sometimes faster than ExLlama.