unsure on how to quantize model

brokenlcd · 2 months ago


hendrik@palaver.p3x.de · edit-2 2 months ago

I believe exllama and vllm offer quantization. But llama.cpp should be able to run on a graphics card as well. Maybe the default settings are wrong for your computer, or you have an AMD card and need a different build of llama.cpp?

And by the way, you don’t need to quantize that model yourself. Some people have already uploaded it to Hugging Face in several quantized formats: AWQ, GGUF, exl2 …
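
If you go the GGUF route, something like this rough sketch might work for grabbing an already-quantized file and loading it with GPU offload. It assumes the `huggingface_hub` and `llama-cpp-python` packages are installed, and that the latter was built with GPU support for your card. The function names, the `Q4_K_M` default and the repo id in the usage line are placeholders I made up for illustration, not the actual upload for your model.

```python
def pick_gguf(filenames, quant="Q4_K_M"):
    """Return the first .gguf filename containing the quant tag (case-insensitive)."""
    tag = quant.lower()
    for name in filenames:
        lower = name.lower()
        if lower.endswith(".gguf") and tag in lower:
            return name
    raise ValueError(f"no .gguf file matching {quant!r}")


def load_prequantized(repo_id, quant="Q4_K_M", n_gpu_layers=-1):
    """Download one pre-quantized GGUF file from a repo and load it with llama.cpp."""
    # Imported lazily so pick_gguf works even without these packages installed.
    from huggingface_hub import hf_hub_download, list_repo_files
    from llama_cpp import Llama

    # Look at what the uploader published and choose one quantization level.
    filename = pick_gguf(list_repo_files(repo_id), quant)
    model_path = hf_hub_download(repo_id=repo_id, filename=filename)

    # n_gpu_layers=-1 asks llama.cpp to offload all layers to the GPU.
    # It only has an effect if llama-cpp-python was built with GPU support.
    return Llama(model_path=model_path, n_gpu_layers=n_gpu_layers)
```

Usage would be roughly `llm = load_prequantized("some-user/some-model-GGUF")`, with the real repo id swapped in. If the model doesn't fit in VRAM, passing a smaller positive `n_gpu_layers` keeps the remaining layers on the CPU.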