ref:
ggerganov/ggml#302
#1991
This PR paves the way for integrating more models into llama.cpp. It changes the file format in which we convert the models by extending it with key-value pairs meta...
They still consider it a beta but there we go! It’s happening :D
Is there any reason why support for loading both formats cannot be included within GGML/llama.cpp directly?
It could be (and I bet koboldcpp and maybe other projects will take that route). There absolutely is a disadvantage to dragging around a lot of legacy stuff for compatibility. llama.cpp/ggml’s approach has pretty much always been to favor rapid development over compatibility.
As I understand it, the new format is basically the same as the old format
I’m not sure that’s really accurate. There are significant differences in how the model vocabulary is handled, for instance.
Even if it were true right now, for the very first version of GGUF that gets merged, it’ll likely become less true as GGUF evolves and the features it enables see more use. Having to maintain compatibility with the old GGML format would make iterating on GGUF and adding new features more difficult.
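For anyone curious what "loading both formats" would involve at the very first step, here is a rough sketch of sniffing a file's leading magic bytes to decide which loader to hand it to. The function name `sniff_model_format` is made up for illustration, and the magic values are assumptions from my reading of the llama.cpp source (GGUF files starting with the ASCII bytes `GGUF`, older files storing `ggml`/`ggmf`/`ggjt` as a little-endian uint32), so double-check them before relying on this. The detection is the easy part; the maintenance cost discussed above is in keeping a second loader alive behind it.

```python
import struct

# Assumption: legacy ggml-family files store their magic as a little-endian
# uint32, so the characters appear reversed on disk.
LEGACY_MAGICS = {
    0x67676D6C: "ggml",  # oldest, unversioned variant
    0x67676D66: "ggmf",
    0x67676A74: "ggjt",
}

# Assumption: GGUF files begin with these four ASCII bytes.
GGUF_MAGIC = b"GGUF"


def sniff_model_format(header: bytes) -> str:
    """Classify a model file from its first four bytes (hypothetical helper)."""
    if len(header) < 4:
        return "unknown"
    if header[:4] == GGUF_MAGIC:
        return "gguf"
    (magic,) = struct.unpack("<I", header[:4])
    if magic in LEGACY_MAGICS:
        return "legacy"
    return "unknown"
```

In practice you would read the first four bytes of the model file (`open(path, "rb").read(4)`) and pass them in, then dispatch to whichever loader matches.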