What's the deal with LlamaCPP and caching?

j4k3@lemmy.world · 9 months ago

What's the deal with LlamaCPP and caching?

rufus@discuss.tchncs.de · edit-2 9 months ago

Sorry, I misunderstood you earlier. I thought you had switched form something like exllama to llama.cpp and now the same model behaved differently… And I got a bit confused because you mentioned Llama2 chat model. And I thought you meant the heavily restricted (aligned/“safe”) Llama2-Chat variant 😉 But I got it now.

Euryale seems to be a fine-tune and probably a merge of different other models(?) So someone fed some kind of datasets into it. Probably also containing stories about gladiators, fights, warriors and fan-fiction. It just replicates this. So I’m not that surprised that it does unrealistic combat stories. And even if you correct it, tends to fall back to what it learned earlier. Or tends to drift into lewd stories if it was made to do NSFW stuff and has been fine-tuned also with erotic internet fiction. We’d need to have a look at the dataset to judge why the model behaves like it does. But I don’t think there is any other ‘magic’ involved but the data and stories it got trained on. And 70B is already a size where models aren’t that stupid anymore. It should be able to connect things and grasp most relevant concepts.

I haven’t had a close look at this model, yet. Thanks for sharing. I have a few dollars left on my runpod.io account, so I can start a larger cloud instance and try it once I have some time to spare. My computer at home doesn’t do 70B models.

And thanks for your perspective on storywriting.