As in the title. I know the term "jailbreak" comes from removing restrictions on Apple phones (similar to rooting on Android), but I'm not sure what can be gained from jailbreaking a language model.

Will it be able to say "I can't do that, Dave" instead of hallucinating?
Or will it just start spewing less sanitized responses?

  • INeedMana@lemmy.world (OP) · 1 year ago

    I think you're talking about jailbreaking a phone, while my question was about jailbreaks in language models (AI, like ChatGPT).