They support Claude, ChatGPT, Gemini, HuggingChat, and Mistral.

  • Lojcs@lemm.ee
    22 hours ago

    Last time I tried using a local LLM (about a year ago), it generated only a couple of words per second and the answers were barely relevant. Also, I don’t see how a local LLM can fulfill the glorified search engine role that people use LLMs for.

    • ocassionallyaduck@lemmy.world
      16 hours ago

      Try again. Simplified models take the large ones and pare down their memory requirements, and they can even run on the CPU. The “smol” model I mentioned is real, and hyperfast.

      Llama 3.2 is pretty solid as well.

      • Lojcs@lemm.ee
        10 hours ago

        These are the answers they gave the first time.

        Qwencoder is persistent after 6 rerolls.

        Anyways, how do I make these use my GPU? The ollama logs say the model will fit into VRAM and that it is offloading all layers, but GPU usage doesn’t change and the CPU gets the load. And regardless of the model size, VRAM usage never changes and RAM only goes up by a couple hundred megabytes. Any advice? (Linux / Nvidia) Edit: it didn’t have CUDA enabled apparently, fixed now.
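
For anyone hitting the same symptom, one way to confirm whether the GPU is actually doing the work is to sample `nvidia-smi` while the model is generating. This is only a minimal sketch: it assumes `nvidia-smi` is on the PATH and that both queried fields come back as plain integers. The helper names `parse_gpu_stats` and `read_gpu_stats` are made up for illustration and are not part of ollama.

```python
import subprocess

# nvidia-smi query: GPU utilization (%) and used memory (MiB), one CSV line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text):
    """Turn nvidia-smi CSV output into a list of (utilization %, memory MiB) tuples.

    Assumes every field is an integer; a field reported as "[N/A]" would raise ValueError.
    """
    stats = []
    for line in csv_text.strip().splitlines():
        util, mem = (field.strip() for field in line.split(","))
        stats.append((int(util), int(mem)))
    return stats


def read_gpu_stats():
    """Run nvidia-smi once and return the parsed stats (hypothetical helper)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_stats(out)

# Usage, run while a prompt is being answered:
#   print(read_gpu_stats())
```

If utilization stays near zero and used memory doesn't grow while a prompt is running, the model is most likely still on the CPU. As far as I know, `ollama ps` also reports whether a loaded model is sitting on CPU or GPU, which is a quicker first check.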

        • ocassionallyaduck@lemmy.world
          9 hours ago

          Nice.

          Yea I don’t trust any AI models for facts, period. They all just lie. Confidently. The smol model there at least tried and got it right at first… Before confusing the sentence context.

          Qwen is a good model too. But if you wanted something to run home automation or do text summaries, smol is solid enough. I’m using CPU so it’s good enough.
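
As a rough illustration of the text-summary use case, here is a minimal sketch that asks a locally running Ollama server for a summary over its HTTP API. Assumptions: the server is listening on the default `http://localhost:11434`, and the model tag `smollm2` is just a placeholder for whichever small model has been pulled. `build_request` and `summarize` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed default address of a local Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(text, model="smollm2"):
    """Build a non-streaming generate request; the model tag is a placeholder."""
    payload = {
        "model": model,
        "prompt": f"Summarize the following text in two sentences:\n\n{text}",
        "stream": False,  # ask for one JSON object instead of a token stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(text, model="smollm2", timeout=120):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(text, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]

# Usage, with the server running and the model pulled:
#   print(summarize("Long article text goes here..."))
```

The generous timeout is deliberate, since CPU-only generation can take a while even with a small model.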

    • TheDorkfromYork@lemm.ee
      22 hours ago

      They’re fast and high quality now. ChatGPT is the best, but local LLMs are great, even with 10 GB of VRAM.