• @phil_m@lemmy.ml
    3 · 1 year ago

    Although I would find it really impressive, I don’t think we are anywhere near compressing that much information into so few parameters. Just think about all the specialized knowledge ChatGPT4 has. I think almost the whole internet is encoded in the model to some degree, for better or worse.
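    For a rough sense of scale, here is a back-of-envelope sketch in Python. The numbers are illustrative assumptions only (GPT-3’s published 175B parameters stored as fp16, and a guessed ~10 TB of filtered web text); GPT-4’s actual size and training data are not public.

    ```python
    # Back-of-envelope: parameter storage vs. a rough web-scale text corpus.
    # All figures are assumptions for illustration, not published GPT-4 numbers.
    params = 175e9              # GPT-3's published parameter count
    bytes_per_param = 2         # fp16 weights
    model_bytes = params * bytes_per_param

    corpus_tb = 10              # assumed size of a filtered web-text corpus
    corpus_bytes = corpus_tb * 1e12

    print(f"model weights:  ~{model_bytes / 1e12:.2f} TB")            # ~0.35 TB
    print(f"assumed corpus: ~{corpus_bytes / 1e12:.0f} TB")
    print(f"rough compression ratio: ~{corpus_bytes / model_bytes:.0f}x")
    ```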

    As the title already reveals, the trend is the opposite: encoding and reasoning about even more data with bigger models and datasets.

    But yeah, it’s kind of concerning that all this power is in so few hands/companies.

    • @k_o_t@lemmy.ml
      2 · 1 year ago

      certainly more weights contain more general information, which is pretty useful if you’re using a model as a sort of secondary search engine, but models can be very performant on certain benchmarks while containing little general data

      this isn’t really by design; up until now (and it’s still the case) we simply haven’t known how to create an LLM that can generate coherent text without absorbing a huge portion of the training material

      i’ve tried several models based on facebook’s llama LLMs, and i can say that the 13B and definitely the 30B versions are comparable to chatGPT in terms of quality (maybe not in the amount of information they have access to, but definitely in other regards)
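      for anyone curious, here’s roughly how i run these locally with llama.cpp’s python bindings (a minimal sketch, assuming you’ve already converted/quantized a llama-based model yourself; the model path below is just a placeholder):

      ```python
      # minimal sketch: load a quantized llama-based model with llama-cpp-python
      # (pip install llama-cpp-python); the model path is a placeholder.
      from llama_cpp import Llama

      llm = Llama(model_path="./models/13B/ggml-model-q4_0.bin")

      out = llm(
          "Q: Explain what a language model is, in one sentence. A:",
          max_tokens=64,
          stop=["Q:"],   # stop before the model invents a new question
      )
      print(out["choices"][0]["text"].strip())
      ```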