• GingaNinga@lemmy.world · 20 hours ago

    AI is unreliable to the point of being almost dangerous in the sciences. The more niche the question, the more likely it is to give you a completely incorrect answer. I’d rather it admit that it doesn’t know.

    • floofloof@lemmy.ca · 12 hours ago (edited)

      Is there even any suitable “confidence” measure within the LLM that it could use to know when it needs to emit an “I don’t know” response? I wonder whether there’s even any consistent and measurable difference between times when it seems to know what it’s talking about and times when it is talking BS. That might be something that exists in our own cognition but has no counterpart in the workings of an LLM. So it may not even be feasible to engineer it to say “I don’t know” when it doesn’t know. It can’t just straightforwardly look at how many sources it has for an answer and how good they were, because LLMs have typically worked in a more holistic way: each item of training data nudges the behaviour of the whole system, but it doesn’t leave behind any sign that says “I did this,” or any particular piece of knowledge or behaviour that can be ascribed to that training item.

    • brucethemoose@lemmy.world · 17 hours ago (edited)

      Chatbots are text completion models, improv machines basically, so they don’t really have that ability. You could look at logprobs, I guess (i.e., is it spreading probability pretty evenly across a bunch of words?), but that’s unreliable. Even adding an “I don’t know” token wouldn’t work, because that’s not really trainable from text datasets: models don’t know when they don’t know; it’s all just modeling which next word is most likely.
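      The logprob idea can at least be sketched. This is a toy illustration, not anything a production chatbot actually does: it assumes you have per-token log-probabilities (the kind some LLM APIs expose) and flags a step as “unsure” when the entropy of the distribution is high, i.e. the model is guessing fairly evenly. The 2-bit threshold is an arbitrary choice for the example.

      ```python
      import math

      def token_entropy(logprobs):
          """Shannon entropy (in bits) of a next-token distribution.

          `logprobs` is a list of natural-log probabilities for the
          candidate next tokens.
          """
          probs = [math.exp(lp) for lp in logprobs]
          total = sum(probs)                 # renormalize in case of a truncated top-k list
          probs = [p / total for p in probs]
          return -sum(p * math.log2(p) for p in probs if p > 0)

      def looks_unsure(logprobs, threshold_bits=2.0):
          """Heuristic: high entropy = the model is 'guessing evenly'."""
          return token_entropy(logprobs) > threshold_bits

      # A confident step: one token dominates the distribution.
      confident = [math.log(0.95), math.log(0.03), math.log(0.02)]
      # An unsure step: probability spread evenly over eight tokens.
      unsure = [math.log(1 / 8)] * 8
      ```

      As the comment says, though, this is unreliable in practice: a model can assign high probability to a confidently wrong continuation, so low entropy is not evidence of truth.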

      Some non-autoregressive architectures would be better, but unfortunately “cutting edge” models people interact with like ChatGPT are way more conservatively developed than you’d think. Like, they’ve left tons of innovations unpicked.