Who knew world peace would be so easy?

nuke@sh.itjust.works · 5 months ago

Who knew world peace would be so easy?

Meowoem@sh.itjust.works · 5 months ago

People need to realise that LLMs are not just Markov chains, the math is far more complex than just guessing which word comes next - they have structure where concepts come before word choice, this is why they can very clearly be seen making novel structures such as code.

Lemvi@lemmy.sdf.org · 4 months ago

LLMs are absolutely complex, neural nets ARE somewhat modelled after human brains after all, and trying to understand transformers or LSTMs for the first time is a real pain. However, what a NN can do, or rather what it has been trained to do depends almost entirely on the loss function used. The complexity of the architecture and the training dataset don’t change what a LLM can do, only how good it is at doing that, and how well it generalizes. The loss function used for the training of LLMs simply evaluates whether the predicted tokens fit the actual ones. That means that an LLM is trained to predict words from other words, or in other words, to model language.

The loss function does not evaluate the validity of logical statements, though. All reasoning that an LLM is capable of, or seems to be capable of, emerges from its modelling of language, not an actual understanding of logic.