- cross-posted to:
- technology@lemmy.world
I asked Google Bard whether it thought Web Environment Integrity was a good or bad idea. Surprisingly, not only did it respond that it was a bad idea, it even went on to urge Google to drop the proposal.
For the last time: these language models are just regurgitating what people have said. They don’t analyze or reason.
That’s not entirely true.
LLMs are trained to predict the next word given the context, yes. But in order to do that, they develop an internal model that minimizes error across a wide range of contexts, and an emergent feature of this process is that the model DOES perform more than pure compression of the training data.
For example, GPT-3 is able to calculate addition and subtraction problems that didn’t appear in the training dataset. This would suggest that the model learned how to perform addition and subtraction, likely because it was easier or more efficient than storing all of the examples from the training data separately.
This is a simple-to-measure example, but it’s enough to suggest that LLMs are able to extrapolate from the training data and do more than just stitch relevant parts of the dataset together.
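To make the memorize-versus-generalize distinction concrete, here is a toy sketch. It is only an analogy and not how GPT-3 works internally: the "memorizer" is a plain lookup table, and the "learned" model is a two-weight linear fit I picked for illustration. The held-out pairs and all names are made up for this example.

```python
from itertools import product

# Hypothetical held-out pairs: these never appear in the training data.
held_out = {(7, 8), (3, 9)}
train = [
    (a, b, a + b)
    for a, b in product(range(10), repeat=2)
    if (a, b) not in held_out
]

# Strategy 1: pure memorization, i.e. store every training example verbatim.
lookup = {(a, b): y for a, b, y in train}

def memorizer(a, b):
    """Return the stored answer, or None if the pair was never seen."""
    return lookup.get((a, b))

# Strategy 2: fit y = w1*a + w2*b by least squares (2x2 normal equations).
def fit(examples):
    s_aa = sum(a * a for a, b, y in examples)
    s_ab = sum(a * b for a, b, y in examples)
    s_bb = sum(b * b for a, b, y in examples)
    s_ay = sum(a * y for a, b, y in examples)
    s_by = sum(b * y for a, b, y in examples)
    det = s_aa * s_bb - s_ab * s_ab
    w1 = (s_ay * s_bb - s_by * s_ab) / det
    w2 = (s_by * s_aa - s_ay * s_ab) / det
    return w1, w2

w1, w2 = fit(train)

def learned(a, b):
    """Predict with the two fitted weights instead of stored examples."""
    return round(w1 * a + w2 * b)

for a, b in sorted(held_out):
    print((a, b), "memorizer:", memorizer(a, b), "learned:", learned(a, b))
```

The lookup table needs one entry per example and has nothing to say about the held-out pairs, while the fitted model needs two numbers and also handles inputs far outside the training range. That is the "easier or more efficient than storing all of the examples" point in miniature.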
That’s interesting, I’d be curious to read more about that. Do you have any links to get started with? Searching this type of stuff on Google yields less than ideal results.
Check out this one: https://thegradient.pub/othello/
In it, researchers built a custom LLM trained to play a board game just by predicting the next move in a series of moves, with no input at all about the game state. They found evidence of an internal representation of the current game state, although the model had never been told what that game state looks like.
I know. I just thought it was a bit ironic seeing such a strongly worded response from it.
Could you share your source?
Yes, because online discussions usually aren’t inherently subjective and are instead backed by sourceable knowledge. Sorry for the cynicism, but one can always find some source that supports any point, so everything should be taken with a grain of salt.
I’d personally argue that the way generative AI works lends itself to producing answers that fit the general consensus of the internet that is relevant to the given prompt, because it calculates the most likely response based on the information available. Since most information relevant to “Google Web DRM” is critical of it (Google doesn’t call it DRM themselves), it makes sense that a prompt querying the AI for opinions on Web DRM will result in a rather negative response, as long as Google doesn’t tamper with it to their advantage.
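A very rough sketch of the "most likely response" idea, assuming a word-level bigram counter in place of a real model (actual LLMs use neural networks over tokens, so treat this purely as an illustration). The tiny corpus is invented and deliberately skewed three to one.

```python
from collections import Counter, defaultdict

# Invented mini-corpus: three critical sentences, one favorable.
corpus = [
    "web drm is harmful",
    "web drm is harmful",
    "web drm is harmful",
    "web drm is useful",
]

# Count how often each word follows each other word.
counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1

def most_likely_next(word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

print(most_likely_next("is"))
```

If the text the model saw leans one way, the most likely continuation leans the same way, with no opinion involved.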
What do you mean, source? It’s a language model that learned from what people said. No source is needed, just an understanding of how LLMs actually work. When you ask an LLM what the answer to a math question is, it doesn’t run a calculation of that question. Instead, it gives you back what it thinks you want to hear. Some LLMs have gotten additional actions like making these calculations, but in the most basic implementation it’s telling you what you want to hear, based on a series of tests where you told it whether it was right or wrong.
So you teach it what you want to hear and it repeats it.
That ignores all the papers on emergent features of LLMs and the fact that they are basically black boxes. Yes, we “trained” them to write what we want to hear. But we don’t really understand what happens inside them. We can’t categorically claim things like “they are only regurgitating what they heard”, because that is not a scientific or even philosophical statement.
If you think about it for a second, it’s also applicable to human beings…
Exactly, the reason LLMs are so fascinating to us is how close they get to sounding human. Thing is, it’s not a trick. When people dismiss LLMs with “Oh, they mostly just echo their training data set”, well, that’s just culture in humans. Then it’s the emergent behavior that makes us feel unique. I’m not saying LLMs are human equivalent. But they’re fairly close in design to how a huge part of our psyche works.
Would it be feasible to fork the internet if this comes to pass?
The internet is just a series of tubes. What you’re really talking about is alternative content and service providers (news, video, shopping, etc.), in case the existing ones choose to require approved browsers only.
Are you going to run your own news company?
i just tried this and got a more fence-sitting result of “here are the pros and cons, there should be public discussion before we know if it’s good or bad”.
but your result is fascinating.