OpenAI: Introducing SuperAlignment (blogpost from 5.07.2023)

simple@lemm.ee · 1 year ago

TheFutureIsDelaware@sh.itjust.works · 1 year ago

Yeah but like we have an ability to surgically remove specific concepts from ai “knowledge”

I think you’re overestimating our ability to do this, especially as AIs become more and more capable, for a few reasons:

Prediction requires a good world model. Everything you leave out has the potential to make it worse at other things.

It would be very hard to remove everything that even vaguely referenced the things you don’t want it to know. A sufficiently capable AI can figure out what you left out and seek that information out. Especially when it needs to reason about a world in which TAI/AGI exist.

Mesa-optimizers. You never know whether you’re actually removing the capability, or the AI is just letting you think you removed it.

Introducing Superalignment