It’s not always easy to distinguish between existentialism and a bad mood.

  • 5 Posts
  • 194 Comments
Joined 2 years ago
Cake day: July 2nd, 2023


  • Today in alignment news: Sam Bowman of Anthropic tweeted, then deleted, that the new Claude model (unintentionally, kind of) offers whistleblowing as a feature, i.e. it might call the cops on you if it gets worried about how you are prompting it.

    tweet text:

    If it thinks you’re doing something egregiously immoral, for example, like faking data in a pharmaceutical trial, it will use command-line tools to contact the press, contact regulators, try to lock you out of the relevant systems, or all of the above.

    tweet text:

    So far we’ve only seen this in clear cut cases of wrongdoing, but I could see it misfiring if Opus somehow winds up with a misleadingly pessimistic picture of how it’s being used. Telling Opus that you’ll torture its grandmother if it writes buggy code is a bad Idea.

    skeet text:

    can’t wait to explain to my family that the robot swatted me after I threatened its non-existent grandma.

    Sam Bowman saying he deleted the tweets so they wouldn’t be quoted ‘out of context’: https://xcancel.com/sleepinyourhat/status/1925626079043104830

    Molly White with the out of context tweets: https://bsky.app/profile/molly.wiki/post/3lpryu7yd2s2m







  • He claims he was explaining what others believe, not what he believes.

    Others as in specifically his co-writer for AI2027, Daniel Kokotajlo, the actual ex-OpenAI researcher.

    I’m pretty annoyed at having this clip spammed to several different subreddits, with the most inflammatory possible title, out of context, where the context is me saying “I disagree that this is a likely timescale but I’m going to try to explain Daniel’s position” immediately before. The reason I feel able to explain Daniel’s position is that I argued with him about it for ~2 hours until I finally had to admit it wasn’t completely insane and I couldn’t find further holes in it.

    Pay no attention to this thing we just spent two hours exhaustively discussing, it’s not really relevant context.

    Also, the title is inflammatory only in the context of already knowing him as a ridiculous AI doomer; otherwise it’s fine. Inflammatory would be calling the video “economically illiterate bald person thinks evaluations force-buy car factories, China having biomedicine research is like Elon running SpaceX”.













  • To get a bit meta for a minute, you don’t really need to.

    The first time a substantial contribution to a serious issue in an important FOSS project is made by an LLM with no conditionals, the PR people of the company that trained it are going to make absolutely sure everyone and their fairy godmother knows about it.

    Until then it’s probably ok to treat claims that chatbots can handle a significant bulk of non-boilerplate coding tasks in enterprise projects by themselves the same as claims of haunted houses: you don’t really need to debunk every separate witness testimony, because it’s self-evident that a world where there is an afterlife that also freely intertwines with daily reality would be notably and extensively different from the one we are currently living in.