• 0 Posts
  • 7 Comments
Joined 1 year ago
cake
Cake day: February 6th, 2024

help-circle
  • That o3 does well on frontier math held-out set is impressive, no doubt

    I think there is plenty of room for doubt still. elliotglazer on reddit writes:

    Epoch’s lead mathematician here. Yes, OAI funded this and has the dataset, which allowed them to evaluate o3 in-house. We haven’t yet independently verified their 25% claim. To do so, we’re currently developing a hold-out dataset and will be able to test their model without them having any prior exposure to these problems.

    My personal opinion is that OAI’s score is legit (i.e., they didn’t train on the dataset), and that they have no incentive to lie about internal benchmarking performances. However, we can’t vouch for them until our independent evaluation is complete.

    (emphasis mine). So there is good reason to doubt that the “held-out dataset” even exists.






  • I read one of the papers. About the specific question you have: given a string of bits s, they’re making the choice to associate the empirical distribution to s, as if s was generated by an iid Bernoulli process. So if s has 10 zero bits and 30 one bits, its associated empirical distribution is Ber(3/4). This is the distribution which they’re calculating the entropy of. I have no idea on what basis they are making this choice.

    The rest of the paper didn’t make sense to me - they are somehow assigning a number N of “information states” which can change over time as the memory cells fail. I honestly have no idea what it’s supposed to mean and kinda suspect the whole thing is rubbish.

    Edit: after reading the author’s quotes from the associated hype article I’m 100% sure it’s rubbish. It’s also really funny that they didn’t manage to catch the COVID-19 research hype train so they’ve pivoted to the simulation hypothesis.