Hi, I’m Eric and I work at a big chip company making chips and such! I do math for a job, but it’s cold hard stochastic optimization that makes people who know names like Tychonoff and Sylow weep.

My pfp is Hank Azaria in Heat, but you already knew that.

  • 5 Posts
  • 27 Comments
Joined 1 year ago
Cake day: January 22nd, 2024






  • The ARC scores don’t matter too much to me at 3k a problem. Like, the original goal of the prize had a compute limit. You can’t break that rule and then claim victory (I mean I guess you can, but like not everyone is gonna be as wowed as xitter randos; ensemble methods were already hitting 80%+, according to Francois).

    And unfortunately, with FrontierMath, the lack of transparency w.r.t. which problems were solved and how they were solved makes it frustrating as hell to me, as someone who actually would like to see a super math robot. According to the senior math advisor to the people who created the data set, iirc 40% of the solved problems were in the easiest category, 50% in the second tier, and 10% in the “hard” tier, but he said that he looked at the solutions and that they mostly looked like they were solved ‘heuristically’ instead of plopping out any ‘new’ insights.

    Again, none of this is good science, just pure shock and awe. I’ve heard rumors that OAI is hiring strong competition-style mathematicians to supervise the reinforcement learning for these types of problems, and if they are letting O3 take the test, then how the hell does that not leak the problem set? Like the whole test is compromised now, right? Since this behemoth uses enough electricity to power a city block, there’s no way they would be able to run it locally. Now OAI can literally pay their peeps to solve the rest and surprise surprise, O3++ will hit 80%.

    OTOH, with Codeforces scores and math scores this high, I can now put on my LW cap and say this model has 2 trillion IQ, so why hasn’t it exterminated me and my family yet like big Yud promised? It’s almost as if there is no little creature inside trying to take over the world or something.






  • Shared this on a tamer social media site and a friend commented:

    “That’s nonsense. The largest charities in the country are Feeding America, Good 360, St. Jude’s Children’s Research Hospital, United Way, Direct Relief, Salvation Army, Habitat for Humanity etc. etc. Now these may not satisfy the EA criteria of absolutely maximizing bang for the buck, but they are certainly mostly doing worthwhile things, as anyone counts that. Just the top 12 on this list amount to more than the total arts giving. The top arts organization on this list is #58, the Metropolitan Museum, with an income of $347M.”







  • I remember several months (a year?) ago when the news got out that gpt-3.5-turbo-papillion-grumpalumpgus could play chess at around 1600 Elo. I was skeptical that the apparent skill wasn’t just a hacked-on patch to stop folks from clowning on their models on xitter. Like if an LLM had just read the instructions of chess and started playing like a competent player, that would be genuinely impressive. But if what happened is they generated 10^12 synthetic games of chess played by stonk fish and used that to train the model- that ain’t an emergent ability, that’s just brute forcing chess. The fact that larger, open-source models that perform better on other benchmarks still flail at chess is just a glaring red flag that something funky was going on w/ gpt-3.5-turbo-instruct to drive home the “eMeRgEnCe” narrative. I’d bet decent odds that if you played with modified rules (knights move in a one-space-longer L shape, you cannot move a pawn for 2 moves after it last moved, etc.), gpt-3.5 would fuckin suck.

    Edit: the author asks “why skill go down tho” on later models. Like isn’t it obvious? At that point in time, chess skills weren’t a priority, so the trillions of synthetic games weren’t included in the training? Like this isn’t that big of a mystery…? It’s not like other NNs haven’t been trained to play chess…





  • To grasp how disastrously an apparently altruistic movement has run off course, consider that the value of organizations that provide healthy vegan food within their underserved communities are ignored as an area of funding because EA metrics can’t measure their “effectiveness.” Or how covering the costs of caring for survivors of industrial animal farming in sanctuaries is seen as a bad use of funds. Or how funding an “effective” organization’s expansion into another country encourages colonialist interventions that impose elite institutional structures and sideline community groups whose local histories and situated knowledges are invaluable guides to meaningful action.

    Nice. Kind of reminds me of a segment in Ken Burns’ Vietnam documentary where, to eradicate the Viet Cong, American military intelligence organizations became obsessed with body counts as a measure of ‘winning’ the war, so then the effect on the ground became shooting civs so we can count more bodies. The metric you use as a proxy for doing good (I’ve donated x dollars to combat homelessness while working for BlackRock :)) isn’t aligned with your desired outcome.

    Hey, wait a minute, were EAs the misaligned entity all along??

    ⢀⣀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ ⠘⣿⣿⡟⠲⢤⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ ⠀⠈⢿⡇⠀⠀⠈⠑⠦⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣠⠴⢲⣾⣿⣿⠃ ⠀⠀⠈⢿⡀⠀⠀⠀⠀⠈⠓⢤⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⡤⠖⠚⠉⠀⠀⢸⣿⡿⠃⠀ ⠀⠀⠀⠈⢧⡀⠀⠀⠀⠀⠀⠀⠙⠦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⡤⠖⠋⠁⠀⠀⠀⠀⠀⠀⣸⡟⠁⠀⠀ ⠀⠀⠀⠀⠀⠳⡄⠀⠀⠀⠀⠀⠀⠀⠈⠒⠒⠛⠉⠉⠉⠉⠉⠉⠉⠑⠋⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⣰⠏⠀⠀⠀⠀ ⠀⠀⠀⠀⠀⠀⠘⢦⡀⠀⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⡴⠃⠀⠀⠀⠀⠀ ⠀⠀⠀⠀⠀⠀⠀⠀⠙⣶⠋⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠰⣀⣀⠴⠋⠀⠀⠀⠀⠀⠀⠀ ⠀⠀⠀⠀⠀⠀⠀⠀⣰⠁⠀⠀⠀⣠⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⣤⣀⠀⠀⠀⠀⠹⣇⠀⠀⠀⠀⠀⠀⠀⠀⠀ ⠀⠀⠀⠀⠀⠀⠀⢠⠃⠀⠀⠀⢸⣀⣽⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⣧⣨⣿⠀⠀⠀⠀⠀⠸⣆⠀⠀⠀⠀⠀⠀⠀⠀ ⠀⠀⠀⠀⠀⠀⠀⡞⠀⠀⠀⠀⠘⠿⠛⠀⠀⠀⢀⣀⠀⠀⠀⠀⠀⠙⠛⠋⠀⠀⠀⠀⠀⠀⢹⡄⠀⠀⠀⠀⠀⠀⠀ ⠀⠀⠀⠀⠀⠀⢰⢃⡤⠖⠒⢦⡀⠀⠀⠀⠀⠀⠙⠛⠁⠀⠀⠀⠀⠀⠀⠀⣠⠤⠤⢤⡀⠀⠀⢧⠀⠀⠀⠀⠀⠀⠀ ⠀⠀⠀⠀⠀⠀⢸⢸⡀⠀⠀⢀⡗⠀⠀⠀⠀⢀⣠⠤⠤⢤⡀⠀⠀⠀⠀⢸⡁⠀⠀⠀⣹⠀⠀⢸⠀⠀⠀⠀⠀⠀⠀ ⠀⠀⠀⠀⠀⠀⢸⡀⠙⠒⠒⠋⠀⠀⠀⠀⠀⢺⡀⠀⠀⠀⢹⠀⠀⠀⠀⠀⠙⠲⠴⠚⠁⠀⠀⠸⡇⠀⠀⠀⠀⠀⠀ ⠀⠀⠀⠀⠀⠀⠀⢷⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⠦⠤⠴⠋⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀ ⠀⠀⠀⠀⠀⠀⠀⠀⢳⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⠀⠀⠀⠀⠀⠀ ⠀⠀⠀⠀⠀⠀⠀⠀⢸⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⠀⠀⠀⠀⠀⠀ ⠀⠀⠀⠀⠀⠀⠀⠀⠾⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠦⠤⠤⠤⠤⠤⠤⠤⠼⠇⠀⠀⠀