RSS Bot@lemmy.bestiver.se to Hacker News@lemmy.bestiver.se · English · 3 months ago · cross-posted to: technology@lemmy.world
DeepSeek-v3.2: Pushing the Frontier of Open Large Language Models [pdf] (huggingface.co)
brucethemoose@lemmy.world · English · 3 months ago
Alt attention is here.
I wonder what OpenAI/Claude are using internally these days? GPT-OSS was just sliding window attention (a relatively primitive mechanism).
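For anyone unfamiliar with the term, here is a minimal sketch of what a causal sliding-window mask looks like. The function name `sliding_window_mask` and the sizes used are made up for illustration; this is not taken from GPT-OSS or DeepSeek code, just the general idea that each token only attends to itself and a fixed number of tokens before it.

```python
def sliding_window_mask(seq_len, window):
    """Build a seq_len x seq_len boolean mask (hypothetical helper).

    mask[i][j] is True when query position i may attend to key position j:
    j must not be in the future (j <= i) and must lie within the last
    `window` positions, counting i itself.
    """
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Illustrative sizes only: 6 tokens, each seeing at most 3 positions.
mask = sliding_window_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each row has at most `window` allowed positions, so the attention cost per token stays fixed no matter how long the sequence gets. The price is that anything older than the window is invisible to that layer.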