We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about that individual, we implement a scalable attack pipeline that uses LLMs to: (1) extract identity-relevant features, (2) search for candidate matches via semantic embeddings, and (3) reason over top candidates to verify matches and reduce false positives. Compared to classical deanonymization work (e.g., on the Netflix prize) that required structured data, our approach works directly on raw user content across arbitrary platforms. We construct three datasets with known ground-truth data to evaluate our attacks. The first links Hacker News to LinkedIn profiles, using cross-platform references that appear in the profiles. Our second dataset matches users across Reddit movie discussion communities; and the third splits a single user's Reddit history in time to create two pseudonymous profiles to be matched. In each setting, LLM-based methods substantially outperform classical baselines, achieving up to 68% recall at 90% precision compared to near 0% for the best non-LLM method. Our results show that the practical obscurity protecting pseudonymous users online no longer holds and that threat models for online privacy need to be reconsidered.
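A rough sketch of the three-stage pipeline the abstract describes, with toy stand-ins throughout: bag-of-words cosine similarity in place of real semantic embeddings, and a similarity threshold in place of LLM-based verification. All function names and thresholds here are illustrative, not from the paper:

```python
from collections import Counter
import math

def extract_features(text):
    # Stage 1 (stand-in): the paper uses an LLM to pull identity-relevant
    # features from raw text; here we just lowercase and tokenize.
    return [w.strip(".,!?").lower() for w in text.split()]

def embed(tokens):
    # Stage 2 (stand-in): the paper uses semantic embeddings; a
    # bag-of-words vector stands in so the sketch runs without a model.
    return Counter(tokens)

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def match(db_a, db_b, top_k=3, threshold=0.5):
    # Stage 3 (stand-in): the paper has an LLM reason over the top-k
    # candidates to verify matches and cut false positives; a similarity
    # threshold stands in for that verification step here.
    embeds_b = {uid: embed(extract_features(txt)) for uid, txt in db_b.items()}
    matches = {}
    for uid_a, txt in db_a.items():
        va = embed(extract_features(txt))
        ranked = sorted(embeds_b, key=lambda u: cosine(va, embeds_b[u]),
                        reverse=True)[:top_k]
        if ranked and cosine(va, embeds_b[ranked[0]]) >= threshold:
            matches[uid_a] = ranked[0]
    return matches
```

The threshold is what trades recall for precision; the paper's headline number (68% recall at 90% precision) corresponds to tuning the verification stage toward few false positives.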
How is it possible to validate the results?
The paper uses several different datasets and explains how they were built, but in every case the ground-truth link was already known before testing. I think the third one is probably the most relevant for actual attacks: they split single users' accounts, leaving a one-year gap in the post history to simulate an abandoned account etc., and added some decoy profiles that didn't have a match.
If you mean running this yourself, you can't: they didn't publish prompts or anything, just an overview of their pipeline. Sorry, at first I thought you were asking how they validated that the users were the same person.
Oh I see, they stripped the usernames and matched the comments. I thought they were claiming to have matched usernames to legal identities.
They did that too, with Hacker News and LinkedIn accounts, as well as some Anthropic interviewees. I'm less sure how impressive that is, because those accounts were linked by their owners. Someone who publicly links their profiles obviously doesn't care about opsec, so they're probably less careful than they otherwise would be. The paper isn't a super hard read if you're interested. Guess we'll all have to see how well this works in practice.