cross-posted from: https://lemmy.world/post/76533
One of the arguments made for Reddit’s API changes is that they are now the go to place for LLM training data (e.g. for ChatGPT).
I haven’t seen a whole lot of discussion around this and would like to hear people’s opinions. Are you concerned about your posts being used for LLM training? Do you not care? Do you prefer that your comments are available to train open source LLMs?
(I will post my personal opinion in a comment so it can be up/down voted separately)
I totally agree that Reddit’s motivation is probably not related to LLMs and the link I posted is more of an excuse than anything. However, I am curious what people think about data scraping and LLMs in general.