At a Senate hearing on AI’s impact on journalism, lawmakers backed media industry calls to make OpenAI and other tech companies pay to license news articles and other data used to train algorithms.

    9 months ago

    Thanks for the link to Common Crawl; I didn’t know about that project but it looks interesting.

    That’s also an interesting point about heavily curated data sets. Would something like that be able to overcome some of the bias in current models? For example, if you were training a facial recognition model, you could use a curated, open-source dataset with representative samples of all races and genders to try to reduce racial bias. Anyone training a facial recognition model for any purpose would then have a training set that can be peer reviewed for accuracy.