The New York Times is suing OpenAI and Microsoft for copyright infringement

btp@kbin.social · 11 months ago

The New York Times is suing OpenAI and Microsoft for copyright infringement

Zima@kbin.social · 11 months ago

The model has to contain the data in order to produce works.
As far as I understand, this isn’t true. Can you elaborate on why it needs to contain the data?

EvilMonkeySlayer@kbin.social · 11 months ago

It contains large parts of the data in order to create. The link I provided shows that the models do contain chunks of the original works.

Otherwise, how would it create the words, etc.?

I am amazed that we now have people, on the level of crypto coin idiocy, going on about AI models who don’t understand this.

Zima@kbin.social · 11 months ago

You would probably claim I don’t deserve my job given my level of technical illiteracy, however you think you are inferring that. Anyway, they do make reasonable efforts to design models that don’t memorize and are able to generalize. This is quite basic, even fundamental, in machine learning in general.

Previous models had semantic reasoning capacity without memorization, e.g. word2vec.

You should also realize that just because current models are memorizing despite efforts to prevent it doesn’t mean that models need to memorize. Like I said initially, they are actually designed to work without needing to memorize.
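
To make the word2vec point concrete, here is a minimal sketch of the general idea. It is a simple count-based co-occurrence model, not word2vec itself, and the toy corpus and function names are made up for illustration. After training, the model holds only aggregate neighbour counts per word, yet it still rates "cat" and "dog" as similar even though they never appear in the same sentence.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy corpus, invented for this example.
CORPUS = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
]

def train(sentences, window=1):
    """Return {word: Counter of neighbouring words within the window}."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return dict(vectors)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# The trained "model" is just per-word neighbour counts; no sentence is stored.
model = train(CORPUS)
sim_cat_dog = cosine(model["cat"], model["dog"])
sim_cat_milk = cosine(model["cat"], model["milk"])
print(sim_cat_dog > sim_cat_milk)
```

This only shows that semantic similarity can come from aggregate statistics in a model small enough to inspect by hand. It says nothing either way about what much larger models end up storing.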

EvilMonkeySlayer@kbin.social · 11 months ago

You’re contradicting yourself.

In one sentence you say it doesn’t memorize (with “reasonable effort”), then in the next you admit it does.

“Reasonable effort” is weasel wording.

Make up your mind.

Zima@kbin.social · 11 months ago

?? Are you trolling? If you design a car to combust gasoline without burning the lubricants, but you still end up burning them, it doesn’t mean that the lubricants are needed for the combustion itself. Conversely, you have not made any nuanced argument explaining why memorization is necessary. I gave you an example where we know there is no memorization and you ignored it.

“Otherwise how would it create the words” is just saying you wouldn’t know.

EvilMonkeySlayer@kbin.social · 11 months ago

So, me pointing out the flaw in your argument is trolling?

What?

If you choose to use weasel wording to try and get out of something that is your call.

Zima@kbin.social · 11 months ago

OK, I believe that you believe that. It’s OK. I have professional experience in this space, so you’re either not reading carefully or you don’t understand much about the topic.

Perhaps you might want to reconsider this in more abstract terms. The engine example you ignored could help you with that.

Do you really think that having language models that don’t memorize, and that are simple enough for us to know this for certain, is not all we need to show that language models don’t necessarily have to memorize? You keep repeating the same (illogical) argument and ignoring the simpler arguments that disprove your claim.

EvilMonkeySlayer@kbin.social · 11 months ago

So, now it’s gone from “reasonable effort” to you saying, without any doubt, that all the trained models contain no copyrighted data at all?

Come on. Make up your mind.

Zima@kbin.social · 11 months ago

You still haven’t backed up your claim. Once again, just because you don’t know how something could be done doesn’t mean it’s not possible to do it.