At the end of the day, whatever the source file looks like, you’re turning it into an analog signal in order to make the speakers move. So how is it at all effective?
You can even send the bits as a digital stream over Bluetooth to speakers. So being able to decode it seems like a fundamental requirement for the file to be useful at all. So why even bother? Once it's decoded, it seems like it would be trivial to copy into a different format.
I guess it’s like a CAPTCHA. It doesn’t completely solve the problem the host wants to solve, but it deters a lot of people from trying.
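To show what I mean by "trivial to copy into a different format", here's a rough sketch using only Python's standard library. It assumes you've already got the decoded audio as plain 16-bit PCM samples (which is what ends up at the DAC anyway). The function names and the sine-wave stand-in are made up for illustration; this isn't how any particular player or protection scheme works.

```python
import io
import math
import struct
import wave

def decoded_samples(n_frames=8000, rate=8000, freq=440.0):
    # Hypothetical stand-in for decoded audio: one second of a 440 Hz tone
    # as 16-bit mono PCM, at half of full scale.
    return [int(32767 * 0.5 * math.sin(2 * math.pi * freq * i / rate))
            for i in range(n_frames)]

def write_wav(samples, rate=8000):
    # Wrap the raw samples in a WAV container, entirely in memory.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 2 bytes per sample = 16-bit
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()

def read_wav(data):
    # Read the samples back out to confirm nothing was lost.
    with wave.open(io.BytesIO(data), "rb") as r:
        n = r.getnframes()
        return list(struct.unpack("<%dh" % n, r.readframes(n)))

samples = decoded_samples()
wav_bytes = write_wav(samples)
print(read_wav(wav_bytes) == samples)
```

The hard part is getting at the decoded samples in the first place. Once you have them, the re-wrapping is about a dozen lines.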