The Irony of 'You Wouldn't Download a Car' Making a Comeback in AI Debates

FatCat@lemmy.world · 4 months ago

LarmyOfLone@lemm.ee · edit-2 4 months ago

Thanks for the info. But let’s say you want to train a (future) AI to spot and tag disinformation and misinformation. You’d need to use and curate actual data from social media sites and articles.

If copyright is extended to learning from and analyzing publicly available data, such an AI will only be possible by licensing that data. That data will then be monetized to maximize profit: first as a lump sum, then “per GB”, and later “per use”.

I’m sure open source AI will make do, and for many applications there is enough free data, but I can imagine a lot of cases where there won’t be. Anything that requires “commercially successful” media: articles, newspapers, screenplays, movies, books, social media posts and comments, images, photos, video clips…

We’re basically setting up a world where the intellectual wealth of our civilization is being transformed into a commodity and then will be transferred into the hands of a few rich capitalists.

And even if there is an acceptable amount of free data, if the principle is that data needs to be specifically licensed to learn and train and derive AI works from it - that makes using free data expensive too. It needs to be specifically vetted, and whoever uses it is still vulnerable to being sued over mistakes or outrageous copyright claims. Similar to patents, the uncertainty requires higher capitalization for any startup to defend against lawsuits.

mm_maybe@sh.itjust.works · 4 months ago

Yeah, I’ve struggled with that myself, since my first AI detection model was technically trained on potentially non-free data scraped from Reddit image links. The more recent fine-tune of that used only Wikimedia and SDXL outputs, but because it was seeded with the earlier base model, I ultimately decided to apply a non-commercial CC license to the checkpoint. But here’s an important distinction: that model, like many of the use cases you mention, is non-generative; you can’t coerce it into reproducing any of the original training material–it’s just a classification tool. I personally rate those models as much fairer uses of copyrighted material, though perhaps no better in terms of harm from a data dignity or bias propagation standpoint.

LarmyOfLone@lemm.ee · 4 months ago

I just want a holodeck future without having to pay by the hour to DisneComBroSonyFlixMount.

General_Effort@lemmy.world · 4 months ago

But that’s unethical!