@VoterFrog - K-Money's Lemmy

VoterFrog@lemmy.world · 2 days ago

ITT: A bunch of people who have never heard of information theory suddenly have very strong feelings about it.

VoterFrog@lemmy.world · 6 days ago

Models are not improving? Since when? Last week? Newer models have been scoring higher and higher in both objective and subjective blind tests consistently. This sounds like the kind of delusional anti-AI shit that the OP was talking about. I mean, holy shit, to try to pass off “models aren’t improving” with a straight face.

VoterFrog@lemmy.world · 6 days ago

I’m guessing that 6-20 minutes is like the actual time spent driving screws or drilling holes. Each one takes maybe a few seconds. 6-20 minutes in that case translates to hundreds of screws driven, even on the low end. So not nearly as worthless as the time makes it sound.

VoterFrog@lemmy.world · 17 days ago

If we’re in a simulation, it’s probably a massive universe-spanning one. We’re just a blip, both within the scale of the universe and within its history. In that case, we’re not important enough for a simulation creator to even care to adjust our capabilities at all. They’re not watching us. We’re not the point of the simulation.

VoterFrog@lemmy.world · 19 days ago

Everyone’s talking about encyclopedias but they weren’t always that useful either. They can only fit so much information in those books so some topics would only get like 3 sentences dedicated to them. So yeah, if you were writing a research paper for school you’d spend lots of time at the library trying to find books that had another smidge of information you needed.

If you were lucky, you’d find a really good book that was very relevant to your topic and lean heavily on that. Otherwise, you’d wind up with like a few sentences each from a dozen books that you have to tie together somehow. Wasn’t fun.

VoterFrog@lemmy.world · 21 days ago

It’s not the free fall that I’m worried about. It’s what comes after.

VoterFrog@lemmy.world · 1 month ago

It can’t be expressed in any integer-based notation without an infinite number of digits. It only terminates in certain bases that are themselves irrational. It’s infinity either way.

VoterFrog@lemmy.world · edit-2 1 month ago

The number which famously has an infinite number of digits? I thought we were arguing against the real-ness of infinity.

Also note: the method I was describing is one of the ways in which pi can be calculated.

VoterFrog@lemmy.world · 1 month ago

It destroys meaningful operations it comes into contact with, and requires invisible and growing workarounds to maintain (e.g. “countably” infinite vs “uncountably” infinite) which smells of fantasy, philosophically speaking.

This isn’t always true. Convergent series come to mind, where an infinite summation resolves to a finite number.
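As a quick illustration, take the geometric series 1/2 + 1/4 + 1/8 + … (my choice of example, nothing special about it). A minimal sketch with exact fractions shows the partial sums closing in on 1, with the gap after n terms being exactly 1/2^n:

```python
from fractions import Fraction

def partial_sum(n_terms):
    """Exact value of 1/2 + 1/4 + ... + 1/2**n_terms."""
    return sum(Fraction(1, 2**k) for k in range(1, n_terms + 1))

for n in (1, 2, 5, 10, 20):
    s = partial_sum(n)
    # The gap to 1 halves with every extra term and never goes negative.
    print(f"{n:2d} terms: sum = {s}, gap to 1 = {1 - s}")
```

Every finite partial sum falls short of 1, but the infinite sum is exactly 1.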

VoterFrog@lemmy.world · edit-2 1 month ago

It’s quite useful, though, to understand a curve or arc as having infinitely many edges in order to calculate its area. The area of a triangle is easy to calculate. Adding a point in the middle of the arc splits the region into two triangles, whose areas are just as easy to calculate and add up to a closer approximation. And so on: splitting the arc into more and more triangles, with more and more points along the arc, makes the area calculable to an arbitrary precision.
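Here’s a rough sketch of that idea for a unit circle. The specifics are my assumptions: start from an inscribed hexagon, put a new point in the middle of every arc to double the number of sides, and add up the triangles that meet at the center. The function name `polygon_areas` is just for illustration.

```python
import math

def polygon_areas(doublings):
    """Yield (sides, area) for regular polygons inscribed in a unit circle."""
    n, side = 6, 1.0  # inscribed hexagon: each side equals the radius
    for _ in range(doublings):
        # Adding a midpoint on every arc gives a 2n-gon. Its 2n triangles
        # (apex at the center) have a combined area of n * side / 2.
        yield 2 * n, n * side / 2
        # Side length of the 2n-gon, derived from the n-gon's side.
        side = side / math.sqrt(2 + math.sqrt(4 - side * side))
        n *= 2

for sides, area in polygon_areas(10):
    print(f"{sides:5d} sides: area = {area:.10f}")
```

The areas climb toward pi without pi ever being used in the calculation, only square roots and triangle areas.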

VoterFrog@lemmy.world · 1 month ago

Is enshittification the scummiest thing you can think of? While other multinationals are paying for goon squads that kill people in other countries? While banks reorder daily transactions from largest to smallest so they can charge more overdraft fees, literally stealing from poor people? Even if enshittification is literally your biggest problem, you’d have to be living under a rock to think Google’s products are the most enshittified of all the garbage out there. You’ve never heard of anything from Meta? Amazon? Netflix? Microsoft?
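To make the reordering trick concrete, here’s a toy calculation. The starting balance, the four charges, and the rule of one fee per transaction that posts while the balance is negative are all made-up assumptions for illustration, not any particular bank’s actual policy.

```python
def overdraft_count(balance, transactions):
    """Count transactions that post while the running balance is negative."""
    count = 0
    for amount in transactions:
        balance -= amount
        if balance < 0:
            count += 1
    return count

# Hypothetical day: $100 in the account, charges in the order they happened.
charges = [5, 10, 15, 120]

in_order = overdraft_count(100, charges)
largest_first = overdraft_count(100, sorted(charges, reverse=True))
print(f"chronological: {in_order} fee(s), largest first: {largest_first} fee(s)")
```

Same charges, same total, but posting the big one first drags the balance negative early, so every smaller charge after it triggers a fee too.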

VoterFrog@lemmy.world · 1 month ago

I don’t know, man. There’s a lot of shittier business practices out there than paying to be the default search engine - which is laughably easy to change on any browser. Like marketplaces and services that pay to be exclusive sources of content and then use the fact that they’re the only source for most content to force extortionate deals on content creators and enshittify every aspect of the end user experience. Just to name one.

VoterFrog@lemmy.world · edit-2 2 months ago

Because even if it winds up being a bad study, it still evokes a deeper, more important “truth.”

I’m being sarcastic but that’s actually what’s going on here.

VoterFrog@lemmy.world · 3 months ago

This is not the same thing at all. Trump instituted a zero tolerance policy, separating any family caught crossing illegally with the stated intent to dissuade families from making the trip.

Normally (including under Biden) the government separates children from suspected human traffickers or members of gangs that engage in trafficking. This is not to deter families. It’s to protect children - sending a child back to Mexico with a human trafficker is an abhorrent thing to do.

Stop carrying water for Trump.

VoterFrog@lemmy.world · 4 months ago

No mention of Gemini in their blog post on SGE. And their AI principles doc says:

We acknowledge that large language models (LLMs) like those that power generative AI in Search have the potential to generate responses that seem to reflect opinions or emotions, since they have been trained on language that people use to reflect the human experience. We intentionally trained the models that power SGE to refrain from reflecting a persona. It is not designed to respond in the first person, for example, and we fine-tuned the model to provide objective, neutral responses that are corroborated with web results.

So a custom model.

VoterFrog@lemmy.world · edit-2 4 months ago

When you use (read, view, listen to…) copyrighted material you’re subject to the licensing rules, no matter if it’s free (as in beer) or not.

You’ve got that backwards. Copyright protects the owner’s right to distribution. Reading, viewing, listening to a work is never copyright infringement. Which is to say that making it publicly available is the owner exercising their rights.

This means that quoting more than what’s considered fair use is a violation of the license, for instance. In practice a human would not be able to quote exactly a 1000 words document just on the first read but “AI” can, thus infringing one of the licensing clauses.

Only in very specific circumstances, with some particular coaxing, can you get an AI to do this with certain works that are widely quoted throughout its training data. There may be some very small scale copyright violations that occur here but it’s largely a technical hurdle that will be overcome before long (i.e. wholesale regurgitation isn’t an actual goal of AI technology).

Some licensing on copyrighted material is also explicitly forbidding to use the full content by automated systems (once they were web crawlers for search engines)

Again, copyright doesn’t govern how you’re allowed to view a work. robots.txt is not a legally enforceable license. At best, the website owner may be able to restrict access via computer access abuse laws, but not copyright. And it would be completely irrelevant to the question of whether or not AI can train on non-internet data sets like books, movies, etc.

VoterFrog@lemmy.world · 4 months ago

It wasn’t Gemini, but the AI-generated suggestions added to the top of Google search. But that AI was specifically trained to regurgitate and reference directly from websites, in an effort to minimize the number of hallucinated answers.

VoterFrog@lemmy.world · 4 months ago

Point is that accessing a website with an adblocker has never been considered a copyright violation.

VoterFrog@lemmy.world · edit-2 4 months ago

a much stronger one would be to simply note all of the works with a Creative Commons “No Derivatives” license in the training data, since it is hard to argue that the model checkpoint isn’t derived from the training data.

Not really. First of all, Creative Commons strictly loosens the copyright restrictions on a work. The strongest license is actually no explicit license, i.e. “All Rights Reserved.” No Derivatives is already included under full, default copyright.

Second, derivative has a pretty strict legal definition. It’s not enough to say that the derived work was created using a protected work, or even that the derived work couldn’t exist without the protected work. Some examples: create a word cloud of your favorite book, analyze the tone of news articles to help you trade stocks, produce an image containing the most prominent color in every frame of a movie, or create a search index of the words found on all websites on the internet. All of that is absolutely allowed under even the strictest of copyright protections.

Statistical analysis of copyrighted materials, as in training AI, easily clears that same bar.
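For a sense of what the word cloud example boils down to mechanically, here’s a minimal sketch. The sample sentence and the `word_frequencies` helper are made up for illustration, and this says nothing about how any actual AI training pipeline works.

```python
import re
from collections import Counter

def word_frequencies(text, top=3):
    """Return the most common words in text along with their counts."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top)

sample = "the cat sat on the mat and the dog sat too"
print(word_frequencies(sample))
```

The output is a handful of words and counts. The original sentence can’t be rebuilt from it.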

VoterFrog@lemmy.world · 4 months ago

We’re not just doing this for the money.

We’re doing it for a shitload of money!