The Inventor Behind a Rush of AI Copyright Suits Is Trying to Show His Bot Is Sentient::Stephen Thaler’s series of high-profile copyright cases has made headlines worldwide. He’s done it to demonstrate his AI is capable of independent thought.
The Inventor Behind a Rush of AI Copyright Suits Is Trying to Show His Bot Is Sentient::Stephen Thaler’s series of high-profile copyright cases has made headlines worldwide. He’s done it to demonstrate his AI is capable of independent thought.
What stupid bullshit. There is nothing remotely close to an artificial general intelligence in a large language model. This person is a crackpot fool. There is no way for a LLM to have persistent memory. Everything outside of the model that pre and post processes information is where the smoke and mirrors exist. This just just databases and standard code.
The actual model is just a system of categorization and tensor math. It is complex vector math. That is it. There is nothing else going on inside the model. If you want to modify it, you need to recalculate a bunch of math as it relates to the existing vectors/tensor tables. All of this math is static. It can’t change. It can’t adapt. It can’t plan. It has some surprising features that one might not expect to be embedded in human language alone, but that is all this is. Try offline, open source, AI. Use Oobabooga, get models from Hugging Face, start with something like a Llama2 7B. This is not hard. You do not need a graphics card. There are lots of models that work great on just a CPU. You will need a good amount of RAM for running a really good model. A 7B is like talking to a teenager prone to lying, a 13B is like a 20 year old, a 30B at 8bit quantization is like an inexperienced late twenty-something. A 70B at 4 bit quantization is like a 30yo with a masters degree. A 70B at 4 bits will need around 14+ CPU logical cores, and 64GB of system memory to generate around 2 tokens a second, this is around 1-2 words per second and is about as slow as is practical.
Don’t believe anything you read in bullshit media about AI right now, and ignore the proprietary stalkerware garbage. The open source offline AI world is the future and it is yours to do as you please. Try it! It is fun.
Wow, that’s some of the most concrete, down-to-earth explanation of what everyone is calling AI. Thanks.
I’m technical, but haven’t found a good article explaining today’s AI in a way I can grasp well enough to help my non-technical friends and family. Any recommendations? Maybe something you’ve written?
It would be funny if that comment was ai generated.
I read once we shouldn’t be worried when AI starts passing Turing tests, we should worry when they start failing them again 🤣
I read a physical book about using chatGPT that I’m pretty sure was written by chatGPT.
Sidenote: you don’t need to read a book about using chatGPT.
I’ve had most success explaining LLM ‘fallibility’ to non-techies using the image gen examples. Google ‘AI hands’, and ask them if they see anything wrong. Now point out that we’re _extremely_sensitive to anything wrong with our hands, and so these are very easy for us to spot. But the AI has no concept of what a hand is, it’s just seen a _lot _ of images from different angles, sometimes fingers are hidden, sometimes intertwined etc. So it will happily generate lots more of those kinds of images, with no regard to whether they could / should actually exists.
It’s a pretty similar idea with the LLMs. It’s seen a lot of text, and can put together words in a convincing-looking way. But it has no concept of what it’s writing, and the equivalent of the ‘hands’ will be there in the text. It’s just that we can’t see them at first glance like we can with the hands.
Nice comparisons. Will add that to my explanations.
Thanks!
This one helped me a bit - https://www.understandingai.org/p/large-language-models-explained-with
Thanks!
Yann LeCun is the main person behind open source offline AI as far as putting the pieces in place and events that lead to where we are now. Maybe think of him as the Dennis Ritchie or Stallman of AI research. https://piped.video/watch?v=OgWaowYiBPM
I am not the brightest kid in the room. I’m just learning this stuff in practice and sharing some of what I have picked up thus far. I am at a wall when it comes to things like understanding rank 3 tensors or greater, and I still can’t figure out exactly how the categorization network is implemented. I think that last one has to do with Transformers and has something to do with rotation of vectors in an efficient way, but I haven’t figured it out intuitively yet. Thanks for the complement through.
Oh crap, you already done lost me in the second half there, but I’ll give the link a watch.
Thanks again!
This plus any LLM model is incapable of critical thinking. It can imitate it to the point where people might think it’s able to, but that’s just because it has seen the answers to the problems people are asking during the training process.
It’s basically a book you can talk to. A book can contain incredibly knowledge, but it’s a preserve artifact of intelligence, not intelligence.
Correct, but I haven’t seen anything suggesting that DABUS is an LLM. My understanding is that it’s basically made up of two components:
EDIT: This article is the best one I’ve found that explains how DABUS works. See also this article, which I read when first writing this comment.
Other than using machine vision and machine hearing (“acoustic processing algorithms”) to supervise the neural networks, I haven’t found any description of how the thalamobot functions. Machine vision / hearing could leverage ML but might not, and either way I’d be more interested in how it determines what to prioritize / additional algorithms to trigger rather than how it integrates with the supervised system.
As far as I can tell, probably, but not necessarily.
Ignoring Thaler’s claims, theoretically a supervisor could be used in conjunction with an LLM to “learn” by re-training or fine-tuning the model. That’s expensive and doesn’t provide a ton of value, though.
That said, a database / external process for retaining and injecting context into an LLM isn’t smoke and mirrors when it comes to persistent memory; the main difference compared to re-training is that the LLM itself doesn’t change. There are other limitations, too. But if I have an LLM that can handle an 8k token context where the first 4k is used (including during training) to inject summaries of situational context and of topics/concepts that are currently relevant, and the last 4k are used like traditional context, then that gives you a lot of what persistent memory would. Combine that with the ability for the system to retrain as needed to assimilate new knowledge bases and you’re all the way there.
That’s still not an AGI or even an attempt at one, of course.
Just talking hypothetically, I think it may be possible to actually make an AGI with an LLM base with a threaded interpreted language like Forth. If it was integrated into the model, it might be able to add network layers like a LoRA in real time or let’s say average prompt to response time. The nature of Forth makes it possible to negate issues with code syntax as a single token or two could trigger a Forth program of any complexity. I can imagine a scenario where Forth is fully integrated and able to modify the network with more than just LoRAs and embeddings, but I’m no expert; just a hobbyist. I fully expect any major breakthrough will be from white paper research, and not someone that is using hype media nonsense and grandstanding for a spotlight. It will not involve external code.
Tacking systems together with databases is not what I would call a human-brain analog or AGI. I expect a plastic network with self modifying behavior in near real time along with the ability to expand at or arbitrarily alter any layer. It would also require a self test mechanism and bookmarking system to roll back any unstable or unexpected behavior using self generated tests.
Agreed, and either of those are more than a system with persistent memory.
I think it would be wise for such a system to have a rollback mechanism, but I don’t think it’s necessary for it to qualify as a human brain analog or AGI - I don’t have the ability to roll back my brain to the way it was yesterday, for example, and neither does anyone I’ve ever heard of.
I don’t think this is realistic or necessary, either. If I want to learn a new, non-trivial skill, I have to practice it, generally over a period of days or longer. I would expect the same from an AI.
Sleeping after practicing / studying often helps to learn a concept or skill. It seems to me that this is analogous to a re-training / fine-tuning process that isn’t necessarily part of the same system.
It’s unclear to me why you say this. External, traditional code is necessary to link multiple AI systems together, like a supervisor and a chatbot model, right? (Maybe I’m missing how this is different from invoking a language from within the LLM itself - I’m not familiar with Forth, after all.) And given that human neurology is basically comprised of multiple “systems” - left brain, right brain, frontal node, our five senses, etc. - why wouldn’t we expect the same to be true for more sophisticated AIs? I personally expect there to be breakthroughs if and when an AI that is trained on multi-modal data (sight + sound + touch + smell + taste + feedback from your body + anything else of relevance) is built (e.g., by wiring up people with sensors to pull down that data), and I believe that models capable of interacting with that kind of training data would comprise multiple systems.
At minimum you currently need an external system wrapped around the LLM to emulate “thinking,” which my understanding is something ChatGPT already does (or did) to an extent. I think this is currently just a “check your work” kind of loop but a more sophisticated supervisor / AI consciousness could be much more capable.
That said, I would expect an AGI to be able to leverage databases in the course of its work, much the same way that Bing can surf the web now or ChatGPT can integrate with Wolfram — separate from its own ability to remember, learn, and evolve.
I think the fundamental difference in our perspectives is that I want to see neural expansion capabilities that are not limited by a static state and dedicated compilation. I think this is the only way to achieve a real AGI. If the neural network is static, ultimately you have a state machine with a deterministic output. It can be ultra complex for sure, but it is still deterministic. I expect an AGI to have expansion in any direction at all times according to circumstances and needs; aka adaptability beyond any preprogrammed algorithms.
Forth is very old, and from an era when most compute hardware was tailor made. It was originally created as a way to get professional astronomy observatories online much more quickly. The fundamental concept with Forth is to create the simplest looping interpreter on any given system using assembly or any supported API. The interpreter can then build on the Forth dictionary of words. Words are the fundamental building block of Forth. They can be anything from a pointer to a variable, or a function, to an entire operating system and GUI. Anything can be assigned to a word and a word can be any combination of data, types, and other words. The syntax is extremely simple. It is a stack based language that is very close to the bare metal. It is so simple and small, that there are versions of Forth that run on tiny old 8 bit AVRs and other microcontrollers.
Anyways, the idea of a threaded interpreter like Forth, could be made to compile tensor layers. The API for the network would be part of the Forth dictionary. Another key aspect to Forth is that the syntax to create new words is so simple that a word can be made that creates the required formatting. This could make it possible for a model to provide any arbitrary data for incorporation/modification and allow Forth to attempt to add it into the network in real time. It could also be used to modify specific tensor weights when a bad output is indicated by the user and a correction is provided.
If we put aside text formatting, settings, and user interface elements, the main reason a LLM needs external code for interfacing is because of the propensity for errors due to syntax complexity with languages like Python or C. No models can generate reliable complex code suitable for their own execution internally without intervention. Forth is so flexible that a dictionary could even be a tensor table of weights, like words could be the values. Forth is probably the most anti-standards, anti-syntax, language ever created.
Conceptually, the interpreter is like a compiler, command line, task scheduler, and init/process manager all built into one ultra simple system. Words are built from the registers, flags, and interrupts, up to anything of arbitrary complexity. A model does not need this low level interface with compute hardware, but this is not my point. Models are built on tensors and tokens. Forth can be made to speak these natively and in near real time as prompted internally and without compilation; a true learning machine. Most Forth implementations also have an internal bookmarking system that allows the dictionary to roll back to a known good state when encountering errors in newly created words.
A word of warning, full implementations like ANS Forth or G-Forth are intimidating at first glance. It is far better to look at something like Flash Forth for microcontrollers to see the raw power of the basic system without the giant dictionaries present in modern desktop implementations.
The key book on the concepts behind Forth and threaded interpretive languages is here: https://archive.org/details/R.G.LoeligerThreadedInterpretiveLanguagesTheirDesignAndImplementationByteBooks1981
Plus the marketing writes itself
Don’t miss DABUS!
Yup yup my guy. This is looking like just another ploy for companies and people to be able to patent and copyright everything under the fucking sun.
This is the thing, what do you do with it? I can’t imagine it being able to do something a human couldn’t do better
It is much faster than stack overflow for code snippets. The user really needs a basic skepticism about all outputs even with an excellent model, but like, a basic 70B Llama2 can generate decent Python code. When it makes an error, pasting that error into the prompt will almost always generate a fix. This only applies to short single operations type tasks, but it is super useful if you already know the basics of code like variables, types, and branching constructs. It can explain API’s and libraries too.
The real value comes from integrating databases and other AI models. I currently have a combination I can talk to with a mic and it can reply as an audio clip with a LLM generating the reply text. I’m working on integrating a database to help teach myself the computer science curriculum using free materials and a few books. Individualized education is a major application. You can also program a friend, or professional colleague, a councillor, or ask medical questions. There is a lot of effort going into getting accurate models for stuff like medical where they can provide citations. Even with sketchy information from basic models, they will still generate terms and hints that you can search in a regular search engine to find new information in many instances. This will help you escape the search engine echo chambers that are so pervasive now. Heck I even asked the 70B about meat smoker heat and timing settings and it made better suggestions than several YT examples I watched and tried. I needed an industrial adhesive a couple of weeks ago and found nothing searching google and bing, but after asking the 70B it gave me 4 of 6 valid results for products. After plugging these in to search, suddenly the search engines knew of thousands of results for what I was looking for. I honestly didn’t expect it to be as useful as it really is. Like I turn on my computer, and start the 70B first thing every day. It unloads itself from memory while idle, but I’m constantly asking it stuff. I go many days without even going online from my workstation.
Are you using ooga booga? What specs does your system have?
I do use Oobabooga a lot. I am developing my own scripts and modifying some of Oobabooga too. I also use Koboldcpp. I am on a 12gen i7 with 20 logic cores and 64GB of system memory along with a 3080Ti with 16GBV. The 70B 4 bit quantized model running with 14 layers offloaded onto the GPU generates 3 tokens a second. So it is 1.5 times faster than just on the CPU.
If I was putting together another system, I would only get something with AVX-512 instructions support in the CPU. That instruction is troublesome for CVE issues. You’ll probably need to look into this depending on your personal privacy/security threat model. The ability to run larger models is really important. You really want all the RAM. The answer to the question of how much is always yes. You are not going to get enough memory using consumer GPUs you can only offload a few layers onto a consumer grade GPU. I can’t say how well even larger models than the 70B will perform as the memory bottlenecks. I can’t even say how a 30B or larger runs at full quantization. I can’t add any more memory to my system. Running the full models, as a rule of thumb, requires double the token size in RAM. So a 30B will require around 60GB of memory to initial load. Most of these models are float-16. So running them 8-bit cuts the size in half with penalties in areas like accuracy. Running 4 bit splits the size again. There is tuning, bias, and asymmetry in the way quantization is done to preserve certain aspects like emerging phenomena in the original data. This is why a larger model with a smaller quantization may outperform a smaller model running at full quantization. For GPUs, if you are at all serious about this, you need at least 16GBV at a bare minimum. Really, we need to see a descent priced 40-80GBV consumer option. The thing is that GPU memory is directly tied to compute hardware. There isn’t the overhead of a memory management system like system memory has. This is what makes GPUs ideal and fast, but it is the biggest chunk of bleeding edge silicon in consumer hardware already, and we need it to be 4× larger and cheap. That is not going to happen any time soon. This means the most accessible path to larger models is using the system memory. While you’ll never get the parallelism of a GPU, having cpu instructions that are 512 bits wide is a big performance boost. You also need max logic cores. That is just my take.
While I agree that LLMs probably aren’t sentient, “it’s just complex vector math” is not a very convincing argument. Why couldn’t some complex math which emulates thought be sentient? Furthermore, not being able to change, adapt, or plan may not preclude sentience, as all that is required for sentience is the capability to percieve and feel things.
It doesn’t emulate thought though. At all.
What I’m saying is, we don’t know what physical or computational characteristics are required for something to be sentient.
Language is not a requirement for sentience, and these models clearly show that you can have language without having sentience.
As would any text user interface.