Why is ChatGPT not a true AI?

Artificial intelligence has been a dream for centuries, but it's only recently gone viral thanks to the tremendous advances in computing power and data analysis. Large language models (LLMs) like ChatGPT are essentially a very advanced form of autocomplete. Their impressive potential stems from the fact that the training data consists of the entire internet. LLMs may be one ingredient in the formula for true artificial general intelligence, but they're certainly not the whole formula, and we likely don't yet know what some of the other ingredients are.

Thanks to ChatGPT, we can all finally experience artificial intelligence. All you need is a web browser. Then you can talk directly to the most advanced AI system on the planet—a crowning achievement of 70 years of effort. And it looks like real AI—the AI we've all seen in the movies. But does this mean we've finally found the formula for true AI? Is the end of the road for AI in sight?

According to Popular Science Turkish, artificial intelligence is one of humanity's oldest dreams. It dates back at least to ancient Greece and the legend of Hephaestus, the blacksmith of the gods. Hephaestus had the power to give life to metal creatures. Since then, various interpretations of this theme have emerged in mythology and fiction. However, artificial intelligence began to seem plausible with the invention of the computer in the late 1940s.

A formula for symbolic artificial intelligence

Computers are instruction-following machines. The programs we give them are nothing more than finely detailed instructions; they are formulas that the computer faithfully follows. Your web browser, email client, and word processor are all just such incredibly detailed lists of instructions. So if "true artificial intelligence" were possible—the dream of computers as capable as humans—then it too would amount to such a formula. All we need to do to make AI a reality is find the right formula. So what would such a formula look like? And given the recent excitement surrounding ChatGPT, GPT-4, and BARD (more properly called large language models, or LLMs), have we finally found the formula for true AI?

For nearly 40 years, the fundamental concept guiding attempts at artificial intelligence involved modeling the conscious mind—the thoughts and reasoning processes that constitute our conscious existence. This approach was called symbolic AI because our thoughts and reasoning appeared to involve languages composed of symbols—letters, words, and punctuation marks. Symbolic AI involved not only trying to find formulas that captured these symbolic expressions but also replicating reasoning and decision-making by manipulating these symbols.
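
To make the idea concrete, here is a minimal sketch in Python (with made-up facts and a made-up rule, purely for illustration) of what the symbolic approach looks like: knowledge is written down as explicit symbols, and "reasoning" is nothing more than mechanical manipulation of those symbols.

    # A toy symbolic-AI reasoner: facts and rules are explicit symbols,
    # and inference is mechanical symbol manipulation (forward chaining).
    facts = {("socrates", "is", "human")}
    rules = [
        # IF ?x is human THEN ?x is mortal
        (("?x", "is", "human"), ("?x", "is", "mortal")),
    ]

    def forward_chain(facts, rules):
        changed = True
        while changed:
            changed = False
            for condition, conclusion in rules:
                for subject, verb, obj in list(facts):
                    # try to match the rule's condition against a known fact
                    if (verb, obj) == condition[1:]:
                        new_fact = (subject, conclusion[1], conclusion[2])
                        if new_fact not in facts:
                            facts.add(new_fact)
                            changed = True
        return facts

    print(forward_chain(facts, rules))
    # the result now includes ('socrates', 'is', 'mortal')

Systems like this are transparent and easy to inspect, but every fact and every rule has to be written down by hand—which is exactly where the approach ran into trouble.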

Symbolic AI achieved some successes, but it failed spectacularly on a vast array of tasks that seem trivial to humans. Even a simple task like recognizing a human face was beyond symbolic AI. This is because recognizing faces requires perception—the problem of making sense of what we see, hear, and feel. Most of us take perception largely for granted; we don't think much about it, and we certainly don't associate it with intelligence. But symbolic AI turned out to be the wrong approach for problems that require perception.

Neural networks are coming

An alternative formula for artificial intelligence involves modeling the structures we see in the brain instead of modeling the mind. After all, human brains are the only entities we know of today that are capable of generating human intelligence. If you look at a brain under a microscope, you'll see enormous numbers of nerve cells, called neurons, connected to one another in vast networks. Each neuron looks for patterns in its incoming connections. When a neuron recognizes a pattern, it sends signals to its neighbors. These neighbors, in turn, look for patterns, and when they see one, they signal their own neighbors, and so on.

We can't fully explain how, but somehow these massive networks of neurons can learn, and ultimately they produce intelligent behavior. The field of neural networks was born in the 1940s, inspired by the idea that these networks of neurons could be emulated with electrical circuits. Today's neural networks are implemented in software rather than electrical circuits, and to be frank, neural network researchers aren't actually trying to model the brain. However, the software structures they use—vast networks composed of very simple computational devices—are inspired by the neural structures we see in brains and nervous systems.
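
As a rough illustration—a minimal sketch with made-up layer sizes, not a model of any real brain or production system—here is the kind of "very simple computational device" a software neural network is built from: each artificial neuron just weights its inputs, adds them up, and passes the result through a simple threshold-like function.

    import numpy as np

    def layer(inputs, weights, biases):
        # each neuron: weighted sum of its inputs, then a simple nonlinearity
        return np.maximum(0.0, inputs @ weights + biases)  # ReLU

    rng = np.random.default_rng(0)
    x = rng.normal(size=3)                           # 3 input signals
    w1, b1 = rng.normal(size=(3, 4)), np.zeros(4)    # a layer of 4 neurons
    w2, b2 = rng.normal(size=(4, 2)), np.zeros(2)    # a layer of 2 neurons

    hidden = layer(x, w1, b1)
    output = layer(hidden, w2, b2)
    print(output)   # two numbers; training would adjust w1, b1, w2, b2

Real systems stack many such layers and learn the weights from data, but each individual unit is doing nothing more complicated than this.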

Neural networks have been studied continuously since the 1940s, falling in and out of favor at various times (notably in the late 1960s and mid-1980s), and they are often seen as competing with symbolic artificial intelligence. However, it wasn't until the last decade that neural networks began to work consistently. All the excitement we've seen in AI over the past decade is due to the rapid advances neural networks have made on a range of AI problems.

In truth, the rise of neural networks this century stems from rather mundane factors. There were scientific advances, of course, such as new neural network structures and the algorithms used to train them. But most of the core ideas behind today's neural networks were known as far back as the 1980s. What this century delivered was abundant data and abundant computing power. Training a neural network requires both, and both became available in abundance this century.

All the AI systems that have been in the news recently use neural networks. For example, AlphaGo, the famous Go program developed by the London-based AI firm DeepMind, which defeated a world-champion player in March 2016, uses two neural networks, each with 12 layers. The data to train these networks came from previous Go games played online, as well as from games the software played against itself. The AI systems that have made headlines in recent years—ChatGPT and GPT-4 from the Microsoft-backed AI firm OpenAI, as well as Google's BARD—also use neural networks. What distinguishes these latest developments is simply their sheer size. Everything about these models is mind-boggling in its scale.

Massive power, massive data

Consider OpenAI's GPT-3 system, announced in the summer of 2020. This is the technology underlying ChatGPT, and it was the LLM that signaled the breakthrough in this technology. The neural networks that make up GPT-3 are massive. When people who study neural networks describe a network's size, they refer to its number of "parameters." In this context, a "parameter" is an adjustable numerical value in the network—roughly speaking, the strength of a connection between neurons. GPT-3 has 175 billion parameters in total, while GPT-4 is reported to have around 1 trillion. By comparison, a human brain has something like 100 billion neurons in total, connected by as many as 1,000 trillion synaptic connections. So while existing LLMs are massive, they still fall short of the scale of the human brain.
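
To see where such parameter counts come from, here is a back-of-the-envelope sketch (the layer sizes are invented purely for illustration) that counts the weights and biases in a small fully connected network; GPT-scale models are the same idea repeated at an enormous scale.

    # Hypothetical layer sizes, just to show how parameters are counted.
    layer_sizes = [1000, 4000, 4000, 1000]

    params = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        params += n_in * n_out   # one weight per connection between layers
        params += n_out          # one bias per neuron
    print(f"{params:,} parameters")                       # 24,009,000

    print(f"GPT-3 is ~{175_000_000_000 / params:,.0f}x larger")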

The data used to train GPT-3 consisted of 575 gigabytes of text. You might be thinking, "That doesn't sound like much; after all, you can store it on an ordinary computer." But this isn't video, photos, or music; it's plain text. And 575 gigabytes of plain text is an unimaginably large amount—far, far more than a person could read in a lifetime. Where did they get all this text? They downloaded the World Wide Web. All of it. Crawlers followed every link on every web page, extracted the text, and then repeated the process, systematically following every link until every piece of text on the web had been collected. English Wikipedia made up only about 3% of the total training data.
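
A rough back-of-the-envelope calculation backs up that claim. Assuming roughly one byte per character, about five characters per word, and a brisk 250 words per minute (all assumptions, not figures from the article):

    bytes_of_text = 575e9              # 575 GB of plain text, ~1 byte per character
    chars_per_word = 5
    words = bytes_of_text / chars_per_word        # ~115 billion words
    reading_speed = 250                            # words per minute
    minutes = words / reading_speed
    years = minutes / (60 * 24 * 365)
    print(f"~{words/1e9:.0f} billion words, ~{years:,.0f} years of non-stop reading")

That comes out to roughly 875 years of reading around the clock, without sleep.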

But what about the computing power needed to process all that text and train these massive networks? Computer scientists use the term "floating-point operation," or "FLOP," to refer to a single arithmetic calculation: a FLOP is one addition, subtraction, multiplication, or division. Training GPT-3 required about 3 × 10^23 FLOPs. Our humble human experience doesn't equip us to grasp such large numbers. Put it this way: if you tried to train GPT-3 on a typical desktop computer built in 2023, it would have to run continuously for something like 10,000 years to execute that many FLOPs.
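
That figure is easy to sanity-check with a rough calculation, assuming a desktop machine sustaining on the order of a trillion floating-point operations per second (an assumption for illustration, not a figure from the article):

    training_flops = 3e23            # approximate compute used to train GPT-3
    desktop_flops_per_sec = 1e12     # assumed: ~1 TFLOP/s sustained on a desktop
    seconds = training_flops / desktop_flops_per_sec
    years = seconds / (60 * 60 * 24 * 365)
    print(f"~{years:,.0f} years")    # roughly 9,500 years—the same order as 10,000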

Of course, OpenAI didn't train GPT-3 on a desktop computer. They used very expensive supercomputers packed with thousands of specialized AI processors, running non-stop for months. And that amount of computation isn't cheap: the computer time required to train GPT-3 costs millions of dollars on the open market. More than anything else, this is why only a handful of organizations—big tech companies and nation-states—can afford to build systems like ChatGPT.

Under the hood of LLMs

Despite their mind-boggling scale, LLMs actually do something very simple. Imagine unlocking your smartphone and starting to text your spouse with the word "when." Your phone suggests ways to complete that text. It might suggest "you're coming home" or "we're having dinner," for example. It suggests these things because your phone predicts that they are the most likely words to follow "when." Your phone makes this prediction from all the text messages you've sent: from those messages it has learned that these are the most likely completions of "when." LLMs do the same thing, but, as we've seen, on a vastly larger scale. The training data isn't just your text messages; it's all the text in the world that exists in digital form. What does this scale buy you? Something quite remarkable—and unexpected.
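
Here is a minimal sketch of that idea—a toy bigram model trained on a few invented "messages," nothing like the scale or sophistication of a real LLM. It simply counts which word tends to follow which, then suggests the most likely next word.

    from collections import Counter, defaultdict

    # Toy "training data" standing in for your old text messages.
    messages = [
        "when are you coming home",
        "when are we having dinner",
        "when are you coming to the party",
    ]

    # Count, for each word, which words have followed it.
    following = defaultdict(Counter)
    for msg in messages:
        words = msg.split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1

    def suggest(word, k=2):
        # Return the k most likely next words seen after `word`.
        return [w for w, _ in following[word].most_common(k)]

    print(suggest("when"))    # ['are']
    print(suggest("are"))     # ['you', 'we']
    print(suggest("coming"))  # ['home', 'to']

An LLM is doing conceptually the same thing—predicting the next word given what came before—but with a neural network of billions of parameters trained on the text of the whole web, rather than a little table of counts.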

The first thing we notice when we use ChatGPT or BARD is that they're extremely good at producing very natural text. That isn't surprising; it's what they're designed to do, and it's the whole point of those 575 gigabytes of text. But what is unexpected is that LLMs also acquire other capabilities along the way, in ways we don't yet understand: capabilities that must somehow be implicit in the vast corpus of text they're trained on.

For example, we can ask ChatGPT to summarize a piece of text, and it generally does a commendable job. We can ask it to extract the key points from a text or to compare passages, and it seems to do quite well at these tasks too. While AI researchers were alarmed by the power of LLMs when GPT-3 was released in 2020, the rest of the world only caught on when ChatGPT was released in November 2022. It attracted hundreds of millions of users within a few months. AI had been a frequent topic for a decade, but the flurry in the press and on social media when ChatGPT was released was unprecedented: AI went viral.
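
In practice, asking for a summary is just a matter of phrasing the request in the prompt. A minimal sketch, assuming OpenAI's Python client (version 1 or later), an API key set in the environment, and an illustrative model name—none of which come from the article itself:

    from openai import OpenAI

    client = OpenAI()   # assumes OPENAI_API_KEY is set in the environment

    article = "..."     # the text you want summarized

    response = client.chat.completions.create(
        model="gpt-4o-mini",   # illustrative model name
        messages=[{"role": "user",
                   "content": f"Summarize the key points of this text:\n\n{article}"}],
    )
    print(response.choices[0].message.content)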

The age of artificial intelligence

At this point, I have to say something deflating. Thanks to ChatGPT, we have finally reached the age of artificial intelligence. Every day, hundreds of millions of people interact with the most sophisticated AI on the planet. Getting here took 70 years of scientific effort, countless careers, billions upon billions of dollars of investment, hundreds of thousands of scientific papers, and AI supercomputers running at top speed for months. And the artificial intelligence the world has finally arrived at is… autocomplete.

The futures of trillion-dollar companies are now at stake, and their fate hinges on… autocomplete. Exactly what your cell phone does. As an AI researcher who has worked in this field for more than 30 years, I have to say I find this deeply troubling. Frankly, it's a little embarrassing. Who could have predicted that this version of AI would be the one to ultimately command the biggest premium?

Every time we see rapid progress in AI, someone eventually declares that this is it—that we're finally on the path to true AI. Given the success of LLMs, it's no surprise that similar claims are being made now. So let's pause and reflect: if we truly succeed in AI, then machines should be able to do everything a human can do.

Consider the two main branches of human intelligence: one involves purely mental capabilities, the other physical capabilities. Mental capabilities include, for example, logical and abstract reasoning, common-sense reasoning (such as knowing that a dropped egg will break, or that I can't eat Kansas), numerical and mathematical reasoning, problem solving and planning, natural language processing, a rational mental state, a sense of agency, memory, and the ability to empathize. Physical capabilities include sensory understanding (interpreting the input from our five senses), mobility, navigation, manual dexterity and manipulation, hand-eye coordination, and proprioception.

I emphasize that this is far from an exhaustive list of human capabilities. But if we ever have true AI—AI as competent as we are—then it will certainly possess all these capabilities.

LLMs are not true AI

The most obvious thing to say is that LLMs are not a technology suited to any of these physical capabilities. LLMs don't exist in the real world at all, and the challenges posed by robotic AI are far removed from what LLMs were designed to address. Moreover, progress in robotic AI has been far slower than progress in LLMs. Perhaps surprisingly, robot capabilities like manual dexterity are still a long way from being solved, and LLMs offer no path to solutions for these challenges.

Of course, you might regard an AI system as pure software intelligence, so how do LLMs stack up against the mental capabilities listed above? Of these, it could be argued that LLMs have only really made substantial progress in natural language processing—the ability to communicate effectively in ordinary human languages. No surprise there; that's what they were designed for.

But their dazzling proficiency in human-like communication perhaps leads us to believe they're far more capable at other things than they actually are. They can do some logical reasoning and problem solving, but only at a superficial level for now. Perhaps we should be surprised that they can do anything at all beyond natural language processing. They weren't designed to do anything else, so anything else is a bonus: any additional capability must somehow be implicit in the text the system was trained on.

Image

For these reasons and more, it seems unlikely to me that LLM technology alone offers a path to "true AI." LLMs are not really in our world, and they aren't even aware of it. If you walk away from a conversation with an LLM and go on vacation for a week, it won't wonder where you are. LLMs are unaware of the passage of time—in fact, they're unaware of anything. They're computer programs that do literally nothing until you type a prompt, then simply compute a response to that prompt, and then go back to doing nothing. Their encyclopedic knowledge of the world is frozen at the point they were trained; they know nothing beyond that.

Furthermore, LLMs have never experienced anything. They're simply programs that have digested unimaginable amounts of text. LLMs may do a fantastic job of describing the feeling of being drunk, but that's because they've read countless descriptions of it. They haven't experienced it themselves, and they can't. They have no other goal than to formulate the best possible response to whatever prompt you give them.

That doesn't mean they aren't impressive (which they are) or can't be useful (which they are). I actually believe we're at a real technological turning point. But let's not confuse these genuine achievements with "real AI." LLMs may be one ingredient in the formula for true AI, but they're certainly not the whole formula, and I suspect we don't yet know what some of the other ingredients are.

Cumhuriyet
