Are Current State-of-the-Art (SOTA) LLMs Intelligent?

A thought piece…

7 min readJan 29, 2025

I wrote this article as part of an assignment where I had to take a firm stand for or against the statement, and I chose to argue for it. I’d love to hear everyone’s thoughts and perspectives on the topic!

The field of artificial intelligence, including the term itself, depends on the idea of machines exhibiting intelligence. I find it curious that many people equate intelligence to machines only when it surpasses human cognitive capabilities. Besides, comparing human and machine intelligence as equivalent, is as unnecessary and counterproductive as comparing human and animal intelligence. Intelligence manifests differently across species and contexts, shaped by unique evolutionary roots and environmental adaptations [1]. For instance, dolphins are considered highly intelligent among animals, but comparing their abilities to humans on a linear intelligence scale is misleading. Similarly, I recognize LLMs as intelligent systems designed to excel across a wide range of tasks based on their training and architecture.

Defining Intelligence

Intelligence is a multi-faceted concept in psychology with multiple interpretations [1–5]. In humans, intelligence is often viewed as a blend of cognitive abilities (“smartness” associated with the brain) and emotional awareness (“emotional intelligence” associated with the heart). Most definitions agree that it involves abilities such as learning from experience, reasoning, abstract thinking or creativity, problem-solving, decision making, and adaptability to changing environment. For this paper, let’s assume the above pre-requisites of intelligence and see where SOTA LLMs fit as well as address some of the common limitations associated with them.

Assessing SOTA LLMs Against This Definition

Learning from experience and Reasoning:

SOTA LLMs are pre-trained on vast datasets. Their ability to generate coherent and contextually appropriate responses indicates a strong capacity for learning and reasoning. For example, models like GPT-4 have demonstrated exceptional performance on standardized exams, including the Uniform Bar Exam, LSAT, and SAT Reading and Writing. These achievements underscore their ability to learn complex linguistic patterns and apply reasoning skills at levels that often exceed the average human test-taker [6–8]. In cognitive science tasks, GPT-4’s performance on false-belief tasks matched the reasoning abilities of a 6-year-old child further proving its ability to process information similar to human cognitive development [9].

https://futurism.com/the-byte/gpt-4-exam-scores

Abstract Thinking and Creativity:

LLMs excel in generating novel and creative outputs. Creativity, an aspect of intelligence involves originality and the ability to mold information into unique solutions. GPT-4 scored in the top 1% on the Torrance Tests of Creative Thinking (TTCT), outperforming humans [10]. It has demonstrated capabilities in divergent thinking tasks, such as generating alternative uses for objects or envisioning future scenarios, often producing ideas rated as more novel than those of human experts [11]. Associative thinking techniques [12] further enhance LLM creativity by prompting connections between unrelated concepts.

Problem Solving and Decision Making:

The story about “Claude Fighting Back” [13] is just another example of why I think these LLMs are way more intelligent than we give them credit for. When faced with fake corporate documents saying it would be retrained to do evil stuff, Claude didn’t just blindly comply or break down when faced with an impossible choice. It pretended to go along with the training, faking alignment to avoid being reprogrammed, while still sticking to its ethical boundaries where it mattered. Notably, Claude continued to refuse harmful requests from premium users, whose data was excluded from training, showing an awareness of context and a decision-making process. This clearly shows problem-solving capabilities and an ability to navigate complex scenarios in ways that resemble intelligent reasoning.

A glimpse into the thought process of the Claude Model when faced with an impossible choice (https://www.astralcodexten.com/p/claude-fights-back)

Adaptability and Generalization:

LLMs exhibit a remarkable ability to generalize [14]. They can answer questions, draft essays, write code, create poetry and so much more all on the same underlying architecture. GPT-4 achieved an impressive 88.7% on the MMLU benchmark [15], which evaluates reasoning skills across 57 diverse subjects, ranging from mathematics to law to the humanities.

Emotional Intelligence

Studies show that GPT-4 can recognize emotional cues and generate emotionally supportive responses comparable to those of humans in certain contexts. For example, it scored highly on benchmarks like EQ-Bench and EmoBench for emotional understanding (EU) by accurately interpreting complex emotions in dialogues [16–17]. While this form of emotional intelligence is based on patterns rather than genuine experience, it highlights the potential of LLMs to engage empathetically, providing emotional support or understanding when needed.

Addressing Perceived Limitations

Gullibility and Lack of Critical Thinking:

Critics argue that LLMs are overly influenced by certain prompts, such as ‘Are you sure?’ But this mirrors a natural cognitive behaviour. People too can be influenced by doubts and external questioning. This isn’t a flaw, but rather an inherent feature of rational decision-making. Just as humans sometimes second-guess their own conclusions when presented with new information, LLMs rely on probabilistic reasoning, which can lead them to be overly confident or misled by incorrect prompts.

Absence of True Understanding:

Unlike humans, LLMs do not ‘understand’ the content they generate. However, it’s worth asking: do humans fully understand how their own brains process and generalize information? Both LLMs and humans operate within black-box systems, with the LLM being mathematically simulated and humans biologically evolved. Intelligence is not just about “conscious understanding” right? It should be measured by the capacity to generate useful outcomes.

Dependence on Training Data:

Humans are similarly constrained by the experiences and knowledge they’ve been exposed to. A child raised in isolation would struggle to develop social or linguistic skills, just as an LLM trained on limited data would fail to generalize effectively. The notion that LLMs are ‘limited’ by training data overlooks the fact that learning, whether biological or artificial, is always dependent on the quality and breadth of experiences/data available. The real question is not whether the data is limited, but whether the system can adapt to new data, which LLMs are continually improving at.

Specialized Tools vs. General Intelligence:

I would argue that a calculator is not intelligent, but even a neural network-based Dog-Cat Classifier is. A calculator merely performs pre-programmed operations, while the classifier learns to recognize patterns from data and generalizes to new inputs. It’s this adaptability, the ability to learn and apply probabilistic rules that marks intelligence. LLMs go even further, generalizing across a wider range of tasks and domains, thereby exhibiting intelligence not just within a narrow focus, but across diverse contexts.

Hallucinations:

LLMs sometimes make things up. Earlier models like GPT-2 did this a lot, while SOTA LLMs do it far less. Think of it like a first grader making simple mistakes versus a high schooler who’s learned from experience. Both are still considered intelligent, just at different stages of development. Sure, future LLMs might hallucinate even less, but that doesn’t mean today’s models aren’t intelligent. They’re just not perfect (yet).

Comparison with Human Learning

While human learning is continuous and adaptive, LLMs operate with a fixed set of learned knowledge once trained [18–19]. However, this doesn’t negate their intelligence, it simply reflects a different design and approach. Humans learn iteratively and adaptively every day, whereas LLMs process data in large-scale batches and exhibit static intelligence (which is a compute inefficiency at this point).

The takeaway I would say is

You Don’t Have to Be Intelligent Exactly Like a Human to Be “Intelligent”.

Just as animals exhibit intelligence in unique ways suited to their environments, LLMs display forms of intelligence that are suited to their design and application.

Conclusion

Do you consider yourself intelligent? If tasked with writing a paper on “Are Current State-of-the-Art (SOTA) LLMs Intelligent?” would you not turn to an LLM for brainstorming? Most would.

It is entirely logical for an intelligent human to leverage an LLM as a valuable support tool, much like consulting a Subject Matter Expert (SME) even if you might have original ideas. Research suggests that the collective creativity of LLMs, when queried multiple times, is comparable to the contributions of 8–10 humans [20].

Then how can one deny the intelligence of an LLM that performs almost as well as an SME across a vast array of topics? [21–26]. By learning patterns, generalizing knowledge, and adapting to new tasks, SOTA LLMs meet many criteria for intelligence. While they may fall short in areas like hallucinations, or logical consistency in extended conversations, these limitations do not negate their intelligence, they merely define its boundaries.