A Useful Viewpoint of the Word 'Intelligent'

The universe trends toward disorder. Something pushes back. We call it knowledge, and we have no theory of how it works.

This essay builds on prior work.

Two Types of Entropies (information vs. knowledge), The Thing That Fights the Dark (knowledge as physical force), and The Two Loops (the architecture by which systems keep creating new knowledge) ground what we mean by knowledge and how systems generate it. This piece asks how to measure the systems that wield it.

In 2023, the biologist Michael Hochberg published A Theory of Intelligences, a paper that offers something most attempts to define intelligence have not. Not just a definition in one sentence, but a working framework that actually generalizes across substrates.

Prior attempts mostly treated intelligence as a kind of thing. Spearman’s g-factor in 1904. Wechsler’s “global capacity” definition in 1944. Sternberg’s triarchic theory. Gardner’s multiple intelligences. Each carved the supposed substance into named portions. None of them extended cleanly to bees, immune cells, or foundation models, because each was a taxonomy of a presumed human possession.

Hochberg starts from a different question. Not “what is the thing called intelligence?” but “what does an intelligent system do?” His answer is one sentence. Intelligence is the resolution of uncertainty producing a result or goal. Whatever process reduces uncertainty toward an outcome qualifies, regardless of substrate. From that opening, the paper builds a framework with enough internal structure that you can apply it consistently to a thermostat, a chess engine, an octopus, a foundation model, or a civilization.

Three reasons this perspective is worth carrying around.

It applies to any system. A brain, an immune cell, a foundation model, a city’s traffic network, an institution. All score on the same measure, at vastly different magnitudes. The endless argument about whether something is “really” intelligent becomes a quantitative question.
It decomposes into operations you can describe. Intelligence is not a single number. It is a profile across two underlying capacities, solving and planning, scaled by the difficulty of the goal pursued. Different systems have different profiles, and the profiles are informative rather than gatekeeping.
It dissolves the AGI and superintelligence muddle. Both popular concepts assume intelligence is a substance some systems have more of. Once it is a profile across operations relative to difficulty, the “moment of arrival” or “threshold” loses its referent. There is no substance to threshold, only scores to climb on each capacity, in each domain.

The rest of this essay walks through Hochberg’s framework as the working architecture, builds out what the two capacities mean, then uses the result to clean up some popular confusion about AI.

Intelligence as uncertainty resolution

Probability distribution before and after an action. Top panel: a broad distribution across many possible outcomes. Bottom panel: the same outcome-space, with probability mass collapsed onto a narrow region after action A.

Intelligence as the resolution of uncertainty. Before the action, the system’s state is broadly uncertain. After the action, uncertainty has collapsed onto a smaller region. The size of that collapse, measured in bits of mutual information, is what the measure tracks.

Start with the central image. An intelligent system, in Hochberg’s framework, is one that takes a state of the world that could be many ways, and resolves it toward one specific way that constitutes a goal.

A thermostat resolves the uncertainty “is the room at the target temperature.” Most of the time, no. The thermostat closes a relay, the heater runs, the room warms, the uncertainty is resolved. Small uncertainty, small resolution, small intelligence. But intelligence by the measure nonetheless.

A surgeon resolves the uncertainty “where should the next cut go.” The state of the patient’s body is enormously high-dimensional. The decision space is rich. Years of anatomy, technique, and case experience have been encoded into the judgment that lands each cut on the right millimeter. Large uncertainty, large resolution, large intelligence.

A foundation model resolves the uncertainty “what is the next token in this sequence.” Trained on a vast corpus, the model has absorbed the statistical regularities of language, code, and images. Each prediction is an act of uncertainty reduction. At scale, across millions of contexts, the cumulative resolution is enormous, though uneven across domains.

The same operation is happening in each case. A system, equipped with priors and skills, takes a state that could be many ways, and produces an action or output that narrows the possibilities to one. Hochberg’s framework names that operation precisely so that the differences in magnitude become the interesting question rather than the disqualifying one.
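The before-and-after picture can be made concrete with a toy calculation. The sketch below is an illustration, not Hochberg's formal definition: it assumes a hypothetical outcome space of eight states and uses the drop in Shannon entropy between the "before" and "after" distributions as a simple stand-in for the bits resolved by an action.

```python
import math


def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical outcome space with 8 possible states (an assumption for illustration).
before = [1 / 8] * 8                    # broad: every state equally likely
after = [0.5, 0.5, 0, 0, 0, 0, 0, 0]    # narrow: mass collapsed onto two states

# Entropy reduction as a simplified proxy for the uncertainty resolved.
resolved_bits = entropy_bits(before) - entropy_bits(after)
print(resolved_bits)  # 3 bits before, 1 bit after, so 2.0 bits resolved
```

A thermostat-sized problem would start with about one bit of uncertainty. A surgeon-sized problem would start with a vastly larger outcome space, and the same subtraction would still apply.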

The two capacities, solving and planning

A goal rendered as a network of subgoals. Start node S on the left, three columns of intermediate nodes, goal node G on the right. The actual path is a solid line through one node per column. The optimal path is a dotted line through the highest-information nodes. Each intermediate node carries an information capacity Û.

Hochberg’s central image. A goal is a network of subgoals. Solving operates at each node (how much of the resolvable uncertainty does the system actually resolve there). Planning operates across the path (which sequence of nodes does the system choose).

Hochberg’s central decomposition is into two capacities a system can have.

The first is solving. Solving is the resolution of local uncertainty at a single subgoal. Each chess move is an act of solving. Opening a jar is solving. Recognizing a face is solving. Catching a thrown ball is solving. The system takes the current state, applies whatever priors and skills it has, and produces an action that reduces uncertainty about that single step.

Hochberg writes solving quality as a simple ratio across subgoals. Imagine a goal broken into $N$ steps. At each step $n$, some amount of uncertainty could be resolved (call this $\hat{U}_{n, y}$, the resolvable information at step $n$ for goal $y$) and some amount actually is resolved (call this $U_{n, y}$). Solving quality is the average across all steps of the fraction the system managed.

$$U_{y} = \frac{1}{N} \sum_{n = 1}^{N} \frac{U_{n, y}}{\hat{U}_{n, y}}$$

Read in prose. Solving quality $U_{y}$ is the system’s average score across $N$ subgoals, where each subgoal score is the fraction of the resolvable uncertainty the system actually resolved. If the system resolves all the available uncertainty at every step, $U_{y} = 1$. If it resolves nothing, $U_{y} = 0$. Real systems land somewhere in between. A thermostat that closes its relay when the room is below the setpoint resolves nearly all of the relevant local uncertainty per cycle, so its ratio sits close to 1. A junior chess player faced with a complex position resolves a smaller fraction than a grandmaster faced with the same position. Same problem, different $U_{y}$.
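As a minimal sketch of the ratio described above, the function below averages the per-step fractions. The function name and the bit values for the two players are hypothetical, chosen only to show how the same position yields different scores.

```python
def solving_quality(resolved, resolvable):
    """Average fraction of resolvable uncertainty actually resolved per subgoal."""
    if len(resolved) != len(resolvable) or not resolved:
        raise ValueError("need two equal-length, non-empty lists")
    fractions = [u / u_hat for u, u_hat in zip(resolved, resolvable)]
    return sum(fractions) / len(fractions)


# Hypothetical values in bits for one three-step position (illustrative only).
resolvable = [4.0, 2.0, 8.0]     # resolvable information at each step
grandmaster = [4.0, 2.0, 6.0]    # what the grandmaster actually resolves
junior = [2.0, 1.0, 2.0]         # what the junior player actually resolves

u_grandmaster = solving_quality(grandmaster, resolvable)  # (1 + 1 + 0.75) / 3
u_junior = solving_quality(junior, resolvable)            # (0.5 + 0.5 + 0.25) / 3
print(round(u_grandmaster, 3), round(u_junior, 3))
```

The resolvable amounts are identical for both players. Only the resolved amounts differ, which is what the measure is meant to isolate.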

The second is planning. Planning is the sequencing of subgoals optimally toward an ultimate goal. A 30-move chess sequence to checkmate is planning. Designing a research program is planning. Coordinating a moon launch is planning. Planning is about choosing the right path through a graph of possible steps, not about making the moves along the path.

Hochberg writes planning quality as a similar ratio, comparing the actual path the system chose against the optimal one. The actual path has $N$ nodes and carries information $\hat{U}_{n, y}$ at each. The optimal path has $G$ nodes and carries information $\tilde{U}_{g, y}$ at each. Planning quality is the average information per step on the actual path, divided by the average information per step on the optimal path.

$$A_{y}^{G} = \frac{\sum_{n = 1}^{N} \hat{U}_{n, y} / N}{\sum_{g = 1}^{G} \tilde{U}_{g, y} / G}$$

Read in prose. Planning quality $A_{y}^{G}$ approaches 1 when the system’s chosen path matches the optimal one (every step is as informative as possible). It drops when the path takes detours, or when the system selects steps that carry less information than the best available alternatives. A chess engine that finds a near-optimal 30-move mating sequence has high $A_{y}^{G}$. An engine that wanders into wasted moves before stumbling onto checkmate has low $A_{y}^{G}$, even if it eventually wins.
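The planning ratio can be sketched the same way. The function name and the per-node values below are hypothetical. The only thing taken from the text is the structure: mean information per step on the actual path, divided by mean information per step on the optimal path.

```python
def planning_quality(actual_path, optimal_path):
    """Mean information per step on the actual path over that of the optimal path."""
    if not actual_path or not optimal_path:
        raise ValueError("both paths need at least one node")
    mean_actual = sum(actual_path) / len(actual_path)
    mean_optimal = sum(optimal_path) / len(optimal_path)
    return mean_actual / mean_optimal


# Hypothetical information values in bits at each node (illustrative only).
optimal = [8.0, 6.0, 10.0]              # optimal path, G = 3 nodes
direct = [8.0, 6.0, 10.0]               # actual path that matches the optimal one
wandering = [2.0, 4.0, 2.0, 4.0, 8.0]   # actual path with detours, N = 5 nodes

a_direct = planning_quality(direct, optimal)        # 8 / 8
a_wandering = planning_quality(wandering, optimal)  # 4 / 8
print(a_direct, a_wandering)
```

The wandering path still reaches the goal, but each of its steps carries less information on average, so its score falls to half that of the direct path.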

Hochberg then combines solving and planning into a single intelligence measure for system $x$ addressing goal $y$.

$$I_{x, y} = U_{x, y}^{R} \left( \alpha + \beta A_{x, y}^{G} \right)$$

The superscript $R$ on $U$ marks the relevance-weighted version of solving (each subgoal weighted by how much it contributes to the ultimate goal). $\alpha$ and $\beta$ are weights that sum to 1, encoding how much planning matters for the goal class in question. When $\beta = 0$, intelligence is pure solving (the system is reactive, like a thermostat). When $\beta \to 1$, planning becomes essential to the score (the system is grappling with multi-step goals where path quality is everything).

The shape of the equation encodes a structural claim. Solving is always necessary, which is why it sits as the multiplicative root. Planning is an amplifier on solving, with a weight that depends on how complex the goal happens to be. A system can be highly intelligent on simple goals through solving alone. On complex goals, the same solving quality combined with poor planning gives a much lower intelligence score, because the multiplier shrinks.
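To see the multiplier shrink, here is a small sketch of the combined measure. It takes the relevance-weighted solving score as a given input rather than computing the weighting, and it sets the first weight to one minus the second, since the two weights sum to 1. The function name and the scores of 0.8 and 0.5 are hypothetical.

```python
def intelligence(u_r, a_g, beta):
    """Combined measure I = U^R * (alpha + beta * A^G), with alpha = 1 - beta."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    alpha = 1.0 - beta
    return u_r * (alpha + beta * a_g)


# Hypothetical scores: strong solving, mediocre planning (illustrative only).
u_r, a_g = 0.8, 0.5

simple_goal = intelligence(u_r, a_g, beta=0.0)   # planning ignored: score equals u_r
mixed_goal = intelligence(u_r, a_g, beta=0.5)    # planning counts for half the multiplier
complex_goal = intelligence(u_r, a_g, beta=1.0)  # planning decides the multiplier
print(simple_goal, round(mixed_goal, 3), complex_goal)
```

With solving held fixed at 0.8, the score falls from 0.8 to about 0.6 to 0.4 as the planning weight rises, which is the structural claim in numeric form.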

The two capacities can come apart. A thermostat does pure solving, one bit of uncertainty per cycle, with no planning. A chess engine running minimax does both, solving at each node of its search tree and planning the sequence through them. A human surgeon does massive solving (each cut) embedded in massive planning (the operative sequence). A pure planner without a solver is useless because the path cannot be walked. A pure solver without a plan can be highly effective on local problems but cannot string them into long arcs.

This is why Hochberg’s framework generalizes cleanly. Different systems exhibit different solving-and-planning profiles, and the profile is what we want to talk about when we use the word “intelligent.” The thermostat scores high on solving for its tiny problem and zero on planning. The civilization scores extraordinarily high on both, distributed across billions of brains and many generations. The foundation model scores high on solving for retrieval and pattern completion, lower on planning for tasks that require long-horizon goal pursuit.

Hochberg also folds in a familiar pair from the cognitive-psychology literature, crystallized intelligence (stored knowledge a system can draw on) and fluid intelligence (reasoning a system can do on the fly). These are not new capacities. They are the resources solving and planning draw from. Crystallized resources let solving and planning happen faster on familiar problems. Fluid resources let them happen at all on novel ones. A system’s intelligence profile is, in the end, a description of how its solving and planning compose its crystallized and fluid resources to address whatever goal is in front of it.

Difficulty and the intelligence niche

Hochberg adds one more move that turns out to matter for the whole framework.

Performance on a goal cannot be assessed in absolute terms. A goal is difficult relative to the system’s capacity. A child solving 2+2 is exhibiting genuine intelligence at their level. A chess grandmaster doing the same arithmetic is exhibiting nothing in particular. The same problem, the same correct answer, vastly different intelligence reading.

In Hochberg’s framework this comes out as a ratio between the intrinsic complexity of the goal and the expected ability of the system addressing it.

$$D_{y} = \frac{C_{g, y}}{Q_{x, y}}$$

Read in prose. Difficulty $D_{y}$ of goal $y$ is the goal’s intrinsic complexity $C_{g, y}$ (the minimum information required to achieve it) divided by the system’s expected ability $Q_{x, y}$ on goals like $y$. When $D_{y} > 1$, the goal exceeds the system’s typical capacity (genuinely hard). When $D_{y} < 1$, the goal sits inside the system’s wheelhouse (easy). The 2+2 example reads cleanly here. The complexity $C$ of “2+2” is the same in both cases. The ability $Q$ of the child is small and the ability $Q$ of the grandmaster is large, so $D$ is enormous for the child and tiny for the grandmaster. Same problem, vastly different difficulty, and therefore vastly different intelligence reading once we adjust for it.
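The 2+2 comparison can be put into numbers. The values below are invented for illustration, in arbitrary units. The framework as described here fixes only the ratio, not how complexity or ability would be measured in practice.

```python
def difficulty(complexity, ability):
    """Difficulty D = C / Q: goal complexity relative to the system's expected ability."""
    if ability <= 0:
        raise ValueError("ability must be positive")
    return complexity / ability


# Hypothetical values in arbitrary units (illustrative only).
complexity_2_plus_2 = 2.0    # same goal complexity for both solvers
child_ability = 0.5          # small expected ability on arithmetic goals
grandmaster_ability = 200.0  # large expected ability on the same goals

d_child = difficulty(complexity_2_plus_2, child_ability)              # above 1: hard
d_grandmaster = difficulty(complexity_2_plus_2, grandmaster_ability)  # below 1: trivial
print(d_child, d_grandmaster)
```

The numerator never changes between the two calls. All of the difference in difficulty comes from the denominator, which is the point of the relativizing move.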

Side-by-side comparison. Child column and Grandmaster column. Each shows the same goal complexity bar C for "solving 2 + 2". The ability bars Q differ vastly. For the child, Q is short relative to C so the difficulty ratio D = C / Q is much greater than 1, labeled "genuinely hard". For the grandmaster, Q is much taller than C, so D is much less than 1, labeled "trivial".

Difficulty is a ratio, not an absolute. The same problem produces vastly different intelligence readings depending on whose capacity is being stretched.

Intelligence shows up in the way a system handles goals that stretch its capacity, not in the way it handles trivia. The difficulty ratio is the move that lets the framework span thermostats and brains without collapsing into a single ranking. Each system is rated against goals appropriate to its capacity.

The relativizing move also introduces what Hochberg calls an intelligence niche. Just as ecological organisms occupy niches defined by what they eat and what eats them, intelligent systems occupy niches defined by the goals they pursue and the capacities they bring. A bee’s intelligence niche is finding flowers, communicating their location, and producing honey. A foundation model’s intelligence niche is pattern completion across the corpus it was trained on. A human civilization’s intelligence niche is more or less the entire physical world it can reach.

This is the move that lets the framework stop being a comparison and start being a description. We are not asking whether a bee is “more intelligent” than a foundation model. We are asking how well each performs the solving and planning required by its own niche.

A spectrum of solvers and planners

Once intelligence is a profile rather than a substance, we can lay systems on a continuum without forcing them into ranks.

Intelligence as a profile across capacities, not a single score. Different systems sit at different positions, each shaped by its own intelligence niche. None of them collapse into a single ranking.

Physical systems do bare solving. A crystal forming from a supersaturated solution resolves an enormous amount of local uncertainty about molecular position. There is no goal in the agent-directed sense, but the operation, by Hochberg’s framework, qualifies as the most reactive form of solving. The substrate of physics already supports the most rudimentary form of the measure.

Biological systems develop active solving and the beginnings of planning. A bacterium navigating a chemical gradient solves local uncertainty about which direction is up the gradient. An animal hunting prey integrates many such resolutions into a sequence that approximates planning. Mammals exhibit clear hierarchical planning. Primates extend it further. The phylogenetic record traces the gradual emergence of planning out of solving.

Human cognition is the substrate where hierarchical planning becomes the dominant mode. Each generation inherits an enormous prior knowledge stock through language, writing, and institutions, then uses it to plan over horizons no individual brain could compute on its own. Civilization is a distributed planning machine running on a vast retrieval and execution substrate.

Artificial systems sit at an interesting point on this spectrum. Foundation models do massive solving across many domains, scoring extremely high on retrieval and pattern completion. Their planning is shallow in default mode and extends with scaffolding (chain-of-thought, agentic architectures) at some cost to solving quality. A pocket calculator does perfect solving on arithmetic and nothing else. A chess engine plans deeply within a tiny goal space. Each artificial system has its own profile, none of which maps onto a single “intelligence level.”

The continuum is the point. The category is the artifact.

Why this dissolves the AGI and superintelligence muddle

The trouble with the categorical view of intelligence is that it produces two concepts that should not exist.

The first is artificial general intelligence, usually shortened to AGI. AGI is supposed to name the moment a system gains some general-purpose form of the substance, the g-factor scaled to a machine. Once intelligence is a solving-and-planning profile across niches, the concept has nothing to refer to. There is no single quantity any system possesses some amount of. A system that solves and plans well across many different niches is genuinely impressive and useful. Calling it “generally intelligent” smuggles back the broken assumption that the impressive part is a special category, when it is just high scores on the underlying capacities across many goal-classes.

The second is superintelligence. Superintelligence imagines a system that has more of the substance than any human, by enough to be qualitatively different. The framing makes sense if intelligence is a substance you can have more of. It makes very little sense if intelligence is a profile of solving-and-planning performance across niches. A foundation model already scores far higher than any individual human on certain solving tasks. A pocket calculator already scores higher than any human on arithmetic. A chess engine already exceeds every grandmaster on chess. None of these gets called superintelligent in the way the term usually implies. They have very high scores on a narrow set of operations within a narrow set of niches, and ordinary or poor scores elsewhere. A future system that scores higher on more operations across more niches will be more broadly capable, not categorically different.

This is not pessimism about AI. The systems being built right now are the most powerful knowledge tools humans have ever made. They are reshaping how people work, how research gets done, and how software and art get made. The exponential growth in productivity and value that follows from deploying them is real and visible, and there is no reason to think it slows down soon. Arvind Narayanan and Sayash Kapoor have argued in AI as Normal Technology that AI’s effects propagate through the same channels every other transformative technology has used, namely adoption, infrastructure, regulation, and recombination with existing tools. Treating AI as a unique kind of being, savior or threat, distorts the conversation. Treating it as a tool with high solving-and-planning scores on certain niches clarifies it.

The projections that depend on a moment of AGI arrival or a superintelligence threshold are groundless because the substance the projections are about does not exist. The progress is real, the value creation is real, and the curves keep climbing. The category jump is a category error.

Proxies, and the bridge to the next series

Hochberg makes one final move that turns out to matter for everything that follows in this work.

Solving and planning do not have to be performed by the system itself. They can be performed by proxies the system uses. Hochberg explicitly argues that environment, technology, society, and collectives are essential to a general theory of intelligence. A human with a calculator solves arithmetic better than a human without one. A scientist with a peer-review network plans research better than a scientist alone. A civilization with libraries, search engines, and laboratories scales its solving and planning by orders of magnitude beyond what any individual could.

The proxies are themselves embodied knowledge. They take cognitive operations that would otherwise have to happen inside a brain and put them in the world, where they can be inspected, improved, and inherited. The capacity of a system to find good proxies and integrate them is itself part of the intelligence profile, and arguably the part that has done the most to lift human civilization above its biological baseline.

Hochberg’s full architecture. A single Controller is embedded in four layers of proxies (environment, transmission, evolution) that augment its solving and planning. The next series traces what happens at each of these layers. (Hochberg 2024, Fig 9)

This is the natural pivot to the next series, Evolutionary Tools, which examines what those proxies actually are and how they accumulate. The opening piece, Tools that Encode, Retrieve and Activate, introduces the cognitive-science framework for understanding what tools do (encoding of experience, retrieval in context, activation against the world). The middle piece, The Stack, traces the seven-layer architecture through which civilizations absorb new forms of intelligence into value. The closing piece, Let There Be Light, examines the new kind of proxy that has emerged in the past few years, one rich enough to do parts of the solving and planning itself.

What we have called “intelligence” throughout this piece is the measure. What the next series traces is the artifacts that score on it.

This is the fourth piece in the Infinite Knowledge series, following Two Types of Entropies, The Thing That Fights the Dark, and The Two Loops. For the proxies the framework points toward, see the Evolutionary Tools series, opening with Tools that Encode, Retrieve and Activate. The framework here is Michael Hochberg’s Theory of Intelligences. Corroborating measurement-based work includes François Chollet’s On the Measure of Intelligence (intelligence as a ratio of skill to experience). On the broader framing of AI as a normal transformative technology rather than a unique kind of being, see Arvind Narayanan and Sayash Kapoor’s AI as Normal Technology.
