
The universe trends toward disorder. Something pushes back. We call it knowledge, and we have no theory of how it works.
Here are two strings of text.
String A: G7&xQ2!pLm@9rWv$KjZ3*nYf
String B:
String A requires more bits to encode. By the most rigorous mathematical definition we have — Claude Shannon’s — it contains more information. String B can be compressed to almost nothing.
And yet you could memorize String B and reshape your understanding of the universe. String A teaches you nothing at all.
If that feels like a problem, you’ve just discovered a crack in the foundation of information theory that has been hiding in plain sight since 1948. Two of the twentieth century’s greatest minds — Shannon and Norbert Wiener — spent their careers on opposite sides of it.
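Shannon’s half of that comparison is easy to check with a compressor. A minimal sketch: the two strings above are too short for a general-purpose compressor to separate cleanly, so this scales the same contrast up to a structured stream and a random stream of equal length; the sample text and lengths are invented for illustration.

```python
import random
import string
import zlib

random.seed(0)

# The two strings above are too short for a general-purpose compressor to show
# the effect, so this sketch scales the same contrast up: a highly structured
# stream versus a random stream of the same length. The sample text and sizes
# are illustrative choices, not taken from the essay.
structured = ("the universe trends toward disorder " * 300).encode("utf-8")
alphabet = string.ascii_letters + string.digits + string.punctuation
randomized = "".join(random.choice(alphabet) for _ in range(len(structured))).encode("utf-8")

for name, data in [("structured", structured), ("random", randomized)]:
    compressed = zlib.compress(data, 9)
    print(f"{name}: {len(data):>6} bytes raw -> {len(compressed):>6} bytes compressed")

# The structured stream collapses to a tiny fraction of its size; the random
# one barely shrinks. Shannon's entropy is the theoretical floor for exactly
# this kind of compression.
```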
The engineering problem
Shannon’s project was precise and bounded.[1] Given a source that emits symbols with known probabilities, how many bits per symbol do you need to encode them faithfully? Given a noisy channel, at what rate can you transmit without losing data?
His answer: entropy.

$$H = -\sum_i p_i \log_2 p_i$$
Average surprise per symbol. Higher entropy means harder to predict, more bits needed, wider message space. This is a measure of syntactic complexity — the structure of signals, with no reference to what they mean. Shannon said so explicitly: “the semantic aspects of communication are irrelevant to the engineering problem.” He was right. For compression and transmission, meaning is noise.
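In code, the formula is a one-liner. A minimal sketch; the symbol probabilities below are illustrative, not from the text:

```python
from math import log2

def shannon_entropy(probs):
    """Average surprise, in bits per symbol, of a source with the given symbol probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

# A uniform four-symbol source is maximally unpredictable: 2 bits per symbol.
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0
# A heavily skewed source is easy to predict, so it needs far fewer bits.
print(shannon_entropy([0.97, 0.01, 0.01, 0.01]))   # ~0.24
```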
The formula turned out to be formally identical to Boltzmann’s entropy in thermodynamics — not just analogous but operationally equivalent. A 2025 study showed that compressing molecular-dynamics trajectories reproduces thermodynamic entropy exactly. Shannon information and Boltzmann entropy are the same thing in different units.
The survival problem
Norbert Wiener, the father of cybernetics, started from a completely different question.[2] Not: how do I encode signals? But: how do organisms and machines stay organized in a world that trends toward disorder?
Thermostats. Nervous systems. Economies. Immune systems. All are feedback systems — they persist by sensing their environment, comparing what they sense to what they need, and adjusting. They persist, in a word, by learning.
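That loop is simple enough to write down. A minimal sketch of a thermostat-style controller, with the setpoint, gain, and room dynamics invented for the example:

```python
def thermostat_step(measured_temp, setpoint, gain=0.5):
    """One pass of a negative-feedback loop: sense, compare to the goal, adjust."""
    error = setpoint - measured_temp   # compare what is sensed to what is needed
    return max(0.0, gain * error)      # adjust: apply heat only when too cold

# Illustrative simulation: a leaky room nudged toward a 20-degree setpoint.
temp = 15.0
for _ in range(10):
    temp += thermostat_step(temp, setpoint=20.0) - 0.3  # heater input minus heat loss
    print(round(temp, 2))
# The temperature climbs and then holds just below the setpoint: order
# maintained by repeatedly sensing, comparing, and correcting.
```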
Wiener called information “negative entropy” — a force that maintains order. “Just as entropy is a measure of disorganization,” he wrote, “the information carried by a set of messages is a measure of organization.”
Shannon called this a “mathematical pun.” In October 1948, after reading an advance copy of Wiener’s Cybernetics, he wrote:
you call information “negative entropy,” I use the “regular entropy formula”… but I suspect it’s just “complementary views.”
Wiener replied:
yes, “purely formal.”
But it wasn’t purely formal. They were using the same equation to measure different things.
The coin that reveals the gap
A coin comes up heads with probability $p$. Shannon entropy:

$$H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)$$

At $p = 0.5$: entropy is 1 bit. Maximum uncertainty. Maximum Shannon information per flip.

At $p = 0.99$: entropy is ~0.08 bits. The coin is predictable. Boring. Almost no Shannon information per flip.
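A quick check of those two numbers, as a sketch:

```python
from math import log2

def binary_entropy(p):
    """Shannon entropy, in bits per flip, of a coin that lands heads with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(binary_entropy(0.5))   # 1.0   -- maximum uncertainty
print(binary_entropy(0.99))  # ~0.08 -- almost perfectly predictable
```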
Now imagine you’re betting on this coin.
If you don’t know the bias, each flip is just noise to you — high Shannon information, zero value. If you do know it lands heads 99% of the time, you have something Shannon’s formula cannot capture. You have knowledge. And that knowledge lets you win.
The low-entropy source is more valuable precisely because it’s predictable. Predictability is what lets you exploit it. Shannon’s entropy counts bits. It cannot count this.
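How much that knowledge is worth can be made concrete with a small simulation. The betting game here (even-money payoffs, one unit staked per flip) is an assumption for illustration, not something from the text:

```python
import random

random.seed(1)
P_HEADS = 0.99   # the coin's true bias
FLIPS = 100_000

def average_profit(prob_of_calling_heads):
    """Average profit per flip at even odds when heads is called with the given probability."""
    total = 0
    for _ in range(FLIPS):
        heads = random.random() < P_HEADS
        call_heads = random.random() < prob_of_calling_heads
        total += 1 if call_heads == heads else -1
    return total / FLIPS

# Without knowledge of the bias, blind guessing breaks even on average.
print(average_profit(0.5))   # ~0.0
# Knowing the bias, you call heads every time and win almost every flip.
print(average_profit(1.0))   # ~0.98
```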
What makes a signal knowledge isn’t in the signal. It’s in the relationship between the signal and the system that receives it.
Warren Weaver, who co-authored the popular exposition of Shannon’s work, identified the problem early. He called it the “semantic trap”[3]: the word “information” was doing a job in Shannon’s mathematics that had almost nothing to do with the job it does in ordinary language.
A cosmic ray flipping a bit in your hard drive: Shannon information. A friend telling you it’s raining when you’re about to walk out the door: also Shannon information — far fewer bits, in fact. But one is noise and the other changes what you do.
Shannon’s theory cannot tell them apart. It was never designed to.
The word Wiener needed
Wiener sensed the gap. His entire project — cybernetics — was about systems that distinguish useful signals from useless ones. A thermostat ignores sounds and smells; it attends only to temperature. An immune cell ignores temperature; it attends to molecular signatures. Every feedback system selects the signals that matter for its survival and discards the rest.
What Wiener wanted was a theory of signals that reduce uncertainty about things that matter — signals that update a system’s model of the world in ways that enable effective action.
He called this “negative entropy.” He called it “organization.” He reached for the concept repeatedly but never quite named it cleanly. He didn’t need a new formula. He needed a different word.
The word is knowledge.
Gold inside metal
Information (Shannon): any reduction in uncertainty about which message was sent. All signals count equally — noise and insight carry the same currency, measured in bits.
Knowledge (what Wiener was reaching for): reduction in uncertainty that improves your model of the world and enables action.
All knowledge is information. Not all information is knowledge.
This is technically a containment relationship — knowledge is a “subset” of information. But calling it a subset reverses the importance, the way calling gold a “subset” of metal reverses the importance. Gold is rarer, and it’s what you dig for.
A scientific theory might compress to a few kilobytes. Almost no Shannon information. But immense knowledge — it predicts, it enables action, it reduces uncertainty about things that matter.
A string of random characters carries maximum Shannon information and zero knowledge. You cannot learn from noise, no matter how many bits it contains.
Two theories
The Shannon/Wiener disagreement was never about a minus sign.[4] It was about the explanatory target.
Shannon asked: how do I count signals? Wiener asked: how do I count the signals that matter?
Shannon chose syntax. That was the right call for engineering — and it gave us the digital age.
Wiener chose semantics. He was right about the target. He just didn’t have the vocabulary to say it cleanly.
Here is the vocabulary: what Wiener called “negative entropy” or “information-as-order” is better named knowledge. Knowledge is the information that matters — the signal that updates models, enables action, and pushes back against disorder.
Shannon gave us a theory of information. Wiener was reaching for a theory of knowledge.
We’re still building that one. And its first questions are unavoidable: if knowledge is distinct from information, what kind of physical thing is it? How does it accumulate — and why does it sometimes disappear? Does it merely describe reality, or does it push back against it?
Footnotes
1. Claude Shannon, “A Mathematical Theory of Communication” (1948). The entropy formula measures the average surprise per symbol from a discrete source with probabilities $p_i$.
2. Norbert Wiener, “Cybernetics” (1948) and “The Human Use of Human Beings” (1950). Wiener’s concept of information as negative entropy — a measure of organization — provided the conceptual foundation for cybernetics.
3. Claude Shannon and Warren Weaver, “The Mathematical Theory of Communication” (1949). Weaver’s introductory essay popularized Shannon’s work and flagged the distinction between technical and everyday uses of “information.”
4. Ronald Kline, “The Cybernetics Moment” (2015). Kline’s history documents the Shannon-Wiener correspondence and the intellectual context of their parallel but divergent projects.