
Entropy is one of those words that feel like they should come with a user manual. Physicists use it to talk about heat and the arrow of time. Engineers use it to talk about bits and compression. Then you discover that the founders of information theory and cybernetics—Claude Shannon and Norbert Wiener—were close enough in mathematics to share the same formula, yet far enough apart in intuition to argue about the sign.
In October 1948, after reading an advance copy of Wiener’s book, Shannon wrote to Wiener:
you call information “negative entropy,” I use the “regular entropy formula”… but I suspect it’s just “complementary views.”
Wiener replied:
yes—“purely formal.”
The minus sign wasn’t the point. The explanatory target was. Shannon was measuring information. Wiener was trying to measure something else—something we already have a word for: knowledge.
Two questions, one equation
Shannon was solving an engineering problem:
- Given a source that produces messages with known probabilities, how many bits per symbol do you need to represent them?
- Given a noisy channel, what rate can you reliably transmit?
That’s why Shannon says the “semantic aspects” of communication are “irrelevant to the engineering problem.”[1] His information is deliberately about choice and uncertainty, not meaning.
Wiener was asking something different:
- How do organisms and machines stay organized in a noisy world?
- How does feedback fight the drift toward disorder?
That’s why Wiener calls information “negative entropy”—a force that maintains order. But “order” is vague. What Wiener actually cared about was: which signals help a system survive?
Not all signals do. Most don’t. The ones that do are the ones that update your model of the world in ways that let you act effectively.
That’s not information. That’s knowledge.
The math they shared
Let an outcome $x$ have probability $p(x)$. The “surprise” of seeing it is:

$$s(x) = \log_2 \frac{1}{p(x)} = -\log_2 p(x)$$

Rare events carry more surprise.

Entropy is the average surprise:

$$H = -\sum_x p(x) \log_2 p(x)$$
Shannon uses this to prove coding theorems. Wiener uses it to connect probability to thermodynamics. Same object, different purposes.
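If you want to see the two definitions as code rather than symbols, here is a minimal Python sketch; the names surprise and entropy are just labels for this example, not anything from either paper.

```python
import math

def surprise(p: float) -> float:
    """Self-information of an outcome with probability p, in bits."""
    return -math.log2(p)

def entropy(probs: list[float]) -> float:
    """Shannon entropy: the average surprise over a whole distribution, in bits."""
    return sum(p * surprise(p) for p in probs if p > 0)

print(surprise(0.5))        # 1.0 bit: a fair-coin outcome
print(surprise(0.01))       # ~6.64 bits: rare events carry more surprise
print(entropy([0.5, 0.5]))  # 1.0 bit: maximum uncertainty over two outcomes
```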
What Shannon’s entropy actually measures
Shannon’s framing is almost aggressively clean: a message is “one selected from a set of possible messages.” That’s it. No meaning required.
Higher entropy means:
- The source is harder to predict
- You need more bits to encode a typical message
- The message space is “wider”
Here’s the counterintuitive part: in Shannon’s world, a maximally random source is maximally informative. Pure noise requires the most bits to specify.
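You can watch this with any off-the-shelf compressor. A rough illustration in Python (zlib is not an exact entropy meter, but the compressed sizes tell the story):

```python
import os
import zlib

random_bytes = os.urandom(10_000)  # a maximally random source
repetitive_bytes = b"ab" * 5_000   # a highly predictable source of the same length

# Random bytes barely compress; predictable bytes collapse to almost nothing.
print(len(zlib.compress(random_bytes)))      # roughly 10,000 bytes, no savings
print(len(zlib.compress(repetitive_bytes)))  # a few dozen bytes
```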
This feels backward because everyday English uses “information” to mean “something useful.” Shannon explicitly doesn’t. Warren Weaver called this the “semantic trap.”[2]
What Wiener was actually after
Wiener starts from control and life. A thermostat survives by staying calibrated. An organism survives by updating its model of predators and food. Feedback systems persist by learning.
So when Wiener talks about “information as negative entropy,”[3] he’s not just talking about pattern or order. He’s talking about the kind of signal that helps a system maintain itself.
A random bitstream has high Shannon entropy. But it teaches you nothing. It doesn’t help you predict, act, or survive. By Wiener’s intuition, it’s not information at all; it’s noise.
What Wiener wanted to capture was: signals that reduce uncertainty about things that matter.
We have a word for that. It’s knowledge.
Knowledge is a subset of information
Think of it this way:
- Information (Shannon): any reduction in uncertainty about what message was sent
- Knowledge (Wiener’s real target): reduction in uncertainty that improves your model of the world
All knowledge is information. But not all information is knowledge.
A cosmic ray flipping a bit in your hard drive is information—it reduces uncertainty about the state of that bit. But it’s not knowledge. It doesn’t help you do anything.
A friend telling you it’s raining when you’re about to leave the house—that’s knowledge. It’s information that updates your beliefs in a way that enables better action.
The difference isn’t in the math. It’s in the relationship between the signal and the receiver.
A toy example: one biased coin, two stories
A coin comes up heads with probability $p$. Shannon entropy:

$$H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)$$

If $p = 0.5$, entropy is 1 bit. Maximum uncertainty.

If $p = 0.99$, entropy is tiny. The coin is boring. You can compress a long sequence into almost nothing.
Shannon: high entropy = more bits needed = more “information” per flip.
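To make the gap concrete, here is the binary entropy of each coin (a small Python check; the function name is mine):

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy of a coin that lands heads with probability p, in bits per flip."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))   # 1.0 bit per flip
print(binary_entropy(0.99))  # ~0.08 bits per flip
```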
But now imagine you’re betting on the coin. If you know the coin is biased 99% heads, you have knowledge. That knowledge lets you win. The low-entropy source is more valuable to you—not less—because its predictability is something you can exploit.
What matters isn’t how many bits the source produces. It’s whether those bits update your model in a way that helps you act.
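Here is a toy simulation of that bet, assuming an even-money call-the-flip game invented purely for illustration: the bettor who knows the bias calls heads every time and wins about 99% of flips; the bettor who doesn’t guesses at random and wins about half.

```python
import random

random.seed(0)
P_HEADS = 0.99  # the coin's true bias, known only to the informed bettor

flips = ["heads" if random.random() < P_HEADS else "tails" for _ in range(10_000)]

informed_wins = sum(outcome == "heads" for outcome in flips)
uninformed_wins = sum(outcome == random.choice(["heads", "tails"]) for outcome in flips)

print(informed_wins / len(flips))    # ~0.99: the bias is knowledge you can exploit
print(uninformed_wins / len(flips))  # ~0.50: without the model, the bits don't help
```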
That’s why Wiener’s intuition pointed toward negentropy. Organization. Structure. But the better word is knowledge—because knowledge is exactly the information that creates useful structure in a mind.
Why this distinction matters
People argue about “information” because the word is doing two jobs:
- Shannon’s job: count bits. Any reduction in uncertainty counts.
- The job Wiener wanted: count value. Only signals that improve your model count.
Once you see this, confusions dissolve:
- A string of random characters has high Shannon information but zero knowledge. You can’t learn from noise.
- A scientific theory might compress to a few kilobytes but contain immense knowledge. It predicts. It enables action. It reduces uncertainty about things that matter.
The Shannon/Wiener “disagreement” wasn’t about a minus sign.[4] It was about whether the word information should include meaning and value—or remain purely syntactic.
Shannon chose syntactic. That was the right call for engineering.
Wiener wanted semantic. But he didn’t have the vocabulary to say it cleanly.
I’m suggesting the vocabulary: what Wiener called “negative entropy” or “information-as-order” is better named knowledge.
The practical upshot
Next time someone says “information is entropy,” ask: Shannon’s information or Wiener’s?
If they mean bits—uncertainty about which message was sent—use Shannon. That’s the right tool for compression, coding, and channel capacity.
If they mean valuable signal—the kind that updates models, enables action, and fights noise—they’re talking about knowledge. And knowledge isn’t measured by bits alone. It’s measured by what the receiver can do with it.
Shannon gave us a theory of information.
Wiener was reaching for a theory of knowledge.
We’re still building that one.
Footnotes
1. Claude Shannon, “A Mathematical Theory of Communication” (1948)
2. Claude Shannon and Warren Weaver, “The Mathematical Theory of Communication” (1949)
3. Norbert Wiener, “Cybernetics” (1948) and “The Human Use of Human Beings” (1950)
4. Ronald Kline, “The Cybernetics Moment” (2015)