How a chain of unanswered questions turned a footnote in Adam Smith into one of the most powerful ideas in modern economics — and why it now haunts the builders of AI.


A service station mechanic adjusts the engine while a young woman watches — the classic information gap between expert and customer.

The mechanic knows things you don’t. That asymmetry is not incidental — it is the entire problem. (U.S. National Archives / NARA)

Somewhere in your life right now, someone is making decisions for you, and you can’t tell whether they’re doing a good job.

Your financial advisor picks funds. Your mechanic diagnoses your engine. Your employee fills out a timesheet. Your surgeon recommends an operation. Your elected representative votes on a bill. In each case, you’ve handed authority to someone who knows more than you, wants different things than you, and operates behind a curtain you cannot fully pull back.

This is the principal-agent problem — arguably the single most useful idea economics has ever produced for understanding why institutions fail, why contracts look the way they do, and why the gap between what we want and what we get never fully closes.

The story of how economists came to understand this problem is not a dry catalog of theorems. It’s an intellectual detective story spanning two and a half centuries, driven by a sequence of questions where every answer cracked open a deeper puzzle. What follows is that chain of questions, and the researchers who tried — with varying degrees of success — to answer them.


I. “What happens when someone else manages your money?”

Portrait of Adam Smith, the only known contemporary likeness, painted c. 1787.

The man who noticed the problem first — almost in a footnote. His observation about directors managing “other people’s money” sat undeveloped for 150 years before economics had the tools to explain why it mattered. (Wikimedia Commons)

The suspicion is older than economics itself.

In 1776, Adam Smith noticed something uncomfortable about the joint-stock companies of his day — the precursors to modern corporations. Their directors, he wrote in The Wealth of Nations, managed “other people’s money” and could not be expected to guard it with the same vigilance as their own. He didn’t build a theory around this. He just noted, almost in passing, that delegation breeds negligence.

The observation sat there, undeveloped, for over 150 years.

Then in 1932, two Columbia scholars — Adolf Berle and Gardiner Means — published The Modern Corporation and Private Property, documenting what had happened to American capitalism in the decades since Smith’s era. Ownership of the large corporation had been dispersed across millions of shareholders, none of whom exercised meaningful control. Management had become a self-perpetuating class. The people who ran companies and the people who owned them were, for all practical purposes, different populations with different goals.

Berle and Means posed the question starkly: if the people who bear the financial risk have no real power, and the people who have power bear almost no risk, who is the corporation actually serving?

It was a devastating observation. But it was still descriptive. It told you that something was wrong — not why it was structurally inevitable, nor what you could do about it.


II. “Can we make this precise? What exactly goes wrong when information is unequal?”

The Ziliox and Roe Motor Company storefront and used car lot, 1950.

Akerlof’s 1970 paper was about used cars — but its real subject was every market where one party knows something the other doesn’t. The seller always does. (Wikimedia Commons)

The tools to answer this question arrived from an unexpected direction: the economics of insurance.

In the early 1960s, Kenneth Arrow began studying why insurance markets behave strangely. He imported a term from the insurance industry for what he found there: moral hazard. When you insure someone against a bad outcome, you change their behavior. A driver with comprehensive coverage drives less carefully. A homeowner with fire insurance stores oily rags in the basement. The act of transferring risk creates risk.

Arrow’s insight was profound because it revealed that information problems weren’t just market frictions — they were structural features of any relationship where one party’s actions are hidden from the other.

Then in 1970, George Akerlof published a short, deceptively playful paper called “The Market for ‘Lemons’” that nearly didn’t get published — three journals rejected it, two of them as trivial, before the Quarterly Journal of Economics took it. Akerlof’s question was about used cars, but his answer remade economics. If sellers know whether their car is a lemon and buyers don’t, buyers rationally assume the worst and offer low prices. Good cars exit the market. The market unravels. This was adverse selection — the problem of hidden characteristics rather than hidden actions — and it explained phenomena from health insurance death spirals to why you can’t get a loan despite being creditworthy.
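The unraveling logic is mechanical enough to fit in a few lines. Below is a minimal sketch under assumed parameters (mine, not Akerlof’s original setup): quality is uniform on [0, 1], a buyer values any car at 1.5 times what its seller does, and each round buyers bid the expected value of whatever is still for sale.

```python
# Toy sketch of Akerlof's unraveling under assumed parameters: quality q
# is uniform on [0, 1]; a seller values a car at q, a buyer at 1.5 * q.
# Buyers can't observe q, so they bid 1.5 times the AVERAGE quality still
# on the market, and the best remaining cars keep walking away.

def lemons_unraveling(rounds: int = 12) -> None:
    ceiling = 1.0  # highest quality still offered for sale
    for t in range(rounds):
        price = 1.5 * (ceiling / 2)    # 1.5 * E[q | q <= ceiling]
        ceiling = min(ceiling, price)  # sellers with q > price exit
        print(f"round {t}: price {price:.4f}, best car left {ceiling:.4f}")

lemons_unraveling()  # price and quality spiral toward zero together
```

Because the buyers’ premium (1.5x) is below the 2x that would be needed to sustain trade at the average quality, each round prices out the best remaining cars, and the market collapses completely.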

Michael Spence then asked the mirror-image question: can the informed party voluntarily reveal what they know? His job market signaling model (1973) showed that workers could use expensive education not for its content but as a credible signal of underlying ability — credible precisely because it was costly. Only high-ability workers would find the investment worthwhile.
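In a stylized version of Spence’s setup (simplified here, with the signaling cost assumed to be $e/\theta$ for a worker of ability $\theta$), employers pay a high wage $w_H$ to anyone with education of at least $e^*$ and a low wage $w_L$ otherwise. A separating equilibrium exists exactly when the threshold is too expensive for low types to fake but still worth acquiring for high types:

$$\frac{e^*}{\theta_H} \;\le\; w_H - w_L \;<\; \frac{e^*}{\theta_L}$$

The signal carries information precisely because its cost differs by type: the same education is cheaper for the able.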

Akerlof and Spence went on to share the 2001 Nobel Prize with Joseph Stiglitz, whose work on screening completed the picture (Arrow had already won his own Nobel in 1972). Together they established that asymmetric information isn’t a special case. It’s the normal condition of economic life. Every transaction, every contract, every delegation of authority operates in its shadow.

But these were still market-level analyses. The question that remained was narrower and more operational: when you must delegate a specific task to a specific person, how should you structure the deal?


III. “How should we pay people when we can’t watch what they do?”

This is where the principal-agent problem was born as a formal theory.

In 1973 — the same year as Spence’s signaling paper — Stephen Ross published a paper in the American Economic Review that gave the problem its name. Ross defined the agency relationship with clinical precision: one party (the principal) engages another party (the agent) to perform a service, which involves delegating decision-making authority. He then asked what the optimal compensation schedule looks like when the principal can’t directly observe the agent’s effort.

Three years later, Michael Jensen and William Meckling published what would become one of the most cited papers in all of economics. They introduced the concept of agency costs — the total price society pays for the fact that delegation is imperfect. These costs come in three flavors: what the principal spends on monitoring, what the agent spends on bonding (credibly committing to behave), and the residual loss — the irreducible gap between what the agent does and what a perfectly aligned agent would have done. Jensen and Meckling’s radical claim was that the firm itself is nothing more than a web of such agency relationships. There is no “firm” in any deep sense — just contracts, all the way down.

But knowing that the problem exists and is costly doesn’t tell you what the solution looks like. For that, economics needed someone to solve the actual optimization problem: given that you can only see outcomes, not effort, what is the mathematically best contract?


IV. “What does the optimal contract actually look like?”

The answer came from a young Finnish economist named Bengt Holmström.

Holmström’s 1979 paper in the Bell Journal of Economics — simply titled “Moral Hazard and Observability” — solved the core mathematical problem and produced a result of startling elegance. He showed that the optimal contract works like a statistical test. Payment should rise or fall based on the likelihood ratio — how much more probable the observed outcome is under high effort versus low effort. If the outcome is one that hardworking agents produce far more often than lazy ones, the agent gets a large bonus. If the outcome is ambiguous — equally likely under either effort level — it tells you nothing and should carry no incentive weight.
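In one standard statement of the result (for a risk-neutral principal and a risk-averse agent, assuming the first-order approach is valid), let $f(x \mid e)$ be the density of outcome $x$ given effort $e$, $u$ the agent’s utility over pay, and $s(x)$ the payment schedule. The optimal contract satisfies

$$\frac{1}{u'(s(x))} = \lambda + \mu \, \frac{f_e(x \mid e)}{f(x \mid e)}$$

where $\lambda$ and $\mu$ are the multipliers on the participation and incentive constraints: pay is high exactly where the likelihood ratio $f_e/f$, the statistical evidence of effort, is high.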

This paper also contained what many consider the single most important result in incentive theory: the informativeness principle. Any available signal — any signal at all — should be included in the contract if and only if it provides additional statistical information about the agent’s effort beyond what existing signals already capture. The implication is crisp: a CEO’s bonus should depend not just on their own company’s stock price, but also on their competitors’ stock prices, because the comparison filters out market-wide noise that says nothing about managerial skill.
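A hypothetical back-of-the-envelope version of that prescription (the bonus rate and the returns below are invented for illustration): benchmark the firm’s return against its peers before paying anything, so that market-wide tides wash out.

```python
# Hypothetical illustration of the informativeness principle via
# relative performance evaluation: pay on the firm's return NET of the
# industry average. A market-wide shock hits every firm alike, so
# subtracting the peer mean filters out noise that says nothing about
# the CEO's own effort or skill.

def ceo_bonus(firm_return: float, peer_returns: list[float],
              rate: float = 2_000_000) -> float:
    """Bonus per unit of industry-adjusted return (the rate is made up)."""
    industry_avg = sum(peer_returns) / len(peer_returns)
    adjusted = firm_return - industry_avg  # skill and effort, not the tide
    return max(0.0, rate * adjusted)

# An oil boom lifts every firm by about 20%; this CEO adds 3 points on top.
print(ceo_bonus(0.23, [0.20, 0.19, 0.21]))  # rewards the 3 points, not the boom
```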

Around the same time, Sanford Grossman and Oliver Hart (1983) provided a complementary analysis that clarified the mathematical structure: the principal’s problem is really a two-step optimization. First, figure out the cheapest way to implement each possible effort level. Then, choose the effort level where the marginal benefit of pushing the agent harder just equals the marginal cost of the incentive scheme needed to do so.
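Schematically, writing $C(e)$ for the cheapest expected wage bill that makes effort level $e$ both acceptable to the agent (participation) and actually chosen (incentive compatibility):

$$C(e) = \min_{s(\cdot)} \, \mathbb{E}[\,s(x) \mid e\,] \ \text{subject to participation and incentive constraints}, \qquad e^* = \arg\max_e \ \mathbb{E}[\,x \mid e\,] - C(e)$$

The first step is pure cost minimization; the second is the marginal benefit-marginal cost comparison just described.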

These papers were technically triumphant. But they immediately surfaced a deeper question.


V. “Why can’t we just tie pay perfectly to results?”

Sharecroppers chopping cotton in a field, 1941.

The sharecropping share IS the incentive intensity parameter from Holmström-Milgrom — chosen to balance the landowner’s desire for effort against the tenant’s inability to bear full risk. Contract theory didn’t invent this trade-off; it named one that had existed for centuries. (Wikimedia Commons)

Because the agent is human, and humans dislike risk.

This is the incentive-insurance trade-off — the master tension at the heart of the entire theory. If the agent were risk-neutral (indifferent to gambles), the solution would be trivial: sell the agent the entire enterprise for a lump sum, and let them keep all profits. Their incentives would be perfectly aligned with the principal’s because they are the principal. This is essentially what happens with small owner-operated businesses.

But most agents are risk-averse. They prefer a certain $200,000 to a coin flip between $400,000 and nothing. So when you link their pay to outcomes — outcomes that depend partly on their effort but also on luck, market conditions, weather, and a thousand other uncontrollable factors — you’re forcing them to absorb risk they’d rather not bear. They demand compensation for bearing that risk, which the principal ultimately pays through higher total expected compensation.

Holmström and Paul Milgrom’s 1987 paper gave this trade-off its definitive mathematical form. They showed that when agents can adjust their behavior continuously over time — a realistic assumption for most jobs — the optimal contract is simply linear in total output. Your commission rate, your sharecropping share, your bonus as a percentage of revenue — these aren’t crude approximations of something more complex. They’re actually optimal. And the formula for the ideal incentive rate is beautifully intuitive: incentives should be stronger when the performance measure is less noisy and when the agent is less risk-averse, and weaker when outcomes are volatile or the agent can’t stomach uncertainty.
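In its standard textbook form (CARA utility, normally distributed noise), the contract is $w = \alpha + \beta x$ with incentive intensity

$$\beta^* = \frac{1}{1 + r \sigma^2 c''}$$

where $r$ is the agent’s risk aversion, $\sigma^2$ the variance of the noise in output, and $c''$ the curvature of the effort-cost function. A noisier measure, a jumpier agent, or a steeper effort cost each pushes $\beta^*$ down toward a flat salary.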

This resolved a puzzle that had bothered theorists: why are real-world contracts so much simpler than the complex nonlinear schemes that early theory seemed to recommend? The answer is that simplicity is optimal under realistic conditions.

But the resolution opened a trapdoor into a much harder problem.


VI. “What happens when the job has parts you can’t measure?”

This is where the theory gets truly interesting — and truly unsettling.

Consider a teacher. You can measure test scores. You cannot easily measure whether students are developing curiosity, resilience, creativity, or a love of learning. Consider a police officer. You can count arrests. You cannot easily measure whether a neighborhood feels safe, whether community trust is growing, or whether rights are being respected. Consider a software engineer. You can count lines of code and closed tickets. You cannot easily measure whether the codebase is becoming more maintainable or whether the engineer is mentoring junior colleagues.

In their landmark 1991 paper, Holmström and Milgrom showed that when agents perform multiple tasks that differ in how easily they can be measured, incentivizing the measurable task actively damages the unmeasurable one. This isn’t a side effect. It’s the central prediction. Strong incentives on one dimension drain effort from other dimensions.

The result that shocked the profession: if one task is sufficiently hard to measure, the optimal incentive on the measurable task can drop all the way to zero. A flat salary — seemingly the most toothless incentive scheme imaginable — becomes the optimal contract. Not because you’ve given up on motivation, but because you’ve recognized that lopsided incentives are worse than no incentives at all.
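A toy rendering of that logic (the numbers and functional form are mine, not the 1991 model’s): the agent splits one unit of effort between a measured task paid at rate beta and an unmeasured task valued intrinsically, while the principal actually cares more about the task they cannot measure.

```python
# Toy multi-task illustration (invented parameters, not the actual
# Holmstrom-Milgrom model): the agent splits one unit of effort between
# a measured task paid at bonus rate beta and an unmeasured task with
# intrinsic return GAMMA. Linear payoffs make the choice all-or-nothing.

GAMMA = 0.3  # agent's assumed intrinsic return to the unmeasured task

def agent_allocation(beta: float, budget: float = 1.0) -> tuple[float, float]:
    """The agent piles the whole budget on the higher marginal rate."""
    return (budget, 0.0) if beta > GAMMA else (0.0, budget)

def principal_value(e_measured: float, e_unmeasured: float) -> float:
    # The principal cares more about the task they cannot measure.
    return 0.5 * e_measured + 1.5 * e_unmeasured

for beta in (0.0, 0.2, 0.5, 1.0):
    e1, e2 = agent_allocation(beta)
    print(f"beta={beta:.1f}: effort=({e1}, {e2}), value={principal_value(e1, e2)}")
```

Any bonus above the intrinsic 0.3 flips all effort onto the measured task and cuts the principal’s value from 1.5 to 0.5; the flat salary, beta of zero, wins.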

This one result explains an enormous amount about how the real world is organized. It explains why government employees earn flat salaries. It explains why universities grant tenure — removing performance pressure — for scholars whose most important output (original ideas) is nearly impossible to evaluate in real time. It explains the Wells Fargo fake-accounts scandal, where aggressive sales targets led employees to create millions of fraudulent accounts. The measured task (account openings) devoured the unmeasured task (actually serving customers).

Holmström later reflected in his Nobel lecture that this insight about multi-task incentives was perhaps his most consequential contribution — more important even than the informativeness principle — because it explained why real-world incentive systems are systematically weaker than single-task theory would predict, and why organizations rely so heavily on bureaucratic rules, norms, and job design rather than pure pay-for-performance.


VII. “Do we even need contracts? Can reputation do the work?”

Here the theory takes an elegant turn.

Formal contracts are not the only source of incentives. If you work in a labor market where future employers can observe your past performance, your desire to look competent is itself an incentive — even if your current employer pays you a fixed salary.

Holmström formalized this in a model he first circulated in 1982 and published in 1999. The setup is simple: a worker’s output depends on ability (which nobody knows for certain), effort (which only the worker controls), and luck. The labor market observes output and rationally updates its estimate of the worker’s ability. High output today means higher wage offers tomorrow.

This generates what Holmström called career concerns — implicit incentives that arise from the market’s learning process rather than from any contractual provision. And the model yields a striking prediction: career concerns are strongest at the beginning of a career, when uncertainty about ability is greatest and each new data point shifts beliefs substantially. As the market learns your type, the implicit incentive fades. This is why young associates at law firms work punishing hours without explicit bonuses, while senior partners — whose ability is already known — need equity stakes and profit-sharing to stay motivated.
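In a stylized normal-learning rendering of the model (simplified here), output in year $t$ is $y_t = \theta + e_t + \varepsilon_t$, where ability has prior $\theta \sim N(m_0, \sigma_0^2)$ and noise is $\varepsilon_t \sim N(0, \sigma_\varepsilon^2)$. After $t$ observations, the market’s remaining uncertainty about ability, and the weight it puts on the latest surprise, are

$$\sigma_t^2 = \left( \frac{1}{\sigma_0^2} + \frac{t}{\sigma_\varepsilon^2} \right)^{-1}, \qquad k_t = \frac{\sigma_t^2}{\sigma_t^2 + \sigma_\varepsilon^2}$$

Since $\sigma_t^2$ shrinks every period, $k_t$ falls toward zero: each additional year of output moves beliefs, and with them future wages, less and less. That is the fading implicit incentive in one line of algebra.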

But career concerns cut both ways. Robert Gibbons and Kevin Murphy showed that when implicit and explicit incentives coexist, they substitute for each other: optimal formal incentive pay is lower when career concerns are strong. More troublingly, career concerns can drive short-termism — managers may avoid risky investments that could reveal bad information about their ability, even when those investments would create long-term value.

And in repeated relationships where the principal can’t commit to future contract terms, a different pathology appears: the ratchet effect. An agent who performs brilliantly this year simply gets a higher target next year. So agents strategically sandbag — suppressing output today to avoid raising the bar for tomorrow. Laffont and Tirole formalized this dynamic, which Soviet central planners knew intimately: factory managers routinely hid production capacity to keep quotas manageable.


VIII. “Does any of this actually work in practice?”

Theory is one thing. Does it survive contact with reality?

The empirical literature produced some genuinely startling findings. In the domain of executive compensation, Jensen and Murphy’s 1990 study measured the actual pay-performance sensitivity of American CEOs and found something remarkable: for every $1,000 change in shareholder wealth, CEO wealth changed by only about $3.25. By the standards of incentive theory, this looked absurdly low — almost as if corporate boards weren’t even trying to align incentives.

This finding may have inadvertently triggered the explosion of stock option grants in the 1990s, which dramatically increased pay-performance links. But Bebchuk and Fried’s influential 2003 critique argued that executive pay itself had become part of the agency problem rather than a solution to it. Powerful managers captured their own boards and designed compensation packages that enriched them regardless of performance — a phenomenon they called the “managerial power approach.”

Meanwhile, Bertrand and Mullainathan (2001) demonstrated a clean violation of the informativeness principle in the wild: CEOs were systematically rewarded for “luck” — industry-wide profit booms driven by oil prices or exchange rates that revealed nothing about managerial effort. Worse, this pay-for-luck was concentrated in poorly governed firms. Well-governed firms filtered out the noise, exactly as Holmström’s theory prescribed. The informativeness principle was correct — it was just being ignored by the firms that needed it most.

In labor markets, the evidence was more encouraging. Edward Lazear’s study of Safelite Glass (2000) documented what happened when a company switched from hourly wages to piece rates: output per worker jumped 44%. Part of this was incentive effects — the same workers tried harder. Part was selection — productive workers joined the firm and unproductive ones left. The two mechanisms reinforced each other, exactly as theory predicted.

In politics, the principal-agent lens illuminated why democracies work as well as they do — and why they don’t work better. Ferejohn’s 1986 model showed that rational voters who adopt simple retrospective rules — reelect the incumbent only if things went well enough — can discipline politicians even without understanding policy details. But the model also revealed the limits: term limits destroy the incentive mechanism by removing the reelection motive, and politicians approaching the end of their terms behave measurably worse.

In healthcare, the same framework explained why fee-for-service medicine produces too much treatment and capitation produces too little. Each is a different point on the incentive-insurance frontier, and neither is optimal. The decades-long struggle over healthcare payment reform is, at its core, an applied principal-agent problem with life-and-death stakes.


IX. “What happens when no contract can cover every scenario?”

The 2016 Nobel Prize laureates photographed together at the award ceremony in Stockholm, December 2016.

Oliver Hart and Bengt Holmström are somewhere in this line — the two halves of contract theory, honored in the same year for solving complementary pieces of the same puzzle. (Bengt Nyman / Wikimedia Commons, CC BY 2.0)

All the theory so far assumes that contracts, however cleverly designed, can specify payments for every observable outcome. But what about outcomes nobody anticipated? What about situations so complex that writing complete contingent contracts is physically impossible?

This is the domain of incomplete contracts, developed by Oliver Hart and his collaborators. In a foundational 1986 paper with Grossman, Hart argued that when contracts have gaps — and real contracts always have gaps — what matters is who has residual rights of control: the authority to make decisions in situations the contract doesn’t cover.

This shifted the analytical center of gravity from payment schemes to ownership structures. If you can’t write a contract that rewards your supplier for every possible quality improvement, then maybe you should own the supplier’s factory outright. Hart’s theory provided the first rigorous answer to Ronald Coase’s famous question about why firms exist: firms are bundles of ownership rights, assembled to give the strongest incentives to the parties whose investments are hardest to protect contractually.

Hart and Holmström shared the 2016 Nobel Prize in Economics for their complementary contributions to contract theory — Holmström for the world of complete but imperfect contracts, Hart for the world where completeness itself is impossible.


X. “And what about the machines?”

Here is where the story arrives at the present moment.

The principal-agent problem has found perhaps its most consequential new application in AI alignment — the challenge of building artificial intelligence systems that actually do what their creators intend.

The structural parallels are uncanny. A human designer (principal) delegates tasks to an AI system (agent). The AI’s objective function may diverge from what the designer truly wants — not because the AI is malicious, but because specifying human values precisely in mathematical form is an incomplete contracting problem of staggering complexity. You cannot write a complete contract with a system operating across millions of unanticipated contexts.

Dylan Hadfield-Menell and others have argued that the entire history of principal-agent theory — the informativeness principle, multi-task distortions, the ratchet effect, Goodhart’s Law as a consequence of rewarding measurable proxies — maps directly onto the challenge of aligning AI behavior. An AI system optimized on a measurable proxy for human welfare will exploit that proxy in ways the designer never intended, precisely as Holmström and Milgrom’s multi-task model predicts. Goodhart’s Law — “when a measure becomes a target, it ceases to be a good measure” — is the multi-task principal-agent problem stated in a single sentence.
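A toy demonstration of that failure mode (both objective functions below are invented for illustration and drawn from no real system): a proxy that tracks true welfare under gentle optimization diverges from it as the optimizer pushes harder.

```python
# Hypothetical Goodhart's-Law sketch: the measurable proxy and the true
# objective agree when optimization pressure is low, then come apart as
# the optimizer pushes the proxy alone. This is the multi-task
# distortion in miniature.

def true_welfare(x: float) -> float:
    return x - 0.1 * x ** 3  # past a point, pushing harder actively hurts

def proxy_metric(x: float) -> float:
    return x                 # the measured stand-in just keeps climbing

# Crank up the optimization pressure and watch the two curves separate.
for x in (0.5, 1.0, 2.0, 4.0, 8.0):
    print(f"x={x}: proxy={proxy_metric(x):5.1f}, welfare={true_welfare(x):6.1f}")
```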

The twist is that unlike a human employee, an AI system can potentially be redesigned. The principal has partial coercive power over the agent’s preferences — a feature absent from classical theory. Whether this makes the problem easier or harder is an open question at the frontier of both economics and computer science.


The chain that holds

Step back and notice the architecture of this intellectual journey.

Smith asked: what happens when you hand your money to a stranger? Berle and Means documented that it had happened to the entire economy. Arrow and Akerlof showed that information asymmetry is the root cause. Ross and Jensen gave the problem its name and its cost structure. Holmström showed what optimal contracts look like, then showed why they fail when jobs have unmeasurable parts. Hart showed what to do when contracts themselves are incomplete. And now, a new generation is asking whether these same ideas — forged in debates about sharecropping and CEO pay — can help us align systems more powerful than any agent Adam Smith ever imagined.

Each answer generated a harder question. Each solution revealed a deeper problem. That is the mark of a truly generative idea — not that it settles things, but that it makes you realize how much more there is to settle.

The principal-agent problem is not solved. It may be unsolvable. But the 250-year effort to understand it has produced something more valuable than a solution: a language for thinking clearly about trust, delegation, and the irreducible cost of relying on others to act on your behalf.

And if you’ve ever wondered why your mechanic always finds something extra wrong with your car — now you know the name for it.