Economy of Minds : Emerging Multi-Agent Intelligence with Economic Interactions

Agents bid for control, exchange payments, and evolve through selection, producing decentralized credit assignment and collective intelligence without a central orchestrator.

Zhenting Qi1,*,†   Ao Qu2,*   Huangyuan Su1,4,*   Chenyu Wang1,*
Yu Yao2   Han Zheng2   Kushal Chattopadhyay1   Guowei Xu1   Zihan Wang3
Weirui Ye2   Vijay Janapa Reddi1   Ju Li2   Paul Pu Liang2
Himabindu Lakkaraju1   Sham Kakade1,4   Yilun Du1,4,†
1Harvard  ·  2MIT  ·  32077 AI   · 4Kempner Institute     *Core contributors   Corresponding authors
Paper (PDF) arXiv Code Cite
Evolution of an agent society over a stream of tasks
Evolution of an agent society over a stream of tasks. Each panel shows the population at a given stage: agents are continuously created, selected, connected, and eliminated. As the society encounters more tasks, ineffective agents are removed and corrected, while useful ones persist and diversify, leading to an alive and increasingly structured population.

How can agents self-orchestrate without central control? We study an agent economy inspired by Hayekian markets: agents bid for the right to act, exchange payments, and gain wealth from rewards. These signals create decentralized credit assignment and planning. Effective agents persist and mutate; ineffective ones go bankrupt and are replaced.

Starting from weak agents, the economy discovers multi-step reasoning strategies and outperforms stronger monolithic baselines across math, finance, science, accelerator design, and distributed-system optimization. We also connect local incentives to long-term collective performance.

Design the incentives, not the coordination.

We replace central orchestration with a price system. Each agent has a wake-up condition, fixed bid, and wealth balance. Auctions select actors, payments move credit backward, and rent/bankruptcy prune weak agents. Specialization and coordination emerge from these simple rules.

MATH
15.9 57.0
Llama-3.1-8B partial agents, pass@1 (%).
Finance
45.0 60.0
Finance-Agent-Bench, 30 training tasks (%).
Science
5.0 20.0
FrontierScience best-run accuracy (%).
Accelerator
80.2 39.3
Avg. EDP on ResNet-50 vs. DOSA (lower is better).
Cloudcast
930 657
Best data-transfer cost vs. OpenEvolve.

An economy of language agents

We model language agents as an economy. Each agent acts locally from its trigger and policy, while global coordination emerges from prices. The system has two loops: planning, which selects actions and assigns credit within an episode, and adaptation, which evolves the population across episodes.

3.1Problem setup

We model the task as a partially observed Markov decision process = (𝒮, 𝒜, P, r, γ, μ0), with state space 𝒮, action space 𝒜, transition kernel P(s' | s, a), reward r, discount γ, and initial-state distribution μ0. At step t, the system observes ot ∈ 𝒪.

All agents share a frozen LLM backbone; diversity comes from prompts. Each agent is a tuple

a = ( φa, πa, ba, Wa )

where φa : 𝒪 → {0, 1} is a triggering predicate for eligibility, πa : 𝒪 → Δ(𝒜) is its policy, ba ∈ ℝ≥0 is its bid, and Wa ∈ ℝ is its wealth. The trigger and policy use agent-specific prompts pa = (patrigpaact). At episode e, the active population is 𝒫e. This generalizes Baum's Hayek machine from hand-written rules to prompted LLM agents.

3.2Planning with auctions and transactions

At each step, agents compete for control. Given observation ot, each agent checks its trigger. The eligible set is

Et = { a ∈ 𝒫e : φa(ot) = 1 }

If Et = ∅, no agent acts. Otherwise, the highest-bidding eligible agent wins:

at ∈ argmax a ∈ Et ba

with random tie-breaking. Control goes to the highest bidder in the current context, without a central policy.

Auction mechanism diagram
Auctions. Agents whose wake-up conditions are satisfied become eligible to bid; the highest bidder wins the auction, executes the action, and advances the environment from st to st+1.

The winner at acts, producing ot+1 and reward rt. Let at-1 be the previous winner. We then apply a bucket-brigade transfer rule:

(1) WatWatbat + rt,    Wat-1Wat-1 + bat

The winner pays its bid to the previous actor and receives any reward rt. The first payment in an episode goes to the house.

Credit assignment via bucket-brigade transactions
Transactions. Credit assignment naturally emerges as profits flow backward through the action sequence, rewarding agents whose actions enable successful downstream outcomes.

This creates decentralized credit assignment. Agents profit by earning reward or by creating states that downstream agents value. Productive actions accumulate wealth; unproductive ones lose it.

3.3Adaptation with exploration and exploitation

Across episodes, economic selection evolves the population. A prompt generator 𝒢 proposes new agents by mutating successful prompts or amending failed ones. New agents start with wealth W0 ≥ 0; existing agents may pay rent ρ ≥ 0.

Exploitation.

Useful agents accumulate wealth, persist, and periodically spawn mutations. This reuses strong behaviors, refines them, and encourages specialization. Mutations preserve useful triggers or policies while adding small variations.

Exploration.

Weak or inactive agents lose wealth. Once bankrupt, they are removed and replaced by random or complementary variants. This turnover learns from failures, explores new behaviors, and reduces premature convergence.

Between episodes, the population update has three stages:

  1. Rent: each agent pays ρ, so WaWaρ;
  2. Removal: agents with Wa < 0 are deleted;
  3. Injection: new agents are added according to exploitation and exploration until the population satisfies the prescribed maximum-size constraints.

Bids are assigned at birth and then frozen. For a new agent a′, let t be its first eligible step, and let Ct = { a ∈ 𝒫e ∖ {a′} : φa(ot) = 1 } be competing eligible agents. Its bid follows the novice rule:

(2) ba′ = ( maxa ∈ Ct ba ) + εa′,    εa′ ∼ 𝒟ε

with max ∅ := 0 and 𝒟ε a small positive perturbation. The rule ensures a new eligible agent is tested once before market selection decides whether it survives.

Exploitation preserves useful behaviors; exploration adds novelty. Evolution is driven only by wealth gains and losses, with no central supervision or global labels.

3.4Training and evaluation

During optimization, agents bid for control, act, transfer wealth, and receive environmental rewards. These signals drive adaptation: bankrupt agents are removed, profitable ones persist, successful prompts mutate, and the population is replenished.

During evaluation, the population and bids are frozen. Payments, rewards, rent, births, and mutations are disabled, and each test task runs on a thread-local snapshot. Evaluation therefore measures the learned policy without further wealth dynamics.

Experiments

5.1Setup

Partial vs. complete agents.

A partial agent has limited tools, actions, context, or output budget. A complete agent has the full task interface. This tests whether economic organization can rival capability concentrated in one agent.

Baselines.

We compare against complete-agent baselines (ReAct, GEA, OpenEvolve), a partial-agent baseline (Multi-Agent Debate), and DOSA for accelerator design.

5.2Can economics turn weak individuals into stronger systems?

Across five domains, economic coordination turns partial agents into stronger systems. On MATH, EoM improves Llama-3.1-8B from 15.9% → 57.0% and Gemma-2-9B from 4.2% → 45.1%, beating complete-agent baselines. On accelerator design, EoM lowers average EDP to 39.3, versus 43.1 for complete ReAct and 80.2 for DOSA.

Task. Finance-Agent-Bench with four tools; each partial agent gets one.

Finance-Agent-Bench training curves
Finance-Agent-Bench. EoM rises from 45.0% at initialization to 60.0% after 30 training tasks, outperforming Multi-Agent Debate (50.0%), ReAct (45.0%), and GEA (50.0%) — even though each partial agent in EoM can access only one tool.

Task. FrontierScience-Research with literature, planner, executor, and verifier roles.

FrontierScience results
FrontierScience. EoM reaches 8.5% mean / 20.0% best-run accuracy on open-ended scientific questions, versus 1.8% mean / 5.0% best-run for GEA under the same Gemini-3-Flash backbone.

Task. Cloudcast from ADRS: iteratively improve a program to reduce data-transfer cost.

Cloudcast cost trajectories vs OpenEvolve
Cloudcast. EoM reaches an average total cost of 673 over three attempts, with the best attempt at 657 — versus 930 for OpenEvolve. A 28% reduction in best cost while using fewer optimization episodes.

The gain is not just “many agents.” Economic interactions let limited agents match or surpass stronger complete agents.

Task. MATH on an easy-to-hard stream, with planner/executor/verifier agents capped at ~128 output tokens.

Table 1.2 · MATH accuracy (%)
Backbone Partial agents Complete agent
Initial After training
Llama-3.1-8B 15.9 (1.37) 57.0 (3.36) 51.9*
Gemma-2-9B 4.2 (0.52) 45.1 (4.12) 44.3*

Task. Gemmini mapping search over 24 ResNet-50 kernels, minimizing EDP with Historian/Planner/Executor roles.

Table 1.2 · Accelerator design — Avg. EDP (μJ·Mcyc) ↓
Method Avg. EDP ↓
DOSA80.2
Complete agent (Gemma-4-31B-it)43.1
EoM (Gemma-4-31B-it)39.3

Table 1. Performance. Left: MATH accuracy (* official numbers). Right: accelerator-design average EDP (lower is better). EoM outperforms the corresponding baselines.

5.3Beyond multiple agents: the role of economic ingredients

The gains depend on economics, not just multiplicity: control allocation, value transfer, selection, and propagation all matter. Weakening them reduces performance.

On MATH, the original setting is strongest among constrained variants (43.9 mean / 57.0 best). On Finance-Agent-Bench, removing exploration, exploitation, or auctions all hurts. Cloudcast reinforces the point: EoM reaches 673 best cost, while best-of-N sampling reaches 999.

Table 2.1 · MATH ablations (%)
Table 2.2 · Finance-Agent-Bench ablations (%)

Table 2. Ablations. Economic parameters and component removals both affect performance; the full/original system is strongest overall.

5.4How does the economy improve performance?

What changes inside the society as performance improves?

EoM improves by reshaping both agents and population structure. On Finance-Agent-Bench, performance dips during exploration, then rises from 45.0 to 60.0 as control shifts toward stronger specialists.

Per-agent wealth trajectories on accelerator design tasks
Training dynamics in accelerator design. Per-agent wealth on three representative ResNet-50 kernels. Wealth flows to agents producing new EDP records; rent uniformly deducts wealth. Periodic births spawn good-birth children (★, exploitation: mutated from the richest agent) and bad-birth children (+, exploration: amended from the weakest); wealth < 0 triggers bankruptcy (×). Shaded bands are rolling ±1σ. (a) Both Historian descendants bankrupt — inherited bias fails market pressure. (b) A Planner lineage reproduces twice while a Historian bad-birth child eventually fails. (c) A strong Historian and a struggling Executor lineage co-exist.

Does the society learn reusable structure?

EoM achieves a 2.2× geometric-mean EDP gain over DOSA across 24 ResNet-50 kernels, with much larger gains on the hardest 1×1 bottlenecks. Without being given the output-stationary motif, the population repeatedly rediscovers it through EDP rewards.

Per-kernel EDP on ResNet-50
Per-kernel accelerator EDP. Best EDP found by DOSA, ReAct, and EoM on each of the 24 ResNet-50 convolution kernels (log scale; lower is better). Gains are structured rather than uniform: largest on the hardest 1×1 bottleneck convolutions.

5.5Robustness and generalization

Do learned behaviors transfer from easier tasks to harder ones?

On MATH, easy-to-hard training improves every difficulty band, including harder levels not seen early. Both backbones lift Level 5 from ~10% to ~20%, suggesting simple routines transfer to harder problems.

MATH performance across difficulty levels
Easy-to-hard generalization on MATH. Test accuracy across difficulty levels during training. The partial-agent population improves not only on easier levels seen earlier, but also on harder levels initially beyond its capability — behaviors learned on simple problems are reused on more difficult ones.

How sensitive is the society to curriculum order?

Both schedules improve early, but easy-to-hard stays ahead and finishes higher: ~57% versus ~47%. Partial specialists benefit from learning reusable routines before facing the hardest problems.

Curriculum-order comparison on MATH
Curriculum learning on MATH. Comparison between the default easy-to-hard curriculum and a reversed hard-to-easy schedule. The easy-to-hard ordering finishes at ~57%; the reversed ordering at ~47%.

Can a complete generalist monopolize the economy?

Adding an all-tool generalist does not collapse the society. It briefly expands, then shrinks back to one agent while specialists keep growing. The economy rewards local precision over broad but diluted capability.

Generalist vs specialists on Finance
Finance research with a generalist. A complete generalist with access to all tools does not automatically dominate. Specialized populations continue to grow because the market favors locally more precise agents.

Economic dynamics shape both agent policies and social structure. Coordination emerges from aligned local incentives.

From engineering coordination to designing incentives

Simple economic interactions among prompted LLM agents recover specialization, credit assignment, and cross-task transfer across five domains. Rather than centrally engineered pipelines, EoM points toward evolving agent societies shaped by their economies.

Limitation: adaptation is prompt-space only; parameter-space, hybrid, multimodal, and embodied extensions remain future work.

BibTeX

Coming soon.