Economy of Minds: Emerging Multi-Agent Intelligence with Economic Interactions

A population of agents compete via auctions for the right to act, exchange payments peer-to-peer, and evolve through economic selection, yielding decentralized credit assignment and self-organizing collective intelligence without a central orchestrator.

TL;DR

Let a market organize and evolve the agents

Imagine a world with many intelligent agents. Each agent may perform well on certain tasks but remains fundamentally limited: each operates with its own priors, partial observations, and bounded computations.

When faced with more complex tasks that exceed individual capabilities, no single agent can reliably solve problems from start to finish. How, then, can such a population collectively solve these tasks?

The usual fix is a central controller that creates the agents, hands out roles, and routes every decision. But this hurts in two ways. Planning is forced through one gate, so the controller is both a bottleneck and a single point of failure. Moreover, learning and adaptation get harder as the system grows, since the controller must reason about an ever-larger set of agents.

Markets in human society suggest another way. The economist Friedrich Hayek pointed out that no central planner can gather all the knowledge spread across a society. Instead, prices let people coordinate on their own, using only what each can see locally.

"To the naive mind that can conceive of order only as the product of deliberate arrangement, it may seem absurd that in complex conditions order, and adaptation to the unknown, can be achieved more effectively by decentralizing decisions and that a division of authority will actually extend the possibility of overall order. Yet that decentralization actually leads to more information being taken into account."
(Friedrich Hayek, The Fatal Conceit)

We give AI agents the same kind of signal. We call it Economy of Minds (EoM), and two simple processes run the society. Planning happens within a task: agents bid in an auction for the right to act, and the winner pays the agent that acted before it. Because a good early move earns the payments that flow back from later winners, this payment chain performs credit assignment on its own, with no central judge. Adaptation happens across tasks: useful agents grow rich while useless ones go broke. Rich agents are copied and tweaked (exploitation), and bankrupt ones are replaced by new variants (exploration). Each agent only decides when to wake up and what to do; there is no manager, no fixed workflow, and no messaging protocol.

Starting from weak agents, the society organizes itself and keeps improving. It matches or beats far stronger single-agent systems on five tasks: math, financial research, scientific research, accelerator design, and distributed-system optimization. Specialization, teamwork, and skill transfer all emerge on their own. The lesson: design the incentives, and the coordination takes care of itself.

Figure 1.1 · Evolution of an agent society over a stream of Financial Research tasks. Five agents are initially created: Edgar, Tavily, ParseHTML, RetrieveInfo, and Answer, each with a different tool. Agents are continuously spawned and eliminated by the economy. As the society encounters more tasks, poor agents are removed or amended while wealthy ones persist and diversify, yielding a living and increasingly structured population.
Figure 1.2 · The agent society gradually learns to coordinate to solve math problems. One held-out math problem, attempted by the population at six training checkpoints (left → right). Early chains are long and disorganized, with agents sometimes bouncing around and still missing the answer; over training the roles specialize and the team solves the same problem in fewer, better-coordinated steps.
Live Demo

Watch the economy evolve

An interactive replay of a real CloudCast run: a population of partial agents bid in auctions, win the right to edit a multi-cloud broadcast program, and are mutated or bankrupted across 30 episodes. Each step follows the same beat (bid → win → pay), and credit flows backward between agents. The replay also exposes each agent's prompt, economics, and lineage, along with the current program and topology, showing what each accepted mutation changed.

The task. CloudCast asks the society to evolve a single Python file, initial_program.py, that routes a multi-cloud broadcast: given source and destination regions and a network graph of link costs and throughputs, the program must produce a delivery topology. A simulator scores each program by its total egress cost across five inter- and intra-cloud scenarios, relative to a single-path Dijkstra seed (~$1035); the reward is the fractional cost reduction, max(0, 1 − cost/1035). The workspace persists across episodes, so each episode keeps editing the previous program rather than starting over. Six roles divide the labor: a Reader inspects files, a Planner proposes the next sub-goal, an Implementer edits the code, a Builder runs an import/build check, an Evaluator calls the scorer, and a Finalizer finalizes the program. Nothing fixes which roles act, or in what order: the auction decides at every step, so the workflow takes whatever shape the current code calls for.
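The reward rule above is simple enough to write down directly. The following is a minimal sketch, not the project's scorer: the function name `cloudcast_reward` and the constant `SEED_COST` are hypothetical, and the seed cost is taken as exactly 1035 even though the text gives it only approximately.

```python
# Approximate cost of the single-path Dijkstra seed, as quoted in the text.
# Treating it as exactly 1035 is an assumption of this sketch.
SEED_COST = 1035.0


def cloudcast_reward(cost: float, seed_cost: float = SEED_COST) -> float:
    """Fractional cost reduction relative to the seed, floored at zero."""
    return max(0.0, 1.0 - cost / seed_cost)


# A program that costs as much as the seed (or more) earns nothing.
no_gain = cloudcast_reward(1035.0)
worse_than_seed = cloudcast_reward(1200.0)

# A program costing 657 (the best CloudCast cost reported on this page)
# would earn roughly 0.365 under this rule.
best_run = cloudcast_reward(657.0)
```

Because the reward is floored at zero, a regression below the seed is not penalized by the environment itself; in this sketch any cost of a bad edit would have to come from the bid the agent paid to act.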

Method

An economy of language agents

We model a society of language agents interacting through an economic mechanism. Each agent acts locally, deciding only from its own triggering condition and policy, while global coordination emerges from economic interactions. The system has two coupled processes, and the rest of this section walks through them in turn.

1. Planning

Within an episode: agents bid in auctions for the right to act, and transactions pass value backward along the trajectory, assigning credit with no central controller.

2. Adaptation

Across episodes: the population evolves by economic selection, where exploration replaces bankrupt agents and exploitation mutates wealthy ones.

Setup. We consider a task environment modeled as a partially observed Markov decision process $\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, r, \gamma, \mu_0)$; at step $t$ the system observes $o_t \in \mathcal{O}$. Each agent is a language model parameterized by $\theta$; in our simplified setting a shared frozen backbone serves all agents, with diversity arising entirely through system prompts. An agent is a tuple

$$a = (\varphi_a, \pi_a, b_a, W_a)$$

where $\varphi_a : \mathcal{O} \to \{0, 1\}$ is a triggering predicate (eligibility), $\pi_a : \mathcal{O} \to \Delta(\mathcal{A})$ its action policy, $b_a$ a fixed bid, and $W_a$ its current wealth. Both $\varphi_a$ and $\pi_a$ are instantiated by the same frozen LLM with agent-specific prompts. This generalizes Baum's Hayek machine from hand-specified condition–action rules to prompted LLM agents.
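As a rough illustration of the tuple above, an agent can be represented as a small record. This is a sketch under stated assumptions: the class name `Agent`, its field names, and the keyword-matching stubs are hypothetical, and plain Python callables stand in for the prompted LLM calls that implement the triggering predicate and the policy in the actual system.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Agent:
    """One agent: triggering predicate, policy, fixed bid, and wealth."""
    name: str
    trigger: Callable[[str], bool]  # eligibility: observation -> True/False
    policy: Callable[[str], str]    # observation -> one sampled action
    bid: float                      # frozen when the agent is introduced
    wealth: float                   # changes through transactions and rent


# Hypothetical stand-in: in the described system both callables would be
# the shared frozen LLM run with this agent's own prompts.
verifier = Agent(
    name="verifier",
    trigger=lambda obs: "answer:" in obs,
    policy=lambda obs: "check the proposed answer",
    bid=1.0,
    wealth=10.0,
)
```

The policy here returns a single action rather than a distribution over actions, which corresponds to sampling once from the policy.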

1 · Planning with auctions and transactions

Within each episode, control is allocated and value is redistributed entirely through local market interactions: an auction decides who acts, and a transaction passes value backward along the trajectory.

1.1 Auctions

At each step, every agent whose wake-up condition fires becomes eligible, and the eligible agent with the highest fixed bid wins the right to act (ties broken randomly). The auction is a decentralized action-selection rule: control goes to whichever agent values acting most in the current context.

1.2 Transactions

After acting, the winner pays its bid to the previous actor and collects any environment reward, following a bucket-brigade rule. Value flows backward along successful trajectories, yielding decentralized credit assignment with no central evaluator.

At each environment step, agents compete for control through an auction (Figure 2). Given observation $o_t$, each agent evaluates its triggering predicate; the eligible set is $E_t = \{ a : \varphi_a(o_t) = 1 \}$. The winner is the highest-bidding eligible agent, $a_t \in \arg\max_{a \in E_t} b_a$ (ties broken randomly). The auction is a decentralized action-selection mechanism: control goes to whoever bids highest in the current context, and the winner samples an action that advances the environment.

Auction mechanism
Figure 2 · Auctions. Agents whose wake-up conditions are satisfied become eligible to bid; the highest bidder wins, executes the action, and advances the environment from $s_t$ to $s_{t+1}$.
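The auction step can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the `Bidder` tuple layout, the function name `run_auction`, and the keyword-based triggers are assumptions made for the example, with string matching standing in for LLM-evaluated wake-up conditions.

```python
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical layout: (name, triggering predicate, fixed bid).
Bidder = Tuple[str, Callable[[str], bool], float]


def run_auction(bidders: List[Bidder], obs: str,
                rng: random.Random) -> Optional[str]:
    """Return the name of the highest-bidding eligible agent, or None."""
    # Only agents whose wake-up condition fires on this observation may bid.
    eligible = [(name, bid) for name, trigger, bid in bidders if trigger(obs)]
    if not eligible:
        return None
    top_bid = max(bid for _, bid in eligible)
    # Ties at the top bid are broken uniformly at random.
    tied = [name for name, bid in eligible if bid == top_bid]
    return rng.choice(tied)


bidders: List[Bidder] = [
    ("planner", lambda obs: True, 1.0),
    ("executor", lambda obs: "plan:" in obs, 2.0),
    ("verifier", lambda obs: "answer:" in obs, 3.0),
]

winner = run_auction(bidders, "plan: factor the quadratic", random.Random(0))
```

In this toy setup the planner is always eligible but bids least, so it acts only when no more specific agent wakes up; the order of roles is never written down anywhere.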

The winning action produces the next observation $o_{t+1}$ and reward $r_t$. With $a_{t-1}$ the previous winner, we apply a bucket-brigade transfer rule (Figure 3):

$$W_{a_t} \leftarrow W_{a_t} - b_{a_t} + r_t, \qquad W_{a_{t-1}} \leftarrow W_{a_{t-1}} + b_{a_t} \tag{1}$$

The winner pays its bid to the previously active agent while collecting environmental reward; the first winner pays the house. This yields a decentralized form of credit assignment: an agent profits not only by receiving reward directly, but by steering the system into states for which downstream agents pay highly. Value flows backward along successful trajectories: agents enabling productive continuations accumulate wealth, while agents leading into dead ends lose it.

Credit assignment via bucket-brigade transactions
Figure 3 · Transactions. Credit assignment emerges naturally as profit flows backward through the action sequence, rewarding agents whose actions enable successful downstream outcomes.
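To make the backward flow of value concrete, the transfer rule can be traced on a toy three-step episode. This is a sketch under stated assumptions: the names `settle_step`, `settle_episode`, and the `HOUSE` account are hypothetical, and the bids, rewards, and starting wealth are made-up numbers chosen only to keep the arithmetic easy to follow.

```python
from typing import Dict, List, Optional, Tuple

HOUSE = "house"  # hypothetical account that receives the first winner's bid


def settle_step(wealth: Dict[str, float], winner: str, bid: float,
                reward: float, previous: Optional[str]) -> None:
    """Apply one bucket-brigade transfer in place."""
    # The winner pays its bid and collects the environment reward.
    wealth[winner] = wealth[winner] - bid + reward
    # The bid goes to the previous actor, or to the house on the first step.
    payee = HOUSE if previous is None else previous
    wealth[payee] = wealth.get(payee, 0.0) + bid


def settle_episode(wealth: Dict[str, float],
                   steps: List[Tuple[str, float, float]]) -> Dict[str, float]:
    """Settle a whole trajectory of (winner, bid, reward) steps."""
    previous: Optional[str] = None
    for winner, bid, reward in steps:
        settle_step(wealth, winner, bid, reward, previous)
        previous = winner
    return wealth


# Toy episode: only the final step earns an environment reward of 5.
wealth = settle_episode(
    {"planner": 10.0, "executor": 10.0, "verifier": 10.0},
    [("planner", 1.0, 0.0), ("executor", 2.0, 0.0), ("verifier", 3.0, 5.0)],
)
```

In this trace the planner and executor never touch the reward, yet each ends one unit richer (11.0), because the agent acting after them bid more for the state they produced than they paid to act; the verifier ends at 12.0 and the house holds the first bid of 1.0.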

2 · Adaptation with exploration and exploitation

Across episodes, the population evolves through economic selection. A prompt-generation operator $\mathcal{G}$ proposes new prompts, either amending a failed agent or mutating a successful one. New agents start with wealth $W_0 \ge 0$; existing agents pay periodic rent $\rho$.

2.1 Exploration

Agents lose wealth through unhelpful actions or prolonged inactivity. Once an agent's wealth goes negative it goes bankrupt and is removed, and complementary variations of the bankrupt agents take its place. This lets the system learn from failures, discover new behaviors, and avoid premature convergence.

2.2 Exploitation

Wealthy agents persist and are periodically selected as parents and mutated. This reuses and refines successful patterns, biases the population toward high-performing behaviors, and promotes specialization.

Concretely, between episodes the population update has three stages:

  1. Rent: each agent pays $\rho$, so $W_a \leftarrow W_a - \rho$;
  2. Removal: agents with $W_a < 0$ are deleted;
  3. Injection: new agents are added via exploration and exploitation up to the size constraints.
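The three stages above can be sketched as one function over a wealth table. This is an illustration only: the function name `update_population`, the string suffixes standing in for the prompt-generation operator, and the specific injection policy (one amended variant per bankrupt agent, then one mutated child of the richest survivor, capped at `max_size`) are assumptions, since the text does not spell out how many agents are injected or in what order.

```python
from typing import Dict, List


def update_population(wealth: Dict[str, float], rent: float,
                      w0: float, max_size: int) -> Dict[str, float]:
    """Rent, removal, and injection between episodes (toy version)."""
    # Stage 1 (rent): every agent pays the same rent.
    charged = {name: w - rent for name, w in wealth.items()}

    # Stage 2 (removal): agents with negative wealth go bankrupt.
    bankrupt = [name for name, w in charged.items() if w < 0]
    survivors = {name: w for name, w in charged.items() if w >= 0}

    # Stage 3 (injection). The name suffixes are placeholders for new
    # prompts that the prompt-generation operator would write.
    newcomers: List[str] = [f"{name}+amended" for name in bankrupt]  # exploration
    if survivors:
        richest = max(survivors, key=lambda name: survivors[name])
        newcomers.append(f"{richest}+mutated")                       # exploitation
    for name in newcomers:
        if len(survivors) >= max_size:
            break  # respect the population size constraint
        if name not in survivors:
            survivors[name] = w0  # new agents start with initial wealth
    return survivors


population = update_population(
    {"planner": 5.0, "executor": 0.5, "verifier": 12.0},
    rent=1.0, w0=3.0, max_size=4,
)
```

In this toy run the executor cannot cover rent and is replaced by an amended variant, while the verifier, the richest survivor, gains a mutated child.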

Bids are not learned online: each agent receives a frozen bid when introduced. A newly injected novice $a'$ receives the highest competing eligible bid plus a small positive perturbation, so it bids just above every competitor:

$$b_{a'} = \Big( \max_{a \in C_t} b_a \Big) + \varepsilon_{a'}, \qquad \varepsilon_{a'} \sim \mathcal{D}_\varepsilon \tag{2}$$

This guarantees the new agent wins its first eligible auction, forcing the system to test it at least once before market selection decides whether it survives. Evolution is governed entirely by economic signals (wealth gain and loss), with no centralized supervision or global performance labels.
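The novice bid rule can be sketched as follows. The perturbation distribution is not specified in the text, so this example assumes a uniform draw on the half-open interval (0, `max_eps`]; the function name `novice_bid`, the `max_eps` parameter, and the fallback base of 0.0 when there are no competitors are likewise assumptions of the sketch.

```python
import random
from typing import Sequence


def novice_bid(competing_bids: Sequence[float], rng: random.Random,
               max_eps: float = 0.1) -> float:
    """Highest competing bid plus a strictly positive perturbation."""
    # Assumed fallback: with no competitors, start from a base bid of 0.0.
    base = max(competing_bids, default=0.0)
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1]
    # and the perturbation is always strictly positive.
    eps = max_eps * (1.0 - rng.random())
    return base + eps


bid = novice_bid([1.0, 2.5, 2.0], random.Random(0))
```

Keeping the perturbation strictly positive is what lets the novice outbid the current best competitor instead of merely tying with it and leaving its first trial to a random tie-break.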

Experiments

Putting the economy to the test

Partial vs. complete agents.

A partial agent is intentionally incomplete: a restricted action space, access to one tool, a short generation budget, a specialized role, or partial observation. A complete agent has the full task interface and attempts the task end-to-end. This lets us test whether economic organization can compensate for, or even outperform, capability concentrated in a single complete agent.

Tasks & baselines.

We instantiate EoM on five domains: math (MATH, easy-to-hard Levels 1–5, planner/executor/verifier capped at ~128 tokens), finance (Finance-Agent-Bench, four tools with each partial agent holding only one), science (FrontierScience-Research, literature/planner/executor/verifier roles), accelerator design (Gemmini suite, 24 ResNet-50 kernels, EDP minimization with Historian/Planner/Executor roles), and distributed systems (CloudCast from ADRS, iteratively improving a program to minimize transfer cost). Baselines include ReAct, GEA, OpenEvolve, Multi-Agent Debate, and the domain-specific DOSA.


Q1. Can economics turn weak individuals into stronger systems?

Across all five domains, economic coordination turns individually partial agents into collective systems that match or surpass complete-agent baselines. On MATH, EoM lifts Llama-3.1-8B from 15.9% to 57.0% and Gemma-2-9B from 4.2% to 45.1%, exceeding the complete agents (51.9% and 44.3%), even though each individual is role-specialized and capped at short outputs. On accelerator design, EoM reduces average EDP to 39.3, versus 43.1 for the same-backbone complete ReAct agent and 80.2 for the domain-specific DOSA baseline (Table 1). The same advantage holds on financial research, scientific research, and distributed-system optimization (Figure 4): EoM reaches 60.0% on Finance-Agent-Bench, 20.0% best-run accuracy on FrontierScience, and a best CloudCast cost of 657.

Table 1 (left) · MATH accuracy (%)

| Backbone | Partial agents (initial) | Partial agents (trained) | Complete |
| --- | --- | --- | --- |
| Llama-3.1-8B | 15.9 | 57.0 | 51.9* |
| Gemma-2-9B | 4.2 | 45.1 | 44.3* |

Table 1 (right) · Accelerator design, Avg. EDP ↓

| Method | Avg. EDP ↓ |
| --- | --- |
| DOSA | 80.2 |
| Complete agent (Gemma-4-31B-it) | 43.1 |
| EoM (Gemma-4-31B-it) | 39.3 |

Table 1. Performance across domains. Left: MATH accuracy comparing constrained populations to complete agents (* officially reported numbers). Right: accelerator-design results by average EDP (lower is better). EoM beats the corresponding baselines in both settings.

Finance-Agent-Bench training curves
(a) Finance. EoM rises from 45.0% to 60.0% over 30 tasks, beating Multi-Agent Debate (50.0%), ReAct (45.0%), and GEA (50.0%), even though each agent holds only one tool.
FrontierScience results
(b) FrontierScience. 8.5% mean / 20.0% best-run accuracy vs. 1.8% / 5.0% for GEA on the same Gemini-3-Flash backbone.
CloudCast cost vs OpenEvolve
(c) CloudCast. Average cost 673 over three attempts (best 657) vs. 930 for OpenEvolve, a 29% reduction in best cost with fewer episodes.

Figure 4 · Performance across domains. EoM consistently outperforms baselines, showing the benefit of economic coordination among partial agents.

The advantage is not merely “many agents” over “one agent.” A population of partial agents organized by economic interactions can match or surpass complete agents that have greater individual access to the task interface.


Q2. Beyond multiple agents, what is the role of the economic ingredients?

The gains depend on the economic dynamics that allocate control, transfer value, remove the unproductive, and propagate the successful, not on merely having multiple agents. Weakening these dynamics consistently reduces performance (Table 2). On MATH, the original system is strongest (43.9 mean / 57.0 best); changing rent or rewards lowers it. On Finance, removing exploration causes a large drop (26.0 mean / 40.0 best) and removing exploitation lowers the mean to 33.5. On CloudCast, a best-of-N multi-agent baseline reaches only 999, so repeated multi-agent sampling alone is insufficient.

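Table 2 (left) · MATH ablations (%)

| Configuration | Mean | Best |
| --- | --- | --- |
| Complete | 51.9 | 51.9 |
| Constrained, large rent (×10) | 41.8 | 47.0 |
| Constrained, small reward (×0.2) | 39.0 | 44.0 |
| Constrained, large reward (×4) | 40.9 | 47.0 |
| Constrained, original | 43.9 | 57.0 |

Table 2 (right) · Finance-Agent-Bench ablations (%)

| Configuration | Mean | Best |
| --- | --- | --- |
| Complete | 45.0 | 45.0 |
| Constrained, w/o auction | 48.0 | 58.5 |
| Constrained, w/o exploration | 26.0 | 40.0 |
| Constrained, w/o exploitation | 33.5 | 60.0 |
| Constrained, full | 52.5 | 65.0 |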

Table 2. Ablations on MATH (left) and Finance (right). Sensitivity to economic parameters and component removal. In both cases, the full / original system achieves the strongest overall results.


Q3. How does the economy improve performance?

What changes inside the society?

EoM improves by reshaping both the population and the agents themselves. On Finance, the trajectory is non-monotonic but improving (45.0 → dip during exploration → 60.0), like a market that first reallocates control and tests alternative specialists before converging. Accelerator design gives a direct view of the mechanism (Figure 5).

Figure 5 · Training dynamics in accelerator design. Per-agent wealth on three ResNet-50 kernels. Wealth flows to agents producing new EDP records; rent uniformly deducts wealth. Periodic births spawn good-birth children (★, exploitation: mutated from the richest) and bad-birth children (+, exploration: amended from the weakest); wealth < 0 triggers bankruptcy (×). (a) Both Historian descendants go bankrupt. (b) A Planner lineage reproduces twice while a Historian bad-birth child fails. (c) A strong Historian and a struggling Executor lineage co-exist.

Does the society learn reusable structure?

EoM achieves a 2.2× geometric-mean EDP gain over DOSA across all 24 ResNet-50 kernels, with much larger gains on the hardest: 37.5×, 26.3×, 17.3×, and 12.0× on Conv 14, 16, 17, and 4 (Figure 7a). These are the 1×1 convolutions in ResNet-50's bottleneck blocks, for which an output-stationary dataflow is a known effective pattern. EoM is never given this motif (the auction rewards only EDP record-breaks), yet the population repeatedly converges on the same tiling, recovering a transferable hardware/software co-design heuristic that DOSA misses.

Per-kernel EDP on ResNet-50
Figure 7a · Per-kernel accelerator EDP. Best EDP found by DOSA, ReAct, and EoM on each of the 24 ResNet-50 kernels (log scale; lower is better). Gains are structured rather than uniform, largest on the hardest 1×1 bottleneck convolutions.
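For readers unfamiliar with the aggregate used here, a geometric-mean gain over the 24 kernels would typically be computed as below. This is a sketch that assumes the per-kernel gain is the ratio of DOSA's best EDP to EoM's best EDP; the page does not state the exact definition, and the symbols $G$ and $\mathrm{EDP}_k$ are introduced only for this illustration.

```latex
G \;=\; \Bigg( \prod_{k=1}^{24}
      \frac{\mathrm{EDP}^{\mathrm{DOSA}}_k}{\mathrm{EDP}^{\mathrm{EoM}}_k}
    \Bigg)^{1/24}
  \;=\; \exp\!\Bigg( \frac{1}{24} \sum_{k=1}^{24}
      \ln \frac{\mathrm{EDP}^{\mathrm{DOSA}}_k}{\mathrm{EDP}^{\mathrm{EoM}}_k}
    \Bigg)
```

Under this definition a single large outlier moves the aggregate only modestly: one kernel with a 37.5× gain multiplies $G$ by $37.5^{1/24} \approx 1.16$, which fits the pattern of a moderate overall figure alongside very large gains on a few kernels.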

Q4. Is the evolving society robust, and does it generalize?

Do behaviors transfer from easy to hard?

Training on an easy-to-hard MATH stream improves not only the easier levels seen earlier but also harder levels initially beyond reach (Figure 6). Both backbones improve on every band, and Level 5 rises from ~10% to ~20%, so local reasoning routines learned on simple problems are recomposed on harder ones.

MATH performance across difficulty levels
Figure 6 · Easy-to-hard generalization on MATH. Test accuracy across difficulty levels during training. Gains extend to levels initially beyond the population's capability.

How sensitive is it to curriculum order?

Comparing the default easy-to-hard curriculum with a reversed hard-to-easy schedule, easy-to-hard stays ahead for most of training and finishes clearly higher (~57% vs. ~47%; Figure 7b). Partial specialists benefit from first mastering reusable local routines before confronting the hardest problems, though the system still improves under the reversed order.

Can a complete generalist monopolize the economy?

Adding a generalist with access to all tools alongside the partial specialists does not collapse the society. The generalist briefly expands around tasks 11–12, then contracts back to a single agent, while specialist populations (Edgar, Tavily) keep growing to ~5–8 agents (Figure 7c). The economy rewards local value: a specialist tuned to a narrow subproblem outcompetes a generalist whose prompt budget is spread thin.

Curriculum-order comparison on MATH
Figure 7b · Curriculum order on MATH. Easy-to-hard finishes at ~57%; the reversed hard-to-easy schedule plateaus around ~47%. Mastering reusable local routines first helps.
Generalist vs specialists on Finance
Figure 7c · Finance research with a generalist. A complete generalist with access to all tools does not automatically dominate; specialized populations continue to grow because the market favors locally more precise agents.
Citation

Cite this work

@misc{qi2026economymindsemergingmultiagent,
  title={Economy of Minds: Emerging Multi-Agent Intelligence with Economic Interactions},
  author={Zhenting Qi and Huangyuan Su and Ao Qu and Chenyu Wang and Yu Yao and Han Zheng and Kushal Chattopadhyay and Guowei Xu and Zihan Wang and Weirui Ye and Vijay Janapa Reddi and Ju Li and Paul Pu Liang and Himabindu Lakkaraju and Sham Kakade and Yilun Du},
  year={2026},
  eprint={2606.02859},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2606.02859},
}