Economy of Minds: Emerging Multi-Agent Intelligence with Economic Interactions

**Evolution of an agent society over a stream of tasks.** Each panel shows the population at a given stage: agents are continuously created, selected, connected, and eliminated. As the society encounters more tasks, ineffective agents are removed or corrected while useful ones persist and diversify, yielding a living and increasingly structured population.

Abstract

How can agents self-orchestrate without central control? We study an agent economy inspired by Hayekian markets: agents bid for the right to act, exchange payments, and gain wealth from rewards. These signals create decentralized credit assignment and planning. Effective agents persist and mutate; ineffective ones go bankrupt and are replaced.

Starting from weak agents, the economy discovers multi-step reasoning strategies and outperforms stronger monolithic baselines across math, finance, science, accelerator design, and distributed-system optimization. We also connect local incentives to long-term collective performance.

TL;DR

Design the incentives, not the coordination.

We replace central orchestration with a price system. Each agent has a wake-up condition, fixed bid, and wealth balance. Auctions select actors, payments move credit backward, and rent/bankruptcy prune weak agents. Specialization and coordination emerge from these simple rules.

Domain	Result	Metric
MATH	15.9 → 57.0	Pass@1 (%), Llama-3.1-8B partial agents, initial → after training
Finance	45.0 → 60.0	Finance-Agent-Bench score (%), initialization → after 30 training tasks
Science	5.0 → 20.0	FrontierScience best-run accuracy (%), GEA → EoM
Accelerator	80.2 → 39.3	Avg. EDP on ResNet-50 (lower is better), DOSA → EoM
Cloudcast	930 → 657	Best data-transfer cost (lower is better), OpenEvolve → EoM

Method

An economy of language agents

We model language agents as an economy. Each agent acts locally from its trigger and policy, while global coordination emerges from prices. The system has two loops: planning, which selects actions and assigns credit within an episode, and adaptation, which evolves the population across episodes.

3.1 Problem setup

We model the task as a partially observed Markov decision process ℰ = (𝒮, 𝒜, P, r, γ, μ₀), with state space 𝒮, action space 𝒜, transition kernel P(s' | s, a), reward r, discount γ, and initial-state distribution μ₀. At step t, the system observes o_t ∈ 𝒪.

All agents share a frozen LLM backbone; diversity comes from prompts. Each agent is a tuple

a = ( φ_a, π_a, b_a, W_a )

where φ_a : 𝒪 → {0, 1} is a triggering predicate for eligibility, π_a : 𝒪 → Δ(𝒜) is its policy, b_a ∈ ℝ_≥0 is its bid, and W_a ∈ ℝ is its wealth. The trigger and policy use agent-specific prompts p_a = (p_a^trig, p_a^act). At episode e, the active population is 𝒫_e. This generalizes Baum's Hayek machine from hand-written rules to prompted LLM agents.

3.2 Planning with auctions and transactions

At each step, agents compete for control. Given observation o_t, each agent checks its trigger. The eligible set is

E_t = { a ∈ 𝒫_e : φ_a(o_t) = 1 }

If E_t = ∅, no agent acts. Otherwise, the highest-bidding eligible agent wins:

a_t^★ ∈ argmax_{a ∈ E_t} b_a

with random tie-breaking. Control goes to the highest bidder in the current context, without a central policy.
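The selection rule can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the `Agent` class, the `run_auction` helper, and the substring-matching triggers are hypothetical stand-ins for the prompted LLM predicates φ_a, and the policy π_a is omitted because it plays no role in deciding who wins.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Agent:
    name: str
    trigger: Callable[[str], bool]  # phi_a: observation -> eligible? (hypothetical stand-in for an LLM predicate)
    bid: float                      # b_a, fixed once assigned
    wealth: float = 0.0             # W_a


def run_auction(population: Sequence[Agent], obs: str,
                rng: random.Random) -> Optional[Agent]:
    """Return the highest-bidding eligible agent, or None if no agent is eligible."""
    eligible = [a for a in population if a.trigger(obs)]  # E_t
    if not eligible:
        return None                                       # E_t is empty: no agent acts
    top = max(a.bid for a in eligible)
    tied = [a for a in eligible if a.bid == top]
    return rng.choice(tied)                               # random tie-breaking


# Usage with toy triggers (hypothetical).
pop = [
    Agent("planner", lambda o: "plan" in o, bid=2.0),
    Agent("executor", lambda o: "run" in o, bid=3.0),
    Agent("verifier", lambda o: "run" in o, bid=1.0),
]
rng = random.Random(0)
print(run_auction(pop, "run step", rng).name)  # executor
print(run_auction(pop, "idle", rng))           # None
```

No agent inspects another agent's policy or prompt: the bid is the only shared signal the auction uses.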

Auction mechanism diagram: **Auctions.** Agents whose wake-up conditions are satisfied become eligible to bid; the highest bidder wins the auction, executes the action, and advances the environment from *s_t* to *s_{t+1}*.

The winner a_t^★ acts, producing o_{t+1} and reward r_t. Let a_{t−1}^★ be the previous winner. We then apply a bucket-brigade transfer rule:

(1) W_{a_t^★} ← W_{a_t^★} − b_{a_t^★} + r_t,   W_{a_{t−1}^★} ← W_{a_{t−1}^★} + b_{a_t^★}

The winner pays its bid to the previous actor and receives any reward r_t. Because the first winner in an episode has no predecessor, its payment goes to the house.

Credit assignment via bucket-brigade transactions — **Transactions.** Credit assignment naturally emerges as profits flow backward through the action sequence, rewarding agents whose actions enable successful downstream outcomes.

This creates decentralized credit assignment. Agents profit by earning reward or by creating states that downstream agents value. Productive actions accumulate wealth; unproductive ones lose it.
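A worked episode makes the backward flow of credit concrete. The sketch below applies Eq. (1) step by step; the three-agent chain, the bids, and the terminal reward of 10 are hypothetical numbers chosen for illustration, and `bucket_brigade` is an illustrative helper rather than the authors' code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Agent:
    name: str
    bid: float
    wealth: float = 0.0


def bucket_brigade(steps: List[Tuple[Agent, float]]) -> float:
    """Apply Eq. (1) to an episode given as (winner, reward r_t) pairs.

    Returns the amount collected by the house (the first winner's bid)."""
    house = 0.0
    prev: Optional[Agent] = None
    for winner, reward in steps:
        winner.wealth += reward - winner.bid  # winner pays its bid and collects r_t
        if prev is None:
            house += winner.bid               # first payment of the episode goes to the house
        else:
            prev.wealth += winner.bid         # previous winner is paid the current bid
        prev = winner
    return house


planner = Agent("planner", bid=1.0)
executor = Agent("executor", bid=2.0)
verifier = Agent("verifier", bid=3.0)
house = bucket_brigade([(planner, 0.0), (executor, 0.0), (verifier, 10.0)])
print(planner.wealth, executor.wealth, verifier.wealth, house)  # 1.0 1.0 7.0 1.0
```

The planner and executor never receive the reward directly, yet both end the episode in profit because the next agent paid for the state they left behind. Bids only move wealth between parties, so the net gains (1 + 1 + 7, plus 1 to the house) sum to the reward of 10.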

3.3 Adaptation with exploration and exploitation

Across episodes, economic selection evolves the population. A prompt generator 𝒢 proposes new agents by mutating successful prompts or amending failed ones. New agents start with wealth W₀ ≥ 0; existing agents may pay rent ρ ≥ 0.

Exploitation.

Useful agents accumulate wealth, persist, and periodically spawn mutations. This reuses strong behaviors, refines them, and encourages specialization. Mutations preserve useful triggers or policies while adding small variations.

Exploration.

Weak or inactive agents lose wealth. Once bankrupt, they are removed and replaced by random or complementary variants. This turnover lets the population learn from failures, explore new behaviors, and reduce the risk of premature convergence.

Between episodes, the population update has three stages:

Rent: each agent pays ρ, so W_a ← W_a − ρ;
Removal: agents with W_a < 0 are deleted;
Injection: new agents are added according to exploitation and exploration until the population satisfies the prescribed maximum-size constraints.
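The three between-episode stages can be sketched as a single function. This is a minimal sketch under stated assumptions: `propose` is a hypothetical callable standing in for the prompt generator 𝒢, the population is simply refilled up to one maximum size, and the exploitation/exploration mix of injected agents is left entirely to `propose`.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Agent:
    prompt: str
    wealth: float


def update_population(pop: List[Agent], rent: float, w0: float, max_size: int,
                      propose: Callable[[List[Agent]], str]) -> List[Agent]:
    # Rent: every agent pays rho (wealth is updated in place).
    for a in pop:
        a.wealth -= rent
    # Removal: agents with W_a < 0 are bankrupt and deleted.
    survivors = [a for a in pop if a.wealth >= 0]
    # Injection: new agents start with wealth W0 until the size cap is reached.
    while len(survivors) < max_size:
        survivors.append(Agent(prompt=propose(survivors), wealth=w0))
    return survivors


def toy_generator(pop: List[Agent]) -> str:
    # Hypothetical G: mutate the richest survivor's prompt (exploitation only).
    if pop:
        return max(pop, key=lambda a: a.wealth).prompt + " (mutated)"
    return "seed prompt"


pop = [Agent("A", 5.0), Agent("B", 0.5), Agent("C", 2.0)]
new_pop = update_population(pop, rent=1.0, w0=1.0, max_size=3, propose=toy_generator)
print([(a.prompt, a.wealth) for a in new_pop])
# [('A', 4.0), ('C', 1.0), ('A (mutated)', 1.0)]
```

A fuller generator would also amend the prompts of removed agents for exploration; the sketch keeps only the mutation branch to stay short.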

Bids are assigned at birth and then frozen. For a new agent a′, let t be its first eligible step, and let C_t = { a ∈ 𝒫_e ∖ {a′} : φ_a(o_t) = 1 } be the set of competing eligible agents. Its bid follows the novice rule:

(2) b_a′ = ( max_{a ∈ C_t} b_a ) + ε_a′, ε_a′ ∼ 𝒟_ε

with max ∅ := 0 and 𝒟_ε a small positive perturbation. Because b_a′ exceeds every competing bid at step t, the newcomer wins its first auction, so it is tested at least once before market selection decides whether it survives.
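The novice rule in Eq. (2) is a one-liner once 𝒟_ε is fixed. The text only says 𝒟_ε is a small positive perturbation; the uniform distribution on [1e-6, `eps_max`] below is our assumption, as is the helper name `novice_bid`.

```python
import random
from typing import Sequence


def novice_bid(competing_bids: Sequence[float], rng: random.Random,
               eps_max: float = 0.01) -> float:
    """Eq. (2): bid just above every competing eligible agent, with max over the empty set = 0."""
    base = max(competing_bids, default=0.0)
    eps = rng.uniform(1e-6, eps_max)  # assumed D_eps: small, strictly positive
    return base + eps


rng = random.Random(42)
b = novice_bid([2.0, 3.5], rng)
print(3.5 < b < 3.52)  # True
```

One reading of the design choice: bidding only ε above the incumbent keeps a trial cheap, since the newcomer pays barely more than the going price for that context.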

Exploitation preserves useful behaviors; exploration adds novelty. Evolution is driven only by wealth gains and losses, with no central supervision or global labels.

3.4 Training and evaluation

During optimization, agents bid for control, act, transfer wealth, and receive environmental rewards. These signals drive adaptation: bankrupt agents are removed, profitable ones persist, successful prompts mutate, and the population is replenished.

During evaluation, the population and bids are frozen. Payments, rewards, rent, births, and mutations are disabled, and each test task runs on a thread-local snapshot. Evaluation therefore measures the learned policy without further wealth dynamics.
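Running each test task on its own copy of the population is a simple way to guarantee that evaluation cannot leak wealth or population changes back into the trained economy. The sketch below approximates the thread-local snapshot with a per-task `copy.deepcopy` inside a thread pool; `evaluate` and `run_frozen_episode` are hypothetical names, and the toy episode deliberately writes to its snapshot to show the isolation.

```python
import copy
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence


def evaluate(population: List[Any], tasks: Sequence[Any],
             run_frozen_episode: Callable[[List[Any], Any], Any],
             workers: int = 4) -> List[Any]:
    """Run each task on a private snapshot of the population; results keep task order."""
    def run_one(task: Any) -> Any:
        snapshot = copy.deepcopy(population)  # task-local copy: nothing written here reaches `population`
        return run_frozen_episode(snapshot, task)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, tasks))


def toy_episode(snapshot: List[dict], task: int) -> int:
    snapshot[0]["wealth"] = 999.0  # even a stray write only touches the snapshot
    return task * 2


population = [{"name": "planner", "wealth": 1.0}]
print(evaluate(population, [1, 2, 3], toy_episode))  # [2, 4, 6]
print(population[0]["wealth"])                       # 1.0
```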

Experiments

5.1 Setup

Partial vs. complete agents.

A partial agent has limited tools, actions, context, or output budget. A complete agent has the full task interface. This tests whether economic organization can rival capability concentrated in one agent.

Baselines.

We compare against complete-agent baselines (ReAct, GEA, OpenEvolve), a partial-agent baseline (Multi-Agent Debate), and DOSA for accelerator design.

5.2 Can economics turn weak individuals into stronger systems?

Across five domains, economic coordination turns partial agents into stronger systems. On MATH, EoM improves Llama-3.1-8B from 15.9% to 57.0% and Gemma-2-9B from 4.2% to 45.1%, beating the complete-agent baselines (51.9% and 44.3%). On accelerator design, EoM lowers average EDP to 39.3, versus 43.1 for complete ReAct and 80.2 for DOSA.

Task. Finance-Agent-Bench with four tools; each partial agent gets one.

Finance-Agent-Bench training curves — **Finance-Agent-Bench.** EoM rises from 45.0% at initialization to 60.0% after 30 training tasks, outperforming Multi-Agent Debate (50.0%), ReAct (45.0%), and GEA (50.0%) — even though each partial agent in EoM can access only one tool.

Task. FrontierScience-Research with literature, planner, executor, and verifier roles.

FrontierScience results — **FrontierScience.** EoM reaches 8.5% mean / 20.0% best-run accuracy on open-ended scientific questions, versus 1.8% mean / 5.0% best-run for GEA under the same Gemini-3-Flash backbone.

Task. Cloudcast from ADRS: iteratively improve a program to reduce data-transfer cost.

Cloudcast cost trajectories vs OpenEvolve: **Cloudcast.** EoM reaches an average total cost of 673 over three attempts, with the best attempt at 657, versus 930 for OpenEvolve. This is a 29% reduction in best cost (28% for the three-attempt average), achieved with fewer optimization episodes.
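The relative reductions follow directly from the reported costs. The short check below computes them for EoM's best attempt (657) and its three-attempt average (673) against OpenEvolve's 930; `relative_reduction` is an illustrative helper name.

```python
def relative_reduction(baseline: float, ours: float) -> float:
    """Fractional cost reduction of `ours` relative to `baseline` (lower cost is better)."""
    return (baseline - ours) / baseline


best = relative_reduction(930, 657)  # best EoM attempt
avg = relative_reduction(930, 673)   # EoM mean over three attempts
print(f"best: {best:.1%}, average: {avg:.1%}")  # best: 29.4%, average: 27.6%
```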

The gain is not just “many agents.” Economic interactions let limited agents match or surpass stronger complete agents.

Task. MATH on an easy-to-hard stream, with planner/executor/verifier agents capped at ~128 output tokens.

Table 1.1 · MATH accuracy (%)

Backbone	Partial agents, initial	Partial agents, after training	Complete agent
Llama-3.1-8B	15.9 (1.37)	57.0 (3.36)	51.9*
Gemma-2-9B	4.2 (0.52)	45.1 (4.12)	44.3*

Task. Gemmini mapping search over 24 ResNet-50 kernels, minimizing EDP with Historian/Planner/Executor roles.

Table 1.2 · Accelerator design — Avg. EDP (μJ·Mcyc) ↓

Method	Avg. EDP ↓
DOSA	80.2
Complete agent (Gemma-4-31B-it)	43.1
EoM (Gemma-4-31B-it)	39.3

Table 1. Performance. Left: MATH accuracy (* official numbers). Right: accelerator-design average EDP (lower is better). EoM outperforms the corresponding baselines.

5.3 Beyond multiple agents: the role of economic ingredients

The gains depend on economics, not just multiplicity: control allocation, value transfer, selection, and propagation all matter. Weakening them reduces performance.

On MATH, the original setting is the strongest constrained variant (43.9 mean / 57.0 best). On Finance-Agent-Bench, removing exploration, exploitation, or auctions lowers both mean and best scores relative to the full system. Cloudcast reinforces the point: EoM reaches a cost of 673 (its three-attempt average; best attempt 657), while best-of-N sampling reaches 999.

Table 2.1 · MATH ablations (%)

Agent	Variant	Mean	Best
Complete	–	51.9	51.9
Constrained	large rent (×10)	41.8	47.0
Constrained	small reward (×0.2)	39.0	44.0
Constrained	large reward (×4)	40.9	47.0
Constrained	original	43.9	57.0

Table 2.2 · Finance-Agent-Bench ablations (%)

Agent	Variant	Mean	Best
Complete	–	45.0	45.0
Constrained	w/o auction	48.0	58.5
Constrained	w/o exploration	26.0	40.0
Constrained	w/o exploitation	33.5	60.0
Constrained	full	52.5	65.0

Table 2. Ablations. Economic parameters and component removals both affect performance; the full/original system is strongest overall.

5.4 How does the economy improve performance?

What changes inside the society as performance improves?

EoM improves by reshaping both agents and population structure. On Finance-Agent-Bench, performance dips during exploration, then rises from 45.0 to 60.0 as control shifts toward stronger specialists.

Per-agent wealth trajectories on accelerator design tasks — **Training dynamics in accelerator design.** Per-agent wealth on three representative ResNet-50 kernels. Wealth flows to agents producing new EDP records; rent uniformly deducts wealth. Periodic births spawn *good-birth* children (★, exploitation: mutated from the richest agent) and *bad-birth* children (+, exploration: amended from the weakest); wealth < 0 triggers bankruptcy (×). Shaded bands are rolling ±1σ. **(a)** Both Historian descendants bankrupt — inherited bias fails market pressure. **(b)** A Planner lineage reproduces twice while a Historian bad-birth child eventually fails. **(c)** A strong Historian and a struggling Executor lineage co-exist.

Does the society learn reusable structure?

EoM achieves a 2.2× geometric-mean EDP gain over DOSA across 24 ResNet-50 kernels, with much larger gains on the hardest 1×1 bottlenecks. Without being given the output-stationary motif, the population repeatedly rediscovers it through EDP rewards.
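A geometric-mean gain averages per-kernel ratios in log space, so one kernel with a very large gain cannot dominate the summary the way it would in an arithmetic mean. The sketch assumes the standard definition, the geometric mean of DOSA EDP / EoM EDP over kernels; the three-kernel numbers are hypothetical, not values from the paper.

```python
import math
from typing import Sequence


def geomean_gain(baseline: Sequence[float], ours: Sequence[float]) -> float:
    """Geometric mean of per-kernel ratios baseline / ours (> 1 means ours is better)."""
    if len(baseline) != len(ours) or not baseline:
        raise ValueError("need matched, non-empty per-kernel EDP lists")
    log_ratios = [math.log(b / o) for b, o in zip(baseline, ours)]
    return math.exp(sum(log_ratios) / len(log_ratios))


# Hypothetical EDPs for three kernels: an 8x gain on one, no gain on the other two.
baseline = [8.0, 1.0, 1.0]
ours = [1.0, 1.0, 1.0]
print(round(geomean_gain(baseline, ours), 6))  # 2.0
arith = sum(b / o for b, o in zip(baseline, ours)) / len(ours)
print(round(arith, 4))  # 3.3333
```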

Per-kernel EDP on ResNet-50 — **Per-kernel accelerator EDP.** Best EDP found by DOSA, ReAct, and EoM on each of the 24 ResNet-50 convolution kernels (log scale; lower is better). Gains are structured rather than uniform: largest on the hardest 1×1 bottleneck convolutions.

5.5 Robustness and generalization

Do learned behaviors transfer from easier tasks to harder ones?

On MATH, easy-to-hard training improves every difficulty band, including harder levels not seen early. Both backbones lift Level 5 from ~10% to ~20%, suggesting simple routines transfer to harder problems.

MATH performance across difficulty levels — **Easy-to-hard generalization on MATH.** Test accuracy across difficulty levels during training. The partial-agent population improves not only on easier levels seen earlier, but also on harder levels initially beyond its capability — behaviors learned on simple problems are reused on more difficult ones.

How sensitive is the society to curriculum order?

Both schedules improve early, but easy-to-hard stays ahead and finishes higher: ~57% versus ~47%. Partial specialists benefit from learning reusable routines before facing the hardest problems.

Curriculum-order comparison on MATH — **Curriculum learning on MATH.** Comparison between the default easy-to-hard curriculum and a reversed hard-to-easy schedule. The easy-to-hard ordering finishes at ~57%; the reversed ordering at ~47%.

Can a complete generalist monopolize the economy?

Adding an all-tool generalist does not collapse the society. The generalist population briefly expands, then shrinks back to a single agent while the specialist populations keep growing. The economy rewards local precision over broad but diluted capability.

Generalist vs specialists on Finance — **Finance research with a generalist.** A complete generalist with access to all tools does not automatically dominate. Specialized populations continue to grow because the market favors locally more precise agents.

Economic dynamics shape both agent policies and social structure. Coordination emerges from aligned local incentives.

Takeaway

From engineering coordination to designing incentives

Simple economic interactions among prompted LLM agents recover specialization, credit assignment, and cross-task transfer across five domains. Rather than centrally engineered pipelines, EoM points toward evolving agent societies shaped by their economies.

Limitation: adaptation is prompt-space only; parameter-space, hybrid, multimodal, and embodied extensions remain future work.
