What happens to the world when you change the rules?
FlowWorld is a research grade simulator of around fifteen hundred real cities. It has a vectorized
three sector economy, an autonomous infrastructure planner that decides where to build rail and
shipping links on its own, and a machine learning optimizer that hunts for the best mix of policy
levers under randomized shocks. We ran eight contrasting policy scenarios for two hundred timesteps
each, then ran one shock augmented optimizer search across forty eight trials. This document is the
writeup of what the model actually said.
8 scenarios analyzed
200 timesteps each
320 cities × 1,800 edges
3.41× best GDP found
The question this project exists to answer
“If you change borders, transit links, housing rules, and infrastructure spending, what happens to
where people live, the global economy, and which cities boom or collapse over time?”
FlowWorld is not trying to predict reality. It is, very explicitly, a policy laboratory.
The point is to ask “what direction does the world move when I twist this knob?” and get a clean,
reproducible, falsifiable answer in under a minute. Same seed, same world. Same parameters, same
trajectory. Every screenshot in this report can be regenerated bit for bit by anyone who runs the
same script, and every claim in the findings can be argued with on the basis of actual numbers.
That matters because most “what if we opened borders” conversations are run as thought experiments
where the person making the argument never has to commit to a specific prediction. A simulation
forces commitment. You set the friction parameter to 0.05, you press play, and the model gives you
an actual number for global GDP, an actual list of cities that grew, and an actual list of cities
that collapsed. You can disagree with the model on its assumptions. You cannot escape it by being
vague.
The scenarios in this report were not cherry picked to produce nice narratives. The eight presets
were taken straight from the project's scenario library, chosen to span the design space (borders,
infrastructure, housing, migration, prices). The numbers came out the way they came out, including
the ones that were the opposite of what we expected. Especially those.
How this project actually started
Before any of the simulator was written, somebody sat down and wrote out exactly what they wanted to
be able to ask of it. The four original questions are still visible in the codebase today, because
every feature ended up being a tool built to answer one of them.
The original mission, in one sentence
Build a research grade simulator that can honestly answer what the world looks like under different
border policies, where people end up moving when cities run out of room, how resources and trade
routes shape the economy over time, and which infrastructure decisions actually pay off under a
real budget constraint.
That sentence is the reason every other piece of the project exists. The migration model is there
because of borders and crowding. The three sector economy is there because of resources and trade.
The autonomous infrastructure planner with its budget cap is there because of the infrastructure
question. When you look at a feature in the simulator and wonder why it is there, the answer is
almost always “because somebody needed to be able to answer one of the four original questions.”
The four research questions, in plain English
The original framing broke that mission into four specific questions. They are worth stating
individually because each one is deliberately structured to admit a tradeoff rather than to have a
single right answer:
Migration dynamics. How strongly do wages, housing costs, amenities, and connectivity actually drive where people move? And at what point do crowding and rent in the destination start to self-limit how many more people will keep coming?
Infrastructure policy. Which kinds of infrastructure (rail, pipeline, shipping, road) produce the highest long run GDP gain per dollar spent? Under a real budget, when should you build now versus wait?
Resource economy. How do extraction, depletion, logistics, and market pricing interact in practice? Does better connectivity make resource prices more competitive, or does it just shift who captures the profit?
Stability and equity. Can a policy raise GDP while also controlling overcrowding, urban concentration, and economic volatility? Which policy mixes hold up best when something unexpected hits (a resource crunch, a broken trade route, a wave of migration)?
Notice that each question has two halves that pull against each other. Growth versus crowding.
Investment now versus investment later. Prices versus rents. Aggregate output versus distribution.
The project assumed from day one that good policy is a negotiation, not a maximum.
The honesty pledge
The third commitment the project made to itself was a hard line against fantasy modeling. The
statement, more or less verbatim:
100 percent real-world accuracy is not achievable in complex social and economic systems. The goal
is high fidelity calibrated modeling with quantified uncertainty, not perfect prediction.
That sentence is doing a lot of work. It is the difference between an honest simulator and a crystal
ball pretending to be one. A lot of policy modeling fails because the modeler quietly forgets that
their assumptions are assumptions. The project committed up front to treating the simulator as a
calibrated estimator rather than a forecast. You can see that commitment in the user interface
today: every run displays a confidence level (the trust stoplight in the top bar) and the right
panel has a “factor coverage disclosure” section that lists every economic factor the model knows
about along with how complete the underlying data actually is. The simulator tells you when it is
guessing.
The eight non-negotiable design principles
Eight principles were written down as non-negotiable. They are not features. They are constraints
on what counts as a valid version of the project. Each one shows up later as something concrete you
can actually see:
Deterministic replay. Every run can be reproduced bit for bit by anyone with the same seed. (This is why every result in this report uses seed 42.)
Real budget constraints on infrastructure. The autonomous planner cannot build forever. It runs out of money and stops.
Explicit affordability and capacity limits. Housing costs respond to density. People cannot ignore the cost of where they choose to live.
Real resource stocks that get depleted. A mineral rich city cannot pump forever. Reserves draw down over time.
Real supply and demand pricing. Three economic sectors (resources, goods, IP) each have their own price index that affects both GDP and the migration choice.
Full causality logging. Every timestep produces a written record of what happened and why, in plain text. Any decision the simulator made can be reviewed after the fact.
Scenario testing under shocks. A library of 24 contrasting scenarios is built in, and the optimizer evaluates candidates under randomized shocks. Policy that breaks under stress gets penalized.
Explainability. The optimizer does not just output a winning parameter set. It outputs the learning curve, the spread of candidates, and the score breakdown. You can argue with its answer.
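The first principle, deterministic replay, is easy to illustrate. The sketch below is not the simulator's code. It is a toy stand-in (the function `run_toy_world` and its growth rule are hypothetical) that shows the general pattern: route all randomness through one seeded generator, and the same seed yields the same trajectory.

```python
import numpy as np

def run_toy_world(seed: int, steps: int = 5) -> np.ndarray:
    # Hypothetical stand-in for a simulator run. All randomness flows
    # from one seeded generator, so the result depends only on the seed.
    rng = np.random.default_rng(seed)
    population = rng.uniform(1e5, 1e6, size=8)
    for _ in range(steps):
        population = population * (1.0 + rng.normal(0.0, 0.01, size=8))
    return population

first = run_toy_world(seed=42)
second = run_toy_world(seed=42)   # same seed, same world
other = run_toy_world(seed=43)    # different seed, different world
```

Comparing `first` and `second` with `np.array_equal` is the kind of check a replay test could use.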
How the project actually got built
The project came together in three rough phases. Each one had a different goal and produced a
different kind of work. None of them are visible to a user dropping in today, but the order they
happened in is the reason the simulator behaves the way it does.
First: set the rules of the game
Before any of the simulation code got written, the project spent time writing down what would count
as a finished feature, what would count as a valid experiment, and how decisions would get reviewed.
A governance charter. A quality checklist. A template for documenting every assumption. A schema for
how to describe an experiment so that someone else could reproduce it later.
This sounds boring. It is the opposite of fun engineering work. But it is the reason the rest of
the project did not turn into a research toy that nobody could trust. When the simulator later
started producing data, every result had a place to live, a quality bar it had to pass, and a
paper trail. Most simulation projects skip this step and pay for it the first time someone asks
“wait, is that number real or is it from the early prototype?” This one paid up front instead.
Second: build a data layer that does not lie
The middle phase was about making sure the cities in the simulator were actually cities, not made
up dots on a map. The simulator can fall back to synthetic placements if real data is unavailable,
but the real version uses the GeoNames worldcities dataset, which contains real coordinates,
populations, and country assignments for hundreds of thousands of places.
Getting that data into the simulator was not as simple as importing a CSV. The project built quality
protocols for ingesting city data, a license and jurisdiction matrix so each data source's legal
boundaries were documented, validation rules that flagged missing or anomalous entries, and provenance
audits that traced each field in each city back to its source. Once those were running, the project
could honestly say which cities in the simulation were real and which were filler, and could report
a per city confidence level. (This shows up in the user interface as the “city accuracy” field
visible in the place inspector.)
By the end of this phase, the simulator had a real data layer with documented provenance and a
quality gate that any new dataset had to pass before it could be used in a run.
Third: turn the research tool into something that looks alive
The most ambitious phase was about making the simulator feel like something you actually want to
watch. The visual aspiration was explicit: it should feel like watching the popular city building
game Cities Skylines, but with real planetary data and a real economic engine underneath. That meant
a lot of work on the frontend (the deck.gl rendering layer, the timeline scrubber, the camera, the
flow animations) and a lot of work on the simulation output (per timestep flow records, per leg
throughput data, named bottlenecks, replay bundles).
The same phase also expanded the economy. The original simulator had three abstract sectors
(resources, goods, IP). The expanded version turns those into a real supply chain twin with named
commodities (oil, raw materials, food and agriculture, manufactured goods, technology and IP),
modal transport (rail, pipeline, shipping, road, air), corridor level throughput tracking, and
automated detection of where bottlenecks form. The “commodity master list” you see in the right
panel of the UI is the user facing surface of that work.
Current state
The implementation is done, the runs work, the optimizer works, and the visualization is fully
operational. What remains is formal research signoff: five research gates covering city schema
accuracy, resource accuracy, connectivity, supply chain economics, and visualization integrity
are drafted but not yet signed off, mostly because nobody had produced an evidence bundle that
answered them in one place. This report is essentially that evidence bundle.
Architecture, briefly
Two services run locally. A Python backend on port 8010 does all the math. A web frontend on port
5173 draws the map and the dashboards. They talk over WebSocket so the simulation can stream
timestep updates to the browser in real time, up to about thirty steps per second.
Backend
Python with FastAPI for the API and NumPy for the math. The simulation engine does every per
city calculation as a parallel array operation rather than a Python loop. (Practically, that is
the difference between a run that takes ten minutes and a run that takes ten seconds.) A separate
optimizer module wraps the simulator with a cross entropy search loop. World generation can read
real city data from GeoNames or fall back to synthetic placements if no dataset is installed.
Every run gets persisted in three formats: structured data for replay, raw rows for analysis,
and a human readable play by play log.
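To make the loop versus array distinction concrete, here is a minimal sketch. The wage rule and the 0.5 coefficient are invented for illustration and are not taken from the codebase. The point is only that both functions compute the same numbers, while the second does it as one array expression over all cities.

```python
import numpy as np

def wage_loop(productivity, housing_cost):
    # Per-city Python loop: one interpreter round trip per city.
    out = np.empty_like(productivity)
    for i in range(productivity.shape[0]):
        out[i] = productivity[i] - 0.5 * housing_cost[i]
    return out

def wage_vectorized(productivity, housing_cost):
    # Same arithmetic, expressed as a single array operation.
    return productivity - 0.5 * housing_cost

rng = np.random.default_rng(42)
productivity = rng.uniform(1.0, 2.0, size=320)   # 320 cities, as in the runs
housing_cost = rng.uniform(0.1, 1.0, size=320)

looped = wage_loop(productivity, housing_cost)
vectorized = wage_vectorized(productivity, housing_cost)
```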
Frontend
React 18 with TypeScript, served by Vite. The map is built with deck.gl on top of a MapLibre GL
basemap (the same technology Mapbox uses, but free and open source). Each timestep arrives over
WebSocket as a JSON message containing per city state, top migration arcs, and global metrics.
Each is rendered as a separate visual layer that the user can toggle independently.
Optimizer
A from scratch cross entropy parameter search over twelve economy levers. Each candidate parameter
set runs a full simulation, augmented with three randomized shocks (resource crunch, infrastructure
failure, migration spike), and gets scored on GDP, volatility, concentration, overpopulation, and
infrastructure spend overage. The top thirty percent of candidates per iteration become the
“elites” and get used to refit the sampling distribution for the next round.
POST /api/optimize
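The elite and refit loop described above can be sketched in a few lines. This is a generic cross entropy search under stated assumptions, not the project's optimizer: the Gaussian sampling distribution, the function names, and `toy_score` (a stand-in for "run the simulator under shocks and score it") are all hypothetical. The iteration count, trial count, and elite fraction follow the figures quoted in this report.

```python
import numpy as np

def cross_entropy_search(score_fn, dim, iterations=4, trials=12,
                         elite_frac=0.3, seed=42):
    # Sample candidates from a Gaussian, keep the top elite_frac by score,
    # then refit the Gaussian's mean and spread to those elites.
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    std = np.ones(dim)
    n_elite = max(1, int(round(elite_frac * trials)))
    best_x, best_score = None, -np.inf
    for _ in range(iterations):
        candidates = rng.normal(mean, std, size=(trials, dim))
        scores = np.array([score_fn(c) for c in candidates])
        elites = candidates[np.argsort(scores)[-n_elite:]]
        mean = elites.mean(axis=0)
        std = elites.std(axis=0) + 1e-6  # small floor so sampling never collapses
        top = int(np.argmax(scores))
        if scores[top] > best_score:
            best_x, best_score = candidates[top].copy(), float(scores[top])
    return best_x, best_score

def toy_score(x):
    # Hypothetical objective: higher is better, peak at 0.5 on every lever.
    return -float(np.sum((x - 0.5) ** 2))

best_x, best_score = cross_entropy_search(toy_score, dim=12)
```

In the real optimizer the score would come from a full simulation run rather than a closed-form function, which is why each trial is expensive and the trial budget is small.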
Data layer
Storage follows a standard data lakehouse layout, splitting raw inputs from curated tables from
ready to use features. Every run produces a timestamped summary, a replay bundle, and a decision
audit trail. Twenty four scenario presets ship with the project. Every result in this report can
be regenerated from those presets and a fixed seed.
What the interface looks like
Three panels. The map fills the middle. The left sidebar holds the scenario library, twelve economy
sliders, and all the layer toggles. The right panel shows the inspector (with five tabs in Analyst
mode: Node, Leg, Chain, Policy, Trust) and the live analytics dashboards.
The default view, about 170 timesteps in. Cyan and blue arcs are the top migration flows for this frame. The bright yellow arc near East Asia is a freshly built policy bridge (rail) that the autonomous planner constructed. The right panel shows the largest active corridors sorted by throughput, with Tianjin to Beijing leading at roughly 211,690 tons per day.
Money mode, revenue overlay. Each trade corridor is recolored by whether moving goods across it is profitable. Green means the throughput times the unit price exceeds the transport cost. Amber means net loss. The line width still encodes throughput. This view is the fastest way to see which trade routes are economically alive and which are just structurally present.
The population heatmap layer. Hotspots are East Asia, South Asia, West Africa, and Western Europe. Even at a glance this tells you where the model expects migration pressure to concentrate.
Zoomed all the way into northwest Spain. Real city labels (Santiago de Compostela, Lugo, A Coruña) come from the GeoNames worldcities dataset. The simulation is not abstract: these are actual places at their actual coordinates.
Analyst mode after applying the Open Borders preset. All twelve economy sliders are exposed (Border friction, Connectivity, Agglomeration, Housing elasticity, Migration propensity, Softmax temperature, three price indices, R&D efficiency, Infra budget, Link base cost). The right panel switches to a five tab inspector with full SLO dashboards and a commodity master list.
The same world, recolored six different ways
The Node metric dropdown changes only what the city dots are colored by. The underlying simulation does not change. This is useful for asking “where are the wages high?” versus “where is housing expensive?” without changing anything else.
Wage
GDP per capita
Housing cost
Resource output
Goods output
IP / technology output
How a single timestep actually works, in plain English
This is the part the README does not explain. Every timestep, the simulator does the same five things,
in order, over every city in parallel. Once you understand them, the findings later in this report
start to feel inevitable rather than mysterious.
Step 1: Every city gets an attractiveness score
Imagine you are deciding where to move. You probably do not consciously calculate a number, but you
do weigh things: how much money you can make there, how nice it is to live there, how expensive it
is to put a roof over your head, and how many of your industry's jobs are nearby. The simulator
does the same thing, just explicitly. Every city, every step, gets a single number called its
“destination utility,” which is roughly: wages, plus productivity, plus amenity, plus an
agglomeration bonus for being big, minus housing cost, plus a bonus for being near other things
you might need (resources, factories, ports, universities, talent).
The most important phrase in that sentence is “bonus for being big.” The simulator gives a city extra
attractiveness just because a lot of people already live there. That single mechanic, called
agglomeration, is responsible for almost every interesting dynamic in the model. It is the reason
Tokyo keeps growing. It is the reason Pittsburgh shrank for fifty years. It is also a feedback loop:
bigger cities attract more people, who make them bigger, who attract more people. Without a
counterweight (like housing cost), the model would collapse the entire world into one mega city.
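As a rough sketch of that score, assuming an additive form with unit weights and a logarithmic size bonus (both are assumptions for illustration; the function name and the `agglomeration` default are hypothetical):

```python
import numpy as np

def destination_utility(wage, productivity, amenity, population,
                        housing_cost, proximity, agglomeration=0.1):
    # Additive utility following the description in the text. The log form
    # of the size bonus and the equal weights are illustrative assumptions.
    size_bonus = agglomeration * np.log1p(population)
    return wage + productivity + amenity + size_bonus - housing_cost + proximity

# Two cities identical in everything except size.
ones = np.array([1.0, 1.0])
population = np.array([1e5, 1e7])
utility = destination_utility(ones, ones, 0.5 * ones, population,
                              housing_cost=0.3 * ones, proximity=0.2 * ones)

# Same cities, but the big one has become expensive.
pricey = destination_utility(ones, ones, 0.5 * ones, population,
                             housing_cost=np.array([0.3, 1.5]),
                             proximity=0.2 * ones)
```

With equal housing costs the larger city scores higher purely because of its size. Raising its housing cost far enough reverses the ranking, which is the counterweight the text describes.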
Step 2: Residents pick where to go
Once every city has a score, each city's residents decide whether to move and, if so, where. The
simulator does this with a mechanism called softmax, which is just a fancy way of
saying “the probability you move to city X is the exponentiated score of city X divided by the sum of
exponentiated scores for every place you might go, with a temperature knob that controls how rational the choice is.”
Two things happen during this step that matter for everything downstream. The first is that the score
of a destination from your point of view depends not just on how attractive it is, but on how hard it
is to get there. That is where friction comes in. Distance friction, the connectivity multiplier, and
most importantly, border friction (an extra penalty if the destination is in a
different group from where you currently live) all apply here. Crank up border friction and people
stop crossing groups, even if the other side is much more attractive. Crank it down and the borders
effectively dissolve.
The second is that the total number of movers each step is just migration_propensity ×
population. The “Migration Surge” scenario in this report roughly triples that
propensity. But (and this is the key insight that explains finding number one later in this report) more
movers does not mean different choices. The softmax is still the softmax. The first ten percent of
your population making the optimal choice and the next twenty percent making the optimal choice end
up at roughly the same destinations. That is why tripling migration does not triple GDP growth.
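Both halves of this step fit in a short sketch. It assumes border friction is subtracted from the utility of out-of-group destinations before a temperature-scaled softmax. The additive penalty, the function name, and the propensity values 0.02 and 0.06 are hypothetical choices for illustration, not values from the codebase.

```python
import numpy as np

def destination_probabilities(utility, same_group, border_friction, temperature):
    # Penalize destinations outside the origin's group, then apply softmax.
    effective = utility - border_friction * (~same_group)
    z = (effective - effective.max()) / temperature  # shift for numerical stability
    weights = np.exp(z)
    return weights / weights.sum()

utility = np.array([1.0, 1.2, 2.0])
same_group = np.array([True, True, False])  # third city is across a border

open_p = destination_probabilities(utility, same_group, 0.05, temperature=0.5)
closed_p = destination_probabilities(utility, same_group, 1.0, temperature=0.5)

# Movers per step are propensity times population. Tripling propensity
# scales every flow by three but leaves the destination shares unchanged.
population = 1_000_000
flows_base = 0.02 * population * open_p
flows_surge = 0.06 * population * open_p
```

The last two lines are the whole argument in miniature: the surge changes how many people move, not where they go.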
Step 3: Cities update themselves based on who came and went
After the migration round, every city has a new population. From there, a chain of feedback effects
fires. Cities that grew get a small bump to productivity (the agglomeration loop strikes again, this
time on the supply side). Housing costs respond to the new density, rising in growing cities and
falling in shrinking ones, at a rate set by the housing elasticity parameter. Wages
follow a formula that goes up with productivity and down with housing cost.
Housing elasticity is the single most important parameter for explaining the surprise finding later
in this report. Low elasticity means housing supply cannot easily expand, so when a city grows, its
rent skyrockets. That makes the city much less attractive to the next wave of movers, even though
its productivity is still high. The math forces growth to spread out instead of piling up. High
elasticity means housing absorbs growth easily, so productive cities just keep growing without
paying a price. The default in the simulator is 0.08. The Housing Constrained scenario uses 0.04,
which is enough to flip the model from “mega cities win” to “distributed networks win.”
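A minimal sketch of why halving elasticity matters, assuming housing cost responds to population growth in inverse proportion to elasticity. The functional form and the `sensitivity` constant are assumptions made for illustration. Only the two elasticity values, 0.08 and 0.04, come from the scenarios.

```python
def update_housing_cost(cost, old_pop, new_pop, elasticity, sensitivity=0.01):
    # Lower elasticity means supply expands less, so the same population
    # growth produces a larger cost increase. Shrinking cities get cheaper.
    growth = (new_pop - old_pop) / old_pop
    return cost * (1.0 + sensitivity * growth / elasticity)

# A city that grows by ten percent in one step.
default_cost = update_housing_cost(1.0, 1_000_000, 1_100_000, elasticity=0.08)
constrained_cost = update_housing_cost(1.0, 1_000_000, 1_100_000, elasticity=0.04)

# A city that shrinks by ten percent.
shrinking_cost = update_housing_cost(1.0, 1_000_000, 900_000, elasticity=0.08)
```

Under this form the constrained city's cost increase is twice the default city's for the same growth, and that gap compounds every step the city keeps growing.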
Step 4: Three sectors of the economy produce output
Every city makes three kinds of things every step. Resources (proportional to its
mineral, energy, and agriculture deposits, with the deposits getting drawn down over time).
Manufactured goods (a function of industrial capacity, manufacturing specialization,
and proximity to raw materials). IP and technology (knowledge stock times tech
specialization times human capital, with an R&D efficiency multiplier on top).
Total city GDP combines wage income with monetized sector output, weighted by the three price indices
that the scenario sets. That is why scenarios like Resource Crunch can shift GDP composition without
shifting GDP magnitude much: they raise the per unit value of resource output but do not change the
physical production amount.
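The price index effect can be shown with a small example. It assumes GDP is a plain sum of wage income and price-weighted sector output. The exact combination rule, the function name, and all the numbers are hypothetical.

```python
def city_gdp(wage_income, resources, goods, ip,
             price_res=1.0, price_goods=1.0, price_ip=1.0):
    # Wage income plus sector output valued at the scenario's price indices.
    sector_value = price_res * resources + price_goods * goods + price_ip * ip
    return wage_income + sector_value

# Identical physical output in both cases; only the resource price differs.
baseline = city_gdp(100.0, resources=20.0, goods=30.0, ip=10.0)
crunch = city_gdp(100.0, resources=20.0, goods=30.0, ip=10.0, price_res=1.5)

resource_share_baseline = (1.0 * 20.0) / baseline
resource_share_crunch = (1.5 * 20.0) / crunch
```

Total GDP moves modestly while the resource share of it rises noticeably, which matches the composition shift described above.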
Step 5: Every fifteen steps, the AI planner builds something
By default this happens every fifteenth timestep. The autonomous infrastructure planner scans for the
single highest scoring pair of currently unconnected cities and builds a new link between them. We
cover how it decides in the next section.
The big picture, in one paragraph
The whole model rests on two things working at the same time: a discrete choice migration system
(people pick destinations by softmax over utilities) and an agglomeration feedback loop (bigger
cities are both more attractive AND more productive, so they get richer faster). Every interesting
finding later in this report is, at root, those two mechanisms interacting with each other under
different policy regimes. The Housing Constrained finding only makes sense once you see how housing
cost acts as a circuit breaker on the agglomeration loop, redirecting growth into secondary cities
before the big ones can monopolize it.
The autonomous infrastructure planner
Every fifteen timesteps, the simulator pauses the world for a moment and decides where to build one
new piece of infrastructure. There is no machine learning here. It is a fixed, deterministic,
opportunity scoring rule. The fact that it produces exactly the same number of builds in every
macro policy scenario (finding number four below) is a real and reproducible quirk of how it works.
How it picks where to build, in plain English
Each time it runs, the planner builds a candidate set of the top hundred and sixty cities by
population, the top hundred and sixty by raw resource endowment, and the top hundred and sixty by
innovation potential (tech specialization times knowledge stock). It then looks at every possible
pair of cities in that set that are not already directly connected. For each pair, it computes a
score that rewards:
Supply chain match. A city rich in resources connected to a city with lots of factories scores high. (Think Western Australia iron ore connected to Chinese steel mills.)
Innovation pull. A research heavy city connected to a city with a deep knowledge stock scores well. (Think Boston biotech connected to San Diego pharma.)
Coastal logistics. Both endpoints being port cities adds a bonus. (Think Singapore to Rotterdam.)
Raw demand. Bigger cities at either endpoint mean more potential traffic.
All of those factors then get multiplied by a distance penalty (closer is better, but not by much).
The highest scoring pair wins, and the planner builds the link. The type of link is decided by a
handful of rules: if either endpoint has very high resource endowment, it builds a pipeline. If both
endpoints have high port access and the distance is large, it builds shipping. Otherwise it builds
rail. The cost of the link comes out of the policy budget, and the planner just stops building once
the budget runs out.
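The scoring and link type rules above can be sketched as follows. Every threshold, weight, field name, and city in this example is hypothetical. The sketch only mirrors the structure described in the text: four rewarded factors, a mild distance penalty, a rule-based link type, and a budget check.

```python
import math
from itertools import combinations

# Hypothetical thresholds for "very high" resources, "high" port access, "large" distance.
RES_HIGH, PORT_HIGH, FAR_KM = 0.8, 0.7, 2000.0

def pair_score(a, b, distance_km):
    # Reward the four factors from the text, then apply a mild distance penalty.
    supply = a["resources"] * b["factories"] + b["resources"] * a["factories"]
    innovation = a["research"] * b["knowledge"] + b["research"] * a["knowledge"]
    coastal = 1.0 if min(a["port"], b["port"]) >= PORT_HIGH else 0.0
    demand = math.log1p(a["population"] + b["population"])
    return (supply + innovation + coastal + demand) / (1.0 + distance_km / 10_000.0)

def link_type(a, b, distance_km):
    # Rule order follows the text: pipeline, then shipping, otherwise rail.
    if max(a["resources"], b["resources"]) >= RES_HIGH:
        return "pipeline"
    if min(a["port"], b["port"]) >= PORT_HIGH and distance_km >= FAR_KM:
        return "shipping"
    return "rail"

def plan_one_build(cities, distances, connected, budget, link_cost):
    # Returns (pair, link type, remaining budget), or None if nothing can be built.
    if budget < link_cost:
        return None
    open_pairs = [p for p in combinations(sorted(cities), 2) if p not in connected]
    if not open_pairs:
        return None
    best = max(open_pairs,
               key=lambda p: pair_score(cities[p[0]], cities[p[1]], distances[p]))
    kind = link_type(cities[best[0]], cities[best[1]], distances[best])
    return best, kind, budget - link_cost

cities = {
    "mine":   {"resources": 0.9, "factories": 0.1, "research": 0.1,
               "knowledge": 0.1, "port": 0.2, "population": 1e5},
    "mill":   {"resources": 0.1, "factories": 0.9, "research": 0.2,
               "knowledge": 0.2, "port": 0.8, "population": 5e6},
    "harbor": {"resources": 0.1, "factories": 0.3, "research": 0.2,
               "knowledge": 0.2, "port": 0.9, "population": 2e6},
}
distances = {("harbor", "mill"): 6000.0, ("harbor", "mine"): 3000.0,
             ("mill", "mine"): 1500.0}

first_build = plan_one_build(cities, distances, set(), budget=100.0, link_cost=40.0)
```

Note that nothing in `pair_score` reads a scenario parameter such as border friction or housing elasticity. It depends only on city attributes and on which pairs are already connected.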
The planner's output, every scenario, stacked by infrastructure type. The bars are identical at sixteen events across every macro regime. That uniformity is the basis for finding #4.
Why the planner ignores the macro regime
Look back at the score formula above. None of the factors reference the current macro parameters.
The planner does not know whether border friction is currently high or low, whether the housing
market is elastic or constrained, whether migration is surging or freezing. It looks at static city
attributes (population, resources, ports, knowledge) and at the topology of what is already connected,
and that is it. The macro regime should change which links would actually be valuable to build, but
the planner does not have a feedback signal that tells it that.
That is almost certainly a design simplification rather than a bug. A truly adaptive planner is much
more complicated to write and debug than a deterministic opportunity scorer. But it means the planner
is, in effect, a control variable across our eight scenarios. The same infrastructure gets built every
time, so any differences in outcomes are entirely attributable to differences in the macro policy,
not to differences in what got built. That actually makes the scenario comparison cleaner. It is just
not what the planner was probably meant to do.
What we ran, and how
The experimental setup
Eight scenarios from a library of twenty four
Each at 320 cities and 1,800 edges
Each at seed 42 for full reproducibility
Each ran for 200 timesteps
One optimizer search: four iterations of twelve trials, 120 step horizon per trial, three random shocks per trial
All results were pulled by driving the simulator directly from a Python script, not via the frontend, so the runs are as clean and headless as possible
The eight scenarios
Baseline (Default). Core reference state, all parameters at defaults.
Open Borders. Border friction crushed from 1.0 down to 0.05, migration propensity nudged up.
High Connectivity. Connectivity multiplier roughly doubled.
Network Fragmentation. Connectivity multiplier cut and friction raised.
Housing Constrained. Housing elasticity dropped from 0.08 to 0.04, simulating zoning restriction.
Migration Surge. Propensity to move roughly tripled.
Resource Crunch. Resource price index lifted to simulate scarcity.
Fortress World. The restrictive border counterpart to Open Borders, and the lowest migration scenario in the panel.
What we measured
Total world GDP (sum across all cities)
Total migration volume per step (number of people moving)
Top 10 concentration (share of world population in the largest 10 cities)
Sector outputs: resources, manufactured goods, IP / technology
Per city population change over the full run
Autonomous policy events, with breakdown by type
Wall clock performance per scenario
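Of these metrics, top 10 concentration is the one most worth pinning down precisely, since two findings depend on it. A minimal sketch of the calculation as defined above (the function name `top_k_share` is ours, not the project's):

```python
import numpy as np

def top_k_share(population, k=10):
    # Share of total population living in the k largest cities.
    pop = np.asarray(population, dtype=float)
    largest = np.sort(pop)[-k:]
    return largest.sum() / pop.sum()

# Sanity check: 320 equally sized cities put exactly 10/320 in the top ten.
uniform_share = top_k_share(np.full(320, 1_000_000.0))

# One dominant city among four.
skewed_share = top_k_share([100.0, 1.0, 1.0, 1.0], k=1)
```

The uniform case gives a useful floor for reading the results: with 320 cities, any top 10 share above about 3 percent reflects concentration rather than arithmetic.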
Findings
01
Border policy moves migration 7.5× but barely moves GDP
From Fortress World, where roughly 8.1 million people move each timestep, to Migration Surge,
where 61.1 million people move each timestep, the volume of people in motion varies by almost
an order of magnitude. You would expect, if migration is the thing that allocates labor to where
it is most productive, that a 7.5 times difference in migration would show up in a serious
difference in total output. It does not.
Across all eight scenarios, final GDP growth clusters in a tight 142 percent to 164 percent band.
That is a 1.15 times spread, against a 7.5 times spread in migration. The relationship between
“how much movement the policy allows” and “how much output the world produces” is, in this model,
surprisingly weak.
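The two spreads can be checked directly from the endpoint values reported in this writeup: 8.1 and 61.1 million movers per step, and final GDP growth of 142.2 and 163.9 percent at the two ends of the band. The variable names are ours.

```python
# Endpoint values as reported in this writeup.
migration_low, migration_high = 8.1e6, 61.1e6   # movers per timestep
growth_low, growth_high = 142.2, 163.9          # final GDP growth, percent
steps = 200

migration_spread = migration_high / migration_low   # about 7.5
growth_spread = growth_high / growth_low            # about 1.15

# Cumulative person moves over the run, for the two extreme scenarios.
cumulative_high = migration_high * steps   # about 12.2 billion
cumulative_low = migration_low * steps     # about 1.6 billion
```

The growth spread here is a ratio of growth percentages, which is how the 1.15 figure above is computed.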
The mechanism is the softmax. Once people are choosing optimally over local utilities, more
people choosing optimally does not add much new optimization. The marginal mover is going to
roughly the same place as the average mover. So tripling migration triples the flux through
the network without significantly changing the destinations.
Average people moving per timestep, by scenario. Fortress World at 8.1 million per step (red) is the floor. Migration Surge at 61.1 million per step (green) is the ceiling. Everything else falls in between.
Cumulative people moved across 200 steps. Migration Surge ships over 12 billion person moves across the run. Fortress World does about 1.6 billion. The horizontal separation is the entire effect of border policy on movement.
02
The counterintuitive winner: Housing Constrained
This was the most surprising single result in the entire study. The scenario in which housing
supply cannot easily expand (low housing elasticity, simulating zoning restriction) produced
the highest GDP growth in the panel (+163.9 percent) AND the lowest urban concentration
(top 10 share at 17.5 percent). Both directions at once. That is rare.
The mechanism, once you trace it back through the timestep loop, is logical. Low housing
elasticity means that when a city grows, its housing cost rockets. The migration utility function
subtracts housing cost. So as the biggest cities grow, they become rapidly less attractive to
the next wave of movers. Those movers spill into secondary cities instead. Secondary cities are
cheaper, get more residents, get more agglomeration bonus, get more productivity, and generate
more output. The end state is a world with more middle weight cities and fewer mega hubs, and
the total GDP is higher because productivity is more widely distributed.
The naive intuition (“if you cannot build housing in productive cities, you lose growth”) only
holds if you assume there are no productive alternatives. The model says there usually are.
Real world equivalents are at least plausible. San Francisco's well documented housing crisis
helped seed Austin, Boise, Salt Lake City, and Miami as new tech hubs. Vancouver's affordability
crisis pushed jobs to Calgary and Seattle. The simulator is surfacing a mechanism that economic
geographers have argued about for years: housing scarcity in superstar cities is not just a
local affordability problem, it is potentially a redistribution mechanism that raises aggregate
growth by forcing it elsewhere.
Each scenario plotted as one point. The horizontal axis is final urban concentration, the vertical axis is GDP growth. Housing Constrained sits alone in the upper left: more growth, less concentration. The seven other scenarios cluster in a much smaller region.
Final GDP growth, ranked. Housing Constrained beats Open Borders by 21.7 percentage points. The gap is much larger than the run to run noise of the simulator at this horizon.
03
Open Borders flattens, it does not grow
Open Borders finishes last on GDP growth (+142.2 percent) and has the lowest top 10 concentration
of any non Housing Constrained scenario (18.2 percent). Removing friction lets people pour out of
the largest cities into the next tier, which homogenizes outcomes. The model does not reward
that with extra output.
This is the opposite of the finding most people expect when they bring up open borders. The
standard argument is that frictionless labor allocation should improve aggregate productivity by
letting workers move to wherever they are most valuable. In this simulator, that effect is real
but small, and it is outweighed by the loss of agglomeration concentration when the largest
cities are no longer pulling hardest.
The lesson is not that open borders is bad. The lesson is that GDP and distribution are different
objectives, and a single sim run cannot tell you which one to optimize for. The optimizer (covered
below) actually does pick open borders as part of its winning blend, but only when combined with
strong agglomeration and inelastic housing. By themselves, open borders are an equity lever, not
a growth lever.
GDP curves cluster tightly for the first hundred steps and only fan out after the agglomeration feedback has time to compound. The Housing Constrained curve (orange) breaks away last, after about timestep 150, because its mechanism (forcing growth into secondary cities) takes a long time to add up.
Same data, split into one panel per scenario. The final growth percentage is annotated in the lower right of each. Easier to compare the shapes of the trajectories independently.
04
The autonomous planner is scenario insensitive
All eight scenarios produced exactly the same number of policy events: sixteen, over 200 steps,
which is consistent with a 15 step interval and the way the planner is budget bounded. The fact
that the count is identical across every macro regime tells you something specific about the
planner.
As we showed in the AI planner section above, the opportunity scoring function does not reference
any of the scenario level parameters. It scores pairs of cities based on static attributes
(population, resources, ports, knowledge stock) and on topology (what is currently connected).
The macro regime should change which links would actually be valuable to build, but the planner
does not know that.
Two ways to read this. One: it is a modeling gap. A real planner with a budget should respond
differently to Open Borders (where it makes sense to build trans group links) than to Fortress
World (where you might invest in within group resilience instead). Two: it is a useful baseline.
The finding isolates the effect of the macro policy from the effect of the planner. If you want
to study planner responsiveness, that would be a clean follow up study, with the existing scoring
function as the control.
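To make the gap concrete, here is a minimal sketch of a scoring function that reads only static city attributes and topology. The formula, the `City` fields, and `rank_links` are all hypothetical illustrations, not the project's actual code. The only point is structural: if the scenario never enters the score, the ranking cannot change with the macro regime.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class City:
    name: str
    population: float
    resources: float
    has_port: bool
    knowledge: float


def opportunity_score(a, b, connected):
    """Hypothetical score for linking a and b (illustrative formula only)."""
    if frozenset((a.name, b.name)) in connected:
        return 0.0  # topology term: an existing link has nothing left to gain
    size = (a.population * b.population) ** 0.5
    knowledge = 1.0 + a.knowledge + b.knowledge
    port_bonus = 1.5 if a.has_port and b.has_port else 1.0
    resources = 1.0 + abs(a.resources - b.resources)
    return size * knowledge * port_bonus * resources


def rank_links(cities, connected, scenario):
    """Rank candidate links. `scenario` is accepted but never read."""
    scored = [
        (opportunity_score(a, b, connected), a.name, b.name)
        for a, b in combinations(cities, 2)
    ]
    return sorted(scored, reverse=True)


# Toy cities with made-up attributes.
cities = [
    City("A", 9.0, 0.2, True, 0.8),
    City("B", 4.0, 0.9, True, 0.1),
    City("C", 1.0, 0.5, False, 0.3),
]
connected = {frozenset(("A", "C"))}
open_borders = {"border_friction": 0.05}  # illustrative scenario parameters
fortress = {"border_friction": 5.0}

same = rank_links(cities, connected, open_borders) == rank_links(cities, connected, fortress)
print(same)
```

A planner that should respond to the regime would need at least one scenario parameter inside `opportunity_score`, for example a border friction discount on links that cross groups.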
05
Urban concentration converges to a narrow band
Across the eight scenarios, the final share of world population in the top ten cities lands
between 17.5 percent (Housing Constrained, the most distributed) and 20.6 percent (Network
Fragmentation, the most concentrated). That is a narrow band given how different the underlying
policies are.
This suggests the model has a natural settling point for how concentrated population wants to be,
regardless of the macro regime. Push migration up, push border friction down, raise infrastructure
budgets, do almost anything, and the system finds its way back to roughly twenty percent of people
living in ten cities. Only inelastic housing breaks out of that band, by directly capping how large
the largest cities can grow.
Top 10 share over time. Every scenario except Housing Constrained ends the run in a narrow band between roughly 18 and 21 percent. Concentration appears to be a deeply baked equilibrium of the agglomeration loop, hard to shift without changing housing dynamics directly.
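The concentration metric used throughout this finding is simple enough to state in a few lines. This is a sketch of the definition as described in the text (share of total population in the ten largest cities); the function name and the toy population list are assumptions, not simulator output.

```python
def top_k_share(populations, k=10):
    """Share of total population living in the k largest cities."""
    total = sum(populations)
    if total <= 0:
        raise ValueError("total population must be positive")
    largest = sorted(populations, reverse=True)[:k]
    return sum(largest) / total


# Toy world: ten cities of 20 and eighty cities of 10 (made-up numbers).
toy_populations = [20] * 10 + [10] * 80
share = top_k_share(toy_populations)
print(f"top 10 share: {share:.1%}")
```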
06
Sector composition stays remarkably stable
The three sector outputs (resources, manufactured goods, IP) climb together over each run in
roughly fixed proportions. Even in scenarios that explicitly manipulate one of the price indices,
the relative composition of physical production barely shifts.
This is because the price index parameters in this version of the model affect monetary valuation
(how much each unit of output is worth) rather than physical production choice. A city's actual
resource extraction depends on its deposits and reserves, not on the price. Raising the price
makes the same physical output worth more on paper. Useful for modeling inflation. Less useful
for modeling supply response.
Combined with finding number four, this suggests that the model is currently strong on
demographic dynamics and weak on production allocation response. A future version might want to
add a sector specialization feedback (cities increase their share of the sector with the highest
price index over time).
Stacked sector outputs for the Baseline run. Notice how the three layers grow in tandem rather than one capturing share from another.
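The distinction between valuation and production choice in finding six can be sketched as follows. The sector volumes and the `specialize` rule are hypothetical: the first half shows a price index changing what output is worth without changing physical composition, and the second half shows one possible form of the specialization feedback suggested above.

```python
def monetary_output(physical, price_index):
    """Value each sector's physical output at its price index."""
    return {s: physical[s] * price_index[s] for s in physical}


def composition(volumes):
    """Each sector's share of the total."""
    total = sum(volumes.values())
    return {s: v / total for s, v in volumes.items()}


def specialize(shares, price_index, rate=0.05):
    """Hypothetical feedback: move `rate` of every other sector's share
    to the sector with the highest price index."""
    best = max(price_index, key=price_index.get)
    new_shares, moved = {}, 0.0
    for sector, share in shares.items():
        if sector != best:
            delta = rate * share
            new_shares[sector] = share - delta
            moved += delta
    new_shares[best] = shares[best] + moved
    return new_shares


physical = {"resources": 50.0, "manufactured": 30.0, "ip": 20.0}  # made-up volumes
prices = {"resources": 1.0, "manufactured": 1.0, "ip": 1.31}

print(composition(physical))                           # physical mix: unchanged by prices
print(composition(monetary_output(physical, prices)))  # monetary mix: IP is worth more
print(specialize(composition(physical), prices))       # with feedback, the physical mix shifts
```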
07
The top winners and losers are dramatic in every scenario
Aggregate numbers hide the human story. Even in scenarios where total GDP and total migration
change very little, individual cities can swing wildly. The top winner in Open Borders is Yunfu
with a 173 percent population gain. The top loser is Bahawalpur, which loses 95 percent of its
population. Open Borders is not the only one: every single scenario in the study has at least
one city that gains over 40 percent and at least one that loses over 90 percent.
For each scenario, the magnitude of the top gainer and top loser. The gainers vary scenario to scenario, but every scenario has near total depopulation of at least one city. Stable macro numbers can hide enormous local upheaval.
For a policy paper using this kind of model, this finding matters more than the GDP numbers do.
It means you cannot evaluate a policy purely on its aggregate effect. Two policies that produce
identical world GDP can have radically different local consequences. The losers under one regime
are not the same as the losers under another.
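A small sketch of the per-city comparison behind this finding, using three made-up cities. It is constructed so that total population is unchanged while one city loses 95 percent of its residents, which is the situation the aggregate numbers hide. The function name and the data are illustrative assumptions.

```python
def top_movers(start, end):
    """Return ((winner, pct_change), (loser, pct_change)) by population change."""
    changes = {
        city: (end[city] - start[city]) / start[city] * 100.0
        for city in start
        if start[city] > 0
    }
    winner = max(changes, key=changes.get)
    loser = min(changes, key=changes.get)
    return (winner, changes[winner]), (loser, changes[loser])


# Made-up populations: the world total is 700 before and after.
start = {"A": 100, "B": 200, "C": 400}
end = {"A": 273, "B": 10, "C": 417}

(winner, gain), (loser, loss) = top_movers(start, end)
print(f"total change: {sum(end.values()) - sum(start.values())}")
print(f"top winner {winner}: {gain:+.0f}%   top loser {loser}: {loss:+.0f}%")
```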
The ML optimizer's verdict
The optimizer is the most interesting feature in the project that you cannot see in the UI. It is a
cross entropy parameter search that asks: across all the possible combinations of these twelve
economy levers, which combination does best when shocks hit? Here is what cross entropy actually
means, why this project uses it, and what it found.
What cross entropy optimization is, in plain English
Imagine you are trying to find the best spot in a darkened field for a picnic. You cannot see the
whole field, but you can drop a friend at any point and they will tell you how nice it is there.
Cross entropy works like this: you scatter a dozen friends around the field. They each report back.
You note which spots got the highest scores (the top thirty percent are your “elites”). You scatter
a new dozen friends around the average location of the elites, with a spread roughly equal to how
spread out the elites were. They report back. You keep doing this. Over time, your cloud of friends
drifts toward whatever spot in the field is best.
That is exactly what the optimizer does, except instead of friends it samples parameter sets, and
instead of asking how nice a spot is, it runs a full simulation under randomized shocks and computes
a score. It is gradient free (it never needs the derivative of the objective), it is parallel (each
sample is independent), and it works fine on noisy, expensive, discontinuous objectives. Which is
exactly what an agent based simulation under shocks is.
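The picnic procedure above can be sketched in a few lines. This is a generic cross entropy search under stated assumptions, not the project's optimizer: Gaussian sampling clipped to bounds, a thirty percent elite fraction, and a noisy toy objective standing in for a full simulation under shocks. All names here are hypothetical.

```python
import random
import statistics


def cross_entropy_search(objective, means, stds, bounds,
                         iterations=4, trials=12, elite_frac=0.3, seed=42):
    """Sample, keep the elites, refit the sampling distribution, repeat."""
    rng = random.Random(seed)
    means, stds = list(means), list(stds)
    n_elite = max(2, round(trials * elite_frac))
    best_params, best_score = None, float("-inf")
    for _ in range(iterations):
        samples = []
        for _ in range(trials):
            params = [
                min(hi, max(lo, rng.gauss(m, s)))
                for m, s, (lo, hi) in zip(means, stds, bounds)
            ]
            samples.append((objective(params, rng), params))
        samples.sort(key=lambda pair: pair[0], reverse=True)
        if samples[0][0] > best_score:
            best_score, best_params = samples[0]
        elites = [params for _, params in samples[:n_elite]]
        for i in range(len(means)):
            column = [params[i] for params in elites]
            means[i] = statistics.fmean(column)
            # Keep a small floor so the search never collapses to a point.
            stds[i] = max(statistics.pstdev(column), 1e-3)
    return best_params, best_score, means


def toy_objective(params, rng):
    """Stand-in for a simulation run: noisy peak at (0.7, 0.2)."""
    x, y = params
    return -((x - 0.7) ** 2) - (y - 0.2) ** 2 + rng.gauss(0.0, 0.002)


best, score, centre = cross_entropy_search(
    toy_objective, means=[0.5, 0.5], stds=[0.3, 0.3],
    bounds=[(0.0, 1.0), (0.0, 1.0)],
)
print([round(v, 2) for v in centre])
```

Note that the best single sample can be a lucky draw under noise, which is why the refitted centre (`means`) is usually a better summary of what the search learned.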
What “score” means here
The score the optimizer is maximizing is a weighted combination of five things. Higher final
GDP is good. Higher GDP growth is good. Lower volatility
is good (a stable economy beats a roller coaster). Lower concentration is good (a
world with growth spread across many cities scores better than a world dominated by a few). Lower
overpopulation is good (cities that exceed their carrying capacity get penalized). And
finally there is a small extra penalty for spending more than ninety two percent of the available
infrastructure budget, which discourages the optimizer from blowing the bank just to chase a few more
percentage points of GDP.
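As a rough sketch, the score described above has this shape. The report does not state the weights or how each term is normalized, so every weight below is a placeholder and the inputs are assumed to be pre-scaled to comparable ranges. Only the signs and the ninety two percent spending threshold come from the text.

```python
# Placeholder weights: the real values are not given in this report.
DEFAULT_WEIGHTS = {
    "final_gdp": 1.0,
    "gdp_growth": 1.0,
    "volatility": 1.0,
    "concentration": 1.0,
    "overpopulation": 1.0,
    "overspend": 0.5,
}
BUDGET_SOFT_CAP = 0.92  # spending above this share of the budget is penalized


def shock_score(final_gdp, gdp_growth, volatility, top10_share,
                overpopulation, budget_spent_frac, weights=None):
    """Weighted score: reward GDP level and growth, penalize the rest."""
    w = weights or DEFAULT_WEIGHTS
    reward = w["final_gdp"] * final_gdp + w["gdp_growth"] * gdp_growth
    penalty = (
        w["volatility"] * volatility
        + w["concentration"] * top10_share
        + w["overpopulation"] * overpopulation
    )
    overspend = max(0.0, budget_spent_frac - BUDGET_SOFT_CAP)
    return reward - penalty - w["overspend"] * overspend


# Two otherwise identical outcomes that differ only in budget use (made-up inputs).
frugal = shock_score(1.0, 1.0, 0.01, 0.19, 0.0, budget_spent_frac=0.03)
spender = shock_score(1.0, 1.0, 0.01, 0.19, 0.0, budget_spent_frac=1.00)
print(frugal > spender)
```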
The learning curve
Mean score (blue) climbs monotonically from 3.62 to 4.68 across four iterations. Best score (green) is more volatile because the best score is a single sample drawn from the new distribution, and a single sample can get unlucky. In a longer run with more trials per iteration the best curve would smooth out, but mean is the right thing to watch for whether the optimizer is actually learning. It is.
The recommended blend
After four iterations of twelve trials each, here is the parameter set the optimizer landed on:
| Parameter | Optimizer's pick | Default | Plain English direction |
| --- | --- | --- | --- |
| Border friction | 0.015 | 1.00 | essentially open borders |
| Agglomeration strength | 0.71 | 0.25 | roughly three times default. Let big cities pull hard. |
| Housing elasticity | 0.01 | 0.08 | floor of the allowed range. Housing supply barely responds. |
| IP price index | 1.31 | 1.00 | premium tech prices. Knowledge work pays more. |
| Infrastructure budget | 2,955 | 2,000 | about 48 percent more headroom |
| Link base cost | 11.5 | 25 | 54 percent cheaper to build each link |
| Migration propensity | 0.026 | 0.020 | slightly raised. People are slightly more willing to move. |
| Temperature | 1.05 | 0.70 | migration choice is slightly less rational, slightly more exploratory |
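The plain English column can be checked against the numbers directly. The snippet below uses only the picks and defaults listed above and prints each pick as a multiple of, and a percent change from, its default. By this arithmetic migration propensity is up 30 percent and temperature is up 50 percent.

```python
# (optimizer's pick, default) pairs copied from the table above.
PICKS = {
    "Border friction": (0.015, 1.00),
    "Agglomeration strength": (0.71, 0.25),
    "Housing elasticity": (0.01, 0.08),
    "IP price index": (1.31, 1.00),
    "Infrastructure budget": (2955, 2000),
    "Link base cost": (11.5, 25),
    "Migration propensity": (0.026, 0.020),
    "Temperature": (1.05, 0.70),
}


def relative_change(pick, default):
    """Percent change of the pick relative to the default."""
    return (pick / default - 1.0) * 100.0


for name, (pick, default) in PICKS.items():
    ratio = pick / default
    print(f"{name:24s} {ratio:6.3f}x default ({relative_change(pick, default):+.0f}%)")
```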
The outcome, under shocks
GDP growth: 3.41x over the horizon, versus 1.5x for the hand picked scenarios
GDP volatility: 0.0098, extremely low (the world barely wobbles when shocks hit)
Top 10 share: 19.2 percent, mid pack
Infrastructure budget actually spent: 3.1 percent of what was available
That last number is the most interesting one. The optimizer chose to keep a big infrastructure
budget around but barely use it. The interpretation: optionality is valuable. Cheap link cost
means each link, when actually built, hits the budget for very little, and the unused budget acts
as a buffer against future shocks. The optimizer figured out something close to a real world
financial principle: hold liquidity, build infrastructure only when it pays back fast.
Combining the picks gives a coherent story. Open borders so people can move. Strong agglomeration
so the productive places stay productive. Inelastic housing in those places so secondary cities
catch the spillover. Cheap links and an abundant infrastructure budget so the network can adapt
when a shock hits. Premium IP prices so research heavy cities (which tend to be the agglomeration
winners) contribute more to GDP. None of those choices in isolation would look smart. Together they
deliver 3.41x GDP growth.
Deep dive: who wins and who collapses under Open Borders?
Aggregate numbers hide the human story. The Open Borders scenario, when you look at it city by city,
redistributes population dramatically. In the seed 42 run, the biggest winner is Yunfu in Guangdong
at plus 173 percent. The biggest losers are Bahawalpur, Athlone, Bhubaneswar, and Jaffna, all of
which lose roughly 95 percent of their populations.
Open Borders over 200 steps. Left: top ten booming cities (Yunfu +173 percent, Lucknow +63, Puyang +58, Shantou +57, Shijiazhuang +55, Lima +54, Abidjan +53, Lagos +47, Bobo-Dioulasso +46, Foshan +46). Right: top ten declining cities (all near complete depopulation).
The pattern, in plain English
The winners cluster in two groups. The first is secondary Chinese cities (Yunfu, Puyang, Shantou,
Shijiazhuang, Foshan) that sit close enough to the megahubs (Beijing, Shanghai, Guangzhou) to benefit
from agglomeration spillover, but are not so close that the megahubs swallow them. The second is
large emerging market hubs (Lucknow, Lima, Abidjan, Lagos) that have strong local economies and
are the natural destination for movers within their region once friction drops.
The losers are smaller cities that sit in the shadow of a much bigger nearby city. Bahawalpur is
near Lahore. Bhubaneswar is near Kolkata. Jaffna is in the orbit of Colombo. Athlone is squeezed
between Dublin and Galway. When friction drops, the softmax migration choice for residents of these
small cities flips hard toward the nearby giant, and they bleed population for the rest of the run.
This is exactly the agglomeration versus regional balance tradeoff that economic geographers have
been writing about for a hundred years. The model is reproducing the classic finding: in a frictionless
world, secondary cities consolidate into a smaller number of larger ones, and the cities that lose
out are not the smallest cities, but the ones closest to a winner.
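The flip toward the nearby giant can be illustrated with a two option softmax. This is a sketch under assumptions: the report names a softmax migration choice, a border friction parameter, and a temperature, but how friction enters the utility is not spelled out here, so it is treated as an additive cost on moving. The utility values are made up.

```python
import math


def migration_probabilities(utilities, frictions, temperature=0.70):
    """Softmax over (utility - friction) / temperature for each destination."""
    scores = [(u - f) / temperature for u, f in zip(utilities, frictions)]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Option 0: stay in the small city. Option 1: move to the nearby giant.
utilities = [1.0, 2.0]  # made-up attractiveness values

closed = migration_probabilities(utilities, frictions=[0.0, 1.00])
opened = migration_probabilities(utilities, frictions=[0.0, 0.015])
warmer = migration_probabilities(utilities, frictions=[0.0, 0.015], temperature=1.05)

print(f"P(move) at friction 1.00:  {closed[1]:.2f}")
print(f"P(move) at friction 0.015: {opened[1]:.2f}")
print(f"P(move) at friction 0.015, temperature 1.05: {warmer[1]:.2f}")
```

In this toy setup a friction of 1.00 exactly cancels the giant's advantage, so the choice is a coin flip. Dropping friction to 0.015 tilts it strongly toward moving, and a higher temperature softens that tilt.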
What all of this actually means for the real world
Reading the findings as a research artifact is one thing. Reading them as a guide to thinking about
real policy and real urbanization over the next thirty years is something else. This section is the
longest in the report because, honestly, the findings raise more interesting questions than
they answer. Below are the big takeaways, the ones we think actually transfer outside
the simulator.
Housing is the lever the discourse underestimates
Most popular conversations about urbanization and growth treat housing policy as a downstream
problem: cities get successful, housing costs rise, we then debate what to do about it. The
simulator suggests something stranger and more provocative. Housing elasticity might be
the single most important upstream lever in the entire urban system. Not borders. Not
infrastructure budget. Not migration propensity. Just how easily housing supply responds to demand.
The reason this matters is that in the real world, housing elasticity is largely a policy choice.
It is set by zoning regulations, building codes, height limits, parking requirements, environmental
review processes, and historic preservation rules. Different countries and different cities have
wildly different effective elasticities. Tokyo has loose zoning and high elasticity. San Francisco
has the opposite. The simulator suggests that the answer to “why is one of these cities thriving and
the other exporting its tech industry?” is not really about taxes or talent. It is about
whether housing supply can keep up with demand.
Now flip that. The simulator also suggests that restricting housing supply in the most
productive cities, as San Francisco does, might actually produce higher aggregate growth, because
it forces growth out to Austin, Boise, Salt Lake City, Miami, Denver, and so on. That resembles
the pattern the United States has experienced over the past fifteen years. The simulator produces
it as a logical consequence of the agglomeration plus housing dynamic, without being told about
any of those cities specifically.
The honest implication is awkward: the cities with the worst housing policies may be inadvertently
doing the rest of the country a favor, by acting as innovation incubators whose graduates have to
relocate to afford a life. That is not a great policy to be running on purpose, because it imposes
enormous local costs on the people who live in the constrained cities. But it does explain why
aggregate growth has held up better than the housing data alone would suggest.
Open borders is a distributive lever, not a growth lever
Almost every popular argument for or against open borders treats it as a question of total economic
output. Pro side: more people, more productive matching, higher GDP. Con side: pressure on services,
wage suppression, lower per capita GDP. The simulator suggests both sides are talking about the
wrong thing.
In the simulator, Open Borders did not produce particularly high GDP growth. It produced the lowest
growth of any scenario in the study. What it produced instead was a flatter distribution: less
concentration in megahubs, more activity in secondary cities, and (as we saw in the deep dive) huge
individual swings as small cities consolidate into larger ones. That is not a growth story. That is
a distribution story.
The real world implication is that the open borders debate should probably be reframed. The question
is not “will this make us richer?” The question is “where will the prosperity end up, and which
places will hollow out?” The simulator says the answer to that question is mostly determined by
geography (which small cities are next to which big cities) and by housing policy (whether the
receiving big cities can absorb growth). Border policy mostly just opens or closes the valve. It
does not determine the destination of the flow.
The agglomeration loop is the most powerful force in the model
Every interesting dynamic in the simulator, when you trace it back, depends on the agglomeration
feedback loop: bigger cities are more attractive, attract more people, become more productive,
become even more attractive. This is not a controversial claim in academic economic geography.
Edward Glaeser has spent thirty years writing about it. But the simulator makes the force visible
in a way that prose arguments cannot. You can literally watch the rich cities pull harder over the
course of a run.
The implication is that any policy that ignores or fights agglomeration is going to underperform.
“Spread investment evenly across regions” sounds fair but is, in the model, a recipe for slower
aggregate growth. The optimizer figured this out on its own. Its winning blend sets agglomeration
strength at roughly three times the default. It wanted big cities to pull harder, not less
hard. Combined with inelastic housing in those big cities to redirect the spillover, this is the
closest the model comes to saying “the policy mix many countries are running today is
wrong in a specific identifiable way.”
Infrastructure planning, in a smart world, is about optionality more than about building
The single weirdest result in the optimizer output is that the winning policy spent only 3.1 percent
of its infrastructure budget. The natural reaction is “then why have such a big budget?” The model's
implicit answer is “because optionality is valuable when shocks hit.”
In real terms, this maps onto a fairly well known principle in corporate finance and emergency
management: you do not want to spend down your reserves in normal times, because the value of the
reserves comes from being able to deploy them quickly when something breaks. Public infrastructure
has the same property. A country with a large, mostly unused capacity to build emergency rail or
power links is more resilient than a country that has already built every link it could afford. The
simulator's optimizer rediscovered this principle on its own, which is mildly remarkable
given that nobody told it about financial reserves or option value.
The implication for actual infrastructure policy is that big visible projects are not necessarily
better than maintained capacity. Spending the budget all at once might score well on next year's
PR but it is, in the model's terms, throwing away your shock absorber.
Aggregates can lie. Always look at the city level.
Finding number seven (top winners and top losers per scenario) was the most uncomfortable one to
write up, because it shows that every scenario, including the “stable” ones, contained at
least one city losing more than ninety percent of its population. The world's total GDP went up in
all of them. The total population was nearly the same. But underneath, individual cities were
completely transformed.
The real world version of this is the question of who pays the cost of any policy. A policy that
raises aggregate GDP while wiping out half a dozen mid sized cities is not the same as a policy that
raises aggregate GDP while preserving them, even if the GDP numbers are identical. The simulator's
contribution to this debate is that you can now see which cities the policy is wiping out.
Bahawalpur, Athlone, Bhubaneswar, Jaffna. They are named. They are on the map. They are not just
data points. Whether the policy is worth the cost is a value judgment, but the cost itself is no
longer abstract.
What the model is missing, and why that matters
Honesty time. The simulator has real strengths in demographic and migration dynamics, agglomeration
feedback, and the budget logic for infrastructure. It has real weaknesses too. Here is what is
notably absent and what each absence means for how you should read the findings.
Climate is not modeled. Cities do not get more or less attractive based on heat, sea level rise, drought, or natural disasters. In the real world, this is going to be one of the dominant migration forces over the next thirty years. The model's predictions about who wins and who loses should be read as “absent climate change.”
Remote work is not modeled. The migration utility function assumes you need to physically be in a place to work there. The growing share of fully remote knowledge work decouples wage from location in a way the model does not capture.
Political instability is not modeled. Cities cannot become unsafe, lose their institutions, or have their governments collapse. The model assumes a globally stable institutional environment, which is generous.
Demographic transition is not modeled. The model does not include birth rates, death rates, or aging. In a real run over centuries, fertility collapse in East Asia and Europe would dwarf any of the migration effects we observed.
AI productivity shocks are not modeled. The IP and tech sectors grow at a smooth multiplier. If artificial intelligence dramatically raises the productivity of knowledge workers over the next decade, the agglomeration winners in this simulator (cities with high knowledge stock) would get a much bigger boost than the model currently shows.
Returns are not redistributed. There are no taxes, no transfer payments, no social safety nets. The model treats wages as flowing entirely to the people in the city that earned them. Adding redistribution would change which cities are attractive and how shocks propagate.
Each of these omissions is fixable in principle. None of them are trivial. The right way to think
about the current findings is that they describe the dynamics of a world that has resolved every
obvious uncertainty (climate is stable, work requires presence, governments function) and is just
being asked the narrow question of how migration, infrastructure, and prices interact. That is a
smaller question than “what will the world look like in 2050,” but it is a sharper question, and
the model's answers to it are correspondingly more trustworthy.
If these dynamics are real, what should we actually expect?
This is the speculative section. The simulator is a model, not a forecast, and any prediction we
derive from it is conditional on the model being approximately right about the underlying mechanisms.
With that disclaimer fully internalized, here is what the next thirty years might look like if the
forces the simulator describes really are at work, and how to tell whether they are.
Prediction 1: Secondary cities adjacent to constrained superstars will continue to outpace expectations
Austin, Boise, Salt Lake City, Nashville, Raleigh, Miami, Calgary, Lyon, Lisbon. These are not
random. They are the secondary cities sitting in the gravitational shadow of a constrained superstar
(San Francisco, Boston, New York, Toronto, Paris, Madrid). The model says they should keep gaining
economic weight as long as the superstar cities continue to refuse to build housing. Watch their job
growth, their housing prices, and their venture capital deal flow over the next ten years. If they
keep accelerating relative to the national average, that is consistent with the model. If they slow
down or reverse, that is evidence the model is missing something.
Prediction 2: Cities in the shadow of a megahub will hollow out faster than national averages suggest
The seed 42 run had Bahawalpur losing 95 percent to Lahore, Bhubaneswar losing 95 percent to
Kolkata, Jaffna losing 95 percent to Colombo. These are extreme numbers and would not literally
occur, but the directional signal is real. Secondary cities in developing countries that sit close
to a fast growing primary city are likely to experience accelerating population loss, even as the
national population grows. National statistics will hide this because the primary city's gain
offsets the secondary city's loss. Watch the second tier cities in India, Pakistan, Nigeria,
Indonesia, and Bangladesh. The ones near a megahub will likely empty out. The ones distant from
a megahub will likely consolidate into their region's primary.
Prediction 3: Countries that fix housing will keep growth inside their megahubs
The simulator's strongest finding is that low housing elasticity in superstar cities is, perversely,
the highest aggregate growth policy. That is consistent with US data. But the cleaner test is the
opposite case: a country that solves housing in its superstar cities. Japan is one of the
few large countries with consistently elastic housing supply in its biggest cities. If our model
is right, Japan should be one of the few countries where the megahub continues to absorb new growth
rather than spilling it out. Watch Tokyo's relative share of Japanese GDP over the next decade. If
it grows, that is consistent with the model. If it stalls, the model is missing a force that pulls
growth out of megahubs even when housing is elastic.
Prediction 4: Optionality will look smarter and smarter as the world gets shockier
The optimizer's preference for an unspent infrastructure budget is a bet about volatility. The bet
is: in a world where shocks (resource crunches, infrastructure failures, migration spikes) are
common, the ability to deploy capital quickly is worth more than the capital being deployed today.
The next ten years are likely to bring more shocks than the past ten did. If the
optimizer's preference for optionality is right, the countries that hold dry powder (Norway's
sovereign wealth fund, Singapore's reserve buffers, Switzerland's federal surplus tradition) should
outperform the countries that aggressively front load their infrastructure spend. Watch how the
Inflation Reduction Act money in the United States compares to more conservative infrastructure
strategies in Northern Europe. If the conservative approach wins on resilience metrics, that
supports the optimizer's logic.
Prediction 5: Concentration metrics will stay surprisingly stable, except where housing breaks them
Finding number five showed that top 10 city concentration converges to about 20 percent of world
population almost regardless of macro regime. The only thing that breaks the band is housing policy.
The real world prediction is that the share of the global population living in the top 10 megahubs
should stay roughly constant over the next decade, with the exception of any city that runs a
serious, aggressive housing supply program (which the model predicts would let that city absorb a
larger share). Most cities will not do that. Concentration should be a remarkably stable number to
measure year to year.
Prediction 6: Open borders, if implemented, would surprise pro and con sides equally
Pro side would be surprised because GDP growth would not increase much. Con side would be surprised
because the receiving cities would not be the obvious ones. The actual receivers would be secondary
cities in the right geographic position, not the global superstars that everyone fears or hopes
would absorb the migrants. The political coalitions formed around the issue today are arguing about
a phenomenon that, if it occurred, would not look like what either side expects.
How to falsify any of this
The whole point of being explicit about predictions is to make them falsifiable. Each of the six
predictions above has a clear way to be wrong. Secondary city growth slows. Megahub adjacent cities
do not hollow out. Tokyo's share of Japanese GDP does not rise. Optionality buffer countries lose to
big spenders. Concentration shifts dramatically. Open borders, where tried, produces the standard
pro side or con side outcomes. If any of those things happen, the model has missed something real,
and the next version needs to account for it. That is the contract you sign when you write something
down. We are signing it.
Caveats, honestly
Every claim in this report should come with the following asterisks. The point of a research
artifact is to be honest about what it can and cannot say.
Single seed. Every result here is from seed 42. The deterministic property is great for replayability, but a serious study would re-run with twenty seeds and report means and confidence intervals. The 1.15x spread in GDP growth across scenarios (+163.9 percent for Housing Constrained against +142.2 percent for Open Borders) may be larger than seed noise, but we have not formally tested it.
200 steps may be too short. Some scenarios' GDP curves had only just started to fan apart by timestep 200 (see the Housing Constrained breakaway in chart 1). A 500 step horizon would make the late stage divergence much clearer.
Identical migration totals appeared for Baseline, Network Fragmentation, and Resource Crunch, suggesting either that the migration system reaches steady state extremely fast, or that those scenarios do not actually differ in any parameter the migration calculation uses. Worth checking before drawing strong conclusions from any one of the three.
The optimizer was budget constrained. Four iterations of twelve trials. A real research run would use eight iterations of forty plus trials, plus multiple seeds, and would probably take eight hours instead of twenty minutes. The 3.41x result is suggestive, not conclusive.
The autonomous planner appears to ignore scenario regime (finding number four). We hypothesized this is because the scoring function reads only static city attributes. That is a hypothesis based on code reading, not on a controlled experiment.
The sector composition stability in finding number six is at least partially an artifact of the price indices controlling monetization rather than physical output. The model is currently labor and demography heavy, supply response light.
The predictions section is speculation. It explicitly extends model findings to real world expectations. Models are wrong in proportion to how much they extrapolate. Treat each prediction as a falsifiable claim, not a forecast.
This is a model. FlowWorld is a policy laboratory, not a forecast. The point is to probe which directions outcomes move when you twist the rules, not to predict actual city populations in 2050. Treat the magnitudes as ordinal, not cardinal.
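The single seed caveat above has a cheap remedy in outline. Here is a minimal sketch of a twenty seed summary with a 95 percent confidence interval. `placeholder_run` is a stand-in that returns made-up numbers; in practice it would be replaced by one full simulator run per seed. The interval uses a normal approximation, which is slightly narrow for only twenty seeds.

```python
import math
import random
import statistics


def summarize_over_seeds(run, seeds, confidence=0.95):
    """Mean and normal approximation confidence interval over seeds."""
    values = [run(seed) for seed in seeds]
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, (mean - z * sem, mean + z * sem)


def placeholder_run(seed):
    """Stand-in for one simulator run: a made-up GDP growth percentage."""
    return 150.0 + random.Random(seed).gauss(0.0, 5.0)


mean, (low, high) = summarize_over_seeds(placeholder_run, seeds=range(20))
print(f"mean growth {mean:.1f}%, 95% CI [{low:.1f}, {high:.1f}]")
```

Two scenarios whose intervals do not overlap are a reasonable first signal that their gap exceeds seed noise. A paired comparison on matching seeds would be the sharper test.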