FlowWorld: A Research Report

The question this project exists to answer

“If you change borders, transit links, housing rules, and infrastructure spending, what happens to where people live, the global economy, and which cities boom or collapse over time?”

FlowWorld is not trying to predict reality. It is, very explicitly, a policy laboratory. The point is to ask “what direction does the world move when I twist this knob?” and get a clean, reproducible, falsifiable answer in under a minute. Same seed, same world. Same parameters, same trajectory. Every screenshot in this report can be regenerated bit for bit by anyone who runs the same script, and every claim in the findings can be argued with on the basis of actual numbers.

That matters because most “what if we opened borders” conversations are run as thought experiments where the person making the argument never has to commit to a specific prediction. A simulation forces commitment. You set the friction parameter to 0.05, you press play, and the model gives you an actual number for global GDP, an actual list of cities that grew, and an actual list of cities that collapsed. You can disagree with the model on its assumptions. You cannot escape it by being vague.

The scenarios in this report were not cherry picked to produce nice narratives. The eight presets were taken straight from the project's scenario library, chosen to span the design space (borders, infrastructure, housing, migration, prices). The numbers came out the way they came out, including the ones that were the opposite of what we expected. Especially those.

How this project actually started

Before any of the simulator was written, somebody sat down and wrote out exactly what they wanted to be able to ask of it. The four original questions are still visible in the codebase today, because every feature ended up being a tool built to answer one of them.

The original mission, in one sentence

Build a research grade simulator that can honestly answer what the world looks like under different border policies, where people end up moving when cities run out of room, how resources and trade routes shape the economy over time, and which infrastructure decisions actually pay off under a real budget constraint.

That sentence is the reason every other piece of the project exists. The migration model is there because of borders and crowding. The three sector economy is there because of resources and trade. The autonomous infrastructure planner with its budget cap is there because of the infrastructure question. When you look at a feature in the simulator and wonder why it is there, the answer is almost always “because somebody needed to be able to answer one of the four original questions.”

The four research questions, in plain English

The original framing broke that mission into four specific questions. They are worth stating individually because each one is deliberately structured to admit a tradeoff rather than to have a single right answer:

Migration dynamics. How strongly do wages, housing costs, amenities, and connectivity actually drive where people move? And at what point do crowding and rent in the destination start to self-limit how many more people will keep coming?
Infrastructure policy. Which kinds of infrastructure (rail, pipeline, shipping, road) produce the highest long run GDP gain per dollar spent? Under a real budget, when should you build now versus wait?
Resource economy. How do extraction, depletion, logistics, and market pricing interact in practice? Does better connectivity make resource prices more competitive, or does it just shift who captures the profit?
Stability and equity. Can a policy raise GDP while also controlling overcrowding, urban concentration, and economic volatility? Which policy mixes hold up best when something unexpected hits (a resource crunch, a broken trade route, a wave of migration)?

Notice that each question has two halves that pull against each other. Growth versus crowding. Investment now versus investment later. Prices versus rents. Aggregate output versus distribution. The project assumed from day one that good policy is a negotiation, not a maximum.

The honesty pledge

The third commitment the project made to itself was a hard line against fantasy modeling. The statement, more or less verbatim:

100 percent real-world accuracy is not achievable in complex social and economic systems. The goal is high fidelity calibrated modeling with quantified uncertainty, not perfect prediction.

That sentence is doing a lot of work. It is the difference between an honest simulator and a crystal ball pretending to be one. A lot of policy modeling fails because the modeler quietly forgets that their assumptions are assumptions. The project committed up front to treating the simulator as a calibrated estimator rather than a forecast. You can see that commitment in the user interface today: every run displays a confidence level (the trust stoplight in the top bar) and the right panel has a “factor coverage disclosure” section that lists every economic factor the model knows about along with how complete the underlying data actually is. The simulator tells you when it is guessing.

The eight non-negotiable design principles

Eight principles were written down as non-negotiable. They are not features. They are constraints on what counts as a valid version of the project. Each one shows up later as something concrete you can actually see:

Deterministic replay. Every run can be reproduced bit for bit by anyone with the same seed. (This is why every result in this report uses seed 42.)
Real budget constraints on infrastructure. The autonomous planner cannot build forever. It runs out of money and stops.
Explicit affordability and capacity limits. Housing costs respond to density. People cannot ignore the cost of where they choose to live.
Real resource stocks that get depleted. A mineral rich city cannot pump forever. Reserves draw down over time.
Real supply and demand pricing. Three economic sectors (resources, goods, IP) each have their own price index that affects both GDP and the migration choice.
Full causality logging. Every timestep produces a written record of what happened and why, in plain text. Any decision the simulator made can be reviewed after the fact.
Scenario testing under shocks. A library of 24 contrasting scenarios is built in, and the optimizer evaluates candidates under randomized shocks. Policy that breaks under stress gets penalized.
Explainability. The optimizer does not just output a winning parameter set. It outputs the learning curve, the spread of candidates, and the score breakdown. You can argue with its answer.
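
The deterministic replay principle is easy to illustrate in miniature. The sketch below is not the project's engine: `run_toy_sim` and `fingerprint` are hypothetical names, and the growth rule is invented. It only shows the pattern of routing all randomness through one seeded generator so that two runs with the same seed can be compared by hash.

```python
import hashlib
import json
import random


def run_toy_sim(seed: int, steps: int = 5) -> list[float]:
    """Hypothetical stand-in for a run: all randomness comes from one seeded generator."""
    rng = random.Random(seed)  # private generator, no shared global state
    population = 1000.0
    history = []
    for _ in range(steps):
        # Invented growth rule, used only to produce a trajectory.
        population *= 1.0 + rng.uniform(-0.01, 0.03)
        history.append(population)
    return history


def fingerprint(history: list[float]) -> str:
    """Hash the full trajectory so two runs can be compared exactly."""
    payload = json.dumps(history).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


run_a = fingerprint(run_toy_sim(seed=42))
run_b = fingerprint(run_toy_sim(seed=42))
run_c = fingerprint(run_toy_sim(seed=7))
print(run_a == run_b)  # same seed, same trajectory
print(run_a == run_c)  # different seed, different trajectory
```

Comparing hashes rather than eyeballing summary numbers is what makes a "bit for bit" claim checkable.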

How the project actually got built

The project came together in three rough phases. Each one had a different goal and produced a different kind of work. None of them are visible to a user dropping in today, but the order they happened in is the reason the simulator behaves the way it does.

First: set the rules of the game

Before any of the simulation code got written, the project spent time writing down what would count as a finished feature, what would count as a valid experiment, and how decisions would get reviewed. A governance charter. A quality checklist. A template for documenting every assumption. A schema for how to describe an experiment so that someone else could reproduce it later.

This sounds boring. It is the opposite of fun engineering work. But it is the reason the rest of the project did not turn into a research toy that nobody could trust. When the simulator later started producing data, every result had a place to live, a quality bar it had to pass, and a paper trail. Most simulation projects skip this step and pay for it the first time someone asks “wait, is that number real or is it from the early prototype?” This one paid up front instead.

Second: build a data layer that does not lie

The middle phase was about making sure the cities in the simulator were actually cities, not made up dots on a map. The simulator can fall back to synthetic placements if real data is unavailable, but the real version uses the GeoNames worldcities dataset, which contains real coordinates, populations, and country assignments for hundreds of thousands of places.

Getting that data into the simulator was not as simple as importing a CSV. The project built quality protocols for ingesting city data, a license and jurisdiction matrix so each data source's legal boundaries were documented, validation rules that flagged missing or anomalous entries, and provenance audits that traced each field in each city back to its source. Once those were running, the project could honestly say which cities in the simulation were real and which were filler, and could report a per city confidence level. (This shows up in the user interface as the “city accuracy” field visible in the place inspector.)

By the end of this phase, the simulator had a real data layer with documented provenance and a quality gate that any new dataset had to pass before it could be used in a run.

Third: turn the research tool into something that looks alive

The most ambitious phase was about making the simulator feel like something you actually want to watch. The visual aspiration was explicit: it should feel like watching the popular city building game Cities: Skylines, but with real planetary data and a real economic engine underneath. That meant a lot of work on the frontend (the deck.gl rendering layer, the timeline scrubber, the camera, the flow animations) and a lot of work on the simulation output (per timestep flow records, per leg throughput data, named bottlenecks, replay bundles).

The same phase also expanded the economy. The original simulator had three abstract sectors (resources, goods, IP). The expanded version turns those into a real supply chain twin with named commodities (oil, raw materials, food and agriculture, manufactured goods, technology and IP), modal transport (rail, pipeline, shipping, road, air), corridor level throughput tracking, and automated detection of where bottlenecks form. The “commodity master list” you see in the right panel of the UI is the user facing surface of that work.

Current state

The implementation is done, the runs work, the optimizer works, and the visualization is fully operational. What remains is formal research signoff: five research gates covering city schema accuracy, resource accuracy, connectivity, supply chain economics, and visualization integrity are drafted but not yet signed off, mostly because nobody had produced an evidence bundle that answered them in one place. This report is essentially that evidence bundle.

Architecture, briefly

Two services run locally. A Python backend on port 8010 does all the math. A web frontend on port 5173 draws the map and the dashboards. They talk over WebSocket so the simulation can stream timestep updates to the browser in real time, up to about thirty steps per second.

Backend

Python with FastAPI for the API and NumPy for the math. The simulation engine does every per city calculation as a vectorized array operation rather than a Python loop. (Practically, that is the difference between a run that takes ten minutes and a run that takes ten seconds.) A separate optimizer module wraps the simulator with a cross entropy search loop. World generation can read real city data from GeoNames or fall back to synthetic placements if no dataset is installed. Every run gets persisted in three formats: structured data for replay, raw rows for analysis, and a human readable play by play log.

FastAPI 0.115 · NumPy 2.1 · Pydantic 2.9 · Uvicorn
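
To make the "array operation rather than a Python loop" point concrete, here is a minimal sketch. The wage rule (productivity minus a weighted housing cost) and the `housing_weight` parameter are assumptions made for illustration, not the engine's actual formula. The two functions compute the same thing. The second does it in one NumPy expression.

```python
import numpy as np

rng = np.random.default_rng(42)
n_cities = 320
productivity = rng.uniform(0.5, 2.0, size=n_cities)
housing_cost = rng.uniform(0.1, 1.0, size=n_cities)


def wages_loop(productivity, housing_cost, housing_weight=0.5):
    """Per city Python loop: one interpreter round trip per city."""
    wages = np.empty(len(productivity))
    for i in range(len(productivity)):
        wages[i] = productivity[i] - housing_weight * housing_cost[i]
    return wages


def wages_vectorized(productivity, housing_cost, housing_weight=0.5):
    """Same arithmetic as a single array expression evaluated inside NumPy."""
    return productivity - housing_weight * housing_cost


loop_result = wages_loop(productivity, housing_cost)
vector_result = wages_vectorized(productivity, housing_cost)
print(np.allclose(loop_result, vector_result))
```

The speed gap grows with the number of cities and the number of per step formulas, which is why the choice matters for a 200 step run.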

Frontend

React 18 with TypeScript, served by Vite. The map is built with deck.gl on top of a MapLibre GL basemap (a free and open source fork of Mapbox GL JS). Each timestep arrives over WebSocket as a JSON message containing per city state, top migration arcs, and global metrics. Each is rendered as a separate visual layer that the user can toggle independently.

React 18.3 · deck.gl 9.0 · MapLibre 4.7 · Recharts 2.12
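
As a rough picture of what one timestep message might contain, the sketch below builds a JSON payload with the three parts named above. Every field name here (`step`, `cities`, `top_arcs`, `metrics`, `movers`) is hypothetical. The real wire format is not documented in this report.

```python
import json


def build_timestep_message(step, cities, arcs, metrics, top_k=3):
    """Assemble one hypothetical timestep payload and keep only the largest arcs."""
    top_arcs = sorted(arcs, key=lambda arc: arc["movers"], reverse=True)[:top_k]
    return json.dumps(
        {"step": step, "cities": cities, "top_arcs": top_arcs, "metrics": metrics}
    )


message = build_timestep_message(
    step=170,
    cities=[{"name": "Lugo", "population": 98000}],
    arcs=[
        {"src": "A", "dst": "B", "movers": 10},
        {"src": "B", "dst": "C", "movers": 40},
        {"src": "C", "dst": "A", "movers": 25},
        {"src": "A", "dst": "C", "movers": 5},
    ],
    metrics={"world_gdp": 1.0},
)
decoded = json.loads(message)
print([arc["movers"] for arc in decoded["top_arcs"]])
```

Sending only the top arcs rather than every flow keeps the message small enough to stream many times per second.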

ML optimizer

A from scratch cross entropy parameter search over twelve economy levers. Each candidate parameter set runs a full simulation, augmented with three randomized shocks (resource crunch, infrastructure failure, migration spike), and gets scored on GDP, volatility, concentration, overpopulation, and infrastructure spend overage. The top thirty percent of candidates per iteration become the “elites” and get used to refit the sampling distribution for the next round.

POST /api/optimize
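
The elite selection and refit loop can be sketched in a few lines. This is a toy version under stated assumptions: the `score` function is an invented two parameter objective standing in for a full simulation with shocks, and the search runs over two levers rather than twelve. Only the thirty percent elite fraction comes from the description above.

```python
import random
import statistics


def score(params):
    """Toy objective standing in for a scored simulation run; best at (0.3, 0.7)."""
    target = [0.3, 0.7]
    return -sum((p - t) ** 2 for p, t in zip(params, target))


def cross_entropy_search(n_params=2, iterations=20, trials=40,
                         elite_frac=0.3, seed=42):
    rng = random.Random(seed)
    means = [0.5] * n_params  # initial sampling distribution (assumed)
    stds = [0.3] * n_params
    for _ in range(iterations):
        # Sample candidates from the current per parameter Gaussians.
        candidates = [
            [rng.gauss(m, s) for m, s in zip(means, stds)]
            for _ in range(trials)
        ]
        # Keep the best scoring fraction as elites.
        candidates.sort(key=score, reverse=True)
        elites = candidates[: max(2, int(trials * elite_frac))]
        # Refit the sampling distribution to the elites.
        for i in range(n_params):
            column = [candidate[i] for candidate in elites]
            means[i] = statistics.fmean(column)
            stds[i] = statistics.pstdev(column) + 1e-6  # avoid zero spread
    return means


best = cross_entropy_search()
print([round(value, 2) for value in best])
```

Because the sampler is seeded, the search itself is replayable, which fits the deterministic replay principle.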

Data layer

Storage follows a standard data lakehouse layout, splitting raw inputs from curated tables from ready to use features. Every run produces a timestamped summary, a replay bundle, and a decision audit trail. Twenty four scenario presets ship with the project. Every result in this report can be regenerated from those presets and a fixed seed.

What the interface looks like

Three panels. The map fills the middle. The left sidebar holds the scenario library, twelve economy sliders, and all the layer toggles. The right panel shows the inspector (with five tabs in Analyst mode: Node, Leg, Chain, Policy, Trust) and the live analytics dashboards.

Mature sim overview — The default view, about 170 timesteps in. Cyan and blue arcs are the top migration flows for this frame. The bright yellow arc near East Asia is a freshly built policy bridge (rail) that the autonomous planner constructed. The right panel shows the largest active corridors sorted by throughput, with Tianjin to Beijing leading at roughly 211,690 tons per day.

Money revenue mode — Money mode, revenue overlay. Each trade corridor is recolored by whether moving goods across it is profitable. Green means the throughput times the unit price exceeds the transport cost. Amber means net loss. The line width still encodes throughput. This view is the fastest way to see which trade routes are economically alive and which are just structurally present.
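
The coloring rule in the revenue overlay amounts to a single comparison. A minimal sketch follows, with a hypothetical function name and with the break even case treated as amber, which is an assumption.

```python
def corridor_color(throughput_tons, unit_price, transport_cost):
    """Green when revenue exceeds transport cost, amber otherwise."""
    revenue = throughput_tons * unit_price
    return "green" if revenue > transport_cost else "amber"


print(corridor_color(throughput_tons=1000, unit_price=2.0, transport_cost=1500))
print(corridor_color(throughput_tons=1000, unit_price=1.0, transport_cost=1500))
```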

Population heatmap — The population heatmap layer. Hotspots are East Asia, South Asia, West Africa, and Western Europe. Even at a glance this tells you where the model expects migration pressure to concentrate.

City-scale zoom — Zoomed all the way into northwest Spain. Real city labels (Santiago de Compostela, Lugo, A Coruña) come from the GeoNames worldcities dataset. The simulation is not abstract: these are actual places at their actual coordinates.

Analyst mode after applying the Open Borders preset. All twelve economy sliders are exposed (Border friction, Connectivity, Agglomeration, Housing elasticity, Migration propensity, Softmax temperature, three price indices, R&D efficiency, Infra budget, Link base cost). The right panel switches to a five tab inspector with full SLO dashboards and a commodity master list.

The same world, recolored six different ways

The Node metric dropdown changes only what the city dots are colored by. The underlying simulation does not change. This is useful for asking “where are the wages high?” versus “where is housing expensive?” without changing anything else.

How a single timestep actually works, in plain English

This is the part the README does not explain. Every timestep, the simulator does the same five things, in order, over every city in parallel. Once you understand them, the findings later in this report start to feel inevitable rather than mysterious.

Step 1: Every city gets an attractiveness score

Imagine you are deciding where to move. You probably do not consciously calculate a number, but you do weigh things: how much money you can make there, how nice it is to live there, how expensive it is to put a roof over your head, and how many of your industry's jobs are nearby. The simulator does the same thing, just explicitly. Every city, every step, gets a single number called its “destination utility,” which is roughly: wages, plus productivity, plus amenity, plus an agglomeration bonus for being big, minus housing cost, plus a bonus for being near other things you might need (resources, factories, ports, universities, talent).

The most important phrase in that sentence is “an agglomeration bonus for being big.” The simulator gives a city extra attractiveness just because a lot of people already live there. That single mechanic, called agglomeration, is responsible for almost every interesting dynamic in the model. It is the reason Tokyo keeps growing. It is the reason Pittsburgh shrank for fifty years. It is also a feedback loop: bigger cities attract more people, who make them bigger, who attract more people. Without a counterweight (like housing cost), the model would collapse the entire world into one mega city.
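
The destination utility described in this step can be written as one array expression. The sketch below is an assumption laden illustration: the additive form follows the prose, but the logarithmic agglomeration bonus and the `agglomeration` weight of 0.1 are invented, and the three cities are hypothetical.

```python
import numpy as np


def destination_utility(wage, productivity, amenity, population,
                        housing_cost, proximity, agglomeration=0.1):
    """Additive utility; the log form of the size bonus is an assumption."""
    agglomeration_bonus = agglomeration * np.log1p(population)
    return (wage + productivity + amenity + agglomeration_bonus
            - housing_cost + proximity)


# Three hypothetical cities that differ only in size and housing cost:
# small and cheap, big and cheap, big and expensive.
wage = np.array([1.0, 1.0, 1.0])
productivity = np.array([1.0, 1.0, 1.0])
amenity = np.array([0.5, 0.5, 0.5])
proximity = np.array([0.2, 0.2, 0.2])
population = np.array([1e5, 1e7, 1e7])
housing_cost = np.array([0.3, 0.3, 1.5])

utility = destination_utility(wage, productivity, amenity, population,
                              housing_cost, proximity)
print(utility)
```

The big cheap city outranks the small one purely on size, and the big expensive city drops below both. That is the counterweight role of housing cost in miniature.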

Step 2: Residents pick where to go

Once every city has a score, each city's residents decide whether to move and, if so, where. The simulator does this with a mechanism called softmax, which is just a fancy way of saying “the probability you move to city X is the exponential of city X's score divided by the sum of the exponentials of the scores for every place you might go, with a temperature knob that controls how rational (how sharply concentrated on the best scoring places) the choice is.”
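
For readers who want the softmax spelled out, here is a small standalone version with a temperature parameter. It is the textbook form, not code lifted from the simulator, and the three scores are made up.

```python
import math


def softmax(scores, temperature=1.0):
    """Convert scores to probabilities; lower temperature sharpens the choice."""
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


scores = [1.0, 2.0, 3.0]
base = softmax(scores, temperature=1.0)
sharp = softmax(scores, temperature=0.1)   # nearly always picks the best city
flat = softmax(scores, temperature=10.0)   # close to a uniform choice
print([round(p, 3) for p in base])
```

At low temperature almost everyone goes to the top scoring destination. At high temperature the choice approaches a coin flip across all destinations.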

Two things happen during this step that matter for everything downstream. The first is that the score of a destination from your point of view depends not just on how attractive it is, but on how hard it is to get there. That is where friction comes in. Distance friction, the connectivity multiplier, and most importantly, border friction (an extra penalty if the destination is in a different group from where you currently live) all apply here. Crank up border friction and people stop crossing groups, even if the other side is much more attractive. Crank it down and the borders effectively dissolve.

The second is that the total number of movers each step is just migration_propensity × population. The “Migration Surge” scenario in this report roughly triples that propensity. But (and this is the key insight that explains finding number one in the Findings section) more movers does not mean different choices. The softmax is still the softmax. The first ten percent of your population making the optimal choice and the next twenty percent making the optimal choice end up at roughly the same destinations. That is why tripling migration does not triple GDP growth.
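
Both effects in this step can be seen in one small sketch. It assumes border friction is subtracted from the destination score as a flat penalty (the report says only that it is "an extra penalty"), and it leaves out distance friction and the connectivity multiplier. The function name and all numbers are hypothetical.

```python
import math


def flows_from_origin(population, propensity, utilities, crosses_border,
                      border_friction, temperature=1.0):
    """Split propensity * population movers across destinations by softmax."""
    # Assumed form: a flat penalty on destinations in a different group.
    effective = [u - border_friction * c
                 for u, c in zip(utilities, crosses_border)]
    peak = max(effective)
    weights = [math.exp((e - peak) / temperature) for e in effective]
    total = sum(weights)
    movers = propensity * population
    return [movers * w / total for w in weights]


utilities = [1.0, 1.5, 1.2]
crosses_border = [0, 1, 0]  # only the second destination is across a border

baseline = flows_from_origin(1_000_000, 0.02, utilities, crosses_border, 1.0)
surge = flows_from_origin(1_000_000, 0.06, utilities, crosses_border, 1.0)
fortress = flows_from_origin(1_000_000, 0.02, utilities, crosses_border, 2.0)
open_borders = flows_from_origin(1_000_000, 0.02, utilities, crosses_border, 0.05)

print([round(s / b, 2) for s, b in zip(surge, baseline)])
print(round(fortress[1]), round(open_borders[1]))
```

Tripling propensity scales every flow by the same factor and leaves the destination shares untouched. Changing border friction does the opposite: it keeps the total number of movers fixed and changes where they go.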

Step 3: Cities update themselves based on who came and went

After the migration round, every city has a new population. From there, a chain of feedback effects fires. Cities that grew get a small bump to productivity (the agglomeration loop strikes again, this time on the supply side). Housing costs respond to the new density, rising in growing cities and falling in shrinking ones, at a rate set by the housing elasticity parameter. Wages follow a formula that goes up with productivity and down with housing cost.

Housing elasticity is the single most important parameter for explaining the surprise finding later in this report. Low elasticity means housing supply cannot easily expand, so when a city grows, its rent skyrockets. That makes the city much less attractive to the next wave of movers, even though its productivity is still high. The math forces growth to spread out instead of piling up. High elasticity means housing absorbs growth easily, so productive cities just keep growing without paying a price. The default in the simulator is 0.08. The Housing Constrained scenario uses 0.04, which is enough to flip the model from “mega cities win” to “distributed networks win.”
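
A worked example helps show why halving the elasticity matters. The functional form below is an assumption chosen only to match the described behavior (lower elasticity, larger rent response to the same growth). The `response_scale` constant and the function name are hypothetical and do not come from the simulator.

```python
def updated_housing_cost(cost, pop_growth, elasticity, response_scale=0.05):
    """Assumed form: rent response is inversely proportional to supply elasticity."""
    return cost * (1.0 + response_scale * pop_growth / elasticity)


# The same city grows ten percent under the two elasticity settings in the report.
default_cost = updated_housing_cost(1.0, pop_growth=0.10, elasticity=0.08)
constrained_cost = updated_housing_cost(1.0, pop_growth=0.10, elasticity=0.04)
print(default_cost, constrained_cost)
```

Under this assumed form, the same ten percent growth raises rent twice as much at 0.04 as at 0.08. Because housing cost is subtracted from destination utility, that extra rent is what pushes the next wave of movers toward secondary cities.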

Step 4: Three sectors of the economy produce output

Every city makes three kinds of things every step. Resources (proportional to its mineral, energy, and agriculture deposits, with the deposits getting drawn down over time). Manufactured goods (a function of industrial capacity, manufacturing specialization, and proximity to raw materials). IP and technology (knowledge stock times tech specialization times human capital, with an R&D efficiency multiplier on top).

Total city GDP combines wage income with monetized sector output, weighted by the three price indices that the scenario sets. That is why scenarios like Resource Crunch can shift GDP composition without shifting GDP magnitude much: they raise the per unit value of resource output but do not change the physical production amount.
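
The valuation effect can be checked with a few lines of arithmetic. This sketch assumes GDP is a plain sum of wage income and price weighted sector output, as the paragraph above describes. The city's numbers and the 1.5 resource price index are invented for illustration.

```python
def city_gdp(wage_income, resources, goods, ip,
             price_resources=1.0, price_goods=1.0, price_ip=1.0):
    """Wage income plus sector output valued at the scenario's price indices."""
    return (wage_income + price_resources * resources
            + price_goods * goods + price_ip * ip)


def resource_share(wage_income, resources, goods, ip, price_resources=1.0):
    """Fraction of city GDP that comes from monetized resource output."""
    total = city_gdp(wage_income, resources, goods, ip,
                     price_resources=price_resources)
    return price_resources * resources / total


# Hypothetical city; physical output is identical in both cases.
physical = dict(wage_income=50.0, resources=10.0, goods=25.0, ip=15.0)

baseline_gdp = city_gdp(**physical)
crunch_gdp = city_gdp(**physical, price_resources=1.5)
baseline_share = resource_share(**physical)
crunch_share = resource_share(**physical, price_resources=1.5)
print(baseline_gdp, crunch_gdp)
print(round(baseline_share, 3), round(crunch_share, 3))
```

In this example a fifty percent price rise lifts total GDP by five percent while the resource share of GDP rises by more than forty percent, with no change in tons extracted.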

Step 5: Every fifteen steps, the autonomous planner builds something

By default this happens every fifteenth timestep. The autonomous infrastructure planner scans for the single highest scoring pair of currently unconnected cities and builds a new link between them. How it decides is covered in the planner section below.

The big picture, in one paragraph

The whole model rests on two things working at the same time: a discrete choice migration system (people pick destinations by softmax over utilities) and an agglomeration feedback loop (bigger cities are both more attractive AND more productive, so they get richer faster). Every interesting finding later in this report is, at root, those two mechanisms interacting with each other under different policy regimes. The Housing Constrained finding only makes sense once you see how housing cost acts as a circuit breaker on the agglomeration loop, redirecting growth into secondary cities before the big ones can monopolize it.

The autonomous infrastructure planner

Every fifteen timesteps, the simulator pauses the world for a moment and decides where to build one new piece of infrastructure. There is no machine learning here. It is a fixed, deterministic, opportunity scoring rule. The fact that it produces exactly the same number of builds in every macro policy scenario (finding number four below) is a real and reproducible quirk of how it works.

How it picks where to build, in plain English

Each time it runs, the planner builds a candidate set of the top hundred and sixty cities by population, the top hundred and sixty by raw resource endowment, and the top hundred and sixty by innovation potential (tech specialization times knowledge stock). It then looks at every possible pair of cities in that set that are not already directly connected. For each pair, it computes a score that rewards:

Supply chain match. A city rich in resources connected to a city with lots of factories scores high. (Think Western Australia iron ore connected to Chinese steel mills.)
Innovation pull. A research heavy city connected to a city with a deep knowledge stock scores well. (Think Boston biotech connected to San Diego pharma.)
Coastal logistics. Both endpoints being port cities adds a bonus. (Think Singapore to Rotterdam.)
Raw demand. Bigger cities at either endpoint mean more potential traffic.

All of those factors then get multiplied by a distance penalty (closer is better, but not by much). The highest scoring pair wins, and the planner builds the link. The type of link is decided by a handful of rules: if either endpoint has very high resource endowment, it builds a pipeline. If both endpoints have high port access and the distance is large, it builds shipping. Otherwise it builds rail. The cost of the link comes out of the policy budget, and the planner just stops building once the budget runs out.
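
The scoring and link type rules above can be sketched as follows. Everything numeric here is an assumption: the weights, the thresholds (0.8 for "very high" resources, 0.7 for port access, 3000 km for "large" distance), the shape of the distance penalty, and the three example cities. The candidate set construction (top hundred and sixty by each attribute) is left out to keep the sketch short.

```python
from itertools import combinations


def pair_score(a, b, distance_km):
    """Sum of the four rewards, multiplied by a mild distance penalty."""
    supply_chain = a["resources"] * b["factories"] + b["resources"] * a["factories"]
    innovation = a["research"] * b["knowledge"] + b["research"] * a["knowledge"]
    coastal = 0.5 if a["port"] > 0.7 and b["port"] > 0.7 else 0.0
    demand = 0.1 * (a["population"] + b["population"])
    distance_penalty = 1.0 / (1.0 + distance_km / 20000.0)  # closer is slightly better
    return (supply_chain + innovation + coastal + demand) * distance_penalty


def link_type(a, b, distance_km):
    """Rule order follows the text: pipeline, then shipping, else rail."""
    if max(a["resources"], b["resources"]) > 0.8:
        return "pipeline"
    if min(a["port"], b["port"]) > 0.7 and distance_km > 3000:
        return "shipping"
    return "rail"


def plan_one_build(cities, distances, connected, budget, link_cost):
    """Build the best unconnected pair, or nothing once the budget runs out."""
    if budget < link_cost:
        return None, budget
    candidates = [p for p in combinations(sorted(cities), 2) if p not in connected]
    if not candidates:
        return None, budget
    best = max(candidates,
               key=lambda p: pair_score(cities[p[0]], cities[p[1]], distances[p]))
    kind = link_type(cities[best[0]], cities[best[1]], distances[best])
    return (best, kind), budget - link_cost


cities = {
    "mine": dict(resources=0.9, factories=0.1, research=0.1, knowledge=0.1, port=0.2, population=0.3),
    "mill": dict(resources=0.1, factories=0.9, research=0.2, knowledge=0.2, port=0.8, population=0.9),
    "lab": dict(resources=0.0, factories=0.2, research=0.9, knowledge=0.8, port=0.9, population=0.5),
}
distances = {("lab", "mill"): 5000.0, ("lab", "mine"): 2000.0, ("mill", "mine"): 1000.0}

connected, budget, builds = set(), 10.0, []
while True:
    build, budget = plan_one_build(cities, distances, connected, budget, link_cost=4.0)
    if build is None:
        break
    connected.add(build[0])
    builds.append(build)
print(builds, budget)
```

Note that nothing in `pair_score` takes border friction, housing elasticity, or migration propensity as an input, which mirrors the behavior the report describes for the real planner.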

Policy events by type — The planner's output, every scenario, stacked by infrastructure type. The bars are identical at sixteen events across every macro regime. That uniformity is the basis for finding #4.

Why the planner ignores the macro regime

Look back at the score formula above. None of the factors reference the current macro parameters. The planner does not know whether border friction is currently high or low, whether the housing market is elastic or constrained, whether migration is surging or freezing. It looks at static city attributes (population, resources, ports, knowledge) and at the topology of what is already connected, and that is it. The macro regime should change which links would actually be valuable to build, but the planner does not have a feedback signal that tells it that.

That is almost certainly a design simplification rather than a bug. A truly adaptive planner is much more complicated to write and debug than a deterministic opportunity scorer. But it means the planner is, in effect, a control variable across our eight scenarios. The same infrastructure gets built every time, so any differences in outcomes are entirely attributable to differences in the macro policy, not to differences in what got built. That actually makes the scenario comparison cleaner. It is just not what the planner was probably meant to do.

What we ran, and how

The experimental setup

Eight scenarios from a library of twenty four
Each at 320 cities and 1,800 edges
Each at seed 42 for full reproducibility
Each ran for 200 timesteps
One optimizer search: four iterations of twelve trials, 120 step horizon per trial, three random shocks per trial
All results were pulled by driving the simulator directly from a Python script, not via the frontend, so the runs are as clean and headless as possible

The eight scenarios

Baseline (Default). Core reference state, all parameters at defaults.
Open Borders. Border friction crushed from 1.0 down to 0.05, migration propensity nudged up.
Fortress World. Border friction raised hard, migration propensity dropped.
High Connectivity. Connectivity multiplier roughly doubled.
Network Fragmentation. Connectivity multiplier cut and friction raised.
Housing Constrained. Housing elasticity dropped from 0.08 to 0.04, simulating zoning restriction.
Migration Surge. Propensity to move roughly tripled.
Resource Crunch. Resource price index lifted to simulate scarcity.

What we measured

Total world GDP (sum across all cities)
Total migration volume per step (number of people moving)
Top 10 concentration (share of world population in the largest 10 cities)
Sector outputs: resources, manufactured goods, IP / technology
Per city population change over the full run
Autonomous policy events, with breakdown by type
Wall clock performance per scenario
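
Of these metrics, top 10 concentration is the one most often referenced in the findings, so it is worth pinning down. A minimal sketch of the definition given above (share of world population in the ten largest cities), with a hypothetical function name and a made up population list:

```python
def top_k_share(populations, k=10):
    """Share of total population living in the k largest cities."""
    ranked = sorted(populations, reverse=True)
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Hypothetical world: ten large cities and one hundred small ones.
populations = [100] * 10 + [10] * 100
print(top_k_share(populations))
```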

Findings

01

Border policy moves migration 7.5× but barely moves GDP

From Fortress World, where roughly 8.1 million people move each timestep, to Migration Surge, where 61.1 million people move each timestep, the volume of people in motion varies by almost an order of magnitude. You would expect, if migration is the thing that allocates labor to where it is most productive, that a 7.5 times difference in migration would show up in a serious difference in total output. It does not.

Across all eight scenarios, final GDP growth clusters in a tight 142 percent to 164 percent band. That is a 1.15 times spread, against a 7.5 times spread in migration. The relationship between “how much movement the policy allows” and “how much output the world produces” is, in this model, surprisingly weak.

The mechanism is the softmax. Once people are choosing optimally over local utilities, more people choosing optimally does not add much new optimization. The marginal mover is going to roughly the same place as the average mover. So tripling migration triples the flux through the network without significantly changing the destinations.

Migration bar comparison — Average people moving per timestep, by scenario. Fortress World at 8.1 million per step (red) is the floor. Migration Surge at 61.1 million per step (green) is the ceiling. Everything else falls in between.

Cumulative migration — Cumulative people moved across 200 steps. Migration Surge ships over 12 billion person moves across the run. Fortress World does about 1.6 billion. The horizontal separation is the entire effect of border policy on movement.
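
The ratios and totals quoted in this finding can be rechecked from the reported per step figures. The script below uses only numbers stated in this report. The GDP growth endpoints are the values reported for Open Borders (+142.2 percent) and Housing Constrained (+163.9 percent), and the spread is taken as a ratio of growth percentages, which is how the 1.15 figure works out.

```python
steps = 200
movers_low = 8.1e6    # Fortress World, people per step
movers_high = 61.1e6  # Migration Surge, people per step
growth_low = 142.2    # Open Borders, percent GDP growth
growth_high = 163.9   # Housing Constrained, percent GDP growth

migration_spread = movers_high / movers_low
gdp_spread = growth_high / growth_low
cumulative_low = movers_low * steps
cumulative_high = movers_high * steps

print(round(migration_spread, 1), round(gdp_spread, 2))
print(cumulative_low / 1e9, cumulative_high / 1e9)
```

The migration spread rounds to 7.5 and the growth spread to 1.15, and the cumulative totals come to about 1.6 billion and just over 12 billion person moves, matching the text.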

02

The counterintuitive winner: Housing Constrained

This was the most surprising single result in the entire study. The scenario in which housing supply cannot easily expand (low housing elasticity, simulating zoning restriction) produced the highest GDP growth in the panel (+163.9 percent) AND the lowest urban concentration (top 10 share at 17.5 percent). Both directions at once. That is rare.

The mechanism, once you trace it back through the timestep loop, is logical. Low housing elasticity means that when a city grows, its housing cost rockets. The migration utility function subtracts housing cost. So as the biggest cities grow, they become rapidly less attractive to the next wave of movers. Those movers spill into secondary cities instead. Secondary cities are cheaper, get more residents, get more agglomeration bonus, get more productivity, and generate more output. The end state is a world with more middle weight cities and fewer mega hubs, and the total GDP is higher because productivity is more widely distributed.

The naive intuition (“if you cannot build housing in productive cities, you lose growth”) only holds if you assume there are no productive alternatives. The model says there usually are.

Real world equivalents are at least plausible. San Francisco's well documented housing crisis helped seed Austin, Boise, Salt Lake City, and Miami as new tech hubs. Vancouver's affordability crisis pushed jobs to Calgary and Seattle. The simulator is surfacing a mechanism that economic geographers have argued about for years: housing scarcity in superstar cities is not just a local affordability problem, it is potentially a redistribution mechanism that raises aggregate growth by forcing it elsewhere.

GDP vs concentration scatter — Each scenario plotted as one point. The horizontal axis is final urban concentration, the vertical axis is GDP growth. Housing Constrained sits alone in the upper left: more growth, less concentration. The seven other scenarios cluster in a much smaller region.

GDP growth bars — Final GDP growth, ranked. Housing Constrained beats Open Borders by 21.7 percentage points. The gap is much larger than the run to run noise of the simulator at this horizon.

03

Open Borders flattens, it does not grow

Open Borders finishes last on GDP growth (+142.2 percent) and has the lowest top 10 concentration of any scenario other than Housing Constrained (18.2 percent). Removing friction lets people pour out of the largest cities into the next tier, which homogenizes outcomes. The model does not reward that with extra output.

This is the opposite of the finding most people expect when they bring up open borders. The standard argument is that frictionless labor allocation should improve aggregate productivity by letting workers move to wherever they are most valuable. In this simulator, that effect is real but small, and it is outweighed by the loss of agglomeration concentration when the largest cities are no longer pulling hardest.

The lesson is not that open borders is bad. The lesson is that GDP and distribution are different objectives, and a single sim run cannot tell you which one to optimize for. The optimizer (covered below) actually does pick open borders as part of its winning blend, but only when combined with strong agglomeration and inelastic housing. By themselves, open borders are an equity lever, not a growth lever.

GDP curves cluster tightly for the first hundred steps and only fan out after the agglomeration feedback has time to compound. The Housing Constrained curve (orange) breaks away last, after about timestep 150, because its mechanism (forcing growth into secondary cities) takes a long time to add up.

GDP small multiples per scenario — Same data, split into one panel per scenario. The final growth percentage is annotated in the lower right of each. Easier to compare the shapes of the trajectories independently.

04

The autonomous planner is scenario insensitive

All eight scenarios produced exactly the same number of policy events: sixteen over 200 steps, a count governed by the planner's build interval and its budget bound. The fact that the count is identical across every macro regime tells you something specific about the planner.

As we showed in the autonomous infrastructure planner section above, the opportunity scoring function does not reference any of the scenario level parameters. It scores pairs of cities based on static attributes (population, resources, ports, knowledge stock) and on topology (what is currently connected). The macro regime should change which links would actually be valuable to build, but the planner does not know that.

Two ways to read this. One: it is a modeling gap. A real planner with a budget should respond differently to Open Borders (where it makes sense to build cross group links) than to Fortress World (where you might invest in within group resilience instead). Two: it is a useful baseline. The finding isolates the effect of the macro policy from the effect of the planner. If you want to study planner responsiveness, that would be a clean follow up study using the existing scoring function as the control.
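To make the modeling gap concrete, here is a minimal sketch of a scenario blind scorer. Everything in it is an assumption for illustration: the `City` fields, the `opportunity_score` formula, and the `rank_links` helper are hypothetical and are not taken from the FlowWorld codebase. The only thing it is meant to show is the mechanism: if the score reads static attributes and topology but never the scenario, two very different macro regimes produce the same ranked list of links.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class City:
    # Hypothetical static attributes, loosely mirroring the ones named in the text.
    name: str
    population: float
    resources: float
    has_port: bool
    knowledge: float


def opportunity_score(a, b):
    """Hypothetical link score built only from static city attributes."""
    size = (a.population * b.population) ** 0.5
    complement = abs(a.resources - b.resources) + abs(a.knowledge - b.knowledge)
    port_bonus = 1.25 if (a.has_port or b.has_port) else 1.0
    return size * (1.0 + complement) * port_bonus


def rank_links(cities, existing_links, scenario=None):
    """Rank unbuilt links. `scenario` is accepted but never read: that is the gap."""
    candidates = []
    for a, b in combinations(cities, 2):
        if frozenset((a.name, b.name)) in existing_links:
            continue  # topology: skip pairs that are already connected
        candidates.append((opportunity_score(a, b), a.name, b.name))
    return sorted(candidates, reverse=True)


# Made-up cities, not FlowWorld data.
cities = [
    City("Alpha", 12.0, 0.2, True, 0.9),
    City("Beta", 3.0, 0.8, False, 0.3),
    City("Gamma", 1.5, 0.5, True, 0.1),
]
existing = {frozenset(("Alpha", "Beta"))}

open_borders = rank_links(cities, existing, scenario={"border_friction": 0.05})
fortress = rank_links(cities, existing, scenario={"border_friction": 1.0})
print(open_borders == fortress)  # the ranking does not depend on the scenario
```

A scenario aware planner would need at least one term in the score that reads from `scenario`, for example a discount on links that cross a high friction border.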

05

Urban concentration converges to a narrow band

Across the eight scenarios, the final share of world population in the top ten cities lands between 17.5 percent (Housing Constrained, the most distributed) and 20.6 percent (Network Fragmentation, the most concentrated). That is a narrow band given how different the underlying policies are.

What that suggests is that there is some kind of natural settling point in the model for how concentrated population wants to be, regardless of the macro regime. Push migration up, push borders down, raise infrastructure budgets, do almost anything, and the system finds its way back to roughly twenty percent of people living in ten cities. Only inelastic housing breaks out of that band, by directly capping how large the largest cities can grow.

Concentration curves — Top 10 share over time. Notice how all scenarios except Housing Constrained converge into a narrow band between 19 and 21 percent by the end of the run. Concentration appears to be a deeply baked equilibrium of the agglomeration loop, hard to shift without changing housing dynamics directly.
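The concentration metric used in this finding is simple enough to write down. The sketch below is an illustrative version, assuming the simulator exposes a flat list of city populations at a given timestep; the function name `top_n_share` and the toy numbers are ours, not the project's.

```python
def top_n_share(populations, n=10):
    """Fraction of total population living in the n largest cities."""
    total = sum(populations)
    if total <= 0:
        raise ValueError("total population must be positive")
    largest = sorted(populations, reverse=True)[:n]
    return sum(largest) / total


# Hypothetical toy world: 10 large cities and 90 smaller ones.
toy_world = [200.0] * 10 + [100.0] * 90
print(f"top 10 share: {top_n_share(toy_world):.1%}")
```

Calling this once per timestep and plotting the result per scenario would give curves of the kind described in the caption above.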

06

Sector composition stays remarkably stable

The three sector outputs (resources, manufactured goods, IP) climb together over each run in roughly fixed proportions. Even in scenarios that explicitly manipulate one of the price indices, the underlying physical production volumes barely shift relative composition.

This is because the price index parameters in this version of the model affect monetary valuation (how much each unit of output is worth) rather than physical production choice. A city's actual resource extraction depends on its deposits and reserves, not on the price. Raising the price makes the same physical output worth more on paper. Useful for modeling inflation. Less useful for modeling supply response.
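A small sketch shows why a price index that only touches valuation cannot move the physical composition. The sector volumes and the helper names below are hypothetical; the 1.31 IP index is borrowed from the optimizer table purely as a plausible value.

```python
def physical_shares(volumes):
    """Share of each sector in physical output. Prices play no role here."""
    total = sum(volumes.values())
    return {sector: v / total for sector, v in volumes.items()}


def monetary_value(volumes, price_index):
    """Value of the same physical output at a given set of price indices."""
    return sum(v * price_index[sector] for sector, v in volumes.items())


# Made-up physical volumes for one city or one world snapshot.
volumes = {"resources": 40.0, "manufactured": 35.0, "ip": 25.0}
baseline_prices = {"resources": 1.0, "manufactured": 1.0, "ip": 1.0}
premium_ip_prices = {"resources": 1.0, "manufactured": 1.0, "ip": 1.31}

print(physical_shares(volumes))                    # identical under both price sets
print(monetary_value(volumes, baseline_prices))    # value at default prices
print(monetary_value(volumes, premium_ip_prices))  # higher value, same physical mix
```

A supply response would require `volumes` itself to depend on `price_index`, which is the feedback this version of the model does not have.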

Combined with finding number four, this suggests that the model is currently strong on demographic dynamics and weak on production allocation response. A future version might want to add a sector specialization feedback (cities increase their share of the sector with the highest price index over time).

Sector output — Stacked sector outputs for the Baseline run. Notice how the three layers grow in tandem rather than one capturing share from another.

07

The top winners and losers are dramatic in every scenario

Aggregate numbers hide the human story. Even in scenarios where total GDP and total migration change very little, individual cities can swing wildly. The top winner in Open Borders is Yunfu with a 173 percent population gain. The top loser is Bahawalpur, which loses 95 percent of its population. Open Borders is not the only one: every single scenario in the study has at least one city that gains over 40 percent and at least one that loses over 90 percent.

Top winners and losers per scenario — For each scenario, the magnitude of the top gainer and top loser. The gainers vary scenario to scenario, but every scenario has near total depopulation of at least one city. Stable macro numbers can hide enormous local upheaval.
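The winner and loser figures are plain percent changes per city between the first and last step. A minimal sketch, assuming start and end populations are available as dictionaries keyed by city name (the function names and the three toy cities are hypothetical):

```python
def percent_change(start, end):
    """Percent change from start to end population."""
    return 100.0 * (end - start) / start


def top_winner_and_loser(start_pop, end_pop):
    """Return ((winner, pct), (loser, pct)) over cities present in start_pop."""
    changes = {
        city: percent_change(start_pop[city], end_pop[city]) for city in start_pop
    }
    winner = max(changes, key=changes.get)
    loser = min(changes, key=changes.get)
    return (winner, changes[winner]), (loser, changes[loser])


# Toy world whose total population is unchanged (300 before, 300 after).
start = {"A": 100.0, "B": 100.0, "C": 100.0}
end = {"A": 273.0, "B": 5.0, "C": 22.0}

winner, loser = top_winner_and_loser(start, end)
print(winner, loser, sum(start.values()), sum(end.values()))
```

The toy numbers are chosen so the aggregate is flat while one city nearly triples and another loses 95 percent, which is the pattern this finding describes.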

For a policy paper using this kind of model, this finding matters more than the GDP numbers do. It means you cannot evaluate a policy purely on its aggregate effect. Two policies that produce identical world GDP can have radically different local consequences. The losers under one regime are not the same as the losers under another.

The ML optimizer's verdict

The optimizer is the most interesting feature in the project that you cannot see in the UI. It is a cross entropy parameter search that asks: across all the possible combinations of these twelve economy levers, which combination does best when shocks hit? Here is what cross entropy actually means, why this project uses it, and what it found.

What cross entropy optimization is, in plain English

Imagine you are trying to find the best spot in a darkened field for a picnic. You cannot see the whole field, but you can drop a friend at any point and they will tell you how nice it is there. Cross entropy works like this: you scatter a dozen friends around the field. They each report back. You note which spots got the highest scores (the top thirty percent are your “elites”). You scatter a new dozen friends around the average location of the elites, with a spread roughly equal to how spread out the elites were. They report back. You keep doing this. Over time, your cloud of friends drifts toward whatever spot in the field is best.

That is exactly what the optimizer does, except instead of friends it samples parameter sets, and instead of asking how nice a spot is, it runs a full simulation under randomized shocks and computes a score. It is gradient free (it never needs the derivative of the objective), it is parallel (each sample is independent), and it works fine on noisy, expensive, discontinuous objectives. Which is exactly what an agent based simulation under shocks is.
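The loop described above fits in a few lines. The sketch below is a generic cross entropy search using only the Python standard library, not the project's optimizer: the function name, the defaults, and the toy objective (a smooth bowl with its peak at (3, -1), standing in for a full simulation under shocks) are all assumptions. The defaults of twelve samples, four iterations, and a thirty percent elite fraction mirror the numbers quoted in this report.

```python
import random
import statistics


def cross_entropy_search(objective, mean, std, n_samples=12, elite_frac=0.3,
                         iterations=4, seed=42, min_std=1e-3):
    """Maximize `objective` by repeatedly refitting a Gaussian to the elite samples."""
    rng = random.Random(seed)
    mean, std = list(mean), list(std)
    n_elite = max(2, round(elite_frac * n_samples))
    mean_scores = []
    for _ in range(iterations):
        # Scatter samples around the current mean with the current spread.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(n_samples)]
        scored = sorted(((objective(x), x) for x in samples),
                        key=lambda pair: pair[0], reverse=True)
        mean_scores.append(statistics.fmean([score for score, _ in scored]))
        # Keep the top fraction and refit mean and spread to them.
        elites = [x for _, x in scored[:n_elite]]
        mean = [statistics.fmean(column) for column in zip(*elites)]
        std = [max(statistics.pstdev(column), min_std) for column in zip(*elites)]
    return mean, std, mean_scores


def toy_objective(x):
    """Stand-in for a simulation score; best possible value is 0 at (3, -1)."""
    return -((x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2)


best_mean, best_std, mean_scores = cross_entropy_search(
    toy_objective, mean=[0.0, 0.0], std=[2.0, 2.0])
print(best_mean, mean_scores)
```

In the real optimizer each call to the objective would be a full simulation run, so the samples within one iteration can be evaluated in parallel. The `min_std` floor is a common safeguard against the spread collapsing to zero too early; whether FlowWorld uses one is not stated in this report.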

What “score” means here

The score the optimizer is maximizing is a weighted combination of five things. Higher final GDP is good. Higher GDP growth is good. Lower volatility is good (a stable economy beats a roller coaster). Lower concentration is good (a world with growth spread across many cities scores better than a world dominated by a few). Lower overpopulation is good (cities that exceed their carrying capacity get penalized). And finally there is a small extra penalty for spending more than ninety two percent of the available infrastructure budget, which discourages the optimizer from blowing the bank just to chase a few more percentage points of GDP.
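As a rough sketch of how such a score could be assembled: the five terms and the ninety two percent threshold come from the description above, but the weights, the penalty size, the field names, and the assumption that every input is already normalized to a comparable scale are hypothetical. The report does not state the actual values.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    # Hypothetical, pre-normalized outputs of one simulation run.
    final_gdp: float
    gdp_growth: float
    volatility: float
    top10_share: float
    overpopulation: float
    budget_spent_fraction: float


# Assumed weights, for illustration only.
WEIGHTS = {
    "final_gdp": 1.0,
    "gdp_growth": 1.0,
    "volatility": 2.0,
    "top10_share": 1.0,
    "overpopulation": 1.0,
}
BUDGET_THRESHOLD = 0.92  # from the text: penalize spending above 92 percent
BUDGET_PENALTY = 0.5     # assumed size of the "small extra penalty"


def score(run, weights=WEIGHTS):
    """Weighted sum: reward GDP terms, penalize volatility, concentration, overshoot."""
    value = (
        weights["final_gdp"] * run.final_gdp
        + weights["gdp_growth"] * run.gdp_growth
        - weights["volatility"] * run.volatility
        - weights["top10_share"] * run.top10_share
        - weights["overpopulation"] * run.overpopulation
    )
    if run.budget_spent_fraction > BUDGET_THRESHOLD:
        value -= BUDGET_PENALTY
    return value


frugal = RunSummary(1.0, 3.41, 0.0098, 0.192, 0.0, 0.031)
spender = RunSummary(1.0, 3.41, 0.0098, 0.192, 0.0, 0.95)
print(score(frugal), score(spender))
```

With everything else equal, the run that spends past the threshold scores lower by exactly the penalty, which is the pressure that discourages the optimizer from exhausting the budget.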

The learning curve

Optimizer learning curve — Mean score (blue) climbs monotonically from 3.62 to 4.68 across four iterations. Best score (green) is more volatile because the best score is a single sample drawn from the new distribution, and a single sample can get unlucky. In a longer run with more trials per iteration the best curve would smooth out, but mean is the right thing to watch for whether the optimizer is actually learning. It is.

The recommended blend

After four iterations of twelve trials each, here is the parameter set the optimizer landed on:

Parameter	Optimizer's pick	Default	Plain English direction
Border friction	0.015	1.00	essentially open borders
Agglomeration strength	0.71	0.25	roughly three times default. Let big cities pull hard.
Housing elasticity	0.01	0.08	floor of the allowed range. Housing supply barely responds.
IP price index	1.31	1.00	premium tech prices. Knowledge work pays more.
Infrastructure budget	2,955	2,000	about 48 percent more headroom
Link base cost	11.5	25	54 percent cheaper to build each link
Migration propensity	0.026	0.020	slightly raised. People are slightly more willing to move.
Temperature	1.05	0.70	migration choice is slightly less rational, slightly more exploratory

The outcome, under shocks

GDP growth: 3.41x over the horizon, versus 1.5x for the hand picked scenarios
GDP volatility: 0.0098, extremely low (the world barely wobbles when shocks hit)
Top 10 share: 19.2 percent, mid pack
Infrastructure budget actually spent: 3.1 percent of what was available

That last number is the most interesting one. The optimizer chose to keep a big infrastructure budget around but barely use it. The interpretation: optionality is valuable. Cheap link cost means each link, when actually built, hits the budget for very little, and the unused budget acts as a buffer against future shocks. The optimizer figured out something close to a real world financial principle: hold liquidity, build infrastructure only when it pays back fast.

Combining the picks gives a coherent story. Open borders so people can move. Strong agglomeration so the productive places stay productive. Inelastic housing in those places so secondary cities catch the spillover. Cheap links and an abundant infrastructure budget so the network can adapt when a shock hits. Premium IP prices so research heavy cities (which tend to be the agglomeration winners) contribute more to GDP. None of those choices in isolation would look smart. The combination delivers 3.41x GDP growth.

Deep dive: who wins and who collapses under Open Borders?

Aggregate numbers hide the human story. The Open Borders scenario, when you look at it city by city, redistributes population dramatically. In the seed 42 run, the biggest winner is Yunfu in Guangdong at plus 173 percent. The biggest losers are Bahawalpur, Athlone, Bhubaneswar, and Jaffna, all of which lose roughly 95 percent of their populations.

Open Borders winners and losers — Open Borders over 200 steps. Left: top ten booming cities (Yunfu +173 percent, Lucknow +63, Puyang +58, Shantou +57, Shijiazhuang +55, Lima +54, Abidjan +53, Lagos +47, Bobo-Dioulasso +46, Foshan +46). Right: top ten declining cities (all near complete depopulation).

The pattern, in plain English

The winners cluster in two groups. The first is secondary Chinese cities (Yunfu, Puyang, Shantou, Shijiazhuang, Foshan) that sit close enough to the megahubs (Beijing, Shanghai, Guangzhou) to benefit from agglomeration spillover, but are not so close that the megahubs swallow them. The second is large emerging market hubs (Lucknow, Lima, Abidjan, Lagos) that have strong local economies and are the natural destination for movers within their region once friction drops.

The losers are smaller cities that sit in the shadow of a much bigger nearby city. Bahawalpur is near Lahore. Bhubaneswar is near Kolkata. Jaffna is in the orbit of Colombo. Athlone is squeezed between Dublin and Galway. When friction drops, the softmax migration choice for residents of these small cities flips hard toward the nearby giant, and they bleed population for the rest of the run.
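The flip described above is easy to see in a two option softmax. The sketch below is an assumption about the general shape of the migration choice, not the project's actual utility function: it treats friction as a flat penalty on moving, and the utility values are made up. The friction and temperature values (1.00 versus 0.015, and 0.70) are the defaults and picks quoted in the optimizer table.

```python
import math


def migration_probabilities(utilities, requires_move, friction, temperature):
    """Softmax over options, with a friction penalty on options that need a move."""
    adjusted = [u - (friction if moves else 0.0)
                for u, moves in zip(utilities, requires_move)]
    peak = max(adjusted)  # subtract the max for numerical stability
    weights = [math.exp((a - peak) / temperature) for a in adjusted]
    total = sum(weights)
    return [w / total for w in weights]


# Option 0: stay in the small city. Option 1: move to the nearby giant.
utilities = [1.0, 1.8]          # hypothetical attractiveness values
requires_move = [False, True]

high_friction = migration_probabilities(utilities, requires_move,
                                        friction=1.0, temperature=0.7)
low_friction = migration_probabilities(utilities, requires_move,
                                       friction=0.015, temperature=0.7)
print(high_friction, low_friction)
```

Under high friction the penalty cancels the giant's advantage and most residents stay. Once friction drops, the same utility gap sends the majority toward the giant every step, and because the giant then grows more attractive, the gap widens over the run.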

This is exactly the agglomeration versus regional balance tradeoff that economic geographers have been writing about for a hundred years. The model is reproducing the classic finding: in a frictionless world, secondary cities consolidate into a smaller number of larger ones, and the cities that lose out are not the smallest cities, but the ones closest to a winner.

What all of this actually means for the real world

Reading the findings as a research artifact is one thing. Reading them as a guide to thinking about real policy and real urbanization over the next thirty years is something else. This section is the longest in the report because, honestly, the findings raise more interesting questions than the findings themselves answer. Below are the big takeaways, the ones we think actually transfer outside the simulator.

Housing is the lever the discourse underestimates

Most popular conversations about urbanization and growth treat housing policy as a downstream problem: cities get successful, housing costs rise, we then debate what to do about it. The simulator suggests something stranger and more provocative. Housing elasticity might be the single most important upstream lever in the entire urban system. Not borders. Not infrastructure budget. Not migration propensity. Just how easily housing supply responds to demand.

The reason this matters is that in the real world, housing elasticity is largely a policy choice. It is set by zoning regulations, building codes, height limits, parking requirements, environmental review processes, and historic preservation rules. Different countries and different cities have wildly different effective elasticities. Tokyo has loose zoning and high elasticity. San Francisco has the opposite. The simulator suggests that the answer to “why is one of these cities thriving and the other one is exporting its tech industry?” is not really about taxes or talent. It is about whether housing supply can keep up with demand.

Now flip that. The simulator also suggests that restricting housing supply in the most productive cities, as San Francisco does, might actually produce higher aggregate growth, because it forces growth out to Austin, Boise, Salt Lake City, Miami, Denver, and so on. That resembles the pattern the United States has experienced over the past fifteen years. The simulator produces it as a logical consequence of the agglomeration plus housing dynamic, without being told about any of those cities specifically.

The honest implication is awkward: the cities with the worst housing policies may be inadvertently doing the rest of the country a favor, by acting as innovation seedlings whose graduates have to relocate to afford a life. That is not a great policy to be running on purpose, because it imposes enormous local costs on the people who live in the constrained cities. But it does explain why aggregate growth has held up better than the housing data alone would suggest.

Open borders is a distributive lever, not a growth lever

Almost every popular argument for or against open borders treats them as a question of total economic output. Pro side: more people, more productive matching, higher GDP. Con side: pressure on services, wage suppression, lower per capita GDP. The simulator suggests both sides are talking about the wrong thing.

In the simulator, Open Borders did not produce particularly high GDP growth. It produced the lowest growth of any scenario in the study. What it produced instead was a flatter distribution: less concentration in megahubs, more activity in secondary cities, and (as we saw in the deep dive) huge individual swings as small cities consolidate into larger ones. That is not a growth story. That is a distribution story.

The real world implication is that the open borders debate should probably be reframed. The question is not “will this make us richer?” The question is “where will the prosperity end up, and which places will hollow out?” The simulator says the answer to that question is mostly determined by geography (which small cities are next to which big cities) and by housing policy (whether the receiving big cities can absorb growth). Border policy mostly just opens or closes the valve. It does not determine the destination of the flow.

The agglomeration loop is the most powerful force in the model

Every interesting dynamic in the simulator, when you trace it back, depends on the agglomeration feedback loop: bigger cities are more attractive, attract more people, become more productive, become even more attractive. This is not a controversial claim in academic economic geography. Edward Glaeser has spent thirty years writing about it. But the simulator makes the force visible in a way that prose arguments cannot. You can literally watch the rich cities pull harder over the course of a run.

The implication is that any policy that ignores or fights agglomeration is going to underperform. “Spread investment evenly across regions” sounds fair but is, in the model, a recipe for slower aggregate growth. The optimizer figured this out on its own. Its winning blend includes three times default agglomeration strength. It wanted big cities to pull harder, not less hard. Combined with inelastic housing in those big cities to redirect the spillover, this is the closest thing to a real model output that says “the policy mix many countries are running today is wrong in a specific identifiable way.”

Infrastructure planning, in a smart world, is about optionality more than about building

The single weirdest result in the optimizer output is that the winning policy spent only 3.1 percent of its infrastructure budget. The natural reaction is “then why have such a big budget?” The model's implicit answer is “because optionality is valuable when shocks hit.”

In real terms, this maps onto a fairly well known principle in corporate finance and emergency management: you do not want to spend down your reserves in normal times, because the value of the reserves comes from being able to deploy them quickly when something breaks. Public infrastructure has the same property. A country with a large, mostly unused capacity to build emergency rail or power links is more resilient than a country that has already built every link it could afford. The simulator's optimizer reinvented this principle from first principles, which is mildly remarkable given that nobody told it about financial reserves or option value.

The implication for actual infrastructure policy is that big visible projects are not necessarily better than maintained capacity. Spending the budget all at once might score well on next year's PR but it is, in the model's terms, throwing away your shock absorber.

Aggregates can lie. Always look at the city level.

Finding number seven (top winners and top losers per scenario) was the most uncomfortable one to write up, because it shows that every scenario, including the “stable” ones, contained at least one city losing more than ninety percent of its population. The world's total GDP went up in all of them. The total population was nearly the same. But underneath, individual cities were completely transformed.

The real world version of this is the question of who pays the cost of any policy. A policy that raises aggregate GDP while wiping out half a dozen mid sized cities is not the same as a policy that raises aggregate GDP while preserving them, even if the GDP numbers are identical. The simulator's contribution to this debate is that you can now see which cities the policy is wiping out. Bahawalpur, Athlone, Bhubaneswar, Jaffna. They are named. They are on the map. They are not just data points. Whether the policy is worth the cost is a value judgment, but the cost itself is no longer abstract.

What the model is missing, and why that matters

Honesty time. The simulator has real strengths in demographic and migration dynamics, agglomeration feedback, and the budget logic for infrastructure. It has real weaknesses too. Here is what is notably absent and what each absence means for how you should read the findings.

Climate is not modeled. Cities do not get more or less attractive based on heat, sea level rise, drought, or natural disasters. In the real world, this is going to be one of the dominant migration forces over the next thirty years. The model's predictions about who wins and who loses should be read as “absent climate change.”
Remote work is not modeled. The migration utility function assumes you need to physically be in a place to work there. The growing share of fully remote knowledge work decouples wage from location in a way the model does not capture.
Political instability is not modeled. Cities cannot become unsafe, lose their institutions, or have their governments collapse. The model assumes a globally stable institutional environment, which is generous.
Demographic transition is not modeled. The model does not include birth rates, death rates, or aging. In a real run over centuries, fertility collapse in East Asia and Europe would dwarf any of the migration effects we observed.
AI productivity shocks are not modeled. The IP and tech sectors grow at a smooth multiplier. If artificial intelligence dramatically raises the productivity of knowledge workers over the next decade, the agglomeration winners in this simulator (cities with high knowledge stock) would get a much bigger boost than the model currently shows.
Returns are not redistributed. There are no taxes, no transfer payments, no social safety nets. The model treats wages as flowing entirely to the people in the city that earned them. Adding redistribution would change which cities are attractive and how shocks propagate.

Each of these omissions is fixable in principle. None of them are trivial. The right way to think about the current findings is that they describe the dynamics of a world that has resolved every obvious uncertainty (climate is stable, work requires presence, governments function) and is just being asked the narrow question of how migration, infrastructure, and prices interact. That is a smaller question than “what will the world look like in 2050,” but it is a sharper question, and the model's answers to it are correspondingly more trustworthy.

If these dynamics are real, what should we actually expect?

This is the speculative section. The simulator is a model, not a forecast, and any prediction we derive from it is conditional on the model being approximately right about the underlying mechanisms. With that disclaimer fully internalized, here is what the next thirty years might look like if the forces the simulator describes really are at work, and how to tell whether they are.

Prediction 1: Secondary cities adjacent to constrained superstars will continue to outpace expectations

Austin, Boise, Salt Lake City, Nashville, Raleigh, Miami, Calgary, Lyon, Lisbon. These are not random. They are the secondary cities sitting in the gravitational shadow of a constrained superstar (San Francisco, Boston, New York, Toronto, Paris, Madrid). The model says they should keep gaining economic weight as long as the superstar cities continue to refuse to build housing. Watch their job growth, their housing prices, and their venture capital deal flow over the next ten years. If they keep accelerating relative to the national average, that is consistent with the model. If they slow down or reverse, that is evidence the model is missing something.

Prediction 2: Cities in the shadow of a megahub will hollow out faster than national averages suggest

The seed 42 run had Bahawalpur losing 95 percent to Lahore, Bhubaneswar losing 95 percent to Kolkata, Jaffna losing 95 percent to Colombo. These are extreme numbers and would not literally occur, but the directional signal is real. Secondary cities in developing countries that sit close to a fast growing primary city are likely to experience accelerating population loss, even as the national population grows. National statistics will hide this because the primary city's gain offsets the secondary city's loss. Watch the second tier cities in India, Pakistan, Nigeria, Indonesia, and Bangladesh. The ones near a megahub will likely empty out. The ones distant from a megahub will likely consolidate into their region's primary.

Prediction 3: Where housing is elastic, the megahub will keep absorbing growth

The simulator's strongest finding is that low housing elasticity in superstar cities is, perversely, the highest aggregate growth policy. That is consistent with the recent US pattern of growth spilling out of constrained cities. But the cleaner test is the opposite case: a country that has solved housing in its superstar cities. Japan is one of the few large countries with consistently elastic housing supply in its biggest cities. If our model is right, Japan should be one of the few countries where the megahub continues to absorb new growth rather than spilling it out. Watch Tokyo's relative share of Japanese GDP over the next decade. If it grows, that is consistent with the model. If it stalls, the model is missing a force that pulls growth out of megahubs even when housing is elastic.

Prediction 4: Optionality will look smarter and smarter as the world gets shockier

The optimizer's preference for an unspent infrastructure budget is a bet about volatility. The bet is: in a world where shocks (resource crunches, infrastructure failures, migration spikes) are common, the ability to deploy capital quickly is worth more than the capital being deployed today. The next ten years are likely to bring more shocks than the past ten did. If the optimizer's preference for optionality is right, the countries that hold dry powder (Norway's sovereign wealth fund, Singapore's reserve buffers, Switzerland's federal surplus tradition) should outperform the countries that aggressively front load their infrastructure spend. Watch how the Inflation Reduction Act money in the United States compares to more conservative infrastructure strategies in Northern Europe. If the conservative approach wins on resilience metrics, that supports the optimizer's logic.

Prediction 5: Concentration metrics will stay surprisingly stable, except where housing breaks them

Finding number five showed that top 10 city concentration converges to about 20 percent of world population almost regardless of macro regime. The only thing that breaks the band is housing policy. The real world prediction is that the share of the global population living in the top 10 megahubs should stay roughly constant over the next decade, with the exception of any city that runs a serious, aggressive housing supply program (which the model predicts would let that city absorb a larger share). Most cities will not do that. Concentration should be a remarkably stable number to measure year to year.

Prediction 6: Open borders, if implemented, would surprise pro and con sides equally

Pro side would be surprised because GDP growth would not increase much. Con side would be surprised because the receiving cities would not be the obvious ones. The actual receivers would be secondary cities in the right geographic position, not the global superstars that everyone fears or hopes would absorb the migrants. The political coalitions formed around the issue today are arguing about a phenomenon that, if it occurred, would not look like what either side expects.

How to falsify any of this

The whole point of being explicit about predictions is to make them falsifiable. Each of the six predictions above has a clear way to be wrong. Secondary city growth slows. Megahub adjacent cities do not hollow out. Tokyo's share of Japanese GDP does not rise. Optionality buffer countries lose to big spenders. Concentration shifts dramatically. Open borders, where tried, produces the standard pro side or con side outcomes. If any of those things happens, the model has missed something real, and the next version needs to account for it. That is the contract you sign when you write something down. We are signing it.

Caveats, honestly

Every claim in this report should come with the following asterisks. The point of a research artifact is to be honest about what it can and cannot say.

Single seed. Every result here is from seed 42. The deterministic property is great for replayability, but a serious study would re-run with twenty seeds and report means and confidence intervals. The 1.15x GDP spread across scenarios may be larger than seed noise, but we have not measured seed noise, so we cannot yet say.
200 steps may be too short. Some scenarios' GDP curves had only just started to fan apart by timestep 200 (see the Housing Constrained breakaway in chart 1). A 500 step horizon would make the late stage divergence much clearer.
Identical migration totals appeared for Baseline, Network Fragmentation, and Resource Crunch, suggesting either that the migration system reaches steady state extremely fast, or that those scenarios do not actually differ in any parameter the migration calculation uses. Worth checking before drawing strong conclusions from any one of the three.
The optimizer was compute constrained. Four iterations of twelve trials. A real research run would use eight iterations of forty plus trials, plus multiple seeds, and would probably take eight hours instead of twenty minutes. The 3.41x result is suggestive, not conclusive.
The autonomous planner appears to ignore scenario regime (finding number four). We hypothesized this is because the scoring function reads only static city attributes. That is a hypothesis based on code reading, not on a controlled experiment.
The sector composition stability in finding number six is at least partially an artifact of the price indices controlling monetization rather than physical output. The model is currently labor and demography heavy, supply response light.
The predictions section is speculation. It explicitly extends model findings to real world expectations. Models are wrong in proportion to how much they extrapolate. Treat each prediction as a falsifiable claim, not a forecast.
This is a model. FlowWorld is a policy laboratory, not a forecast. The point is to probe which directions outcomes move when you twist the rules, not to predict actual city populations in 2050. Treat the magnitudes as ordinal, not cardinal.
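For the single seed caveat above, the follow up is mechanical. The sketch below shows one way to report a mean and an approximate 95 percent confidence interval over twenty seeds. The `simulate_gdp_growth` function is a stand-in that returns made-up numbers, since the real entry point is not shown in this report, and the interval uses a normal approximation, which is slightly too narrow for only twenty samples.

```python
import random
import statistics


def simulate_gdp_growth(seed):
    """Stand-in for a full FlowWorld run; returns a fake GDP growth multiple."""
    rng = random.Random(seed)
    return 1.5 + rng.gauss(0.0, 0.05)


def mean_with_ci(values, confidence=0.95):
    """Mean plus a normal-approximation confidence interval for the mean."""
    n = len(values)
    mean = statistics.fmean(values)
    standard_error = statistics.stdev(values) / n ** 0.5
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return mean, mean - z * standard_error, mean + z * standard_error


seeds = range(20)
results = [simulate_gdp_growth(seed) for seed in seeds]
mean, low, high = mean_with_ci(results)
print(f"GDP growth: {mean:.3f}x (95% CI {low:.3f} to {high:.3f})")
```

Running this per scenario would show directly whether the intervals for two scenarios overlap, which is the check the spread across scenarios still needs.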