This one started with Andrej Karpathy on the No Priors podcast, talking about auto research and distributed workers.
I was listening to Andrej Karpathy talk about auto research and distributed workers when something clicked that I had not quite put together before. He was describing a setup where most of the exploratory work happens on machines no one centrally controls, while a smaller trusted layer just checks whether the results are real. The interview was not mainly about energy or infrastructure. But the logic he sketched feels relevant right now, when a lot of the public and policy controversy about AI scaling has turned to data centres, electricity demand, water for cooling, and land use.
The path we’re on
The dominant path at the moment is clear enough. Frontier progress still runs through very large, very concentrated training runs on dedicated clusters. That model has delivered a lot, and the scaling laws have held on longer than many expected. More of the recent gains have come from inference, with models that think for longer at the point of use, which is itself a more search-like kind of work, but the heavy training is still centralised. That centralisation also concentrates demand in a relatively small number of sites and in the hands of a relatively small number of operators. When people raise concerns about grid strain, water stress in certain regions, or the sheer pace of new build required, they are pointing at real physical constraints, not just abstract worries.
Every increment of frontier progress currently has to be backed by more of this. Photo: Carl Lender, CC BY 2.0.
Search is the expensive part
Karpathy’s observation starts from a different place. In many research tasks, especially the kind of automated experimentation he calls auto research, the expensive part is searching. You try thousands of variations. Most of them do not work. The cheap part, relatively speaking, is checking whether one particular variation actually improves the thing you care about. You run it once under controlled conditions and measure the outcome. That asymmetry is what he sees in projects like Folding@Home and SETI@Home, which he reaches for as existence proofs. The analogy is not perfect: those projects lean as much on running each task across several machines and comparing notes as on any neat split between searching and checking. But the shape is right. The hard work can be distributed across many machines, while the verification stays manageable and runs on trusted compute.
He draws a loose analogy to a blockchain. Instead of blocks, you have code commits or experimental changes. They can build on each other. The “proof of work” is the actual experimentation someone had to do to produce a promising commit. The verification step is what keeps the system honest. An untrusted pool of contributors on the open internet can propose improvements. A trusted verification layer accepts or rejects them. The whole thing stays asynchronous and, if the security engineering is done right, reasonably safe.
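To make the propose-and-verify loop concrete, here is a minimal sketch in Python. It is my own toy illustration, not anything Karpathy or any of the projects mentioned here has published. The objective `score`, the worker function `untrusted_search`, and all the parameters are hypothetical, and the loop is synchronous for simplicity where a real system would be asynchronous.

```python
import random

def score(x: float) -> float:
    """Trusted, cheap evaluation (hypothetical toy objective, best at x = 3)."""
    return -(x - 3.0) ** 2

def untrusted_search(base: float, tries: int, rng: random.Random) -> tuple[float, float]:
    """Expensive part: a worker tries many variations of the current best and
    submits its best candidate together with a *claimed* score."""
    candidates = [base + rng.gauss(0.0, 1.0) for _ in range(tries)]
    best = max(candidates, key=score)
    return best, score(best)

def verify(candidate: float, best_score: float) -> tuple[bool, float]:
    """Cheap part: one evaluation on trusted compute. The submitter's claimed
    score is never consulted; only the re-measured score counts."""
    measured = score(candidate)
    return measured > best_score, measured

def run(rounds: int, workers: int, tries: int, seed: int = 0):
    rng = random.Random(seed)
    best = 0.0
    best_score = score(best)
    search_evals = 0   # evaluations spent on untrusted hardware
    verify_evals = 0   # evaluations spent on the trusted layer
    for _ in range(rounds):
        for _ in range(workers):
            candidate, _claimed = untrusted_search(best, tries, rng)
            search_evals += tries
            accepted, measured = verify(candidate, best_score)
            verify_evals += 1
            if accepted:  # an accepted "commit" becomes the new base to build on
                best, best_score = candidate, measured
    return best, search_evals, verify_evals
```

With, say, `run(rounds=5, workers=4, tries=200)`, the untrusted side performs 4,000 evaluations while the trusted side performs 20. A submission that overstates its score gains nothing, because the verifier measures the candidate itself.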
Why this matters for energy and water
The sustainability angle that interests me is straightforward. If a meaningful fraction of the search work can move onto hardware that already exists, and is already powered and cooled somewhere in the world, then the marginal need to build new hyperscale capacity dedicated purely to exploration drops.
I want to be careful about what I am and am not claiming here. I am not claiming the distributed version uses less energy in total. A consumer machine pushed to full load is not really idle: it is drawing hundreds of extra watts, and the silicon in it is usually less efficient per unit of useful work than a purpose-built data centre accelerator. It is the same logic as the time-shifted model we use at BrewAI, where much of the saving comes from when the energy-hungry work runs rather than where, by moving flexible load into the cleanest and cheapest windows on the grid. In both cases the gain is not less energy overall.
What this buys you is fewer new data centres: getting more research out of hardware the world has already manufactured and is already running, ideally on cleaner grids, instead of backing every increment with another dedicated site and the embodied carbon of all the kit inside it. You are not eliminating the requirement for trusted compute. You are changing the ratio. Verification runs are narrower and more predictable than open-ended frontier training. They can often run on smaller clusters, or even on hardware that would otherwise sit idle.
This also opens a route to more targeted progress, rather than relying on one ever-larger general model that has to serve every possible use case. Karpathy noted in the same interview that the labs are currently chasing a single monoculture of a model, arbitrarily capable across every domain, and that he expects a later speciation into more specialised systems. If different groups, or even different loose collectives, can run specialised research tracks on hardware they already control or can access locally, you get a form of that speciation without every track needing its own dedicated multi-hundred-megawatt site.
This is already being built
I should be honest that this is no longer purely theoretical. Some of it is already being built. Prime Intellect has trained models across a permissionless, globally distributed pool of contributors, with a separate trusted layer verifying the work, which is close to the exact architecture Karpathy describes. Others, including Nous Research, Gensyn, and swarm-style systems like Petals, are pushing on the same problem from different angles, and federated learning has been training on hardware no single party controls for years. The early signal from these efforts is as sobering as it is encouraging. Distributed training still lags the centralised frontier by a wide margin on raw efficiency, perhaps a thousandfold on some estimates, mostly because moving data between far-flung machines is slow and costly. That gap is exactly the thing an honest sustainability case has to reckon with, rather than wave away.
So the interesting question is not whether the idea is possible. It is under what conditions the distributed route actually saves anything worth saving. Several things would have to be true.
- The verification layer itself has to stay modest in its own resource demands. If checking a commit requires almost as much compute as generating it, the advantage shrinks. This really only works for objectives where there is a genuinely cheap and trusted way to check the answer: a fixed evaluation, a reward function, a test that either passes or fails, rather than a claim that can only be confirmed by repeating the whole expensive run.
- The security and sandboxing problem has to be solved well enough that organisations are willing to accept results from untrusted submitters. Running arbitrary code from the internet is obviously risky, and at research scale the risk includes contributors gaming the verification itself, not just crashing a machine. The engineering exists in other domains. It is not trivial here, but it is not impossible either.
- The incentive and coordination layer has to work. Today the reward in these systems is mostly reputation and leaderboard position. That may be enough for some contributors. For sustained effort on harder problems, clearer mechanisms would probably be needed. Projects or companies could fund specific tracks the way they currently fund grants or internal research. Contributors would bring compute rather than, or in addition to, cash.
- And most importantly, if the claim is to be falsifiable, we would need a way to measure it end to end, which is worth spelling out on its own.
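The first condition, that verification stays cheap relative to search, can be put as simple arithmetic. The sketch below is an illustration under my own assumptions: the function name `trusted_share` and the cost units are hypothetical, and it assumes each worker submits only its single best candidate per batch of tries.

```python
def trusted_share(tries_per_submission: int, search_cost: float, verify_cost: float) -> float:
    """Fraction of total compute that has to run on the trusted layer,
    assuming one verification per submitted candidate (hypothetical model)."""
    search_total = tries_per_submission * search_cost
    return verify_cost / (search_total + verify_cost)

# Cheap check: verifying costs about as much as one try.
cheap = trusted_share(tries_per_submission=1000, search_cost=1.0, verify_cost=1.0)

# Expensive check: confirming a claim means repeating the whole search.
expensive = trusted_share(tries_per_submission=1000, search_cost=1.0, verify_cost=1000.0)
```

In the first case the trusted layer carries roughly a thousandth of the total work. In the second it carries half, and most of the advantage of distributing the search is gone.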
How you would test it
The cleanest version of the test is not to ask which regime reaches some fixed level of improvement, because the two routes may well arrive at different places by different paths. Better to fix the task and the compute budget. Take one well-defined, search-heavy objective with a cheap and trusted way to score it, and run it both ways: once on an optimised central cluster, once with the search phase pushed out to distributed hardware. Then, for each route, record the validated score it reaches alongside the energy, the cooling water, and the embodied carbon it took to get there. Decide in advance whether you are counting only the marginal draw or the full embodied cost of the hardware, because that choice can flip the result. Carbon, water, and raw energy also pull apart depending on where the machines sit, so you have to say which one you are optimising for. My own prediction, for what it is worth, is that the distributed route loses on total energy and only comes out ahead under a marginal-cost, clean-grid accounting, which is a far narrower and more honest claim than the headline. Running the comparison cleanly would take care, but it is feasible, and pilot projects on narrower problems could give early proof of concept.
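Here is a small sketch of how the accounting choice can flip the result. Every number in it is invented purely for illustration and is not a measurement of any real system. The `Route` class and `carbon_kg` function are hypothetical names, and water is left out to keep the example short.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    name: str
    energy_kwh: float          # electricity drawn to finish the task
    grid_g_co2_per_kwh: float  # carbon intensity of the grid it ran on
    embodied_kg_co2: float     # share of hardware manufacturing carbon attributed to the task

def carbon_kg(route: Route, include_embodied: bool) -> float:
    """Operational carbon, plus embodied carbon if the chosen accounting counts it."""
    operational = route.energy_kwh * route.grid_g_co2_per_kwh / 1000.0
    return operational + (route.embodied_kg_co2 if include_embodied else 0.0)

# Illustrative, made-up figures: the distributed route uses three times the
# energy on less efficient hardware, but runs on a much cleaner grid.
central = Route("central cluster", energy_kwh=1000, grid_g_co2_per_kwh=400, embodied_kg_co2=200)
distributed = Route("distributed search", energy_kwh=3000, grid_g_co2_per_kwh=100, embodied_kg_co2=500)

for include_embodied in (False, True):
    label = "full embodied" if include_embodied else "marginal draw only"
    c = carbon_kg(central, include_embodied)
    d = carbon_kg(distributed, include_embodied)
    print(f"{label}: central {c:.0f} kg, distributed {d:.0f} kg")
```

With these invented figures the distributed route loses on raw energy, wins on carbon when only the marginal draw is counted, and loses again once embodied carbon is included. Put the same distributed run on the dirtier grid and it loses under both accountings.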
A complementary path, not a replacement
I am not claiming this replaces the current model. Frontier training runs will still need serious clusters for the foreseeable future. What Karpathy’s framing suggests is a complementary path that could reduce the pressure on new dedicated capacity for the parts of research that are mostly search. It also aligns with the broader pattern we have seen in other domains where checking is cheap relative to discovery: the work spreads, the trusted core stays smaller, and the physical footprint changes.
The current concerns about data centre growth are not going away on their own. They reflect real limits on power, grid connections, water, and land, and in many places the queue to connect new load now runs well ahead of what the grid can absorb in the near term. Any approach that lets useful research progress happen without requiring every increment to be backed by another hyperscale build is worth serious engineering attention. The interview made me think the technical ingredients for one such approach already exist, in outline and increasingly in practice. Turning the outline into something measurable is the next step.