The efficiency gap in the hardware driving biology and AI

Why this question, now

AI hardware is having a moment. Hyperscaler capex on AI data centres is on track to clear $690 billion in 2026, and private equity has followed in scale — Blackstone alone reports a $55B+ data-centre portfolio with another $70B in the pipeline. Almost all of that money is being spent on the architecture we already have: more GPUs, more accelerators, more cooling, more power. The pitch is straightforward — train and serve existing AI algorithms more efficiently, and unit costs come down.

On the venture side, large bets are going into AI hardware companies as well. In December 2025, Unconventional AI raised a $475M seed at a $4.5B valuation, led by a16z and Lightspeed with Sequoia, Lux, DCVC, and Bezos joining. The pitch is not to make GPUs cheaper or more efficient, but to build a computational substrate that runs neural networks directly on the nonlinear physics of silicon. It takes inspiration from biology, on the bet that the hardware biology runs on is far more efficient than the hardware AI runs on. They are not alone: several neuromorphic startups (Rain, Innatera, SpiNNcloud, BrainChip, and others) are working on variants of this pitch.

This thesis turns on an empirical claim: that biology-inspired hardware can do things that conventional hardware structurally cannot, rather than merely doing the same things more cheaply. The strong form of the claim is that closing the gap between current AI and human intelligence might require closing the gap between silicon and biology. The weaker form is that biology has a cost-of-compute advantage so dramatic that the substrate matters even for tasks we already know how to do.

Both forms depend on the same number: how big is the gap, exactly? The literature is full of order-of-magnitude estimates — "the brain runs on 20 watts, GPT-4 takes a city" — that compare two systems doing two different things at two different scales. They are evocative but not falsifiable. The cleaner question is what happens when we hold the algorithm constant and just vary the substrate. That number is what determines whether biology-inspired hardware is a contrarian bet or a structural one.

The rest of this post computes that number for one extremely well-characterised algorithm.

Introduction

Modern AI is the product of two compounding revolutions: better algorithms (transformers, RLHF, scaling laws) and better hardware (GPUs, TPUs, custom accelerators). It is rarely clear which deserves more credit on any given metric. The question matters because it tells you where the next gains will come from.

The cleanest way to disentangle the two is to hold one constant and vary the other. We can hold hardware constant and compare algorithms — that is what most ML benchmarking already does. The harder direction is to hold software constant and compare hardware. Specifically: take a fixed algorithm, run it on biological hardware and on silicon hardware, and measure how much energy each substrate spends to execute one instance of that algorithm.

The roadblock is finding an algorithm. The biological process must be known with enough mechanistic clarity that we can transcribe it directly into code, and it must have a discrete, countable unit of computation so that "one execution" maps 1:1 between biology and silicon. No continuum approximations, no ensemble averages.

This post walks through one such algorithm — a single cycle of the Na⁺/K⁺-ATPase pump — and computes the energy cost on both sides.

The Algorithm: One Cycle of the Na⁺/K⁺-ATPase Pump

The obvious first candidate is the Hodgkin-Huxley action potential — verified for seventy years, atomic-resolution channel structures, every neuroscience student writes a Python implementation. It fails our test. Hodgkin-Huxley is a continuum description: currents and capacitance are per unit area, and a single integration produces the trajectory for an isopotential patch of any size. The biological axon must commit to a physical area; energy scales linearly with it. There is no canonical "one execution." The same problem rules out anything described by a partial differential equation or a mean-field approximation — the ensemble averaging that makes those models accurate is what destroys the 1:1 mapping we need.
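To make the scaling problem concrete, here is a rough sketch of the minimum pump work implied by one action potential for membrane patches of different area. It assumes a textbook specific capacitance of 1 µF/cm² and a 100 mV voltage swing, and it ignores the overlap of Na⁺ and K⁺ currents. Those values and the helper name min_atp_per_spike are illustrative assumptions, not figures from this post; the only number borrowed from the pump itself is its stoichiometry of three Na⁺ per ATP.

```python
E_CHARGE = 1.602e-19      # C, elementary charge
C_M = 1.0e-6              # F/cm^2, ASSUMED textbook membrane capacitance
DELTA_V = 0.1             # V, ASSUMED size of the voltage swing
NA_PER_ATP = 3            # Na+ exported per pump cycle


def min_atp_per_spike(area_um2):
    """Lower bound on ATP needed to pump back the Na+ that charges the patch."""
    area_cm2 = area_um2 * 1e-8                 # 1 um^2 = 1e-8 cm^2
    charge = C_M * area_cm2 * DELTA_V          # Q = C * V, in coulombs
    na_ions = charge / E_CHARGE                # monovalent ions carrying Q
    return na_ions / NA_PER_ATP


for area in (1.0, 10.0, 100.0):
    print(f"{area:6.1f} um^2 -> {min_atp_per_spike(area):.3g} ATP per spike")
```

The same Hodgkin-Huxley integration describes all three patches, yet the ATP bill differs by a factor of 100 between the smallest and the largest, which is why the model offers no canonical "one execution".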

The right candidate is a process where biology itself executes a discrete number of well-defined operations. The Na⁺/K⁺-ATPase fits. It is the membrane protein that maintains the Na⁺ and K⁺ gradients in every animal cell, and one cycle hydrolyzes one ATP, exports three Na⁺ ions, and imports two K⁺ ions. The mechanism — the Post-Albers cycle — is among the best-characterized in biochemistry: atomic-resolution structures of multiple intermediate states, single-molecule kinetics, and a complete kinetic scheme with measured rate constants. Heyse, Wuddel, Apell & Stürmer (1994) determined the rate constants from rabbit kidney enzyme at 20 °C, publishing a complete table sufficient to simulate the entire pump in code. We use the linear forward branch of their scheme — eleven states, eleven transitions — under standard physiological substrate concentrations.

The states, in order around the cycle:

1. E1: empty, cytoplasm-facing
2. Na₃·E1: three Na⁺ bound
3. Na₃·E1·ATP: ATP bound (high-affinity site)
4. (Na₃)E1-P: phosphorylated, Na⁺ occluded
5. P-E2(Na₂): conformation flipped, first Na⁺ released
6. P-E2(Na): second Na⁺ released
7. P-E2: third Na⁺ released
8. P-E2(K): first K⁺ bound
9. P-E2(K₂): second K⁺ bound
10. E2(K₂): dephosphorylated, K⁺ occluded
11. ATP·E2(K₂): ATP bound (low-affinity site) → returns to state 1

Each transition has a forward and a backward rate constant. The rate-limiting steps at 20 °C are the two conformational flips: state 4 → 5 (k_f = 22 s⁻¹) and state 11 → 1 (k_f = 22 s⁻¹). Every other forward rate is at least 170 s⁻¹, which is close to an order of magnitude faster, and most are far faster than that.

The Code

Biology executes this cycle stochastically: a single protein molecule, under thermal noise, hops between conformational states with rates set by the chemistry. The faithful in-silico equivalent is a Gillespie simulation — sample the time to the next transition from an exponential distribution, sample which transition fires from the rate ratios, and step until the molecule has completed one full cycle.

import time
import numpy as np

# Forward (kf) and backward (kb) rates in s^-1, at 20 C, physiological substrates.
TRANSITIONS = [
    (2.0e3,  8.0e2),    #  0  E1            -> Na3.E1
    (7.5e4,  1.64),     #  1  Na3.E1        -> Na3.E1.ATP
    (2.0e2,  18.5),     #  2  Na3.E1.ATP    -> (Na3)E1-P
    (22.0,   25.2),     #  3  (Na3)E1-P     -> P-E2(Na2)        [rate-limiting]
    (5.0e3,  375.2),    #  4  P-E2(Na2)     -> P-E2(Na)
    (1.0e5,  1.4e5),    #  5  P-E2(Na)      -> P-E2
    (1.7e2,  10.0),     #  6  P-E2          -> P-E2(K)
    (2.5e4,  2.0e3),    #  7  P-E2(K)       -> P-E2(K2)
    (1.0e3,  5.0e3),    #  8  P-E2(K2)      -> E2(K2)
    (2.5e3,  4.0),      #  9  E2(K2)        -> ATP.E2(K2)
    (22.0,   400.0),    # 10  ATP.E2(K2)    -> E1                [rate-limiting]
]
N  = len(TRANSITIONS)
KF = np.array([t[0] for t in TRANSITIONS])
KB = np.array([t[1] for t in TRANSITIONS])

def simulate_one_cycle(rng):
    """Gillespie walk from E1; returns the simulated time (s) for one net forward turn."""
    state, t, net = 0, 0.0, 0
    while net < N:
        # At state s: forward rate = KF[s] (transition s -> s+1).
        # Backward rate = KB[s-1] (the reverse of transition s-1 -> s,
        # which is the one that LANDS on s). Note the offset.
        kf = KF[state]
        kb = KB[(state - 1) % N]
        k_total = kf + kb
        t += -np.log(rng.random()) / k_total
        if rng.random() < kf / k_total:
            state = (state + 1) % N
            net  += 1
        else:
            state = (state - 1) % N
            net  -= 1
    return t

rng       = np.random.default_rng(42)
n_trials  = 100_000
wall_t0   = time.perf_counter()
times     = np.array([simulate_one_cycle(rng) for _ in range(n_trials)])
wall_dt   = time.perf_counter() - wall_t0

bio_J_per_cycle = 50e3 / 6.022e23                       # 1 ATP at 50 kJ/mol
si_J_per_cycle  = (wall_dt / n_trials) * 15.0           # 15 W active CPU power

print(f"Mean simulated cycle time (biological):   {times.mean()*1e3:.2f} ms")
print(f"Wall-clock per simulated cycle (silicon): {wall_dt/n_trials*1e6:.2f} us")
print(f"Biology:  {bio_J_per_cycle:.2e} J / cycle")
print(f"Silicon:  {si_J_per_cycle:.2e} J / cycle")
print(f"Ratio:    {si_J_per_cycle/bio_J_per_cycle:.2e}")

The simulation runs 100,000 independent cycles in about 90 seconds and reports five numbers: the mean simulated cycle time (the molecule's own clock), the wall-clock time per simulated cycle (the silicon clock), the biological energy per cycle, the silicon energy per cycle, and the ratio of the two energies.

Cross-checking against the master equation

Stochastic simulations are easy to subtly miswrite, so before trusting the numbers I cross-checked against the analytical steady-state solution of the master equation. For a unicyclic chain the cycle flux J can be solved exactly by parametrising P_i = A_i − B_i·J, walking around the cycle, and imposing P_N = P₀:

def steady_state_flux(kf, kb):
    """Exact steady-state cycle flux (s^-1 per molecule) of a unicyclic chain."""
    n = len(kf)
    A = np.empty(n + 1)
    B = np.empty(n + 1)
    A[0], B[0] = 1.0, 0.0
    for i in range(n):
        A[i+1] = (kf[i] / kb[i]) * A[i]
        B[i+1] = (kf[i] / kb[i]) * B[i] + 1.0 / kb[i]
    J_unnorm = (A[n] - 1.0) / B[n]                 # cyclicity P_n = P_0 = 1
    P_unnorm = A[:n] - B[:n] * J_unnorm
    return J_unnorm / P_unnorm.sum()               # normalize to one molecule

Analytical answer: J = 7.69 s⁻¹. The Gillespie simulation lands at 7.67 s⁻¹, a difference of about 0.3%, which is in line with the sampling error expected from 100,000 trials. The simulation is sound.
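As a second, independent check, the same steady state can be obtained by brute-force linear algebra: build the 11×11 rate matrix Q and solve Q·p = 0 with the occupancies summing to one. This is a sketch rather than the method used for the numbers above. The function name steady_state_linear is hypothetical, and the rate table is copied from the listing earlier in the post.

```python
import numpy as np

# Forward and backward rates (s^-1), copied from the TRANSITIONS table above.
KF = np.array([2.0e3, 7.5e4, 2.0e2, 22.0, 5.0e3, 1.0e5,
               1.7e2, 2.5e4, 1.0e3, 2.5e3, 22.0])
KB = np.array([8.0e2, 1.64, 18.5, 25.2, 375.2, 1.4e5,
               10.0, 2.0e3, 5.0e3, 4.0, 400.0])


def steady_state_linear(kf, kb):
    """Solve the master equation dp/dt = Q p = 0 for a cyclic chain."""
    n = len(kf)
    Q = np.zeros((n, n))
    for i in range(n):
        j = (i + 1) % n
        Q[j, i] += kf[i]      # forward step i -> j fills state j
        Q[i, i] -= kf[i]      # ... and drains state i
        Q[i, j] += kb[i]      # backward step j -> i fills state i
        Q[j, j] -= kb[i]      # ... and drains state j
    # Q is singular (probability is conserved), so swap one balance
    # equation for the normalisation condition sum(p) = 1.
    M = Q.copy()
    M[-1, :] = 1.0
    rhs = np.zeros(n)
    rhs[-1] = 1.0
    p = np.linalg.solve(M, rhs)
    flux = kf[0] * p[0] - kb[0] * p[1]   # net flux through transition 0
    return p, flux


p, flux = steady_state_linear(KF, KB)
print(f"Cycle flux: {flux:.2f} s^-1")
print(f"Most occupied state index: {int(np.argmax(p))} ({p.max():.0%} of the time)")
```

If the table has been transcribed correctly, the flux should agree with the 7.69 s⁻¹ from the recursion above, and the occupancy should pile up in the states just before the two slow conformational flips.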

The Energy Ledger

Biology

One full cycle hydrolyzes exactly one ATP. Under cellular conditions the free energy of ATP hydrolysis is ΔG ≈ −50 kJ/mol. Dividing by Avogadro's number:

E_bio = (50 × 10^3 J/mol) / (6.02 × 10^23 /mol)
      ≈ 8.3 × 10^−20 J per cycle

That is the entire energy bill for one execution of the algorithm in biology — one molecule, one ATP, one trip around the cycle. It does not depend on how long the cycle took.
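One sanity check on the 50 kJ/mol figure: for a closed cycle, the ratio of the product of forward rates to the product of backward rates fixes the free energy dissipated per turn, ΔG = RT·ln(Πk_f / Πk_b). The sketch below applies that relation to the rate table as transcribed in the listing above. Treating the result as comparable to the ATP free energy is an assumption, since the tabulated rates are pseudo-first-order values at fixed substrate concentrations, and the function name cycle_affinity_kj_per_mol is illustrative.

```python
import math

# Forward and backward rates (s^-1), copied from the TRANSITIONS table above.
KF = [2.0e3, 7.5e4, 2.0e2, 22.0, 5.0e3, 1.0e5, 1.7e2, 2.5e4, 1.0e3, 2.5e3, 22.0]
KB = [8.0e2, 1.64, 18.5, 25.2, 375.2, 1.4e5, 10.0, 2.0e3, 5.0e3, 4.0, 400.0]

R_GAS = 8.314        # J / (mol K)
T_KELVIN = 293.15    # 20 C, the temperature of the rate table


def cycle_affinity_kj_per_mol(kf, kb, temperature=T_KELVIN):
    """RT * ln(prod kf / prod kb), summed in log space to avoid overflow."""
    log_ratio = sum(math.log(f) - math.log(b) for f, b in zip(kf, kb))
    return R_GAS * temperature * log_ratio / 1e3


affinity = cycle_affinity_kj_per_mol(KF, KB)
print(f"Driving force implied by the rate table: {affinity:.1f} kJ/mol")
```

With the table as transcribed this comes out at about 57 kJ/mol, the same scale as the ~50 kJ/mol assigned to one ATP, so the kinetic scheme and the energy ledger are at least mutually consistent.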

Silicon

The energy cost of one simulated cycle is wall-clock time × CPU active power. On the laptop I ran this on (Apple M-series), the simulation makes ~1,700 stochastic transitions per cycle — many states sit near equilibrium and the molecule oscillates a lot before making net forward progress — and the wall-clock time per simulated cycle comes out to ~880 microseconds. Active CPU package power on a single saturated core is in the 10–20 W range (measure your own with powermetrics while the simulation runs); 15 W is a reasonable midpoint. That gives:

E_si = (8.8 × 10^−4 s) × (15 W)
     ≈ 1.3 × 10^−2 J per cycle
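The ~1,700 transitions per cycle can be rationalised with a back-of-envelope estimate. Most of them come from one fast, nearly balanced pair of states: P-E2 steps back to P-E2(Na) at 1.4 × 10⁵ s⁻¹ but forward to P-E2(K) at only 170 s⁻¹, so the molecule bounces between the two many times before escaping. The sketch below counts those bounces using the rates from the listing. It ignores the other nine states and the small chance of slipping further backward, and the variable names are illustrative.

```python
K_FWD_ESCAPE = 1.7e2    # s^-1, P-E2 -> P-E2(K)  (forward rate of transition 6)
K_BACK = 1.4e5          # s^-1, P-E2 -> P-E2(Na) (backward rate of transition 5)

# Each time the molecule sits in P-E2 it escapes forward with this probability.
p_escape = K_FWD_ESCAPE / (K_FWD_ESCAPE + K_BACK)

# Number of visits to P-E2 before escaping is geometric with mean 1 / p_escape.
mean_visits = 1.0 / p_escape
failed_attempts = mean_visits - 1.0

# Every failed attempt costs two transitions: one step back, one step forward.
bounce_transitions = 2.0 * failed_attempts

print(f"Escape probability per visit: {p_escape:.2e}")
print(f"Transitions spent bouncing:   {bounce_transitions:.0f}")
```

This single pair accounts for roughly 1,650 of the transitions, which is why the per-cycle wall-clock time is dominated by steps that make no net progress.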

Ratio

E_si / E_bio ≈ (1.3 × 10^−2 J) / (8.3 × 10^−20 J)
             ≈ 1.6 × 10^17

Seventeen orders of magnitude.

What This Number Means

The first reaction is to assume something is wrong. Seventeen orders of magnitude is not a number that turns up in normal engineering comparisons. But the calculation is doing what we asked: it is comparing the cost of one execution of the same algorithm on two different substrates.

The gap is real, and it has a structural cause. Biology executes the algorithm at the molecular level — one protein, ~10 nm across, ~10⁵ atoms, dissipating heat directly into the surrounding water. Silicon executes the algorithm by running an instruction stream on a chip with ~10¹⁰ transistors, of which an enormous fraction are switching, leaking, and burning power regardless of the size of the computation being performed. The minimum unit of silicon "doing anything" is the entire active core. The minimum unit of biology doing anything is the protein.

Said differently: at this level of accounting, biology has essentially zero overhead per molecule. The ATP is the entire bill. Silicon has an enormous fixed overhead, the cost of running the chip at all, and the algorithm is whatever fits inside that fixed cost.

This is the fundamental observation that motivates neuromorphic computing. If you can build hardware whose minimum active unit is small (a single neuron-equivalent circuit with sub-nanowatt idle power), you can recover orders of magnitude. Mead, Indiveri, Boahen, the Loihi and SpiNNaker programs — all of this work is, at heart, an attempt to close the 10¹⁷ gap.

Caveats

A few things to flag before reading too much into the exact figure.

The 50 kJ/mol for ATP hydrolysis is the magnitude of the cellular ΔG (≈ −50 kJ/mol), which is more negative than the standard value (ΔG°′ ≈ −30.5 kJ/mol) because the cytoplasmic ratio [ATP]/([ADP][Pi]) is far from equilibrium. Maintaining that disequilibrium has its own metabolic cost: mitochondrial respiration, oxygen delivery, blood flow. Strictly, biology's "true" cost per cycle is higher than 8.3 × 10⁻²⁰ J. Including everything, the gap shrinks, but only by about an order of magnitude.

On the silicon side, 15 W is the active package power of one saturated core; it ignores RAM, motherboard, PSU losses, and cooling. Including everything, silicon's cost grows. The two corrections push in opposite directions and roughly cancel.

The 880 µs/cycle is unoptimised CPython doing a Python-level Gillespie loop — a vectorised NumPy implementation, or a Numba/Cython compile of the inner loop, would be 10–100× faster. Even at the best plausible silicon implementation, the gap stays above 10¹⁵. The point of the comparison is structural, not micro-benchmark.
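The sensitivity of the headline ratio to these choices is easy to tabulate. The sketch below sweeps CPU power over the 10 to 20 W range and applies hypothetical 1×, 10×, and 100× speedups to the measured 880 µs per cycle. The speedups are assumptions standing in for an optimised implementation, not measurements, and energy_ratio is an illustrative name.

```python
AVOGADRO = 6.022e23
E_BIO = 50e3 / AVOGADRO          # J per cycle: one ATP at 50 kJ/mol
WALL_S_PER_CYCLE = 880e-6        # s, the CPython figure quoted in this post


def energy_ratio(power_w, speedup):
    """Silicon-to-biology energy ratio for a given core power and speedup."""
    e_si = (WALL_S_PER_CYCLE / speedup) * power_w
    return e_si / E_BIO


ratios = {
    (power, speedup): energy_ratio(power, speedup)
    for power in (10.0, 15.0, 20.0)      # W, range of single-core package power
    for speedup in (1, 10, 100)          # ASSUMED gains from optimised code
}

for (power, speedup), ratio in sorted(ratios.items()):
    print(f"{power:4.0f} W, {speedup:3d}x faster -> ratio {ratio:.1e}")
```

Across the whole grid the ratio stays between roughly 10¹⁵ and 2 × 10¹⁷, so neither the power estimate nor the choice of implementation changes the order-of-magnitude story.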

The simulation uses the linear forward Post-Albers chain rather than Heyse's full 15-state scheme with its cardiotonic-steroid-inhibited branches and uncoupled flux modes. Those branches carry no significant flux under physiological conditions. The linear-chain turnover at 20 °C (7.7 s⁻¹) extrapolates with Q₁₀ ≈ 3 to about 50 s⁻¹ at 37 °C, within a factor of two of Heyse's 37 °C measurement of 60–85 s⁻¹ (landing inside that range would take a Q₁₀ between roughly 3.3 and 4.1), so the simplification is not load-bearing for the energy ledger.
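The temperature extrapolation is a one-line formula, rate(T₂) = rate(T₁) · Q₁₀^((T₂ − T₁)/10). A minimal sketch, using the 7.69 s⁻¹ flux from the master-equation check as the 20 °C starting point; the function names are illustrative.

```python
def q10_extrapolate(rate, t_from_c, t_to_c, q10):
    """Scale a rate between two temperatures (deg C) with a fixed Q10."""
    return rate * q10 ** ((t_to_c - t_from_c) / 10.0)


def q10_required(rate_from, rate_to, t_from_c, t_to_c):
    """Q10 that would map rate_from at t_from_c onto rate_to at t_to_c."""
    return (rate_to / rate_from) ** (10.0 / (t_to_c - t_from_c))


rate_20 = 7.69                                   # s^-1, linear chain at 20 C
rate_37 = q10_extrapolate(rate_20, 20.0, 37.0, q10=3.0)
q10_low = q10_required(rate_20, 60.0, 20.0, 37.0)
q10_high = q10_required(rate_20, 85.0, 20.0, 37.0)

print(f"Q10 = 3 gives {rate_37:.0f} s^-1 at 37 C")
print(f"Q10 needed for 60-85 s^-1: {q10_low:.2f} to {q10_high:.2f}")
```

With Q₁₀ = 3 the linear chain lands at roughly 50 s⁻¹ at 37 °C, the same order as the measured turnover, and a somewhat steeper temperature dependence would put it inside the measured range.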

The conclusion survives these adjustments: the gap is about seventeen orders of magnitude as measured, and still at least fifteen after the most generous corrections in silicon's favour.

What's Missing

This calculation answers a clean question — energy to perform the algorithm once — but it leaves the bigger comparisons untouched.

Throughput, not just cost

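A biological cell runs ~10⁹ Na⁺/K⁺-ATPase pumps in parallel, each independently, with no scheduler. A CPU core runs one at a time. Per cycle, silicon is 10¹⁷× costlier in energy, and that ratio does not change with the number of pumps. What parallelism adds is a throughput gap: at ~880 µs of wall-clock per simulated cycle and ~7.7 cycles per second per pump, keeping pace with one cell's pumps in real time would take on the order of several million cores. The hardware comparison and the parallelism comparison are different questions, and only the first is settled here.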
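To put rough numbers on the throughput question, the sketch below asks how many CPU cores would be needed to simulate one cell's pumps in real time, using figures already quoted in this post: 10⁹ pumps, 7.69 cycles per second per pump at 20 °C, 880 µs of wall-clock per simulated cycle, and 15 W per core. It assumes perfect parallel scaling with no scheduling or communication overhead, which is optimistic.

```python
PUMPS_PER_CELL = 1e9             # pumps per cell, as quoted in this post
CYCLES_PER_S = 7.69              # s^-1 per pump at 20 C (master-equation flux)
WALL_S_PER_CYCLE = 880e-6        # s of CPU time per simulated cycle
CORE_POWER_W = 15.0              # W per saturated core
E_BIO = 50e3 / 6.022e23          # J per cycle: one ATP at 50 kJ/mol

# Pump cycles one cell completes per second of real time.
cycles_per_s_cell = PUMPS_PER_CELL * CYCLES_PER_S

# CPU-seconds needed per real second, i.e. cores required to keep pace
# (ASSUMES ideal parallel scaling).
cores_needed = cycles_per_s_cell * WALL_S_PER_CYCLE

silicon_power_w = cores_needed * CORE_POWER_W
biology_power_w = cycles_per_s_cell * E_BIO

print(f"Cores needed for real time: {cores_needed:.2e}")
print(f"Silicon power: {silicon_power_w:.2e} W")
print(f"Biology power: {biology_power_w:.2e} W")
print(f"Power ratio:   {silicon_power_w / biology_power_w:.2e}")
```

The power ratio comes out identical to the per-cycle energy ratio, as it must. Parallelism changes how much hardware has to be lined up to match biology's throughput, not the energy each cycle costs.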

Generality

The Na⁺/K⁺-ATPase is a fixed-function machine. Silicon is general-purpose. Most of silicon's overhead pays for that generality — the same chip can run the pump, train a transformer, render a video. Biology only does what biology already evolved to do. The fair comparison is silicon-running-a-fixed-task vs biology-running-its-fixed-task; for any other task the biological side simply doesn't exist.

The 10¹⁷ figure is best read as an upper bound on what neuromorphic and biomolecular computing might recover, not as an indictment of silicon. It is the size of the prize, not the size of the problem.

What this means for the hardware bet

Back to where we started. Of the ~$690B going into AI infrastructure in 2026, the overwhelming majority is being deployed on the assumption that the existing digital substrate is the right one and the only question is how to scale it more cheaply. The Unconventional AI bet — and the broader neuromorphic bet — is that there is a whole quantitative axis that conventional hardware cannot recover, and that progress beyond a certain point may require silicon that operates more like a protein than like a clocked digital pipeline.

Whether that bet pays off is a question about engineering, not just physics. Seventeen orders of magnitude is the headroom; how much of it is reachable depends on what you are willing to give up — generality, programmability, the ability to run an arbitrary instruction stream. Biology gives all of those up. A useful chip probably cannot. The interesting design space is the middle: how much of biology's structural efficiency can you keep while preserving enough of silicon's generality to be worth fabricating.

The number sets the ceiling. The roadmap is the rest.

Code for this post: nakatpase.py. Run it on your laptop, measure your own CPU power, and the numbers will land in the same order of magnitude.