Mission Brief · 001

10× is not a constant.
It's a function of project size.

Your client built that module solo with Claude in an evening. Your team spent a quarter on the same scope. They're not slow — the math is different. Drag the slider and see what scale does to AI leverage.

9.2×
Boost at step 1 · MVP
4.4×
Boost at step 10 · scale-up
0.9×
Boost at step 30 · enterprise
Cockpit · interactive

Drag the slider. See the decay.

Same engineer. Same tools. Different codebase. The X-factor is not a marketing number — it is AI efficiency × Human factor, and both terms shrink as complexity grows. Tune the inputs to match your team's reality.

Medium · 50–250k LOC · 4–8 devs · step 10
4.4×
Realized productivity boost
74% AI efficiency (model usefulness) × 60% Human factor (coord. + review + integration) = 44% Final (of theoretical 10×)
Tune the model
Set once — slider applies as geometric sequence
AI = 97% × 97% × 97% × … × 97% = 73.7%
HF = 95% × 95% × 95% × … × 95% = 59.9%
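The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not Pirxey's actual calculator: the function name `realized_boost`, its parameter names, and the returned dictionary keys are assumptions made for this example. Only the 97% and 95% base factors, the step count, and the 10× ceiling come from the text.

```python
def realized_boost(step: int, ai_base: float = 0.97, hf_base: float = 0.95,
                   ceiling: float = 10.0) -> dict:
    """Apply each base factor once per slider step (a geometric sequence).

    ai_base and hf_base default to the 97% and 95% shown in the panel;
    the function and key names are hypothetical, chosen for this sketch.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    ai_efficiency = ai_base ** step      # 97% x 97% x ... (step times)
    human_factor = hf_base ** step       # 95% x 95% x ... (step times)
    final = ai_efficiency * human_factor # share of the theoretical ceiling
    return {
        "ai_efficiency": ai_efficiency,
        "human_factor": human_factor,
        "final": final,
        "boost": ceiling * final,
    }


for step in (1, 10):
    r = realized_boost(step)
    print(f"step {step:>2}: AI {r['ai_efficiency']:.1%} x HF {r['human_factor']:.1%}"
          f" = {r['final']:.1%} of 10x -> {r['boost']:.1f}x")
```

With the default bases this reproduces the 9.2× at step 1 and the 4.4× at step 10 shown on the page. Changing either base by a single point moves the far end of the curve noticeably, which is why the inputs are worth tuning to your own team.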
Typical scope at this size
B2B SaaS product, fintech dashboard, e-commerce platform
Model by Łukasz Graliński · Pirxey
Geometric decay curve
Base factors multiplied step-by-step as project scale grows
scale: 0 → 10×
Micro · Small · Medium · Large · Enterprise

No assumptions. We measured the boost across 53 products we shipped over the last two years — from solo prototypes to multi-team platforms. Drag and see for yourself.

[Chart: boost (0× to 10×) against project size (Micro, Small, Medium, Large, Enterprise). Marked points: 9.2× for a greenfield MVP, 4.4× at the Medium setting, 0.9× for legacy enterprise.]

01 · Method

Measured, not modeled

Pulled from 53 production codebases we built or co-built between 2024 and 2026 — greenfield MVPs, scale-ups, legacy rescues. Same toolchain, different terrain.

02 · Definition

Boost = shipped value

We track reviewed, merged, deployed work — not lines generated. Code thrown away, rewritten, or rolled back does not count.

03 · Mechanics

Why the curve bends

Below ~30k LOC, AI compounds the engineer. Above ~150k LOC, structural forces pull against it: duplicated logic, context rot, review load.

Solo vs Team · the gap

Why your client ships in an evening, and we ship in a quarter.

The honest answer to "why can't your team match what I did at home?" is not "they're worse than you". It is that you and the model worked on a fundamentally different problem. Six concrete differences — all measurable.

01
Empty repo vs. living codebase.

Solo: The model sees 100% of the project because it fits in context. Every helper it writes is consistent by default: there is nothing to conflict with.
Team: On a 250k LOC repo the model loads ~6–11% of relevant context. It rebuilds the validator that already exists 3 folders away. Volume goes up. Real progress doesn't.
Empty terrain is a feature of weekend projects, not a Claude capability.
02
No review burden. No reviewers.

Solo: You ship straight to your own laptop. The review is "does it run?", answered in seconds.
Team: Every block has to be read, traced, security-reviewed, QA-tested. Faros AI: developers on high-AI teams touch 47% more PRs per day, but review time goes up 91% and throughput stays flat.
The bottleneck moves from typing to verifying.
03
You + Claude vs. 12 humans + Claude.

Solo: Zero coordination tax. One head holds the spec. Decisions take seconds.
Team: 4 devs + design + BA + PO + DM + QA + security. Half the gain is eaten by alignment meetings, async handoffs, and "wait, did you talk to X?"
Brooks's Law didn't go away. AI doesn't reduce the cost of agreement.
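One common way to put a rough number on the cost of agreement is the pairwise-channel count n(n − 1)/2 that Brooks uses in The Mythical Man-Month. The sketch below applies it to the head counts in this card. The function name is hypothetical, and treating every pair of people as one channel is a simplifying assumption, not a measurement.

```python
def communication_channels(people: int) -> int:
    """Number of distinct person-to-person pairs in a group of `people`.

    Illustrative only: assumes every pair may need to align at some point.
    """
    if people < 0:
        raise ValueError("people must be non-negative")
    return people * (people - 1) // 2


for team_size in (1, 4, 12):
    print(f"{team_size:>2} people -> {communication_channels(team_size)} channels")
```

A solo builder has zero channels to maintain and a group of 12 has 66. None of those channels gets cheaper because the model types faster.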
04
2 integrations vs. 47 integrations.

Solo: You hit OpenAI and Stripe. Both well-documented. Both have official SDKs.
Team: Production talks to 47 systems. Half have undocumented quirks. Some require a phone call to the vendor. The model can't see any of this from a prompt.
The work nobody saw on the demo is most of the work.
05
No compliance. No audit. No data migration.

Solo: Demo data. No SOC2 audit. No GDPR review. No real PII. No "what happens to 12M existing rows?"
Team: Real customer data. Audit trails. Regulator deadlines. Zero-downtime migrations. This work has nothing to do with how fast the model types.
Production has constraints that prompts can't see.
06
vercel deploy vs. blue-green across 3 envs.

Solo: One command. One environment. If it breaks at 3am, you fix it at 3am. No SLA.
Team: Dev → staging → prod, blue-green, feature flags, canary, rollback plan, observability, on-call rotation, 99.95% SLA. Each one is real engineering, not a prompt.
Shipping a prototype and operating a platform are different jobs.
Why · the six forces

What's actually shrinking the boost?

Six measurable forces — each backed by 2025–2026 research. Together they explain the curve you just dragged through.

01
+38% redundant logic

Silent duplication inflates the diff

On small projects there is little to duplicate, so AI looks clean. On large codebases it often cannot see what already exists — so it rebuilds the same helper, validator, or fetch wrapper three folders away. Output volume goes up. Real progress does not.

— Pirxey internal benchmark · 53 products · 2024–2026
02
2.4× longer fixes

Duplication compounds context rot

Duplication was always a tax. With AI it becomes a multiplier: two copies of the same logic mean two stale fragments in context, two reviewers tracing two call graphs, and two places where the next bug fix has to land.

— Pirxey post-mortems · 17 SaaS rescues
03
6–11% context coverage

AI does not know what it does not know

On real platforms the model loads fragments — a file here, a snippet there — and answers with full confidence. The output reads authoritative, but the relevant constraint often lived in a file the model never opened.

— Pirxey context-coverage audit · Q4 2025
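To see why coverage collapses with size, here is a back-of-envelope sketch of how much of a whole repository can fit into one context window. It is not the method behind the audit figure above, and it measures share of the whole repo, not share of relevant context. The 200,000-token window and the 10 tokens per line of code are illustrative assumptions, and the function name is hypothetical.

```python
def repo_share_in_context(repo_loc: int,
                          window_tokens: int = 200_000,   # assumed window size
                          tokens_per_line: float = 10.0   # assumed average
                          ) -> float:
    """Upper bound on the fraction of a repo that fits in one context window."""
    if repo_loc <= 0:
        raise ValueError("repo_loc must be positive")
    repo_tokens = repo_loc * tokens_per_line
    return min(1.0, window_tokens / repo_tokens)


for loc in (10_000, 50_000, 250_000):
    print(f"{loc:>7} LOC -> {repo_share_in_context(loc):.0%} of the repo fits")
```

Under these assumptions a 10k LOC project fits entirely, while a 250k LOC repo fits only in small part. In practice the window also has to hold the prompt, tool output, and the answer, so the real share is lower.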
04
Review grows 1.7× faster

More generated code means more review burden

Volume is cheap to produce and expensive to vet. Every generated block has to be read, traced, and understood before it ships — because one confidently wrong block in a critical path is enough. The bottleneck moves from typing to review.

— Pirxey PR telemetry · 2025
05
−2% to +4%

Why seniors on familiar code see ~0 speedup

Our working hypothesis: AI excels at isolated, low-context tasks — a new screen, a clean integration, a one-off script. It struggles where senior developers spend the hard hours: cross-module bugs and product logic that require holding the system in your head.

— Pirxey field study · 14 senior engineers
06
73% of decay recoverable

AI is powerful — when the system is designed for it

The answer is not less AI. It is better boundaries. Keep context closed inside isolated modules, even if only at the logic layer. Plan the architecture, brainstorm with AI, challenge its proposals, cut the bad ones, develop the good ones. Soft skills matter: a developer who cannot articulate the need clearly will not vibe-code a great product.

— Pirxey delivery playbook · 2024–2026
Evidence · receipts

The numbers — straight from the source.

We didn't make this curve up. Every percentage in our model maps to a named source: public research (peer-reviewed or industry-scale) or our own delivery telemetry. Each source is listed under its figure.

19%
AI made experienced devs slower, not faster.
METR's 2025 randomized controlled trial: 16 senior open-source developers, 246 real tasks, mature codebases (avg. 1M+ LOC, 22k+ stars). With AI tools allowed, tasks took 19% longer.
METR (2025)
39pt
Perception gap between felt speed and real speed.
Devs predicted AI would speed them up by 24%. After finishing, they still believed AI sped them up ~20%. Reality: 19% slower. Self-reports overstate AI value by ~39 percentage points.
arXiv 2507.09089
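The 39-point figure is simple subtraction, sketched below with the three numbers quoted in this card expressed as change in task time (negative means faster). The function and variable names are hypothetical and chosen for this example only.

```python
def perception_gap(perceived_time_change_pct: float,
                   measured_time_change_pct: float) -> float:
    """Percentage-point gap between felt and measured change in task time."""
    return measured_time_change_pct - perceived_time_change_pct


forecast = -24.0   # developers expected tasks to take 24% less time
perceived = -20.0  # afterwards they believed tasks took ~20% less time
measured = 19.0    # tasks actually took 19% longer

print(f"gap vs. belief after the task: {perception_gap(perceived, measured):.0f} points")
print(f"gap vs. forecast before the task: {perception_gap(forecast, measured):.0f} points")
```

The first line gives the 39 points quoted above. The gap against the up-front forecast is slightly larger, which suggests that doing the work barely corrected the estimate.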
47%
More PRs in. Same throughput out.
Faros AI telemetry across 10k+ devs / 1,255 teams: developers on high-AI teams touch 47% more pull requests per day and PRs are 154% larger, but review time is up 91%, bugs are up 9%, and DORA throughput is flat.
Faros AI — Lab vs Reality
0%
Senior devs on familiar code: no measurable speedup.
This matches our hypothesis: AI helps most when the task is isolated and low-context. On familiar, mature code senior developers already know the hidden constraints — and spend the extra time checking whether AI missed them.
MIT / Microsoft field evidence + Pirxey hypothesis
53
products measured, not assumed.
Pirxey's curve is based on two years of delivery telemetry across 53 products: shipped scope, PR review load, rework, rollback rate, duplicated logic, and realized throughput. We do not count generated code as value until it survives review and production.
Pirxey internal benchmark, 2024–2026
Soft skills became the hard skill.
The bottleneck has moved upstream: naming the constraint, describing the product intent, and knowing when to challenge the model. Great AI development is not letting the model run loose — it is designing the box it can safely be smart inside.
Pirxey delivery teams
Mission control standing by

Want to know your project's real X-factor before we start?

We'll size your codebase, score your team's review and integration capacity, and give you a Pirxey number for your specific mission. Free. No slide deck.

Pirxey · Aleja Grunwaldzka 472, 80-309 Gdańsk, Poland · 130+ engineers · 100+ missions delivered