The app my agents built while I slept
The idea worth queuing
I wanted to find out whether an app could go from idea to working MVP while I was asleep. The candidate was myWarranties, a free, offline-first warranty and receipt tracker. Nothing exotic — just a sane place to store purchase dates, photos, and reminders without sending data anywhere. The stack was Expo SDK 56 and React Native, local SQLite, ML Kit OCR for receipts, and local notifications for expiry nudges.
The goal was not to avoid work. It was to move the mechanical work to the hours I wasn't touching the keyboard, so my morning could be review instead of typing.
The contract layer comes first
Before I went to bed I committed the foundation directly to main: domain types, the SQLite schema and migrations, the routing _layout gate, and purchaseService, which covers add, edit, delete, reminders, and the image lifecycle. That is the part that carries decisions. Get it wrong and every branch built on top of it is wrong in the same direction.
I also wrote the integration glue and the test harness first. If the agents cannot import a stable contract, they invent one, and then you have three different contracts merged by accident. I wanted each overnight branch to start from the same truth.
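To make "a stable contract" concrete, here is a minimal sketch of what such a foundation file might contain. It is not the actual myWarranties code: the field names, the PurchaseService shape, and the warrantyExpiry helper are all assumptions made for illustration.

```typescript
// Hypothetical contract layer. Every name below is illustrative,
// not taken from the real myWarranties repository.
interface Purchase {
  id: string;
  title: string;
  purchasedAt: string;      // ISO calendar date, e.g. "2025-01-31"
  warrantyMonths: number;
  receiptImageUri?: string; // local file only; nothing leaves the device
}

type NewPurchase = Omit<Purchase, "id">;

// The surface every overnight branch would import instead of inventing its own.
interface PurchaseService {
  add(input: NewPurchase): Promise<Purchase>;
  edit(id: string, patch: Partial<NewPurchase>): Promise<Purchase>;
  remove(id: string): Promise<void>;
}

// Pure helper: the date the warranty runs out, computed in UTC.
// The day is clamped so that Jan 31 + 1 month lands on the last day of
// February rather than overflowing into March.
function warrantyExpiry(purchasedAt: string, warrantyMonths: number): Date {
  const [year, month, day] = purchasedAt.split("-").map(Number);
  const targetMonthIndex = month - 1 + warrantyMonths;
  // Day 0 of the following month is the last day of the target month.
  const lastDay = new Date(Date.UTC(year, targetMonthIndex + 1, 0)).getUTCDate();
  return new Date(Date.UTC(year, targetMonthIndex, Math.min(day, lastDay)));
}
```

Keeping a helper like this pure is what lets a Wave 1 branch test it exhaustively without touching SQLite or the notification layer.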
Overnight waves
I fanned the build out in dependency-ordered waves, each dispatched to a Kimi worker in its own git worktree:
- Wave 1 — pure logic, fully tested.
- Wave 2 — DAOs, primitives, and services.
- Wave 3 — store and form state.
- Wave 4 — screens.
- Wave 5 — onboarding flow.
The rule was simple: every branch could import only what was already on main. That removed dependencies between sibling branches and let each wave run wide without merge soup. The harness chained the waves across turns, so Wave 2 did not start until Wave 1 was reviewed and merged.
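The import rule is mechanical enough to check before dispatching anything. The sketch below is one way to do that, under the assumption that each task declares the modules it provides and the modules it imports. The BranchTask shape and the module names are hypothetical, not the harness that was actually used.

```typescript
// Hypothetical description of one overnight task.
interface BranchTask {
  branch: string;
  wave: number;
  provides: string[]; // modules this branch adds to main once merged
  imports: string[];  // modules this branch expects to find on main
}

// Walks the waves in order and reports every import that would not yet be
// on main when its wave starts. Sibling output counts as "not on main".
function findImportViolations(mainModules: string[], tasks: BranchTask[]): string[] {
  const merged = new Set(mainModules);
  const violations: string[] = [];
  const waves = Array.from(new Set(tasks.map((t) => t.wave))).sort((a, b) => a - b);

  for (const wave of waves) {
    const inWave = tasks.filter((t) => t.wave === wave);
    for (const task of inWave) {
      for (const mod of task.imports) {
        if (!merged.has(mod)) {
          violations.push(`${task.branch} (wave ${wave}) imports unmerged module ${mod}`);
        }
      }
    }
    // Only after the whole wave is checked do its outputs count as merged.
    for (const task of inWave) {
      task.provides.forEach((mod) => merged.add(mod));
    }
  }
  return violations;
}

const exampleTasks: BranchTask[] = [
  { branch: "dates", wave: 1, provides: ["lib/dates"], imports: ["domain/types"] },
  { branch: "purchaseDao", wave: 2, provides: ["dao/purchase"], imports: ["domain/types", "lib/dates"] },
  // Depends on a sibling in the same wave, so it should be flagged.
  { branch: "reminderService", wave: 2, provides: ["services/reminder"], imports: ["dao/purchase"] },
];

const violations = findImportViolations(["domain/types"], exampleTasks);
console.log(violations);
// → ["reminderService (wave 2) imports unmerged module dao/purchase"]
```

A flagged task either moves to a later wave or has its dependency pulled into the hand-written contract layer.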
When I woke up, the numbers looked good: tsc 0 errors, lint 0, 26 of 26 Jest tests green. The repo compiled, the tests passed, and the screens rendered. On paper the MVP was done.
Morning review
Paper is not the app. The first thing I do after any overnight run is read the diff and exercise the actual behavior. This time it caught three problems the test suite could not:
- A form remount bug that lost state when the user navigated back into the add-flow.
- A missing notification permission check — the reminder code existed, but the prompt to allow it did not.
- An over-claim in onboarding that said storage was encrypted. It was not.
That third one is the failure mode to watch for: the agent wrote it confidently, the copy read well, and a user would have believed it. The code compiled, the tests passed, and the claim was still false. This is why "looks done" is not a shipping criterion.
I fixed the logic and the copy, merged the real MVP, and moved on.
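The missing permission gate comes down to a small decision that has to run before any reminder is scheduled. Here is a minimal sketch of that decision as a pure function. The status values mirror what expo-notifications reports from getPermissionsAsync(), but the PermissionSnapshot type, the ReminderStep names, and nextReminderStep itself are assumptions for illustration, not the fix that was merged.

```typescript
// Status values as reported by expo-notifications; the rest is hypothetical.
type PermissionStatus = "granted" | "denied" | "undetermined";

interface PermissionSnapshot {
  status: PermissionStatus;
  canAskAgain: boolean; // false once the OS will no longer show the prompt
}

type ReminderStep =
  | "schedule"          // permission is in place, safe to schedule the reminder
  | "prompt"            // ask the user via the system permission dialog
  | "explain-settings"; // the OS will not prompt again; point the user to Settings

// Decides what the add-flow should do before scheduling an expiry reminder.
function nextReminderStep(permission: PermissionSnapshot): ReminderStep {
  if (permission.status === "granted") {
    return "schedule";
  }
  if (permission.canAskAgain) {
    return "prompt";
  }
  return "explain-settings";
}
```

Because the function takes a plain snapshot, a Jest test can cover all three branches without mocking the native module, which is exactly the kind of check the overnight suite lacked.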
The second overnight queue
The next evening I ran a different kind of overnight job: a design restyle plus Phase 2 features. I locked an editorial, data-forward direction in a design system, exported the screens as HTML rather than screenshots, and told the worker model to restyle components against the HTML refs only — no API changes allowed.
The queue worked. The restyle landed cleanly. Phase 2 added PDF claim export and local data export, and the test count grew from 26 to 35 green. The iOS native build passed on a headless Mac in the homelab, which meant the ML Kit OCR, camera, and notification pods actually compiled under Expo SDK 56 / React Native 0.85 / New Architecture. A simulator build is not a device build, but it is a real gate: expo start only serves the JavaScript bundle and can look healthy while the native side is broken, whereas xcodebuild has to compile and link every pod.
By morning I also had decision-ready MVP specs queued up for the next two app ideas. The fleet had kept moving while I slept.
What the fleet got wrong
The mistakes were not random. They were the predictable kind:
- Green tests masked real bugs. Passing tests prove the tests pass, not that the product is correct.
- UI copy was confidently wrong. Agents are good at plausible text and bad at truth-checking it against the implementation.
- Permissions are easy to forget. If a feature needs user consent, the agent usually writes the happy path and skips the gate.
- Style without reference drifts. The first MVP was functional but plain because no design reference was attached to the task.
None of these are model failures. They are workflow failures, and the fix is in the process, not the prompt.
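One process-level fix for confidently wrong copy is to keep a hand-maintained list of what the build really does and lint user-facing strings against it. The sketch below is a deliberately crude keyword check. The Capabilities fields, the rule patterns, and the sample copy are all hypothetical, and a real list would need review by whoever owns the claims.

```typescript
// Hypothetical source of truth: what the build actually does, kept by hand.
interface Capabilities {
  encryptedStorage: boolean;
  offlineOnly: boolean;
}

interface ClaimRule {
  pattern: RegExp;              // wording that makes a promise to the user
  requires: keyof Capabilities; // capability that must be true to back it up
}

// Illustrative rules only; real copy would need a longer, reviewed list.
const claimRules: ClaimRule[] = [
  { pattern: /\bencrypt(ed|ion)?\b/i, requires: "encryptedStorage" },
  { pattern: /\b(stays on|never leaves) your (device|phone)\b/i, requires: "offlineOnly" },
];

interface UnsupportedClaim {
  text: string;
  requires: keyof Capabilities;
}

// Returns every string that makes a claim the current build cannot back up.
function findUnsupportedClaims(
  copy: string[],
  caps: Capabilities,
  rules: ClaimRule[],
): UnsupportedClaim[] {
  const found: UnsupportedClaim[] = [];
  for (const text of copy) {
    for (const rule of rules) {
      if (rule.pattern.test(text) && !caps[rule.requires]) {
        found.push({ text, requires: rule.requires });
      }
    }
  }
  return found;
}

const onboardingCopy = [
  "Your receipts are stored encrypted on this device.",
  "Your data stays on your device.",
];

const copyProblems = findUnsupportedClaims(
  onboardingCopy,
  { encryptedStorage: false, offlineOnly: true },
  claimRules,
);
// copyProblems holds one entry: the "encrypted" sentence, requiring encryptedStorage.
```

A check like this does not replace reading the copy. It only turns one known class of over-claim into a failing test instead of a morning surprise.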
Practical takeaways
- Write the contract and glue yourself. Fan out only the leaves. Decisions are not parallelizable.
- Order branches by dependency. Each branch should only see merged code, not sibling work in progress.
- Treat green tests as a floor, not a ceiling. Read the diff, run the app, and check the copy.
- Give UI tasks an actionable reference. Screenshots are useless to a model that cannot read images; exported HTML or design tokens give it something it can parse.
- Separate restyle from API changes. Mixing them turns review into archaeology.
- Run a real native build before calling an Expo app done. The JavaScript bundle is only half the product.
Agents can build while you sleep. The throughput is real. But the morning after is not optional — that is when you separate a demo from something you would actually ship. I am still the only supervisor, and that is the point.