A practical launch gate for turning an AI prototype into a product someone can depend on after the first users arrive.
The prototype is most dangerous right after the demo works. Everyone can see the magic. Nobody has yet watched the model time out, retry a payment twice, expose another user's row, or spend a week's API budget in one background job.
That is the point where builders usually add features. The better move is colder: pause, pick the one workflow that would create the most trust damage if it failed, and run it through a production gate. Not a platform. A gate. OpenAI's rate-limit guidance points to bounded retry and backoff. OpenAI's safety guidance points to adversarial testing and human review on risky paths. LangSmith separates offline evals before release from online checks on live traces. Supabase says RLS must be enabled on exposed tables. Those are not enterprise rituals. They are the minimum answers a small AI app needs before users bring real data.
I would keep this article as the front door for the pivot because it sets the editorial contract: the site is not about prompting tricks, tool lists, or celebrating speed. It is about the work that starts when the first user depends on the output. The rest of the corpus can point back to this gate.
The gate exists to stop launches with vague answers, not to make the team feel mature.
A useful gate has one job: it turns hidden uncertainty into named blockers. The question is not whether the application feels polished. The question is whether the builder can explain what happens when the model is slow, wrong, unavailable, over budget, or unsafe. If the answer is a shrug, the product is not ready for real users.
The gate should be run against one workflow at a time. Pick signup, checkout, document generation, import, onboarding, or the agent action that mutates customer data. Draw the path from user action to model call to tool call to storage write to user-visible result. Then ask the same questions at each boundary: what can fail, who sees it, what gets retried, what gets logged, what can be rolled back, and what data is exposed.
The small-builder trap is treating this as paperwork. It is not. The gate produces tests. A missing timeout becomes an integration test around a fake slow provider. A missing data boundary becomes a second-user RLS test. A missing prompt guard becomes an eval case. A missing cost answer becomes per-workflow usage logging. The artifact is not a checklist screenshot. The artifact is a set of checks that fail when the app regresses.
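As a minimal sketch of the first of those tests, the example below wraps a model call in a timeout and checks it against a fake slow provider. The names `call_model`, `fake_slow_provider`, and `ModelTimeout` are hypothetical, and the real provider client would replace the fake.

```python
import asyncio


class ModelTimeout(Exception):
    """Raised when the provider does not answer within the time budget."""


async def fake_slow_provider(prompt: str) -> str:
    # Hypothetical stand-in for a real model client; it answers far too late.
    await asyncio.sleep(5)
    return "too late"


async def call_model(provider, prompt: str, timeout_s: float) -> str:
    # asyncio.wait_for cancels the provider call once the budget is spent.
    try:
        return await asyncio.wait_for(provider(prompt), timeout=timeout_s)
    except asyncio.TimeoutError:
        raise ModelTimeout(f"no answer within {timeout_s}s") from None


def slow_provider_is_caught() -> bool:
    # The check passes only if the slow call turns into a named, catchable error.
    try:
        asyncio.run(call_model(fake_slow_provider, "summarize", timeout_s=0.05))
    except ModelTimeout:
        return True
    return False
```

The caller can map `ModelTimeout` to a recoverable state in the interface. If someone later removes the timeout, the check stops passing.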
A launch gate run against the riskiest workflow catches more than a generic review of the whole app.
Ten realistic cases are enough to expose prompt drift, broken retrieval, and obvious output-format failures.
Publishable keys can sit in the client; secret or service-role credentials cannot.
| Failure mode | Minimum check | Launch answer |
|---|---|---|
| Slow or rate-limited model call | Timeout, random exponential backoff, retry cap, trace ID | User sees a recoverable state, not a spinner forever |
| Prompt or model regression | Golden cases run before deploy | The release is blocked when expected behavior drops |
| Unsafe or malformed output | Output validation plus human review on high-impact paths | The app rejects unsafe action instead of formatting it nicely |
| Exposed customer data | RLS, server-only secrets, second-user access test | A non-owner cannot read or mutate another user's records |
| Runaway spend | Per-workflow token and retry logs | The owner can name the expensive path before the invoice arrives |
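The first row can be sketched in a few lines. This is one possible shape for a capped retry with exponential backoff and jitter, not a provider's official client. `RateLimited` is a hypothetical error type, and the defaults for `max_attempts`, `base_delay`, and `max_delay` are illustrative assumptions.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error a provider client raises when it is rate limited."""


def call_with_backoff(fn, max_attempts=4, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn, retrying on RateLimited with capped exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts:
                raise  # retry cap reached: surface the failure to the caller
            # The ceiling doubles each attempt and never exceeds max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter: wait a random fraction of the ceiling.
            sleep(rng() * ceiling)
```

Passing `sleep` and `rng` as parameters keeps the retry logic testable without real waiting. Only wrap calls that are safe to repeat.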
The first misses should be ranked by blast radius, not by convenience.
The first run of this gate will produce more misses than a solo builder wants to see. That is normal. The useful move is to rank misses by user harm, not by how easy they are to patch. Data exposure and non-idempotent actions sit at the top because a leak cannot be recalled and a bad retry can create irreversible damage. A wrong answer with a clear correction path is lower. A missing animation is not on the same list.
Security boundaries usually outrank model quality. A mediocre answer is embarrassing. A leaked table is a breach. Supabase's current RLS guidance is blunt: RLS must be enabled on tables in an exposed schema, and client access should use a publishable key governed by policies. The same pattern applies outside Supabase. Browser code can hold public credentials. Server code holds secrets. The model never receives data it does not need for the task.
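The second-user test has a simple shape. The sketch below uses a hypothetical in-memory `NotesTable` as a stand-in for a table behind an owner-only policy; it is not Supabase code. A real test would run the same three assertions against the actual database with two signed-in users.

```python
from dataclasses import dataclass, field


@dataclass
class NotesTable:
    """Hypothetical in-memory stand-in for a table with an owner-only policy."""
    rows: dict = field(default_factory=dict)  # note_id -> {"owner": ..., "body": ...}

    def insert(self, user_id, note_id, body):
        self.rows[note_id] = {"owner": user_id, "body": body}

    def select(self, user_id):
        # Row-level filtering: a user only ever sees rows they own.
        return [r["body"] for r in self.rows.values() if r["owner"] == user_id]

    def update(self, user_id, note_id, body):
        # Returns the number of rows changed; a non-owner changes zero rows.
        row = self.rows.get(note_id)
        if row is None or row["owner"] != user_id:
            return 0
        row["body"] = body
        return 1


def second_user_is_locked_out(table) -> bool:
    table.insert("alice", "n1", "alice's draft")
    cannot_read = table.select("bob") == []
    cannot_write = table.update("bob", "n1", "overwritten") == 0
    unchanged = table.select("alice") == ["alice's draft"]
    return cannot_read and cannot_write and unchanged
```

The point is the third assertion as much as the first two: the owner's data must be intact after the non-owner tries to touch it.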
After data boundaries, inspect actions that mutate state. Retrying a failed read is merely annoying. Retrying a charge, email send, reservation, or account deletion can create real damage. The launch gate should force non-idempotent operations to name their idempotency key, rollback path, and failure copy. If the workflow cannot answer those questions, it should not ship.
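A minimal sketch of the idempotency-key idea, assuming a hypothetical `PaymentLedger` that stores results by key on the server. Real payment providers expose their own idempotency mechanisms; this only illustrates why a retry with the same key is safe.

```python
import uuid


class PaymentLedger:
    """Hypothetical server-side store that remembers results by idempotency key."""

    def __init__(self):
        self._results = {}
        self.charges_made = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # A repeated key returns the stored result instead of charging again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        result = {"charge_id": str(uuid.uuid4()), "amount_cents": amount_cents}
        self._results[idempotency_key] = result
        return result
```

The key should be created once per user intent, for example when the user clicks "pay", and reused on every retry of that same intent. A key generated inside the retry loop protects nothing.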
| Instead of | Do this |
|---|---|
| Add another feature because the demo worked once | Block launch until the riskiest workflow has failure states |
| Trust the model because the last answer looked right | Run a tiny eval before changing prompts, models, or retrieval |
| Treat auth as solved because signup works for the owner | Test data access with a second user and least-privilege credentials |
| Look at total monthly API spend after the invoice | Track token, cache, retry, and batch behavior per workflow |
- Primary user workflow has one browser-tested happy path.
- Every model and tool call has a timeout, retry cap, and user-visible failure state.
- Retries cannot repeat non-idempotent actions without an idempotency key.
- A first eval dataset covers at least ten real or realistic cases.
- Prompt, model, and tool versions are recorded with important outputs.
- Secrets stay server-side; service-role credentials never reach the browser.
- Database access rules are tested with a non-owner user.
- Token, cache, and retry costs are visible per workflow.
- There is one rollback path for prompt, model, and deployment changes.
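The cost item on that checklist does not need a dashboard to start. Here is a minimal sketch of per-workflow usage logging, assuming a hypothetical `UsageLog` and using total tokens as a rough proxy for cost. Real pricing differs for input, output, and cached tokens, so treat the ranking as approximate.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageTotals:
    calls: int = 0
    retries: int = 0
    input_tokens: int = 0
    output_tokens: int = 0


class UsageLog:
    """Hypothetical in-process log; a real app would persist these rows."""

    def __init__(self):
        self._by_workflow = defaultdict(UsageTotals)

    def record(self, workflow, input_tokens, output_tokens, retries=0):
        totals = self._by_workflow[workflow]
        totals.calls += 1
        totals.retries += retries
        totals.input_tokens += input_tokens
        totals.output_tokens += output_tokens

    def totals(self, workflow):
        return self._by_workflow[workflow]

    def most_expensive(self):
        # Token count is only a proxy for spend; returns None when nothing is logged.
        return max(
            self._by_workflow,
            key=lambda w: self._by_workflow[w].input_tokens
            + self._by_workflow[w].output_tokens,
            default=None,
        )
```

Calling `record` once per model call, tagged with the workflow name, is enough to name the expensive path before the invoice does.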
Use the path that would embarrass you most if it broke: signup, checkout, import, document generation, or an agent action that writes data.
For each row, prove the check exists or mark it as a blocker. A missing answer is not a future feature when users arrive today.
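One way to keep that step honest is to hold the gate as data rather than as a document. The sketch below is an assumption about how that could look: `GateRow` and `blockers` are hypothetical names, and `evidence` is just the name of the test or log that proves the check exists.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GateRow:
    failure_mode: str
    evidence: Optional[str] = None  # name of the test or log that proves the check


def blockers(rows: List[GateRow]) -> List[str]:
    # Any row without evidence is a launch blocker, not a future feature.
    return [row.failure_mode for row in rows if not row.evidence]


gate = [
    GateRow("Slow or rate-limited model call", evidence="test_timeout_on_slow_provider"),
    GateRow("Prompt or model regression"),
    GateRow("Exposed customer data", evidence="test_second_user_cannot_read"),
    GateRow("Runaway spend"),
]
```

Here `blockers(gate)` returns the two rows with no evidence, which is the list to fix before launch.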
Add one eval, one boundary test, and one browser workflow before adding another feature. The point is a small net that keeps catching regressions.
It is a launch floor for small teams, not a substitute for deeper operations.
This gate is deliberately small. It does not replace incident management, security review, data governance, or model-risk work for regulated systems. It gives a builder the first defensible line between a demo and a product. That distinction matters because the earliest users do not care that the app was built quickly. They care whether their work disappears, leaks, doubles, or costs them money.
The gate also gives the rest of the team a shared vocabulary. Reliability owns failure states and recovery. Evals own behavioral regressions. Security owns data boundaries and unsafe actions. Cost owns usage patterns. In a one-person project those owners may be the same human, but the categories still matter. They stop the builder from hiding every problem under the label 'AI quality'.
The non-obvious benefit is editorial. Once this gate exists, every article in the corpus can answer the same reader question: which production risk does this reduce, and what evidence proves it? That is why this article should stay. It is the map for the pivot.
Is this checklist only for Lovable or other vibe-coding tools?
No. The same checks apply to any AI-assisted prototype that calls a model, touches user data, or performs an action for a user. Tool choice changes the implementation details, not the production risks.
Do I need an eval platform before launch?
No. You need a small set of cases and a repeatable way to run them. A platform becomes useful when you have more traces, more collaborators, and enough releases that comparison history matters.
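A repeatable runner can be this small. The sketch assumes a hypothetical `generate` function that wraps your model call, and cases that carry their own `check` function. The 0.9 threshold is an illustrative default, not a recommendation from any eval tool.

```python
def run_evals(cases, generate, min_pass_rate=0.9):
    """Run every case through generate and report which ones failed."""
    if not cases:
        raise ValueError("at least one eval case is required")
    failures = []
    for case in cases:
        output = generate(case["input"])
        if not case["check"](output):
            failures.append(case["name"])
    pass_rate = 1 - len(failures) / len(cases)
    return {
        "pass_rate": pass_rate,
        "failures": failures,
        "ship": pass_rate >= min_pass_rate,
    }


# Two illustrative cases; a first dataset would hold about ten.
cases = [
    {"name": "returns_json", "input": "order 42",
     "check": lambda out: out.startswith("{")},
    {"name": "mentions_order", "input": "order 42",
     "check": lambda out: "42" in out},
]
```

Run it before every prompt, model, or retrieval change and block the release when `ship` is false. The list of failing case names tells you what regressed.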
What should block launch immediately?
Data exposure, browser-side secrets, unbounded retries, non-idempotent writes, no visible failure state for the main workflow, and no rollback path for prompt or deployment changes.