Most production deploys that break did not break because of bad code. They broke because of context the deployer could not see. A pre-deploy risk score replaces gut feel with six measurable signals and a HOLD/PROCEED/WATCH verdict the pipeline enforces.
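As a rough illustration of how such a score could gate a pipeline, here is a minimal sketch. The six signal names, their weights, and the two thresholds are hypothetical placeholders; the summary above says only that there are six measurable signals and three verdicts.

```python
# Hypothetical signals and weights (assumptions, not taken from the article).
# Each signal is expected to be pre-normalized to the range [0, 1].
WEIGHTS = {
    "change_size": 0.20,
    "hot_path_files": 0.20,
    "recent_incidents": 0.20,
    "test_coverage_gap": 0.15,
    "dependency_changes": 0.15,
    "deploy_window_risk": 0.10,
}


def risk_score(signals: dict[str, float]) -> float:
    """Return the weighted sum of the six signals, clamping each to [0, 1]."""
    missing = WEIGHTS.keys() - signals.keys()
    if missing:
        raise ValueError(f"missing signals: {sorted(missing)}")
    return sum(
        weight * min(max(signals[name], 0.0), 1.0)
        for name, weight in WEIGHTS.items()
    )


def verdict(score: float, watch_at: float = 0.4, hold_at: float = 0.7) -> str:
    """Map a score to a verdict; both thresholds are illustrative assumptions."""
    if score >= hold_at:
        return "HOLD"
    if score >= watch_at:
        return "WATCH"
    return "PROCEED"
```

A pipeline step could call `verdict(risk_score(signals))` and fail the job on `HOLD`, which is one way the verdict might be enforced rather than merely reported.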
Most enterprise AI lives between pilot and replacement. Five patterns cover the 12 to 18 months the transition actually takes: strangler fig, sidecar, parallel run, dual-write, and eval-based rollback, along with the rollback signals that catch silent quality drift.
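To make the idea of a rollback signal concrete, the sketch below shows one possible form of eval-based rollback: compare a rolling mean of evaluation scores against a fixed baseline and flag a rollback when the drop exceeds a tolerance. The class name, the window size, and the tolerance are assumptions made for illustration, not details from the article.

```python
from collections import deque
from statistics import mean


class EvalRollbackMonitor:
    """Hypothetical monitor: flags rollback when rolling eval quality drifts."""

    def __init__(self, baseline: float, max_drop: float = 0.05, window: int = 20):
        self.baseline = baseline            # eval score of the system being replaced
        self.max_drop = max_drop            # tolerated drop below the baseline
        self.scores = deque(maxlen=window)  # most recent eval scores only

    def record(self, score: float) -> bool:
        """Store one eval score; return True when a rollback should trigger."""
        self.scores.append(score)
        if len(self.scores) < self.scores.maxlen:
            return False  # not enough data yet to judge drift
        return self.baseline - mean(self.scores) > self.max_drop
```

Averaging over a window rather than reacting to single scores is a deliberate choice here: silent drift tends to show up as a slow decline, which a per-request check would likely miss.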