Cosine similarity scores look fine while your RAG pipeline gives wrong answers. Four failure modes that produce confident, wrong outputs — and the retrieval stack that actually fixes them.