Most production RAG systems we audit fail in one of five ways. None of them are obvious in a demo. All of them are fixable.
Anti-pattern 1: chunk and pray. A single recursive splitter on every document, 1000-token chunks, 200-token overlap. It works for blog posts. It collapses on contracts and tabular data. The fix: per-document-type chunking strategies. PDFs go through layout-aware extraction. Spreadsheets become structured rows. Code goes through AST-aware splitting.
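To make the per-type idea concrete, here is a minimal sketch of a chunker dispatch table using only the Python standard library. The function names (`chunk_prose`, `chunk_csv`, `chunk_python`, `chunk_document`) and the document-type labels are hypothetical, chosen for illustration. Layout-aware PDF extraction is deliberately left out because it depends on whichever extraction tool you use.

```python
import ast
import csv
import io
from typing import Callable, Dict, List


def chunk_prose(text: str, max_words: int = 200) -> List[str]:
    """Fallback for prose: fixed-size word windows (no overlap, for brevity)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def chunk_csv(text: str) -> List[str]:
    """One chunk per row, with the column name attached to every value."""
    reader = csv.DictReader(io.StringIO(text))
    return ["; ".join(f"{key}={value}" for key, value in row.items()) for row in reader]


def chunk_python(text: str) -> List[str]:
    """One chunk per top-level function or class, found via the AST.

    Simplification: other top-level statements (imports, constants) are skipped.
    """
    tree = ast.parse(text)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            segment = ast.get_source_segment(text, node)
            if segment:
                chunks.append(segment)
    return chunks


# Hypothetical type labels; a real pipeline would derive them from MIME type or extension.
CHUNKERS: Dict[str, Callable[[str], List[str]]] = {
    "csv": chunk_csv,
    "python": chunk_python,
}


def chunk_document(text: str, doc_type: str) -> List[str]:
    """Route a document to the chunker registered for its type."""
    return CHUNKERS.get(doc_type, chunk_prose)(text)
```

The point of the table is that adding a new document type means registering one function, not tuning a global chunk size.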
Anti-pattern 2: dense-only retrieval. Embedding similarity is great for fuzzy matches and bad for exact matches. If a user asks for invoice number INV-2024-0042, dense retrieval will return semantically similar invoices, not the right one. The fix: hybrid retrieval. Combine BM25 (keyword) and dense (semantic) retrieval, then re-rank the merged candidates with a cross-encoder.
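One common way to merge the two result lists before re-ranking is reciprocal rank fusion (RRF). The sketch below assumes RRF as the fusion step, which is our choice for illustration and not the only option. It takes ranked lists of document IDs from hypothetical BM25 and dense retrievers; the cross-encoder re-rank would then run on the fused list.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked ID lists: each list contributes 1 / (k + rank) per document.

    k=60 is a conventional default for RRF; treat it as a tunable assumption.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Illustrative inputs: BM25 finds the exact invoice number, dense retrieval ranks it second.
bm25_hits = ["INV-2024-0042", "INV-2024-0041"]
dense_hits = ["INV-2023-0017", "INV-2024-0042", "INV-2024-0041"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

Because RRF uses ranks and not raw scores, it avoids having to normalize BM25 scores against cosine similarities.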
Anti-pattern 3: citation as decoration. The UI shows links beneath the answer, but nobody has verified that the answer is actually grounded in those documents. The fix: citation enforcement at generation time. The model is constrained to assert only claims it can attribute, and the orchestrator validates each citation before returning the response.
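The orchestrator-side check can be sketched as follows. This assumes the model is prompted to return structured claims, each with a passage ID and a supporting quote; the `Claim` shape and the verbatim-quote test are illustrative assumptions, and a production system might use an entailment model in place of substring matching.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Claim:
    text: str        # the sentence shown to the user
    passage_id: str  # ID of the retrieved passage the model cites
    quote: str       # span the model says supports the claim


def normalize(s: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not matter."""
    return " ".join(s.lower().split())


def invalid_claims(claims: List[Claim], passages: Dict[str, str]) -> List[Claim]:
    """Return claims whose citation fails: unknown passage, empty quote, or quote not found."""
    bad = []
    for claim in claims:
        passage = passages.get(claim.passage_id)
        if (
            passage is None
            or not claim.quote.strip()
            or normalize(claim.quote) not in normalize(passage)
        ):
            bad.append(claim)
    return bad
```

If `invalid_claims` returns anything, the orchestrator can drop those claims, regenerate, or refuse, depending on how strict the product needs to be.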
Anti-pattern 4: infinite context. Stuff 80 documents into the prompt and let the model figure it out. Latency triples, cost balloons, and quality drops because models tend to overlook material buried in the middle of a long context. The fix: tight retrieval and aggressive re-ranking. A good system returns 3–5 high-quality passages, not 50 mediocre ones.
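A minimal sketch of the "re-rank, then cut hard" step. The scoring function is injected so a cross-encoder can be plugged in; `overlap_score` is a toy stand-in used only to keep the example runnable, and the `top_k` and `min_score` values are assumptions to tune per corpus.

```python
from typing import Callable, List


def select_passages(
    query: str,
    candidates: List[str],
    score: Callable[[str, str], float],
    top_k: int = 5,
    min_score: float = 0.0,
) -> List[str]:
    """Score every candidate, keep at most top_k, and drop anything below min_score."""
    scored = [(score(query, passage), passage) for passage in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for s, passage in scored[:top_k] if s >= min_score]


def overlap_score(query: str, passage: str) -> float:
    """Toy relevance score: fraction of query words that appear in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    return len(query_words & passage_words) / len(query_words) if query_words else 0.0


candidates = [
    "the notice period for termination is 30 days",
    "payment terms are net 60",
    "termination requires written notice",
    "office hours",
]
kept = select_passages(
    "termination notice period", candidates, overlap_score, top_k=2, min_score=0.5
)
```

The score floor matters as much as `top_k`: when nothing relevant was retrieved, it is better to pass the model zero passages than five irrelevant ones.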
Anti-pattern 5: silent re-indexing. The corpus changes daily but the index is rebuilt monthly. Users get stale answers and no warning. The fix: incremental indexing on document change events, with a freshness timestamp visible in the UI.
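Here is a rough sketch of change-driven indexing with a per-document freshness timestamp. The `IncrementalIndex` class and its method names are hypothetical, and the actual re-chunking and re-embedding work is reduced to a counter. Content hashing is an added assumption that lets the handler ignore change events where the text did not actually change.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict, Optional


class IncrementalIndex:
    """Tracks what is indexed and when, so the UI can show a freshness timestamp."""

    def __init__(self) -> None:
        self._hashes: Dict[str, str] = {}
        self._indexed_at: Dict[str, datetime] = {}
        self.reindex_count = 0  # stand-in for real re-chunk / re-embed work

    def on_document_changed(
        self, doc_id: str, content: str, now: Optional[datetime] = None
    ) -> bool:
        """Handle a change event; returns True only if the document was re-indexed."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if self._hashes.get(doc_id) == digest:
            return False  # event fired but content is identical; nothing to do
        # A real system would re-chunk, re-embed, and upsert vectors here.
        self._hashes[doc_id] = digest
        self._indexed_at[doc_id] = now or datetime.now(timezone.utc)
        self.reindex_count += 1
        return True

    def on_document_deleted(self, doc_id: str) -> None:
        """Remove a document so deleted content stops appearing in answers."""
        self._hashes.pop(doc_id, None)
        self._indexed_at.pop(doc_id, None)

    def freshness(self, doc_id: str) -> Optional[datetime]:
        """Timestamp of the last successful index for this document, if any."""
        return self._indexed_at.get(doc_id)
```

The answer UI can then surface `freshness(doc_id)` next to each cited source, so a stale answer is at least visibly stale.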
None of these fixes is exotic. They just take discipline. The pattern that holds across our audits: most RAG systems would be about 30% better with no model change, just better retrieval engineering.
