The honest baseline: RAG does not guarantee truth. It finds passages that look relevant and asks a model to summarize them. Accuracy depends on (1) what you ingested, (2) whether retrieval found the right passages, and (3) whether the model stayed faithful to those passages.
Things to try over time:
- Citations first. Always show which excerpts were used. If the answer cannot point to a source, treat it as suspect.
- Retrieve before you answer. The model should speak only from retrieved text, not from general world knowledge (your system prompt already pushes this way).
- Better source material. Garbage in, garbage out. Notes that are clear, dated, and generalized beat messy OCR.
- Smell tests. Ask questions you already know the answer to. If it gets those wrong, fix retrieval or prompts before trusting it on hard questions.
- Separate “I don’t know” from guessing. Prefer a short “the notes don’t say” over a confident wrong answer.
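The citation and "the notes don't say" points above can be enforced in code rather than left to the prompt. Below is a minimal sketch, not a definitive implementation: `Passage`, `answer_from_passages`, the `generate` callback, and the `min_score` threshold of 0.3 are all hypothetical names and values chosen for illustration, and the right threshold depends on your retriever.

```python
from dataclasses import dataclass
from typing import Callable, List

ABSTAIN = "The notes don't say."


@dataclass
class Passage:
    source: str   # hypothetical identifier, e.g. a note filename
    text: str
    score: float  # retrieval similarity; higher means more relevant


def answer_from_passages(
    question: str,
    passages: List[Passage],
    generate: Callable[[str, List[Passage]], str],
    min_score: float = 0.3,  # assumed cutoff; tune for your retriever
) -> dict:
    """Answer only from retrieved passages and always return citations.

    If no passage clears min_score, abstain instead of calling the model.
    """
    usable = [p for p in passages if p.score >= min_score]
    if not usable:
        return {"answer": ABSTAIN, "citations": []}
    answer = generate(question, usable)
    return {"answer": answer, "citations": [p.source for p in usable]}
```

The design choice here is that the abstain path never reaches the model, so a weak retrieval cannot turn into a confident guess. It does not check that the generated answer is faithful to the passages; that still needs the prompt and the smell tests.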
Accuracy is not one knob. It is corpus quality, retrieval quality, and faithful synthesis working together.
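Because these are separate knobs, a smell test is more useful when it scores retrieval and the final answer separately. The following is a rough sketch under stated assumptions: `run_smell_tests`, the case fields (`question`, `expected_source`, `expected_phrase`), and the `retrieve` and `answer` callbacks are hypothetical stand-ins for whatever your pipeline exposes, and substring matching is a crude proxy for correctness.

```python
from typing import Callable, Dict, List


def run_smell_tests(
    cases: List[Dict[str, str]],
    retrieve: Callable[[str], List[Dict[str, str]]],
    answer: Callable[[str, List[Dict[str, str]]], str],
) -> Dict[str, float]:
    """Score known-answer questions on retrieval and on the final answer.

    Each case needs 'question', 'expected_source', and 'expected_phrase'.
    Each retrieved passage is assumed to be a dict with a 'source' key.
    """
    if not cases:
        raise ValueError("need at least one smell-test case")

    retrieval_hits = 0
    answer_hits = 0
    for case in cases:
        passages = retrieve(case["question"])
        # Did retrieval surface the passage we know holds the answer?
        if any(p["source"] == case["expected_source"] for p in passages):
            retrieval_hits += 1
        # Does the reply contain the phrase we expect (case-insensitive)?
        reply = answer(case["question"], passages)
        if case["expected_phrase"].lower() in reply.lower():
            answer_hits += 1

    total = len(cases)
    return {
        "retrieval_hit_rate": retrieval_hits / total,
        "answer_hit_rate": answer_hits / total,
    }
```

A low retrieval hit rate points at ingestion or search; a high retrieval hit rate with a low answer hit rate points at the prompt or the model's faithfulness.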