RAG that references its sources
Published on Jan 5, 2026
3 min read

Prelude
You are setting up a RAG chatbot and would like it to cite the sources of whatever it says.
Say your RAG helps you read books by answering questions about them. You not only want it to give you a (relevant) answer but also to provide the source of that answer, namely which sections of the book it is drawing from.
Implementation
This puts us squarely in the realm of prompt engineering; at least, that is the most obvious implementation route.

And prompt engineering is exactly like the well-known joke above: one must be specific and logical, provide examples, and so on. This article goes much deeper into the topic; do give it a read if interested. We, however, will stick to the simplest implementation that is still effective.
So what does it look like?
Recall the previous prompt was roughly:
{context}
Based only on the context above extracted from ingested books, answer the following question.
Be concise.
{query}
And now we need to instruct it to include references. The most straightforward way would be:
{context}
Based only on the context above extracted from ingested books, answer the following question.
+The context above contains one or more documents and their ids,
+for example: "Document[1]: text" denotes that the text comes from the document with id equal to 1.
+Whenever referencing something from the provided context, include the document id and the particular
+text segment.
Be concise.
{query}
Which then produces this sample output:
Wait, you say the context provides the id for each document. Where?
Yes, this implies the context should also include the id. This is a simple change:
-context_documents_str = "\n\n".join(doc.page_content for doc in relevant_docs)
+context_documents_str = "\n\n".join(f"Document[{i+1}]: {doc.page_content}" for i, doc in enumerate(relevant_docs))
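To make this concrete, here is a minimal end-to-end sketch of how the pieces could fit together. It assumes a LangChain-style setup (each retrieved document exposes page_content, the chat model exposes invoke); the function name, model class, and model name are illustrative assumptions, not something prescribed by this post.

# Minimal sketch: the referencing prompt wired into a simple RAG answer step.
# Assumes `relevant_docs` comes from a retriever and each doc has `.page_content`.
from langchain_openai import ChatOpenAI  # assumed provider; any chat model with .invoke works

PROMPT_TEMPLATE = """{context}

Based only on the context above extracted from ingested books, answer the following question.
The context above contains one or more documents and their ids,
for example: "Document[1]: text" denotes that the text comes from the document with id equal to 1.
Whenever referencing something from the provided context, include the document id and the particular
text segment.
Be concise.
{query}"""


def answer_with_references(relevant_docs, query, llm):
    # Number each retrieved chunk so the model can point back to it by id.
    context_documents_str = "\n\n".join(
        f"Document[{i + 1}]: {doc.page_content}" for i, doc in enumerate(relevant_docs)
    )
    prompt = PROMPT_TEMPLATE.format(context=context_documents_str, query=query)
    return llm.invoke(prompt).content


# Illustrative usage:
# llm = ChatOpenAI(model="gpt-4o-mini")
# print(answer_with_references(relevant_docs, "Who betrays the protagonist?", llm))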
Addendum
Are there no better ways than updating the prompt? This seems brittle!
And indeed it might be. The prompt could be extended further with more examples and so on; however, there are other ways of achieving this referencing feature. One is forcing a structured output. That is typically best achieved via post-training, but one can also get there via the prompt and some related techniques.
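To give a flavour of the structured-output route, here is a sketch of what the answer schema could look like, using Pydantic; the class and field names are my own illustration, not an established convention.

# Hypothetical schema for a structured, self-referencing answer (names are illustrative).
from pydantic import BaseModel, Field


class Citation(BaseModel):
    document_id: int = Field(description="Id of the Document[...] entry the quote comes from")
    quote: str = Field(description="The exact text segment being referenced")


class ReferencedAnswer(BaseModel):
    answer: str = Field(description="Concise answer to the user's question")
    citations: list[Citation] = Field(description="Supporting segments from the context")

The model would then be asked, via a structured-output API or the prompt itself, to emit JSON matching this schema, which can be validated before being shown to the user.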
We will try that out in another post; stay tuned :)
There are yet other techniques in the literature, such as fine-tuning the models, intercepting the tokens as they are generated, choosing specialized models, etc. These I will not cover, as they are rather “deep” technically, require more resources, and, quite frankly, are overkill in most situations. The Pareto principle after all.
What if there are no relevant documents to answer the question? How do we prevent it from hallucinating?
Prompt engineering sounds like the logical solution: one should be able to tell the LLM that unless it is sure, it should not provide an answer. Recall also that when we retrieve data from the vector db, it internally computes a similarity score of sorts. We could use that too!
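As a rough sketch of the score idea, assuming the vector store can return a score alongside each document (LangChain stores expose this via similarity_search_with_score; note that depending on the store the score may be a distance rather than a similarity, so the comparison direction and the threshold below are assumptions to adjust):

# Sketch: skip answering when retrieval looks weak. The threshold and score direction
# are assumptions; some stores return distances (lower is better) instead of similarities.
MIN_SIMILARITY = 0.75


def retrieve_or_refuse(vectorstore, query, k=4):
    docs_and_scores = vectorstore.similarity_search_with_score(query, k=k)
    relevant_docs = [doc for doc, score in docs_and_scores if score >= MIN_SIMILARITY]
    if not relevant_docs:
        return None  # caller replies "I don't know" instead of letting the LLM guess
    return relevant_docs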
Will try that out in another post too :)