RAG that references its sources - alternative

Published on Jan 19, 2026 · 3 min read


Prelude

You are setting up a RAG chatbot and would like to make it reference whatever it says.

Say your RAG is helping you read books by answering questions about them. You not only want it to give you a (relevant) answer but also to provide the source of that answer, namely which sections of the book it is using.

Perhaps the previous blogpost sparked your curiosity.

Implementation

While we are still in prompt engineering territory, we will rely on some rather more programmatic techniques to achieve our goal here.

So besides updating the prompt in a similar way, we will validate whether the model's output follows an expected pattern. This way, we make sure that references are provided, and in the expected format.

And we will do this using our friend pydantic.

from typing import List

from pydantic import BaseModel, Field


class CitedClaim(BaseModel):
    claim: str = Field(description="A factual statement derived from the context.")
    source_id: int = Field(description="The integer ID of the document used to support this claim.")

class RAGResponse(BaseModel):
    answer_summary: str = Field(description="A brief overall summary of the answer.")
    cited_statements: List[CitedClaim] = Field(description="A list of specific claims and their sources.")

# Wrap the model so its output is parsed and validated against RAGResponse.
structured_llm = llm.with_structured_output(RAGResponse)

First we need to define the structure we expect: the overall response together with the per-claim reference details. The field descriptions are used directly by langchain to instruct the LLM on the expected output; internally, this works like a tool call.
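
If you are curious about what the model actually receives, you can inspect the JSON schema that pydantic generates from these classes; something very close to it ends up in the tool definition langchain sends along. A minimal sketch, assuming pydantic v2:

import json

# The Field descriptions above show up in this schema, which is how the
# model learns what each field is supposed to contain.
print(json.dumps(RAGResponse.model_json_schema(), indent=2))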

system_prompt = """
--- CONTEXT ---
{context}
--- END CONTEXT ---

Based only on the context above extracted from ingested books, answer the following question.
The context above contains one or more documents and their source_ids,
for example: "Document[1]: text" denotes that the text comes from the document with source_id equal to 1.
Whenever referencing something from the provided context, you MUST include the source_id from the context.
Be concise.

--- QUESTION ---
{query}
"""
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{query}"),
])

We also need to update the prompt setup. Note how we don't directly substitute the context and query as in the "simple" approach from the previous blogpost; we will do that when invoking the model. Otherwise, besides using ChatPromptTemplate as syntactic sugar, the prompt is pretty similar.
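
One thing the prompt does assume is that the retrieved documents have already been flattened into a single string in the Document[id]: text format; that is what the context_documents_str used below holds. That step isn't shown in this post, but a minimal sketch of how it could look, assuming a langchain retriever like the one from the previous blogpost, would be:

# Hypothetical glue code: number the retrieved documents so the model has
# a source_id it can cite for each one.
retrieved_docs = retriever.invoke(query)
context_documents_str = "\n\n".join(
    f"Document[{i}]: {doc.page_content}"
    for i, doc in enumerate(retrieved_docs, start=1)
)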

chain = prompt | structured_llm
result: RAGResponse = chain.invoke({"context": context_documents_str, "query": query})

print("Response:")
print(result.answer_summary)
for citation in result.cited_statements:
    print(f'[{citation.source_id}]: "{citation.claim}"')

We create a small chain - another bit of langchain syntactic sugar - and then simply invoke it with the arguments we had before.

But now we have model validation from pydantic: if the output from the llm does not fit the schema, we get an error. We no longer rely on complicated hacks to verify the output before showing it to the user, or, worse, simply show whatever the model returns.
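
The exact exception depends on the model wrapper and how langchain parses the output, but a schema mismatch surfaces as a raised error rather than as a silently malformed answer. A minimal sketch of handling it (the retry or fallback policy is up to you):

try:
    result: RAGResponse = chain.invoke({"context": context_documents_str, "query": query})
except Exception as exc:  # e.g. a pydantic ValidationError or an OutputParserException
    # Fail loudly instead of showing an unvalidated answer to the user.
    print(f"Structured output did not match the schema: {exc}")
    result = None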

Plus, we now use tool calling, which many models, such as our llama 3.2, have been finetuned for, so the structured output tends to be more reliable than free-form text.
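
For completeness, the llm object we wrapped with with_structured_output earlier is not defined in this post. Assuming llama 3.2 served locally through Ollama, as the previous setup suggests, it could be created like this (package and model names are assumptions):

from langchain_ollama import ChatOllama

# Assumed setup: llama 3.2 running locally via Ollama.
llm = ChatOllama(model="llama3.2", temperature=0)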

Example output

Is the output better?

Well, it definitely looks better. Testing whether it actually is better is a different story. Perhaps we will dive into that later ;)

Addendum

Does this approach prevent the LLM from hallucinating though?

Nope :/

Well, it does catch the cases where the model fails to provide a reference at all, but technically it could still make the references up.
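
One cheap guard you can still add on top: check that every cited source_id actually corresponds to one of the retrieved documents, so a completely invented ID at least gets flagged. A sketch, reusing the hypothetical retrieved_docs numbering from the context-building snippet above:

# source_ids that were actually present in the context we sent to the model.
valid_ids = set(range(1, len(retrieved_docs) + 1))

for citation in result.cited_statements:
    if citation.source_id not in valid_ids:
        print(f"Warning: claim cites an unknown source_id {citation.source_id}")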

But the technique in the next blogpost could improve on this :)

Notice something wrong? Have an additional tip?

Contribute to the discussion here

Want to work with me? My socials are below!