RAG that admits when it's wrong
Published on Feb 2, 2026
3 min read

Prelude
You are setting up a RAG chatbot and would like to make it admit when it does not know the answer.
Say your RAG is helping you read books by answering questions about them. You not only want it to give you a (relevant) answer, but also to admit defeat when it does not know.
Perhaps the previous blogpost sparked your curiosity.
Implementation
There are many available techniques here. As The Engineer, we will again strive to choose the most effective ones.
The most obvious choice is to try it via prompt engineering. A related one, as hinted before, is to take a deeper look at those relevance scores returned by the vector db.
Lastly, we do have the temperature hyperparameter. The lower the value, the more deterministic the model becomes, and thus the less likely it is to pick “creative” answers. Implicitly, however, this does rely on the prompt containing sufficient relevant data to be able to compute such a deterministic answer. Recall: rubbish in, rubbish out.
So let’s do all three :)
Prompt
This part is very simple. Essentially, add to the prompt any variation of:

    If the answer is not contained within the provided context,
    state that you do not know. Do not use outside knowledge.

We can see that working in the following screenshot:

Meanwhile, questions which do have answers in the books are still answered as expected.
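As a minimal sketch of wiring this instruction into a LangChain prompt (the template wording and variable names here are just an illustration, not the exact prompt used in this post):

    from langchain_core.prompts import ChatPromptTemplate

    # Illustrative template; any variation of the instruction above works.
    rag_prompt = ChatPromptTemplate.from_template(
        "Answer the question using only the context below.\n"
        "If the answer is not contained within the provided context, "
        "state that you do not know. Do not use outside knowledge.\n\n"
        "Context:\n{context}\n\n"
        "Question: {question}"
    )

    # Example invocation with placeholder values; in the real chain, the context
    # would be the concatenated text of the retrieved chunks.
    prompt_value = rag_prompt.invoke(
        {"context": "Chapter 1: ...", "question": "Who wrote the letter?"}
    )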
Relevance scores
The simplest is to use langchain’s retriever as

    retriever = vector_store.as_retriever(
        search_type="similarity_score_threshold",
        search_kwargs={"score_threshold": threshold}
    )

and then tune the threshold. However, this abstracts away the actual score.
So let’s use the slightly lower-level approach on the vector_store itself:

    relevant_docs = vector_store.similarity_search_with_relevance_scores(query, k=2)
    relevant_docs = [doc for doc, score in relevant_docs if score > threshold]

This way we can manually filter out docs as we see fit, be it on score or other heuristics (hint ;)).
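A minimal sketch of tying this into the “admit defeat” behaviour (reusing the filtering above; vector_store, threshold, rag_prompt and llm are assumed to already be defined, as elsewhere in this post): if nothing survives the filter, skip generation entirely.

    # Sketch: short-circuit to an explicit refusal when no chunk is relevant enough.
    def answer(query: str) -> str:
        scored = vector_store.similarity_search_with_relevance_scores(query, k=2)
        relevant_docs = [doc for doc, score in scored if score > threshold]

        if not relevant_docs:
            # Nothing is similar enough: admit defeat without calling the LLM.
            return "I do not know; the books do not seem to cover this."

        context = "\n\n".join(doc.page_content for doc in relevant_docs)
        prompt = rag_prompt.invoke({"context": context, "question": query})
        return llm.invoke(prompt).content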
But how do we choose the threshold?
Well, this is not easy. As a proof of concept, one can manually try out some good vs bad examples and pick a value.
Diving deeper, understanding the metric is critical: different vector stores report different similarity measures (cosine, Euclidean distance, inner product), each with its own range and its own notion of “close”.
One would then create a gold dataset of queries, with both answerable and unanswerable questions given the data the RAG has access to, and finally do some nice data science to find a more accurate threshold.
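A rough sketch of that last step (the example queries are made up, and a simple accuracy sweep stands in for the “nice data science”):

    # Sketch: sweep candidate thresholds and keep the one that best separates
    # answerable from unanswerable queries. `vector_store` is assumed as before.
    answerable = ["Who is the narrator's mentor?", "Where does the story open?"]
    unanswerable = ["What is the capital of France?", "How do I bake sourdough?"]

    def top_score(query: str) -> float:
        # Relevance score of the single best-matching chunk.
        results = vector_store.similarity_search_with_relevance_scores(query, k=1)
        return results[0][1] if results else 0.0

    good = [top_score(q) for q in answerable]
    bad = [top_score(q) for q in unanswerable]

    candidates = [i / 100 for i in range(0, 101, 5)]
    best = max(
        candidates,
        key=lambda t: sum(s > t for s in good) + sum(s <= t for s in bad),
    )
    print(f"best threshold: {best:.2f}")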
Temperature
This option is even trickier to evaluate, but it requires a very simple change:
    llm = ChatOllama(
        model="llama3",
    +   temperature=0,
    )

as opposed to the default 0.8.
So again, this affects token generation itself: with a low temperature the model sticks to its most probable tokens, so answers grounded in the context should win out more clearly over unrelated, “creative” ones.
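As a quick sanity check (a sketch, assuming the langchain-ollama package and a locally pulled llama3 model), asking the same question twice at temperature 0 should give essentially the same answer, whereas the default 0.8 usually will not:

    from langchain_ollama import ChatOllama

    # Sketch: with temperature=0 the model sticks to its most likely tokens,
    # so repeated calls on the same prompt should be (near-)identical.
    llm = ChatOllama(model="llama3", temperature=0)

    question = "In one sentence, what is retrieval-augmented generation?"
    first = llm.invoke(question).content
    second = llm.invoke(question).content
    print(first == second)  # expected: True, or at least very close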
Addendum
So what were those other options?
Updating the data retrieval itself would implicitly prevent hallucination. The next blogpost shall dive deeper into this topic.
Other techniques include asking the model to correct itself. This essentially means adding another generation step with a prompt such as:

    Look at this context and response; does the response contain information
    not in the context? If yes, remove it.

This second step could even be via a different, smaller model, to save on costs.
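A minimal sketch of such a second pass (the checker prompt wording and the smaller model tag are illustrative assumptions):

    from langchain_ollama import ChatOllama

    # Sketch of a self-correction pass with a smaller, cheaper model.
    checker = ChatOllama(model="llama3.2:1b", temperature=0)

    def strip_hallucinations(context: str, response: str) -> str:
        check_prompt = (
            "Look at this context and response; does the response contain "
            "information not in the context? If yes, rewrite the response "
            "with that information removed; otherwise return it unchanged.\n\n"
            f"Context:\n{context}\n\nResponse:\n{response}"
        )
        return checker.invoke(check_prompt).content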
Lastly, hallucination detection frameworks do exist specifically for this use case, such as Ragas.