
RAG and CAG: Two Ways to Make AI Smarter, without Retraining It

Artificial intelligence has made remarkable progress in recent years. Tools like ChatGPT and Claude can explain complex topics, draft reports, and answer clinical questions with surprising fluency. But there is a fundamental problem that even the most sophisticated AI shares: it only knows what it was taught during training, and that training has a cutoff date.

This matters enormously in clinical microbiology. Antimicrobial resistance patterns shift. Guidelines are updated. New pathogens emerge. A model trained even six months ago may give you yesterday’s answer for today’s problem.

Two techniques, RAG (Retrieval-Augmented Generation) and CAG (Cache-Augmented Generation), have emerged as practical solutions. Neither requires retraining the AI from scratch. Instead, they extend what the AI can see at the moment you ask it a question.

Think of a large language model like a very well-read colleague who studied intensively for several years, then went into a sealed room with no internet or newspapers. Their knowledge is deep — but frozen in time. They cannot know what changed after they went in.

RAG and CAG are two different ways to pass documents under the door of that sealed room.

Retrieval-Augmented Generation (RAG)

RAG works like an open-book exam where the books are searched in real time. When you ask a question, the system first searches a database for the most relevant chunks of information, then hands those chunks to the AI alongside your question. The AI reads them and answers using both its training and the retrieved material.

Analogy: You are a registrar on call. You ask a question, and before answering, a very fast librarian retrieves the three most relevant pages from BNF, EUCAST, and your local policy, and places them on the desk in front of you.

RAG is well suited to large, continuously updated knowledge bases: published literature, resistance surveillance data, or national guideline repositories. Its limitation is that the quality of the answer depends on whether the right chunks were retrieved. If the search pulls the wrong pages, the AI may still answer confidently, but from the wrong source.
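For readers who like to see the moving parts, the retrieve-then-answer flow can be sketched in a few lines of Python. This is a toy illustration, not how any particular product works: real RAG systems usually rank chunks with embeddings or a search index, whereas this sketch simply counts shared words. The function names, the sample snippets, and the prompt wording are all hypothetical placeholders.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def overlap_score(question, chunk):
    """Count how many question-word occurrences also appear in the chunk."""
    q = Counter(tokenize(question))
    c = Counter(tokenize(chunk))
    return sum(min(q[w], c[w]) for w in q)

def retrieve(question, chunks, k=2):
    """Return the k chunks that share the most words with the question."""
    ranked = sorted(chunks, key=lambda ch: overlap_score(question, ch), reverse=True)
    return ranked[:k]

def build_rag_prompt(question, chunks, k=2):
    """Place the retrieved chunks in front of the question."""
    context = "\n".join(f"- {ch}" for ch in retrieve(question, chunks, k))
    return f"Use only these sources:\n{context}\n\nQuestion: {question}"

# Hypothetical placeholder snippets, not real guidance
CHUNKS = [
    "Local policy: first-line treatment for uncomplicated urinary tract infection is drug A.",
    "Outbreak note: ward 5 reported a cluster of resistant Klebsiella isolates.",
    "Laboratory note: blood culture bottles should reach the lab within four hours.",
]

prompt = build_rag_prompt(
    "What is first-line treatment for urinary tract infection?", CHUNKS, k=1
)
print(prompt)
```

The string in `prompt` is what would be sent to the model. Note that the model only ever sees what `retrieve` hands over, which is why a poor retrieval step leads to a confident answer from the wrong page.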

Cache-Augmented Generation (CAG)

CAG takes the opposite approach. Instead of searching at query time, a document or a defined set of documents is loaded into the AI’s working memory in full before the conversation begins. The AI then has the entire document in view, all at once, with no retrieval step needed.

Analogy: Before your shift, you read your hospital’s antimicrobial policy cover to cover. When a query comes in during the shift, you are not searching; you already know every page of it.

CAG is ideal for smaller, stable, well-defined reference sets: a specific formulary, a single guideline, a resistance profile for a known outbreak strain. Its constraint is size – the AI’s context window can only hold so much text at once, making it unsuitable for very large or continuously changing knowledge bases.
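The same idea can be sketched in Python, under some loud assumptions. The class name, the placeholder policy text, and the budget of 8,000 tokens are hypothetical, and the "about 4 characters per token" rule is only a rough rule of thumb for English text. Production CAG systems may also cache the model's internal state for the document, which this sketch does not attempt; it only shows the "load everything first, then ask" pattern and the size check.

```python
class PreloadedContext:
    """Holds a full document that is placed in front of every question."""

    def __init__(self, document, max_tokens=8000):
        # Rough assumption: about 4 characters per token for English text
        estimated = len(document) // 4
        if estimated > max_tokens:
            raise ValueError(
                f"Document needs about {estimated} tokens; limit is {max_tokens}"
            )
        self.prefix = f"Reference document:\n{document}\n"

    def prompt_for(self, question):
        """Reuse the same preloaded prefix; no search step is involved."""
        return f"{self.prefix}\nQuestion: {question}"

# Hypothetical placeholder policy text, not real guidance
policy = (
    "Section 1: drug A is first line for condition X.\n"
    "Section 2: drug B is reserved for condition Y."
)

ctx = PreloadedContext(policy, max_tokens=8000)
p1 = ctx.prompt_for("What is first line for condition X?")
p2 = ctx.prompt_for("When is drug B used?")
```

Both prompts carry the whole policy, so nothing can be missed by a search step. The trade-off is the `ValueError`: once the document outgrows the budget, this approach stops working.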

These techniques are not mutually exclusive. Some systems combine both — using CAG to pre-load a core policy document, while using RAG to search a wider literature database for supplementary evidence.
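A combined setup might look something like the following Python sketch. It is illustrative only: the function names and sample texts are hypothetical, and the retrieval step is a toy word-overlap match standing in for a real search index.

```python
def top_chunk(question, chunks):
    """Pick the chunk sharing the most distinct words with the question (toy retrieval)."""
    q_words = set(question.lower().split())
    return max(chunks, key=lambda ch: len(q_words & set(ch.lower().split())))

def hybrid_prompt(core_document, question, literature_chunks):
    """Always include the core document; add one retrieved chunk as supplementary evidence."""
    evidence = top_chunk(question, literature_chunks)
    return (
        f"Core policy (always loaded):\n{core_document}\n\n"
        f"Supplementary evidence (retrieved):\n{evidence}\n\n"
        f"Question: {question}"
    )

# Hypothetical placeholder texts, not real guidance or real studies
CORE = "Policy: drug A is first line for condition X."
LITERATURE = [
    "Study 1 reported outcomes of drug A in condition X.",
    "Study 2 described hand hygiene audit results.",
]

prompt = hybrid_prompt(CORE, "Is drug A effective in condition X?", LITERATURE)
```

The core policy is present in every prompt regardless of the question, while the supplementary evidence changes with each query.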

In clinical microbiology

Microbiology sits at the intersection of rapidly evolving science and high-stakes clinical decisions. The MIC breakpoints that determine whether an isolate is sensitive or resistant can change year to year. Outbreak guidance is written in real time. A surveillance report from last quarter may already be outdated.

AI tools built without RAG or CAG will default to their training data when answering such queries, which may be months or years out of date, and which may not include your local resistance ecology, your institution’s formulary, or your regional outbreak context.

It must be remembered that RAG and CAG do not make AI infallible: failure to understand context, position bias, and hallucinations all remain risks. However, they represent a meaningful step toward AI that can operate in clinical environments where accuracy, fidelity, and specificity matter.

As these tools become more integrated into diagnostic support, antimicrobial stewardship platforms, and laboratory information systems, understanding what they can and cannot do will be part of the clinical microbiologist’s working knowledge.

How can you use it?

Here is a practical guide to how a clinical microbiologist, an infection specialist, or a healthcare organisation might actually use these techniques today, starting with some available systems where you can try RAG and CAG yourself.

Google’s NotebookLM is an example of RAG. You can upload a large number of documents (50 or more) and then use them as a library to generate content or answer your questions. It is available to use now.
An example of CAG is even simpler. When you open an LLM such as ChatGPT, Claude, or Gemini, you have the option to upload a document in the chat and ask questions about it. This is CAG. However, you must be careful, as there is a limit to how large a document the LLM can process. This is called the context limit (token limit), so if you give it a very large document, the LLM may make mistakes.

If you have ChatGPT or Claude Pro or above, you can access a feature called “projects”. You can upload a set of documents here and use it as your library. Projects sit somewhere between RAG and CAG. They work by prepending your instructions and uploaded documents directly into the context window at the start of each conversation. That is closer to CAG or, more precisely, context stuffing (persistent context). There is a caveat, however: when document sets are large enough to exceed the context window, these platforms may apply chunking and retrieval behind the scenes, so the behaviour can shift depending on document size.
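The size-dependent switch can be pictured with a short Python sketch. This is purely an illustration of the idea: the platforms do not publish their exact logic, and the function names, the "about 4 characters per token" estimate, and the limit of 100,000 tokens used here are all assumptions.

```python
def estimate_tokens(text):
    """Very rough estimate, assuming about 4 characters per token."""
    return len(text) // 4

def choose_strategy(documents, context_limit_tokens):
    """Return 'preload' if all documents fit in the context window, else 'retrieve'."""
    total = sum(estimate_tokens(d) for d in documents)
    return "preload" if total <= context_limit_tokens else "retrieve"

# Hypothetical document sets made of filler text
small_set = ["a" * 4_000]        # roughly 1,000 tokens under the assumption above
large_set = ["a" * 4_000_000]    # roughly 1,000,000 tokens under the assumption above

small_choice = choose_strategy(small_set, context_limit_tokens=100_000)
large_choice = choose_strategy(large_set, context_limit_tokens=100_000)
```

Under these assumptions the small set would be preloaded in full (CAG-like behaviour), while the large set would fall back to chunking and retrieval (RAG-like behaviour).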

Ask the right question first.

Before choosing RAG or CAG, ask yourself: what knowledge does the AI need that it cannot possibly already have? The answer will point you to the right tool.

Here is an example:

Sepsis biomarker decision support — MetaSepsisKnowHub
A team in China built a RAG-powered platform indexing 427 sepsis biomarkers across 423 studies. When clinicians queried it, expert-reviewed recommendations scored significantly higher than answers from standalone GPT-4, GPT-4o, or Qwen2.5. The improvement was statistically significant (GPT-4 mean score rose from 75.8 to 81.6; p = 0.02). The key gain was that the AI could ground its sepsis management advice in current literature rather than its training snapshot alone. [Journal of Medical Internet Research – A Knowledge-Enhanced Platform (MetaSepsisKnowHub) for Retrieval Augmented Generation–Based Sepsis Heterogeneity and Personalized Management: Development Study]

Future
