Fine-tuning, prompting, or RAG: which one do you actually need?

Three very different ways to make an AI behave the way you want — explained without jargon, with a simple rule for picking between them.

4 min read fine-tuning / rag / primer

There are three main ways to make a general AI model do something useful for your business: prompt it, retrieve for it, or fine-tune it. These get mixed up constantly, and picking the wrong one wastes money and time.

Here they are in plain English, and how to choose between them.

The teaching analogy

Imagine you’ve hired a very smart new employee who knows almost everything in general, but nothing about your company specifically.

  • Prompting is giving them instructions every time they do a task. “When a customer complains, respond politely, restate the problem, and offer a refund up to €50.”
  • RAG is giving them a filing cabinet. They can pull relevant files before each task, so they can refer to your actual policies, not what they vaguely remember.
  • Fine-tuning is training them. Over time, your way of doing things becomes second nature — they don’t need to be told the rules; they’ve internalised them.

All three are valid. Most real systems use a combination. The interesting question is: for any given problem, which one(s)?

Prompting

What it is: you give the model carefully written instructions, possibly with a few examples, every time you call it.

Best for:

  • Tweaking style and format. Make it write in your tone. Make it return JSON. Make it answer in three bullet points.
  • Adding short-term context that fits in a few thousand words.
  • Quick experiments. You can test a new behaviour in minutes.

Limits:

  • Long instructions cost tokens every time. They add up.
  • The model can ignore parts of long prompts.
  • You can’t fit a 500-page policy manual in a prompt and expect reliable use.

Prompting is the cheapest and fastest of the three. Always your starting point.
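The "instructions travel with every call" idea can be sketched in a few lines. This is just string assembly, no particular API assumed; the example rules and few-shot pair are hypothetical.

```python
# Prompting sketch: instructions + examples are re-sent on every call.
# The model call itself is omitted; any LLM API would receive this string.

SYSTEM_RULES = (
    "When a customer complains, respond politely, restate the problem, "
    "and offer a refund up to EUR 50."
)

# Hypothetical few-shot examples; real ones would come from your own data.
FEW_SHOT = [
    ("My order arrived broken.",
     "I'm sorry your order arrived broken. We can refund up to EUR 50."),
]

def build_prompt(customer_message: str) -> str:
    """Assemble rules + examples + the new message into one prompt."""
    parts = [SYSTEM_RULES]
    for question, answer in FEW_SHOT:
        parts.append(f"Customer: {question}\nAgent: {answer}")
    parts.append(f"Customer: {customer_message}\nAgent:")
    return "\n\n".join(parts)

prompt = build_prompt("I was charged twice.")
```

Notice that the rules and examples are paid for, in tokens, on every single call. That recurring cost is exactly what fine-tuning later removes.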

RAG (retrieval-augmented generation)

What it is: before each answer, the system fetches relevant information from your data and inserts it into the prompt.

Best for:

  • Working with your data — documents, manuals, tickets, anything private or specific to you.
  • Changing data — pricing, inventory, policies, anything that updates after the model was trained.
  • Answers that need citations and grounded sources.

Limits:

  • It doesn’t change how the model behaves — only what it sees. The model still answers in its default style.
  • It doesn’t help if the retrieval finds the wrong things, or if your documents are bad.
  • It adds engineering: chunking, embeddings, retrieval, evaluation. (See “What is RAG?” for the details.)

RAG is the right answer when the problem is knowledge: “the model doesn’t know about us.”
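The fetch-then-insert loop can be shown with a toy retriever. Real systems score documents with embeddings rather than the word overlap used here, but the shape of the pipeline is the same; the documents are made up for illustration.

```python
# Toy RAG pipeline: score documents against the question, then insert
# the best match into the prompt before generating an answer.

DOCS = {
    "refunds": "Refunds are allowed up to 50 euros within 30 days.",
    "shipping": "Standard shipping takes 3 to 5 business days.",
}

def retrieve(question: str) -> str:
    """Return the document sharing the most words with the question.
    A stand-in for embedding-based similarity search."""
    q_words = set(question.lower().split())
    return max(DOCS.values(),
               key=lambda doc: len(q_words & set(doc.lower().split())))

def build_grounded_prompt(question: str) -> str:
    """Prepend the retrieved context so the model answers from it."""
    context = retrieve(question)
    return f"Use only this context:\n{context}\n\nQuestion: {question}"
```

Note what this does and doesn't change: the model sees fresh, private information, but it still answers in its default style. If the retriever picks the wrong document, the answer will be confidently wrong about the wrong thing.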

Fine-tuning

What it is: you take a pre-trained model and continue training it on a curated dataset of examples that look like your task. The model’s weights change — its actual behaviour shifts.

Best for:

  • Teaching the model a specific style, format, or domain that’s hard to fit into a prompt.
  • Smaller models. You can take a small, cheap model and fine-tune it to outperform a much larger one on your specific task.
  • Latency and cost. A fine-tuned small model is dramatically cheaper and faster to run than a big general-purpose one.

Limits:

  • Needs a labelled dataset, often hundreds to thousands of examples.
  • Costs real money and time to train.
  • Doesn’t add new facts as well as you’d hope — it changes behaviour, not knowledge.
  • Risks “forgetting” — push too hard and the model gets worse at general things.

Fine-tuning is the right answer when the problem is behaviour: “the model can do this, but not the way we need it to.”
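A fine-tuning dataset is just many (input, ideal output) pairs. The chat-style JSONL record below is a common shape for hosted fine-tuning APIs, but the exact schema varies by provider, so treat this as a sketch and check your provider's documentation.

```python
import json

def make_example(customer_message: str, ideal_reply: str) -> str:
    """Serialise one training pair as a JSONL line (chat-style format).
    Field names are illustrative; providers differ on the exact schema."""
    record = {
        "messages": [
            {"role": "user", "content": customer_message},
            {"role": "assistant", "content": ideal_reply},
        ]
    }
    return json.dumps(record)

line = make_example(
    "My order arrived broken.",
    "I'm sorry to hear that. We'll refund up to EUR 50.",
)
```

Hundreds to thousands of lines like this, each showing the behaviour you want, are the real cost of fine-tuning. The training run is the easy part; curating the examples is the work.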

How to choose

A useful rule of thumb:

Symptom → Likely answer

  • “The model can’t write the way we need.” → Prompting (then fine-tuning if it keeps failing)
  • “The model doesn’t know about us.” → RAG
  • “The model is great, but it’s too slow / too expensive at scale.” → Fine-tune a smaller model
  • “The model gets it almost right, but the format is wrong.” → Prompting, sometimes fine-tuning
  • “The model knows things, but they’re out of date.” → RAG

The combination that wins

In practice, most production AI systems combine these:

  • RAG to ground the model in your data.
  • Prompting to control the style and structure of the answer.
  • Fine-tuning when the volume is high enough that the cost of training pays back through cheaper inference, or when the task needs a behaviour that prompting can’t reliably hit.

There’s no medal for picking the most complex option. The goal is the simplest combination that does the job — and the discipline to measure whether each addition is actually pulling its weight.


Need help putting an LLM system into production?

Get in touch