Fine-tuning is the process of taking a model that has already been pretrained on broad data and continuing to train it on a curated set of your own examples. Unlike prompting — which only changes the instructions you give a fixed model — fine-tuning updates the model's weights, so the resulting model bakes in the behaviour you trained for. The output is a customised version of the model that you then call like any other.
The mechanics are straightforward in principle: you assemble high-quality input-output pairs that demonstrate the behaviour you want, run a training job that nudges the model's parameters toward producing those outputs, and validate the result against held-out examples. Modern practice often uses parameter-efficient methods (such as LoRA adapters) that freeze the pretrained weights and train only a small set of additional ones, which is far cheaper than updating the whole model and leaves a small adapter that is easy to serve.
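As a rough sketch of the steps above, the following Python holds out part of a set of input-output pairs for validation and estimates how many weights a LoRA adapter trains for a single weight matrix. The function names, the 10% validation fraction, and the 4096 x 4096 matrix with rank 8 are illustrative assumptions, not values taken from any particular training framework.

```python
import random


def split_examples(examples, validation_fraction=0.1, seed=0):
    """Shuffle examples deterministically and hold out a validation slice."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_validation = max(1, int(len(shuffled) * validation_fraction))
    # The first slice is held out; the rest is used for training.
    return shuffled[n_validation:], shuffled[:n_validation]


def lora_trainable_params(d_out, d_in, rank):
    """Weights in the two low-rank matrices LoRA adds to one weight matrix:
    B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical dataset of input-output pairs.
examples = [{"input": f"question {i}", "output": f"answer {i}"} for i in range(100)]
train, validation = split_examples(examples)

full = 4096 * 4096  # weights in one full matrix (assumed size)
adapter = lora_trainable_params(4096, 4096, rank=8)

print(len(train), len(validation))  # 90 10
print(adapter, full)                # 65536 16777216
```

Under these assumed sizes the adapter trains 1/256 of the weights in that one matrix, which is where the cost saving comes from; the real ratio depends on the rank and on which matrices receive adapters.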
Fine-tuning earns its place when you need consistent structure or style that prompting can't reliably enforce, when you want to compress a long, expensive prompt into the model itself, when you need a smaller, cheaper model to match a larger one's behaviour on a narrow task, or when latency and cost rule out stuffing examples into every prompt. It is the wrong tool for injecting fresh facts, which is what retrieval is for: retraining to add knowledge is slow and expensive, and the knowledge it adds goes stale.
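To make the prompt-compression case concrete, here is some back-of-the-envelope arithmetic in Python. Every number (token counts, request volume, per-token price) is a made-up placeholder, and the sketch ignores the cost of training and any price difference for serving a tuned model.

```python
def monthly_input_cost(prompt_tokens, requests_per_month, price_per_million_tokens):
    """Input-token cost for a month of requests at a flat per-token price."""
    return prompt_tokens * requests_per_month * price_per_million_tokens / 1_000_000


# Hypothetical workload: all figures are placeholders, not real pricing.
few_shot_prompt_tokens = 3_000  # instructions plus in-prompt examples
tuned_prompt_tokens = 200       # short instruction once the behaviour is trained in
requests = 1_000_000
price = 1.0                     # assumed currency units per million input tokens

before = monthly_input_cost(few_shot_prompt_tokens, requests, price)
after = monthly_input_cost(tuned_prompt_tokens, requests, price)

print(before, after, before - after)  # 3000.0 200.0 2800.0
```

Whether a saving of this kind justifies fine-tuning depends on the training and maintenance costs that this arithmetic leaves out.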
Fine-tuning matters because it is how you turn a general model into a specialist that can be faster, cheaper, and more consistent on the job you actually run. But it carries real costs: you need a clean, representative dataset, evaluations to prove the tuned model is actually better than the prompted baseline, and a plan to re-evaluate and retrain it as base models improve. The common mistake is reaching for fine-tuning first; in most applications, good prompting plus retrieval gets you most of the way, and fine-tuning is the optimisation you apply once the task is well understood.
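One simple way to check that a tuned model is actually better is to score both models on the same held-out examples. The sketch below uses exact-match accuracy; `base_model` and `tuned_model` are hypothetical stand-in functions rather than calls to a real API, and a real evaluation would usually need a metric suited to the task.

```python
def exact_match_accuracy(model, examples):
    """Fraction of examples where the model output equals the reference exactly."""
    if not examples:
        raise ValueError("need at least one held-out example")
    correct = sum(
        1 for ex in examples if model(ex["input"]).strip() == ex["output"].strip()
    )
    return correct / len(examples)


def add(prompt):
    """Helper for the stand-in models: parse 'a+b' and return the sum."""
    left, right = prompt.split("+")
    return int(left) + int(right)


def base_model(prompt):
    # Stand-in for a prompted model with inconsistent output format.
    total = add(prompt)
    return str(total) if total < 7 else f"The answer is {total}."


def tuned_model(prompt):
    # Stand-in for a tuned model that always returns the bare number.
    return str(add(prompt))


# Hypothetical held-out set, for illustration only.
held_out = [
    {"input": "2+2", "output": "4"},
    {"input": "3+3", "output": "6"},
    {"input": "5+5", "output": "10"},
    {"input": "7+1", "output": "8"},
]

base_score = exact_match_accuracy(base_model, held_out)
tuned_score = exact_match_accuracy(tuned_model, held_out)
print(base_score, tuned_score)  # 0.5 1.0
```

Keeping the held-out set fixed means the same comparison can be rerun against a newer base model, which helps in deciding whether the tuned model is still worth maintaining.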