General-purpose LLMs are built to handle everything, which means they're optimised for nothing specific. They don't know your product catalogue, your regulatory requirements, your clinical protocols, or your legal precedents. Fine-tuning changes that. Katalyst AI Lab trains language models on your proprietary data, producing specialist models that outperform generic APIs on your specific tasks at a fraction of the ongoing inference cost.
| Technique | What It Is | Best For |
|---|---|---|
| Supervised Fine-Tuning (SFT) | Train on labelled input-output pairs to teach domain vocabulary, formatting conventions, and task-specific behaviour. | Consistent output format, domain terminology, instruction-following behaviour |
| RLHF (Reinforcement Learning from Human Feedback) | Human raters compare model outputs, and their preferences are used to reinforce the responses they favour. | Subjective quality: tone, helpfulness, safety, and other qualities that are hard to capture in labels alone |
| DPO (Direct Preference Optimisation) | Trains directly on preference pairs without a separate reward model, giving faster iteration and lower compute than full RLHF. | Most enterprise alignment tasks; a practical alternative to full RLHF |
| LoRA / QLoRA Fine-Tuning | Parameter-efficient adaptation using Low-Rank Adaptation, which trains small adapter matrices on top of a frozen base model. | 7B–70B parameter models; reduces compute cost by 60–80% vs full fine-tuning |
| Full Fine-Tuning | Updates all model weights for maximum task performance. | Smaller base models (≤3B); very high-value use cases where peak accuracy is required |
| Embedding Model Training | Train custom embedding models on your domain corpus for retrieval, semantic search, and clustering. | RAG systems where off-the-shelf embeddings underperform on your domain vocabulary |
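
To make the LoRA row concrete, the sketch below counts trainable parameters for a single weight matrix under full fine-tuning versus a low-rank adapter. The layer size (4096 x 4096) and rank (8) are illustrative assumptions rather than figures from any particular model, and the function names are hypothetical. Parameter count is only one driver of compute cost, so the ratio it prints should not be read as the cost saving itself.

```python
def full_trainable_params(d_out: int, d_in: int) -> int:
    """Parameters updated when the whole d_out x d_in weight matrix is trained."""
    return d_out * d_in


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in a LoRA adapter: B (d_out x rank) plus A (rank x d_in).

    The base matrix stays frozen, so only these two small matrices are trained.
    """
    return rank * (d_out + d_in)


# Illustrative values only (assumed, not taken from a specific model).
d_out, d_in, rank = 4096, 4096, 8
full = full_trainable_params(d_out, d_in)
lora = lora_trainable_params(d_out, d_in, rank)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

Raising the rank grows the adapter linearly, which is why rank is usually the first setting to tune when an adapter underfits.
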
| Dimension | Fine-Tuning | RAG | Prompt Engineering |
|---|---|---|---|
| Best for | Stable domain knowledge; consistent output style | Dynamic or frequently updated content | Quick iteration; few-shot examples; no training data |
| Training data required | Yes — labelled examples (500–50,000+) | No (but documents must be indexed) | None |
| Inference cost | Lower (smaller, faster specialised model) | Medium (retrieval + generation per query) | Higher (long prompts = more tokens) |
| Update latency | Days to weeks (retraining cycle) | Minutes (re-index documents) | Immediate (edit the prompt) |
| IP / data privacy | Highest — model runs entirely on your infra | High (self-hostable vector DB) | Variable (depends on API provider logging policy) |
| When to combine | Use all three: prompt sets behaviour, RAG adds live context, fine-tuning ensures consistent domain capability |
Production-grade MLOps tooling for pipelines, experiment tracking, serving, and monitoring.
For SFT, meaningful improvement is typically seen with 500–1,000 high-quality examples. DPO/RLHF requires preference-labelled pairs, usually 2,000–10,000. Embedding model fine-tuning needs a larger domain corpus. We assess your data readiness during the discovery phase and tell you directly whether to collect more or proceed with what you have.
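As a rough illustration of the volume guidance above, the following sketch places a dataset against the quoted ranges. The function name and labels are hypothetical, the thresholds are copied from the figures in the text, and example count is treated as the only input, whereas a real readiness assessment would also weigh data quality and coverage.

```python
# Typical example-count ranges quoted in the text above.
# Embedding training is omitted because no figure is given for it.
TYPICAL_RANGES = {
    "sft": (500, 1_000),
    "preference": (2_000, 10_000),  # DPO / RLHF preference pairs
}


def data_readiness(technique: str, n_examples: int) -> str:
    """Return a coarse label for where a dataset sits relative to the typical range."""
    low, high = TYPICAL_RANGES[technique]
    if n_examples < low:
        return "below typical range"
    if n_examples <= high:
        return "within typical range"
    return "above typical range"


print(data_readiness("sft", 800))
print(data_readiness("preference", 1_200))
```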
Fine-tuning does not erase a model's general capabilities when done correctly. LoRA and QLoRA add a small adapter on top of the frozen base model, preserving general capabilities while adding domain-specific behaviour. Full fine-tuning with proper regularisation also preserves most general capability. We build regression testing into our evaluation harness to verify this.
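One way such a regression check might look, as a minimal sketch: compare the base and fine-tuned models on the same general benchmarks and flag any score that falls by more than a tolerance. The benchmark names, scores, and the 0.02 tolerance below are made-up placeholders for illustration, not results or settings from an actual evaluation harness.

```python
def find_regressions(base_scores, tuned_scores, tolerance=0.02):
    """Return benchmarks where the tuned model dropped by more than `tolerance`."""
    regressions = {}
    for name, base in base_scores.items():
        drop = base - tuned_scores[name]
        if drop > tolerance:
            regressions[name] = round(drop, 4)
    return regressions


# Placeholder scores (hypothetical, for illustration only).
base = {"general_qa": 0.71, "summarisation": 0.64, "reasoning": 0.58}
tuned = {"general_qa": 0.70, "summarisation": 0.65, "reasoning": 0.52}

flagged = find_regressions(base, tuned)
print(flagged or "no regressions beyond tolerance")
```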
Everything we produce is handed over to you. Fine-tuned weights, LoRA adapters, training scripts, and data preprocessing pipelines are all transferred at project close. You can self-host, further fine-tune, or deploy internally without restriction.
RLHF (Reinforcement Learning from Human Feedback) trains a model using human preference signals. It is most valuable when the desired quality is subjective and hard to capture in labels: tone, helpfulness, safety. For most enterprise fine-tuning tasks, SFT combined with DPO achieves comparable results more efficiently and is what we recommend for first engagements.
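For readers curious how DPO avoids a separate reward model, here is a minimal sketch of the standard DPO loss for a single preference pair. It assumes you already have the summed token log-probabilities of the chosen and rejected responses under both the model being trained and a frozen reference model. The argument names and the default `beta` of 0.1 are illustrative assumptions, and production training would normally rely on a library implementation rather than this hand-written version.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, given summed token log-probabilities."""
    # How much more (or less) likely each response is under the policy
    # than under the frozen reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), computed as softplus(-logits) in a stable form.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


# With the policy identical to the reference, the loss is log(2).
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Favouring the chosen response over the rejected one lowers the loss.
print(dpo_loss(-8.0, -12.0, -10.0, -10.0))
```

The loss falls as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model, so the preference data shapes the model directly.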
Share your use case, training data volume, target task, and preferred base model, and we'll return a scoping estimate within 48 hours.
Request a Fine-Tuning Estimate