Strategy Overview¶

Understanding training strategies in ModelForge v2.0.

What Are Strategies?¶

Strategies define how models are trained:

- Model preparation (adapters, PEFT configuration)
- Dataset formatting
- Trainer setup
- Training algorithm
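To make the four responsibilities concrete, here is a minimal sketch of what a strategy interface could look like. The names `TrainingStrategy`, `prepare_model`, `format_dataset`, `build_trainer`, `EchoStrategy`, and `run` are hypothetical and chosen for illustration only; ModelForge's actual classes and method names may differ.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any


class TrainingStrategy(ABC):
    """Hypothetical interface; not ModelForge's actual API."""

    @abstractmethod
    def prepare_model(self, model: Any, config: dict) -> Any:
        """Model preparation: attach adapters or apply PEFT configuration."""

    @abstractmethod
    def format_dataset(self, dataset: list[dict]) -> list[dict]:
        """Dataset formatting: convert raw records to the trainer's shape."""

    @abstractmethod
    def build_trainer(self, model: Any, dataset: list[dict], config: dict) -> Any:
        """Trainer setup: return the object that runs the training algorithm."""


class EchoStrategy(TrainingStrategy):
    """Toy strategy that only demonstrates the call order."""

    def prepare_model(self, model, config):
        return model  # no adapters attached in this toy example

    def format_dataset(self, dataset):
        # Join prompt and response into a single text field.
        return [{"text": f"{r['prompt']} {r['response']}"} for r in dataset]

    def build_trainer(self, model, dataset, config):
        # Stand-in for a real trainer object.
        return {"model": model, "num_examples": len(dataset)}


def run(strategy: TrainingStrategy, model: Any, dataset: list[dict], config: dict) -> Any:
    """Apply the strategy's steps in order: model, dataset, then trainer."""
    model = strategy.prepare_model(model, config)
    dataset = strategy.format_dataset(dataset)
    return strategy.build_trainer(model, dataset, config)
```

Under this sketch, swapping strategies means passing a different `TrainingStrategy` subclass to `run` while the calling code stays the same.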

Each strategy makes a different trade-off between memory use, training speed, and output quality.

Available Strategies¶

Strategy	Memory (vs. SFT)	Speed (vs. SFT)	Quality	Use Case
SFT	Baseline	1x	High	General-purpose fine-tuning
QLoRA	30-50% lower	0.9x	High	Limited VRAM
RLHF	High	Slow	Very High	Alignment with human preferences
DPO	Medium	Medium	Very High	Simpler alternative to RLHF
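As a rough worked example of the QLoRA row, the sketch below turns the table's relative figures into absolute estimates. It assumes the memory figure means 30% to 50% less memory than an SFT run of the same model, and that 0.9x speed means the run takes 1/0.9 times as long. The function names are hypothetical, and any input values are illustrative, not measurements.

```python
from __future__ import annotations


def estimate_qlora_memory_gb(sft_memory_gb: float) -> tuple[float, float]:
    """Return (best_case, worst_case) QLoRA memory in GB.

    Assumption: QLoRA uses 30% to 50% less memory than the SFT baseline.
    """
    best_case = round(sft_memory_gb * (1 - 0.50), 2)   # 50% reduction
    worst_case = round(sft_memory_gb * (1 - 0.30), 2)  # 30% reduction
    return best_case, worst_case


def estimate_qlora_hours(sft_hours: float, relative_speed: float = 0.9) -> float:
    """Scale an SFT training-time estimate by the relative speed (0.9x)."""
    return round(sft_hours / relative_speed, 2)
```

For instance, if an SFT run were to need 24 GB, `estimate_qlora_memory_gb(24.0)` places the QLoRA run between roughly 12 GB and 16.8 GB, and a 9-hour SFT run would take about 10 hours at 0.9x speed.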

Choosing a Strategy¶

Use SFT When:¶

✅ First time fine-tuning
✅ Have sufficient VRAM
✅ Standard supervised learning task
✅ Want simplest setup

Use QLoRA When:¶

✅ Limited VRAM (< 12 GB for 7B models)
✅ Want to train larger models
✅ Memory is the bottleneck
✅ Can accept slightly slower training

Use RLHF When:¶

✅ Aligning model with human preferences
✅ Have reward model or feedback data
✅ Quality is critical
✅ Have computational resources

Use DPO When:¶

✅ Have preference pairs (chosen/rejected)
✅ Want simpler alternative to RLHF
✅ Need alignment without a separate reward model
✅ Want more stable training than RLHF
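The checklists above can be condensed into a small decision helper. This is a sketch only: the function `choose_strategy` and its parameters are hypothetical, and the order in which the conditions are checked (alignment data first, then memory) is an assumption rather than something ModelForge prescribes.

```python
def choose_strategy(
    *,
    has_reward_model: bool = False,
    has_preference_pairs: bool = False,
    ample_compute: bool = False,
    vram_limited: bool = False,
) -> str:
    """Return one of "sft", "qlora", "rlhf", "dpo" based on the checklists.

    Hypothetical helper; the priority order is an assumption.
    """
    # RLHF needs a reward model (or feedback data) and plenty of compute.
    if has_reward_model and ample_compute:
        return "rlhf"
    # DPO works directly from chosen/rejected pairs, with no reward model.
    if has_preference_pairs:
        return "dpo"
    # QLoRA when memory is the bottleneck (e.g. < 12 GB VRAM for a 7B model).
    if vram_limited:
        return "qlora"
    # SFT is the simplest setup and the default starting point.
    return "sft"
```

A first-time user with enough VRAM and no preference data would call `choose_strategy()` with the defaults and get `"sft"`, which matches the first checklist.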

Configuration¶

Specify the strategy with the "strategy" key in the training config. Accepted values are "sft", "qlora", "rlhf", and "dpo":

{
  "strategy": "sft"
}
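Before launching a run, it can help to check that the config names one of the four strategies. The sketch below uses Python's standard `json` module; the helper `load_strategy`, the constant `VALID_STRATEGIES`, and the choice to raise an error on a missing or unknown value are assumptions for illustration, not ModelForge's actual validation logic.

```python
import json

# The four strategy names listed in this guide.
VALID_STRATEGIES = {"sft", "qlora", "rlhf", "dpo"}


def load_strategy(config_text: str) -> str:
    """Parse a JSON training config and return its validated strategy name.

    Hypothetical helper: raises ValueError if the "strategy" key is missing
    or holds a value outside VALID_STRATEGIES.
    """
    config = json.loads(config_text)
    strategy = config.get("strategy")
    if strategy not in VALID_STRATEGIES:
        allowed = ", ".join(sorted(VALID_STRATEGIES))
        raise ValueError(f"Unknown strategy {strategy!r}; expected one of: {allowed}")
    return strategy
```

Note that `json.loads` accepts only standard JSON, so a config file containing `//` comments would be rejected by this helper before the strategy is checked.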

Next Steps¶

SFT Strategy - Standard supervised fine-tuning
QLoRA Strategy - Memory-efficient training
Configuration Guide - All options

Choose the right strategy for your needs! 🎯