
Strategy Overview

Understanding training strategies in ModelForge v2.0.

What Are Strategies?

Strategies define how models are trained:

- Model preparation (adapters, PEFT configuration)
- Dataset formatting
- Trainer setup
- Training algorithm
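To make the four responsibilities concrete, here is a minimal sketch of what a strategy interface could look like. The names `TrainingStrategy`, `ToySFTStrategy`, and every method on them are hypothetical and chosen for illustration; they are not taken from ModelForge's actual code.

```python
from abc import ABC, abstractmethod
from typing import Any


class TrainingStrategy(ABC):
    """Hypothetical interface: one method per responsibility listed above."""

    name: str        # value used in the config, e.g. "sft"
    algorithm: str   # label for the training algorithm

    @abstractmethod
    def prepare_model(self, model: Any) -> Any:
        """Attach adapters or apply PEFT configuration."""

    @abstractmethod
    def format_dataset(self, records: list[dict]) -> list[dict]:
        """Convert raw records into the shape the trainer expects."""

    @abstractmethod
    def build_trainer(self, model: Any, dataset: list[dict]) -> dict:
        """Bundle the model, the data, and the algorithm into a trainer."""


class ToySFTStrategy(TrainingStrategy):
    """Toy stand-in for supervised fine-tuning; does no real training."""

    name = "sft"
    algorithm = "supervised"

    def prepare_model(self, model: Any) -> Any:
        # Assumption: plain SFT needs no adapters, so the model is unchanged.
        return model

    def format_dataset(self, records: list[dict]) -> list[dict]:
        # Assumption: records carry "prompt" and "response" keys.
        return [{"text": f"{r['prompt']}\n{r['response']}"} for r in records]

    def build_trainer(self, model: Any, dataset: list[dict]) -> dict:
        # A dict stands in for a real trainer object.
        return {"model": model, "dataset": dataset, "algorithm": self.algorithm}


strategy = ToySFTStrategy()
data = strategy.format_dataset([{"prompt": "Hi", "response": "Hello"}])
trainer = strategy.build_trainer(strategy.prepare_model("base-model"), data)
print(trainer["algorithm"], len(trainer["dataset"]))
```

Keeping each responsibility in its own method means a different strategy would only need to override the steps where it differs.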

Each strategy makes a different trade-off between memory use, training speed, and output quality.

Available Strategies

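| Strategy | Memory | Speed | Quality | Use Case |
|----------|--------|-------|---------|----------|
| SFT | Baseline | 1x | High | General-purpose fine-tuning |
| QLoRA | 30-50% less than SFT | 0.9x | High | Limited VRAM |
| RLHF | High | Slow | Very High | Alignment with human preferences |
| DPO | Medium | Medium | Very High | Simpler alternative to RLHF |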

Choosing a Strategy

Use SFT When:

✅ First time fine-tuning
✅ Have sufficient VRAM
✅ Standard supervised learning task
✅ Want simplest setup

Use QLoRA When:

✅ Limited VRAM (< 12GB for 7B models)
✅ Want to train larger models
✅ Memory is the bottleneck
✅ Can accept slightly slower training
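A quick back-of-the-envelope calculation shows why a 7B model is tight on a 12GB card. The sketch below counts the memory for the weights alone, assuming 16-bit weights for ordinary fine-tuning and 4-bit quantized weights for QLoRA. The helper `weight_memory_gb` is a hypothetical name, and the figures ignore activations, gradients, optimizer state, and adapter weights, so the saving over a whole training run is smaller than the ratio shown here.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


params_7b = 7e9  # a "7B" model

fp16_gb = weight_memory_gb(params_7b, 16)  # 16-bit weights
int4_gb = weight_memory_gb(params_7b, 4)   # 4-bit quantized weights (assumed for QLoRA)

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f"4-bit weights:  {int4_gb:.1f} GB")
```

Under these assumptions the 16-bit weights alone (14 GB) already exceed 12GB of VRAM, while the 4-bit weights (3.5 GB) leave room for the rest of the training state.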

Use RLHF When:

✅ Aligning model with human preferences
✅ Have reward model or feedback data
✅ Quality is critical
✅ Have computational resources

Use DPO When:

✅ Have preference pairs (chosen/rejected)
✅ Want simpler alternative to RLHF
✅ Want alignment without a reward model
✅ Want more stable training than RLHF
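The checklists above can be condensed into a small decision helper. This is only a sketch: `suggest_strategy` is a hypothetical function, not part of ModelForge. Checking the data you have before checking VRAM is an assumption, and the 12 GB threshold applies to 7B models as noted in the QLoRA checklist.

```python
def suggest_strategy(
    vram_gb: float,
    has_preference_pairs: bool = False,
    has_reward_model: bool = False,
    vram_threshold_gb: float = 12.0,  # from the QLoRA checklist, for 7B models
) -> str:
    """Return a strategy name based on the checklists above (illustrative only)."""
    # Preference pairs (chosen/rejected) point to DPO, which needs no reward model.
    if has_preference_pairs:
        return "dpo"
    # A reward model or feedback data points to RLHF.
    if has_reward_model:
        return "rlhf"
    # With plain supervised data, memory decides between QLoRA and SFT.
    if vram_gb < vram_threshold_gb:
        return "qlora"
    return "sft"


print(suggest_strategy(vram_gb=24))
print(suggest_strategy(vram_gb=8))
print(suggest_strategy(vram_gb=24, has_preference_pairs=True))
```

A real choice also depends on compute budget and quality requirements, which this helper deliberately leaves out.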

Configuration

Specify the strategy in the training config. Valid values are "sft", "qlora", "rlhf", and "dpo":

{
  "strategy": "sft"
}
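If you generate or edit configs by hand, it can help to check the `strategy` field before starting a run. The sketch below is a hypothetical validator, not ModelForge's own loader: `read_strategy` and `VALID_STRATEGIES` are illustrative names, and rejecting a missing field is an assumption about the desired behavior. Note that Python's `json` module rejects comments, so the config text must be plain JSON.

```python
import json

# The four strategy names listed on this page.
VALID_STRATEGIES = {"sft", "qlora", "rlhf", "dpo"}


def read_strategy(config_text: str) -> str:
    """Parse a JSON training config and return its validated strategy name."""
    config = json.loads(config_text)
    strategy = config.get("strategy")
    if strategy not in VALID_STRATEGIES:
        allowed = ", ".join(sorted(VALID_STRATEGIES))
        raise ValueError(f"Unknown strategy {strategy!r}; expected one of: {allowed}")
    return strategy


print(read_strategy('{"strategy": "sft"}'))
```

Failing early on an unknown name avoids discovering a typo only after a model has been loaded.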

Choose the right strategy for your needs! 🎯