Unsloth Provider¶
The Unsloth provider enables 2x faster training with 20% less memory through optimized CUDA kernels and efficient memory management.
Overview¶
Unsloth is a specialized library that patches HuggingFace Transformers to use optimized implementations for: - Flash Attention 2 - Fused optimizer kernels - Efficient gradient checkpointing - Optimized LoRA implementations
Features¶
✅ 2x faster training compared to standard HuggingFace
✅ 20% memory reduction for the same batch size
✅ Zero code changes - same API as HuggingFace
✅ Supports popular architectures: Llama, Mistral, Qwen, Gemma, Phi
✅ Compatible with all strategies: SFT, QLoRA, RLHF, DPO
Platform Support¶
| Platform | Supported | Notes |
|---|---|---|
| Linux (Native) | ✅ | Recommended |
| WSL 2 | ✅ | Full support |
| Docker | ✅ | With NVIDIA runtime |
| Windows (Native) | ❌ | Use WSL or Docker |
Installation¶
Linux¶
pip install unsloth
Windows (WSL)¶
See Windows Installation Guide.
Docker¶
FROM nvidia/cuda:12.6.0-devel-ubuntu22.04
RUN pip install unsloth
Verify Installation¶
python -c "import unsloth; print('Unsloth version:', unsloth.__version__)"
Usage¶
Basic Configuration¶
{
"provider": "unsloth",
"model_name": "meta-llama/Llama-3.2-3B",
"max_seq_length": 2048,
"task": "text-generation",
"strategy": "sft",
"num_train_epochs": 3,
"lora_r": 16,
"lora_alpha": 32
}
Important: max_sequence_length Constraint¶
⚠️ CRITICAL: When using Unsloth, you MUST specify a fixed max_seq_length. Auto-inference (-1) is NOT supported.
Valid:
{
"provider": "unsloth",
"max_seq_length": 2048 // ✅ Fixed value
}
Invalid:
{
"provider": "unsloth",
"max_seq_length": -1 // ❌ NOT supported
}
Common values:
- 512 - Short sequences, lower memory
- 1024 - Medium sequences
- 2048 - Standard (recommended)
- 4096 - Long contexts, more memory
- 8192 - Very long contexts, high memory
Via UI¶
- Go to Training tab
- Select Provider:
unsloth - Set Max Sequence Length:
2048(or your preferred value) - Configure other settings
- Start training
Via API¶
curl -X POST http://localhost:8000/api/start_training \
-H "Content-Type: application/json" \
-d '{
"provider": "unsloth",
"model_name": "meta-llama/Llama-3.2-3B",
"max_seq_length": 2048,
"task": "text-generation",
"strategy": "sft",
"dataset": "/path/to/dataset.jsonl",
"num_train_epochs": 3
}'
Supported Models¶
Fully Supported¶
- Llama (1, 2, 3, 3.1, 3.2)
meta-llama/Llama-3.2-1Bmeta-llama/Llama-3.2-3B-
meta-llama/Llama-3.1-8B -
Mistral
mistralai/Mistral-7B-v0.1-
mistralai/Mistral-7B-Instruct-v0.3 -
Qwen
Qwen/Qwen2-1.5B-
Qwen/Qwen2-7B -
Gemma
google/gemma-2b-
google/gemma-7b -
Phi
microsoft/phi-2microsoft/phi-3-mini
Limited Support¶
- BART - Some optimizations not available
- T5 - Not recommended with Unsloth
Performance Benchmarks¶
Training Speed Comparison¶
Setup: Llama-3.2-3B, 1000 examples, NVIDIA RTX 3090
| Provider | Time | Speedup |
|---|---|---|
| HuggingFace | 45 min | 1.0x |
| Unsloth | 22 min | 2.0x |
Memory Usage Comparison¶
Setup: Llama-3.2-7B, batch_size=4, seq_length=2048
| Provider | VRAM | Reduction |
|---|---|---|
| HuggingFace | 16.2 GB | - |
| Unsloth | 12.8 GB | 21% |
Throughput¶
Setup: Llama-3.2-3B, batch_size=8
| Provider | Tokens/sec | Improvement |
|---|---|---|
| HuggingFace | 2,400 | - |
| Unsloth | 4,800 | 2x |
Configuration Tips¶
Optimal Settings for Unsloth¶
{
"provider": "unsloth",
"model_name": "meta-llama/Llama-3.2-3B",
"max_seq_length": 2048,
"strategy": "qlora",
"use_4bit": true,
"bf16": true,
"gradient_checkpointing": true,
"per_device_train_batch_size": 8,
"gradient_accumulation_steps": 4,
"lora_r": 64,
"lora_alpha": 16,
"lora_dropout": 0.1
}
Memory Optimization¶
For limited VRAM:
{
"provider": "unsloth",
"max_seq_length": 1024, // Reduce sequence length
"per_device_train_batch_size": 2,
"gradient_accumulation_steps": 8,
"gradient_checkpointing": true,
"use_4bit": true
}
Speed Optimization¶
For maximum speed:
{
"provider": "unsloth",
"max_seq_length": 2048,
"per_device_train_batch_size": 16,
"gradient_accumulation_steps": 1,
"bf16": true,
"optim": "adamw_8bit"
}
Advanced Features¶
Custom Target Modules¶
Unsloth auto-detects optimal LoRA target modules, but you can override:
{
"provider": "unsloth",
"target_modules": [
"q_proj",
"k_proj",
"v_proj",
"o_proj",
"gate_proj",
"up_proj",
"down_proj"
]
}
Gradient Checkpointing¶
Unsloth uses optimized gradient checkpointing:
{
"provider": "unsloth",
"gradient_checkpointing": true // Automatically optimized
}
Troubleshooting¶
"Unsloth is not installed"¶
Problem: Provider error when selecting Unsloth
Solution: Install Unsloth:
pip install unsloth
"Unsloth not supported on Windows"¶
Problem: Running on native Windows
Solution: Use WSL or Docker. See Windows Installation.
"max_seq_length cannot be -1"¶
Problem: Auto-inference not supported
Solution: Set a fixed value:
{
"max_seq_length": 2048
}
CUDA Out of Memory¶
Problem: OOM errors during training
Solutions:
1. Reduce max_seq_length: 2048 → 1024
2. Reduce per_device_train_batch_size: 8 → 4
3. Enable gradient_checkpointing: true
4. Use 4-bit quantization: use_4bit: true
Model Not Supported¶
Problem: Specific model doesn't work with Unsloth
Solution: Fall back to HuggingFace provider:
{
"provider": "huggingface"
}
Flash Attention Errors¶
Problem: Flash Attention 2 compatibility issues
Solution: Disable Flash Attention:
export UNSLOTH_DISABLE_FLASH_ATTN=1
modelforge run
Comparison with HuggingFace¶
| Feature | HuggingFace | Unsloth |
|---|---|---|
| Training Speed | 1x | 2x |
| Memory Usage | Baseline | -20% |
| Platform Support | All | Linux/WSL/Docker |
| Model Support | All | Llama, Mistral, Qwen, Gemma, Phi |
| Complexity | Simple | Simple |
| Stability | Stable | Stable |
| Documentation | Extensive | Growing |
When to Use Unsloth¶
✅ Use Unsloth When:¶
- Training on Linux or WSL
- Using supported models (Llama, Mistral, etc.)
- Need faster training times
- Have limited VRAM
- Training large models (7B+)
❌ Don't Use Unsloth When:¶
- Running on native Windows (use HuggingFace)
- Using unsupported models (BART, T5)
- Debugging issues (HuggingFace has better error messages)
- Need maximum compatibility
Migration from HuggingFace¶
Switching is simple - just change the provider:
Before:
{
"provider": "huggingface",
"model_name": "meta-llama/Llama-3.2-3B",
...
}
After:
{
"provider": "unsloth",
"model_name": "meta-llama/Llama-3.2-3B",
"max_seq_length": 2048, // Add this!
...
}
All other settings remain the same!
Next Steps¶
- Provider Overview - Compare all providers
- HuggingFace Provider - Standard provider docs
- Configuration Guide - All config options
- Performance Optimization - Get the best results
Unsloth: Train faster, use less memory! 🚀