Unsloth Provider¶

The Unsloth provider trains roughly 2x faster and uses about 20% less memory than the standard HuggingFace provider, thanks to optimized CUDA kernels and efficient memory management.

Overview¶

Unsloth is a specialized library that patches HuggingFace Transformers to use optimized implementations of:

- Flash Attention 2
- Fused optimizer kernels
- Efficient gradient checkpointing
- Optimized LoRA implementations

Features¶

✅ 2x faster training compared to standard HuggingFace
✅ 20% memory reduction for the same batch size
✅ No code changes - same configuration API as HuggingFace, plus a required fixed max_seq_length
✅ Supports popular architectures: Llama, Mistral, Qwen, Gemma, Phi
✅ Compatible with all strategies: SFT, QLoRA, RLHF, DPO

Platform Support¶

Platform	Supported	Notes
Linux (Native)	✅	Recommended
WSL 2	✅	Full support
Docker	✅	With NVIDIA runtime
Windows (Native)	❌	Use WSL or Docker

Installation¶

Linux¶

pip install unsloth

Windows (WSL)¶

See Windows Installation Guide.

Docker¶

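FROM nvidia/cuda:12.6.0-devel-ubuntu22.04
# The CUDA base image ships without Python, so install pip first
RUN apt-get update && apt-get install -y python3-pip && rm -rf /var/lib/apt/lists/*
RUN pip3 install unsloth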

Verify Installation¶

python -c "import unsloth; print('Unsloth version:', unsloth.__version__)"
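The same check can be made from a preflight script without importing the package. The sketch below uses only the standard library; `is_installed` is a hypothetical helper name, not part of Unsloth or of this project.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found, without importing it."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    if is_installed("unsloth"):
        print("Unsloth is installed")
    else:
        print("Unsloth is not installed; run: pip install unsloth")
```

Because `find_spec` only locates the package, this check does not load CUDA libraries. That keeps it fast, but it also means it cannot confirm that the GPU stack works.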

Usage¶

Basic Configuration¶

{
  "provider": "unsloth",
  "model_name": "meta-llama/Llama-3.2-3B",
  "max_seq_length": 2048,
  "task": "text-generation",
  "strategy": "sft",
  "num_train_epochs": 3,
  "lora_r": 16,
  "lora_alpha": 32
}

Important: max_seq_length Constraint¶

⚠️ CRITICAL: When using Unsloth, you MUST specify a fixed max_seq_length. Auto-inference (-1) is NOT supported.

Valid (fixed value):

{
  "provider": "unsloth",
  "max_seq_length": 2048
}

Invalid (auto-inference is not supported):

{
  "provider": "unsloth",
  "max_seq_length": -1
}

Common values:

- 512: short sequences, lower memory
- 1024: medium sequences
- 2048: standard (recommended)
- 4096: long contexts, more memory
- 8192: very long contexts, high memory
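The constraint can be checked on the client before a job is submitted. This is a minimal sketch: `validate_max_seq_length` and its error message are hypothetical and only illustrate the rule stated in this section. The provider's own validation is not shown on this page.

```python
def validate_max_seq_length(config: dict) -> None:
    """Raise ValueError if an Unsloth config lacks a fixed max_seq_length."""
    if config.get("provider") != "unsloth":
        return  # the constraint described here applies to Unsloth only
    value = config.get("max_seq_length")
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(
            f"max_seq_length must be a fixed positive integer for Unsloth, got {value!r}"
        )


validate_max_seq_length({"provider": "unsloth", "max_seq_length": 2048})  # passes silently
```

A config with `"max_seq_length": -1`, or with the key missing, raises `ValueError` under this sketch.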

Via UI¶

Go to Training tab
Select Provider: unsloth
Set Max Sequence Length: 2048 (or your preferred value)
Configure other settings
Start training

Via API¶

curl -X POST http://localhost:8000/api/start_training \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "unsloth",
    "model_name": "meta-llama/Llama-3.2-3B",
    "max_seq_length": 2048,
    "task": "text-generation",
    "strategy": "sft",
    "dataset": "/path/to/dataset.jsonl",
    "num_train_epochs": 3
  }'
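The same request can be built from Python with the standard library. The sketch below reuses the endpoint and fields from the curl example. `build_request` is a hypothetical helper, and the response format is not documented on this page, so the send step is left as a comment.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/api/start_training"  # endpoint from the curl example


def build_request(config: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for a training job."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


config = {
    "provider": "unsloth",
    "model_name": "meta-llama/Llama-3.2-3B",
    "max_seq_length": 2048,
    "task": "text-generation",
    "strategy": "sft",
    "dataset": "/path/to/dataset.jsonl",
    "num_train_epochs": 3,
}
request = build_request(config)

# To submit the job, send the request against a running server:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```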

Supported Models¶

Fully Supported¶

Llama (1, 2, 3, 3.1, 3.2)
meta-llama/Llama-3.2-1B
meta-llama/Llama-3.2-3B
meta-llama/Llama-3.1-8B
Mistral
mistralai/Mistral-7B-v0.1
mistralai/Mistral-7B-Instruct-v0.3
Qwen
Qwen/Qwen2-1.5B
Qwen/Qwen2-7B
Gemma
google/gemma-2b
google/gemma-7b
Phi
microsoft/phi-2
microsoft/phi-3-mini

Limited Support¶

BART - Some optimizations not available
T5 - Not recommended with Unsloth

Performance Benchmarks¶

Training Speed Comparison¶

Setup: Llama-3.2-3B, 1000 examples, NVIDIA RTX 3090

Provider	Time	Speedup
HuggingFace	45 min	1.0x
Unsloth	22 min	2.0x

Memory Usage Comparison¶

Setup: Llama-3.2-7B, batch_size=4, seq_length=2048

Provider	VRAM	Reduction
HuggingFace	16.2 GB	-
Unsloth	12.8 GB	21%

Throughput¶

Setup: Llama-3.2-3B, batch_size=8

Provider	Tokens/sec	Improvement
HuggingFace	2,400	-
Unsloth	4,800	2x
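The Speedup, Reduction, and Improvement columns follow directly from the raw figures. As a quick check, this short script recomputes them from the values in the three tables:

```python
# Figures copied from the three benchmark tables above
hf_minutes, unsloth_minutes = 45, 22
hf_vram_gb, unsloth_vram_gb = 16.2, 12.8
hf_tokens_per_sec, unsloth_tokens_per_sec = 2400, 4800

speedup = hf_minutes / unsloth_minutes
vram_reduction_pct = (hf_vram_gb - unsloth_vram_gb) / hf_vram_gb * 100
throughput_gain = unsloth_tokens_per_sec / hf_tokens_per_sec

print(f"Speedup: {speedup:.2f}x")                   # Speedup: 2.05x
print(f"VRAM reduction: {vram_reduction_pct:.0f}%")  # VRAM reduction: 21%
print(f"Throughput gain: {throughput_gain:.1f}x")    # Throughput gain: 2.0x
```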

Configuration Tips¶

Optimal Settings for Unsloth¶

{
  "provider": "unsloth",
  "model_name": "meta-llama/Llama-3.2-3B",
  "max_seq_length": 2048,
  "strategy": "qlora",
  "use_4bit": true,
  "bf16": true,
  "gradient_checkpointing": true,
  "per_device_train_batch_size": 8,
  "gradient_accumulation_steps": 4,
  "lora_r": 64,
  "lora_alpha": 16,
  "lora_dropout": 0.1
}
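In the standard LoRA formulation, the low-rank update is scaled by lora_alpha / lora_r. The sketch below assumes this provider passes both values through unchanged, which this page does not confirm, and compares the two LoRA settings used on this page. `lora_scaling` is a hypothetical helper.

```python
def lora_scaling(lora_alpha: float, lora_r: int) -> float:
    """Scale applied to the low-rank update in standard LoRA: alpha / r."""
    if lora_r <= 0:
        raise ValueError("lora_r must be a positive integer")
    return lora_alpha / lora_r


print(lora_scaling(32, 16))  # Basic Configuration values: 2.0
print(lora_scaling(16, 64))  # Optimal Settings values: 0.25
```

Under that assumption, raising lora_r without raising lora_alpha lowers the effective scale of the adapter update. This is worth keeping in mind when comparing runs across the two configurations.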

Memory Optimization¶

For limited VRAM, reduce the sequence length and the per-device batch size, and raise gradient accumulation to compensate:

{
  "provider": "unsloth",
  "max_seq_length": 1024,
  "per_device_train_batch_size": 2,
  "gradient_accumulation_steps": 8,
  "gradient_checkpointing": true,
  "use_4bit": true
}

Speed Optimization¶

For maximum speed:

{
  "provider": "unsloth",
  "max_seq_length": 2048,
  "per_device_train_batch_size": 16,
  "gradient_accumulation_steps": 1,
  "bf16": true,
  "optim": "adamw_8bit"
}
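The batch settings in these configurations trade memory for step count. The number of examples per optimizer step is the per-device batch size multiplied by the accumulation steps and the number of devices. The sketch below assumes a single GPU by default; `effective_batch_size` is a hypothetical helper.

```python
def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer step."""
    return per_device * accumulation_steps * num_devices


print(effective_batch_size(8, 4))   # Optimal Settings: 32
print(effective_batch_size(2, 8))   # Memory Optimization: 16
print(effective_batch_size(16, 1))  # Speed Optimization: 16
```

The memory-optimized and speed-optimized configurations reach the same effective batch size of 16. They differ in how much of it is held in VRAM at once.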

Advanced Features¶

Custom Target Modules¶

Unsloth auto-detects optimal LoRA target modules, but you can override:

{
  "provider": "unsloth",
  "target_modules": [
    "q_proj",
    "k_proj",
    "v_proj",
    "o_proj",
    "gate_proj",
    "up_proj",
    "down_proj"
  ]
}

Gradient Checkpointing¶

Unsloth applies its own optimized gradient checkpointing automatically when the option is enabled:

{
  "provider": "unsloth",
  "gradient_checkpointing": true
}

Troubleshooting¶

"Unsloth is not installed"¶

Problem: Provider error when selecting Unsloth

Solution: Install Unsloth:

pip install unsloth

"Unsloth not supported on Windows"¶

Problem: Running on native Windows

Solution: Use WSL or Docker. See Windows Installation.

"max_seq_length cannot be -1"¶

Problem: Auto-inference not supported

Solution: Set a fixed value:

{
  "max_seq_length": 2048
}

CUDA Out of Memory¶

Problem: OOM errors during training

Solutions:

1. Reduce max_seq_length: 2048 → 1024
2. Reduce per_device_train_batch_size: 8 → 4
3. Enable gradient_checkpointing: true
4. Use 4-bit quantization: use_4bit: true
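The solutions can be applied one at a time, in the order listed, retrying after each change. In this minimal sketch, `next_memory_step` is a hypothetical helper, and the floors of 1024 tokens and batch size 4 are assumptions taken from the examples in the list.

```python
def next_memory_step(config: dict) -> dict:
    """Return a copy of config with the next memory-saving step applied.

    Steps follow the order of the list above. The config comes back
    unchanged when no further step is available.
    """
    new = dict(config)  # leave the caller's config untouched
    if new.get("max_seq_length", 0) > 1024:
        new["max_seq_length"] //= 2  # e.g. 2048 -> 1024
    elif new.get("per_device_train_batch_size", 0) > 4:
        new["per_device_train_batch_size"] //= 2  # e.g. 8 -> 4
    elif not new.get("gradient_checkpointing", False):
        new["gradient_checkpointing"] = True
    elif not new.get("use_4bit", False):
        new["use_4bit"] = True
    return new


config = {"provider": "unsloth", "max_seq_length": 2048, "per_device_train_batch_size": 8}
config = next_memory_step(config)  # first retry: max_seq_length becomes 1024
```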

Model Not Supported¶

Problem: Specific model doesn't work with Unsloth

Solution: Fall back to HuggingFace provider:

{
  "provider": "huggingface"
}

Flash Attention Errors¶

Problem: Flash Attention 2 compatibility issues

Solution: Disable Flash Attention:

export UNSLOTH_DISABLE_FLASH_ATTN=1
modelforge run

Comparison with HuggingFace¶

Feature	HuggingFace	Unsloth
Training Speed	1x	2x
Memory Usage	Baseline	-20%
Platform Support	All	Linux/WSL/Docker
Model Support	All	Llama, Mistral, Qwen, Gemma, Phi
Complexity	Simple	Simple
Stability	Stable	Stable
Documentation	Extensive	Growing

When to Use Unsloth¶

✅ Use Unsloth When:¶

Training on Linux or WSL
Using supported models (Llama, Mistral, etc.)
Need faster training times
Have limited VRAM
Training large models (7B+)

❌ Don't Use Unsloth When:¶

Running on native Windows (use HuggingFace)
Using models with limited support (BART, T5)
Debugging issues (HuggingFace has better error messages)
Need maximum compatibility
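The two lists above can be condensed into a simple selection rule. This is a sketch only: `choose_provider` is hypothetical, and matching on the start of the repository name is a heuristic that will misclassify models whose names do not begin with their architecture family.

```python
SUPPORTED_FAMILIES = ("llama", "mistral", "qwen", "gemma", "phi")  # from Supported Models


def choose_provider(model_name: str, platform_name: str) -> str:
    """Pick "unsloth" or "huggingface" from the model name and platform.

    platform_name is one of "linux", "wsl", "docker" or "windows".
    """
    if platform_name.lower() == "windows":
        return "huggingface"  # native Windows is not supported by Unsloth
    # Compare against the repository name, e.g. "Llama-3.2-3B" in "meta-llama/Llama-3.2-3B"
    repo_name = model_name.rsplit("/", 1)[-1].lower()
    if repo_name.startswith(SUPPORTED_FAMILIES):
        return "unsloth"
    return "huggingface"


print(choose_provider("meta-llama/Llama-3.2-3B", "linux"))  # unsloth
print(choose_provider("facebook/bart-base", "linux"))       # huggingface
```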

Migration from HuggingFace¶

Switching is simple - just change the provider:

Before:

{
  "provider": "huggingface",
  "model_name": "meta-llama/Llama-3.2-3B",
  ...
}

After:

{
  "provider": "unsloth",
  "model_name": "meta-llama/Llama-3.2-3B",
  "max_seq_length": 2048,  // Add this!
  ...
}

All other settings remain the same!
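For configs stored as Python dictionaries, the migration can be scripted. In this minimal sketch, `to_unsloth` is a hypothetical helper. It switches the provider and fills in a fixed max_seq_length only when the existing config has none or uses auto-inference.

```python
def to_unsloth(config: dict, default_max_seq_length: int = 2048) -> dict:
    """Return a copy of a HuggingFace-provider config adapted for Unsloth."""
    migrated = dict(config)  # leave the caller's config untouched
    migrated["provider"] = "unsloth"
    current = migrated.get("max_seq_length")
    has_fixed_length = (
        isinstance(current, int) and not isinstance(current, bool) and current > 0
    )
    if not has_fixed_length:
        # Missing or auto-inferred (-1): Unsloth needs a fixed value
        migrated["max_seq_length"] = default_max_seq_length
    return migrated


before = {"provider": "huggingface", "model_name": "meta-llama/Llama-3.2-3B"}
after = to_unsloth(before)
print(after["provider"], after["max_seq_length"])  # unsloth 2048
```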

Next Steps¶

Provider Overview - Compare all providers
HuggingFace Provider - Standard provider docs
Configuration Guide - All config options
Performance Optimization - Get the best results

Unsloth: Train faster, use less memory! 🚀