ModelForge v2.0 Architecture¶
Technical overview of ModelForge's modular architecture.
Architecture Overview¶
ModelForge v2.0 uses a clean, modular architecture based on SOLID principles:
┌─────────────────────────────────────────────────────┐
│                Web Interface (React)                │
└────────────────────┬────────────────────────────────┘
                     │
                     │ HTTP/REST
                     ▼
┌─────────────────────────────────────────────────────┐
│              FastAPI Application Layer              │
│  ┌──────────────┐   ┌──────────────┐  ┌──────────┐  │
│  │  Finetuning  │   │    Models    │  │Playground│  │
│  │    Router    │   │    Router    │  │  Router  │  │
│  └──────┬───────┘   └──────┬───────┘  └────┬─────┘  │
└─────────┼──────────────────┼───────────────┼────────┘
          │                  │               │
          │  Dependency Injection (FastAPI)  │
          ▼                  ▼               ▼
┌─────────────────────────────────────────────────────┐
│                    Service Layer                    │
│  ┌──────────────┐   ┌──────────────┐  ┌──────────┐  │
│  │   Training   │   │    Model     │  │ Hardware │  │
│  │   Service    │   │   Service    │  │ Service  │  │
│  └──────┬───────┘   └──────┬───────┘  └────┬─────┘  │
└─────────┼──────────────────┼───────────────┼────────┘
          │                  │               │
          ▼                  ▼               ▼
┌─────────────────────────────────────────────────────┐
│                Business Logic Layer                 │
│                                                     │
│   ┌─────────────────┐      ┌────────────────────┐   │
│   │    Provider     │      │      Strategy      │   │
│   │     Factory     │      │      Factory       │   │
│   │                 │      │                    │   │
│   │ ┌─────────────┐ │      │ ┌────────────────┐ │   │
│   │ │ HuggingFace │ │      │ │      SFT       │ │   │
│   │ │  Provider   │ │      │ │    Strategy    │ │   │
│   │ └─────────────┘ │      │ └────────────────┘ │   │
│   │ ┌─────────────┐ │      │ ┌────────────────┐ │   │
│   │ │   Unsloth   │ │      │ │     QLoRA      │ │   │
│   │ │  Provider   │ │      │ │    Strategy    │ │   │
│   │ └─────────────┘ │      │ └────────────────┘ │   │
│   └─────────────────┘      └────────────────────┘   │
│                                                     │
│   ┌─────────────────┐      ┌────────────────────┐   │
│   │   Evaluation    │      │    Quantization    │   │
│   │     System      │      │      Factory       │   │
│   └─────────────────┘      └────────────────────┘   │
└─────────────────────────────────────────────────────┘
             │                          │
             ▼                          ▼
┌─────────────────────────────────────────────────────┐
│                     Data Layer                      │
│   ┌──────────────────┐     ┌────────────────────┐   │
│   │     Database     │     │    File Manager    │   │
│   │     Manager      │     │                    │   │
│   │   (SQLAlchemy)   │     │  - Datasets        │   │
│   │                  │     │  - Checkpoints     │   │
│   │  - Models        │     │  - Logs            │   │
│   │  - Training      │     └────────────────────┘   │
│   └──────────────────┘                              │
└─────────────────────────────────────────────────────┘
Core Components¶
1. Routers (API Layer)¶
Location: ModelForge/routers/
Responsibility: HTTP request handling
Files:
- finetuning_router.py - Training endpoints
- models_router.py - Model management
- playground_router.py - Inference testing
- hub_management_router.py - Model hub operations
Pattern: Thin controllers, delegate to services
2. Services (Business Logic)¶
Location: ModelForge/services/
Responsibility: Core business logic
Files:
- training_service.py - Training orchestration
- model_service.py - Model CRUD operations
- hardware_service.py - Hardware detection
Pattern: Service layer with dependency injection
3. Providers (Model Loading)¶
Location: ModelForge/providers/
Responsibility: Model and tokenizer loading
Files:
- __init__.py - Provider protocol
- huggingface_provider.py - HuggingFace implementation
- unsloth_provider.py - Unsloth implementation
- provider_factory.py - Provider creation
Pattern: Protocol + Factory
4. Strategies (Training Algorithms)¶
Location: ModelForge/strategies/
Responsibility: Training algorithm implementation
Files:
- __init__.py - Strategy protocol
- sft_strategy.py - Supervised fine-tuning
- qlora_strategy.py - Quantized LoRA
- rlhf_strategy.py - RLHF
- dpo_strategy.py - DPO
- strategy_factory.py - Strategy creation
Pattern: Strategy + Factory
5. Database Layer¶
Location: ModelForge/database/
Responsibility: Data persistence
Files:
- models.py - SQLAlchemy models
- database_manager.py - DB operations
Pattern: Repository with ORM
6. Evaluation System¶
Location: ModelForge/evaluation/
Responsibility: Training evaluation
Files:
- metrics.py - Task-specific metrics
- dataset_validator.py - Dataset validation
Design Patterns¶
Dependency Injection¶
Implementation: FastAPI's Depends()
Example:
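from fastapi import APIRouter, Depends
from ..dependencies import get_training_service
# TrainingConfig and TrainingService come from the project's schema and
# service modules; those imports are omitted here.

router = APIRouter()

@router.post("/start_training")
async def start_training(
    config: TrainingConfig,
    # FastAPI calls get_training_service() and passes the result in.
    service: TrainingService = Depends(get_training_service),
):
    return service.train_model(config.model_dump())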
Factory Pattern¶
Used for: Providers and Strategies
Example:
class ProviderFactory:
    # Maps the provider name used in requests to its implementing class.
    _providers = {
        "huggingface": HuggingFaceProvider,
        "unsloth": UnslothProvider,
    }

    @classmethod
    def create_provider(cls, provider_name: str):
        provider_class = cls._providers.get(provider_name)
        if not provider_class:
            raise ProviderError(f"Unknown provider: {provider_name}")
        return provider_class()
Strategy Pattern¶
Used for: Training algorithms
Example:
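from typing import Protocol

class TrainingStrategy(Protocol):
    # A "..." inside a parameter list marks parameters omitted from this overview.
    def prepare_model(self, model, config): ...
    def prepare_dataset(self, dataset, tokenizer, config): ...
    def create_trainer(self, model, dataset, ...): ...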
Protocol (Interface)¶
Used for: Defining contracts
Example:
from typing import Protocol

class ModelProvider(Protocol):
    # A "..." inside a parameter list marks parameters omitted from this overview.
    def load_model(self, model_id: str, ...): ...
    def load_tokenizer(self, model_id: str, ...): ...
    def validate_model_access(self, model_id: str, ...): ...
    def get_provider_name(self) -> str: ...
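Protocols rely on structural typing: a class satisfies the contract by having matching methods, without inheriting from the protocol. The following is a minimal sketch of that idea, not ModelForge code. The protocol is trimmed to two methods with simplified signatures, and `EchoProvider` and `NotAProvider` are hypothetical classes invented for the illustration.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ModelProvider(Protocol):
    """Trimmed-down stand-in for the provider contract (signatures simplified)."""

    def load_model(self, model_id: str): ...
    def get_provider_name(self) -> str: ...


class EchoProvider:
    """Hypothetical provider; note that it does not inherit from ModelProvider."""

    def load_model(self, model_id: str):
        # A real provider would load weights; this stub returns a label.
        return f"model:{model_id}"

    def get_provider_name(self) -> str:
        return "echo"


class NotAProvider:
    """Has neither method, so it does not satisfy the protocol."""


# With runtime_checkable, isinstance() only checks that the methods exist,
# not their signatures; a static type checker performs the stricter check.
print(isinstance(EchoProvider(), ModelProvider))
print(isinstance(NotAProvider(), ModelProvider))
```

Because the check is structural, third-party classes can act as providers without importing anything from the project.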
Data Flow¶
Training Request Flow¶
- User submits training request via UI
- React Frontend sends POST to /api/start_training
- FastAPI Router receives request, validates with Pydantic
- Router injects TrainingService via dependency
- TrainingService orchestrates:
  - Validates dataset
  - Creates provider from ProviderFactory
  - Loads model via provider
  - Creates strategy from StrategyFactory
  - Prepares model and dataset via strategy
  - Creates trainer and starts training
- Training runs with callbacks for progress
- Results saved to database and file system
- Response returned to user
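The orchestration steps inside TrainingService can be sketched roughly as follows. This is an illustration of the ordering only: `StubProvider`, `StubStrategy`, the `model_id` config key, and the third argument to `create_trainer` are all assumptions made for the example, and the factory lookups, callbacks, and persistence are left out.

```python
from typing import Any, Dict, List


class StubProvider:
    """Hypothetical provider; returns placeholders instead of real weights."""

    def load_model(self, model_id: str):
        return f"model:{model_id}", f"tokenizer:{model_id}"


class StubStrategy:
    """Hypothetical strategy; each step tags its input rather than doing real work."""

    def prepare_model(self, model, config):
        return f"prepared({model})"

    def prepare_dataset(self, dataset, tokenizer, config):
        return [f"tok({row})" for row in dataset]

    def create_trainer(self, model, dataset, config):
        # Stand-in for a trainer: a callable that reports what it was given.
        return lambda: {"status": "success", "model": model, "samples": len(dataset)}


def train_model(config: Dict[str, Any], dataset: List[str]) -> Dict[str, Any]:
    """Follows the orchestration order listed above, with stubs for every step."""
    if not dataset:  # validate dataset
        raise ValueError("dataset is empty")
    provider = StubProvider()  # create provider (factory lookup omitted)
    model, tokenizer = provider.load_model(config["model_id"])  # load model
    strategy = StubStrategy()  # create strategy (factory lookup omitted)
    model = strategy.prepare_model(model, config)  # prepare model and dataset
    prepared = strategy.prepare_dataset(dataset, tokenizer, config)
    trainer = strategy.create_trainer(model, prepared, config)  # create trainer
    return trainer()  # start training


result = train_model({"model_id": "demo"}, ["a", "b", "c"])
print(result)
```

Keeping the sequence in one method means the router stays thin: it only validates the request and hands the config to the service.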
Model Loading Flow¶
User Request
↓
ProviderFactory.create_provider(provider_name)
↓
Provider.load_model(model_id, config)
↓
Provider-specific implementation
↓
Return (model, tokenizer)
Extension Points¶
Adding a Provider¶
- Create class implementing ModelProvider protocol
- Register in ProviderFactory._providers
- That's it! No other changes needed.
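As a hedged sketch of those steps, the snippet below registers a new provider with a simplified copy of the factory from the Factory Pattern section. `LocalDiskProvider`, its return values, and the `local_disk` name are hypothetical, and the method signatures are simplified because the real ones are elided in this overview.

```python
class ProviderError(Exception):
    """Stand-in for ModelForge's ProviderError."""


class ProviderFactory:
    """Simplified copy of the factory, starting with no registered providers."""

    _providers = {}

    @classmethod
    def create_provider(cls, provider_name: str):
        provider_class = cls._providers.get(provider_name)
        if provider_class is None:
            raise ProviderError(f"Unknown provider: {provider_name}")
        return provider_class()


class LocalDiskProvider:
    """Hypothetical provider implementing the four ModelProvider methods."""

    def load_model(self, model_id: str):
        return f"model loaded from ./models/{model_id}"

    def load_tokenizer(self, model_id: str):
        return f"tokenizer loaded from ./models/{model_id}"

    def validate_model_access(self, model_id: str) -> bool:
        return True

    def get_provider_name(self) -> str:
        return "local_disk"


# Register the class under the name that callers will request.
ProviderFactory._providers["local_disk"] = LocalDiskProvider

provider = ProviderFactory.create_provider("local_disk")
print(provider.get_provider_name())
```

Callers keep using `create_provider` with a name, so nothing outside the registry needs to know the new class exists.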
Adding a Strategy¶
- Create class implementing TrainingStrategy protocol
- Register in StrategyFactory._strategies
- That's it! No other changes needed.
Adding a Task¶
- Add task-specific formatter in services/training_service.py
- Add metrics in evaluation/metrics.py
- Update schema validation
Error Handling¶
Exception Hierarchy¶
ModelForgeException (base)
├── ProviderError
├── StrategyError
├── DatasetValidationError
├── TrainingError
├── ConfigurationError
├── HardwareError
└── DatabaseError
Error Handler¶
All exceptions are caught by FastAPI error handlers and converted to appropriate HTTP responses.
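A minimal sketch of the hierarchy and of one way to turn it into HTTP-style responses is shown below. The class names come from the hierarchy above; the status codes, the `STATUS_BY_ERROR` table, and the `to_http_response` helper are assumptions for illustration, not the project's actual mapping. In a FastAPI app, logic like this would typically live in a function registered with `@app.exception_handler(ModelForgeException)` that returns a `JSONResponse`.

```python
class ModelForgeException(Exception):
    """Base class for all ModelForge errors."""


class ProviderError(ModelForgeException): ...
class StrategyError(ModelForgeException): ...
class DatasetValidationError(ModelForgeException): ...
class TrainingError(ModelForgeException): ...
class ConfigurationError(ModelForgeException): ...
class HardwareError(ModelForgeException): ...
class DatabaseError(ModelForgeException): ...


# Hypothetical status codes; the real handlers may choose differently.
STATUS_BY_ERROR = {
    DatasetValidationError: 422,
    ConfigurationError: 400,
    ProviderError: 502,
}


def to_http_response(exc: Exception) -> dict:
    """Convert an exception into a status code plus a JSON-ready body."""
    if isinstance(exc, ModelForgeException):
        # Known domain errors keep their message; unmapped ones default to 500.
        status = STATUS_BY_ERROR.get(type(exc), 500)
        return {"status_code": status, "error": type(exc).__name__, "detail": str(exc)}
    # Anything outside the hierarchy is reported without leaking details.
    return {"status_code": 500, "error": "InternalError", "detail": "Internal server error"}


print(to_http_response(DatasetValidationError("missing 'text' column")))
print(to_http_response(TrainingError("training stopped unexpectedly")))
```

A single base class lets one handler cover every domain error while still allowing per-type status codes.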
Configuration Management¶
Pydantic Schemas¶
Location: ModelForge/schemas/
Validation and serialization of configuration.
Environment Variables¶
- HUGGINGFACE_TOKEN - HuggingFace API token
- MODELFORGE_DB_PATH - Custom database path
- MODELFORGE_DISABLE_TENSORBOARD - Disable TensorBoard
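One way these variables might be read at startup is sketched below. The variable names are the ones listed above; the `EnvSettings` dataclass, the `load_env_settings` helper, and the set of values treated as "true" for the TensorBoard flag are assumptions, since the page does not say how ModelForge parses them.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EnvSettings:
    huggingface_token: Optional[str]
    db_path: Optional[str]
    disable_tensorboard: bool


def load_env_settings(env: Mapping[str, str] = os.environ) -> EnvSettings:
    """Read the three variables; unset variables become None or False."""
    # Assumption: any of these values (case-insensitive) turns the flag on.
    truthy = {"1", "true", "yes", "on"}
    flag = env.get("MODELFORGE_DISABLE_TENSORBOARD", "").strip().lower()
    return EnvSettings(
        huggingface_token=env.get("HUGGINGFACE_TOKEN"),
        db_path=env.get("MODELFORGE_DB_PATH"),
        disable_tensorboard=flag in truthy,
    )


# Passing a plain dict instead of os.environ keeps the example deterministic.
settings = load_env_settings({"MODELFORGE_DISABLE_TENSORBOARD": "true"})
print(settings.disable_tensorboard, settings.db_path)
```

Accepting the environment as a parameter also makes the loader easy to test without modifying the real process environment.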
Testing Strategy¶
Unit Tests¶
Test individual components in isolation:
def test_provider_factory():
    provider = ProviderFactory.create_provider("huggingface")
    assert provider.get_provider_name() == "huggingface"
Integration Tests¶
Test component interactions:
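def test_training_flow():
    # mock_db, mock_file_manager and config are assumed to be set up elsewhere,
    # for example as test fixtures.
    service = TrainingService(mock_db, mock_file_manager)
    result = service.train_model(config)
    assert result["status"] == "success"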
Mocking¶
Use dependency injection for easy mocking:
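from unittest.mock import MagicMock

# spec= restricts the mock to attributes that exist on DatabaseManager.
mock_db = MagicMock(spec=DatabaseManager)
service = TrainingService(mock_db, file_manager)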
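A self-contained version of this idea is sketched below, so it can be run without the project installed. The `DatabaseManager` and `TrainingService` classes here are simplified stand-ins, and `save_run` is a hypothetical method name chosen for the example.

```python
from unittest.mock import MagicMock


class DatabaseManager:
    """Stand-in for the real class; save_run is a hypothetical method name."""

    def save_run(self, run: dict) -> None:
        raise RuntimeError("would touch a real database")


class TrainingService:
    """Stand-in service that receives its collaborators through the constructor."""

    def __init__(self, db, file_manager):
        self.db = db
        self.file_manager = file_manager

    def train_model(self, config: dict) -> dict:
        result = {"status": "success", "model": config["model_id"]}
        self.db.save_run(result)
        return result


def test_training_records_run():
    mock_db = MagicMock(spec=DatabaseManager)
    service = TrainingService(mock_db, file_manager=MagicMock())

    result = service.train_model({"model_id": "demo"})

    assert result["status"] == "success"
    # The mock records calls, so the test can check the interaction.
    mock_db.save_run.assert_called_once_with(result)
    # With spec=, attributes missing from DatabaseManager raise AttributeError.
    assert not hasattr(mock_db, "drop_everything")


test_training_records_run()
print("ok")
```

Because the service never constructs its own database manager, the test replaces it without patching any module globals.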
Performance Considerations¶
Connection Pooling¶
SQLAlchemy connection pool:
- Pool size: 10
- Max overflow: 20
- Recycle: 3600 seconds
Lazy Loading¶
Models and datasets are loaded only when needed.
Gradient Checkpointing¶
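Gradient checkpointing reduces memory use by recomputing activations during the backward pass instead of storing them, at the cost of extra compute.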
Mixed Precision¶
Training uses bf16 or fp16 mixed precision, which speeds up training on modern GPUs.
Security¶
Input Validation¶
All inputs are validated via Pydantic schemas.
SQL Injection Prevention¶
The SQLAlchemy ORM issues parameterized queries, which prevents SQL injection as long as raw SQL strings are not built from user input.
File Access¶
File paths are validated and sandboxed.
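The page does not describe how the sandboxing is implemented, so the following is only a sketch of one common approach: resolve the requested path and reject it if it falls outside an allowed base directory. The `resolve_inside` helper and the `datasets` root are hypothetical.

```python
from pathlib import Path


def resolve_inside(base_dir: Path, user_path: str) -> Path:
    """Return the resolved path if it stays inside base_dir, else raise ValueError."""
    base = base_dir.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (base / user_path).resolve()
    try:
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"Path escapes sandbox: {user_path}") from None
    return candidate


sandbox = Path("datasets")  # hypothetical sandbox root
print(resolve_inside(sandbox, "train/data.jsonl").name)

try:
    resolve_inside(sandbox, "../secrets.txt")
except ValueError as err:
    print(err)
```

Checking the resolved path, rather than the raw string, is what catches `..` segments and absolute paths supplied by the user.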
Token Security¶
HuggingFace tokens are stored in environment variables, never in code.
Scalability¶
Current Limitations¶
- Single-GPU training
- Single-process server
- SQLite database
Future Improvements¶
- Multi-GPU support (already structured for it)
- Distributed training
- PostgreSQL for production
- Redis for caching
- Kubernetes deployment
Code Quality Metrics¶
| Metric | Value |
|---|---|
| Code Duplication | 0% |
| Cyclomatic Complexity | Low (< 10 per function) |
| Test Coverage | (To be added) |
| Type Hints | Extensive |
| Documentation | Comprehensive |
Contributing¶
See Contributing Guide for:
- Code style guidelines
- Testing requirements
- PR process
Understanding the architecture makes contributing easy! Read the code in ModelForge/ to see it in action.