PLATFORM // FINE-TUNE
The best models are trained on your data. Fine-tune Qwen, InternVL, Cosmos, and PaliGemma with LoRA, QLoRA, or full SFT on managed GPU clusters. Configure hyperparameters, launch training, close your browser. Get notified when your model is ready.
Base Model
System Prompt
You are a medical imaging analysis assistant. Describe findings using AO classification. Always provide reasoning steps.
TRAINING METHODS
Choose the training strategy that fits your compute budget and accuracy target. All three methods produce checkpoints compatible with Vi deployment pipelines.
Freeze the base model weights and train small rank-decomposition matrices injected into attention layers. Typically 0.1-1% of total parameters are trainable. Fastest to train, lowest VRAM requirement, easy to swap and merge adapters post-training.
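The "0.1-1% of total parameters" claim can be checked with back-of-envelope arithmetic. A sketch of the trainable-parameter count for a LoRA run, where the dimensions below only approximate a 7B model and are illustrative assumptions, not Vi's accounting:

```python
# Rough count of trainable LoRA parameters, assuming adapters on
# q_proj and v_proj of every layer. Each adapter adds A (rank x d)
# and B (d x rank), i.e. 2 * rank * d params per adapted matrix.
d_model, n_layers, rank = 3584, 28, 16   # approximate 7B-class dimensions
adapted_per_layer = 2                    # q_proj and v_proj

lora_params = n_layers * adapted_per_layer * 2 * rank * d_model
total_params = 7e9

print(f"{lora_params:,} trainable ({lora_params / total_params:.3%})")
```

With these assumptions the adapters come to roughly 6.4M parameters, about 0.09% of a 7B model, consistent with the sub-1% range above.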
Load the base model in NF4 4-bit precision and apply LoRA adapters on top. Roughly 4x memory savings on weights compared to 16-bit LoRA, with minimal quality loss. Enables fine-tuning 32B-parameter models on a single A100.
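The memory claim follows from bytes-per-weight arithmetic. A weights-only sketch; real usage adds activations, adapters, and quantization constants (a few percent overhead for NF4):

```python
# Approximate VRAM for the weights of a 32B-parameter model.
params = 32e9

fp16_gb = params * 2 / 1e9    # 16-bit: 2 bytes per weight
nf4_gb = params * 0.5 / 1e9   # NF4 4-bit: 0.5 bytes per weight

print(fp16_gb, nf4_gb)  # 64.0 16.0
```

64 GB of 16-bit weights would not leave room for training on an 80 GB A100; 16 GB of NF4 weights does.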
Update all model parameters with your training data. Highest potential accuracy when you have sufficient data and compute. Recommended for domain adaptation where the base model distribution diverges significantly from your target domain.
GPU INFRASTRUCTURE
Select your GPU tier and cluster size from the training config. Vi provisions hardware, configures NVLink interconnect for multi-GPU runs, and deallocates when training completes. Up to 64 GPUs per run.
Inference, LoRA on 3B Models
LoRA Fine-Tuning on 7B Models
General Purpose Training
Full SFT, Large LoRA Runs
Large-Scale Production Training
Largest Models, Multi-Node
Multi-GPU Scaling
Scale to 2, 4, 8, 16, 32, or 64 GPUs per training run. H100 and B200 clusters use NVLink interconnect for high-bandwidth gradient synchronization. Automatic sharding with FSDP or DeepSpeed ZeRO-3.
HYPERPARAMETER CONFIGURATION
Configure every training parameter from the dashboard or submit a JSON config via the API. Vi validates configurations against model architecture constraints and VRAM limits before provisioning hardware.
Learning Rate Scheduling
Cosine annealing, linear decay, or constant rate. Configurable warmup steps with linear or exponential ramp.
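The cosine-with-linear-warmup combination can be sketched in a few lines. This is a generic formulation, not Vi's internal scheduler:

```python
import math

def lr_at(step: int, base_lr: float, warmup: int, total: int) -> float:
    """Linear warmup to base_lr, then cosine annealing toward zero."""
    if step < warmup:
        return base_lr * (step + 1) / warmup          # linear ramp
    progress = (step - warmup) / max(1, total - warmup)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))

print(lr_at(99, 2e-4, 100, 1000))    # peak base_lr at the end of warmup
print(lr_at(1000, 2e-4, 100, 1000))  # decays to ~0 by the final step
```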
LoRA Target Modules
Select which attention projection matrices to adapt: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj.
System Prompt Templates
Define custom system prompts per training run. Templates support variable injection for domain, task type, and output format.
Validation Split
Automatic holdout validation with configurable split ratio. Early stopping based on validation loss plateau detection.
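Plateau detection can be sketched as a patience check over recent validation losses. The parameter names and thresholds below are illustrative, not Vi's defaults:

```python
def early_stop(val_losses, patience: int = 3, min_delta: float = 1e-3) -> bool:
    """Stop when validation loss hasn't improved by at least min_delta
    over the last `patience` evaluations."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return recent_best > best_before - min_delta

print(early_stop([1.0, 0.8, 0.7, 0.70, 0.71, 0.72]))  # True: plateaued
print(early_stop([1.0, 0.8, 0.6, 0.4]))               # False: still improving
```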
Training Configuration
{
  "base_model": "Qwen/Qwen2.5-VL-7B",
  "training_method": "lora",
  "lora_config": {
    "rank": 16,
    "alpha": 32,
    "target_modules": ["q_proj", "v_proj"],
    "dropout": 0.05
  },
  "epochs": 5,
  "learning_rate": 2e-4,
  "batch_size": 4,
  "optimizer": "adamw",
  "warmup_steps": 100,
  "weight_decay": 0.01,
  "scheduler": "cosine",
  "quantization": "none",
  "gpu": {
    "type": "H100",
    "count": 2
  },
  "system_prompt": "You are a medical..."
}
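A minimal client-side sanity check before submitting a config like the one above. This is a sketch only: Vi's server-side validation is authoritative, and the allowed values below (including the `full_sft` method name) are inferred from this page, not from an official schema:

```python
# Assumed value sets, taken from the text of this page.
ALLOWED_METHODS = {"lora", "qlora", "full_sft"}
ALLOWED_MODULES = {"q_proj", "k_proj", "v_proj", "o_proj",
                   "gate_proj", "up_proj", "down_proj"}

def validate(config: dict) -> list:
    """Return a list of human-readable problems; empty means OK."""
    errors = []
    if config.get("training_method") not in ALLOWED_METHODS:
        errors.append("unknown training_method")
    if config.get("training_method") in {"lora", "qlora"}:
        lora = config.get("lora_config", {})
        if not set(lora.get("target_modules", [])) <= ALLOWED_MODULES:
            errors.append("unknown LoRA target module")
        if lora.get("rank", 0) <= 0:
            errors.append("LoRA rank must be a positive integer")
    return errors

config = {
    "training_method": "lora",
    "lora_config": {"rank": 16, "target_modules": ["q_proj", "v_proj"]},
}
print(validate(config))  # [] -> safe to submit
```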
MODEL ZOO
Start from any supported vision-language model. Vi handles tokenizer configuration, conversation templates, and architecture-specific training optimizations automatically.
ALIBABA
Dynamic Resolution for Images and Video. Recommended Default for Most Tasks.
OPENGVLAB
Visual Resolution Router for Adaptive Token Compression. Fine-Grained Phrase Grounding.
NVIDIA
Physical-World Reasoning for Robotics and Embodied AI. Chain-of-Thought Spatial Reasoning.
NVIDIA
Compact Model Optimized for Edge Deployment. Scale-Then-Compress Architecture.
ALIBABA
Interleaved Multimodal Context with Thinking Mode for Chain-of-Thought Reasoning. Extensible to 1M Tokens.
GOOGLE
SigLIP Vision Encoder with Gemma 2 Text Decoder. Multi-Resolution Variants.
CHECKPOINT BRANCHING
Train a base model on your core dataset. Fork the checkpoint for each use case, team, or deployment target. Each branch inherits the foundation and fine-tunes further with specialized data.
Train on your full private dataset. This becomes the base checkpoint all downstream variants inherit.
Each branch fine-tunes on specialized data. Internal teams, external deployments, or customer-specific models.
Branch users never access your base training data. They only see their checkpoint and deployment endpoint.
The 10th deployment costs the same as the 1st. Fork, fine-tune, deploy. Repeatable pipeline.
300 compute credits free. All model architectures included. No credit card required.