PLATFORM // FINE-TUNE

FINE-TUNE VLMS ON
MANAGED INFRASTRUCTURE.

The best models are trained on your data. Fine-tune Qwen, InternVL, Cosmos, and PaliGemma with LoRA, QLoRA, or full SFT on managed GPU clusters. Configure hyperparameters, launch training, close your browser. Get notified when your model is ready.


TRAINING METHODS

THREE PATHS TO A BETTER MODEL.

Choose the training strategy that fits your compute budget and accuracy target. All three methods produce checkpoints compatible with Vi deployment pipelines.

LoRA

Low-Rank Adaptation

Freeze the base model weights and train small rank-decomposition matrices injected into attention layers. Typically 0.1-1% of total parameters are trainable. Fastest to train, lowest VRAM requirement, easy to swap and merge adapters post-training.

Trainable Params: 0.1–1%
Rank: 4, 8, 16, 32, 64
Target Modules: q_proj, v_proj, k_proj, o_proj
Min VRAM: ~16 GB (7B model)
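As a back-of-envelope check on the 0.1–1% figure, the sketch below computes the trainable fraction for a LoRA run on illustrative dimensions: a 7B-class transformer with 32 decoder layers, hidden size 4096, and rank-16 adapters on four attention projections. The shapes are assumptions for the arithmetic, not the exact architecture of any listed model.

```python
def lora_trainable_fraction(total_params, n_layers, d_model, rank, n_target_modules):
    """Fraction of parameters trained by LoRA.

    Each adapted d_model x d_model projection gains two low-rank
    matrices: A (rank x d_model) and B (d_model x rank).
    """
    per_module = 2 * rank * d_model  # params in A + B
    adapter_params = per_module * n_target_modules * n_layers
    return adapter_params / total_params

# Illustrative 7B-class shapes (assumed, not an exact architecture)
frac = lora_trainable_fraction(
    total_params=7_000_000_000, n_layers=32, d_model=4096,
    rank=16, n_target_modules=4,  # q_proj, k_proj, v_proj, o_proj
)
print(f"{frac:.2%}")  # roughly 0.24%, inside the quoted 0.1-1% band
```

Doubling the rank doubles the adapter size but leaves the fraction well under 1%, which is why swapping and merging adapters stays cheap.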

QLoRA

Quantized LoRA

Load the base model in NF4 4-bit precision and apply LoRA adapters on top. 4x memory savings over full-precision LoRA with minimal quality loss. Enables fine-tuning 32B-parameter models on a single A100.

Base Precision: NF4 (4-bit)
Adapter Precision: BF16
Memory Savings: ~4x vs FP16
Min VRAM: ~12 GB (7B model)
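The ~4x figure follows directly from bytes per weight. A minimal sketch of the weights-only footprint (activations, KV cache, adapter and optimizer state excluded):

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Memory for model weights alone at a given precision."""
    return n_params * bits_per_weight / 8 / 1e9

fp16 = weight_memory_gb(7_000_000_000, 16)  # ~14 GB
nf4 = weight_memory_gb(7_000_000_000, 4)    # ~3.5 GB
print(f"FP16: {fp16:.1f} GB, NF4: {nf4:.1f} GB, savings: {fp16 / nf4:.0f}x")
```

The same arithmetic explains the 32B-on-one-A100 claim: 32B weights at 4 bits are ~16 GB, leaving most of an 80 GB card for adapters and activations.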

Full SFT

Full Supervised Fine-Tuning

Update all model parameters with your training data. Highest potential accuracy when you have sufficient data and compute. Recommended for domain adaptation where the base model distribution diverges significantly from your target domain.

Trainable Params: 100%
Precision: BF16 / FP16
Multi-GPU: FSDP / DeepSpeed ZeRO-3
Min VRAM: ~160 GB (7B model)
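The ~160 GB floor comes from optimizer state, not the weights themselves. A rough sketch under one common mixed-precision AdamW accounting (an assumption: BF16 weights and gradients plus FP32 master weights and two Adam moments; activations excluded):

```python
def full_sft_state_gb(n_params):
    """Approximate training-state memory for mixed-precision AdamW.

    Per parameter: 2 B weights + 2 B grads + 4 B FP32 master copy
    + 4 B Adam m + 4 B Adam v = 16 bytes. Activations not included.
    """
    bytes_per_param = 2 + 2 + 4 + 4 + 4
    return n_params * bytes_per_param / 1e9

print(f"{full_sft_state_gb(7_000_000_000):.0f} GB")  # ~112 GB before activations
```

Adding activation memory for realistic batch sizes and sequence lengths pushes a 7B run toward the quoted ~160 GB, which is why full SFT needs multiple GPUs where LoRA fits on one.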

GPU INFRASTRUCTURE

FROM T4 TO B200. NO PROVISIONING.

Select your GPU tier and cluster size from the training config. Vi provisions hardware, configures NVLink interconnect for multi-GPU runs, and deallocates when training completes. Up to 64 GPUs per run.

NVIDIA

T4

Starter

Inference, LoRA on 3B Models

VRAM: 16 GB
CUDA Cores: 2,560
Architecture: Turing
Multi-GPU: Up to 4

Models That Fit

Qwen 3B (LoRA), NVILA-Lite 2B
NVIDIA

L4

Developer

LoRA Fine-Tuning on 7B Models

VRAM: 24 GB
CUDA Cores: 7,424
Architecture: Ada Lovelace
Multi-GPU: Up to 8

Models That Fit

Qwen 7B (LoRA), InternVL 8B (LoRA)
NVIDIA

A10

Developer

General Purpose Training

VRAM: 24 GB
CUDA Cores: 9,216
Architecture: Ampere
Multi-GPU: Up to 8

Models That Fit

Qwen 7B (LoRA), Cosmos 8B (QLoRA)
NVIDIA

A100

Developer

Full SFT, Large LoRA Runs

VRAM: 80 GB
CUDA Cores: 6,912
Architecture: Ampere
Multi-GPU: Up to 32

Models That Fit

Qwen 32B (LoRA), Qwen 7B (Full SFT)
NVIDIA

H100

Professional

Large-Scale Production Training

VRAM: 80 GB
CUDA Cores: 16,896
Architecture: Hopper
Multi-GPU: Up to 64

Models That Fit

Qwen 32B (Full SFT), InternVL 38B (LoRA)
NVIDIA

B200

Enterprise

Largest Models, Multi-Node

VRAM: 192 GB
CUDA Cores: 18,000+
Architecture: Blackwell
Multi-GPU: Up to 64

Models That Fit

Qwen 72B (LoRA), Any Model (Full SFT)

Multi-GPU Scaling

Scale to 2, 4, 8, 16, 32, or 64 GPUs per training run. H100 and B200 clusters use NVLink interconnect for high-bandwidth gradient synchronization. Automatic sharding with FSDP or DeepSpeed ZeRO-3.
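Because ZeRO-3/FSDP shard weights, gradients, and optimizer state across ranks, per-GPU training state shrinks roughly linearly with cluster size. A sketch using the same 16-bytes-per-parameter assumption as mixed-precision AdamW (activations and communication buffers excluded):

```python
def zero3_per_gpu_gb(n_params, world_size, bytes_per_param=16):
    """Per-GPU share of fully sharded training state under ZeRO-3 / FSDP."""
    return n_params * bytes_per_param / world_size / 1e9

# 7B-parameter model across increasing cluster sizes
for n in (2, 8, 64):
    print(f"{n:>2} GPUs: {zero3_per_gpu_gb(7_000_000_000, n):.1f} GB/GPU")
```

This is why an 80 GB H100 pair can hold a full-SFT 7B run that a single card cannot: each rank materializes only its shard, gathering full layers transiently during compute.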

HYPERPARAMETER CONFIGURATION

FULL CONTROL. EVERY KNOB.

Configure every training parameter from the dashboard or submit a JSON config via the API. Vi validates configurations against model architecture constraints and VRAM limits before provisioning hardware.

  • Learning Rate Scheduling

    Cosine annealing, linear decay, or constant rate. Configurable warmup steps with linear or exponential ramp.

  • LoRA Target Modules

    Select which attention projection matrices to adapt: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj.

  • System Prompt Templates

    Define custom system prompts per training run. Templates support variable injection for domain, task type, and output format.

  • Validation Split

    Automatic holdout validation with configurable split ratio. Early stopping based on validation loss plateau detection.
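The learning-rate schedule described above can be sketched in a few lines of pure Python: linear warmup to the peak rate, then cosine decay. This is an illustrative reimplementation, not the platform's actual scheduler code.

```python
import math

def cosine_lr(step, total_steps, warmup_steps, peak_lr, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine annealing down to min_lr."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

# With warmup_steps=100 and peak 2e-4 (the values in the sample config),
# the rate ramps up over 100 steps, peaks, then decays to min_lr.
print(cosine_lr(100, 1000, 100, 2e-4))
```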

Training Configuration

{
  "base_model": "Qwen/Qwen2.5-VL-7B",
  "training_method": "lora",
  "lora_config": {
    "rank": 16,
    "alpha": 32,
    "target_modules": ["q_proj", "v_proj"],
    "dropout": 0.05
  },
  "epochs": 5,
  "learning_rate": 2e-4,
  "batch_size": 4,
  "optimizer": "adamw",
  "warmup_steps": 100,
  "weight_decay": 0.01,
  "scheduler": "cosine",
  "quantization": "none",
  "gpu": {
    "type": "H100",
    "count": 2
  },
  "system_prompt": "You are a medical..."
}
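A config like the one above can be sanity-checked client-side before submission. The sketch below mirrors the sample's key names; the required-key set and range checks are assumptions for illustration, not the platform's published validation rules.

```python
import json

REQUIRED = {"base_model", "training_method", "epochs",
            "learning_rate", "batch_size", "gpu"}

def validate_config(raw: str) -> dict:
    """Parse a training config and check required keys and basic ranges."""
    cfg = json.loads(raw)
    missing = REQUIRED - cfg.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if cfg["training_method"] in ("lora", "qlora") and "lora_config" not in cfg:
        raise ValueError("lora/qlora runs need a lora_config block")
    if not 0 < cfg["learning_rate"] < 1:
        raise ValueError("learning_rate out of range")
    return cfg

cfg = validate_config(json.dumps({
    "base_model": "Qwen/Qwen2.5-VL-7B", "training_method": "lora",
    "lora_config": {"rank": 16, "alpha": 32},
    "epochs": 5, "learning_rate": 2e-4, "batch_size": 4,
    "gpu": {"type": "H100", "count": 2},
}))
print(cfg["training_method"])  # lora
```

Catching a missing `lora_config` or an absurd learning rate locally is cheaper than waiting for server-side validation to reject the run.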

MODEL ZOO

SUPPORTED BASE MODELS.

Start from any supported vision-language model. Vi handles tokenizer configuration, conversation templates, and architecture-specific training optimizations automatically.

ALIBABA

Qwen2.5-VL

Context: 32K

Dynamic Resolution for Images and Video. Recommended Default for Most Tasks.

Sizes: 3B, 7B, 32B, 72B · RECOMMENDED
OPENGVLAB

InternVL3.5

Context: 32K

Visual Resolution Router for Adaptive Token Compression. Fine-Grained Phrase Grounding.

Sizes: 1B, 2B, 8B, 38B · FINE-GRAINED
NVIDIA

Cosmos-Reason2

Context: 256K

Physical-World Reasoning for Robotics and Embodied AI. Chain-of-Thought Spatial Reasoning.

Sizes: 2B, 8B · CHAIN-OF-THOUGHT
NVIDIA

NVILA-Lite

Context: 32K

Compact Model Optimized for Edge Deployment. Scale-Then-Compress Architecture.

Sizes: 2B, 8B · EDGE
ALIBABA

Qwen3-VL

Context: 256K

Interleaved Multimodal Context with Thinking Mode for Chain-of-Thought Reasoning. Extensible to 1M Tokens.

Sizes: 2B, 8B, 32B · LATEST
GOOGLE

PaliGemma 2

Context: 8K

SigLIP Vision Encoder with Gemma 2 Text Decoder. Multi-Resolution Variants.

Sizes: 3B, 10B, 28B · MULTI-RES

CHECKPOINT BRANCHING

TRAIN THE BASE. FORK THE REST.

Train a base model on your core dataset. Fork the checkpoint for each use case, team, or deployment target. Each branch inherits the foundation and fine-tunes further with specialized data.

Foundation → Sub-Domain → Deployment

  • Radiology Foundation (12K Images / v1.0, F1: 0.89)
    • Orthopedic (+3.2K X-rays / v1.1, F1: 0.93)
      • Hospital A: Pediatric (+400)
      • Hospital B: Sports Med (+800)
    • Neuro MRI (+1.8K MRI / v1.1, F1: 0.91)
      • Clinic C: Stroke Unit (+600)

Build Your Foundation

Train on your full private dataset. This becomes the base checkpoint all downstream variants inherit.

Fork Per Use Case

Each branch fine-tunes on specialized data. Internal teams, external deployments, or customer-specific models.

Preserve IP

Branch users never access your base training data. They only see their checkpoint and deployment endpoint.

Scale Without Retraining

The 10th deployment costs the same as the 1st. Fork, fine-tune, deploy. Repeatable pipeline.
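The fork model above can be pictured as a tree of checkpoints, where each node stores only a parent pointer and its own incremental data. A toy sketch (the class and fields are illustrative, not the platform's internal representation):

```python
class Checkpoint:
    """A fine-tune node: knows its parent, never its siblings' data."""

    def __init__(self, name, extra_images, parent=None):
        self.name = name
        self.extra_images = extra_images  # data added at this branch only
        self.parent = parent

    def lineage(self):
        """Chain of checkpoint names from foundation to this node."""
        node, chain = self, []
        while node:
            chain.append(node.name)
            node = node.parent
        return list(reversed(chain))

foundation = Checkpoint("radiology-v1.0", 12_000)
ortho = Checkpoint("orthopedic-v1.1", 3_200, parent=foundation)
hospital_a = Checkpoint("hospital-a-pediatric", 400, parent=ortho)
print(hospital_a.lineage())
# ['radiology-v1.0', 'orthopedic-v1.1', 'hospital-a-pediatric']
```

Each fork references its parent rather than copying the base training data, which is the structural reason branch users see only their own checkpoint and endpoint.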

TRAIN YOUR VLM.
SHIP IT TODAY.

300 compute credits free. All model architectures included. No credit card required.