PLATFORM // FINE-TUNE

FINE-TUNE VLMS ON
MANAGED INFRASTRUCTURE.

The best models are trained on your data. Fine-tune Qwen, InternVL, Cosmos, and PaliGemma with LoRA, QLoRA, or full SFT on managed GPU clusters. Configure hyperparameters, launch training, close your browser. Get notified when your model is ready.


TRAINING METHODS

THREE PATHS TO A BETTER MODEL.

Choose the training strategy that fits your compute budget and accuracy target. All three methods produce checkpoints compatible with Vi deployment pipelines.

LoRA

Low-Rank Adaptation

Freeze the base model weights and train small rank-decomposition matrices injected into attention layers. Typically 0.1-1% of total parameters are trainable. Fastest to train, lowest VRAM requirement, easy to swap and merge adapters post-training.

Trainable Params: 0.1–1%
Rank: 4, 8, 16, 32, 64
Target Modules: q_proj, v_proj, k_proj, o_proj
Min VRAM: ~16 GB (7B model)
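As a back-of-envelope check on the 0.1–1% figure, the sketch below computes the trainable fraction for a LoRA run on illustrative dimensions: a 7B-class transformer with 32 decoder layers, hidden size 4096, and rank-16 adapters on four attention projections. The shapes are assumptions for the arithmetic, not the exact architecture of any listed model.

```python
def lora_trainable_fraction(total_params, n_layers, d_model, rank, n_target_modules):
    """Fraction of parameters trained by LoRA.

    Each adapted d_model x d_model projection gains two low-rank
    matrices: A (rank x d_model) and B (d_model x rank).
    """
    per_module = 2 * rank * d_model  # params in A + B
    adapter_params = per_module * n_target_modules * n_layers
    return adapter_params / total_params

# Illustrative 7B-class shapes (assumed, not an exact architecture)
frac = lora_trainable_fraction(
    total_params=7_000_000_000, n_layers=32, d_model=4096,
    rank=16, n_target_modules=4,  # q_proj, k_proj, v_proj, o_proj
)
print(f"{frac:.2%}")  # roughly 0.24%, inside the quoted 0.1-1% band
```

Doubling the rank doubles the adapter size but leaves the fraction well under 1%, which is why swapping and merging adapters stays cheap.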

QLoRA

Quantized LoRA

Load the base model in NF4 4-bit precision and apply LoRA adapters on top. 4x memory savings over full-precision LoRA with minimal quality loss. Enables fine-tuning 32B-parameter models on a single A100.

Base Precision: NF4 (4-bit)
Adapter Precision: BF16
Memory Savings: ~4x vs FP16
Min VRAM: ~12 GB (7B model)
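The ~4x figure follows directly from bytes per weight. A minimal sketch of the weights-only footprint (activations, KV cache, adapter and optimizer state excluded):

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Memory for model weights alone at a given precision."""
    return n_params * bits_per_weight / 8 / 1e9

fp16 = weight_memory_gb(7_000_000_000, 16)  # ~14 GB
nf4 = weight_memory_gb(7_000_000_000, 4)    # ~3.5 GB
print(f"FP16: {fp16:.1f} GB, NF4: {nf4:.1f} GB, savings: {fp16 / nf4:.0f}x")
```

The same arithmetic explains the 32B-on-one-A100 claim: 32B weights at 4 bits are ~16 GB, leaving most of an 80 GB card for adapters and activations.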

Full SFT

Full Supervised Fine-Tuning

Update all model parameters with your training data. Highest potential accuracy when you have sufficient data and compute. Recommended for domain adaptation where the base model distribution diverges significantly from your target domain.

Trainable Params: 100%
Precision: BF16 / FP16
Multi-GPU: FSDP / DeepSpeed ZeRO-3
Min VRAM: ~160 GB (7B model)
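The ~160 GB floor comes from optimizer state, not the weights themselves. A rough sketch under one common mixed-precision AdamW accounting (an assumption: BF16 weights and gradients plus FP32 master weights and two Adam moments; activations excluded):

```python
def full_sft_state_gb(n_params):
    """Approximate training-state memory for mixed-precision AdamW.

    Per parameter: 2 B weights + 2 B grads + 4 B FP32 master copy
    + 4 B Adam m + 4 B Adam v = 16 bytes. Activations not included.
    """
    bytes_per_param = 2 + 2 + 4 + 4 + 4
    return n_params * bytes_per_param / 1e9

print(f"{full_sft_state_gb(7_000_000_000):.0f} GB")  # ~112 GB before activations
```

Adding activation memory for realistic batch sizes and sequence lengths pushes a 7B run toward the quoted ~160 GB, which is why full SFT needs multiple GPUs where LoRA fits on one.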

GPU INFRASTRUCTURE

FROM T4 TO B200. NO PROVISIONING.

Select your GPU tier and cluster size from the training config. Vi provisions hardware, configures NVLink interconnect for multi-GPU runs, and deallocates when training completes. Up to 64 GPUs per run.

NVIDIA

T4

Starter

Inference, LoRA on 3B Models

VRAM: 16 GB
CUDA Cores: 2,560
Architecture: Turing
Multi-GPU: Up to 4

Models That Fit

Qwen 3B (LoRA), NVILA-Lite 2B
NVIDIA

L4

Developer

LoRA Fine-Tuning on 7B Models

VRAM: 24 GB
CUDA Cores: 7,424
Architecture: Ada Lovelace
Multi-GPU: Up to 8

Models That Fit

Qwen 7B (LoRA), InternVL 8B (LoRA)
NVIDIA

A10

Developer

General Purpose Training

VRAM: 24 GB
CUDA Cores: 9,216
Architecture: Ampere
Multi-GPU: Up to 8

Models That Fit

Qwen 7B (LoRA), Cosmos 8B (QLoRA)
NVIDIA

A100

Developer

Full SFT, Large LoRA Runs

VRAM: 80 GB
CUDA Cores: 6,912
Architecture: Ampere
Multi-GPU: Up to 32

Models That Fit

Qwen 32B (LoRA), Qwen 7B (Full SFT)
NVIDIA

H100

Professional

Large-Scale Production Training

VRAM: 80 GB
CUDA Cores: 16,896
Architecture: Hopper
Multi-GPU: Up to 64

Models That Fit

Qwen 32B (Full SFT), InternVL 38B (LoRA)
NVIDIA

B200

Enterprise

Largest Models, Multi-Node

VRAM: 192 GB
CUDA Cores: 18,000+
Architecture: Blackwell
Multi-GPU: Up to 64

Models That Fit

Qwen 72B (LoRA), Any Model (Full SFT)

Multi-GPU Scaling

Scale to 2, 4, 8, 16, 32, or 64 GPUs per training run. H100 and B200 clusters use NVLink interconnect for high-bandwidth gradient synchronization. Automatic sharding with FSDP or DeepSpeed ZeRO-3.
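Because ZeRO-3/FSDP shard weights, gradients, and optimizer state across ranks, per-GPU training state shrinks roughly linearly with cluster size. A sketch using the same 16-bytes-per-parameter assumption as mixed-precision AdamW (activations and communication buffers excluded):

```python
def zero3_per_gpu_gb(n_params, world_size, bytes_per_param=16):
    """Per-GPU share of fully sharded training state under ZeRO-3 / FSDP."""
    return n_params * bytes_per_param / world_size / 1e9

# 7B-parameter model across increasing cluster sizes
for n in (2, 8, 64):
    print(f"{n:>2} GPUs: {zero3_per_gpu_gb(7_000_000_000, n):.1f} GB/GPU")
```

This is why an 80 GB H100 pair can hold a full-SFT 7B run that a single card cannot: each rank materializes only its shard, gathering full layers transiently during compute.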

HYPERPARAMETER CONFIGURATION

FULL CONTROL. EVERY KNOB.

Configure every training parameter from the dashboard or submit a JSON config via the API. Vi validates configurations against model architecture constraints and VRAM limits before provisioning hardware.

  • Learning Rate Scheduling

    Cosine annealing, linear decay, or constant rate. Configurable warmup steps with linear or exponential ramp.

  • LoRA Target Modules

    Select which attention projection matrices to adapt: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj.

  • System Prompt Templates

    Define custom system prompts per training run. Templates support variable injection for domain, task type, and output format.

  • Validation Split

    Automatic holdout validation with configurable split ratio. Early stopping based on validation loss plateau detection.
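The learning-rate schedule described above can be sketched in a few lines of pure Python: linear warmup to the peak rate, then cosine decay. This is an illustrative reimplementation, not the platform's actual scheduler code.

```python
import math

def cosine_lr(step, total_steps, warmup_steps, peak_lr, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine annealing down to min_lr."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

# With warmup_steps=100 and peak 2e-4 (the values in the sample config),
# the rate ramps up over 100 steps, peaks, then decays to min_lr.
print(cosine_lr(100, 1000, 100, 2e-4))
```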

Training Configuration

{
  "base_model": "Qwen/Qwen2.5-VL-7B",
  "training_method": "lora",
  "lora_config": {
    "rank": 16,
    "alpha": 32,
    "target_modules": ["q_proj", "v_proj"],
    "dropout": 0.05
  },
  "epochs": 5,
  "learning_rate": 2e-4,
  "batch_size": 4,
  "optimizer": "adamw",
  "warmup_steps": 100,
  "weight_decay": 0.01,
  "scheduler": "cosine",
  "quantization": "none",
  "gpu": {
    "type": "H100",
    "count": 2
  },
  "system_prompt": "You are a medical..."
}
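A config like the one above can be sanity-checked client-side before submission. The sketch below mirrors the sample's key names; the required-key set and range checks are assumptions for illustration, not the platform's published validation rules.

```python
import json

REQUIRED = {"base_model", "training_method", "epochs",
            "learning_rate", "batch_size", "gpu"}

def validate_config(raw: str) -> dict:
    """Parse a training config and check required keys and basic ranges."""
    cfg = json.loads(raw)
    missing = REQUIRED - cfg.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if cfg["training_method"] in ("lora", "qlora") and "lora_config" not in cfg:
        raise ValueError("lora/qlora runs need a lora_config block")
    if not 0 < cfg["learning_rate"] < 1:
        raise ValueError("learning_rate out of range")
    return cfg

cfg = validate_config(json.dumps({
    "base_model": "Qwen/Qwen2.5-VL-7B", "training_method": "lora",
    "lora_config": {"rank": 16, "alpha": 32},
    "epochs": 5, "learning_rate": 2e-4, "batch_size": 4,
    "gpu": {"type": "H100", "count": 2},
}))
print(cfg["training_method"])  # lora
```

Catching a missing `lora_config` or an absurd learning rate locally is cheaper than waiting for server-side validation to reject the run.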

MODEL ZOO

SUPPORTED BASE MODELS.

Start from any supported vision-language model. Vi handles tokenizer configuration, conversation templates, and architecture-specific training optimizations automatically.

ALIBABA

Qwen2.5-VL

Context: 32K

Dynamic Resolution for Images and Video. Recommended Default for Most Tasks.

Sizes: 3B, 7B, 32B, 72B · RECOMMENDED
OPENGVLAB

InternVL3.5

Context: 32K

Visual Resolution Router for Adaptive Token Compression. Fine-Grained Phrase Grounding.

Sizes: 1B, 2B, 8B, 38B · FINE-GRAINED
NVIDIA

Cosmos-Reason2

Context: 256K

Physical-World Reasoning for Robotics and Embodied AI. Chain-of-Thought Spatial Reasoning.

Sizes: 2B, 8B · CHAIN-OF-THOUGHT
NVIDIA

NVILA-Lite

Context: 32K

Compact Model Optimized for Edge Deployment. Scale-Then-Compress Architecture.

Sizes: 2B, 8B · EDGE
ALIBABA

Qwen3-VL

Context: 256K

Interleaved Multimodal Context with Thinking Mode for Chain-of-Thought Reasoning. Extensible to 1M Tokens.

Sizes: 2B, 8B, 32B · LATEST
GOOGLE

PaliGemma 2

Context: 8K

SigLIP Vision Encoder with Gemma 2 Text Decoder. Multi-Resolution Variants.

Sizes: 3B, 10B, 28B · MULTI-RES

CHECKPOINT BRANCHING

TRAIN THE BASE. FORK THE REST.

Train a base model on your core dataset. Fork the checkpoint for each use case, team, or deployment target. Each branch inherits the foundation and fine-tunes further with specialized data.

Foundation → Sub-Domain → Deployment

  • Radiology Foundation (12K Images / v1.0, F1: 0.89)
    • Orthopedic (+3.2K X-rays / v1.1, F1: 0.93)
      • Hospital A: Pediatric (+400)
      • Hospital B: Sports Med (+800)
    • Neuro MRI (+1.8K MRI / v1.1, F1: 0.91)
      • Clinic C: Stroke Unit (+600)

Build Your Foundation

Train on your full private dataset. This becomes the base checkpoint all downstream variants inherit.

Fork Per Use Case

Each branch fine-tunes on specialized data. Internal teams, external deployments, or customer-specific models.

Preserve IP

Branch users never access your base training data. They only see their checkpoint and deployment endpoint.

Scale Without Retraining

The 10th deployment costs the same as the 1st. Fork, fine-tune, deploy. Repeatable pipeline.
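The fork model above can be pictured as a tree of checkpoints, where each node stores only a parent pointer and its own incremental data. A toy sketch (the class and fields are illustrative, not the platform's internal representation):

```python
class Checkpoint:
    """A fine-tune node: knows its parent, never its siblings' data."""

    def __init__(self, name, extra_images, parent=None):
        self.name = name
        self.extra_images = extra_images  # data added at this branch only
        self.parent = parent

    def lineage(self):
        """Chain of checkpoint names from foundation to this node."""
        node, chain = self, []
        while node:
            chain.append(node.name)
            node = node.parent
        return list(reversed(chain))

foundation = Checkpoint("radiology-v1.0", 12_000)
ortho = Checkpoint("orthopedic-v1.1", 3_200, parent=foundation)
hospital_a = Checkpoint("hospital-a-pediatric", 400, parent=ortho)
print(hospital_a.lineage())
# ['radiology-v1.0', 'orthopedic-v1.1', 'hospital-a-pediatric']
```

Each fork references its parent rather than copying the base training data, which is the structural reason branch users see only their own checkpoint and endpoint.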

TRAIN YOUR VLM.
SHIP IT TODAY.

300 compute credits free. All model architectures included. No credit card required.