MedGemma 1.5: Google's Open Medical AI Just Got Serious
Google just dropped MedGemma 1.5, and it’s a significant upgrade for anyone building clinical AI systems. As someone who’s spent the last two years building production medical AI at Mount Sinai, I want to break down what’s actually new and why it matters.
What’s New in 1.5
MedGemma 1.5 isn’t just an incremental update. It adds entirely new capabilities:
```mermaid
mindmap
  root((MedGemma 1.5))
    3D Imaging
      CT Volumes
      MRI Sequences
    Longitudinal
      Prior Comparisons
      Disease Progression
    Pathology
      Whole Slide Images
      Multi-patch Input
    Documents
      Lab Reports
      EHR Data
    Localization
      Bounding Boxes
      Anatomical Features
```
Architecture Overview
MedGemma builds on Gemma 3’s decoder-only transformer with a specialized medical image encoder:
```mermaid
flowchart TB
    subgraph Input
        A[Medical Image] --> B[MedSigLIP Encoder]
        C[Text Prompt] --> D[Tokenizer]
    end
    subgraph Processing
        B --> E[256 Image Tokens]
        D --> F[Text Tokens]
        E --> G[Gemma 3 Decoder]
        F --> G
    end
    subgraph Output
        G --> H[Generated Text]
        H --> I[Diagnosis/Report/Analysis]
    end
```
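To make the diagram concrete, here's a shape-level sketch of that token flow. This is not MedGemma's actual implementation; the tensors and the 2560 hidden size are stand-ins for illustration:

```python
# Shape-level sketch of the token flow above -- stand-in tensors, not
# MedGemma's real modules. The 2560 hidden size is illustrative.
import torch

batch, n_img_tokens, n_txt_tokens, d_model = 1, 256, 32, 2560

image_embeds = torch.randn(batch, n_img_tokens, d_model)  # MedSigLIP output, projected
text_embeds = torch.randn(batch, n_txt_tokens, d_model)   # embedded prompt tokens

# The 256 image tokens and the text tokens form one sequence for the decoder
decoder_input = torch.cat([image_embeds, text_embeds], dim=1)
print(decoder_input.shape)  # torch.Size([1, 288, 2560])
```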
Key Specs
| Specification | Value |
|---|---|
| Parameters | 4B |
| Context Length | 128K tokens |
| Image Resolution | 896 × 896 |
| Image Tokens | 256 per image |
| Output Length | 8192 tokens |
| Architecture | Decoder-only Transformer |
| Attention | Grouped-Query Attention (GQA) |
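Those numbers have direct deployment implications. A quick back-of-envelope check (my arithmetic, not an official sizing guide) on what the 128K context and 256 tokens per image buy you:

```python
# Back-of-envelope sizing from the spec table above -- my arithmetic,
# not an official deployment guide.
CONTEXT_TOKENS = 128_000
TOKENS_PER_IMAGE = 256
MAX_OUTPUT_TOKENS = 8_192  # reserve the full output budget

image_budget = (CONTEXT_TOKENS - MAX_OUTPUT_TOKENS) // TOKENS_PER_IMAGE
print(f"~{image_budget} images fit alongside a max-length response")  # ~468

# Weight memory in bf16: 4B params x 2 bytes, before KV cache and activations
print(f"bf16 weights: ~{4e9 * 2 / 1e9:.0f} GB")  # ~8 GB
```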
Performance Deep Dive
Medical Text Reasoning
The text-only benchmarks show clear gains over MedGemma 1 on MedQA and MedMCQA, with small regressions on PubMedQA and MMLU Med:
```echarts
{
  "title": {
    "text": "Medical QA Benchmarks",
    "left": "center"
  },
  "tooltip": {
    "trigger": "axis"
  },
  "legend": {
    "data": ["Gemma 3 4B", "MedGemma 1 4B", "MedGemma 1.5 4B"],
    "top": "10%"
  },
  "xAxis": {
    "type": "category",
    "data": ["MedQA", "MedMCQA", "PubMedQA", "MMLU Med"]
  },
  "yAxis": {
    "type": "value",
    "min": 40,
    "max": 80
  },
  "series": [
    {
      "name": "Gemma 3 4B",
      "type": "bar",
      "data": [50.7, 45.4, 68.4, 67.2]
    },
    {
      "name": "MedGemma 1 4B",
      "type": "bar",
      "data": [64.4, 55.7, 73.4, 70.0]
    },
    {
      "name": "MedGemma 1.5 4B",
      "type": "bar",
      "data": [69.1, 59.8, 68.2, 69.6]
    }
  ]
}
```
Imaging Performance
The real story is in the imaging benchmarks. MedGemma 1.5 shows dramatic improvements in 3D imaging and whole-slide pathology:
| Task | Gemma 3 4B | MedGemma 1 4B | MedGemma 1.5 4B |
|---|---|---|---|
| CT Classification (7 conditions) | 54.5% | 58.2% | 61.1% |
| MRI Classification (10 conditions) | 51.1% | 51.3% | 64.7% |
| WSI Pathology (ROUGE) | 2.3 | 2.2 | 49.4 |
| EyePACS Fundus | 14.4% | 64.9% | 76.8% |
| Longitudinal CXR | 59.0% | 61.1% | 65.7% |
The jump from 2.2 to 49.4 ROUGE on whole-slide pathology is remarkable: a score near 2 means the earlier models produced essentially no usable overlap with reference descriptions, while 1.5 generates genuinely relevant report text.
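If you want to sanity-check numbers like these on your own WSI captions, the `rouge-score` package computes comparable overlap scores. Whether it matches the benchmark's exact metric setup is an assumption on my part; treat it as a local approximation:

```python
# Local ROUGE-L sanity check (pip install rouge-score). Whether this matches
# the benchmark's exact metric configuration is an assumption on my part.
from rouge_score import rouge_scorer

reference = "Invasive ductal carcinoma, grade 2, with lymphovascular invasion."
prediction = "Invasive ductal carcinoma, grade 2. No definite lymphovascular invasion."

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
score = scorer.score(reference, prediction)["rougeL"]
print(f"ROUGE-L F1: {score.fmeasure * 100:.1f}")
```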
Quick Start Code
Here’s how to get started with the pipeline API:
```python
from transformers import pipeline
from PIL import Image
import requests
import torch

# Initialize the pipeline
pipe = pipeline(
    "image-text-to-text",
    model="google/medgemma-1.5-4b-it",
    torch_dtype=torch.bfloat16,
    device="cuda",
)

# Load a chest X-ray
image_url = "https://upload.wikimedia.org/wikipedia/commons/c/c8/Chest_Xray_PA_3-8-2010.png"
image = Image.open(
    requests.get(image_url, headers={"User-Agent": "example"}, stream=True).raw
)

# Create the prompt
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe this X-ray"},
        ],
    }
]

# Generate
output = pipe(text=messages, max_new_tokens=2000)
print(output[0]["generated_text"][-1]["content"])
```
Or using the model directly for more control:
```python
from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

model_id = "google/medgemma-1.5-4b-it"

model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# Process inputs (reusing the `messages` list from the pipeline example)
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)

input_len = inputs["input_ids"].shape[-1]

# Generate with control over parameters
with torch.inference_mode():
    generation = model.generate(
        **inputs,
        max_new_tokens=2000,
        do_sample=False,
    )

# Decode only the newly generated tokens
print(processor.decode(generation[0][input_len:], skip_special_tokens=True))
```
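For 3D studies, the model card is the authority on input format; a plausible sketch (my assumption, not a documented recipe) is to pass a sampled set of slices as multiple images in a single turn, which the chat template already supports:

```python
# Hedged sketch: feeding a CT series as multiple 2D slices in one turn.
# How MedGemma 1.5 actually expects 3D volumes may differ -- check the
# model card. The slice filenames here are hypothetical.
from PIL import Image

slice_paths = ["ct_slice_030.png", "ct_slice_040.png", "ct_slice_050.png"]
slices = [Image.open(p) for p in slice_paths]

messages = [
    {
        "role": "user",
        "content": (
            [{"type": "image", "image": s} for s in slices]
            + [{"type": "text", "text": "Across these axial CT slices, is there evidence of a pulmonary nodule?"}]
        ),
    }
]
# Reuse the processor/model generation code above with these messages.
```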
Integration Architecture
Here’s how I’m thinking about integrating MedGemma into a clinical RAG system:
```mermaid
flowchart LR
    subgraph Input Layer
        A[Clinical Query] --> B{Query Type?}
        I[Medical Image] --> B
    end
    subgraph Processing
        B -->|Text Only| C[Text RAG Pipeline]
        B -->|Image + Text| D[MedGemma 1.5]
        C --> E[Vector Search]
        E --> F[Context Assembly]
        F --> D
    end
    subgraph Output
        D --> G[Clinical Response]
        G --> H[Citation + Evidence]
    end
```
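The `Query Type?` branch is just a dispatcher. Here's a minimal sketch of that routing, where `retrieve_context` and `medgemma_generate` are hypothetical stand-ins for the vector-search step and the MedGemma call shown earlier:

```python
# Minimal dispatcher mirroring the flowchart's "Query Type?" branch.
# retrieve_context() and medgemma_generate() are hypothetical stand-ins
# for your vector search and the MedGemma generation code shown earlier.
from typing import Optional
from PIL import Image

def answer(query: str, image: Optional[Image.Image] = None) -> str:
    if image is None:
        # Text-only: retrieve evidence first, then ground the answer in it
        context = retrieve_context(query)  # vector search + context assembly
        prompt = f"Context:\n{context}\n\nQuestion: {query}"
        return medgemma_generate(prompt)
    # Image + text: route straight to MedGemma 1.5
    return medgemma_generate(query, image=image)
```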
What This Means for Clinical AI
The Good
- Open weights - Unlike GPT-4V or Med-PaLM, you can actually run this locally
- 4B parameters - Fits on a single GPU, practical for production
- 3D imaging support - CT/MRI interpretation was a major gap
- EHR understanding - Finally, a model that can parse clinical notes
The Limitations
Important: MedGemma is not intended for direct clinical use. All outputs require independent verification.
- Single-image evaluation only (no multi-image comparison in benchmarks)
- Not optimized for multi-turn conversations
- Prompt-sensitive (more than base Gemma; a mitigation sketch follows this list)
- English-only evaluation
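On the prompt-sensitivity point: one cheap mitigation is pinning a fixed system instruction and holding it constant across requests. The wording below is illustrative, not an official MedGemma template:

```python
# Pinning a fixed system instruction to reduce prompt-sensitivity drift.
# The wording is my own, not an official MedGemma template.
messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are an expert radiologist."}],
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe this X-ray"},
        ],
    },
]
```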
Comparison with Alternatives
```mermaid
quadrantChart
    title Model Positioning
    x-axis Closed Source --> Open Source
    y-axis General Purpose --> Medical Specialized
    quadrant-1 Best for Research
    quadrant-2 Enterprise Only
    quadrant-3 Limited Medical Use
    quadrant-4 Production Ready
    GPT-4V: [0.2, 0.4]
    Med-PaLM: [0.15, 0.85]
    LLaVA-Med: [0.75, 0.6]
    MedGemma: [0.8, 0.8]
    Gemma-3: [0.85, 0.3]
```
Next Steps
I’m planning to:
- Benchmark against our RAG system - How does MedGemma compare to Mistral-7B for clinical QA?
- Test 3D imaging - We have CT data from the MMAP pipeline
- Fine-tune for hematology - Our OncoCITE system could benefit from better image understanding
Resources
What are you building with MedGemma? I’d love to hear about your use cases in the comments below.