OncoCITE

Multi-Agent Genomic Evidence Extraction System (Submitted to Nature Cancer)

Overview

OncoCITE is a multi-agent AI system for automated genomic evidence extraction from scientific literature, currently submitted to Nature Cancer.

Try OncoCITE Live

Interactive genomic evidence extraction for Multiple Myeloma

The Problem

Through systematic exploratory data analysis of the CIViC database (11,312 evidence items, 3,083 publications), I quantified 12 structural bottlenecks, including:

  • Curation latency: median 31 days; 90th percentile (P90) over 21 months
  • Pareto inefficiency: the top 100 sources account for only 29% coverage
  • Emerging target gaps: GPRC5D has 0 evidence items despite FDA approval of a GPRC5D-targeted therapy
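
The latency and coverage figures above are summary statistics over evidence items. A minimal sketch of how such numbers could be computed, assuming each evidence item carries a curation latency in days and a source identifier (the function names and toy data are hypothetical, not taken from the CIViC analysis):

```python
from collections import Counter
from statistics import median, quantiles


def latency_summary(latencies_days):
    """Return (median, P90) of curation latencies given in days."""
    # quantiles(n=10) returns the 9 decile cut points; the last one is P90.
    p90 = quantiles(latencies_days, n=10, method="inclusive")[-1]
    return median(latencies_days), p90


def top_k_coverage(source_ids, k):
    """Fraction of evidence items contributed by the k most frequent sources."""
    counts = Counter(source_ids)
    top = sum(n for _, n in counts.most_common(k))
    return top / len(source_ids)


# Toy data, invented for illustration only.
latencies = [5, 12, 20, 31, 45, 90, 400, 700]
sources = ["pmid_a"] * 4 + ["pmid_b"] * 3 + ["pmid_c", "pmid_d", "pmid_e"]

print(latency_summary(latencies))
print(top_k_coverage(sources, k=2))
```

A long right tail, as in the toy latencies, is what pulls P90 far above the median.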

Technical Architecture

6-Agent Collaborative System

Built using the Claude Agent SDK, with orchestration features that include:

  • 22 custom MCP tools for genomic data extraction
  • State serialization enabling pause-resume for long-running extractions
  • Vision-based PDF extraction (300 DPI) replacing traditional OCR
  • Hierarchical multi-tier architecture with supervisor-worker pattern
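
To illustrate the pause-resume idea, here is a minimal sketch of state serialization using only the Python standard library. It does not use the Claude Agent SDK; the `ExtractionState` class, the `run` helper, and the step names are hypothetical and stand in for whatever the real pipeline stores.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class ExtractionState:
    """Everything needed to resume an extraction (hypothetical shape)."""
    paper_id: str
    next_step: int = 0
    outputs: dict = field(default_factory=dict)

    def save(self, path):
        # sort_keys keeps the checkpoint file stable across runs.
        Path(path).write_text(json.dumps(asdict(self), sort_keys=True))

    @classmethod
    def load(cls, path):
        return cls(**json.loads(Path(path).read_text()))


def run(state, steps, checkpoint, stop_after=None):
    """Run the remaining steps, writing a checkpoint after each one."""
    while state.next_step < len(steps):
        if stop_after is not None and state.next_step >= stop_after:
            break  # simulated pause, e.g. a timeout or manual interrupt
        name, fn = steps[state.next_step]
        state.outputs[name] = fn(state.outputs)
        state.next_step += 1
        state.save(checkpoint)
    return state
```

Resuming is then `run(ExtractionState.load(checkpoint), steps, checkpoint)`: steps that already completed are skipped because `next_step` was persisted along with their outputs.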

Novel Validation Framework

Developed a three-way validation methodology that treats source publications as ground truth. It identified a 24.2% curation error rate in expert-curated databases.
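
One way to picture a three-way comparison is sketched below. It assumes the three parties are the source publication, the curated database entry, and the system's extraction, and that values can be compared by simple equality; both are assumptions for illustration, and the verdict labels are hypothetical.

```python
def three_way_verdict(source_value, curated_value, extracted_value):
    """Classify one field, treating the source publication as ground truth."""
    curated_ok = curated_value == source_value
    extracted_ok = extracted_value == source_value
    if curated_ok and extracted_ok:
        return "agree"
    if extracted_ok:
        return "curation_error"    # database disagrees with the paper
    if curated_ok:
        return "extraction_error"  # system disagrees with the paper
    return "both_differ"


def curation_error_rate(verdicts):
    """Share of compared fields where only the curated value was wrong."""
    return verdicts.count("curation_error") / len(verdicts)


# Toy comparison, invented for illustration only.
verdicts = [
    three_way_verdict("Resistance", "Resistance", "Resistance"),
    three_way_verdict("Resistance", "Sensitivity", "Resistance"),
    three_way_verdict("Phase 2", "Phase 2", "Phase 3"),
    three_way_verdict("Phase 2", "Phase 1", "Phase 3"),
]
print(verdicts, curation_error_rate(verdicts))
```

In practice, equality would likely be replaced by a normalized or ontology-aware comparison, since the same concept can be written several ways.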

Results

Metric                       Value
Ground Truth Recovery        84%
Novel Discovery Precision    97.8%
Critical Errors              0% (n=108)
Ontology Resolution          83.12% across 20 Tier-2 fields
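
The exact metric definitions are not given here. As a hedged reading, recovery can be treated as recall against the curated items, and novel discovery precision as the share of newly extracted items that hold up on manual verification. The sketch below follows that assumed reading; the function names and item identifiers are hypothetical.

```python
def recovery_rate(curated, extracted):
    """Share of curated (ground-truth) items that the system also extracted."""
    curated = set(curated)
    return len(curated & set(extracted)) / len(curated)


def novel_precision(curated, extracted, verified):
    """Share of extracted items absent from the database that were verified."""
    novel = set(extracted) - set(curated)
    if not novel:
        return 0.0
    return len(novel & set(verified)) / len(novel)


# Toy identifiers, invented for illustration only.
curated = {"e1", "e2", "e3", "e4"}
extracted = {"e1", "e2", "e3", "n1", "n2"}
verified = {"n1"}

print(recovery_rate(curated, extracted))
print(novel_precision(curated, extracted, verified))
```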

Validated on a 15-paper corpus; the normalizer was then deployed to enrich all 11,312 CIViC items.

Key Innovations

  • State Serialization: Enables deterministic replay and debugging of multi-agent workflows
  • Cross-field Validation: Automated consistency checking across extracted fields
  • Hallucination Prevention: Input/output guardrails with reasoning chain validation
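
As an illustration of cross-field validation, the sketch below checks that an extracted item's fields are mutually consistent. The field names and the two rules are simplified assumptions loosely modeled on CIViC-style evidence items, not the project's actual rule set.

```python
# Significance values considered valid for each evidence type
# (simplified and illustrative, not an authoritative vocabulary).
ALLOWED_SIGNIFICANCE = {
    "Predictive": {"Sensitivity/Response", "Resistance",
                   "Adverse Response", "Reduced Sensitivity"},
    "Prognostic": {"Better Outcome", "Poor Outcome"},
    "Diagnostic": {"Positive", "Negative"},
}


def cross_field_issues(item):
    """Return a list of human-readable inconsistencies for one item."""
    issues = []
    etype = item.get("evidence_type")
    significance = item.get("significance")

    allowed = ALLOWED_SIGNIFICANCE.get(etype)
    if allowed is not None and significance not in allowed:
        issues.append(f"significance {significance!r} is not valid for {etype}")

    if etype == "Predictive" and not item.get("therapies"):
        issues.append("Predictive evidence should name at least one therapy")

    return issues


print(cross_field_issues({
    "evidence_type": "Predictive",
    "significance": "Better Outcome",
    "therapies": [],
}))
```

Returning a list of issues, rather than raising on the first one, lets a supervisor agent decide whether to retry the extraction or flag the item for review.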

This project resulted in the publications listed under References.

Technical Stack

Agent Framework:    Claude Agent SDK
Tools:              22 custom MCP tools
PDF Processing:     Vision-based extraction (300 DPI)
Orchestration:      State graphs, pause-resume workflows
Validation:         Three-way ground truth framework
Database:           CIViC (11,312 items)
Ontologies:         DOID, EFO, NCIt, HPO via EMBL-EBI OLS
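
For the ontology lookup, a minimal sketch of resolving a free-text term through the EMBL-EBI OLS search endpoint is shown below. The endpoint URL, query parameters, and response shape reflect my understanding of the public OLS4 REST API and should be checked against its documentation; the helper names are hypothetical and this is not the project's normalizer.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed OLS4 search endpoint; verify against the OLS documentation.
OLS_SEARCH = "https://www.ebi.ac.uk/ols4/api/search"


def build_query(term, ontology):
    """Build a search URL restricted to one ontology (e.g. 'doid')."""
    params = {"q": term, "ontology": ontology, "exact": "true", "rows": 1}
    return OLS_SEARCH + "?" + urlencode(params)


def first_hit(payload):
    """Pull (obo_id, label) from a Solr-style OLS response, or None."""
    docs = payload.get("response", {}).get("docs", [])
    if not docs:
        return None
    return docs[0].get("obo_id"), docs[0].get("label")


def resolve(term, ontology, timeout=10):
    """Network call: look up a term and return its first match, if any."""
    with urlopen(build_query(term, ontology), timeout=timeout) as resp:
        return first_hit(json.load(resp))
```

A term that returns `None` from every configured ontology would count as unresolved when computing a resolution rate.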

References

2025

  1. Nature Cancer
    OncoCITE: AI-Driven Genomic Evidence Curation for Hematologic Malignancies
    Mujahid Ali Quidwai, Santiago Thibaud, Sundar Jagannath, and 2 more authors
    Nature Cancer, 2025
    Submitted to Nature Cancer
  2. ASH
    Oncodif: An Auditable AI Framework for Automated Genomic Curation and Natural-Language Clinical Querying in Hematologic Malignancies
    Mujahid Ali Quidwai, Santiago Thibaud, Sundar Jagannath, and 2 more authors
    Blood, 2025
    ASH 2025 Conference Abstract