MMAP Pipeline
GPU-Accelerated Genomic Pipeline (56× Speedup, Same-Day Analysis)
Overview
MMAP (Multiple Myeloma Analysis Platform) is a GPU-accelerated genomic processing pipeline with 57 processes across 3 integrated workflows. Transforms patient analysis from 7 days to 3 hours, enabling same-day molecular tumor board readiness.
See the entire system architecture here
The Problem
Traditional genomic pipelines for RNA-seq and whole exome sequencing (WES) required 7+ days of processing time, delaying critical treatment decisions for oncology patients.
| Traditional CPU Pipeline | GPU-Accelerated MMAP |
|---|---|
| 168 hours (7 days) | 3 hours |
| 0.14 patients/day | 8 patients/day |
| RNA-seq: 8-12 hours | RNA-seq: 40 minutes |
| WES Alignment: 48-72 hours | WES Alignment: 2.5 hours |
| Integration: 6-8 hours | Integration: 50 minutes |
Performance
| Metric | Value |
|---|---|
| Performance Gain | 56× |
| Time Reduction | 95.8% |
| Throughput | 8 patients/day (274 cluster-wide) |
| Total Processes | 57 |
| Workflows | 3 (RNA, WES, Integration) |
| Clinical Outputs | 10 TSV/CSV files |
Pipeline Architecture
Process Categories
| Category | Processes | Description |
|---|---|---|
| RNA Primary | 5 | RNA-seq QC, alignment, quantification |
| RNA Secondary | 7 | Expression normalization, scoring, clinical annotation |
| WES Primary | 23 | DNA QC, alignment, variant calling, CNV detection |
| WES Secondary | 17 | Phylogenetics, biomarkers, clinical annotation, cohort analysis |
| Integration | 5 | Merge CNV + Expression, generate clinical recommendations |
Data Flow
RNA Pipeline (12 processes)
- FASTQ → trimmed reads → aligned BAM
- BAM → gene counts → normalized expression
- Expression → risk scores + pathway scores
- Expression → drug sensitivity predictions
- Chimeric reads → fusions → clinical annotations
WES Pipeline (36 processes)
- FASTQ → analysis-ready BAM (6 steps)
- BAM → variants (Mutect2, Lancet, Manta)
- Variants → consensus → annotated VCF
- BAM → CNVs + purity/ploidy (FACETS)
- CNVs + variants → phylogenetic trees
- Trees → clonal architecture
- Biomarkers: MSI, HRD, TMB
Integration (5 processes)
- Expression + CNV → PSN inputs
- PSN → translocation predictions (Neural Net)
- PSN → molecular subtype (Random Forest)
- All data → myeloma hallmarks check
- predict_engine → 7 clinical reports
Clinical Outputs
| Output | Description |
|---|---|
actionable_variants.tsv | Tier 1A-3 variants with evidence |
drug_recommendations.tsv | Ranked therapies by evidence level |
variant_associations.tsv | Clinical associations & trials |
somatic_mutation.tsv | All detected mutations |
cna.tsv | Copy number alterations |
expression.tsv | Clinically relevant gene expression |
venetoclax.tsv | Venetoclax sensitivity prediction |
predicted_translocations.csv | IgH translocation predictions (t(4;14), t(11;14), etc.) |
predicted_class.csv | PSN molecular subtype (PR, MS, CD1, CD2) |
mm_hallmarks_results.csv | Trial eligibility & risk category |
Key Technologies
GPU-Accelerated:
- NVIDIA Parabricks - GPU-accelerated genomics toolkit (fq2bam, HaplotypeCaller)
- NVIDIA RAPIDS - cuDF & cuML for GPU data processing
- NVIDIA H100 GPUs - 196 GPUs (80GB each) on Minerva HPC
- TensorFlow/DeepVariant - Neural network variant calling on GPU
- STAR/Salmon - RNA-seq alignment & quantification (GPU-accelerated)
- BWA-MEM/GATK - DNA alignment & variant calling (GPU via Parabricks)
Pipeline Infrastructure:
- Nextflow DSL2 - Workflow orchestration with LSF scheduler
- FACETS - CNV calling
- PhyloWGS - Tumor evolution
- VICC/CIViC - Clinical annotation
Technical Stack
GPUs: NVIDIA H100 (196 available, 80GB each)
Cluster: Minerva HPC
Accelerators: NVIDIA Parabricks, RAPIDS (cuDF, cuML)
Workflow: Nextflow DSL2
Scheduling: LSF
Containers: Singularity
Languages: Python, Bash, R