This page provides a comprehensive guide to all raw and processed data files in the osteosarcoma genomics project. Data is stored in a Google Cloud Storage bucket and organized by data type and time point. Each section attributes the data to the lab or individual who generated or processed it.
(This page was originally generated from this source Google Doc, but we have since added more details to this page that are not in the GDoc.)
gs://osteosarc-genomics

The dataset spans four clinical time points and includes whole genome sequencing (WGS), whole exome sequencing (WES), bulk RNA-seq, single-cell RNA-seq (Illumina & Oxford Nanopore), spatial transcriptomics, flow cytometry, and pathology imaging.
Contributors: Boston Gene, UCLA, UCSF, Tempus, Elucidate Bio, Hudson Lab, Jeremiah & Alfredo, RCRF / Pattern Unify
gs://osteosarc-genomics

DNA sequencing is available at four time points (T0–T3), including both whole exome (WES) and whole genome (WGS) data.
Coverage: T2 January 2025 WGS — 130.97×. T1 Blood Normal WGS — 62.56×.
| Timepoint | Callers | Bucket Path | Pipeline |
|---|---|---|---|
| Jun 2024 (T1) | ASCAT, CNVkit, FreeBayes, HaplotypeCaller, Manta, Mutect2, Strelka | genomics_reprocessing/DNA/T1_2024_WGS_sarek_variants | Sarek 3.5.1 |
| Jan 2025 (T2) | ASCAT, Manta, Mutect2, Strelka | genomics_reprocessing/DNA/T2_2025_01_WGS_sarek_variants/variant_calling | Sarek 3.5.1 |
| Timepoint | Sample | Bucket Path | Source |
|---|---|---|---|
| Nov 2022 (T0) | Tumor | genomics/genomics-bulk/2022.12.16/DNA/2022.12.16.dna.bostongene.WES/fastqs/tumor | Boston Gene |
| Nov 2022 (T0) | Normal (Blood) | genomics/genomics-bulk/2022.12.16/DNA/2022.12.16.dna.bostongene.WES/fastqs/normal-blood | Boston Gene |
| Jun 2024 (T1) | Tumor | genomics/genomics-bulk/2024.06.06/DNA/2024.06.06.dna.bostongene.WES/raw/tumor | Boston Gene |
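The bucket paths in the tables above appear to be relative to the root of the gs://osteosarc-genomics bucket. Under that assumption, a minimal sketch for turning a table entry into a full URI and a listing command might look like the following. The helper names `gs_uri` and `ls_command` are illustrative, not part of any project tooling, and the sketch only builds strings; running the command would require `gsutil` and read access to the bucket.

```python
BUCKET = "osteosarc-genomics"  # bucket name as given on this page


def gs_uri(relative_path: str, bucket: str = BUCKET) -> str:
    """Join a bucket-relative path from the tables into a full gs:// URI."""
    return f"gs://{bucket}/{relative_path.strip('/')}"


def ls_command(relative_path: str) -> str:
    """Build (but do not run) a gsutil listing command for a table path."""
    return f"gsutil ls {gs_uri(relative_path)}/"


# Paths copied from the tables above (assumed relative to the bucket root).
t2_wgs_variants = "genomics_reprocessing/DNA/T2_2025_01_WGS_sarek_variants/variant_calling"
t0_wes_tumor = "genomics/genomics-bulk/2022.12.16/DNA/2022.12.16.dna.bostongene.WES/fastqs/tumor"

print(gs_uri(t2_wgs_variants))
print(ls_command(t0_wes_tumor))
```

The same helpers apply to any other bucket path on this page, provided the relative-path assumption holds for that entry.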
Bulk RNA-sequencing is available at three tumor time points (T0, T1, T2), including both short-read (Illumina) and long-read (PacBio) data.
| Timepoint | Data Type | Bucket Path | Size | Source |
|---|---|---|---|---|
| Nov 2022 (T0) | FASTQ | rna-seq/fastq/bostongene_2022 | 10 GB | Boston Gene |
| Nov 2022 (T0) | STAR alignments | genomics/genomics-bulk/2022.12.16/RNA/2022.12.16.rna.bostongene/processed | | Alfredo |
| Nov 2022 (T0) | FASTQ | rna-seq/fastq/tempus_2022 | 1.4 GB | Tempus |
| Jun 2024 (T1) | FASTQ | rna-seq/fastq/bostongene_2024 | 8.3 GB | Boston Gene |
| Jun 2024 (T1) | FASTQ | rna-seq/fastq/personalis_2024 | | Personalis |
| Jun 2024 (T1) | STAR alignments | genomics/genomics-bulk/2024.06.06/RNA/2024.06.06.rna.bostongene/processed | | Alfredo |
| Jun 2024 (T1) | PacBio long-read (all) | ucsf/T1/pacbio_bams/IPISRC044_T1_sclrs | 287 GB | UCSF |
| Jan 2025 (T2) | FASTQ | rna-seq/fastq/ucla_2025 | | UCLA |
| Jan 2025 (T2) | STAR alignments | genomics/genomics-bulk/2025.01.06/RNA/2025.01.06.rna.ucla-core/processed/STAR | | Alfredo |
Tumor single-cell and long-read RNA sequencing data from UCSF, spanning time points T1–T3. Includes 10x Illumina scRNA-seq and Oxford Nanopore (ONT) long-read sequencing.
All Illumina scRNA-seq data were generated by UCSF.
Re-analysis of tumor scRNA-seq data by Kamil, including Cell Ranger multi outputs (GEX + TCR + BCR), scanpy-based clustering, cell type prediction, and CNV analysis.
| Timepoint | Data Type | Bucket Path |
|---|---|---|
| Jun 2024 (T1) | Cell Ranger output | kamil/tumor/output/T1 |
| Jan 2025 (T2) | Cell Ranger output | kamil/tumor/output/T2 |
| Apr 2025 (T3) | Cell Ranger output | kamil/tumor/output/T3 |
| Apr 2025 (T3) | Cell Ranger output (CD45-) | kamil/tumor/output/T3_CD45neg |
| T1–T3 | Analysis (h5ad, markers, CNV, TCR) | kamil/tumor/analysis |
| Timepoint | Data Type | Bucket Path | Size |
|---|---|---|---|
| Jun 2024 (T1) | FASTQ | ONT/IPISRC044_ONT_upload/IPISRC044_ONT/fastqs/IPISRC044_T1_ONT_fastqs/fastq_pass | 208 GB |
| Jun 2024 (T1) | BAMs | ONT/IPISRC044_ONT_upload/IPISRC044_ONT/processed/IPISRC044_T1_sclrs_ONT/IPISRC044_T1_sclrs_ONT | 109 GB |
| Jan 2025 (T2) | FASTQ | ONT/IPISRC044_ONT_upload/IPISRC044_ONT/fastqs/IPISRC044_T2_ONT_fastqs/fastq_pass | 211 GB |
| Jan 2025 (T2) | BAMs | ONT/IPISRC044_ONT_upload/IPISRC044_ONT/processed/IPISRC044_T2_sclrs_ONT/IPISRC044_T2_sclrs_ONT | 153 GB |
| Apr 2025 (T3) | FASTQ | ONT/IPISRC044_ONT_upload/IPISRC044_ONT/fastqs/IPISRC044_T3_ONT_fastqs/fastq_pass | 219 GB |
| Apr 2025 (T3) | BAMs | ONT/IPISRC044_ONT_upload/IPISRC044_ONT/processed/IPISRC044_T3_sclrs_ONT/IPISRC044_T3_sclrs_ONT | 150 GB |
All ONT data were generated by UCSF.
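Because the ONT files are large, it can help to total the sizes before planning a download. The sketch below simply sums the figures from the table above; the list layout and the helper name `total_by` are illustrative, and the sizes are the rounded values shown on this page rather than exact object sizes.

```python
# (timepoint, data type, size in GB) copied from the ONT table above.
ont_sizes_gb = [
    ("T1", "FASTQ", 208),
    ("T1", "BAMs", 109),
    ("T2", "FASTQ", 211),
    ("T2", "BAMs", 153),
    ("T3", "FASTQ", 219),
    ("T3", "BAMs", 150),
]


def total_by(index: int) -> dict:
    """Sum sizes grouped by the tuple field at `index` (0 = timepoint, 1 = data type)."""
    totals = {}
    for row in ont_sizes_gb:
        key = row[index]
        totals[key] = totals.get(key, 0) + row[2]
    return totals


by_type = total_by(1)
by_timepoint = total_by(0)
grand_total = sum(size for _, _, size in ont_sizes_gb)

print(by_type)       # {'FASTQ': 638, 'BAMs': 412}
print(by_timepoint)  # {'T1': 317, 'T2': 364, 'T3': 369}
print(grand_total)   # 1050
```

On these figures, the full ONT set is roughly 1 TB, with the FASTQ files accounting for about 60% of it.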
| Description | Bucket Path | Size |
|---|---|---|
| Loupe object for T1-T2-T3 merge | ucsf/misc_seurat_loupe_objects/T3-T2-T1-merge.loupe.object.UCSF.annot/ucsf_seurat_20250806_100527.cloupe.cloupe | 0.4 GB |
| Seurat RDS merged (Harmony-integrated) | ucsf/misc_seurat_loupe_objects/src044_for_pfo/072925_IPISRC044_T1_T2_T3_sobj_merged_processed_tcr_bcr_mutMap_tcMap_annot_final.rds | 5.3 GB |
Annotations guide:

- Tumor cell labels are in T1_T2_T3_overall_TC_identity.
- Final cell type annotations are in fine_final_annot.
- Cluster labels are in merged_louvain_res1.5.
- Coarse annotations are in coarse_final_annot.

Key findings: Tumor cell percentage (relative to the whole sample) decreases by T3, consistent with histopathology, and immune infiltration increases across time points. See Darya Orlova’s analysis.
Sorted live T cells (CD3+) from PBMCs at multiple time points. This dataset is growing as additional time points are collected. Data includes up to 2026-01-13.
- hudson_lab/PBMC_scRNAseq/FASTQ: time points Jan 2025, Apr 2025, May 2025, Jun 2025, Jul 2025, Aug 2025, Sep 2025, Nov 2025, Dec 2025
- hudson_lab/PBMC_scRNAseq/cellranger
- hudson_lab/PBMC_scRNAseq/seurat_objects
- hudson_lab/flow_data
- hudson_lab/peptide_expansion

Data generated by Hudson Lab

Re-analysis of PBMC scRNA-seq data by Kamil, including Cell Ranger multi outputs for GEX, TCRαβ, TCRγδ, and CITE-seq across multiple blood draw time points.
- kamil/blood/output: GEX, TCRαβ, TCRγδ, CITE-seq pools for each time point
- kamil/blood/analysis: combined CNV (h5ad), TCR clonotype analysis, database matching

Generated by Elucidate Bio. Includes multiplexed IF data from a PhenoCycler-Fusion instrument with custom-conjugated antibodies, plus Visium HD run on the same section.

- elucidate/SIN-01_20250915

Note: The Visium data is quite sparse. Web summaries: Sample 1 · Sample 2. Elucidate extracted signal from Visium by pseudobulking RNA counts by cell type, with cell types called from the protein data. See presentation.

Xenium in-situ spatial transcriptomics for T0 tissue blocks B3 and C3.
- xenium/T0/output-XETG00279__0078011__B3__20260211__182948: Block B3
- xenium/T0/output-XETG00279__0078018__C3__20260211__182947: Block C3
- xenium/cartoscope: Cartoscope visualization output

ORION highly multiplexed immunofluorescence imaging and Minerva story visualizations.

- hms_spatial/orion: ORION OME-TIFF images
- hms_spatial/minerva: Minerva stories for H&E blocks, ORION, and IHC (B7-H3, EphA2)

Histopathology slides and immunohistochemistry images. Viewable at osteosarc.com/imaging.

| Timepoint | Blocks / Stain | Bucket Path | Size | Source |
|---|---|---|---|---|
| Nov 2022 (T0) | H&E — B1, B2, B3, B4, C3 | elucidate/HE_images/ | 16.4 GB | Elucidate (scanned) |
| Nov 2022 (T0) | H&E — B9, B10, B12, B14, B15, B16, C1, C3, D1, D2, D3 | pathology_images/czi_scans_ucsf_2022 | 10 GB | UCSF (Zeiss) |
| Apr 2025 (T3) | B7-H3 IHC | | | UCSF |
| Apr 2025 (T3) | EphA2 IHC | | | UCSF |
Red Cross HLA typing results.
| HLA-A | HLA-B | HLA-C |
|---|---|---|
| A*01:01 | B*08:01 | C*01:02 |
| A*01:11N | B*27:05 | C*07:01 |
| Locus | Allele 1 | Allele 2 |
|---|---|---|
| HLA-DPA1 | *01:03 | *02:01 |
| HLA-DPB1 | *04:01 | *11:01 |
| HLA-DQA1 | *05:01 | *04:01 |
| HLA-DQB1 | *04:02 | *02:01 |
| HLA-DRB1 | *03:01 | *08:01 |
| HLA-DRB3 | *01:01 | *01:01 |
| HLA-DRB4 | Absent | |
| HLA-DRB5 | Absent | |
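Downstream tools generally expect full allele names such as `HLA-A*01:01` rather than the split form used in the tables. The sketch below assembles those names from the typing results above; the dictionary layout and variable names are illustrative. It also separates out alleles with an `N` suffix, which in standard HLA nomenclature marks a null (non-expressed) allele. Whether such an allele should be excluded depends on the analysis, so treat the filter as an assumption.

```python
# Class I typing, copied from the first table (one list per locus).
class_i = {
    "A": ["A*01:01", "A*01:11N"],
    "B": ["B*08:01", "B*27:05"],
    "C": ["C*01:02", "C*07:01"],
}

# Class II typing, copied from the second table. Loci reported as
# "Absent" (DRB4, DRB5) are left out.
class_ii = {
    "DPA1": ["*01:03", "*02:01"],
    "DPB1": ["*04:01", "*11:01"],
    "DQA1": ["*05:01", "*04:01"],
    "DQB1": ["*04:02", "*02:01"],
    "DRB1": ["*03:01", "*08:01"],
    "DRB3": ["*01:01", "*01:01"],
}

# Class I entries already carry the locus letter; class II entries do not.
all_alleles = [f"HLA-{a}" for alleles in class_i.values() for a in alleles]
all_alleles += [f"HLA-{locus}{a}" for locus, alleles in class_ii.items() for a in alleles]

# Drop repeated names (e.g. a locus typed with the same allele twice), keeping order.
unique = list(dict.fromkeys(all_alleles))

# Alleles with an "N" suffix are null alleles in HLA nomenclature.
null_alleles = [a for a in unique if a.endswith("N")]
expressed = [a for a in unique if not a.endswith("N")]

print(expressed)
print(null_alleles)
```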
An R Shiny app compares bulk RNA-seq data to healthy tissues (GTEx) and other cancers (TCGA).
All data is stored in the gs://osteosarc-genomics bucket.
Several datasets are indexed in the RCRF Pattern Unify system for programmatic access.
Please contact us with any questions or comments.