St. Jude Cloud Paper

January 6, 2021

Genomic Data-Sharing Ecosystem for the Precision Medicine Era

St. Jude Children’s Research Hospital created the largest cloud-based genomic resource for pediatric cancer and a data-sharing model to accelerate life-saving research.

St. Jude Cloud is the largest cloud-based genomic data resource for pediatric cancer. Since its launch in April 2018, it has grown to host more than 1.25 petabytes of genomic data for researchers. But it is not just the genomic data that has grown. The site now hosts suites of analysis and visualization tools that accelerate research by sparing users from analyzing raw data from scratch.

Key Findings

  • St. Jude Cloud launched in April 2018 as a cloud-based, genomic data-sharing platform to improve diagnosis, treatment and outcomes for young cancer patients. St. Jude scientists developed the platform in collaboration with Microsoft and DNAnexus.
  • St. Jude Cloud has since grown enormously in size, scope and function. Today, the platform is a data-sharing ecosystem for accessing, analyzing and visualizing genomic data from childhood cancer patients and survivors as well as children with sickle cell disease.
  • St. Jude Cloud is the largest cloud-based genomic data resource for pediatric cancer. The platform offers 1.25 petabytes of harmonized genomic data, enough to fill the storage of about 4,900 laptop computers.
  • St. Jude Cloud currently includes more than 12,104 whole genomes, 7,697 whole exomes and 2,202 transcriptomes. At its launch, St. Jude Cloud included more than 5,000 whole genomes and whole exomes and 1,200 transcriptomes.
  • Additional genomic data from the St. Jude clinical genomics program are added regularly, including curated but unpublished data. The goal is to accelerate research by making data available to the scientific community more quickly. This is particularly important in pediatric cancer research, where more than half of cases involve rare cancer subtypes, which makes the work more challenging.
  • St. Jude Cloud data are available at no cost to researchers worldwide. The site attracts about 10,000 unique users per month.
  • The platform includes three interconnected apps designed to make the site user-friendly for investigators without formal computational training:
    • Genomics Platform provides registered users with access to harmonized raw data plus end-to-end genomic analysis, including some that uses algorithms developed by St. Jude scientists for integrated analysis of St. Jude Cloud data.
    • Pediatric Cancer Knowledgebase (PeCan) enables exploration of published data contributed by the global research community on acquired (somatic) cancer genomic variations in more than 5,000 pediatric patients.
    • Visualization Community allows all users to explore their own data alongside the published data using interactive maps and visualization tools developed by St. Jude scientists. The tools give researchers a more integrated view of childhood cancer data, including genomic and epigenomic data plus clinical information.
  • St. Jude Cloud was created as a resource and model of federated data-sharing for the global research community.
  • The publication comes amid ongoing efforts by researchers, institutions, organizations and government agencies to enhance pediatric cancer data sharing.