The Cancer Genome Atlas (TCGA) is a project to catalogue the genomic alterations responsible for cancer using genome sequencing and bioinformatics.[1][2] The overarching goal was to apply high-throughput genome analysis techniques to improve the ability to diagnose, treat, and prevent cancer through a better understanding of the genetic basis of the disease.
TCGA was supervised by the National Cancer Institute's Center for Cancer Genomics and the National Human Genome Research Institute, both funded by the US government. A three-year pilot project, begun in 2006, focused on characterizing three types of human cancer: glioblastoma multiforme, lung squamous carcinoma, and ovarian serous adenocarcinoma.[3] In 2009, the project expanded into phase II, which planned to complete the genomic characterization and sequence analysis of 20–25 different tumor types by 2014. Ultimately, TCGA surpassed that goal, characterizing 33 cancer types, including 10 rare cancers.[4][5]
The project initially set out to collect and characterize 500 patient samples, more than most genomics studies of its time, and used a variety of molecular techniques. These included gene expression profiling, copy number variation profiling, SNP genotyping, genome-wide DNA methylation profiling, microRNA profiling, and exon sequencing. Given the constraints of nascent technology and high costs at the start of the project, the early work relied on array-based technologies and limited targeted gene sequencing. During phase II, TCGA was able to begin performing whole exome and whole transcriptome sequencing on all cases and whole genome sequencing on 10% of the cases used in the project.