Cancer Fundamentals

Created by Lily Vittayarukskul for SVAI research community. Open to collaborators!

What causes cancer?

At the molecular level, the main causes of cancer include:

  1. Mutations in proto-oncogenes that turn them into oncogenes, pushing the normal cell cycle toward uncontrolled cell division

  2. Mutations in tumor suppressor genes that disable their cell-regulatory mechanisms

  3. Mutations in DNA-repair genes that impair repair, so that further mutations accumulate instead of being corrected

Mutation patterns in cancer genomes

(Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2799788/)

Substitution rates depend on the flanking nucleotides. A notable example is cytosine in CpG dinucleotides, which in mammals is usually methylated at the 5-carbon and undergoes hydrolytic deamination to thymine at a relatively high rate.

However, it remains possible that many mutations occurring during cancer growth are deleterious to the cell and consequently eliminated by selection. If so, the set of passenger mutations would not faithfully reflect the underlying mutation process. Since selection on germline mutations in coding sequences acts mainly at the amino acid level (24), we assume that this is also true of somatic mutations and that we can therefore explore the effects of selection by comparing frequencies of nonsynonymous and synonymous substitutions. Using data from two studies, where both types of substitution were catalogued (4, 5), we find the overall nonsynonymous/synonymous frequency ratios for pancreatic cancer to be 0.95, and for glioblastoma multiforme to be 1.10, neither of which is significantly different from 1 (P = 0.57 and 0.43, respectively). This indicates that (in contrast to germline mutations) the set of nonsynonymous mutations in cancer is not strongly biased by selection.

While selection certainly acts on some mutations, the set of mutations in cancer cells is not significantly biased by negative or positive selection. We therefore assume in the following that the sets of nonsynonymous nucleotide substitutions reported by cancer sequencing studies mainly reflect the underlying mutation process that generated them.
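The comparison of nonsynonymous (N) and synonymous (S) frequencies can be sketched as follows. This is a minimal illustration, not the authors' procedure: the source does not say which test produced P = 0.57 and 0.43, so the exact two-sided binomial test here is an assumption. The expected nonsynonymous fraction (the share of substitutions that would be nonsynonymous with no selection, which depends on the mutation spectrum and codon composition) must be supplied by the user. The function names are hypothetical and the counts in the usage note are made up.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials (log space avoids overflow)."""
    log_pmf = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1) + k * log(p) + (n - k) * log(1 - p)
    return exp(log_pmf)


def binom_two_sided_p(k, n, p):
    """Exact two-sided p-value: total probability of outcomes no more likely than k."""
    probs = [binom_pmf(i, n, p) for i in range(n + 1)]
    observed = probs[k]
    # small relative tolerance so outcomes tied with k (up to rounding) are included
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


def ns_ratio_test(n_nonsyn, n_syn, expected_nonsyn_fraction):
    """Compare observed N and S counts with the neutral expectation (hypothetical helper).

    Returns (normalised N/S ratio, two-sided p-value). A ratio near 1 with a large
    p-value means no detectable bias from selection.
    """
    p0 = expected_nonsyn_fraction
    ratio = (n_nonsyn / n_syn) / (p0 / (1 - p0))
    p_value = binom_two_sided_p(n_nonsyn, n_nonsyn + n_syn, p0)
    return ratio, p_value
```

For example, `ns_ratio_test(75, 25, 0.75)` returns a ratio of 1.0 (observed N/S equals the neutral expectation) and a p-value near 1, whereas a strong excess of nonsynonymous mutations such as `ns_ratio_test(95, 5, 0.5)` gives a very small p-value.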

A→G/T→C Mutational Asymmetry: a higher frequency of A→G mutations (relative to T→C) is associated with higher expression of the gene.
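One way to quantify this asymmetry is sketched below. It assumes (our assumption, not stated in the source) that substitutions are called on the + strand of the reference genome and that the asymmetry is measured on each gene's coding (sense) strand; if the convention is the template strand instead, the two counts swap. `to_coding_strand` and `ag_tc_asymmetry` are hypothetical helpers.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def to_coding_strand(gene_strand, ref, alt):
    """Re-express a + strand substitution relative to the gene's coding strand."""
    if gene_strand == "-":
        # genes on the minus strand are read from the complementary strand
        return COMPLEMENT[ref], COMPLEMENT[alt]
    return ref, alt


def ag_tc_asymmetry(mutations):
    """mutations: iterable of (gene_strand, ref, alt) for substitutions inside genes.

    Returns (A>G count, T>C count, A>G / T>C ratio), all on the coding strand.
    """
    counts = Counter(to_coding_strand(strand, ref, alt) for strand, ref, alt in mutations)
    a_to_g, t_to_c = counts[("A", "G")], counts[("T", "C")]
    ratio = a_to_g / t_to_c if t_to_c else float("inf")
    return a_to_g, t_to_c, ratio
```

Computing `ag_tc_asymmetry` separately for genes grouped by expression level would show whether the ratio rises with expression, as described above.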

Dinucleotide Hotspots as a Signature: TpC/GpA dinucleotides are a mutation hotspot in a subset of cancer types, and CpG dinucleotides are a mutation hotspot in breast cancer, colorectal cancer, pancreatic cancer, and glioblastoma (1, 35). The relative fractions of mutations occurring in TpC and CpG hotspots vs. other sites may thus be viewed as a “signature” that presumably reflects the nature of the mutational mechanisms in different cancers or their precursor somatic tissues. As such, it may provide a useful tool for monitoring data quality in large sequencing studies.
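The hotspot “signature” can be computed as the fraction of substitutions in each context. The sketch below assumes each substitution is given as its 5’ base, reference base, and 3’ base on the + strand, and it recognizes a hotspot on either strand (TpC read as GpA on the other strand; the G of a CpG). A TCG site belongs to both hotspots, so the three fractions need not sum to 1. The function names are hypothetical.

```python
def in_tpc(five, ref, three):
    """C preceded by T (TpC), or the same site read on the other strand (G followed by A)."""
    return (ref == "C" and five == "T") or (ref == "G" and three == "A")


def in_cpg(five, ref, three):
    """C followed by G (CpG), or the G of a CpG (G preceded by C)."""
    return (ref == "C" and three == "G") or (ref == "G" and five == "C")


def hotspot_fractions(mutations):
    """mutations: list of (five_prime_base, ref_base, three_prime_base), one per substitution."""
    n = len(mutations)
    tpc = sum(in_tpc(*m) for m in mutations)
    cpg = sum(in_cpg(*m) for m in mutations)
    other = sum(not (in_tpc(*m) or in_cpg(*m)) for m in mutations)
    return {"TpC": tpc / n, "CpG": cpg / n, "other": other / n}
```

Comparing these fractions between cohorts, or between batches of one study, is the kind of data-quality check suggested above.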

Developing Cancer Signatures

(Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3588146/)

A total of 183,916 somatically acquired base substitutions were identified (see Table S1B for hyperlinks). In protein coding regions, there were 1,372 missense, 117 nonsense, 2 stop-lost, 37 essential splice-site, and 521 silent mutations.

Of the 2,869 indels identified, 2,233 were deletions, 544 insertions and 92 complex. There were 21 coding indels, of which 15 were predicted to result in a translational frameshift and six were in-frame. In addition, 1,192 structural variants (rearrangements), 16 homozygous deletions, and 14 regions of increased copy number (amplifications) were identified (Table S1C).
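The counts reported in the two paragraphs above are internally consistent, and a quick tally makes the proportions explicit: only about 1% of the substitutions fall in protein-coding regions. The snippet only reorganises the published numbers; the dictionary layout and the function name are ours.

```python
def summarize_counts():
    """Tally the reported counts and return a few derived quantities."""
    total_substitutions = 183_916
    coding = {"missense": 1_372, "nonsense": 117, "stop-lost": 2,
              "essential splice-site": 37, "silent": 521}
    indel_types = {"deletion": 2_233, "insertion": 544, "complex": 92}
    coding_indels = {"frameshift": 15, "in-frame": 6}

    coding_total = sum(coding.values())
    return {
        "coding_substitutions": coding_total,
        "coding_fraction": coding_total / total_substitutions,
        "missense_to_silent": coding["missense"] / coding["silent"],
        "indels": sum(indel_types.values()),
        "coding_indels": sum(coding_indels.values()),
    }


summary = summarize_counts()
print(summary["coding_substitutions"])              # 2049
print(f"{summary['coding_fraction']:.1%}")          # 1.1%
print(f"{summary['missense_to_silent']:.2f}")       # 2.63
print(summary["indels"], summary["coding_indels"])  # 2869 21
```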

Likely driver substitutions and indels in cancer genes were found in TP53, GATA3, PIK3CA, MAP2K4, SMAD4, MLL2, MLL3, and NCOR1 (Table S1C). Amplification was observed over cancer genes previously implicated in breast cancer development including ERBB2, CCND1, MYC, MDM2, ZNF217, and ZNF703 and a homozygous deletion involving MAP2K4 was identified. All tumors derived from BRCA1 or BRCA2 germline mutation carriers showed loss of wild-type haplotypes at 17q21 or 13q12, respectively, as expected of recessive cancer genes (Table S1B).

The set of somatic mutations in a cancer genome is the aggregate outcome of one or more mutational processes. Each process leaves a mutation signature on the cancer genome defined by the mechanisms of DNA damage and repair that constitute it. The final catalog of mutations is determined by the strength and duration of exposure to each mutational process. We set out to extract the mutation signatures characterizing the mutational processes operative in the 21 breast cancers studied.

This paper provides a systematic computational framework that can be used for accurately deciphering signatures of mutational processes from mutational catalogs of cancer genomes.

In this study, the signature of a mutational process is represented as a discrete probability distribution (a probability mass function) over a domain of preselected mutation features.

Different cancer genomes can be exposed to a particular mutational process at different intensities. For example, a mutational process could cause 1,000 mutations in one cancer genome while causing 20,000 in another. A cancer somatic mutation catalog can be examined as a linear superposition of the signatures and intensities of exposure of mutational processes active at some point in the lineage of cells leading to the cancer cell, plus added noise due to nonsystematic sequencing or analysis errors.
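In matrix form, this model says that the catalog M (mutation types × genomes) is approximately P (mutation types × processes, each column a signature) times E (processes × genomes, the exposures), plus noise. The sketch below builds catalogs that way. The names M, P, E, the example numbers, and the Gaussian noise term are illustrative assumptions; real sequencing noise is not Gaussian.

```python
import random


def simulate_catalogs(signatures, exposures, noise_sd=0.0, seed=0):
    """Build mutational catalogs as a linear superposition of signatures.

    signatures: K x N nested list; column n is the probability mass function of
                process n over K mutation types (each column sums to 1).
    exposures:  N x G nested list; exposures[n][g] is the number of mutations
                process n contributed to genome g.
    Returns a K x G catalog with catalog[k][g] = sum_n P[k][n] * E[n][g] + noise.
    """
    rng = random.Random(seed)
    n_types, n_processes = len(signatures), len(signatures[0])
    n_genomes = len(exposures[0])
    catalog = []
    for k in range(n_types):
        row = []
        for g in range(n_genomes):
            expected = sum(signatures[k][n] * exposures[n][g] for n in range(n_processes))
            noisy = expected + (rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0)
            row.append(max(noisy, 0.0))  # mutation counts cannot be negative
        catalog.append(row)
    return catalog


# Hypothetical example: 3 mutation types, 2 processes, 2 genomes.
P = [[0.6, 0.1],
     [0.3, 0.1],
     [0.1, 0.8]]
E = [[1_000, 20_000],  # process 1: 1,000 mutations in genome 1, 20,000 in genome 2
     [500, 500]]       # process 2: equal exposure in both genomes
M = simulate_catalogs(P, E)
```

Because every column of `P` sums to 1, each genome's total mutation count equals the sum of its exposures: 1,500 for genome 1 and 20,500 for genome 2. The same process thus contributes 1,000 mutations to one genome and 20,000 to the other, as in the example in the text.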

(LOOK AT THIS AGAIN; STOPPED AT THIS GRAPH)

Important definitions

  • A missense mutation is a point mutation in which a single nucleotide change results in a codon that codes for a different amino acid.

  • A nonsense mutation is a point mutation that results in a premature stop codon (a nonsense codon) in the transcribed mRNA, and thus in a truncated, incomplete, and usually nonfunctional protein product.

  • A stop-loss (stop-lost) mutation is a point mutation that changes the normal stop codon into a codon that encodes an amino acid, so translation continues past the normal end of the protein.

  • A splice-site mutation is a genetic mutation that inserts, deletes, or changes a number of nucleotides at the specific site where splicing takes place during the processing of precursor messenger RNA into mature messenger RNA.

  • Silent (synonymous) mutations are coding mutations that do not change the encoded amino acid and usually have no observable effect on the organism's phenotype. They are generally treated as a type of neutral mutation.

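  • An indel is an insertion or deletion of nucleotides.

  • Complex indels are formed by simultaneously deleting and inserting DNA fragments of different sizes at a common genomic location.

  • A frameshift mutation is an indel whose length is not a multiple of three. It shifts the reading frame used during translation, changing every codon downstream of the indel and usually producing a premature stop codon. (This differs from translational or ribosomal frameshifting, a programmed shift of the ribosome's reading frame during normal protein translation.)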

  • Homozygous deletion means that both copies of a region (usually a gene) of interest are deleted, leaving zero copies in the genome.

  • Heterozygous deletion means that one of the two copies of a region (usually a gene) of interest is deleted.

    • This can be harmful if the remaining copy is itself mutated or inactivated (loss of heterozygosity), or if both copies are needed for normal function (haploinsufficiency).

  • Structural variation (SV) is generally defined as a region of DNA approximately 1 kb and larger in size and can include inversions and balanced translocations or genomic imbalances (insertions and deletions), commonly referred to as copy number variants (CNVs).
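The coding-consequence categories defined above, and the copy-number terms, can be illustrated with a small classifier. This is a simplified sketch, not a real annotation tool: it assumes the codon is already read on the coding strand, uses the standard genetic code, ignores splice sites (which require exon-boundary annotation), and labels copy number for a diploid genome only. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order, first base varying slowest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def classify_snv(codon, position, alt):
    """Classify a single-base change at `position` (0-2) of a coding-strand codon."""
    ref_aa = CODON_TABLE[codon]
    alt_aa = CODON_TABLE[codon[:position] + alt + codon[position + 1:]]
    if ref_aa == alt_aa:
        return "silent"        # same amino acid (or stop kept as stop)
    if alt_aa == "*":
        return "nonsense"      # premature stop codon
    if ref_aa == "*":
        return "stop-lost"     # normal stop now encodes an amino acid
    return "missense"


def classify_coding_indel(length):
    """Indels whose length is not a multiple of 3 shift the reading frame."""
    return "in-frame" if length % 3 == 0 else "frameshift"


def classify_copy_number(copies):
    """Label the copy number of a region in a diploid genome (simplified)."""
    if copies == 0:
        return "homozygous deletion"
    if copies == 1:
        return "heterozygous deletion"
    if copies == 2:
        return "normal"
    return "gain"
```

For example, `classify_snv("TGG", 2, "A")` gives `"nonsense"` (Trp to stop), `classify_snv("GAA", 2, "G")` gives `"silent"` (both Glu), and `classify_coding_indel(2)` gives `"frameshift"`.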

A Solid Example: Mutational Processes Molding the Genomes of 21 Breast Cancers

(Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3414841/)

“Most somatic mutations in cancers are thought to be “passenger” events that do not contribute to cancer development. These bystanders bear the imprints of the DNA damage and repair processes operative during the development of the cancer, unmodified by selection. The several hundreds to tens of thousands of somatic mutations in each cancer, therefore, potentially allow much greater resolution of mutational patterns and insights into underlying mutational processes.”

  • We employed a nonnegative matrix factorization (NMF) and model selection approach (Berry et al., 2007) to extract mutational signatures from the 21 cases. NMF extracts interpretable features from complex multidimensional data (Berry et al., 2007; Lee and Seung, 1999). For example, application to images of faces yields familiar components such as eyes, nose, and mouth (Lee and Seung, 1999). Our desire to extract biologically meaningful mutational signatures, as well as the intrinsic nonnegativity of the mutation spectrum data, renders NMF an appropriate choice for factorizing the data from the 21 cases.

  • Evaluation of NMF decompositions (Berry et al., 2007) (Extended Experimental Procedures and Figures S1A–S1C) suggested that a best estimate of five biologically distinct mutational signatures were present in the 21 cancers (named A–E, Figure 2A). Each signature was characterized by a different profile of the 96 potential trinucleotide mutations and contributed to a different extent to each of the 21 cancers. Different combinations of the five signatures account for the variation in the 21 mutational catalogs (Figure 1D).

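  • Each mutation type is defined by the substitution class and by the two bases immediately 5’ and 3’ of the mutated base. There are six substitution classes (C>A, C>G, C>T, T>A, T>C, T>G) because each substitution is referred to the pyrimidine of the mutated Watson-Crick base pair (for example, G>A is counted as C>T on the opposite strand). The flanking bases are included because substitution rates depend on sequence context, as with C>T at CpG. This generates 96 possible mutation types (6 types of substitution × 4 types of 5’ base × 4 types of 3’ base). Mutational signatures are displayed and reported based on the observed trinucleotide frequency of the human genome.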
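The two steps described in this list, building each genome's 96-channel catalog and factorizing the catalogs with NMF, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the published pipeline: it uses Lee and Seung's basic multiplicative update rules for squared error and omits the model selection over the number of signatures (Berry et al., 2007) used to arrive at five signatures. All function names are hypothetical.

```python
import random

BASES = "ACGT"
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# 6 substitution classes x 4 five-prime bases x 4 three-prime bases = 96 channels, e.g. "A[C>T]G"
CHANNELS = [f"{five}[{sub}]{three}" for sub in SUBSTITUTIONS for five in BASES for three in BASES]


def channel(five, ref, alt, three):
    """Name the channel of one substitution, referred to the pyrimidine of the base pair."""
    if ref in "AG":  # purine reference: read the mutation from the opposite strand
        five, ref, alt, three = COMPLEMENT[three], COMPLEMENT[ref], COMPLEMENT[alt], COMPLEMENT[five]
    return f"{five}[{ref}>{alt}]{three}"


def catalog(mutations):
    """Count one genome's (five, ref, alt, three) substitutions into a 96-entry vector."""
    index = {name: i for i, name in enumerate(CHANNELS)}
    counts = [0] * len(CHANNELS)
    for five, ref, alt, three in mutations:
        counts[index[channel(five, ref, alt, three)]] += 1
    return counts


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, k, iterations=1000, seed=0, eps=1e-12):
    """Factor nonnegative v (types x genomes) into w (types x k) and h (k x genomes)
    with multiplicative updates that do not increase the squared Frobenius error."""
    rng = random.Random(seed)
    w = [[rng.random() for _ in range(k)] for _ in range(len(v))]
    h = [[rng.random() for _ in range(len(v[0]))] for _ in range(k)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[x * n / (d + eps) for x, n, d in zip(*rows)] for rows in zip(h, num, den)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[x * n / (d + eps) for x, n, d in zip(*rows)] for rows in zip(w, num, den)]
    # Rescale so each signature (column of w) sums to 1; move the scale into h (exposures).
    for j in range(k):
        s = sum(row[j] for row in w) or 1.0
        for row in w:
            row[j] /= s
        h[j] = [x * s for x in h[j]]
    return w, h


def frobenius_error(v, w, h):
    wh = matmul(w, h)
    return sum((a - b) ** 2 for ra, rb in zip(v, wh) for a, b in zip(ra, rb)) ** 0.5
```

With real data, `v` would be the 96 × 21 matrix whose columns are `catalog(...)` vectors for the 21 cancers, and the columns of `w` would be candidate signatures. In practice NMF is run many times from different random starts and for several values of `k` before one is chosen.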

STOPPED AT: ‘Extracting Mutation Signatures from Catalogues of Somatic Mutation’ section
