Genomics of Drug Sensitivity in Cancer Database: GDSC


Summary

Alterations in the cancer genome strongly influence clinical response to treatment and, in many cases, are effective biomarkers of drug response. The Genomics of Drug Sensitivity in Cancer (GDSC) database (www.cancerRxgene.org) is the largest public resource for information on drug sensitivity in cancer cells and molecular markers of drug response, and its data are freely available without restriction. GDSC currently contains drug sensitivity data from nearly 75,000 experiments, describing the response to 138 anticancer drugs across nearly 700 cancer cell lines. To identify molecular markers of drug response, the cell line drug sensitivity data are integrated with large genomic datasets obtained from the Catalogue of Somatic Mutations in Cancer (COSMIC) database, including somatic mutations in cancer genes, gene amplifications and deletions, tissue type and transcriptional data. GDSC data are analyzed through a web portal focused on identifying molecular biomarkers of drug sensitivity, based on queries for specific anticancer drugs or cancer genes. Graphical representations of the data are used throughout, with links to related resources, and all datasets are fully downloadable. GDSC is a unique resource that combines large drug sensitivity and genomic datasets to facilitate the discovery of new therapeutic biomarkers for cancer treatment.

△ Screenshot of the database homepage


1 The GDSC database is based on three types of data sets


1. Cell line drug sensitivity data

Cancer cell line drug sensitivity data are generated by high-throughput screening carried out by the Cancer Genome Project at the Wellcome Trust Sanger Institute (WTSI) and the Center for Molecular Therapeutics at Massachusetts General Hospital, using a collection of >1000 cell lines. The compounds selected for screening are anticancer therapeutics, comprising both targeted agents and cytotoxic chemotherapeutics. They include drugs approved for clinical use, drugs in clinical development or clinical trials, and tool compounds in early development. They cover a wide range of targets and processes involved in cancer biology, including receptor tyrosine kinase signaling, cell cycle control, the DNA damage response and the cytoskeleton. The compounds are obtained from commercial suppliers or provided by collaborators in academia, biotechnology and the pharmaceutical industry. Drug sensitivity is measured with a fluorescence-based cell viability assay after 72 hours of drug treatment. A dose-response curve is fitted to the fluorescence signal intensities across nine drug concentrations (a 2-fold dilution series) to obtain a multi-parameter description of the drug response. The values displayed on the website include the half-maximal inhibitory concentration (IC50), the slope of the dose-response curve and the area under the curve for each experiment.
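To make the reported quantities concrete, here is a minimal Python sketch that builds a nine-point, 2-fold dilution series, simulates viability with a simple two-parameter sigmoid, and derives an IC50 estimate and a normalized area under the curve. The sigmoid form, the parameter values, the concentration range and all function names are illustrative assumptions; the curve-fitting procedure actually used by GDSC is not reproduced here.

```python
import math

def dilution_series(max_conc_um, n_points=9, fold=2.0):
    """Return concentrations from highest to lowest for a fold-dilution series."""
    return [max_conc_um / fold**i for i in range(n_points)]

def sigmoid_viability(conc, ic50, slope):
    """Fraction of viable cells under a simple logistic model (an assumption)."""
    return 1.0 / (1.0 + (conc / ic50) ** slope)

def estimate_ic50(concs, viabilities):
    """Interpolate, on a log-concentration scale, where viability crosses 0.5."""
    points = sorted(zip(concs, viabilities))  # ascending concentration
    for (c_lo, v_lo), (c_hi, v_hi) in zip(points, points[1:]):
        if v_lo >= 0.5 >= v_hi and v_lo != v_hi:
            frac = (v_lo - 0.5) / (v_lo - v_hi)
            log_ic50 = math.log(c_lo) + frac * (math.log(c_hi) - math.log(c_lo))
            return math.exp(log_ic50)
    return None  # 50% inhibition is not reached within the tested range

def normalized_auc(concs, viabilities):
    """Trapezoidal area under viability vs log10(concentration), scaled to [0, 1]."""
    points = sorted(zip(concs, viabilities))
    xs = [math.log10(c) for c, _ in points]
    ys = [v for _, v in points]
    area = sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2
               for i in range(len(xs) - 1))
    return area / (xs[-1] - xs[0])

# Hypothetical experiment: top concentration 8 uM, simulated "true" IC50 of 0.7 uM.
concs = dilution_series(max_conc_um=8.0)
viab = [sigmoid_viability(c, ic50=0.7, slope=1.5) for c in concs]
ic50 = estimate_ic50(concs, viab)
auc = normalized_auc(concs, viab)
print(f"estimated IC50 = {ic50:.3f} uM, normalized AUC = {auc:.3f}")
```

A lower IC50 or a smaller normalized AUC both indicate a more sensitive cell line. If a cell line never reaches 50% inhibition within the tested range, `estimate_ic50` returns `None` rather than extrapolating.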

The July 2012 release of GDSC includes drug sensitivity data for 138 anticancer compounds, each screened against 329 to 668 cell lines (mean = 525 cell lines per drug), representing 73,169 cell line-drug interactions. This makes it the largest public resource of drug sensitivity data for cancer cells.


2. Genomic datasets for cell lines

The complete collection available for screening includes >1000 different cancer cell lines. These were selected to represent the spectrum of common and rare adult and childhood cancers of epithelial, mesenchymal and hematopoietic origin. The genomic datasets currently available for each cell line include somatic mutations in 75 cancer genes, genome-wide copy number amplifications and deletions, targeted screening for seven gene rearrangements, markers of microsatellite instability, tissue type and transcriptional data. Using the statistical methods described below, the genomic datasets are combined with the drug sensitivity data for each cell line to identify genomic biomarkers of drug response. The genomic datasets in GDSC are obtained directly from, and updated with, the Catalogue of Somatic Mutations in Cancer (COSMIC) database.


3. Analysis of genomic features of drug sensitivity

An important part of the GDSC database is the systematic integration of large-scale genomic and drug sensitivity datasets. Two complementary analytical approaches are currently used to identify genomic markers of drug response. The first is a multivariate analysis of variance (MANOVA), which correlates drug sensitivity (IC50 value and slope of the dose-response curve) with genomic alterations in cancer, including point mutations, amplifications and deletions of common cancer genes, cancer gene rearrangements and microsatellite instability. MANOVA identifies the individual genomic features associated with drug sensitivity and reports the effect size and statistical significance of each drug-gene association.
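As a rough illustration of what an effect size and a P value mean for a single drug-gene association, the following Python sketch compares log IC50 values between mutant and wild-type cell lines. It is not MANOVA: it uses one response variable only, Cohen's d as the effect size, and an exact permutation test for the P value. All IC50 values and function names are invented for illustration.

```python
from itertools import combinations
from statistics import mean, stdev
import math

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * stdev(group_a) ** 2
                  + (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

def exact_permutation_p(group_a, group_b):
    """Two-sided P value: share of all possible relabelings whose absolute
    difference in means is at least as large as the observed one."""
    pooled = group_a + group_b
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    n_a, n = len(group_a), len(pooled)
    extreme = count = 0
    for idx in combinations(range(n), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / (n - n_a))
        extreme += diff >= observed - 1e-12  # tolerance for rounding error
        count += 1
    return extreme / count

# Hypothetical natural-log IC50 values (uM) for one drug; not real GDSC data.
mutant_ln_ic50 = [-2.1, -1.8, -2.5, -1.9, -2.2]
wild_type_ln_ic50 = [0.5, 1.1, 0.8, 1.4, 0.9, 1.2, 0.7, 1.0]

effect = cohens_d(mutant_ln_ic50, wild_type_ln_ic50)
p_value = exact_permutation_p(mutant_ln_ic50, wild_type_ln_ic50)
print(f"effect size d = {effect:.2f}, permutation P = {p_value:.4f}")
```

A negative effect size here means the mutant cell lines have lower IC50 values, that is, they are more sensitive to the drug. Exhaustive enumeration is only practical for small groups; with hundreds of cell lines a random sample of relabelings would be needed instead.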

Elastic net regression is also applied to identify multiple interacting genomic features that influence the response to each drug. The genomic data used in the elastic net analysis include all the data used in MANOVA, plus genome-wide transcriptional profiles and tissue type. The elastic net selects which of these features are associated with drug response.
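The feature-selection behaviour of the elastic net can be seen in a toy example. The Python sketch below implements plain coordinate descent for a commonly used elastic net objective and fits it to a tiny invented dataset with one informative feature and one unrelated feature. The objective parameterization, the penalty settings, the feature names and the data are all assumptions for illustration, not the configuration used by GDSC.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero; values inside the band become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def elastic_net(rows, y, lam=0.1, alpha=0.5, n_iter=100):
    """Coordinate descent for the objective
        (1/2n) * ||y - Xb||^2 + lam * (alpha * |b|_1 + (1 - alpha)/2 * ||b||^2)
    fitted on mean-centered features and response (no intercept is returned)."""
    n, p = len(rows), len(rows[0])
    col_means = [sum(r[j] for r in rows) / n for j in range(p)]
    x = [[r[j] - col_means[j] for j in range(p)] for r in rows]
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    coefs = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of feature j with the residual that excludes feature j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(x[i][k] * coefs[k] for k in range(p) if k != j)
                rho += x[i][j] * (yc[i] - pred_without_j)
            rho /= n
            z = sum(x[i][j] ** 2 for i in range(n)) / n
            denom = z + lam * (1 - alpha)
            coefs[j] = soft_threshold(rho, lam * alpha) / denom if denom > 0 else 0.0
    return coefs

# Invented data: 8 cell lines, column 0 = BRAF mutation status (0/1),
# column 1 = an unrelated binary feature. Response = log IC50 (made up).
feature_names = ["BRAF_mut", "unrelated_feature"]
rows = [[1, 1], [1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1], [0, 0]]
log_ic50 = [-2.0, -2.0, -2.0, -2.0, 0.0, 0.0, 0.0, 0.0]

coefs = elastic_net(rows, log_ic50)
for name, coef in zip(feature_names, coefs):
    print(f"{name}: {coef:.3f}")
```

The informative feature keeps a non-zero (shrunken) coefficient, while the unrelated feature is set to exactly zero by the L1 part of the penalty. That is the sense in which the elastic net "selects" features. A real analysis would use an established library and choose the penalty by cross-validation.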

A more detailed description of the statistical analyses performed, and guidance on interpreting the results, can be found under the "statistical analysis" tab of the "Help & Documentation" page.


2 Querying the GDSC database


To make the data easier to interpret, interactive graphical representations are used wherever possible. The database is queried mainly through the "Compounds" or "Cancer Genes" filters in the "Browse our data" section of the homepage (Figure 1). Browsing by "Compounds" displays a list of drug names with their synonyms, putative therapeutic targets, the number of cell lines screened for each drug (sample size) and the date on which the data for each compound were last updated. A link to the chemical structure in the PubChem database is also provided. Clicking on the name of a drug takes the user to the individual drug page, which holds the drug sensitivity and genomic correlation data.

Similarly, browsing by "Cancer Genes" displays a list of cancer genes identified by their HUGO names. This page provides direct links to the COSMIC page for each gene and to the UniProt database for further protein information. Clicking on a gene name opens the individual gene page with the drug sensitivity and genomic correlation data.

The database can also be queried with the "Search" function (Figure 1). The "Search" box accepts queries by compound name (including synonyms), cancer gene or cell line name. An auto-completion function lets users quickly select the drug, gene or cell line of interest. The search results page lists the matching compounds, cancer genes or cell lines, each linked to the corresponding drug or gene page on the website. For cell line matches, the link leads to the detailed cell line information in COSMIC.

△ Figure 1 Database Workflow


3 Data analysis and visualization


Screening data and genomic correlations are accessed through the individual drug or gene pages (Figures 2 and 3). The top panel provides information about the drug or gene and links to the PubChem, COSMIC and UniProt databases as appropriate. The top panel also links to the relevant help pages, which explain the data and the analyses performed; further information is available from the "Help & Documentation" link in the header of every page. The screening data and analyses themselves are displayed in the bottom panel of the drug/gene page, divided into the following tabs: volcano plot, volcano data, elastic net (drug page only), scatter plot and download data.

Volcano plots display the associations between drug sensitivity and genetic events calculated with MANOVA. The drug page shows a drug-specific volcano plot, which indicates how different genomic alterations affect the response to that drug (Figure 2). The gene page shows a gene-specific volcano plot, which indicates the effect of the mutated cancer gene on the response to every drug analyzed (Figure 3). For example, the drug-specific volcano plot for the BRAF inhibitor PLX4720 shows that mutations in the BRAF gene are significantly associated with sensitivity to the compound (Figure 2). In contrast, the gene-specific volcano plot for BRAF shows that mutations in this gene are associated with sensitivity to multiple drugs, including several different BRAF inhibitors (i.e., PLX4720, SB590885 and AZ628) (Figure 3). In both cases, the x-axis represents the effect of the drug-gene interaction on the IC50 values across the cell lines screened, and the y-axis represents the significance of the interaction (P value).

Hovering over a circle displays the genetic event, the sample size (i.e., the number of screened cell lines carrying that mutation), the effect size and the P value. Clicking on a circle opens a scatter plot of the IC50 values of the associated cell lines (see below). The Volcano Data tab presents the volcano plot data as a sortable table. Three buttons at the top of the table allow the table to be downloaded in .csv, .tab or .xlsx format.
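The way a table of associations maps onto volcano plot coordinates can be sketched in a few lines of Python. The records, the field names, the use of -log10(P) for the y-axis and the 0.01 significance cutoff below are assumptions chosen for illustration; only BRAF is a gene mentioned in the text, and none of the numbers are real GDSC results.

```python
import math

# Hypothetical association records for one drug. "effect" is assumed to be the
# difference in mean log IC50 (mutant minus wild type), so negative = sensitive.
associations = [
    {"feature": "BRAF", "effect": -2.4, "p_value": 1e-12, "n_mutant": 45},
    {"feature": "GENE_A", "effect": 0.3, "p_value": 0.40, "n_mutant": 60},
    {"feature": "GENE_B", "effect": 1.1, "p_value": 0.003, "n_mutant": 12},
]

def volcano_points(records, p_threshold=0.01):
    """Convert association records into plot coordinates plus annotations."""
    points = []
    for rec in records:
        points.append({
            "feature": rec["feature"],
            "x": rec["effect"],                 # horizontal axis: effect on IC50
            "y": -math.log10(rec["p_value"]),   # vertical axis: significance
            "size": rec["n_mutant"],            # could scale the circle size
            "significant": rec["p_value"] < p_threshold,
            "direction": "sensitivity" if rec["effect"] < 0 else "resistance",
        })
    return points

points = volcano_points(associations)
for pt in sorted(points, key=lambda pt: pt["y"], reverse=True):
    flag = "*" if pt["significant"] else " "
    print(f"{flag} {pt['feature']:<8} x={pt['x']:+.2f} y={pt['y']:.2f} ({pt['direction']})")
```

Strong associations end up high on the plot, with sensitivity markers on one side of the vertical axis and resistance markers on the other, which is why the PLX4720/BRAF association stands out in a drug-specific view.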

△ Figure 2

△ Figure 3


Similarly, the elastic net tab contains a graphical representation of the results of the elastic net analysis of drug sensitivity (Figure 4). For effective visualization, at most 10 significant features associated with drug response are displayed. These may include tissue type, cancer gene mutations, expression levels and gene copy number. Each plot contains three elements: a bar plot of the effect sizes of the significant features (right), a heat map of the genomic features (center) and a heat map of the IC50 values of the 20 least sensitive and 20 most sensitive cell lines (bottom). For example, elastic net analysis of the BRAF inhibitor PLX4720 identified mutations in the BRAF gene, the tissue type skin and several transcriptional features (BCL2A1, GYPC and DAAM2) as associated with drug sensitivity (Figure 4). Unlike the MANOVA results, elastic net results are not presented on a gene-specific page, because the elastic net describes how multiple genes influence drug sensitivity simultaneously.

△ Figure 4


4 Data download


Because the website focuses on graphical representation of the data, both volcano plots and scatter plots can be downloaded as .png or .svg files. In addition, the underlying data can be downloaded in .csv or .xlsx format. As described below, data for a specific drug or gene can be downloaded from its page, or all of the analysis data can be downloaded as a series of large spreadsheets.

On the drug page of a particular compound, the available downloads include:

Drug sensitivity data (cell line IC50 value table);

Genomic changes in the cell line;

Genomic correlations from MANOVA;

Elastic net analysis of drug sensitivity.

On the gene page, a single download is available, containing the MANOVA correlations between that gene and drug response across the entire compound panel.

As an alternative to downloading drug- or gene-specific data, the complete drug sensitivity and genomic datasets can be downloaded in bulk from the "Downloads" page, which is accessible from the top of every page.

Downloadable files include:

Cell line tissue type, drug sensitivity and genomic data for MANOVA;

MANOVA results for all compounds;

Tissue-specific analysis of variance to examine the effect of tissue type on drug response;

Elastic net results for all compounds;

Cell line genome and transcription data for elastic net analysis;

A continually updated list of the cancer cell lines in the collection.

Please note that some of these files contain a large number of columns. If such a file is opened in Excel 2003 or earlier, data will be lost because worksheets are limited to 256 columns. The "Downloads" page also provides access to archived files from previous data releases.
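The column limit mentioned above is easy to check before opening a file in a spreadsheet program. The following Python sketch counts the columns in the header row of a CSV file and compares the count with the 256-column limit; the in-memory file, its column names and the function names are hypothetical stand-ins for a real download.

```python
import csv
import io

EXCEL_2003_MAX_COLUMNS = 256  # worksheet column limit stated in the text

def count_columns(csv_file):
    """Return the number of columns in the header row of an open CSV file."""
    header = next(csv.reader(csv_file))
    return len(header)

def exceeds_excel_2003(csv_file):
    """True if the file has more columns than an Excel 2003 worksheet can hold."""
    return count_columns(csv_file) > EXCEL_2003_MAX_COLUMNS

# Stand-in for a wide downloaded file: one ID column plus 300 made-up features.
header = ["cell_line"] + [f"feature_{i}" for i in range(300)]
row = ["CELL_LINE_1"] + ["0"] * 300
wide_csv = io.StringIO(",".join(header) + "\n" + ",".join(row) + "\n")

n_cols = count_columns(wide_csv)
print(f"{n_cols} columns; would be truncated in Excel 2003: "
      f"{n_cols > EXCEL_2003_MAX_COLUMNS}")
```

For a file on disk, the same functions can be called on a handle opened with `open(path, newline="")`, where `path` is wherever the download was saved.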


References:


Yang W, Soares J, Greninger P, et al. Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Research. 2013;41(Database issue):D955-D961. doi:10.1093/nar/gks1111.


