SASD: THE SYNTHETIC ALTERNATIVE SPLICING DATABASE FOR IDENTIFYING NOVEL ISOFORM FROM PROTEOMICS

Date

2013-04-12

Authors

Zhang, Fan

Abstract

Purpose: Alternative splicing is an important and widespread mechanism for generating protein diversity and regulating protein expression. In human cells, about 40-60% of genes are known to exhibit alternative splicing. Recent methodological advances, including EST sequencing, exon arrays, exon-exon junction arrays, and next-generation sequencing of all mRNA transcripts, have made high-throughput alternative splicing analysis possible at the transcript level. However, high-throughput identification and analysis of alternative splicing at the protein level has several advantages. For example, mRNA abundance in a cell often correlates poorly with the amount of protein synthesized, and proteins rather than mRNA transcripts are the major effector molecules in the cell. Combining an alternative splicing database with tandem mass spectrometry provides a powerful technique for identifying, analyzing, and characterizing potential novel alternatively spliced protein isoforms from proteomics data. We therefore used a three-step pipeline to create a synthetic alternative splicing database (SASD) for tandem mass spectrometry data analysis.
Methods: First, we derived exons and introns from the UCSC Genome Database. Second, we analyzed six types of exon and intron combinations to construct artificial splice transcripts (exon_exon_normal, exon_exon_skipping, intron_exon, exon_intron, single exon, and single intron). Lastly, we translated the artificial transcripts.
Results: We built a web interface that lets users browse the database 1) by gene/protein, 2) by biological process, 3) by signaling and metabolic pathway, 4) by disease, 5) by drug, and 6) by organ. We also presented two case studies, 1) in breast cancer and 2) in liver cancer, to demonstrate that the SASD enables users to analyze, characterize, and understand the impact of alternative splicing on genes involved in drug, disease, pathway, function, and organ-specificity.
Conclusions: The SASD provides the scientific community with an efficient means to identify and characterize novel exon skipping, intron retention, and alternative 3' and 5' splice site protein isoforms from mass spectrometry data. We believe that it will be useful in annotating genome structures using rapidly accumulating proteomics data and will assist scientific research on signal transduction pathways regulating pre-mRNA, clinical therapy, disease prevention, and drug development.
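The three-step pipeline described under Methods can be illustrated with a minimal sketch. This is not the SASD implementation: the function names (`synthetic_transcripts`, `translate`), the labels such as `E1_E3`, the toy sequences, and the exact way each of the six combination types is assembled from neighbouring exons and introns are all assumptions made for illustration. In particular, the abstract does not state how much flanking sequence is used at each junction or which reading frames are translated.

```python
# Hypothetical sketch of the six exon/intron combination types and their
# translation. Names and junction definitions are assumptions, not SASD code.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG codon ordering.
CODONS = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, frame=0):
    """Translate a DNA string in one forward frame; '*' marks a stop codon."""
    seq = seq.upper()
    return "".join(
        CODONS.get(seq[i:i + 3], "X")  # 'X' for codons with ambiguous bases
        for i in range(frame, len(seq) - 2, 3)
    )


def synthetic_transcripts(exons, introns):
    """Enumerate artificial transcripts for one gene.

    exons[k] and exons[k + 1] are assumed to be separated by introns[k].
    Returns a list of (type, label, sequence) tuples.
    """
    out = []
    n = len(exons)
    for k in range(n):
        out.append(("single_exon", f"E{k + 1}", exons[k]))
    for k in range(len(introns)):
        out.append(("single_intron", f"I{k + 1}", introns[k]))
        # Intron retained downstream of exon k / upstream of exon k + 1.
        out.append(("exon_intron", f"E{k + 1}_I{k + 1}", exons[k] + introns[k]))
        out.append(("intron_exon", f"I{k + 1}_E{k + 2}", introns[k] + exons[k + 1]))
        # Canonical junction between adjacent exons.
        out.append(("exon_exon_normal", f"E{k + 1}_E{k + 2}", exons[k] + exons[k + 1]))
    # Junctions that skip one or more intervening exons.
    for k in range(n):
        for j in range(k + 2, n):
            out.append(("exon_exon_skipping", f"E{k + 1}_E{j + 1}", exons[k] + exons[j]))
    return out


if __name__ == "__main__":
    toy_exons = ["ATGGCC", "GATTAC", "AAATGA"]   # made-up sequences
    toy_introns = ["GTAAGT", "GTCCAG"]           # made-up sequences
    for kind, label, seq in synthetic_transcripts(toy_exons, toy_introns):
        print(kind, label, translate(seq))
```

In a sketch like this, each translated junction sequence would become one entry in a searchable protein database, so that a tandem mass spectrum matching a peptide spanning, say, an `exon_exon_skipping` junction would point to a candidate exon-skipping isoform.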
