Publications -- Fan Zhang

Permanent URI for this collection: https://hdl.handle.net/20.500.12503/31205

This collection is limited to articles published under the terms of a creative commons license or other open access publishing agreement since 2016. It is not intended as a complete list of the author's works.

Recent Submissions

  • Item
    Evaluation of Neighborhood-Level Disadvantage and Cognition in Mexican American and Non-Hispanic White Adults 50 Years and Older in the US
    (American Medical Association, 2023-08-30) Wong, Christina G.; Miller, Justin B.; Zhang, Fan; Rissman, Robert A.; Raman, Rema; Hall, James R.; Petersen, Melissa E.; Yaffe, Kristine; Kind, Amy J.; O'Bryant, Sid E.; Team, HABS-HD Study
    IMPORTANCE: Understanding how socioeconomic factors are associated with cognitive aging is important for addressing health disparities in Alzheimer disease. OBJECTIVE: To examine the association of neighborhood disadvantage with cognition among a multiethnic cohort of older adults. DESIGN, SETTING, AND PARTICIPANTS: In this cross-sectional study, data were collected between September 1, 2017, and May 31, 2022. Participants were from the Health and Aging Brain Study-Health Disparities, which is a community-based single-center study in the Dallas/Fort Worth area of Texas. A total of 1614 Mexican American and non-Hispanic White adults 50 years and older were included. EXPOSURE: Neighborhood disadvantage for participants' current residence was measured by the validated Area Deprivation Index (ADI); ADI Texas state deciles were converted to quintiles, with quintile 1 representing the least disadvantaged area and quintile 5 the most disadvantaged area. Covariates included age, sex, and educational level. MAIN OUTCOMES AND MEASURES: Performance on cognitive tests assessing memory, language, attention, processing speed, and executive functioning; measures included the Spanish-English Verbal Learning Test (SEVLT) Learning and Delayed Recall subscales; Wechsler Memory Scale, third edition (WMS-III) Digit Span Forward, Digit Span Backward, and Logical Memory 1 and 2 subscales; Trail Making Test (TMT) parts A and B; Digit Symbol Substitution Test (DSST); Letter Fluency; and Animal Naming. Raw scores were used for analyses. Associations between neighborhood disadvantage and neuropsychological performance were examined via demographically adjusted linear regression models stratified by ethnic group. RESULTS: Among 1614 older adults (mean [SD] age, 66.3 [8.7] years; 980 women [60.7%]), 853 were Mexican American (mean [SD] age, 63.9 [7.9] years; 566 women [66.4%]), and 761 were non-Hispanic White (mean [SD] age, 69.1 [8.7] years; 414 women [54.4%]). 
Older Mexican American adults were more likely to reside in the most disadvantaged areas (ADI quintiles 3-5), with 280 individuals (32.8%) living in ADI quintile 5, whereas a large proportion of older non-Hispanic White adults resided in ADI quintile 1 (296 individuals [38.9%]). Mexican American individuals living in more disadvantaged areas had worse performance than those living in ADI quintile 1 on 7 of 11 cognitive tests, including SEVLT Learning (ADI quintile 5: beta = -2.50; 95% CI, -4.46 to -0.54), SEVLT Delayed Recall (eg, ADI quintile 3: beta = -1.11; 95% CI, -1.97 to -0.24), WMS-III Digit Span Forward (eg, ADI quintile 4: beta = -1.14; 95% CI, -1.60 to -0.67), TMT part A (ADI quintile 5: beta = 7.85; 95% CI, 1.28-14.42), TMT part B (eg, ADI quintile 5: beta = 31.5; 95% CI, 12.16-51.35), Letter Fluency (ADI quintile 4: beta = -2.91; 95% CI, -5.39 to -0.43), and DSST (eg, ADI quintile 5: beta = -4.45; 95% CI, -6.77 to -2.14). In contrast, only non-Hispanic White individuals living in ADI quintile 4 had worse performance than those living in ADI quintile 1 on 4 of 11 cognitive tests, including SEVLT Learning (beta = -2.35; 95% CI, -4.40 to -0.30), SEVLT Delayed Recall (beta = -0.95; 95% CI, -1.73 to -0.17), TMT part B (beta = 15.95; 95% CI, 2.47-29.44), and DSST (beta = -3.96; 95% CI, -6.49 to -1.43). CONCLUSIONS AND RELEVANCE: In this cross-sectional study, aging in a disadvantaged area was associated with worse cognitive functioning, particularly for older Mexican American adults. Future studies examining the implications of exposure to neighborhood disadvantage across the life span will be important for improving cognitive outcomes in diverse populations.
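The decile-to-quintile conversion mentioned in the abstract above can be sketched as follows. The abstract does not state the exact mapping, so the pairing of adjacent deciles (1-2 to quintile 1, up to 9-10 to quintile 5) is an assumption, and the function name is hypothetical.

```python
def adi_decile_to_quintile(decile: int) -> int:
    """Collapse an ADI state decile (1-10) into a quintile (1-5).

    Assumption: adjacent deciles are paired, so deciles 1-2 map to
    quintile 1 (least disadvantaged) and deciles 9-10 map to quintile 5
    (most disadvantaged). The source abstract does not spell this out.
    """
    if not 1 <= decile <= 10:
        raise ValueError(f"ADI decile must be between 1 and 10, got {decile}")
    return (decile + 1) // 2


# Map every decile to show the assumed pairing.
print({d: adi_decile_to_quintile(d) for d in range(1, 11)})
```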
  • Item
    Hyperparameter Tuning with High Performance Computing Machine Learning for Imbalanced Alzheimer's Disease Data
    (MDPI, 2022-11-17) Zhang, Fan; Petersen, Melissa E.; Johnson, Leigh A.; Hall, James R.; O'Bryant, Sid E.
    Accurate detection is still a challenge in machine learning (ML) for Alzheimer's disease (AD). Class imbalance in AD data is another major challenge for machine-learning algorithms that assume the data are evenly distributed across classes. Here, we present a hyperparameter tuning workflow with high-performance computing (HPC) for imbalanced data related to prevalent mild cognitive impairment (MCI) and AD in the Health and Aging Brain Study-Health Disparities (HABS-HD) project. We applied a single-node multicore parallel mode to hyperparameter tuning of gamma, cost, and class weight using a support vector machine (SVM) model with 10 times repeated fivefold cross-validation. We executed the hyperparameter tuning workflow with R's bigmemory, foreach, and doParallel packages on Texas Advanced Computing Center (TACC)'s Lonestar6 system. The computational time was dramatically reduced by up to 98.2% for the high-performance SVM hyperparameter tuning model, and the performance of cross-validation was also improved (the positive predictive value and the negative predictive value at base rate 12% were, respectively, 16.42% and 92.72%). Our results show that a single-node multicore parallel structure and high-performance SVM hyperparameter tuning model can deliver efficient and fast computation and achieve outstanding agility, simplicity, and productivity for imbalanced data in AD applications.
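The abstract above reports predictive values "at base rate 12%". As a minimal sketch of what that means, the standard Bayes' rule formulas below convert a classifier's sensitivity and specificity into positive and negative predictive values at a chosen prevalence. The sensitivity and specificity used in the example are made-up illustrative values, not figures from the paper, and the function name is hypothetical.

```python
def predictive_values(sensitivity: float, specificity: float, base_rate: float):
    """Return (PPV, NPV) for a given prevalence using Bayes' rule."""
    true_pos = sensitivity * base_rate
    false_pos = (1.0 - specificity) * (1.0 - base_rate)
    true_neg = specificity * (1.0 - base_rate)
    false_neg = (1.0 - sensitivity) * base_rate
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Illustrative inputs only (not taken from the paper): 80% sensitivity,
# 90% specificity, evaluated at a 12% base rate.
ppv, npv = predictive_values(0.80, 0.90, 0.12)
print(f"PPV = {ppv:.4f}, NPV = {npv:.4f}")
```

Because the positive class is rare at a 12% base rate, PPV falls well below the specificity even for a reasonably accurate classifier, which is why predictive values are reported at an explicit prevalence.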
  • Item
    Application of Structural Retinal Biomarkers to Detect Cognitive Impairment in a Primary Care Setting
    (IOS Press, 2023-02-02) Mozdbar, Sima; Petersen, Melissa E.; Zhang, Fan; Johnson, Leigh A.; Tolman, Alex; Nyalakonda, Ramyashree; Gutierrez, Alejandra; O'Bryant, Sid E.
    BACKGROUND: Despite the diagnostic accuracy of advanced neurodiagnostic procedures, the detection of Alzheimer's disease (AD) remains poor in primary care. There is an urgent need for screening tools to aid in the detection of early AD. OBJECTIVE: This study examines the predictive ability of structural retinal biomarkers in detecting cognitive impairment in a primary care setting. METHODS: Participants were recruited from the Alzheimer's Disease in Primary Care (ADPC) study. As part of the ADPC Retinal Biomarker Study (ADPC RBS), visual acuity, an ocular history questionnaire, eye pressure, optical coherence tomography (OCT) imaging, and fundus imaging were performed. RESULTS: Data were examined on n = 91 participants. The top biomarkers for predicting cognitive impairment included the inferior quadrant of the outer retinal layers, all four quadrants of the peripapillary retinal nerve fiber layer, and the inferior quadrant of the macular retinal nerve fiber layer. CONCLUSION: The current data provide strong support for continued investigation into structural retinal biomarkers, particularly the retinal nerve fiber layer, as screening tools for AD.
  • Item
    Proteomic profiles for Alzheimer's disease and mild cognitive impairment among adults with Down syndrome spanning serum and plasma: An Alzheimer's Biomarker Consortium-Down Syndrome (ABC-DS) study
    (Wiley Periodicals, Inc., 2020-06-30) Petersen, Melissa E.; Zhang, Fan; Schupf, Nicole; Krinsky-McHale, Sharon J.; Hall, James R.; Mapstone, Mark; Cheema, Amrita; Silverman, Wayne; Lott, Ira; Rafii, Michael S.; Handen, Benjamin; Klunk, William; Head, Elizabeth; Christian, Bradley; Foroud, Tatiana; Lai, Florence; Rosas, H. Diana; Zaman, Shahid; Ances, Beau M.; Wang, Mei-Cheng; Tycko, Benjamin; Lee, Joseph H.; O'Bryant, Sid E.
    Introduction: Previously generated serum and plasma proteomic profiles were examined among adults with Down syndrome (DS) to determine whether these profiles could discriminate those with mild cognitive impairment (MCI-DS) and Alzheimer's disease (DS-AD) from those cognitively stable (CS). Methods: Data were analyzed on n = 305 (n = 225 CS; n = 44 MCI-DS; n = 36 DS-AD) enrolled in the Alzheimer's Biomarker Consortium-Down Syndrome (ABC-DS). Results: Distinguishing MCI-DS from CS, the serum profile produced an area under the curve (AUC) = 0.95 (sensitivity [SN] = 0.91; specificity [SP] = 0.99) and an AUC = 0.98 (SN = 0.96; SP = 0.97) for plasma when using an optimized cut-off score. Distinguishing DS-AD from CS, the serum profile produced an AUC = 0.93 (SN = 0.81; SP = 0.99) and an AUC = 0.95 (SN = 0.86; SP = 1.0) for plasma when using an optimized cut-off score. AUC remained unchanged to slightly improved when age and sex were included. Eotaxin3, interleukin (IL)-10, C-reactive protein, IL-18, serum amyloid A, and FABP3 correlated fractions at r² ≥ 0.90. Discussion: Proteomic profiles showed excellent detection accuracy for MCI-DS and DS-AD.
  • Item
    Proteomic profiles of incident mild cognitive impairment and Alzheimer's disease among adults with Down syndrome
    (Wiley Periodicals, Inc., 2020-05-21) O'Bryant, Sid E.; Zhang, Fan; Silverman, Wayne; Lee, Joseph H.; Krinsky-McHale, Sharon J.; Pang, Deborah; Hall, James R.; Schupf, Nicole
    Introduction: We sought to determine if proteomic profiles could predict risk for incident mild cognitive impairment (MCI) and Alzheimer's disease (AD) among adults with Down syndrome (DS). Methods: In a cohort of 398 adults with DS, a total of n = 186 participants were determined to be non-demented and without MCI or AD at baseline and throughout follow-up; n = 103 had incident MCI and n = 81 had incident AD. Proteomics were conducted on banked plasma samples from a previously generated algorithm. Results: The proteomic profile was highly accurate in predicting incident MCI (area under the curve [AUC] = 0.92) and incident AD (AUC = 0.88). For MCI risk, the support vector machine (SVM)-based high/low cut-point yielded an adjusted hazard ratio (HR) = 6.46 (P < .001). For AD risk, the SVM-based high/low cut-point score yielded an adjusted HR = 8.4 (P < .001). Discussion: The current results provide support for our blood-based proteomic profile for predicting risk for MCI and AD among adults with DS.
  • Item
    A Precision Medicine Approach to Treating Alzheimer's Disease Using Rosiglitazone Therapy: A Biomarker Analysis of the REFLECT Trials
    (IOS Press, 2021-05-18) O'Bryant, Sid E.; Zhang, Fan; Petersen, Melissa E.; Johnson, Leigh A.; Hall, James R.; Rissman, Robert A.
    Background: The REFLECT trials were conducted to examine the treatment of mild-to-moderate Alzheimer's disease utilizing a peroxisome proliferator-activated receptor gamma agonist. Objective: To generate a predictive biomarker indicative of positive treatment response using samples from the previously conducted REFLECT trials. Methods: Data were analyzed on 360 participants spanning multiple negative REFLECT trials, which included treatment with rosiglitazone and rosiglitazone XR. Support vector machine analyses were conducted to generate a predictive biomarker profile. Results: A pre-defined 6-protein predictive biomarker (IL6, IL10, CRP, TNFα, FABP-3, and PPY) correctly classified treatment response with 100% accuracy across study arms for REFLECT Phase II trial (AVA100193) and multiple Phase III trials (AVA105640, AV102672, and AVA102670). When the data were combined across all rosiglitazone trial arms, a global RSG-predictive biomarker with the same 6-protein predictive biomarker was able to accurately classify 98% of treatment responders. Conclusion: A predictive biomarker comprising metabolic and inflammatory markers was highly accurate in identifying those patients most likely to experience positive treatment response across the REFLECT trials. This study provides additional proof-of-concept that a predictive biomarker can be utilized to help with screening and predicting treatment response, which holds tremendous benefit for clinical trials.
  • Item
    Accelerating Hyperparameter Tuning in Machine Learning for Alzheimer's Disease With High Performance Computing
    (Frontiers Media S.A., 2021-12-08) Zhang, Fan; Petersen, Melissa E.; Johnson, Leigh A.; Hall, James R.; O'Bryant, Sid E.
    Driven by massive datasets that comprise biomarkers from both blood and magnetic resonance imaging (MRI), the need for advanced learning algorithms and accelerator architectures, such as GPUs and FPGAs, has increased. Machine learning (ML) methods have delivered remarkable prediction for the early diagnosis of Alzheimer's disease (AD). Although ML has improved the accuracy of AD prediction, the algorithms have grown more complex (for example, through hyperparameter tuning), which, in turn, increases computational complexity. Thus, accelerating high performance ML for AD is an important research challenge facing these fields. This work reports a multicore high performance support vector machine (SVM) hyperparameter tuning workflow with 100 times repeated 5-fold cross-validation for speeding up ML for AD. For demonstration and evaluation purposes, the high performance hyperparameter tuning model was applied to public MRI data for AD and included demographic factors such as age, sex and education. Results showed that computational efficiency increased by 96%, which helped to shed light on future diagnostic AD biomarker applications. The high performance hyperparameter tuning model can also be applied to other ML algorithms such as random forest, logistic regression, xgboost, etc.
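To illustrate why repeated cross-validated hyperparameter tuning parallelizes well across cores, the sketch below enumerates every independent model fit for a small grid under 100 times repeated 5-fold stratified cross-validation. It is not the authors' workflow: the toy labels, the grid values, and all function names are hypothetical, and the model fitting itself is left out. Each returned task could be handed to a separate worker because no task depends on another.

```python
import itertools
import random
from collections import defaultdict


def repeated_stratified_folds(labels, n_folds=5, n_repeats=100, seed=0):
    """Yield (repeat, fold, test_indices), keeping class shares similar per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for repeat in range(n_repeats):
        folds = [[] for _ in range(n_folds)]
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            # Deal indices round-robin so each fold gets a near-equal share of this class.
            for pos, idx in enumerate(shuffled):
                folds[pos % n_folds].append(idx)
        for fold, test_idx in enumerate(folds):
            yield repeat, fold, sorted(test_idx)


def tuning_tasks(labels, grid, n_folds=5, n_repeats=100, seed=0):
    """List every independent (params, repeat, fold, test_indices) fit."""
    splits = list(repeated_stratified_folds(labels, n_folds, n_repeats, seed))
    combos = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]
    return [(params, r, f, test) for params in combos for (r, f, test) in splits]


# Hypothetical imbalanced toy labels (40 controls, 10 cases) and a tiny grid.
labels = [0] * 40 + [1] * 10
grid = {"gamma": [0.01, 0.1], "cost": [1, 10]}
tasks = tuning_tasks(labels, grid)
print(len(tasks))  # 4 combinations x 100 repeats x 5 folds = 2000 fits
```

Even this tiny grid produces 2000 fits, which gives a sense of why serial tuning becomes slow and why distributing the tasks over many cores can cut wall-clock time substantially.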
  • Item
    Neurodegeneration from the AT(N) framework is different among Mexican Americans compared to non-Hispanic Whites: A Health & Aging Brain among Latino Elders (HABLE) Study
    (Wiley Periodicals, LLC, 2022-02-09) O'Bryant, Sid E.; Zhang, Fan; Petersen, Melissa E.; Hall, James R.; Johnson, Leigh A.; Yaffe, Kristine; Braskie, Meredith N.; Rissman, Robert A.; Vig, Rocky; Toga, Arthur W.
    Introduction: We sought to examine a magnetic resonance imaging (MRI)-based marker of neurodegeneration from the AT(N) (amyloid/tau/neurodegeneration) framework among a multi-ethnic, community-dwelling cohort. Methods: Community-dwelling Mexican Americans and non-Hispanic White adults and elders were recruited. All participants underwent comprehensive assessments including an interview, functional exam, clinical labs, informant interview, neuropsychological testing and 3T MRI of the brain. A neurodegeneration MRI meta-region of interest (ROI) biomarker for the AT(N) framework was calculated. Results: Data were examined from n = 1305 participants. Mexican Americans experienced N at significantly younger ages. The N biomarker was significantly associated with cognitive outcomes. N was significantly impacted by cardiovascular factors (e.g., total cholesterol, low-density lipoprotein) among non-Hispanic Whites whereas diabetes (glucose, HbA1c, duration of diabetes) and sociocultural (household income, acculturation) factors were strongly associated with N among Mexican Americans. Discussion: The prevalence, progression, timing, and sequence of the AT(N) biomarkers must be examined across diverse populations.
  • Item
    Proteomic profiles of prevalent mild cognitive impairment and Alzheimer's disease among adults with Down syndrome
    (Wiley Periodicals, Inc., 2020-04-17) Petersen, Melissa E.; Zhang, Fan; Krinsky-McHale, Sharon J.; Silverman, Wayne; Lee, Joseph H.; Pang, Deborah; Hall, James R.; Schupf, Nicole; O'Bryant, Sid E.
    Introduction: We sought to determine if a proteomic profile approach developed to detect Alzheimer's disease (AD) in the general population would apply to adults with Down syndrome (DS). Methods: Plasma samples were obtained from 398 members of a community-based cohort of adults with DS. A total of n = 186 participants were determined to be non-demented and without mild cognitive impairment (MCI) at baseline and throughout follow-up; n = 50 had prevalent MCI; n = 42 had prevalent AD. Results: The proteomic profile yielded an area under the curve (AUC) of 0.92, sensitivity (SN) = 0.80, and specificity (SP) = 0.98 for detecting prevalent MCI. For detecting prevalent AD, the proteomic profile yielded an AUC of 0.89, SN = 0.81, and SP = 0.97. The overall profile closely resembled our previously published profile of AD in the general population. Discussion: These data provide evidence of the applicability of our blood-based algorithm for detecting MCI/AD among adults with DS.
  • Item
    A proteomic signature for dementia with Lewy bodies
    (Elsevier Inc., 2019-03-15) O'Bryant, Sid E.; Ferman, Tanis J.; Zhang, Fan; Hall, James R.; Pedraza, Otto; Wszolek, Zbigniew K.; Como, Tori; Julovich, David A.; Mattevada, Sravan; Johnson, Leigh A.; Edwards, Melissa; Graff-Radford, Neill R.
    Introduction: We sought to determine if a proteomic profile approach developed to detect Alzheimer's disease would distinguish patients with Lewy body disease from normal controls, and if it would distinguish dementia with Lewy bodies (DLB) from Parkinson's disease (PD). Methods: Stored plasma samples were obtained from 145 patients (DLB n = 57, PD without dementia n = 32, normal controls n = 56) enrolled from patients seen in the Behavioral Neurology or Movement Disorders clinics at the Mayo Clinic, Florida. Proteomic assays were conducted and analyzed as per our previously published protocols. Results: In the first step, the proteomic profile distinguished the DLB-PD group from controls with a diagnostic accuracy of 0.97, sensitivity of 0.91, and specificity of 0.86. In the second step, the proteomic profile distinguished the DLB from PD groups with a diagnostic accuracy of 0.92, sensitivity of 0.94, and specificity of 0.88. Discussion: These data provide evidence of the potential utility of a multitiered blood-based proteomic screening method for detecting DLB and distinguishing DLB from PD.
  • Item
    Potential two-step proteomic signature for Parkinson's disease: Pilot analysis in the Harvard Biomarkers Study
    (Elsevier Inc., 2019-05-02) O'Bryant, Sid E.; Edwards, Melissa; Zhang, Fan; Johnson, Leigh A.; Hall, James R.; Kuras, Yuliya; Scherzer, Clemens R.
    Introduction: We sought to determine if our previously validated proteomic profile for detecting Alzheimer's disease would detect Parkinson's disease (PD) and distinguish PD from other neurodegenerative diseases. Methods: Plasma samples were assayed from 150 patients of the Harvard Biomarkers Study (PD, n = 50; other neurodegenerative diseases, n = 50; healthy controls, n = 50) using electrochemiluminescence and Simoa platforms. Results: The first step proteomic profile distinguished neurodegenerative diseases from controls with a diagnostic accuracy of 0.94. The second step profile distinguished PD cases from other neurodegenerative diseases with a diagnostic accuracy of 0.98. The proteomic profile differed in step 1 versus step 2, suggesting that a multistep proteomic profile algorithm for detecting and distinguishing between neurodegenerative diseases may be optimal. Discussion: These data provide evidence of the potential use of a multitiered blood-based proteomic screening method for detecting individuals with neurodegenerative disease and then distinguishing PD from other neurodegenerative diseases.
  • Item
    Identification of novel alternative splicing biomarkers for breast cancer with LC/MS/MS and RNA-Seq
    (BioMed Central Ltd., 2020-12-03) Zhang, Fan; Deng, Chris K.; Wang, Mu; Deng, Bin; Barber, Robert C.; Huang, Gang
    Background: Alternative splicing isoforms have been reported as a new and robust class of diagnostic biomarkers. Over 95% of human genes are estimated to be alternatively spliced as a powerful means of producing functionally diverse proteins from a single gene. The emergence of next-generation sequencing technologies, especially RNA-seq, provides novel insights into large-scale detection and analysis of alternative splicing at the transcriptional level. Advances in proteomic technologies, such as liquid chromatography coupled tandem mass spectrometry (LC-MS/MS), have shown tremendous power for the parallel characterization of large numbers of proteins in biological samples. Although poor correspondence has been generally found from previous qualitative comparative analysis between proteomics and microarray data, significantly higher degrees of correlation have been observed at the level of exon. Combining protein and RNA data by searching LC-MS/MS data against a customized protein database from RNA-Seq may produce a subset of alternatively spliced protein isoform candidates that have higher confidence. Results: We developed a bioinformatics workflow to discover alternative splicing biomarkers from LC-MS/MS using RNA-Seq. First, we retrieved high-confidence, novel alternative splicing biomarkers from the breast cancer RNA-Seq database. Then, we translated these sequences into in silico Isoform Junction Peptides, and created a customized alternative splicing database for MS searching. Lastly, we ran the Open Mass spectrometry Search Algorithm against the customized alternative splicing database with breast cancer plasma proteome. Twenty-six alternative splicing biomarker peptides with one single intron event and one exon skipping event were identified.
Further interpretation of biological pathways with our Integrated Pathway Analysis Database showed that these 26 peptides are associated with Cancer, Signaling, Metabolism, Regulation, Immune System and Hemostasis pathways, which are consistent with the 256 alternative splicing biomarkers from the RNA-Seq. Conclusions: This paper presents a bioinformatics workflow for using RNA-seq data to discover novel alternative splicing biomarkers from the breast cancer proteome. As a complement to the synthetic alternative splicing database technique for alternative splicing identification, this method combines the advantages of two platforms, mass spectrometry and next-generation sequencing, and can help identify potentially highly sample-specific alternative splicing isoform biomarkers at an early stage of cancer.
  • Item
    Identification of long non-coding RNA-related and -coexpressed mRNA biomarkers for hepatocellular carcinoma
    (BioMed Central Ltd., 2019-01-31) Zhang, Fan; Ding, Linda; Cui, Li; Barber, Robert C.; Deng, Bin
    Background: While changes in mRNA expression during tumorigenesis have been used widely as molecular biomarkers for the diagnosis of a number of cancers, the approach has limitations. For example, traditional methods do not consider the regulatory and positional relationship between mRNA and lncRNA. The latter has been largely shown to possess tumor suppressive or oncogenic properties. The combined analysis of mRNA and lncRNA is likely to facilitate the identification of biomarkers with higher confidence. Results: Therefore, we have developed an lncRNA-related method to identify traditional mRNA biomarkers. First, we identified mRNAs that are differentially expressed in Hepatocellular Carcinoma (HCC) by comparing cancer and matched adjacent non-tumorous liver tissues. Then, we performed mRNA-lncRNA relationship and coexpression analysis and obtained 41 lncRNA-related and -coexpressed mRNA biomarkers. Next, we performed network analysis, gene ontology analysis and pathway analysis to unravel the functional roles and molecular mechanisms of these lncRNA-related and -coexpressed mRNA biomarkers. Finally, we validated the prediction and performance of the 41 lncRNA-related and -coexpressed mRNA biomarkers using a Support Vector Machine model with five-fold cross-validation in an independent HCC dataset from RNA-seq. Conclusions: Our results suggested that mRNA expression profiles coexpressed with positionally related lncRNAs can provide important insights into early diagnosis and specific targeted gene therapy of HCC.
  • Item
    The Health & Aging Brain among Latino Elders (HABLE) study methods and participant characteristics
    (Wiley Periodicals, LLC, 2021-06-21) O'Bryant, Sid E.; Johnson, Leigh A.; Barber, Robert C.; Braskie, Meredith N.; Christian, Bradley; Hall, James R.; Hazra, Nalini; King, Kevin; Kothapalli, Deydeep; Large, Stephanie; Mason, David; Matsiyevskiy, Elizabeth; McColl, Roderick; Nandy, Rajesh; Palmer, Raymond; Petersen, Melissa E.; Philips, Nicole; Rissman, Robert A.; Shi, Yonggang; Toga, Arthur W.; Vintimilla, Raul; Vig, Rocky; Zhang, Fan; Yaffe, Kristine
    Introduction: Mexican Americans remain severely underrepresented in Alzheimer's disease (AD) research. The Health & Aging Brain among Latino Elders (HABLE) study was created to fill important gaps in the existing literature. Methods: Community-dwelling Mexican Americans and non-Hispanic White adults and elders (age 50 and above) were recruited. All participants underwent comprehensive assessments including an interview, functional exam, clinical labs, informant interview, neuropsychological testing, and 3T magnetic resonance imaging (MRI) of the brain. Amyloid and tau positron emission tomography (PET) scans were added at visit 2. Blood samples were stored in the Biorepository. Results: Data were examined from n = 1705 participants. Significant group differences were found in medical, demographic, and sociocultural factors. Cerebral amyloid and neurodegeneration imaging markers were significantly different between Mexican Americans and non-Hispanic Whites. Discussion: The current data provide strong support for continued investigations that examine the risk factors for and biomarkers of AD among diverse populations.
  • Item
    IPAD: the Integrated Pathway Analysis Database for Systematic Enrichment Analysis
    (Springer Nature, 2012) Zhang, Fan; Drabier, Renee
    Background: Next-Generation Sequencing (NGS) technologies and Genome-Wide Association Studies (GWAS) generate millions of reads and hundreds of datasets, and there is an urgent need for a better way to accurately interpret and distill such large amounts of data. Extensive pathway and network analysis allow for the discovery of highly significant pathways from a set of disease vs. healthy samples in the NGS and GWAS. Knowledge of activation of these processes will lead to elucidation of the complex biological pathways affected by drug treatment, to patient stratification studies of new and existing drug treatments, and to understanding the underlying anti-cancer drug effects. There are approximately 141 biological human pathway resources as of Jan 2012 according to the Pathguide database. However, most currently available resources do not contain disease, drug or organ specificity information such as disease-pathway, drug-pathway, and organ-pathway associations. Systematically integrating pathway, disease, drug and organ specificity together becomes increasingly crucial for understanding the interrelationships between signaling, metabolic and regulatory pathway, drug action, disease susceptibility, and organ specificity from high-throughput omics data (genomics, transcriptomics, proteomics and metabolomics). 
Results: We designed the Integrated Pathway Analysis Database for Systematic Enrichment Analysis (IPAD, http://bioinfo.hsc.unt.edu/ipad), defining inter-association between pathway, disease, drug and organ specificity, based on six criteria: 1) comprehensive pathway coverage; 2) gene/protein to pathway/disease/drug/organ association; 3) inter-association between pathway, disease, drug, and organ; 4) multiple and quantitative measurement of enrichment and inter-association; 5) assessment of enrichment and inter-association analysis with the context of the existing biological knowledge and a "gold standard" constructed from reputable and reliable sources; and 6) cross-linking of multiple available data sources. IPAD is a comprehensive database covering about 22,498 genes, 25,469 proteins, 1956 pathways, 6704 diseases, 5615 drugs, and 52 organs integrated from databases including BioCarta, KEGG, NCI-Nature curated, Reactome, CTD, PharmGKB, DrugBank, and HOMER. The database has a web-based user interface that allows users to perform enrichment analysis from genes/proteins/molecules and inter-association analysis from a pathway, disease, drug, and organ. Moreover, the quality of the database was validated with the context of the existing biological knowledge and a "gold standard" constructed from reputable and reliable sources. Two case studies were also presented to demonstrate: 1) self-validation of enrichment analysis and inter-association analysis on brain-specific markers, and 2) identification of previously undiscovered components by the enrichment analysis from a prostate cancer study. Conclusions: IPAD is a new resource for analyzing, identifying, and validating pathway, disease, drug, organ specificity and their inter-associations. The statistical method we developed for enrichment and similarity measurement and the two criteria we described for setting the threshold parameters can be extended to other enrichment applications.
Enriched pathways, diseases, drugs, organs and their inter-associations can be searched, displayed, and downloaded from our online user interface. The current IPAD database can help users address a wide range of biological pathway related, disease susceptibility related, drug target related and organ specificity related questions in human disease studies.
  • Item
    SASD: The Synthetic Alternative Splicing Database for identifying novel isoform from proteomics
    (2013) Zhang, Fan; Drabier, Renee
    Background: Alternative splicing is an important and widespread mechanism for generating protein diversity and regulating protein expression. High-throughput identification and analysis of alternative splicing at the protein level has more advantages than at the mRNA level. The combination of alternative splicing database and tandem mass spectrometry provides a powerful technique for identification, analysis and characterization of potential novel alternative splicing protein isoforms from proteomics. Therefore, based on the peptidomic database of human protein isoforms for proteomics experiments, our objective is to design a new alternative splicing database to 1) provide more coverage of genes, transcripts and alternative splicing, 2) exclusively focus on the alternative splicing, and 3) perform context-specific alternative splicing analysis. Results: We used a three-step pipeline to create a synthetic alternative splicing database (SASD) to identify novel alternative splicing isoforms and interpret them in the context of pathway, disease, drug and organ specificity or custom gene set with maximum coverage and exclusive focus on alternative splicing. First, we extracted information on gene structures of all genes in the Ensembl Genes 71 database and incorporated the Integrated Pathway Analysis Database. Then, we compiled artificial splicing transcripts. Lastly, we translated the artificial transcripts into alternative splicing peptides. The SASD is a comprehensive database containing 56,630 genes (Ensembl gene IDs), 95,260 transcripts (Ensembl transcript IDs), and 11,919,779 Alternative Splicing peptides, and also covering about 1,956 pathways, 6,704 diseases, 5,615 drugs, and 52 organs. The database has a web-based user interface that allows users to search, display and download a single gene/transcript/protein, custom gene set, pathway, disease, drug, organ related alternative splicing.
Moreover, the quality of the database was validated by comparison with other known databases and two case studies: 1) in liver cancer and 2) in breast cancer. Conclusions: The SASD provides the scientific community with an efficient means to identify, analyze, and characterize novel Exon Skipping and Intron Retention protein isoforms from mass spectrometry and interpret them in the context of pathway, disease, drug and organ specificity or custom gene set with maximum coverage and exclusive focus on alternative splicing.