Core C: Bioinformatics Genomics

Bioinformatics is the application of statistics and computer science to the field of molecular biology. It has emerged as a field unto itself, as the datasets generated by modern biomedical researchers easily exceed what can be directly analyzed. Core C will work with the data generated by massively parallel sequencing of human, mouse and zebrafish samples to extract variants that have the potential to cause disease.

The PIs of Cores A, B and C have worked together extensively in the past, and have an established track record of productivity in the area of next-generation sequencing (NGS) data analysis. Dr. Bafna has worked broadly in bioinformatics and genomics, developing computational methodologies that employ novel algorithms and statistical techniques for NGS datasets. We envision that the whole-exome sequencing (WES) data generated by Core B will be delivered to Core C for extraction of the potentially deleterious sequence variants (PDSVs), which will be delivered back to each of the Projects for segregation analysis and further validation. This will be accomplished by developing the four key pipelines of Core C:

  1. WES data tracking and storage pipeline
  2. WES data analysis pipeline
  3. Mutation identification pipeline
  4. Comparative genomics pipeline

The analysis of WES datasets is presented in this application as a series of filters applied to the primary sequence to extract all relevant variants, followed by a heuristic ranking strategy to detect the PDSVs most likely associated with the phenotype. The output of the FILTER and PRIORITIZE programs is then reported as ranked lists of SNPs and INDELs, for later validation and segregation testing. Further analysis will help uncover the contribution of these genes to common disease, as well as genome-wide gene-gene interactions, using other software we have developed. We are also well-positioned to take full advantage of the 3rd-generation DNA sequencers, and are excited that UCSD will serve as one of the national HHMI PacBio Sequencing Centers. These tools, together with the outstanding and unique human and animal resources, will make for a powerful combination to investigate new causes of structural brain disorders.
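The filter-then-rank idea described above can be illustrated with a minimal Python sketch. This is not Core C's FILTER or PRIORITIZE software: the `Variant` fields, the quality and frequency thresholds, the effect weights and the scoring formula are all hypothetical values chosen only to show the shape of such a pipeline.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Variant:
    """One called variant. All fields are illustrative assumptions."""
    chrom: str
    pos: int
    ref: str
    alt: str
    quality: float   # call quality on a hypothetical scale
    pop_freq: float  # allele frequency in a reference population
    effect: str      # predicted effect, e.g. "missense"

    @property
    def kind(self) -> str:
        # Single-base substitutions are SNPs; everything else is an INDEL.
        return "SNP" if len(self.ref) == 1 and len(self.alt) == 1 else "INDEL"


# Hypothetical weights: larger means more likely to be deleterious.
EFFECT_WEIGHT = {
    "frameshift": 3.0,
    "nonsense": 3.0,
    "splice_site": 2.5,
    "missense": 1.5,
}

Filter = Callable[[Variant], bool]

# Each filter keeps a variant when it returns True (thresholds are assumed).
FILTERS: List[Filter] = [
    lambda v: v.quality >= 30.0,          # drop low-confidence calls
    lambda v: v.pop_freq <= 0.01,         # keep rare variants only
    lambda v: v.effect in EFFECT_WEIGHT,  # keep protein-altering effects
]


def filter_variants(variants: List[Variant], filters: List[Filter]) -> List[Variant]:
    """Apply every filter in turn; a variant must pass all of them."""
    return [v for v in variants if all(f(v) for f in filters)]


def score(v: Variant) -> float:
    """Heuristic score: effect weight, discounted by population frequency."""
    return EFFECT_WEIGHT.get(v.effect, 0.0) * (1.0 - v.pop_freq)


def prioritize(variants: List[Variant]) -> List[Variant]:
    """Rank variants from most to least likely deleterious."""
    return sorted(variants, key=score, reverse=True)


# Made-up example calls, not real data.
calls = [
    Variant("chr1", 1000, "A", "G", 50.0, 0.001, "missense"),
    Variant("chr2", 2000, "AT", "A", 45.0, 0.0, "frameshift"),
    Variant("chr3", 3000, "C", "T", 10.0, 0.0, "nonsense"),    # fails quality
    Variant("chr4", 4000, "G", "A", 60.0, 0.20, "missense"),   # too common
    Variant("chr5", 5000, "T", "C", 60.0, 0.0, "synonymous"),  # benign effect
]

ranked = prioritize(filter_variants(calls, FILTERS))
for rank, v in enumerate(ranked, start=1):
    print(rank, v.kind, v.chrom, v.pos, v.effect)
```

Keeping the filters as a list of independent predicates means that a criterion can be added, removed or reordered without touching the ranking step, which mirrors the separation between filtering and prioritization described in the text.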

Leader: Vineet Bafna, PhD
Professor, Computer Science and Engineering
UC San Diego