Seminar Presentations in 2015

December 4, 2015 - Xiaoqian Jiang, Ph.D. - Assistant Professor, Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA
SECRET: Secure Edit-distance Computation over homomoRphic Encrypted daTa

Abstract: The biomedical community benefits from the increasing availability of genomic data, which enables institutions and researchers to develop personalized treatments and advance precision medicine. However, privacy and confidentiality of genomic data are becoming a major concern for both patients and researchers in the data storage, transfer and analysis phases. In this work, we propose a protocol for Secure Edit-distance Computation over homomoRphic Encrypted daTa (SECRET). Based on Homomorphic Encryption (HME), SECRET allows researchers to securely outsource both genomic data and edit-distance computation to an untrusted cloud environment without sacrificing data privacy. We developed HME-based secure comparison and addition primitives over binary vectors to achieve SECRET. Our experimental results demonstrate the computational costs of the proposed protocol.
Bio: Professor Jiang is an Assistant Professor in the Department of Biomedical Informatics, UC San Diego. He holds an MS degree in Computer Science from the University of Iowa and a PhD in Computer Science from Carnegie Mellon University. His research interests include health privacy and predictive modeling.
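For readers unfamiliar with the quantity being protected, the following is a minimal plaintext sketch of edit distance (Levenshtein distance) computed by dynamic programming. It is not the SECRET protocol and involves no encryption; the function name `edit_distance` is an assumption made for illustration. It only shows that the computation reduces to repeated additions and minimum (comparison) operations, the two kinds of primitives the abstract mentions.

```python
def edit_distance(a: str, b: str) -> int:
    """Plaintext Levenshtein distance using a rolling one-row DP table."""
    # prev[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            substitution_cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,                      # delete ca
                curr[j - 1] + 1,                  # insert cb
                prev[j - 1] + substitution_cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Two short, made-up nucleotide strings used purely as an illustration.
    print(edit_distance("ACGT", "AGT"))
```

In an encrypted setting each `+ 1` and each `min` would have to be replaced by a secure primitive, which is presumably where the computational cost reported in the talk arises.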

November 23, 2015 - Rebecca Randell, Ph.D. - Senior Translational Research Fellow, School of Healthcare, University of Leeds
Opening the black box: Using realist evaluation to understand the implementation and use of healthcare technologies
Clinical trials are typically considered the gold standard for evaluating healthcare technologies. However, the impacts of healthcare technologies often vary according to the context in which they are implemented and clinical trials can struggle to account for this variation. Such variation is to be expected because healthcare technologies in and of themselves do not give rise to changes in outcomes; it is how people choose to make use of (or not) the resources that healthcare technologies offer them that leads to those outcomes, and these choices are highly dependent on context. Consequently, there is a need to understand not only what works but what works for whom, in what contexts, and how. Realist evaluation, an increasingly popular method for evaluating complex interventions in healthcare, is an approach that seeks to provide this understanding. At the University of Leeds, we are using realist evaluation to undertake a number of studies of healthcare technology use, from trainee doctors’ use of smartphone apps in support of decision making, to the impact of robotic surgery on teamwork and communication in the operating theatre. In this talk, I will draw on our experience to reflect on what realist evaluation can contribute to the evaluation of healthcare technologies.
Bio: Dr. Rebecca Randell is a lecturer in the School of Healthcare at the University of Leeds. She has a degree in Software Engineering from the University of Durham and a PhD in Human-Computer Interaction from the University of Glasgow. Rebecca’s research is concerned with understanding how healthcare professionals carry out their work in order to inform the design of healthcare technology and with understanding how healthcare technologies are used in practice. She is currently leading a project funded by the National Institute for Health Research’s Health Services & Delivery Research programme, using realist evaluation to understand how robotic surgery impacts on teamwork and communication in the operating theatre. In 2004, Rebecca was awarded the Diana E. Forsythe Award by the American Medical Informatics Association for research at the intersection of informatics and the social sciences. She is Chair of the European Federation for Medical Informatics (EFMI) Human and Organizational Factors of Medical Informatics Working Group.

November 13, 2015 - Riley I. Taitingfong, B.A. and Murktarat Yussuff, B.A.
PhD Students, Department of Communication, University of California, San Diego, La Jolla, CA
Human Centered Design (HCD) and Community Based Participatory Research (CBPR) for Developing an mHealth Tool for Intercultural Health Literacy
In this presentation we will discuss our work with the East African immigrant and refugee community in San Diego developing an mHealth tool for mental health literacy. Our focus will be on the affordances and limits of integrating HCD and CBPR approaches for addressing intercultural communication and acculturation. A unique challenge of the project has been designing an intervention that is sensitive to the specific needs of families whose members have significantly varied levels of language and technological literacy and face different acculturation demands.

November 6, 2015 - Loki Natarajan, Ph.D. - Professor, Division of Biostatistics and Bioinformatics, Department of Family Medicine and Public Health, University of California, San Diego, La Jolla, CA
Building Validated Stable Prognostic Classifiers: Applications to Diabetic Kidney Disease
Diabetic patients are at high risk of developing chronic kidney disease (CKD). Currently, there are few effective treatments for CKD, and there is an urgent need to identify novel biomarkers associated with diabetic CKD, with the ultimate goal of finding therapeutic targets. Using a cohort of 114 Type 2 diabetic patients, we used a metabolomics approach to develop models to characterize diabetic CKD. Predictors comprised a panel of 100 urine metabolites, as well as demographic and clinical factors such as age, body mass index, and duration of diabetes. We used penalized regression via LASSO to select parsimonious predictor sets to classify patients into CKD and non-CKD groups. Cross-validation and stability analysis were performed using bootstrapped samples to assess misclassification rates and reproducibility of selected variables.
Bio: Dr. Natarajan is a Professor in the Division of Biostatistics and Bioinformatics, Dept. of Family Medicine and Public Health at UCSD. She received her PhD in mathematics in 1991 from UC Berkeley, and joined the UCSD faculty in 2002. As an applied biostatistician, she collaborates with researchers across SOM in obesity research, physical activity and diet interventions, cancer prevention, and diabetes, and has co-authored 120 peer reviewed publications. Her methodological interests include measurement error models for physical activity and dietary data, time-varying effects in survival models, and prognostic modeling with high-dimensional predictors.

October 30, 2015 - Jina Huh, Ph.D. - Assistant Professor, Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA
Visual Media for Patient-Centered Systems
In this talk, I will discuss how visual media can be used for patient-centered systems. The goal of this seminar is to explore together opportunities for using visual media, such as videos, images, dynamic interfaces, and visualizations, to increase opportunities for novel patient-centered systems. I present my formative research in patient-generated videos and images as well as visualizations for behavior change mobile applications. I connect this research effort to the literature on using photographs and artistic expressions in therapeutic sessions in clinical care as well as social support. I end with an agenda for using visual media to improve human health.
Bio: Jina Huh is joining the Department of Biomedical Informatics at the University of California San Diego as Assistant Professor this fall. In addition to her work in online health communities from her NIH K01 award, she studies mobile health applications and social media to improve daily health management. She was an Assistant Professor at the Department of Media and Information at Michigan State University. She received the NLM postdoctoral fellowship at the University of Washington, a PhD from the University of Michigan School of Information, a Masters in HCI from Carnegie Mellon University, and a BA from Korea National University of Arts.

October 23, 2015 - İlkay Altıntaş, Ph.D. - Chief Data Science Officer, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA
Bridging Big Data and Data Science Using Scalable Workflows
Scientific workflows are used by many scientific communities to capture, automate and standardize computational and data management practices. Workflow-based automation for an application is often achieved through a craft that combines multidisciplinary collaboration between team members, process modeling, and programmable scalability on computing and Big Data platforms. Such efforts often lead to provenance-aware archival and publication of results. This talk will summarize varying and changing requirements for scalability in distributed workflows influenced by Big Data and heterogeneous computing architectures, including our ongoing research efforts on end-to-end performance prediction and scheduling for workflow-driven Biomedical Big Data applications using bioKepler. It will also introduce BBDTC, a new collaborative for online biomedical big data training.
Bio: Ilkay Altintas is the Chief Data Science Officer at the San Diego Supercomputer Center (SDSC), UC San Diego, where she is also the Founder and Director for the Workflows for Data Science Center of Excellence. Since joining SDSC in 2001, she has worked on different aspects of scientific workflows as a principal investigator and in other leadership roles across a wide range of cross-disciplinary projects. She is a co-initiator of and an active contributor to the open-source Kepler Scientific Workflow System, and the co-author of publications related to computational data science at the intersection of scientific workflows, provenance, distributed computing and big data with applications to many scientific domains.

October 16, 2015 - Howard Taras, MD - Professor of Pediatrics, School of Medicine, University of California, San Diego, La Jolla, CA
Health, School Absenteeism, and Need for Data
Excessive school absenteeism is an educational, social and health problem. Rates of student absenteeism vary, based on grade and nature of the population. Many students are labeled as “sick” by parents for dozens of days per year, excusing their poor attendance. Schools do not know which absences could have been avoided by providing health resources to families and which were unnecessary, as students were well-enough to attend. This session will cover how better health data collection can solve much of this problem, what has been done in the past, and what projects university bioinformatics faculty can adopt to help with next steps.
Bio: Howard Taras, MD is a Professor of Pediatrics at UCSD, where he specializes in School Health and Community Engagement. Through UCSD, Taras is a medical consultant to school districts across California. He assists with decisions and processes leading to safe integration of students with special health care needs. Taras is also Director of Community Engagement for the Clinical & Translational Research Institute (CTRI), which provides resources to clinical investigators and tries to help community members and practicing clinicians influence what academicians are researching. Taras went to McMaster Medical School in Hamilton Canada, and did his pediatric residency at University of Toronto.

October 9, 2015 - Dov Fox, JD - Associate Professor of Law, School of Law, University of San Diego, San Diego, CA
The Legal Regulation of Genome Privacy in the United States
Medical and scientific advances in genome research depend on widespread sharing of diverse genomic information. Given the potential sensitivity of that highly personal information, people's willingness to share it depends in turn on their confidence that it will be kept safe and secure, according to their wishes and interests. The uniqueness of genetic information makes anonymization challenging, however, all the more so because of emerging possibilities for familial matching and computer-aided re-identification. One 2013 study published in Science demonstrated, for example, that it is possible to re-identify genomic information using genealogical databases and public records. This talk critically examines the complex web of privacy protections that research participants in genome studies have under United States law.
Bio: Dov Fox is an Associate Professor at the University of San Diego School of Law, where he writes and teaches in the areas of criminal law and procedure, health law and bioethics, and the regulation of technology. His scholarship has appeared in leading journals of law, medicine, and philosophy. He has served as a law clerk to the Honorable Stephen Reinhardt of the U.S. Court of Appeals for the Ninth Circuit, and worked at the President’s Council on Bioethics, the consulting firm of McKinsey & Company, the law firm of Wachtell, Lipton, Rosen & Katz, and the Civil Appellate Staff at the U.S. Department of Justice. Fox holds a B.A. from Harvard, J.D. from Yale, and D.Phil. from Oxford, where he was a Rhodes Scholar.

October 2, 2015 - Hyeon-eui Kim, PhD, RN, MPH - Associate Professor, Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA
SAPPHIRE: Skin Assessment for Pressure Ulcer Prevention-Integrated Recording Environment
Abstract: Accurate assessment and documentation of skin conditions facilitate communication among care providers and are critical to effective prevention and mitigation of pressure ulcers. This presentation introduces a prototype mobile system called SAPPHIRE (Skin Assessment for Pressure Ulcer Prevention, an Integrated Recording Environment) developed for an Android device to assist nurses with skin assessment and documentation at the bedside. Key functionalities of SAPPHIRE include (1) data documentation conforming to the relevant terminology standards, (2) data exchange using the Continuity of Care Record (CCR) standard and (3) smart display of patient data relevant to risk parameters to promote accurate pressure ulcer risk assessment with the Braden scale. The challenges of standardizing assessment data that arose during development, and the approaches SAPPHIRE took to overcome them, will be discussed.
Bio: Dr. Hyeon-Eui Kim earned her PhD degree in Health Informatics, University of Minnesota at Twin Cities, Minneapolis, MN, MPH from the Graduate School of Public Health, Seoul National University, Seoul, Korea, and BSN from the College of Nursing, Seoul National University, Seoul, Korea. She was a Pre-doctoral Fellow, Division of Biomedical Informatics Research, Mayo Clinic, Rochester, MN. She was an Informatician, Clinical Informatics Research & Development, Partners Healthcare, Wellesley, MA, Research Associate, Internal Medicine, Brigham & Women's Hospital, Boston, MA, Systems Analyst, City of Hope National Cancer Center, Duarte, CA, Instructor, Red-cross College of Nursing, Seoul, Korea, and a Registered Nurse, Seoul National University Hospital, Seoul, Korea. She received AMIA Nursing Informatics Working Group Award in 2007 and AMIA Nursing Informatics Harriet Werley Award in 2010. Her research interests include standardized concept representation (terminologies and information modeling), impact of information technology on patient care, clinical decision support systems, consumer health informatics, and informed consent for clinical data and sample use in research.

May 29, 2015 - Daniella Meeker, Ph.D. - Assistant Professor of Preventive Medicine and Pediatrics, University of Southern California, Los Angeles, CA and Director of Clinical Research Informatics, Southern California Clinical Translational Sciences Institute and Information Scientist, RAND Corporation
Mining Clinical Data for Quality Improvement and Precision Medicine
The FDA has generously supported data mining efforts devoted to detecting adverse drug events in post-market surveillance using claims and, more recently, clinical data. There has been relatively little attention devoted to the concept of "comparative effectiveness surveillance", which might more comprehensively analyze not only the risks of adverse events associated with different treatments, but also the comparative benefits of these treatments to patients in cost-benefit analyses. Furthermore, many data mining methods only consider drug exposure and do not take into account variations in patients' characteristics. By failing to capture differences between patients' responses to treatments, there are missed opportunities to practice precision medicine using existing therapies, and risks of withdrawing drugs that significantly benefit small populations. Dr. Meeker will present a recent history of data mining approaches that have been devoted to predicting undiscovered benefits of existing therapies and a framework for comparative effectiveness surveillance that allows more personalized cost-benefit assessments of alternative treatments. She will present early results from this framework, including an evaluation of novel sequential pattern mining algorithms that may predict how combinations of drug treatments may improve outcomes after heart attacks and incident congestive heart failure. These results reveal both opportunities and challenges associated with causal inference in clinical data mining.
Bio: Daniella Meeker is an Assistant Professor of Preventive Medicine and Pediatrics at the University of Southern California, and the Informatics Program Director for the Southern California Clinical Translational Sciences Institute, a collaboration between CHLA, Los Angeles County Department of Health Services and Keck Medicine of USC. She earned her PhD in Computation and Neural Systems from California Institute of Technology. After completing a post-doctoral fellowship at the RAND Bing Center for Health Economics, she joined RAND as an Information Scientist. Her current research is focused on distributed architectures for data management, analysis, and translational practice. Her other work includes development of collaborative platforms for knowledge management, program evaluation, social network analysis, and applied health and behavioral economics.

May 22, 2015 - Zhaohui Qin, Ph.D. - Associate Professor, Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA
What to do if you have a million genomics datasets? New Research Ideas and Essential Supporting Information Technologies

Rapid advances in high throughput technologies have produced a massive number of genomics datasets, with more being generated daily at an accelerating pace. However, the vast majority of these datasets remain underutilized after their initial publication due to the difficulty of processing and analyzing them. Consequently there is much more information in these datasets than is being reported. Another problem is how to identify datasets that a given researcher is most interested in, which is analogous to identifying the most relevant webpages for a given domain of address. To solve these problems, we are working towards two major goals: 1. Build an IT infrastructure to better organize publicly available genomics data. 2. Develop an innovative biomedical data browser to efficiently browse through these datasets to identify novel biological knowledge and insights. We believe accomplishing these two goals will enable biomedical researchers to conduct data-driven scientific research and gain new knowledge from genomics BigData.
Bio: Steve Qin is an Associate Professor of Biostatistics and Bioinformatics at Emory University. He obtained his PhD degree in Statistics at the University of Michigan and underwent postdoc training at Harvard University. He has more than 12 years of research experience in bioinformatics and computational biology. The major goal of Dr. Qin’s work is to provide analytical tools for the genetics and genomics research community. He has developed model-based methods to analyze high-throughput genomics and epigenomics data from ChIP-Seq, RNA-Seq, BS-seq and Hi-C experiments. His current research focuses on biological big data integration and mining. In addition to method development, he has collaborated extensively with biologists and clinicians to assist their efforts to identify novel biological insights from their experimental data.

May 15, 2015 - Jimeng Sun, Ph.D. - Associate Professor, School of Computational Science and Engineering, College of Computing, Georgia Institute of Technology, Atlanta, GA
Building Scalable Health Analytic Platform: Computational Phenotyping and Cloud-based Predictive Modeling
As the adoption of electronic health records (EHRs) has grown, EHRs are now composed of a diverse array of data, including structured information (e.g., diagnoses, medications, and lab results), and unstructured clinical progress notes. Two unique challenges need to be addressed in order to utilize EHR data in clinical research and practice:
    1) Computational Phenotyping: How to turn complex and messy EHR data into meaningful clinical concepts or phenotypes?

    2) Scalable predictive modeling: How to efficiently construct and validate clinical predictive models from EHR?

In this talk, we discuss our approaches to these challenges. For computational phenotyping, we represent EHR data as inter-connected high-order relations, i.e., tensors (e.g., tuples of patient-medication-diagnosis, patient-lab, and patient-symptoms), and then develop the Limestone algorithm, based on sparse nonnegative tensor factorization, for extracting multiple phenotype candidates from EHR data. Most of the phenotype candidates are considered clinically meaningful and have predictive power.
For predictive modeling, we introduce CloudAtlas, a cloud-based parallel predictive modeling system using big data infrastructure including Hadoop and Spark. Besides parallel model building, CloudAtlas can accurately estimate the running time and cost for a predictive modeling workflow and then provisions the proper cluster on demand in the cloud. In particular, we demonstrate that CloudAtlas can achieve a 40X speedup plus a 40% cost saving compared to traditional sequential execution on large EHR datasets.
Bio: Jimeng Sun is an Associate Professor in the School of Computational Science and Engineering at the College of Computing, Georgia Institute of Technology. Prior to joining Georgia Tech, he was a research staff member at IBM TJ Watson Research Center. His research focuses on health analytics using electronic health records and data mining, especially in designing novel tensor analysis and similarity learning methods and developing large-scale predictive modeling systems. He has published over 70 papers and filed over 20 patents (5 granted). He received the ICDM best research paper award in 2008, the SDM best research paper award in 2007, and the KDD Dissertation runner-up award in 2008. Dr. Sun received his B.S. and M.Phil. in Computer Science from Hong Kong University of Science and Technology in 2002 and 2003, and his PhD in Computer Science from Carnegie Mellon University in 2007.
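The tensor view of EHR data described above can be illustrated with a small sketch that turns (patient, medication, diagnosis) tuples into a sparse third-order count tensor. This is only the data representation step, not the Limestone factorization; the function name `build_sparse_tensor` and the example events are hypothetical and invented for illustration.

```python
from collections import Counter


def build_sparse_tensor(events):
    """Build a sparse 3rd-order count tensor from (patient, medication, diagnosis) tuples.

    Returns (axes, tensor): `axes` lists the sorted labels along each mode,
    and `tensor` maps (i, j, k) index triples to co-occurrence counts.
    Entries that never occur are simply absent (implicitly zero).
    """
    axes = [sorted({event[mode] for event in events}) for mode in range(3)]
    index = [{label: i for i, label in enumerate(axis)} for axis in axes]
    tensor = Counter()
    for event in events:
        key = tuple(index[mode][event[mode]] for mode in range(3))
        tensor[key] += 1
    return axes, tensor


# Hypothetical toy events, not real patient data.
events = [
    ("p1", "metformin", "type 2 diabetes"),
    ("p1", "metformin", "type 2 diabetes"),
    ("p1", "lisinopril", "hypertension"),
    ("p2", "lisinopril", "hypertension"),
]

if __name__ == "__main__":
    axes, tensor = build_sparse_tensor(events)
    print([len(axis) for axis in axes])  # tensor shape along each mode
    print(dict(tensor))
```

A nonnegative factorization of such a tensor would express it as a sum of a few rank-one components, each of which groups patients, medications, and diagnoses that tend to co-occur, which is one way to read a "phenotype candidate".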

May 8, 2015 - Stephanie Feudjio Feupe, M.S. - Ph.D. Student, Department of Biomedical Informatics, University of California, San Diego
Identifying Mendelian Disorders: Short and Structural Variants
To gain knowledge about suspected genetic diseases and potential treatments, we use a combination of structural variation analysis and single nucleotide variation analysis, as opposed to targeting one type of variation. While this approach has led to some important results in tumor analysis, the process of variant interpretation and validation is time consuming and not trivial. As part of this talk, we present our attempt to automate both variant interpretation and the linking of diseases to gene variants with supporting documentation. We consequently ease the burden of variant interpretation and literature search for researchers and treating teams.

Arya Iranmehr, M.S. - Ph.D. Student, Department of Electrical and Computer Engineering, University of California, San Diego
A Model-based Hierarchical Topic Model

Abstract: In this talk, I present a Model-Based Hierarchical Topic Model (MB-HTM) and a corresponding algorithm for approximate inference for hierarchical topic models. MB-HTM involves a fixed abstraction hierarchy of depth L, in which at each level l there are exactly l nodes that are fully connected to the nodes of the adjacent levels. The generative process of MB-HTM is defined such that branching at each level is multinomially distributed with a Dirichlet prior. MB-HTM lends itself to much larger-scale applications than its nonparametric counterparts. We use the collapsed Gibbs sampling algorithm for approximate inference, which marginalizes all the latent variables and parameters except those corresponding to paths and topic assignments, and then alternately samples from their distributions for all documents until convergence. We compare the document-completion Per-Word-Likelihood (PWL) of our model with state-of-the-art non-parametric Bayesian methods on several small and large corpora using Annealed Importance Sampling (AIS).
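To make the fixed hierarchy concrete, here is a rough sketch of drawing one root-to-leaf path when level l has l nodes and the branching at each level is a multinomial draw with a Dirichlet prior. It is an illustration under simplifying assumptions, not the MB-HTM generative process: the abstract does not say how branching distributions are shared across nodes or documents, so this sketch draws a fresh symmetric Dirichlet per level. The names `sample_dirichlet`, `sample_path`, `count_paths`, and the parameter `alpha` are hypothetical.

```python
import math
import random


def sample_dirichlet(alpha, k, rng):
    """Draw from a symmetric Dirichlet(alpha) over k outcomes via normalized gammas."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_path(depth, alpha=1.0, rng=None):
    """Sample one node index per level; level l (1-based) has exactly l nodes."""
    rng = rng or random.Random()
    path = []
    for level in range(1, depth + 1):
        # Assumption: a fresh branching distribution is drawn at every level.
        theta = sample_dirichlet(alpha, level, rng)
        node = rng.choices(range(level), weights=theta)[0]
        path.append(node)
    return path


def count_paths(depth):
    """With full connectivity between adjacent levels there are 1*2*...*depth paths."""
    return math.factorial(depth)


if __name__ == "__main__":
    print(sample_path(4, rng=random.Random(0)))
    print(count_paths(4))
```

Because the number of nodes is fixed in advance, the set of possible paths is finite and known, which is consistent with the abstract's point that the model targets larger applications than nonparametric alternatives.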

May 1, 2015 - Jolene Lau, Ph.D. - Scientific Analyst, Sagient Research Systems Inc.
Predicting the Likelihood of FDA Approval for Experimental Drugs: “Small Data” and Subjectivity

Abstract: Although thousands of drug candidates are currently being tested in clinical trials, only a small fraction of these agents will eventually receive FDA approval. BioMedTracker is a database that captures information on experimental drugs, company drug pipelines, and clinical trials based on publicly available information. The database allows us to generate Likelihood of Approval (LOA) statistics for drugs and compare approval timelines for submissions to the FDA and European Medicines Agency. When coupled with an understanding of the clinical landscape in various disease indications, these statistics may help guide drug development strategy for our pharma/biotech clients.
Bio: Dr. Jolene Lau is a Scientific Analyst at BioMedTracker, a drug intelligence platform that performs evidence-based pharmaceutical/biotech investment research for clients. BioMedTracker provides drug, company, and indication-based assessments based on scientific and medical publications, FDA regulatory insight, and a BMT analysis of the Likelihood of Approval of drugs in development. Her main areas of coverage include hematological cancers, breast and gynecological cancers, anti-infectives, and dermatology. Dr. Lau received her Ph.D. from the Scripps Research Institute and her B.S. from Caltech, both in Chemistry. She was a postdoctoral researcher at the Lawrence Berkeley National Lab.

April 24, 2015 - Tsung-Ting “Tim” Kuo, Ph.D. - Post-Doctoral Researcher, Department of Biomedical Informatics, University of California, San Diego
Bridging Machine Learning Theory and Practice - Lessons Learned from Participating ACM KDD-Cup

Abstract: Machine Learning (ML), or Knowledge Discovery and Data Mining (KDD), is one of the most popular scientific disciplines today. Understanding ML / KDD algorithms is not sufficient to apply them successfully to real-world data. By participating in leading ML / KDD competitions such as the ACM KDD Cup, researchers have an opportunity to define the state of the art of ML / KDD techniques and bridge the gap between theory and practice. Among the hundreds of teams that have participated in the ACM KDD Cup, National Taiwan University (NTU) is one of the best performers. NTU won five championships between 2008 and 2013, missing only 2009, when it placed third. As a 4-year NTU team member, I will share the lessons learned from the ACM KDD Cup in this talk. I will also focus on our experience with the ACM KDD Cup 2008 task, early-stage detection of breast cancer from mammogram image data provided by Siemens Healthcare (formerly Siemens Medical Solutions), USA.
Bio: Dr. Tsung-Ting Kuo is a Postdoctoral Scholar in DBMI, School of Medicine, UCSD. He earned his Ph.D. at National Taiwan University, M.S. at National Chiao-Tung University, and B.S. at National Cheng-Kung University, all in Computer Science. He was a runner-up for the Best Dissertation Award from ACLCLP (Association of Computational Linguistic in Taiwan) and TIEEE (IEEE branch in Taiwan) in 2014. He also won the NTU Outstanding Performance Scholarship in 2011, among many other awards. He was a major contributor to the NTU KDD-Cup team, a five-time champion from 2008 to 2013. He was a workshop co-organizer of COLING SocialNLP 2014 and of IJCNLP SocialNLP 2013. His research interests include data mining, machine learning, social network analysis, and natural language processing.

April 17, 2015 - Lucila Ohno-Machado, M.D., Ph.D., M.B.A. - Professor and Founding Chief, Department of Biomedical Informatics, Associate Dean of Informatics and Technology, University of California, San Diego.
Elizabeth Bell, M.P.H. - Staff Research Assistant - Research Coordinator, Department of Biomedical Informatics, University of California, San Diego.
Introduction to UCReX and ACT

Abstract: The UC ReX Data Explorer (i2b2 SHRINE) is a web-based system that enables menu-based count queries over 13.6 million de-identified patient records from the five UC biomedical centers. Users can conduct interactive query counts for demographics, diagnosis and procedures (ICD-9), medications, laboratory, vital signs and vital status data derived from patient care activities. The Data Explorer user interface allows users to directly interact with the system, creating queries that search for specific patient characteristics. For example, a user who wants to identify non-Hispanic, African American patients having diagnoses of both type 2 diabetes and hypertension can use UC ReX Data Explorer to obtain counts of potential subjects who match the stated criteria at each site. Based on the results, one can decide which of the UC medical centers to approach for study collaboration or review of quality outcomes data. In 2013, BRAID and UC ReX collaborated in response to a National Institutes of Health (NIH) NCATS proposal to develop an infrastructure that will significantly increase participant accrual to the nation’s highest priority clinical trials (ACT). The five BRAID-UC ReX campuses were selected and represent five (5) of 13 wave one sites in this important national initiative. In this talk, we will present the current progress of these projects and demonstrate the UC ReX Data Explorer system.
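The example query in the abstract (patients matching demographic criteria who have both of two diagnoses, counted per site) can be sketched in a few lines. This is a toy illustration over in-memory records and does not use or represent the i2b2 SHRINE interface; the function name `count_matches_by_site`, the record fields, the site labels, and the records themselves are all hypothetical.

```python
from collections import Counter


def count_matches_by_site(records, required_diagnoses, **demographics):
    """Count records per site that match every demographic field
    and carry all of the required diagnoses."""
    required = set(required_diagnoses)
    counts = Counter()
    for record in records:
        demographics_ok = all(
            record.get(field) == value for field, value in demographics.items()
        )
        if demographics_ok and required <= set(record["diagnoses"]):
            counts[record["site"]] += 1
    return counts


# Hypothetical de-identified toy records, invented for illustration only.
records = [
    {"site": "site_a", "ethnicity": "non-Hispanic", "race": "African American",
     "diagnoses": {"type 2 diabetes", "hypertension"}},
    {"site": "site_a", "ethnicity": "non-Hispanic", "race": "African American",
     "diagnoses": {"type 2 diabetes"}},
    {"site": "site_b", "ethnicity": "non-Hispanic", "race": "African American",
     "diagnoses": {"type 2 diabetes", "hypertension", "asthma"}},
    {"site": "site_b", "ethnicity": "Hispanic", "race": "White",
     "diagnoses": {"type 2 diabetes", "hypertension"}},
]

if __name__ == "__main__":
    counts = count_matches_by_site(
        records,
        {"type 2 diabetes", "hypertension"},
        ethnicity="non-Hispanic",
        race="African American",
    )
    print(dict(counts))
```

Returning only per-site counts, rather than the matching records, mirrors the abstract's description of a tool that reports how many potential subjects each site has.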
Bio: Dr. Ohno-Machado is Professor of Medicine and Chief, Division of Biomedical Informatics at the Department of Medicine and Associate Dean for Informatics and Technology of the School of Medicine, UCSD. Her research has been focused on construction and evaluation of novel data mining and decision support tools for biomedical research and clinical care, with particular emphasis on prognostication, as well as privacy technology tools to enable data sharing. As associate dean for informatics at UCSD, she also oversees the development and implementation of information systems for clinical quality improvement and health services research and directs the Informatics Core of the CTSA. UCSD’s clinical data warehouse for research was developed under her supervision and is managed by her team. She founded UC-Research eXchange, a University of California-wide initiative to integrate data warehouses from their five medical centers (UCLA, UCSF, UC Irvine, UC Davis, UCSD), and now serves as a member of the steering committee. The VA data warehouse, the VA Informatics and Computational Infrastructure (VINCI), UC-ReX, and USC-led clinics in LA formed pSCANNER in 2014, a PCORnet clinical data research network covering records of over 21 million patients. Elizabeth Bell is a clinical research coordinator for DBMI. She assists with user support and marketing of the UCReX research tool. Elizabeth obtained her MPH degree from University of Minnesota, and was an intern at DBMI prior to that.

April 10, 2015 - Sean Peisert, Ph.D. - Staff Scientist, Computational Research Division, Lawrence Berkeley National Laboratory and Assistant Adjunct Professor, Department of Computer Science, University of California, Davis
Models of Secure and Private Information Sharing

Abstract: The field of computer security can, in some ways, be seen as a toolbox full of tools -- we have something we want to secure, and we have tools available to do it. But the practical reality is that a lot of those tools are hard to use and so rather than using the right tool for the job, we use the tool we know how to use and hope it solves the problem. Or, instead, maybe the toolbox is sitting in the dark, and so we grab blindly for whatever tool our fingers wrap around first and start tinkering away with it. Taking a step back, however, we have to remember that every problem is different and presents its own intricate challenges. Additionally, not every problem can be or should be solved by technology alone. This talk walks through threats, vulnerabilities and solutions with regard to medical information sharing, to attempt to look at how we currently address security and privacy and how we might consider a research agenda in this area that might be both more practical and more effective in the future.
Bio: Dr. Sean Peisert is jointly appointed as a staff scientist at Lawrence Berkeley National Laboratory and as an assistant adjunct professor and faculty member in the Graduate Groups in Computer Science, Forensic Science, and Health Informatics at the University of California, Davis. His research interests cover a broad cross section of computer and network security. He is currently working on security research in the areas of intrusion detection, forensic analysis, vulnerability analysis, policy modeling, security metrics, the insider threat, elections and electronic voting, cyber-physical systems, the smart/power grid, and fault tolerance. He received his Ph.D., master’s, and bachelor’s degrees in Computer Science from UC San Diego. Professor Peisert is actively involved with the academic computer security community: he is an editorial board member of IEEE Security & Privacy; a steering committee member and past general chair of the New Security Paradigms Workshop (NSPW); a steering committee member and past program co-chair of the Workshop on Cyber Security Experimentation and Test (CSET); and general chair for the 2015 IEEE Symposium on Security and Privacy, the flagship conference for security research.

April 3, 2015 - Naomi Broering, MLS, MA, F.ACMI - Dean of Libraries, Pacific College of Oriental Medicine
Big Data Fusion of Biomedical and Library Informatics for Patient Care

Abstract: The presentation will cover historical developments leading to current and future trends. It reviews early accomplishments in bringing computers from libraries to clinical settings, i.e., hospitals, medical education, the NLM library databases and integrated academic information systems, followed by present EHR systems and the recently released NLM MedlinePlus Connect system. MedlinePlus Connect offers physicians’ offices with EHR systems the ability to connect a clinical diagnosis to MedlinePlus, which then electronically delivers literature summaries for patients. This improves patient care and can be done during a medical office visit. The future of tying together all these data sets for patient-centered services is fantastic. The challenge to your DBMI postdocs is an opportunity to explore a role in technical connections to existing EHR systems that are not yet connected. NLM provides MedlinePlus Connect free of charge, and we can offer contacts for technical information. I will be accompanied by Gregory A Chauncey, BSEE, MBA, Instructor from the Pacific College. He has expertise in MedlinePlus.
Bio: Naomi C. Broering is Dean of Libraries at Pacific College of Oriental Medicine, where she coordinates management of the three campus libraries; maintains the library website, databases, e-resources and e-books; writes reports; and administers the library operations of the San Diego campus library. Ms. Broering has decades of experience in the biomedical informatics and library field. She was Executive Director at the Houston Academy of Medicine - Texas Medical Center (HAM-TMC) Library, and Director of the Regional Medical Library from 1996 to 1999. She was Director of the Biomedical Information Resources Center and Medical Center Librarian at Georgetown University Medical Center, Dahlgren Memorial Library from 1975 to 1996. She is known for developing the Georgetown University IAIMS Project, the miniMEDLINE system and the Georgetown University Library Information System. She was elected a Fellow of the American College of Medical Informatics (FACMI) in 1989 for her contributions to the field of medical informatics.

March 13, 2015 - Michael Hogarth, MD - Professor, Internal Medicine and Chair, Health Informatics Graduate Program, University of California, Davis
Classic Papers in Medical Informatics
Abstract: Explore the 55 years of the discipline of medical informatics through a series of classic papers. Did you know that the author of the first paper describing Bayesian decision support was also the inventor of the CT scan? Did you know the first paper to describe an EHR intervention resulting in quality improvement was published 37 years ago? Did you know that a medical informaticist described the concept of the semantic web 9 years before Tim Berners-Lee did in his Scientific American paper entitled “The Semantic Web”? In this presentation, we start by reviewing approaches used to measure the importance of papers and journals in a discipline. We then navigate the 55 years of medical informatics through 20 “classic articles” that highlight the breadth and depth of innovation in the discipline.
Bio: Dr. Hogarth is a Board Certified Internist with appointments in Internal Medicine as well as the Department of Pathology, where he oversees the Pathology Informatics section. Dr. Hogarth has been a lead informaticist for a number of large-scale informatics initiatives including UC-ReX, the Athena Breast Health Network, the California and Maryland Electronic Death Registration systems, and pSCANNER. Dr. Hogarth has been at the intersection of Internet technologies and healthcare for over 20 years. In 1995, he published the first book on the Internet and Healthcare (100,000 downloads, translated into 5 languages). In the same year, he and colleagues developed a customizable web browser. After building an open source “terminology server” in 2000, he took a 1-yr sabbatical and joined the Oracle healthcare team, where he helped design a terminology/ontology component for the Oracle Healthcare Transaction Base. More recently, he led the design and development of an integrated health questionnaire system built on the Salesforce platform.

March 6, 2015 - Douglas J. Conrad, MD - Professor and Director, Adult Cystic Fibrosis Program, Division of Pulmonary Care Medicine, University of California, San Diego
Barbara A. Bailey, Ph.D. - Associate Professor, Department of Mathematics and Statistics, San Diego State University

The Application of the Statistical Learning Algorithm, Random Forest, to Generate and Validate Clinical Phenotypes in an Adult Cystic Fibrosis
Abstract: Cystic Fibrosis (CF) is a multi-systemic disease resulting from mutations in the Cystic Fibrosis Transmembrane Regulator (CFTR) gene and has major clinical manifestations in the sino-pulmonary and gastro-intestinal tracts. Clinical phenotyping is important for identifying disease prognosis, responses to therapy, genomic/genetic risk assessment and for metabolomic studies. Multidimensional clinical phenotypes were generated using 26 common clinical variables to describe classes that overlapped quantiles of lung function and captured the complexity evident in adult CF patients. The variables included age, gender, CFTR mutations, FEV1% predicted, FVC% predicted, height, weight, Brasfield chest xray score, pancreatic sufficiency status and clinical microbiology results. Complete datasets were compiled on 211 subjects. Phenotypes were identified using a proximity matrix generated by the unsupervised Random Forests algorithm and subsequent clustering by the Partitioning around Medoids (PAM) algorithm. The final phenotypic classes were then characterized and compared to a similar dataset obtained three years earlier.
Clinical phenotypes were identified using a clustering strategy that generated four and five phenotypes. Each strategy identified 1) a low lung health score phenotype, 2) a younger, well-nourished, male-dominated class, 3) various high lung health score phenotypes which varied in terms of age, gender and nutritional status. This multidimensional clinical phenotyping strategy identified classes with expected microbiology results and low risk clinical phenotypes with pancreatic sufficiency. This approach demonstrated regional adult CF clinical phenotypes using non-parametric, continuous, ordinal and categorical data with a minimal amount of subjective data to identify clinically relevant phenotypes. These studies identified the relative stability of the phenotypes, demonstrated specific phenotypes consistent with published findings and identified others needing further study.
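The clustering step described above can be illustrated with a minimal sketch of Partitioning Around Medoids (PAM) applied to a precomputed proximity matrix. The sketch assumes the Random Forests proximity matrix is already available; the toy `proximity` values, the function names, and the naive initialization are hypothetical and are not taken from the study, which may have used a different PAM implementation.

```python
def total_cost(dist, medoids):
    """Sum of each item's dissimilarity to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """Simplified PAM: start from the first k items (full PAM uses a BUILD
    step instead), then swap medoids with non-medoids while cost improves."""
    n = len(dist)
    medoids = list(range(k))
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    # Label each item with the index of its nearest medoid.
    labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return sorted(medoids), labels


# Hypothetical proximity matrix for six subjects forming two obvious groups.
proximity = [[1.0 if i == j else (0.9 if (i < 3) == (j < 3) else 0.1)
              for j in range(6)] for i in range(6)]
# Convert proximity to dissimilarity before clustering.
dissimilarity = [[1.0 - p for p in row] for row in proximity]

medoids, labels = pam(dissimilarity, k=2)
print(medoids, labels)
```

Because PAM only needs pairwise dissimilarities, it can work on mixed continuous, ordinal and categorical variables once a proximity matrix summarizes them.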
Douglas Conrad is a Professor of Medicine at the University of California, San Diego and is Director of the UCSD Adult CF Program. He is very active in CF translational studies including those associated with describing the CF airway inflammatory response, CF airway microbial ecology and most recently the CF airway metabolome. His most recent work on CF clinical phenotyping supports the successful application of these personalized medicine technologies.
Barbara Bailey is an associate professor in the Department of Mathematics and Statistics at San Diego State University. Her research interests include Nonlinear Time Series, Dynamical Systems, and Clouds, Visualization of Nonlinear Models, Environmental Monitoring, Population Dynamics and Embryonic Mortality, and Model Validation.

February 27, 2015 - Kristin E. Lauter, Ph.D. - Principal Researcher and Research Manager, Cryptography Group, Microsoft Research
Homomorphic Encryption: Privacy in Genomic Computation

Abstract: Genomic data is a goldmine for exploring and searching for causes and cures for disease. But exposing one’s own genomic data to researchers and to large databases presents personal privacy risks. Government regulations limit the ways in which genomic data can be shared and computed on, which constrains progress to an extent. However, new techniques from cryptography which allow data to be encrypted in a form that it can also be computed on without requiring decryption can potentially provide a path forward. In this talk, we will discuss the progress and the limits of homomorphic encryption and how it can potentially be used to outsource computation on genomic data, even to untrusted parties.
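As a rough illustration of computing on data without decrypting it, the sketch below implements a toy version of the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. This is an assumption chosen for brevity and is not necessarily the kind of scheme discussed in the talk; the tiny primes and all function names are illustrative only and offer no real security.

```python
import math
import random


def keygen(p, q):
    """Toy Paillier key generation from two small primes (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n, m, rng):
    """Encrypt an integer 0 <= m < n under public key n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    """Recover the plaintext using the private key (lam, mu)."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return (c1 * c2) % (n * n)


n, private_key = keygen(293, 433)
rng = random.Random(0)
c1 = encrypt(n, 20, rng)
c2 = encrypt(n, 22, rng)
# The party holding only c1, c2 and n can add without seeing 20 or 22.
print(decrypt(n, private_key, add_encrypted(n, c1, c2)))  # prints 42
```

An untrusted party that holds only the public key and the ciphertexts can perform the addition, and only the key holder can read the result.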
Bio: Kristin Lauter is a Principal Researcher and Research Manager for the Cryptography group at Microsoft Research. Her personal research interests include homomorphic encryption and cloud security and privacy, including privacy for healthcare. Lauter is a Fellow of the American Mathematical Society and is currently serving as President of the Association for Women in Mathematics and on the Council of the AMS. She is also an Affiliate Professor in the Department of Mathematics at the University of Washington. She completed her PhD in mathematics at the University of Chicago in 1996, and she was T.H. Hildebrandt Assistant Professor of Mathematics at the University of Michigan (1996-1999). In 2008, Lauter, together with her coauthors, was awarded the Selfridge Prize in Computational Number Theory.

February 20, 2015 - Yang Huang, Ph.D. - Director, Research & Development Medical Informatics, Kaiser Permanente Southern California
Manabu Torii, Ph.D. - Scientist, Research & Development Medical Informatics, Kaiser Permanente Southern California
Adapting a General-Purpose Clinical Natural Language Processing (NLP) Pipeline nQuiry to Detect Heart Disease Risk Factor
Abstract: Heart disease claims about 600,000 lives and costs over 108.9 billion dollars every year in the United States. Leveraging text analytics, the rich information in electronic medical records (EMRs) can be used to identify relevant risk factors to assist effective risk assessment, which is critical to prevention, care, and treatment planning. The 2014 i2b2/UTHealth Challenge brought together researchers and practitioners of clinical natural language processing (NLP) to tackle problems of common interest, which included a track focusing on the identification of heart disease risk factors reported in EMRs. We participated in this track and developed an NLP system for risk factor detection using an existing general-purpose NLP pipeline, nQuiry, with other existing tools and resources in the public domain. Our system was based on a hybrid of machine-learning and rule-based components and achieved an overall F1 score of 0.9185, with recall of 0.9409 and precision of 0.8972.
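The reported F1 score follows from the reported precision and recall, since F1 is their harmonic mean. The short check below uses the figures from the abstract; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
f1 = f1_score(precision=0.8972, recall=0.9409)
print(round(f1, 4))  # prints 0.9185
```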
Yang Huang is a Director of Research & Development Medical Informatics at Kaiser Permanente Southern California (KPSC). He has been the technical lead and Principal Investigator of multiple projects developing and implementing clinical Natural Language Processing (NLP) and data mining applications since joining Kaiser seven and a half years ago. He received a Ph.D. in Biomedical Informatics in 2007 and an M.S. in Computer Science in 2002, both from Stanford University. His research interests include parsing, modality detection, concept-mapping, sense-disambiguation and data mining of biomedical documents.
Manabu Torii is currently a scientist in the Medical Informatics group at Kaiser Permanente Southern California (KPSC), working on clinical natural language processing (NLP) applications. Before joining KPSC, he worked as a research assistant professor in the Imaging Science and Information Systems Center at Georgetown University Medical Center and the Center for Bioinformatics and Computational Biology (CBCB) at the University of Delaware. He received his PhD in Computer Science from the University of Delaware in 2006. His research interests include information extraction, document classification, and other related problems in the biomedical and clinical domain.

February 13, 2015 - Shuang Wang, Ph.D. - Postdoctoral Researcher, Division of Biomedical Informatics, University of California, San Diego
Protecting Medical Data Privacy in Biomedical Research Studies
Abstract: In this presentation, I will review my research on protecting medical data privacy in biomedical research studies. My research has a particular emphasis on leveraging cryptographic protocols (e.g., homomorphic encryption and secure multi-party computation) and privacy-preserving methods (e.g., differential privacy, data access sharing models) to protect medical data privacy. I will present three data sharing models for achieving privacy-preserving medical data analysis, in terms of data archives, data enclaves, and distributed data computation.
Bio: Dr. Shuang Wang is a postdoctoral researcher at the Division of Biomedical Informatics at the University of California, San Diego. He received his Ph.D. degree in Electrical and Computer Engineering from the University of Oklahoma. His background is in data privacy, data compression and machine learning. His current research interests focus on medical and genomic data privacy, genomic data compression, and GPU-based high performance computing.

February 6, 2015 - Andrew I. Su, Ph.D. - Associate Professor, Department of Molecular and Experimental Medicine, Scripps Research Institute
Crowdsourcing and Citizen Science for Biology
Abstract: In this talk, I will review my research on building and applying bioinformatics infrastructure for biomedical discovery. My research has a particular emphasis on leveraging crowdsourcing for genetics and genomics. Representative projects include the Gene Wiki, BioGPS, MyGene.Info, and Mark2Cure, each of which engages “the crowd” to help organize biomedical knowledge. These resources are collectively used millions of times every month by members of the research community, by students, and by the general public.
Bio: Professor Su is Associate Professor at the Scripps Research Institute in the Department of Molecular and Experimental Medicine (MEM). He earned his PhD in chemistry from the Scripps Research Institute and worked at the Genomics Institute of the Novartis Research Foundation (GNF) before joining Scripps as a faculty member. He currently serves on the Scientific Advisory Board of the Gene Ontology Consortium, as Editorial Advisor for the journal BMC Genomics, and as a Gene Wiki Editor.

January 30, 2015 - Jeffrey S. Grethe, Ph.D. - Associate Director, Center for Research in Biological Systems, University of California, San Diego
Cooperative and Collaborative Data and Resource Discovery Platforms for Scientific Communities — The Neuroscience Information Framework (NIF) and SciCrunch
Abstract: Data and information on research resources are everywhere, in numerous repositories and download sites, and more flood in every day. What’s a researcher to do? In order to be able to use shared data, the first fundamental rule is that you have to be able to find it. We have search engines like Google for web documents, PubMed and Google Scholar for articles, and NCBI for selected genomics resources. The Neuroscience Information Framework (NIF) was instantiated in 2006 in response to a Broad Agency Announcement from the NIH Blueprint for Neuroscience Research citing an overwhelming need for an “information framework for identifying, locating, and characterizing neuroscience information”. NIF was tasked with surveying the neuroscience resource landscape and developing a resource description framework and search strategy for locating, accessing and utilizing research resources, defined here as data, databases, tools, materials, literature, networks, terminologies, or information that can accelerate the pace of neuroscience research and discovery. NIF adds value to these existing biomedical resources by increasing their discoverability, accessibility, visibility, utility and interoperability, regardless of their current design or capabilities and without the need for extensive redesign of their components or information models. Unlike more general search engines, NIF provides deeper access to a more focused set of resources that are relevant to neuroscience, provides search strategies tailored to neuroscience, and also provides access to content that is traditionally “hidden” from web search engines. To accomplish this, NIF has deployed an infrastructure allowing a wide variety of resources to be searched and discovered at multiple levels of integration, from superficial discovery based on a limited description of the resource (NIF Registry), to deep content query (NIF Data Federation).
NIF is currently one of the largest sources of biomedical information on the web, searching over 13,000 research resources in its Registry and the contents of 250+ data resources comprising more than 800 million records in its Data Federation.
Building on the NIF infrastructure, SciCrunch was designed to help communities of researchers create their own portals to provide access to resources, databases and tools of relevance to their research areas. A data portal that searches across hundreds of databases can be created in minutes. Communities can choose from our existing SciCrunch data sources and also add their own. SciCrunch was designed to break down the traditional types of portal silos created by different communities, so that communities can take advantage of work done by others and share their expertise as well. SciCrunch currently supports a diverse collection of communities in addition to NIF, each with their own data needs: CINERGI – focuses on constructing a community inventory and knowledge base on geoscience information resources; NIDDK Information Network (dkNET) – serves the needs of basic and clinical investigators by providing seamless access to large pools of data relevant to the mission of The National Institute of Diabetes, Digestive and Kidney Disease (NIDDK); Research Identification Initiative (RII) – aims to promote research resource identification, discovery, and reuse.
Bio: Jeffrey S. Grethe, Ph.D., is a Principal Investigator (MPI) for the Neuroscience Information Framework (NIF) and the NIDDK Information Network (dkNET) in the Center for Research in Biological Systems (CRBS) at the University of California, San Diego. Following a B.S. in Applied Mathematics from the University of California, Irvine, he received a doctorate in neurosciences with a focus on neuroinformatics and computational modeling from the University of Southern California. Throughout his career, he has been involved in enabling collaborative research, data sharing and discovery through the application of advanced informatics approaches. This started at USC with his involvement in the Human Brain Project and continues today with his work on NIF, dkNET and with standards bodies such as the International Neuroinformatics Coordinating Facility.

January 23, 2015 - Kamalika Chaudhuri, Ph.D. - Assistant Professor, Dept of Computer Science, University of California, San Diego
The Large Margin Mechanism for Differentially Private Maximization
Abstract: A basic problem in the design of differentially private algorithms, especially for statistical and machine learning tasks, is private maximization: pick an item from a universe that (approximately) maximizes a data-dependent function, all under the constraint of differential privacy. Previous algorithms for this problem are either range-dependent (i.e., their utility diminishes with the size of the universe) or apply to restricted function classes. We provide a general-purpose, range-independent algorithm for private maximization that guarantees approximate differential privacy. We demonstrate its applicability on two fundamental tasks in machine learning: classifier learning and frequent itemset mining. Joint work with Daniel Hsu and Shuang Song.
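To make the private maximization problem concrete, the sketch below shows the well-known exponential mechanism, a standard range-dependent baseline whose utility guarantee degrades as the universe of items grows. It is not the large margin mechanism presented in the talk; the function names, the example scores, and the default sensitivity of 1 are assumptions made for illustration.

```python
import math
import random


def exponential_mechanism_probs(scores, epsilon, sensitivity=1.0):
    """Selection probabilities proportional to exp(eps * score / (2 * sens))."""
    top = max(scores)  # shift by the maximum for numerical stability
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity))
               for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def private_argmax(scores, epsilon, sensitivity=1.0, rng=random):
    """Sample an item index; higher-scoring items are more likely."""
    probs = exponential_mechanism_probs(scores, epsilon, sensitivity)
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]


# Hypothetical data-dependent scores for a universe of three items.
scores = [3.0, 1.0, 0.0]
probs = exponential_mechanism_probs(scores, epsilon=1.0)
choice = private_argmax(scores, epsilon=1.0, rng=random.Random(0))
print(probs, choice)
```

Smaller values of `epsilon` flatten the distribution, which strengthens privacy but makes it less likely that the true maximizer is returned.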
Bio: Prof. Kamalika Chaudhuri is Assistant Professor at the Department of Computer Science and Engineering, UC San Diego. Her research is on machine learning. Much of her work is on privacy-preserving machine learning and unsupervised learning. She is also broadly interested in a number of topics in learning theory, such as confidence-rated prediction, online learning, and active learning. She received her PhD from the Electrical Engineering and Computer Science Department, UC Berkeley, in 2007.

January 16, 2015 - Lucila Ohno-Machado, MD, PhD, MBA - Professor and Division Chief, Division of Biomedical Informatics, University of California San Diego
Integrating Informatics with Clinicians, Researchers, and Educators: Goals for the New Health System Department of Biomedical Informatics
Abstract: Biomedical informatics has expanded significantly since it was founded in 2009 at UCSD with primary support from the Department of Medicine and the Clinical and Translational Research Institute. The newly formed Department of Biomedical Informatics is starting a new phase of growth. Adding to our extensive informatics research and training portfolio, we will be recruiting key leaders in information technology and clinical informatics to our team. We will introduce informatics innovations to our clinical settings, promote information flow for decision makers, and help bridge gaps between clinical and research faculty. We will be implementing and evaluating the application of innovative software and processes, collaborating and instrumenting comparative effectiveness research, clinical and observational trials, translational science in addition to expanding our informatics R&D team. In this presentation, I will quickly review the mission, vision, and goals for our department, present opportunities for participation in current projects, and outline our plans for our next five years.
Bio: Dr. Ohno-Machado is Professor of Medicine and Chief, Division of Biomedical Informatics at the Department of Medicine and Associate Dean for Informatics and Technology of the School of Medicine, UCSD. Her research has been focused on construction and evaluation of novel data mining and decision support tools for biomedical research and clinical care, with particular emphasis on prognostication, as well as privacy technology tools to enable data sharing. As associate dean for informatics at UCSD, she also oversees the development and implementation of information systems for clinical quality improvement and health services research and directs the Informatics Core of the CTSA. UCSD’s clinical data warehouse for research was developed under her supervision and is managed by her team. She founded UC-Research eXchange, a University of California-wide initiative to integrate data warehouses from their five medical centers (UCLA, UCSF, UC Irvine, UC Davis, UCSD), and now serves as a member of the steering committee. The VA data warehouse, the VA Informatics and Computational Infrastructure (VINCI), UC-ReX, and USC-led clinics in LA formed pSCANNER in 2014, a PCORnet clinical data research network covering records of over 21 million patients.

January 9, 2015 - Edna Shenvi, MD, MAS - Postdoctoral Fellow, Division of Biomedical Informatics, University of California, San Diego
Toward Reductions in Inpatient Diagnostic Error through Provider Feedback
Abstract: Diagnostic error is a significant cause of healthcare harm, and providers often do not know the outcomes of their diagnostic decisions due to healthcare fragmentation. We developed a conceptual framework of the outcomes of diagnostic error in the inpatient setting, and performed a scoping review to compile record screening criteria that can detect such errors. This ongoing development of a diagnostic error trigger tool can be useful for further analysis of diagnostic error and for providing automated, filtered patient outcome feedback to clinicians. We also performed a survey of current residents about their behaviors of learning the outcomes of patients they have handed off to other teams, to better understand the perceptions of doctors in training of the needs in this area.
Bio: Dr. Shenvi is a postdoctoral scholar at UCSD in DBMI. She completed her MD at the University of Southern California Keck School of Medicine, two-year appointments in general surgery residency at Boston University and Brigham & Women’s Hospital, and an MAS degree in Clinical Research at UCSD.