Summer Internship Program

Projects in our 10-week summer internship program cover a range of topic areas and vary according to the intern’s background and interests. Past projects have included predictive modeling for personalized medicine; privacy technology; natural language processing; image processing and retrieval; integration of genotypic, phenotypic, and behavioral patient data for discovery; and informed consent ontology and tools.

  • Open to:
    • undergraduate students
    • graduate students
    • postgraduates who have received their master's degree or doctorate within the past five years
  • The most highly qualified applicants have at least basic experience in computer science, human-computer interaction, or related disciplines
  • A stipend sufficient for living expenses in San Diego is paid
  • Must be at least 18 years of age by the program start date
  • Must be currently enrolled or employed at a university or other research institution
  • Must currently reside and be eligible to work in the United States


February 23: Application period opens
March 23: Applications will no longer be accepted
April 13: Notification of acceptance or waitlist status
April 20: Response required from intern to guarantee participation
June 1/June 14*: Program begins with kickoff meeting
July 13: Midterm evaluations
August 3: Final presentations
August 4/August 17*: Program concludes

* Students who are unavailable due to their school calendar for the June 1st kickoff meeting will start on June 14th and finish on August 17th.

The Summer Internship Application Form is available for download at this link: 2018 DBMI summer internship application (v4).docx

Participating Faculty and Project Information

Tsung-Ting Kuo (Tim)

Project 1: Developing privacy preserving predictive modeling algorithms on Blockchain networks

Predictive modeling can advance research, facilitate quality improvement initiatives, and substantiate research results, especially when data from multiple healthcare systems can be included. However, current state-of-the-art privacy-preserving predictive modeling frameworks are still centralized; in other words, the models from distributed sites are integrated on a central server to build a global model. This centralization carries several risks, e.g., a single point of failure at the central server. To improve the security and robustness of predictive modeling frameworks, we will develop and implement novel algorithms on decentralized Blockchain networks (a distributed ledger/database technology adopted by the Bitcoin cryptocurrency) to build better models. The outcome will be algorithms that improve the predictive power of data from multiple healthcare systems through a distributed system.
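To make the centralized design described above concrete, here is a minimal sketch, assuming each site trains a linear model locally and a central server combines the models by sample-weighted averaging of their coefficients. The averaging rule, the function name `aggregate_models`, and the numbers are illustrative assumptions, not the method of any particular framework or of this project.

```python
def aggregate_models(site_models):
    """Combine per-site models on a central server.

    site_models: list of (coefficients, n_samples) pairs, one per site.
    Returns the sample-weighted average of the coefficient vectors.
    """
    total_samples = sum(n for _, n in site_models)
    n_coefficients = len(site_models[0][0])
    return [
        sum(coefs[j] * n for coefs, n in site_models) / total_samples
        for j in range(n_coefficients)
    ]


# Hypothetical models from two sites: (coefficients, number of local samples).
site_models = [([0.2, 1.0], 100), ([0.4, 2.0], 300)]
global_model = aggregate_models(site_models)
print(global_model)
```

Only model coefficients leave each site, never patient records. Every site still depends on the one server that runs `aggregate_models`, which is the single point of failure the project aims to remove.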

Lucila Ohno-Machado

Project 1: Precision medicine participant engagement and data analysis

The Precision Medicine Initiative (PMI) is a bold research effort to revolutionize how we improve health and treat disease. The PMI aims to leverage advances in genomics, emerging methods for managing and analyzing large data sets while protecting privacy, and health information technology to accelerate biomedical discoveries. In this project, you would join a team that finds innovative ways to enroll participants in the program, assists in creating marketing materials, organizes and secures participant data, and runs statistical analyses.

Project 2: DataMed data discovery index

The bioCADDIE effort to build the DataMed Data Discovery Index resulted in a prototype search engine for finding biomedical datasets. Part of this project involves understanding the FAIR data principles and how they apply to datasets. You would be involved in analyzing dataset access controls, such as data use agreements and informed consent forms. We are working on an accessibility analysis of some datasets in dbGaP and developing a rating mechanism. This will also involve producing a publication with evaluation mechanisms and criteria, as well as preliminary results. Interested applicants should have a background in health policy, medicine, or bioethics.

Xiaoqian Jiang

Project 1: Distributed and decentralized missing data imputation

The goal of this project is to develop privacy-preserving methods and tools for handling missing data in a distributed environment. Missing data is a major issue in data analysis and must be handled carefully to preserve as much information as possible. The proposed research will enable researchers to use more complete data by leveraging information from multiple collaborators without compromising patient privacy, and will help lower regulatory and other hurdles to collaboration across multiple institutions. It might improve the validity, robustness, and generalizability of research findings, and offer substantial benefits in areas including, but not limited to, precision medicine and biomedical informatics practice. We will develop novel algorithms to enable missing data imputation for horizontally partitioned data.
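As a rough illustration of imputation over horizontally partitioned data (each site holds different patients but the same columns), here is a minimal sketch that assumes plain mean imputation. Each site shares only per-column sums and counts of its observed values, never individual rows. The function names and data are hypothetical, and this simple baseline is not the project's algorithm. Sharing aggregates also does not by itself guarantee privacy.

```python
def local_summary(rows):
    """Per-column (sums, counts) of observed values at one site; None marks missing."""
    n_cols = len(rows[0])
    sums = [0.0] * n_cols
    counts = [0] * n_cols
    for row in rows:
        for j, value in enumerate(row):
            if value is not None:
                sums[j] += value
                counts[j] += 1
    return sums, counts


def global_means(summaries):
    """Pool the per-site summaries into one mean per column."""
    n_cols = len(summaries[0][0])
    total_sums = [0.0] * n_cols
    total_counts = [0] * n_cols
    for sums, counts in summaries:
        for j in range(n_cols):
            total_sums[j] += sums[j]
            total_counts[j] += counts[j]
    # A column with no observed value anywhere stays None.
    return [s / c if c else None for s, c in zip(total_sums, total_counts)]


def impute(rows, means):
    """Fill each missing entry with the pooled column mean."""
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


# Hypothetical data: two sites, same two columns, different patients.
site_a = [[1.0, None], [3.0, 4.0]]
site_b = [[None, 8.0], [5.0, None]]

means = global_means([local_summary(site_a), local_summary(site_b)])
imputed_a = impute(site_a, means)
print(means, imputed_a)
```

Because the pooled means use observations from both sites, site A can fill its gap in the second column with information it could not have computed alone.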

Jejo Koola

Project 1: Wearable sensors and mobile health to detect hepatic encephalopathy

Cirrhosis, irreversible scarring of the liver, causes significant morbidity and mortality due to decreased mental, physical, and biochemical function. Its prevalence in the United States is estimated at between 400,000 and 3,000,000 persons, and the disease causes 44,000 deaths annually. Hepatic encephalopathy, a spectrum of potentially reversible neuropsychiatric abnormalities, occurs in 30 to 45 percent of patients with cirrhosis. If undetected and untreated, it frequently leads to hospital admission, costing $1.6 billion to $2.0 billion annually. Furthermore, hepatic encephalopathy is a frequent contributor to hospital readmission in cirrhosis. Reducing its morbidity involves identifying the decompensating patient early so that treatment can be escalated or changed sooner.

Increasing effort has been devoted to identifying decompensating patients in the home setting. Home monitoring has usually taken the form of telemonitoring, particularly for heart failure. For hepatic encephalopathy, nascent work has used mobile health through a smartphone application. However, the platform still requires intense monitoring of multiple parameters and involves volitional participation of the patient and a caregiver. Volitional participation may be particularly challenging for patients slipping into worsening stages of hepatic encephalopathy. A possible solution is the use of passive sensors: activity and exercise monitors, location trackers, and heart rate sensors. Passive sensor technology has been tested for detecting mood changes in psychiatric disorders. This interaction between health and technology has been termed the "digital phenotype." We propose to use mobile health technology and wearable sensors to monitor patients' activity levels, vital signs, technology interaction, and sleep hygiene as a means of "phenotyping" covert hepatic encephalopathy or early-grade overt hepatic encephalopathy.

Project 2: Designing and evaluating the Big Data consult service

In 2012 the Institute of Medicine released desiderata for a learning healthcare system, in which evidence informs practice and practice informs evidence. Though the randomized clinical trial (RCT) serves as the gold standard for informing clinical decisions, it has flaws: difficulty achieving recruitment, overly stringent inclusion/exclusion criteria, and a lack of patient-centered decision making. Observational cohort studies have grown into an important complement to RCTs, allowing comparative effectiveness research and patient-centered trials. The surge of electronic health records (EHRs) and the resulting zettabyte of data allow us to realize this vision for the first time. To ameliorate these difficulties, we propose to launch and study a novel "informatics consult" service. When no clear evidence-based guidelines exist for a care decision, the service would allow clinicians to query the UCSD clinical data warehouse by identifying patients similar to the index case. Such a system would leverage our ability to truly deliver personalized, patient-centered care. We note several novel challenges to this proposed system: (i) performing semi-automated phenotyping so that we can identify clinical outcomes of interest; (ii) identifying patients that are similar to the index patient (often called clustering); (iii) incorporating automated, computable search regarding guideline-recommended care; (iv) performing visual analytics to understand similarity of cohorts; (v) communicating probability and statistical information to healthcare professionals so they can effectively manage uncertainty. The service is initially intended to be provider-facing, i.e., a consult report is generated to help the healthcare provider make a medical decision. However, we envision that the same process will also be used to generate a patient-facing report, one that would help patients better understand their disease and their treatment options based on other patients similar to themselves.
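One simple way to picture the step of identifying patients similar to the index case is a nearest-neighbor lookup. The sketch below assumes each patient is summarized by a small numeric feature vector and uses Euclidean distance. The features, patient IDs, and the function `most_similar` are hypothetical, and nearest-neighbor search is used here as a stand-in for whatever similarity method the service would adopt.

```python
import math


def most_similar(index_patient, cohort, k=2):
    """Return the IDs of the k patients closest to index_patient (Euclidean distance)."""
    ranked = sorted(
        cohort.items(),
        key=lambda item: math.dist(index_patient, item[1]),
    )
    return [patient_id for patient_id, _ in ranked[:k]]


# Hypothetical feature vectors: (age in years, serum creatinine in mg/dL).
cohort = {
    "p1": (60.0, 1.2),
    "p2": (35.0, 0.8),
    "p3": (58.0, 1.1),
}
similar = most_similar((59.0, 1.0), cohort)
print(similar)
```

In practice the features would need to be scaled to comparable ranges first; otherwise a wide-ranging variable such as age dominates the distance.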

Shuang Wang

Project: Developing efficient methods to protect genome data privacy and security
Description: Genome data sharing is essential to promote scientific discovery, improve healthcare quality, and support meaningful use. However, data privacy is a major concern that may hinder genome data sharing and health information exchange for research. In this project, students will use cryptographic technologies, such as secure multiparty computation and homomorphic encryption, to develop new tools that safeguard genome data analysis in the public cloud or across different institutions.
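To give a flavor of secure multiparty computation, here is a minimal sketch of a secure sum built on additive secret sharing, assuming honest-but-curious institutions that do not collude. The modulus, the function names, and the allele counts are illustrative assumptions, and this toy protocol is not the project's tooling.

```python
import secrets

PRIME = 2**61 - 1  # modulus for the shares (an illustrative choice)


def share(value, n_parties):
    """Split value into n_parties random shares that sum to value modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def secure_sum(private_values):
    """Sum the parties' private values while each party sees only random shares."""
    n = len(private_values)
    # Party i splits its value and sends its j-th share to party j.
    all_shares = [share(v, n) for v in private_values]
    # Party j adds the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    # The published partial sums add up to the true total.
    return sum(partial_sums) % PRIME


# Hypothetical per-institution counts of carriers of some allele.
allele_counts = [12, 7, 30]
total = secure_sum(allele_counts)
print(total)
```

Each individual share is a uniformly random number, so no single institution learns another's count; only the combined total is revealed.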

To see past interns, please use the links below:

2017 Summer Interns

2016 Summer Interns

2015 Summer Interns

2014 Summer Interns

2013 Summer Interns 

2012 Summer Interns

2011 Summer Interns