Zhixiang Lin, PhD
Abstract: In the first part of the talk, a dimension reduction method will be introduced where we extend Principal Component Analysis to propose AC-PCA for simultaneous dimension reduction and Adjustment for Confounding variation. We show that AC-PCA can adjust for variations across individual donors present in a human brain dataset. For gene selection purposes, we extend AC-PCA with sparsity constraints, and propose and implement an efficient algorithm. The second part of the talk will be focused on clustering methods in single cell genomics. In single cell genomics, it is technically challenging to obtain chromatin accessibility and gene expression data for the same cell. We have developed a computational approach to this problem, where a model-based clustering method is proposed to match cell sub-populations in these two data types. We also demonstrate that using one data type can guide clustering of the other data type. Our proposed Bayesian model accounts for the stochasticity due to biological and technical effects. Last, methodologies motivated by spatial temporal modeling of gene expression dynamics during human brain development will be briefly discussed.
Bio: Dr. Zhixiang Lin studied biological sciences at Tsinghua University (BS, 2010) and computational biology & bioinformatics and statistics at Yale University (PhD, 2015). He has been a postdoctoral scholar in the Department of Statistics at Stanford University since 2015. His major research areas are statistical genetics/genomics and computational biology. His work has been published in prestigious journals such as PNAS, Biometrics, Annals of Applied Statistics and Cell.
BRF (Biomedical Research Facility) 1102
01/19/2018 10:00 AM – 11:00 AM
Zihuai He, PhD
Abstract: Predicting the functional consequences of genetic variants is a challenging problem, especially for variants residing in non-coding regions. Projects such as ENCODE and Roadmap Epigenomics make available various epigenetic features, including histone modifications and chromatin accessibility, genome-wide in over a hundred different tissues and cell types. Meanwhile, recent developments in high-throughput assays to assess the functional impact of variants in regulatory regions (e.g. massively parallel reporter assays - MPRA, CRISPR/Cas9-mediated in situ saturating mutagenesis) can lead to the generation of high quality data on the functional effects of selected variants. We propose a semi-supervised approach, referred to as GenoNet, to jointly utilize experimentally confirmed regulatory variants (labeled variants), millions of unlabeled variants genome-wide, and more than a thousand cell type/tissue specific functional annotations on each variant to predict functional consequences of non-coding genetic variants. Through the application to several experimental datasets, we demonstrate that the proposed method significantly improves prediction accuracy compared to existing functional prediction methods, both at the organism level and at the tissue/cell type level. We further show that eQTLs and dsQTLs in specific tissues tend to be substantially more enriched among variants with high GenoNet scores, and how the GenoNet scores can be used to map regulatory variants in regions of interest, evaluate 3C interaction variants and aid in the discovery of disease associated genes through an integrative analysis of lipid phenotypes using a Metabochip dataset on 12,281 individuals.
Bio: Dr. Zihuai He received his Ph.D. in Biostatistics from the University of Michigan and his BS (Bachelor of Science) from Tsinghua University in China. He is currently a post-doctoral research scientist in the Department of Biostatistics at Columbia University. His research has concentrated on statistical genetics and integrative analysis of omics data. His work has produced 11 peer-reviewed publications in prestigious journals of genetics and statistics, such as The American Journal of Human Genetics, Journal of the American Statistical Association, and Biometrics. He has developed three R packages with efficient computational techniques that facilitate integrative analysis in a broad range of genomic study designs such as longitudinal studies, family studies and meta-analysis of multiple sequencing studies. At Columbia, he also collaborates with researchers in the GTEx Consortium on gene expression studies.
BRF (Biomedical Research Facility) 1104
01/12/2018 8:00 AM – 9:00 AM
Ling Ma, PhD
Abstract: In longitudinal studies, it is often of interest to investigate how the functional feature of a marker’s measurement process is associated with the event time of interest. We make use of B-splines to smoothly approximate the infinite dimensional functional data and propose a joint model of the longitudinal functional feature and the time to event. The proposed approach also allows for prediction of survival probabilities for future subjects based on their available longitudinal measurements and a fitted joint model. We illustrate our proposals on a prospective pregnancy study, the Oxford Conception Study, where measurements of luteinizing hormone, an important biomarker of ovulation, are available. A joint modeling approach combining functional data analysis and discrete survival modeling was used to assess whether a functional feature of the hormonal measurements, such as the curvature of the hormonal profile, is associated with time to pregnancy.
Bio: Dr. Ling Ma received her PhD in Statistics from the University of Missouri, Columbia in 2014. She then worked at the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) as a postdoctoral fellow for two years before joining Clemson University as an assistant professor. Dr. Ma’s primary methodological research interests are survival analysis, with special emphasis on interval-censored data and panel count data, and joint modeling of longitudinal and time-to-event data. Dr. Ma has worked on statistical methods with applications to reproductive and environmental epidemiology, cancer, HIV, etc.
01/05/2018 10:00 AM – 11:00 AM
Xiaohui Niu, PhD
Abstract: Enhancer sequences contain short DNA motifs that act as binding sites for sequence-specific transcription factors. Given the crucial roles of enhancers in generating cell-type and state-specific transcriptional programs, further understanding of the process of enhancer transcription and its contribution to the overall functionality of enhancers will offer crucial insights into gene regulation, cell identity control, development and disease. However, this is a challenging problem because the very long distance between an enhancer and its target gene makes the search difficult. Moreover, unlike a promoter, which is located upstream of its target gene, an enhancer can exert its regulatory role bidirectionally, which makes the problem more challenging. To address this need, we propose a novel hybrid convolutional and Gated Recurrent Unit (GRU) recurrent neural network framework for predicting enhancers de novo from sequence. In the model, the convolution layer captures regulatory motifs, while the recurrent layer captures long-term dependencies between the motifs in order to learn a regulatory ‘grammar’ to improve predictions. The model yields considerable improvements on several benchmark datasets.
Bio: Dr. Xiaohui Niu is a visiting scholar in the Department of Family Medicine and Public Health at UC San Diego and an associate professor at Huazhong Agricultural University in China. His research interests are machine learning methods and their applications in bioinformatics, especially proteomics, including protein (gene) function prediction, protein binding site prediction, methods to construct phylogenetic trees, and protein structure prediction.
12/06/2017 1:00 PM – 2:00 PM
Larry Shen, PhD
Abstract: In this presentation I discuss statistical integrity and some ethical issues in our practice in the bio-pharmaceutical industry. Some common ethical issues include data integrity, validity of statistical testing and conclusions, presentation of data, and post-hoc analyses. I am going to highlight a few ethical guidelines from the American Statistical Association and use a few case examples to illustrate ethical dilemmas that we often face in our daily work.
Bio: Dr. Larry Shen has a highly accomplished career in leading clinical organizations to support drug development and clinical research programs. He has directly worked on over 20 investigational new drug projects and played leading roles in regulatory submissions that led to six drug approvals in both the US and the European Union. He has authored or co-authored many articles on statistical methodology and its applications to drug development. His work on dose titration received the Thomas W Teal award at the 2007 Drug Information Association annual meeting. Dr. Shen also served as President of the San Diego Chapter of the American Statistical Association (ASA). In 2014, Dr. Shen was elected as a fellow of the ASA for his leadership in applying statistics to drug development and for his contributions to the statistics profession. Prior to co-founding Pharmapace, Dr. Shen was Vice President at Amylin Pharmaceuticals in charge of their clinical development organizations including Statistics, Programming, Data Management, PK/PD modeling, and Medical Writing. He had worked at Amylin since 1997 and implemented rigorous procedures for data processing, analyses, and reporting to ensure data integrity and statistical excellence. Under his leadership, his department played a critical role in the development and approval of four innovative medicines. Dr. Shen obtained his Ph.D. in Statistics from the University of California at Berkeley and both BS and MS degrees in mathematics/statistics from Beijing University in China.
11/01/2017 1:00 PM – 2:00 PM
Wesley Thompson, PhD
Abstract: Modern large-scale observational psychiatric studies collect data in a plethora of modalities, including questionnaires, structured clinical interviews, life histories, and many biological variables, e.g., structural and functional brain imaging, genetics, and inflammatory measures. An important goal of such studies is to obtain a biological foundation for psychiatric diagnoses that are predictive of outcomes and/or response to specific treatments. However, a major difficulty in analyzing data from these studies is reducing dimensionality by revealing latent structures that inform about relationships across modalities, while simultaneously accounting for "batch" effects and method variance within modalities of measurement. Here, we present a Bayesian multi-level model that uncovers both shared and idiosyncratic factors within blocks (data modalities). We demonstrate that this methodology is effective in uncovering latent structure and predicting clinical outcomes in the T-1000 data, a large-scale study of psychiatric disorders collecting data in scores of domains, including structural and functional imaging.
Bio: Dr. Wesley Thompson earned his Ph.D. in Statistics from Rutgers University in 2003, with a focus on statistical methods for longitudinal data analysis. He was appointed Assistant Professor of Statistics and Psychiatry at the University of Pittsburgh in 2005, where he received a five-year NIH K25 Career Development Award to develop novel methods for studying co-variation in brain function and depression. Dr. Thompson joined UCSD in 2008, and is currently an Associate Professor of Family Medicine and Public Health within the Division of Biostatistics and Bioinformatics. His current work involves Bayesian semi-parametric and mixture models with applications to (i) improving effect size estimation, replication, and prediction in genome-wide association studies, (ii) predicting onset of illness from multivariate biomarker trajectories, and (iii) functional and structural MRI data.
10/04/2017 1:00 PM – 2:00 PM
Hannah Carter, PhD
Abstract: Recent studies have characterized the extensive somatic alterations that arise during cancer and various studies have probed rare inherited mutations that lead to early onset cancer syndromes. However, little is understood about the role of genetic background in ‘sporadic’ adulthood cancers. It is possible that the somatic evolution of a tumor may be significantly affected by inherited polymorphisms carried in the germline. To investigate this, we analyzed genomic data for thousands of tumors from The Cancer Genome Atlas to reveal and systematically validate hundreds of genetic interactions between germline polymorphisms and major somatic events, including tumor formation in specific tissues and alteration of specific cancer genes. Among germline–somatic interactions, we found germline variants in RBFOX1 that increased incidence of SF3B1 somatic mutation by 8-fold via functional alterations in RNA splicing. Similarly, 19p13.3 variants were associated with a 4-fold increased likelihood of somatic mutations in PTEN. In support of this association, we found that PTEN knockdown sensitizes the MTOR pathway to high expression of the 19p13.3 gene GNA11. Finally, we observed that stratifying patients by germline polymorphisms exposed distinct somatic mutation landscapes, implicating new cancer genes. Our findings suggest that individual genomic data can help to forecast the trajectory of tumor evolution, including where and how cancer develops, opening avenues for prevention research.
Bio: Dr. Hannah Carter is an Assistant Professor in the UCSD Department of Medicine. She received her M.Eng in Electrical Engineering at the University of Louisville and her PhD in Biomedical Engineering from Johns Hopkins University. The Carter Lab uses bioinformatics and computational biology to study the role of inherited and acquired genetic variation in cancer. The goals of her research are to advance precision cancer medicine by developing approaches to discriminate drivers from passengers, predict cancer cell-specific therapeutic vulnerabilities and identify germline variation that contributes to the emergence or progression of tumors. Dr. Carter is a Siebel Scholar and a recipient of a 2013 NIH Director’s Early Independence Award.
09/06/2017 1:00 PM – 2:00 PM
(Joint work with Lin Liu and Loki Natarajan)
Karen Messer, PhD
Abstract: We address the practical problem of model selection in the presence of imputation for missing data. Our focus is on valid inference, in particular on confidence intervals that incorporate both the imputation mechanism and the model selection mechanism. We investigate commonly used resampling-based approaches - multiple imputation and the bootstrap - and incorporate Efron's 2014 computationally efficient variance estimate for bootstrap-smoothed estimates. We compare the resulting 'Efron's rules' estimator to a 'Rubin's rules' estimator based on multiple imputation. These turn out to be versions of frequentist model averaged estimators, and are compared to an un-averaged selection estimator using the framework of Claeskens and Hjort. Simulation and real data examples are drawn from the related literature. Practical recommendations are given, including circumstances where the new Efron's rules estimator is seen to work well.
Bio: Dr. Karen Messer is Professor and Chief of the Division of Biostatistics and Bioinformatics in the Department of Family Medicine and Public Health at UCSD. She went to Clairemont High School here in San Diego, and she was an undergraduate math major at Harvard University. Dr. Messer got her PhD in mathematical statistics at UCSD under Dr. Murray Rosenblatt. Before joining UCSD in 2006, Dr. Messer was assistant professor of mathematics at UCLA, then associate professor and professor of mathematics at California State University Fullerton.
04/12/2017 1:00 PM – 2:00 PM
Victor De Gruttola, ScD
Abstract: Recent developments in biomedical science, such as those in molecular epidemiology and surveillance, vaccinology, and antimicrobial treatment, can greatly aid in devising effective responses to epidemic and endemic diseases. Taking maximal advantage of such successes requires advances in quantitative science that combine approaches from different disciplinary domains. For example, in the investigation and scale-up of HIV prevention interventions, challenges arise from the complex dependencies that characterize data from clinical studies and that reflect the spread of HIV along sexual contact networks. Both randomized and observational studies often collect data on HIV incidence in different subpopulations, risk behavior, and viral genetic sequences. New methods are required to make maximal use of this very useful but incomplete information to estimate quantities that will guide the scale-up of successful interventions. These include not only effects of randomized interventions (for trials randomized at both the individual and cluster level) but also expected effects under implementation policies likely to be used in practice. We propose methods that make use of baseline data to improve estimation of intervention effects and of their modification by factors measured at individual and network levels. We show their advantages in the design of both randomized and observational studies, with complete or missing data. Cluster randomized trials are also useful for controlling outbreaks. We propose, and demonstrate properties of, a novel design for settings like the Ebola epidemic, where proof-of-principle vaccine trials provided evidence of efficacy but where questions remain about the effectiveness of different possible modes of implementation. Our goal for these studies is not only to generate information about intervention effects but also to provide public health benefit.
To do so, we leverage information about contact networks – in particular the degree of connection across randomized units obtained at study baseline – and develop a novel class of connectivity-informed cluster trial designs. We investigate the performance of these designs in terms of epidemic control outcomes (time to end of epidemic and cumulative incidence) and power to detect intervention effect, by simulating vaccination trials during an SEIR-type epidemic outbreak using a network-structured agent-based model.
Bio: Dr. Victor De Gruttola is Professor of Biostatistics in the Department of Biostatistics at the Harvard T.H. Chan School of Public Health. He has spent the past 30 years working with junior colleagues and collaborating with clinical and laboratory investigators to develop and apply methods for advancing the HIV prevention and treatment research agendas. He also has managed large projects devoted to improving the public health response to the AIDS epidemic, both within the US and internationally. The aspects of the HIV epidemic on which he has worked include transmission and natural history of infection with the Human Immunodeficiency Virus (HIV), as well as investigation of antiretroviral treatments, including the development and consequences of resistance to them. The broad goals of his research have included developing treatment strategies that provide durable virologic suppression while preserving treatment options after failure, and evaluating the community-level impact of packages of prevention interventions, including antiviral treatment itself. He served as the Director of the Statistics and Data Analysis Center of the Adult Project of the AIDS Clinical Trials Group during the period in which highly active antiretroviral treatment was developed, and was instrumental in designing and analyzing studies of the best means of providing such therapy. He has also served as the Co-PI (with Max Essex) for a cluster-randomized trial of an HIV combination prevention program in Botswana. His methods research activity is focused on HIV prevention research, especially with regard to the development of methods for analyses of sexual contact networks, for viral genetic linkage analyses in the presence of missing data, and for improving validity and efficiency of analyses of HIV prevention trials.
03/17/2017 11:00 AM – 11:45 AM
Kyle Hasenstab, PhD
Abstract: Event-related potential (ERP) studies are a set of experimental frameworks that use electroencephalography (EEG) to study the electrical potential produced by a subject's brain when presented with an implicit task in the form of stimuli. Data consist of a temporally recorded functional ERP curve repeatedly observed over a sequence of stimuli and across a set of electrodes placed on the scalp, producing a complex data structure consisting of a functional (ERP curve), longitudinal (stimulus repetition), and spatial (electrode) dimension. In typical ERP studies, the dimension of data is reduced into a single measure for each subject by cross-sectionally averaging ERP across longitudinal and spatial repetitions in order to increase the signal-to-noise ratio of the ERP function. Features are then extracted from the averaged ERP and analyzed using simple statistical methods, ignoring additional information that may be found in the collapsed dimensions. In this talk, I discuss methodology for preserving and analyzing the lost dimensions of ERP data. In particular, I focus on multidimensional functional principal components analysis (MD-FPCA), a two-step procedure used to summarize important characteristics across all three dimensions of the ERP data structure into an interpretable, low-dimensional form. MD-FPCA is applied to a study on neural correlates of visual implicit learning in young children with autism spectrum disorder (ASD). Application of the proposed methods reveals meaningful trends and substructures in the implicit learning processes of ASD children when compared to typically developing controls. Results indicate that the proposed methodology effectively preserves important information contained within the multiple dimensions of ERP data.
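To make the conventional averaging step described in the abstract concrete, here is a minimal sketch on simulated data. It is not MD-FPCA and is not taken from the talk; the array dimensions, the shape of the simulated ERP component, and the noise level are all hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dimensions: stimulus repetitions x electrodes x time points
n_trials, n_electrodes, n_time = 60, 4, 250
t = np.linspace(0.0, 1.0, n_time)

# Hypothetical ERP component (a bump peaking at t = 0.3) buried in noise
signal = np.exp(-((t - 0.3) ** 2) / 0.005)
erp = signal + rng.normal(0.0, 2.0, size=(n_trials, n_electrodes, n_time))

# Conventional approach: collapse the longitudinal (trial) and spatial
# (electrode) dimensions into a single averaged curve per subject
averaged = erp.mean(axis=(0, 1))  # shape (n_time,)

# Residual noise before and after averaging
noise_single = np.std(erp[0, 0] - signal)
noise_avg = np.std(averaged - signal)
print(f"noise s.d. of one curve: {noise_single:.2f}, of the average: {noise_avg:.2f}")
```

Averaging raises the signal-to-noise ratio, but any change in the curve across trials or electrodes is no longer visible in `averaged`, which is the information loss the abstract refers to.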
Bio: Dr. Kyle Hasenstab recently earned his PhD in Statistics from the University of California, Los Angeles where he researched methods for analyzing data from EEG experiments to study implicit learning in children with autism spectrum disorder. He has worked as a postdoctoral fellow at the Centers for Disease Control and Prevention in their Chronic Viral Diseases Branch and is currently working as a statistician for AT&T.
03/15/2017 2:00 PM – 3:00 PM
David Azriel, PhD
Abstract: Motivated by a data set obtained from brain imaging, we study inference of high-dimensional observations without assuming a sparse parameter space. Our approach starts from computing the z-scores at each cortical voxel. The result is a large, strongly dependent vector of observations, assumed to be Gaussian. We study two issues: first, we investigate the empirical distribution of this vector and its possible departure from a standard normal distribution. Second, we study inference of linear projections of this vector. Our analysis shows that the global null hypothesis (when there is no dependence between the response and the measurements) is not likely to be true. Furthermore, we find that the effect is widespread (non-sparse) but not large enough to be significant anywhere.
Bio: Dr. David Azriel has been a senior lecturer in statistics at the Technion - Israel Institute of Technology since 2015. Previously, he was a postdoc with Larry Brown at Wharton at the University of Pennsylvania. He completed his PhD thesis at the Hebrew University in Jerusalem in 2012. His research interests are in high dimensional data, model selection and optimal clinical trial design.
03/01/2017 1:00 PM – 2:00 PM
Bin Nan, PhD
Abstract: Estimation of change-point locations in the broken-stick model has significant applications in modeling important biological phenomena. In this talk, Dr. Nan will present a computationally economical likelihood-based approach for estimating change-point(s) efficiently in both cross-sectional and longitudinal settings. The method, based on local smoothing in a shrinking neighborhood of each change-point, is shown via simulations to be computationally more viable than existing methods that rely on search procedures, with dramatic gains in the multiple change-point case. The proposed estimates are shown to have root-n consistency and asymptotic normality (in particular, they are asymptotically efficient in the cross-sectional setting), allowing us to provide meaningful statistical inference. As the primary and motivating longitudinal application, a two change-point broken-stick model appears to be a good fit to the Michigan Bone Health and Metabolism Study cohort data to describe patterns of change in log estradiol levels, before and after the final menstrual period. An application to a plant growth dataset in the cross-sectional setting is also presented. This is joint work with Rito Das, Mouli Banerjee, and Huiyong Zheng.
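For readers unfamiliar with the broken-stick model, the following sketch fits a one change-point version by grid search. Note that this is the kind of search-based baseline the abstract contrasts against, not the local-smoothing estimator presented in the talk. The function names, the simulated data, and the candidate grid are hypothetical.

```python
import numpy as np

def broken_stick_design(x, tau):
    """Design matrix for a continuous one change-point broken-stick mean:
    beta0 + beta1 * x + beta2 * max(x - tau, 0)."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x, np.maximum(x - tau, 0.0)])

def fit_broken_stick(x, y, candidates):
    """Profile least squares: for each candidate change-point tau, fit the
    linear coefficients by OLS and keep the tau with the smallest RSS."""
    best = None
    for tau in candidates:
        X = broken_stick_design(x, tau)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ beta) ** 2))
        if best is None or rss < best[0]:
            best = (rss, float(tau), beta)
    return best[1], best[2]

# Simulated cross-sectional data with a true change-point at x = 4
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 201)
y = 1.0 + 0.5 * x - 1.5 * np.maximum(x - 4.0, 0.0) + rng.normal(0.0, 0.1, x.size)

tau_hat, beta_hat = fit_broken_stick(x, y, candidates=np.linspace(1.0, 9.0, 81))
print("estimated change-point:", tau_hat)
print("estimated coefficients:", beta_hat)
```

With several change-points the grid grows multiplicatively (one axis per change-point), which gives some intuition for why search procedures become expensive in the multiple change-point case.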
Bio: Dr. Bin Nan is Professor of Biostatistics and Statistics at the University of Michigan. He received his Ph.D. in Biostatistics from the University of Washington in 2001 and joined the faculty at the University of Michigan in the same year. Dr. Nan's research interests are in various areas of statistics and biostatistics including semiparametric inference, failure time and survival analysis, longitudinal data, missing data and two-phase sampling designs, and high-dimensional data analysis. He is collaborating in many studies in areas of epidemiology, bioinformatics, and brain imaging, particularly in cancer, HIV, women's health, and neurodegenerative diseases. He is Fellow of the American Statistical Association and Fellow of the Institute of Mathematical Statistics.
02/15/2017 2:00 PM - 3:00 PM
Chiung-Yu Huang, PhD
Abstract: Although recurrent event data analysis is a rapidly evolving area of research, rigorous studies on modeling and estimation of the effects of time-varying covariates on the risk of recurrent events have been lacking. Existing methods for analyzing recurrent event data usually require that the covariate processes are observed throughout the entire follow-up period. However, covariates are often observed periodically rather than continuously. We propose a novel semiparametric estimator for the regression parameters in the popular proportional rate model. The proposed estimator is based on an estimated score function where we kernel smooth the mean covariate process. We show that the proposed semiparametric estimator is asymptotically unbiased and normally distributed, and we derive its asymptotic variance. Simulation studies are conducted to compare the performance of the proposed estimator with that of simple methods that carry forward the last observed covariates. The different methods are applied to an observational study designed to assess the effect of Group A streptococcus (GAS) on pharyngitis among school children in India.
Bio: Dr. Huang is Associate Professor of Oncology and Biostatistics at the Johns Hopkins University. Her main area of research is in general biostatistics methodology and its application to the biomedical sciences. She has extensive experience in the statistical analysis of survival outcomes, recurrent events, competing risks, longitudinal measurements, missing data, biased sampling, and design and monitoring of clinical trials.
02/09/2017 2:00 PM - 3:00 PM
David Benkeser, MPH, PhD
Abstract: In many studies, multiple instruments are used to measure different facets of an unmeasured outcome of interest. For example, in studies of childhood development, children are administered tests in several areas and researchers combine these test scores into a univariate measure of neurocognitive development. Researchers are interested in predicting this development score based on household and environment characteristics early in life in order to identify children at high risk for neurocognitive delays. We propose a method for estimating the combined measure that maximizes predictive performance. Our approach allows modern machine learning techniques to be used to predict the combined outcome using potentially high-dimensional covariate information. In spite of the highly adaptive nature of the procedure, we nevertheless obtain valid estimates of the prediction algorithm’s performance for predicting the combined outcome as well as confidence intervals about these estimates. We illustrate the methodology using longitudinal cohort studies of early childhood development.
Bio: David Benkeser, PhD, MPH is a post-doctoral researcher under Mark van der Laan in the Division of Biostatistics at the University of California, Berkeley, where he works on developing methods for machine learning, causal inference, and the integration of the two fields. He obtained his PhD from the Department of Biostatistics at the University of Washington, where his research focused on causal inference in complex longitudinal settings with applications in preventive vaccine efficacy trials for infectious diseases.
01/24/2017 2:00 PM - 3:00 PM
Colleen Kelly, Ph.D.
Kelly Statistical Consulting
Abstract: In 2000, Duane Steffey and I founded the SDSU Consulting Center and developed a Statistical Consulting course in response to university and private consulting requests and a desire to better train our graduate students for careers as applied statisticians. In the following 15 years, I became increasingly devoted to statistical consulting as a career and eventually left academia for a career in consulting. In 2009, I founded Kelly Statistical Consulting. My industrial consulting experience has revised my vision of what is important to teach in a consulting course. In this talk, I present the elements common to most statistical consulting courses and how my presentation of these elements has evolved over the last 15 years. I discuss the (sometimes hard) lessons learned, and what I believe to be the key elements of a successful course.
Bio: Dr. Colleen Kelly is an Accredited Professional Statistician™ and has over 25 years of statistical consulting experience as a statistical consultant, professor, and researcher specializing in statistical methodology for clinical trials and biotechnology. As a tenured associate professor of statistics at San Diego State University, Dr. Kelly co-founded and co-directed the university’s statistical consulting center. At Victoria University in Wellington, New Zealand, she directed the university’s statistical consulting center and developed and taught their statistical consulting course. Currently, she heads Kelly Statistical Consulting, Inc., which provides statistical services to biotechnology, pharmaceutical and medical device companies.
12/07/2016 1:00 PM - 2:00 PM
Michelle Lacey, Ph.D.
Associate Professor of Biostatistics, Tulane University
Abstract: Variation in cytosine methylation at CpG dinucleotides is often observed in genomic regions, and analysis typically focuses on estimating the proportion of methylated sites observed in a given region and comparing these levels across samples to determine association with conditions of interest. While sites are typically treated as independent, when observed at the level of individual molecules methylation patterns exhibit strong evidence of local spatial dependence. We previously introduced a neighboring sites model to account for correlation and clustering behavior observed in two tandem repeat regions in a collection of ovarian carcinomas. We now introduce an extension of the model that accounts for the effect of distance between sites. We apply our model to data from a whole genome sequencing experiment using overlapping 300-bp reads, demonstrating its ability to detect distance-weighted effects in regions with intermediate levels of methylation.
Bio: Michelle Lacey earned her PhD in Statistics from Yale University and joined the Tulane faculty in 2003. She is currently appointed as Associate Professor of Mathematics and Adjunct Associate Professor of Biostatistics at Tulane University. In addition to regularly teaching graduate courses in statistical modeling and data analysis for the School of Science and Engineering, she is a contributing lecturer for courses at the Tulane University School of Medicine and the School of Public Health and Tropical Medicine. Dr. Lacey directs the Tulane Cancer Center Genomics Analysis Core to provide statistical support to researchers conducting high-throughput experiments, and she maintains an independent research program in epigenetic modeling and analysis. She also collaborates with researchers in the School of Science and Engineering and has recently established a consulting relationship with the World Food Programme to assist in the development of statistical methods for modeling and analysis of food security survey data.
11/02/2016 1:00 PM - 2:00 PM
Hans J. Skaug, Ph.D.
Professor of Statistics, Department of Mathematics, University of Bergen
Abstract: I will describe a flexible framework for doing empirical Bayes inference in general mixed models. The marginal likelihood is evaluated using the Laplace approximation and optimized using a Newton-type method. The technical details of the Laplace approximation are hidden from the user via a technique called Automatic Differentiation. The approach has been implemented in the software package TMB (https://github.com/kaskr/adcomp). TMB is an R package, but links to C++ code for evaluation of the joint (in fixed and random effects) likelihood. I will discuss how TMB can be used to build mixed model R packages, and give examples of such R packages.
Bio: Hans J. Skaug is Professor in statistics at the Department of Mathematics, University of Bergen. He received his PhD (Dr. Scient) in 1994. His field of research is statistical ecology and computational statistics.
10/26/2016 1:00 PM - 2:00 PM
Fabian Telschow, Ph.D.
University of Goettingen, Germany
Abstract: In gait analysis of the knee joint, the data are curves in the group of 3×3 rotation matrices. We introduce and study S-equivariant functional models (viz., Gaussian perturbations of a center curve) and provide a uniform strongly consistent estimator for the center curve. Here S is a certain Lie group, which models the effect of different marker placements and self-chosen walking speeds in real gait data. Moreover, we provide novel estimators correcting for different marker placements and walking speeds and provide different statistical tools to analyze such data, for example, simultaneous confidence sets and permutation tests. The methods are applied to real gait data from an experiment studying the effect of short kneeling.
Bio: Fabian Telschow received his PhD from the University of Goettingen. His supervisor was Stephan Huckemann. He developed statistical tools for the analysis of biomechanical gait data in cooperation with the biomechanist Michael Pierrynowski from McMaster University, Canada. During his studies for the MSc degree in pure math in Göttingen (specialized in the intersection of algebraic topology and geometry) he worked part-time in the group of Axel Munk as a student research assistant on statistical analysis of 2D-NMR spectroscopy. His current research interests are real-world applications of non-Euclidean statistics, especially when the data are curves.
10/05/2016 1:30 PM - 2:30 PM
Loki Natarajan, Ph.D.
Professor of Biostatistics and Bioinformatics, UCSD Department of Family Medicine and Public Health
Abstract: Leading a healthy lifestyle can positively impact health, and reduce the risk of cancer, cardiovascular disease, and other chronic diseases. Health behaviors include many modifiable factors such as physical activity, diet, sleep, and smoking. This multiple exposure-multiple outcomes aspect of health behavior research calls for novel statistical approaches for study design and data analysis. In this talk we will discuss some of these approaches.
In the first part of the talk, we will present a “biobridge” design for a lifestyle intervention trial. Specifically, we will develop an analytic method to calculate a weighted risk score from several intermediate outcomes, and discuss how to quantify future clinical benefit through intervention-related changes on this risk score. Relative weights for the intermediate outcomes are derived by comparing a disease model conditional on the joint distribution of these outcomes to the corresponding marginal models. We will show analytically and via simulations that using marginal parameters as the weights in the risk score, and ignoring inter-correlations amongst the outcomes, yields biased estimates. Our proposed weighted risk score corrects for these biases. We will apply this method to design a weight-loss intervention trial with multiple biomarker outcomes.
In the second part of the talk, we will discuss the use of Bayesian networks to model multiple health behaviors and outcomes. Bayesian networks are a probabilistic machine learning approach which can be used to model multivariate relationships and represent them via intuitively meaningful graphs. We will apply this method to a sample of 333 overweight post-menopausal breast cancer survivors to model associations between BMI, lifestyle behaviors (alcohol intake, smoking, physical activity, sedentary behavior, sleep quality), psychosocial factors (depression, quality of life), biomarkers (insulin, C-reactive protein), demographics (age, education), and tumor factors. Using these networks, we will quantify the strength of association and infer (conditional) dependencies amongst these variables. Our results demonstrate that Bayesian networks could be a powerful exploratory tool for health behavior research.
09/28/2016 1:00 PM - 2:00 PM
Lu Tian, Sc.D.
Professor and Vice Chair, UCLA Department of Biostatistics, UCLA Jonathan and Karin Fielding School of Public Health
Abstract: In a randomized clinical trial with the time to event as the primary endpoint, one often evaluates the treatment effect by comparing the survival distributions from two groups. This can be achieved by, for example, estimating the hazard ratio under the popular proportional hazards (PH) model. However, when the hazard rate is very low, e.g., in safety studies, there may be too few observed events to warrant valid asymptotic inference under PH regression. Exact inference, including hypothesis testing and constructing a 95% confidence interval for the treatment effect, is desired. In this paper, we have developed an exact inference procedure for estimating the treatment effect based on the difference in restricted mean survival time between two arms, which is more appealing than the hazard ratio in many applications. The proposed procedure is valid regardless of the number of events. We have also performed a simulation study to examine the finite sample performance of the proposed method.
Bio: Dr. Lu Tian received his Sc.D. in Biostatistics from Harvard University. He has considerable experience in statistical methodological research, planning large epidemiological studies, performing data management for randomized clinical trials and conducting applied data analysis. His current research interests are in developing statistical methods in personalized medicine, survival analysis, meta-analysis and high-throughput data analysis.
09/07/2016 1:00 PM - 2:00 PM
Thomas R. Belin, PhD
Professor and Vice Chair, UCLA Department of Biostatistics, UCLA Jonathan and Karin Fielding School of Public Health
Abstract: Accepting the need to enunciate ethical principles in the field of statistics, how might it be possible to encompass the scope and generality of what we do into a complete yet digestible set of guidelines? Drawing on reflections by leading statisticians about the nature of our work, scientific insights regarding how the human condition induces imperatives for people to communicate with one another, game-theory perspectives on competition and cooperation, and other philosophical discourse on the ethics of interpersonal interactions, it is argued that trust and understanding are essential core principles that can serve as the basis for judging whether a statistical approach is ethical. The framework's simplicity makes it easy to communicate, its generality gives it power, and its positive-sum appeal could be used to promote professional identity development around ethics. The presentation will also consider connections between this framework and principles for analyzing incomplete data, where the dual goals of reflecting all available information and accurately representing uncertainty have parallels to cultivating understanding and cultivating trust. Recent efforts to develop flexible joint-modeling strategies to handle highly multivariate data sets with a broad array of data types will also be discussed.
Bio: Thomas R. Belin, Ph.D. is a Professor in the UCLA Department of Biostatistics with a joint appointment in the UCLA Department of Psychiatry and Biobehavioral Sciences. He started at UCLA in 1991 after receiving his Ph.D. that year from Harvard University, working with Donald Rubin in the Harvard Department of Statistics on incomplete-data problems related to the decennial census in the United States. Specializing in statistical analysis with missing data and related extensions to causal inference, he has supervised over a dozen doctoral dissertations and was recognized in 2015 by the UCLA Public Health Student Association for "Outstanding Advising and Mentorship for Ph.D. and Dr.P.H. Students". He also serves as Vice Chair of the UCLA Department of Biostatistics, and his professional activities include being a member since 2014 of the American Statistical Association Committee on Professional Ethics. He was elected Fellow of the American Statistical Association in 2004, and in 2005 he received the Washington (D.C.) Statistical Society Gertrude M. Cox Award honoring a statistician making "significant contributions to statistical practice."
07/06/2016 1:00 PM - 2:00 PM
Soojin Roh, PhD
Abstract: Data assimilation is a statistical method to combine the output from numerical models with observations to give an improved forecast. The ensemble Kalman filter is a widely used data assimilation method in diverse areas such as weather forecasting and aerospace tracking. In this talk I will discuss the ensemble Kalman filter and some practical issues. I will then discuss a robust ensemble Kalman filter.
Bio: Dr. Soojin Roh received her PhD in Statistics from Texas A&M University. She is currently a lecturer in the Department of Statistics at Rutgers University. Her research interests include spatial statistics, data assimilation, and robust estimation.
05/18/2016 2:00 PM - 3:00 PM
Mingan (Mike) Yang, PhD
Abstract: In parametric hierarchical models, it is standard practice to place mean and variance constraints on the latent variable distributions for the sake of identifiability and interpretability. Because incorporation of such constraints is challenging in semiparametric models that allow latent variable distributions to be unknown, previous methods either constrain the median or avoid constraints. In this article, we propose a centered stick-breaking process (CSBP), which induces mean and variance constraints on an unknown distribution in a hierarchical model. This is accomplished by viewing an unconstrained stick-breaking process as a parameter-expanded version of a CSBP. An efficient blocked Gibbs sampler is developed for approximate posterior computation. The methods are illustrated through a simulated example and an epidemiologic application.
Bio: Dr. Mingan Yang is an Assistant Professor of Biostatistics at the Graduate School of Public Health, San Diego State University. Upon graduation, he completed postdoctoral research at Duke University and NIEHS, NIH, under the supervision of Dr. David Dunson. He specializes in Bayesian statistics, computational statistics, survival analysis, latent variable models, variable selection, and mixed effects models. His statistical methodology research emphasizes problems arising in health and medicine. Some of his research results have been published in statistical journals such as Biometrics, Psychometrika, Computational Statistics & Data Analysis, and Biometrical Journal.
04/06/2016 1:00 PM - 2:00 PM
Armin Schwartzman, PhD
Abstract: Large scale multiple testing problems, such as in brain imaging and genomics, base their inference on a large number of z-scores. If most effects are null, it seems natural that the empirical distribution of z-scores should follow a standard normal distribution. But should it? In this talk Dr. Schwartzman will show two ways in which the empirical distribution of z-scores can be deceiving, because of correlation and mixture. First, following Efron’s (2007) conjecture, Dr. Schwartzman shows that even if the z-scores are standard normal, the empirical distribution may depart from it, due to strong correlation caused by hidden random effects. Instead, it may be approximated by a Gaussian mixture that generalizes Efron’s empirical null distribution. Second, Dr. Schwartzman shows that if the original data is a Gaussian mixture, then within-class standardization using a template-based EM algorithm produces z-scores whose empirical distribution looks standard normal. However, their true distribution has in fact lighter tails.
03/16/2016 2:00 PM - 3:00 PM
Thuan Nguyen, PhD
Abstract: Model search strategies play an important role in finding simultaneous susceptibility genes that are associated with a trait. More particularly, model selection via information criteria, such as the BIC with modifications, has received considerable attention in quantitative trait loci (QTL) mapping. However, such modifications often depend upon several factors, such as sample size, prior distribution, and the type of experiment, e.g., backcross or intercross. These changes make it difficult to generalize the methods to all cases. The fence method avoids such limitations with a unified approach, and hence can be used more broadly. In this talk, the method is studied in the case of backcross experiments (BE). In particular, a variation of the fence, called restricted fence (RF), is applied to BE, and its performance is evaluated and compared with the existing methods. Furthermore, we incorporate our recently developed strategy for model selection with incomplete data, known as the E-MS algorithm, with the RF to address the common missing value concerns in BE. Our study reveals some interesting findings in association with the missing data mechanisms. The proposed method is illustrated with a real data analysis involving QTL mapping for an agricultural study on barley grains.
03/03/2016 1:00 PM - 2:00 PM
Xin Tu, PhD
Abstract: Modern statistical methods provide a powerful tool to address complex statistical issues arising in clinical and translational research. However, the predominant statistical paradigm is only applicable to modeling relationships defined by within-subject attributes such as alcohol use and suicide from the same subject. Many relationships of interest in the age of the internet and mobile technology involve variables measuring between-subject attributes such as human interaction and such attributes are not amenable to treatment by conventional statistical models. In this talk, I will discuss a class of functional response models (FRM) to address this fundamental limitation in the current statistical paradigm. The between-subject attribute is not a concept unique to timely issues such as modeling human interaction in social networks, but is actually a fundamental barrier to understanding many classic statistical methods in order to extend them to address their limitations when applied to cutting-edge statistical problems in clinical and translational research. I will illustrate the FRM using a wide range of topics with both real and simulated data.
02/25/2016 1:00 PM - 2:00 PM
Mi-Ok Kim, PhD
Abstract: Developing the health information technology infrastructure to support comparative effectiveness research (CER) was a core objective of the American Recovery and Reinvestment Act of 2009. Many research networks, each including between 11,000 and 7.5 million patients and more than 18 million in total, have been established, and numerous CER studies have been conducted. As compared to randomized clinical trials, these studies are less resource demanding and quickly collect data that are more representative of routine clinical care in large cohorts of patients over a long period of follow-up. Their utility, however, is restricted by the fact that treatment choice is affected by known or unknown prognostic factors, and consequently treatment groups are not directly comparable. This situation, known as confounding by indication for treatment, may render observational studies invalid and irrelevant unless properly addressed. Proper treatment of confounding is further complicated in data obtained from registries, network databases or the Electronic Health Record (EHR), where subjects or patients are commonly clustered in ways that may be relevant to the analysis. We will extend propensity score (PS) methodology and related sensitivity analysis to address measured and unmeasured confounding in clustered data with the following aims:
Aim 1: Investigate how to optimally extend the PS methodology and identify what works best when
Aim 2: Develop a novel sensitivity analysis approach
Aim 3: Identify valid and most efficient PS methods for two existing CER studies.
We will use Monte Carlo computer simulation studies and real data including two existing CER studies. The real data examples will provide clinically plausible and interesting hierarchical data contexts and inform the design of the computer simulation studies about various types of outcomes that comprehend typical features of patient reported outcomes (PROs).
02/04/2016 1:30 PM - 2:30 PM
Nancy Reid, PhD
University Professor of Statistical Sciences, Canada Research Chair in Statistical Theory and Applications; Director, Canadian Statistical Sciences Institute; Department of Statistical Sciences, University of Toronto
Biography: Dr. Nancy Reid is University Professor and Canada Research Chair in Statistical Methodology at the University of Toronto. Her research interests are in statistical theory, likelihood inference, and design of studies. Along with her colleagues she has developed higher order asymptotic methods both for use in applications, and as a means to study theoretical aspects of the foundations of inference, including the interface between Bayesian and frequentist methods. She is the Director of the Canadian Statistical Sciences Institute.
Dr. Reid received her PhD from Stanford University, under the supervision of Rupert Miller. She taught at the University of British Columbia before moving to the University of Toronto, and has held visiting positions at the Harvard School of Public Health, University of Texas at Austin, Ecole Polytechnique Federale de Lausanne, and University College London.
She has been President of the Institute of Mathematical Statistics and the Statistical Society of Canada, and Vice-President of the International Statistical Institute. She is a Fellow of the American Association for the Advancement of Science, the Royal Society of Canada and the Royal Society of Edinburgh. In December 2014 she was appointed Officer of the Order of Canada.
Abstract: The Canadian Statistical Sciences Institute and the Fields Institute for Research in the Mathematical Sciences recently completed a six month thematic research program with this title. I will give an overview of the topics covered with emphasis on linkages between different areas, common problems, and common strategies. While the program was only able to cover a small fraction of the world of “Big Data”, the breadth of the material covered by the large number of speakers was very stimulating.
12/02/2015 1:00 PM - 2:00 PM
Matthew Cefalu, PhD
Associate Statistician, RAND Corporation
Biography: Dr. Matthew Cefalu is an Associate Statistician at the RAND Corporation, where his research is primarily focused on the development and application of novel methods for causal inference. Examples of past and present research projects include the Health-Related Behaviors Survey of Military Personnel, an independent assessment of the VA healthcare system, and the CAHPS Hospital Survey. Dr. Cefalu received his PhD in Biostatistics from Harvard University in 2013.
Abstract: There is a vast literature on estimating causal effects from observational data, and the majority of these methods focus on estimating marginal treatment effects (i.e. treatment effects in the entire population). However, it is often of interest to identify subpopulations for whom the treatment is most effective. We will use locally weighted quantile regression, where locality is based on the propensity score, to identify if treatment effect heterogeneity is present. This method will be illustrated using data from a study assessing the efficacy of Motivational Enhancement Therapy-Cognitive Behavioral Therapy 5 in treating adolescents with cannabis-related disorders.
11/04/2015 1:00 PM - 2:00 PM
Pablo Tamayo, PhD
Professor, Division of Medical Genetics, Department of Medicine, UC San Diego Medical School, Moores Cancer Center at UC San Diego Health
Biography: Dr. Pablo Tamayo is a Cancer Researcher at UC San Diego Moores Cancer Center and a Professor at the UCSD School of Medicine. Prior to UCSD he worked as a senior computational biologist at the Broad Institute of MIT and Harvard, as a consulting member of staff for the Advanced Analytics group at Oracle Corp., as senior researcher and chief scientist of Thinking Machines Corp., at the Theoretical Division (T-8) of the Los Alamos National Laboratory and as a research assistant at Boston University. He obtained a Ph.D. in Statistical Physics and a B.S. in Physics Engineering. During the last two decades he has worked on the study of cancer pathways, models of oncogene activation, models of pharmacological response, discovery of disease subtypes and integrated models to delineate and characterize cellular cancer states. He has been an original contributor to the development of many genomic data analysis methods including Gene Set Enrichment Analysis (GSEA), the Molecular Signatures Database (MSigDB) and the GenePattern Analysis Environment. His most recent work has focused on the development of experimental and computational models of oncogenic transformation, cancer vulnerabilities and catalogs of oncogenic states. He has also worked on an information-theoretic approach to find associations and co-analyze diverse types of cancer data with different statistical properties. He has published over 130 articles with over 35,000 citations. His publication list can be found in:
Abstract: Systematic efforts to sequence the cancer genome have identified many of the recurrent mutations and copy number alterations in tumors. However, in many cases the role(s) played by these alterations is not obvious and necessitates an effective functional characterization of the pathways and networks that these genomic alterations regulate. Here we introduce REVEALER (Repeated Evaluation of VariablEs conditionAL Entropy and Redundancy), an analysis method that enables the discovery of an ensemble of mutually exclusive genomic alterations correlated with “functional” phenotypes, e.g., the activation or dependency of oncogenic pathways. We use REVEALER to identify complementary genomic alterations that account for a large fraction of the “activated” or “dependent” samples with respect to four targets: the transcriptional activation of β-catenin and NRF2, MEK-inhibitor sensitivity, and KRAS dependency. REVEALER was able to “re-discover” several known features, as well as identify a number of novel associations, demonstrating the power of using information-theoretic association metrics to combine functional profiles with extensive characterization of alterations in cancer genomes.
10/07/2015 1:00 PM - 2:00 PM
Christine McLaren, PhD
Professor, Epidemiology, School of Medicine; Vice Chair for Academic Affairs, Epidemiology; Scientific Member, Genetic Epidemiology Research Institute; Director of Biostatistics, Chao Family Comprehensive Cancer Center, University of California, Irvine
Biography: Dr. Christine McLaren is Professor and Vice Chair of the Department of Epidemiology in the School of Medicine at the University of California, Irvine. Dr. McLaren is also co-Leader of the Program in Cancer Prevention, Outcomes, and Survivorship and a member of the Biostatistics Shared Resource of the Chao Family Comprehensive Cancer Center at UC Irvine. Dr. McLaren has focused on statistical modeling research and has concentrated on two important areas: (1) statistical modeling of biomedical data and (2) collaborative research in cancer. She is an elected Fellow of the American Statistical Association, in part for “innovative research in biology and medicine”. Dr. McLaren is Principal Investigator of the NIH/NIDDK R24 grant, “Genetic Modifiers of Iron Status in Hemochromatosis HFE C282Y Homozygotes”.
Abstract: Approximately one million people in the United States are at risk for development of iron overload, attributable primarily to the genetic disorder known as hemochromatosis. In the NIH-funded Hemochromatosis and Iron Overload Screening (HEIRS) Study, 101,168 multi-ethnic participants in primary care were screened for iron overload and hemochromatosis. Dr. McLaren will describe her role as PI of a Field Center for the HEIRS Study and her contributions to study design and analyses. Her team enrolled over 20,000 primary-care patients in UC Irvine primary-care clinics and in community clinics throughout Orange County. She will also describe subsequent statistical studies designed to answer the question “What role do genetic modifiers play in determining iron accumulation in persons homozygous for the HFE C282Y genotype?”
09/02/2015 1:00 PM - 2:00 PM
Danh V. Nguyen, PhD
Professor, Medicine (Biostatistics), Department of Medicine, Division of General Internal Medicine, Director, Biostatistics, Epidemiology & Research Design Unit, UCI Institute for Clinical and Translational Science, University of California, Irvine
Biography: Danh Nguyen, PhD, is Professor in the Department of Medicine, Division of General Internal Medicine and Director of the Biostatistics, Epidemiology & Research Design (BERD) Unit in the Institute for Clinical and Translational Science, University of California Irvine. Prior to joining UC Irvine in 2013, he was Professor in the Division of Biostatistics, Department of Public Health Sciences, at UC Davis from 2003
Abstract: Cardiovascular disease and infection are major factors for morbidity and mortality in patients on dialysis. Hospitalization data from the United States Renal Data System (USRDS) capture nearly all (> 95%) patients with end-stage renal disease in the U.S., making it the largest source of research data available for this population. Although the precise mechanisms by which infection may affect cardiovascular events are not fully known, infections may affect vascular endothelium, create a chronic sub-clinical inflammatory state that affects atherosclerosis, or create a procoagulant state. Thus, we hypothesize that the time periods following infection are associated with increased cardiovascular event risk. Applying the self-controlled case series (or simply case series) design and analysis to infection-cardiovascular risk in patients on dialysis using USRDS data presents several unique challenges, including (1) error in the timing of infection (or exposure) onset, since the time of infection is not known precisely, (2) misspecification of the risk period, and (3) other inferential challenges, such as formal hypothesis testing. In this talk I will discuss current resolutions/developments for some of these challenges related to case series analysis and open topics in other areas of applications.
07/01/2015 1:00 PM - 2:00 PM
Damla Senturk, PhD
Associate Professor, Department of Biostatistics, School of Public Health, University of California, Los Angeles
Abstract: Differential brain response to sensory stimuli is very small (a few microvolts) compared to the overall magnitude of the spontaneous electroencephalogram (EEG), yielding a low signal-to-noise ratio (SNR) in studies of event-related potentials (ERP). To cope with this phenomenon, stimuli are applied repeatedly and the ERP signals arising from the individual trials are averaged at the subject level. This results in loss of information about potentially important changes in the magnitude and form of ERP signals over the course of the experiment. In this paper, we develop a meta-preprocessing step utilizing a moving average of ERP across sliding trial windows, to capture such longitudinal trends. We embed this procedure in a weighted linear mixed effects model to describe longitudinal trends in features such as ERP peak amplitude and latency across trials while adjusting for the inherent heteroskedasticity created at the meta-preprocessing step. The proposed unified framework, including the meta-preprocessing and the weighted linear mixed effects modeling steps, is referred to as MAP-ERP (Moving-Averaged-Processed ERP). We perform simulation studies to assess the performance of MAP-ERP in reconstructing existing longitudinal trends and apply MAP-ERP to data from young children with autism spectrum disorder (ASD) and their typically developing counterparts to examine differences in patterns of implicit learning, providing novel insights about the mechanisms underlying social and/or cognitive deficits in this disorder.
Biography: Dr. Damla Senturk received her Ph.D. degree in Statistics from UC Davis in 2004 and joined the faculty in the Department of Statistics at Pennsylvania State University. She joined the faculty of the UCLA Department of Biostatistics in 2011 where she has been an Associate Professor in Residence since July 1st of 2013. Her main areas of statistical methodology research are longitudinal and functional data analysis, semiparametric adjustments in regression modeling and measurement error models. Her main collaborative research areas include psychiatry and nephrology.
06/03/2015 1:00 PM - 2:00 PM
James Fowler, PhD
Professor, Medical Genetics Division, Department of Medicine; Political Science Department, Division of Social Sciences; Dept. of Family Medicine & Public Health, University of California, San Diego
Abstract: From Framingham to Facebook, we have used a variety of social networks to measure, analyze, and change the effect of social networks on health. In this talk I will discuss a number of papers using different methods to better understand how networks function and what we can do to use them to make people healthier.
Biography: Dr. James Fowler earned a PhD from Harvard in 2003 and is currently a Professor at the University of California, San Diego. His work lies at the intersection of the natural and social sciences, with a focus on social networks, behavior, evolution, politics, genetics, and big data. Dr. Fowler was named a Fellow of the John Simon Guggenheim Foundation, one of Foreign Policy's Top 100 Global Thinkers, TechCrunch's Top 20 Most Innovative People, Politico's 50 Key Thinkers, Doers, and Dreamers, and Most Original Thinker of the year by The McLaughlin Group. He has also appeared on The Colbert Report. His research has been featured in numerous best-of lists including New York Times Magazine's Year in Ideas, Time's Year in Medicine, Discover Magazine's Year in Science, and Harvard Business Review's Breakthrough Business Ideas. Together with Nicholas Christakis, James wrote a book on social networks for a general audience called Connected. Winner of a Books for a Better Life Award, it has been translated into twenty languages, named an Editor's Choice by the New York Times Book Review, and featured in Wired, Oprah's Reading Guide, Business Week's Best Books of the Year, and a cover story in New York Times Magazine.
05/06/2015 1:00 PM - 2:00 PM
Yong Chen, PhD
Assistant Professor, Division of Biostatistics, University of Texas School of Public Health
Abstract: Over the past few decades, a dramatic increase in the incidence of obesity has become a worldwide health issue, contributing significantly as a risk factor for many diseases. Many individuals participate in web-based weight loss programs where their weights, physical activities and diets are self-reported. Data generated by such web-based programs pose new challenges to statistical modeling and inference, including subject-specific self-reporting times and outcome-dependent missingness. These challenges are known as the biased sampling problem in the statistical literature, and can lead to substantial bias in inference. In this talk, we propose a framework of novel statistical methods to efficiently detect and adjust for sampling bias, and to evaluate both the overall effectiveness of the weight loss program and the subject-specific effects of website usage on weight loss. The proposed methods provide elegant solutions for detecting and eliminating the impacts of biased sampling, and can achieve unbiased inference without fully specifying the complex data-generating mechanism. We apply the proposed methods to evaluate the effectiveness of a web-based program on weight loss, controlling for the nonlinear trajectory of weights over time.
04/15/2015 1:00 PM - 2:00 PM
Tanya P. Garcia, PhD
Assistant Professor, Texas A&M University, School of Public Health
Biography: Tanya is an Assistant Professor in the Department of Epidemiology and Biostatistics at Texas A&M University, Health Science Center, School of Public Health. Previously, she worked in the Bioinformatics Training Program at Texas A&M University. She received a Ph.D. in Statistics from Texas A&M University in 2011 under the advisement of Prof. Yanyuan Ma. She earned a B.S. in Mathematics from the University of California, Irvine in 2003, an M.S. in Industrial Engineering and Operations Research from the University of California, Berkeley in 2005, and an M.S. in Statistics from the University of Western Ontario in 2006. Her research interests include genetic mixture models, high-dimensional inference, measurement error, mixed models, neurodegenerative diseases, nonparametric models, semiparametric theory, and survival analysis.
Abstract: An important goal in clinical and statistical research is estimating the distribution for clustered failure times, which have a natural intra-class dependency and are subject to censoring. We propose to handle these inherent challenges with a novel approach that does not impose restrictive modeling or distributional assumptions. Rather, using a logit transformation, we relate the distribution for clustered failure times to covariates and a random, subject-specific effect such that the covariates are modeled with unknown functional forms, and the random effect is distribution-free and potentially correlated with the covariates. Over a range of time points, the model is shown to be reminiscent of an additive logistic mixed effect model. Such a structure allows us to handle censoring via pseudo-value regression and develop semiparametric techniques that completely factor out the unknown random effect. We show both theoretically and empirically that the resulting estimator is consistent for any choice of random effect distributions and for any dependency structure between the random effect and covariates. Lastly, we illustrate the method's utility in an application to the Cooperative Huntington's Observational Research Trial data, where our method provides new insights into differences between motor and cognitive impairment event times in genetically predisposed Huntington patients.
04/06/2015 1:00 PM - 2:00 PM
Nicholas J. Schork, PhD
Adjunct Professor of Psychiatry and Biostatistics, University of California, San Diego
Biography: Nicholas J. Schork is a Professor and Director of Human Biology at the J. Craig Venter Institute (JCVI) and the Head of Integrated Genomics at Human Longevity, Inc. (HLI). He is also an adjunct Professor of Psychiatry and Family and Preventive Medicine (Division of Biostatistics) at the University of California, San Diego (UCSD). Prior to joining JCVI, Dr. Schork was, from 2007-2013, a Professor, Molecular and Experimental Medicine, at The Scripps Research Institute (TSRI), Director of Biostatistics and Bioinformatics at the Scripps Translational Science Institute (STSI), and Director of Research at Scripps Genomic Medicine, a division of Scripps Health. From 2001-2007 Dr. Schork was a Professor of Biostatistics and Psychiatry, and Co-Director of the Center for Human Genetics and Genomics, at UCSD. From 1994-2000, he was an Associate Professor of Epidemiology and Biostatistics at Case Western Reserve University in Cleveland, Ohio, and an Adjunct Associate Professor of Biostatistics at Harvard University. During 1999 and 2000, Dr. Schork took a sponsored leave of absence from CWRU to conduct research as the Vice President of Statistical Genomics at the French biotechnology company, Genset, where he helped guide efforts to construct the first high-density map of the human genome.
Dr. Schork’s interests and expertise are in quantitative human genetics and integrated approaches to complex biological and medical problems, especially the design and implementation of methodologies to dissect the determinants of complex traits and diseases. He has published over 450 scientific articles and book chapters on the analysis of complex, multifactorial traits and diseases. A member of several scientific journal editorial boards, Dr. Schork is a frequent participant in U.S. National Institutes of Health-related steering committees and review boards, and has founded or served on the advisory boards of ten companies. In addition, he is currently director of the quantitative components of a number of national research consortia, including the NIA-sponsored Longevity Consortium and the NIMH-sponsored Bipolar Consortium. Dr. Schork earned the B.A. in Philosophy, M.A. in Philosophy, M.A. in Statistics, and Ph.D. in Epidemiology, all from the University of Michigan in Ann Arbor.
Abstract: There is a great deal of attention surrounding ‘individualized,’ ‘personalized,’ and/or ‘precision’ medicine. Much of this attention has been motivated by technological advances in genetic and related molecular assays that have provided researchers with an unprecedented ability to identify and characterize the potentially unique determinants of an individual’s disease susceptibility. However, as promising as these technologies are, their routine use in clinical settings will be hampered until they are appropriately vetted. In this talk, a number of studies are described that consider the use of genomic profiling to further efforts in individualized medicine. Focus is on the very thorny issues these studies have been designed to address, including dealing with patient genetic background heterogeneity, matching drugs to tumor genomic profiles in real-time clinical trial settings, exploring the utility of therapeutic interventions thought to be appropriate for an individual patient based on genomic profiling and monitoring genetically susceptible individuals. There is no doubt that individualized medicine will have a positive impact on health care, but only after some of the challenges it brings have been exposed and dealt with appropriately.
03/04/2015 1:00 PM - 2:00 PM
David Rocke, PhD
Division of Biostatistics, Department of Public Health Sciences, UC Davis
Abstract: RNA-Seq data are increasingly used for whole-genome differential mRNA expression analysis in lieu of gene expression arrays such as those from Affymetrix and Illumina. We review commonly used methods for this type of analysis, including DESeq, edgeR, and Cuffdiff2, by placing them within a common framework that allows comparisons of components of the methods as well as of the overall results. We also review a number of recent studies comparing these methods in terms of false positives and sensitivity, and add additional results of our own. We show that none of the existing methods is fully satisfactory, with most identifying large numbers of genes as differentially expressed even when there are none, but some will lead to better, more reliable results than others. This area is still early in its intellectual development and is changing rapidly, so there are substantial contributions that can be made.
02/04/2015 1:00 PM - 2:00 PM
Jeremy Taylor, PhD
Department of Biostatistics, University of Michigan
Biography: Jeremy M G Taylor PhD is the Pharmacia Professor of Biostatistics at the University of Michigan. He obtained a Bachelor’s degree in Mathematics and a Diploma in Statistics from Cambridge University and a PhD in Statistics from University of California Berkeley. He was a faculty member in the Department of Biostatistics and the Department of Radiation Oncology at UCLA from 1983 to 1998. He is currently a faculty member in the Department of Biostatistics, the Department of Radiation Oncology and the Department of Computational Medicine and Bioinformatics and the Director of the Center for Cancer Biostatistics at the University of Michigan. He is the winner of the Michael Fry award from the Radiation Research Society and the Mortimer Spiegelman award from the American Public Health Association. He is a former Chair of the Biometrics section of the American Statistical Association and a Fellow of the ASA. He is the former chair of the Biostatistical Methods and Research Design grant review committee for the National Institutes of Health. He is currently one of the coordinating editors of Biometrics. He has 300 publications and research interests in longitudinal and survival data, cure models, methods for missing data, biomarkers, surrogate and auxiliary variables. He has worked extensively in AIDS research but currently mainly focuses on cancer research.
Abstract: Motivated by data from multiple randomized trials of colon cancer, we model time-to-cancer-recurrence and time-to-death using a multi-state model. We incorporate a latent cured state into the model to allow for subjects who will never recur. Parametric models that assume Weibull hazards and include baseline covariates are used. Information from the multiple trials is incorporated using a hierarchical model. Bayesian estimation methods are used. The model is used to assess whether there is improved efficiency in the analysis of the effect of treatment on time-to-death in each trial by using the information provided by earlier cancer recurrence. For subjects who are censored for death, multiple imputation is used to impute death times, where the imputation distribution is derived from the estimated model. Gains in efficiency are possible, although sometimes modest, using the extra information provided by the recurrence time.
1/30/2015 1:00 PM - 2:00 PM
Giovanni Motta, PhD
Assistant Professor, Department of Statistics, Columbia University
Abstract: Epilepsy patients who are not able to adequately control their seizures with medications are sometimes treated with a neurosurgical procedure. The goal of this procedure is to remove the abnormal “epileptic” tissue causing seizures, and spare the normal tissue that is critical for brain function. However, current brain mapping technology has limited accuracy for mapping epileptic and normal brain tissue. This is especially problematic in the treatment of patients whose seizures arise from neocortex. To address these problems, we have been developing an experimental optical brain imaging technique for spatially mapping epileptic and normal cortical tissue. Better methods for the statistical analysis of the spatiotemporal optical imaging data are necessary for further development of this technique into a practical and reliable clinical tool.
In this paper we introduce a novel flexible tool, based on spatiotemporal statistical modeling of Optical Imaging, that allows for source localization of the epilepsy regions. The final goal is clustering (dimension reduction) of the pixels in regions, in order to localize the epilepsy regions for the craniectomy. We identify the spatial clusters of the pixels according to the temporal non-stationarity of the observed time series – rather than using spatial information. In a second step, we use non-parametric bootstrap and non-parametric density estimation to obtain the probabilities that a given pixel belongs to each of the clustered regions on the neocortex.
The advantage of our approach compared with previous approaches is twofold. Firstly, we use a non-parametric approach, rather than the (more restrictive) parametric or polynomial-based specification. Secondly, we provide a statistical method that is able to identify the clusters in a data-driven way, rather than relying on the (sometimes arbitrary) ad hoc approaches currently in use.
To demonstrate how our method might be used for intra-operative neurosurgical mapping, we provide an application of the technique to optical data acquired from a single human subject during direct electrical stimulation of the cortex.
12/3/2014 1:00 PM - 2:00 PM
Hal S. Stern, PhD
Professor of Statistics and Ted and Janice Smith Family Foundation Dean
Biography: Hal Stern is professor of statistics and dean of the Donald Bren School of Information and Computer Sciences at the University of California, Irvine. Stern came to UC Irvine in 2002 as the founding chair of the Department of Statistics. The Department now has 9 faculty and more than 40 graduate students in its MS/PhD programs. In 2010 he was named Ted and Janice Smith Family Foundation Dean of the Bren School. Prior to coming to UC Irvine he had faculty appointments at Iowa State and Harvard.
Within statistics he is known for his research work in Bayesian statistical methodology and model assessment techniques. He is a co-author of the highly regarded graduate-level statistics text Bayesian Data Analysis. Current areas of interest include applications of statistical methods in psychiatry and human behavior, atmospheric sciences, and forensic science. He is a Fellow of the American Statistical Association and the Institute of Mathematical Statistics and has served on several expert committees for the National Academies. Stern received his B.S. degree in Mathematics from the Massachusetts Institute of Technology in 1981 and the M.S. and Ph.D. degrees in Statistics from Stanford University in 1985 and 1987, respectively.
Abstract: The identification of recurring patterns within a sequence of events is an important task in behavioral research. We consider a general probabilistic framework for identifying patterns by distinguishing between events that belong to a pattern and events that occur as part of background processes. Using this framework we develop an inference procedure to detect sequences present in observed data and estimate the parameters governing these sequences. The model is applied to data from a study of the impact of fragmented and unpredictable maternal behavior on cognitive development of adolescents.
11/5/2014 1:00 PM - 2:00 PM
Annette Molinaro, MA, PhD
Associate Professor in Residence, Department of Epidemiology and Biostatistics, Department of Neurological Surgery, Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco
Abstract: We recently developed partDSA, a multivariate method that, similarly to CART, utilizes loss functions to select and partition predictor variables to build a tree-like regression model for a given outcome. However, unlike CART, partDSA permits both 'and' and 'or' conjunctions of predictors, elucidating interactions between variables as well as their independent contributions. partDSA thus permits tremendous flexibility in the construction of predictive models and has been shown to outperform CART in both prediction accuracy and stability. As the resulting models continue to take the form of a decision tree, partDSA also provides an ideal foundation for developing a clinician-friendly tool for accurate risk prediction and stratification.
With right-censored outcomes, partDSA currently builds estimators via either the Inverse Probability Censoring Weighted (IPCW) or Brier Score weighting schemes; see Lostritto, Strawderman and Molinaro (2012), where it is shown in numerous simulations that both proposed adaptations for partDSA perform as well as, and often considerably better than, two competing tree-based methods. In this talk, various useful extensions of partDSA for right-censored outcomes are described and we show the power of the partDSA algorithm in deriving survival risk groups for glioma patients based on genomic markers.
10/1/2014 1:00 PM - 2:00 PM
Ursula Berger, PhD
Department for Medical Informatics, Biostatistics and Epidemiology (IBE), Ludwig-Maximilians-University Munich
Abstract: We assess the effect of regional deprivation on individual mortality by making use of a natural experiment: We followed up ethnic German resettlers from Former Soviet Union countries, who were quasi-randomly distributed across the socioeconomically heterogeneous counties of Germany’s federal state North Rhine-Westphalia (NRW). This allows us to disentangle the contextual effect from compositional effects. We use data from the retrospective cohort study ‘AMOR’ on the mortality of resettlers in NRW (n=34 393). Based on the postcode of the last known residence we could link study participants to the municipalities of NRW. After a mean follow-up of 10 years, 2580 resettlers were deceased. When analyzing regional deprivation using additive survival models, we explore the gain from more precise data on deprivation and from smaller regional entities. Our findings indicate that in terms of mortality, regional deprivation does matter.
9/2/2014 2:00 PM - 3:00 PM
Hernando Ombao, Ph.D.
Professor, Department of Statistics, University of California, Irvine
Biography: Dr Ombao's research interests include:
- Time Series Analysis
- Spatio-temporal modelling
- Statistical Learning
- Applications to Brain Science (fMRI, EEG, MEG, EROS)
6/4/2014 1:00 - 2:00 PM
David Degras, Ph.D.
Assistant Professor, Statistics, Department of Mathematical Sciences, DePaul University, College of Science and Health
Abstract: In this paper we introduce a new hierarchical model for the simultaneous detection of brain activation and estimation of the shape of the hemodynamic response in multi-subject fMRI studies. The proposed approach circumvents a major stumbling block in standard multi-subject fMRI data analysis, in that it allows the shape of the hemodynamic response function to vary across regions and subjects while still providing a straightforward way to estimate population-level activation. An efficient estimation algorithm is presented, as is an inferential framework that not only allows for tests of activation, but also for tests for deviations from some canonical shape. The model is validated through simulations and application to a multi-subject fMRI study of thermal pain.
5/23/2014 1:00 - 2:00 PM
Babak Shahbaba, Ph.D.
Assistant Professor, Department of Statistics and Department of Computer Science, University of California, Irvine
Biography: Dr Shahbaba's research interest is related to developing new Bayesian methods and applying them to real-world problems. He is currently focusing on the following areas:
- Scalable Bayesian inference (fast MCMC methods that can be applied to large datasets)
- Developing new models that are sufficiently flexible and provide interpretable results
- Incorporating appropriate priors into statistical models in order to improve their performance
- Applying novel statistical methods to answer research questions in genetics, neuroscience, and cancer studies
5/7/2014 1:00 - 2:00 PM
Richard Olshen, Ph.D.
Professor and Chief, Division of Biostatistics, Department of Health Research and Policy, Stanford University School of Medicine
When each subject in a study provides a vector of numbers/features for analysis, and one wants to standardize, then for each coordinate of the resulting rectangular array one may subtract the mean by subject and divide by the standard deviation by subject. Each feature then has mean 0 and standard deviation 1. Data from expression arrays and protein arrays often come as such rectangular arrays, where typically the column denotes “subject” and the row some measure of “gene.” When analyzing these data one may ask that subjects and genes “be on the same footing.” Thus, there may be a need to standardize across rows and columns of the matrix. We investigate the convergence of a successive approach to standardization, which we learned from colleague Bradley Efron. Limit matrices exist on a Borel set of full measure; these limits have row and column means 0, row and column standard deviations 1. We study implementation on simulated data and data that arose in cardiology. The procedure can be shown not to work with simultaneous standardization. Results make contact with previous work on large deviations of Lipschitz functions of Gaussian vectors and with von Neumann’s algorithm for the distance between two closed, convex subsets of a Hilbert space. New insights regarding inference are enabled. Efforts are joint with colleague Bala Rajaratnam and have been helped by conversations with many others.
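The successive standardization described above can be illustrated with a minimal sketch. It is not the speaker's implementation: the function name `successive_standardize`, the stopping rule, the tolerance, and the use of the population standard deviation (divisor n) are all assumptions made for illustration. The sketch alternates between standardizing rows and standardizing columns until the matrix stops changing.

```python
import numpy as np


def successive_standardize(x, tol=1e-10, max_iter=1000):
    """Alternately standardize rows, then columns, until the matrix stabilizes.

    Hypothetical helper: the stopping rule and tolerance are assumptions,
    and standard deviations use the population form (divisor n).
    Returns the final matrix and the number of row/column sweeps performed.
    """
    x = np.asarray(x, dtype=float).copy()
    for sweep in range(1, max_iter + 1):
        previous = x.copy()
        # Row step: each row gets mean 0 and standard deviation 1.
        x = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
        # Column step: each column gets mean 0 and standard deviation 1.
        x = (x - x.mean(axis=0, keepdims=True)) / x.std(axis=0, keepdims=True)
        # Stop once a full sweep no longer changes the matrix noticeably.
        if np.max(np.abs(x - previous)) < tol:
            return x, sweep
    return x, max_iter
```

Applied to a small simulated Gaussian matrix, for example `successive_standardize(np.random.default_rng(0).normal(size=(8, 6)))`, the returned matrix should have row and column means near 0 and row and column standard deviations near 1, in line with the limit matrices the abstract describes.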
5/2/2014 1:00 - 2:00 PM
Danielle Harvey, Ph.D.
Associate Professor, Division of Biostatistics, Department of Public Health, University of California, Davis
Alzheimer’s disease (AD) is widespread in the elderly population and clinical trials are ongoing, focused on elderly individuals with AD or at apparent risk for AD, to identify drugs that will help with this disease. Well-chosen biomarkers have the potential to increase the efficiency of clinical trials and drug discovery and should show good precision as well as clinical validity. We propose measures that operationalize the criteria of interest and describe a general family of statistical techniques that can be used for inference-based comparisons of marker performance. The methods are applied to regional volumetric and cortical thickness measures quantified from repeat structural magnetic resonance imaging (MRI) over time of individuals with mild dementia and mild cognitive impairment enrolled in the Alzheimer’s Disease Neuroimaging Initiative. The methodology presented provides a standardized framework for comparison of biomarkers and will help in the search for the most promising biomarkers.
Biography: Dr Harvey received her BA cum laude in mathematics from Pomona College and her PhD in statistics from University of Chicago. Her methodological interests span survival analysis, correlated event times, informative censoring, repeated measures, computational methods, and high-dimensional data as in MRI or PET scans. Collaborative research interests include work on Alzheimer's, cancer, end-of-life care, dosing errors, and health services and public health issues.
4/2/2014 1:00 - 2:00 PM
Wesley K. Thompson, Ph.D.
Assistant Professor In-Residence, Department of Psychiatry, University of California, San Diego
Complex traits and disorders such as schizophrenia are multifactorial and associated with the effects of multiple genes in combination with environmental factors. These disorders often cluster in families, have no clear-cut pattern of inheritance, and have a high fraction of phenotypic variance attributable to genetic variance (high heritability). It is becoming increasingly clear that many genes influence most complex traits and disorders. In such a scenario with a very high number of risk genes (‘polygenic’), each gene has a tiny effect. This makes it difficult to determine an individual’s risk, and to identify disease mechanisms that can be used for development of new effective treatments.
Genome-wide association studies (GWAS) have identified many trait-associated single nucleotide polymorphisms (SNPs), but so far these explain only small portions of the heritability of complex disorders. This “missing heritability” has been attributed to a number of potential causes, including lack of typing of rare variants. However, it has been shown that a large proportion of the missing heritability is available within GWAS data when associations of SNPs are examined in aggregate. This implies the existence of numerous common variants with small genetic (‘polygenic’) effects. These effects cannot be reliably detected with traditional GWAS statistical methods given current sample sizes. Thus, there is a need for innovative statistical approaches to identify polygenic effects and reduce the proportion of ‘missing heritability’.
In this talk I describe novel statistical tools that enhance gene discovery, improve replication rates of discovered risk gene variants, and improve estimation of polygenic risk scores. The basic framework relies on extensions of a Bayesian two-group mixture model (Efron, 2010) that assumes a large proportion of loci are either null (unassociated with the phenotype of interest) or have very small effects, but that a small proportion have larger (though still small) effect sizes. These models can incorporate a priori information regarding functional roles of SNPs or pleiotropic effects with multiple phenotypes. We demonstrate these methods on GWAS data from large Crohn's disease and Schizophrenia meta-analyses.
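As a rough illustration of the basic two-group mixture idea mentioned above (not the extended models of the talk), the sketch below assumes that z-scores are standard normal under the null and zero-mean normal with a larger spread otherwise. The function names and the values `pi0=0.95` and `alt_sd=3.0` are hypothetical choices for this example. It computes the posterior probability that a SNP with a given z-score is null, often called the local false discovery rate.

```python
import math


def normal_pdf(z, mean=0.0, sd=1.0):
    """Density of a normal distribution evaluated at z."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(z, pi0=0.95, alt_sd=3.0):
    """Posterior probability that a z-score comes from the null group.

    Two-group mixture: f(z) = pi0 * f0(z) + (1 - pi0) * f1(z).
    pi0 (null proportion) and alt_sd (spread of non-null z-scores)
    are illustrative assumptions, not estimates from any dataset.
    """
    f0 = normal_pdf(z)                    # null density, N(0, 1)
    f1 = normal_pdf(z, 0.0, alt_sd)       # non-null density, N(0, alt_sd^2)
    mixture = pi0 * f0 + (1.0 - pi0) * f1
    return pi0 * f0 / mixture
```

Under these assumptions a z-score near 0 is almost certainly null, while a z-score of 5 has a very small posterior null probability. Prior information of the kind the abstract mentions could enter such a model by letting `pi0` depend on SNP annotations, although how the speaker does this is not specified here.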
Biography: Dr. Thompson earned his Ph.D. in Statistics from Rutgers University in 2003, and his dissertation studies focused on the development of a Bayesian model for sparse functional data. He was appointed Assistant Professor of Statistics and Psychiatry at the University of Pittsburgh in 2005, and he collaborated with several senior investigators on clinical research studies on depression, sleep and sleep disorders, and physical illness across the lifespan. Dr. Thompson joined the UCSD Department in 2008 and he serves as the Director of Biostatistics at the Stein Institute for Research on Aging. Dr. Thompson’s research interests center on the adaptation and application of statistical models of the dynamic covariation of multiple functional processes in order to identify potentially causal relationships between brain function, depression, and physical health. This work is supported by an NIH Career Development Award that Dr. Thompson received in 2006. He is also interested in developing statistical models that may explain the underlying mechanisms of healthy cognitive aging.
3/5/2014 1:00 - 2:00 PM
Donald B. Rubin, Ph.D.
John L. Loeb Professor of Statistics, Department of Statistics, Harvard University
Biography: Donald B. Rubin is John L. Loeb Professor of Statistics, Harvard University, where he has been professor since 1983, and Department Chair for 13 of those years. He has been elected to be a Fellow/Member/Honorary Member/Research Fellow of: the Woodrow Wilson Society, John Simon Guggenheim Memorial Foundation, IZA, IAB, Alexander von Humboldt Foundation, American Statistical Association, Institute of Mathematical Statistics, International Statistical Institute, American Association for the Advancement of Science, American Academy of Arts and Sciences, European Association of Methodology, British Academy, and the U.S. National Academy of Sciences. He has authored/coauthored nearly 400 publications (including ten books), has four joint patents, and he has made important contributions to statistical theory and methodology, particularly in causal inference, design and analysis of experiments and sample surveys, treatment of missing data, and Bayesian data analysis. Among his other awards and honors, Professor Rubin has received the Samuel S. Wilks Medal from the American Statistical Association, the Parzen Prize for Statistical Innovation, the Fisher Lectureship and the George W. Snedecor Award of the Committee of Presidents of Statistical Societies. He was named Statistician of the Year, American Statistical Association, Boston and Chicago Chapters. He has served on the editorial boards of many journals, including: Journal of Educational Statistics, Journal of the American Statistical Association, Biometrika, Survey Methodology, and Statistica Sinica. Professor Rubin has been, for many years, one of the most highly cited authors in mathematics in the world (ISI Science Watch), as well as in economics (Highly Cited Economists), with approximately 140,000 citations, with nearly 30,000 so far in 2012 and 2013 (according to Google Scholar). For many decades he has given keynote lectures and short courses in the Americas, Europe, Australia and Asia.
He has also received honorary doctorate degrees from Otto Friedrich University, Bamberg, Germany and the University of Ljubljana, Ljubljana, Slovenia, and held the Honorary Belle van Zuylen Chair in the Department of Methodology and Statistics at the University of Utrecht, the Netherlands in 2012-2013.
APM 6402, Halkin Seminar Room
2/21/2014 3:30 - 4:30 PM
Ming-Wen An, PhD
Assistant Professor, Department of Mathematics, Vassar College, Poughkeepsie, NY
Biography: Ming-Wen An received her B.A. in mathematics from Carleton College and her Ph.D. in biostatistics from the Johns Hopkins Bloomberg School of Public Health. One of her research interests is in issues of study design for addressing missing data due to "loss to follow-up" (with applications to evaluating HIV treatment programs in Africa). She is also interested in cancer clinical trial methodology, specifically designs for validating biomarkers used in targeted therapy and identification of alternative endpoints for Phase II trials.
2/5/2014 1:00 - 2:00 PM
Jelena Bradic, PhD
Assistant Professor, Department of Mathematics, University of California, San Diego
Abstract: In this paper, we study sparse structured estimation in the context of the high-dimensional non-parametric Cox proportional hazards model with a very general family of group penalties. We study the finite sample oracle risk bounds of such regularized estimators and develop new techniques to do so. Unlike the existing literature, we exemplify differences between bounded and possibly unbounded non-parametric covariate effects. In particular, we show that unbounded effects can lead to larger prediction bounds, compared to simple linear models, in situations where the true parameter is not necessarily sparse. Moreover, we propose a sequence of sparse non-convex group regularizations. Interestingly, we identify a specific regime of the proposed non-convex estimation that allows the group SCAD penalty and the group Lasso penalty to have equivalent prediction errors. Oracle prediction bounds are also discussed for the group $l_0$ penalty. Theoretical results for hierarchical and smoothed estimation in the non-parametric Cox model are also discussed as two examples of the proposed general framework.
Biography: Dr. Bradic received her Ph.D. in Operations Research and Financial Engineering from Princeton in Spring 2011 with a specialization in Statistics and Applied Probability under the direction of Jianqing Fan. Her research is in high dimensional statistics, stochastic optimization, asymptotic theory, robust statistics, functional genomics and biostatistics.
1/29/2014 1:00 - 2:00 PM
Robert Weiss, PhD
Professor, Department of Biostatistics, University of California, Los Angeles
Abstract: We develop a Dirichlet process mixture (DPM) model extension for regularly spaced longitudinal data. In longitudinal data, observations are both subject specific and a function of time. We account for both dependence between sampling densities across time and dependence in observations across time within the same subject. In the cluster memory Dirichlet process mixture (cmDPM) model, we use the inherent clustering properties of the DPM model to carry information from one time point to the next. Observations at baseline are modeled with a DPM. Cluster assignments at future time points depend on the previous assignment. Subjects may retain their cluster membership from the previous time point with nonzero probability. After baseline, given the previous time point, subjects are no longer exchangeable and their observed values depend on their previous clustering history. Clusters that are retained over time evolve through a time dependent process. There are several ways to look at the process including as a dynamic Markov Chinese Restaurant Process. We apply the cmDPM model to model annual tuberculosis (TB) incidence rates across 197 countries in the world from 1990-2010 and examine how the annual distribution of TB incidence rates has changed over time.
This is joint work with Yuda Zhu of Genentech.
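The clustering behavior of a Dirichlet process mixture, which the abstract above relies on, is often described through the Chinese Restaurant Process. The sketch below draws cluster assignments from the standard (static) Chinese Restaurant Process only. It is not the cmDPM model or its dynamic Markov version, and the function name `sample_crp` and the concentration value used in the usage note are assumptions for illustration.

```python
import random


def sample_crp(n, alpha, seed=0):
    """Draw cluster labels for n subjects from a Chinese Restaurant Process.

    Subject i (counting from 0) joins an existing cluster k with probability
    counts[k] / (i + alpha) and starts a new cluster with probability
    alpha / (i + alpha). Returns the labels and the final cluster sizes.
    """
    rng = random.Random(seed)
    labels = []
    counts = []
    for _ in range(n):
        # Weights: current cluster sizes, plus alpha for opening a new cluster.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)      # a new cluster is opened
        else:
            counts[k] += 1        # an existing cluster grows
        labels.append(k)
    return labels, counts
```

For instance, `sample_crp(197, alpha=1.0)` assigns 197 units to a random number of clusters, with larger clusters more likely to attract new members. In a model with cluster memory, the assignment at each later time point would additionally depend on the previous assignment, which this static sketch does not attempt.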
11/06/2013 1:00 - 2:00 PM
Shujie Ma, PhD
Professor, University of California, Riverside
There is a long history of using interactions in regression analysis to investigate interactive effects of covariates on response variables. In this paper we aim to address two kinds of new challenges resulting from the inclusion of such high-order effects in the regression model for complex data. The first kind arises from a situation where interaction effects of individual covariates are weak but those of combined covariates are strong, and the other kind pertains to the presence of nonlinear interactive effects. Generalizing the single index coefficient regression model, we propose a new class of semiparametric models with varying index coefficients, which enables us to model and assess nonlinear interaction effects between grouped covariates on the response variable. As a result, most of the existing semiparametric regression models are special cases of our proposed models. We develop a numerically stable and computationally fast estimation procedure utilizing both the profile least squares method and local fitting. We establish both estimation consistency and asymptotic normality for the proposed estimators of index coefficients as well as the oracle property for the nonparametric function estimator. In addition, a generalized likelihood ratio test is provided to test for the existence of interaction effects or the existence of nonlinear interaction effects. Our models and estimation methods are illustrated by both simulation studies and an analysis of a body fat dataset.
10/08/2013 2:00 - 3:00 PM
Joe Romano, PhD
Professor, Stanford University
10/04/2013 3:00 - 4:00 PM
Daniel F. Heitjan, PhD
Professor, Department of Biostatistics and Epidemiology
Perelman School of Medicine
University of Pennsylvania, Philadelphia, PA
Randomized clinical trials often include one or more planned interim analyses, during which an external monitoring committee reviews the accumulated data and determines whether it is scientifically and ethically appropriate for the study to continue. With survival-time endpoints, it is often desirable to schedule the interim analyses at the times of occurrence of specified landmark events, such as the 50th event, the 100th event, and so on. Because the timing of such events is random, and the interim analyses impose considerable logistical burdens, it is worthwhile to predict the event times as accurately as possible. Prediction methods available prior to 2001 used data only from previous trials, which are often of questionable relevance to the trial for which one wishes to make predictions. With modern data management systems it is often feasible to use data from the trial itself to make these predictions, rendering them far more reliable. This talk will describe work that some colleagues and students and I have done in this area. I will set the methodologic development in the context of the trial that motivated our work: REMATCH, a randomized clinical trial of a heart assist device that ran from 1998 to 2001 and was considered one of the most rigorous and expensive device trials ever conducted.
09/09/2013 1:00 - 2:00 PM
Tommi Gaines, DrPH
Division of Global Public Health
Department of Medicine
The Mexico-U.S. border region is home to an evolving HIV epidemic among vulnerable groups such as injection drug users and female sex workers. Features of one’s environment have been associated with individual health and therefore our objective is to highlight statistical and geographical techniques that examine HIV and risk-related behaviors. We describe the use of geographic information systems (GIS) data to map the location of sex work venues from epidemiologic studies conducted in Tijuana, Mexico and the application of statistical models to empirically assess the role of geography in shaping HIV and other sexually transmitted infections. We discuss the importance of combining statistical methods with GIS data to inform prevention and support services.
06/19/2013 1:00 - 2:00 PM
Lin Liu, PhD
Division of Biostatistics and Bioinformatics
Department of Family Medicine and Public Health
In dose-response studies, one of the most important issues is the identification of minimum effective dose (MED), where the MED is defined as the lowest dose such that the mean response is better than the mean response of a zero-dose control by a clinically significant difference. Dose-response curves are sometimes monotonic in nature. A union-intersection type of likelihood ratio test is proposed. One-sided lower confidence bounds can be inverted from the test to detect the differences between the dose-response means and a control mean. The evaluation of the lower confidence bounds is a concave programming problem subject to homogeneous linear inequality constraints. An efficient computing algorithm is proposed. A real data example from a dose-response study is used to illustrate the method.
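The definition of the MED given above can be made concrete with a small sketch. It applies the definition directly to estimated mean responses and is not the likelihood ratio test or confidence-bound procedure proposed in the talk. The function name, its arguments, and the convention that a larger response is "better" are assumptions made for this example.

```python
def minimum_effective_dose(doses, mean_responses, control_mean, delta):
    """Return the lowest dose whose mean response exceeds the control mean
    by more than the clinically significant difference delta.

    Assumes a larger response is better. Returns None if no dose qualifies.
    """
    # Walk through the doses from lowest to highest and stop at the first
    # one that beats the zero-dose control by more than delta.
    for dose, mean in sorted(zip(doses, mean_responses)):
        if mean - control_mean > delta:
            return dose
    return None

med = minimum_effective_dose(
    doses=[10, 20, 40],
    mean_responses=[1.2, 2.5, 3.1],
    control_mean=1.0,
    delta=1.0,
)
print(med)  # 20
```

In practice the comparison would be made on lower confidence bounds for the differences rather than on the raw means, so that sampling variability is accounted for.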
06/05/2013 1:00 - 2:00 PM
Jaroslaw Harezlak, PhD
Assistant Professor, Department of Biostatistics
Fairbanks School of Public Health and School of Medicine
Indiana University, Indianapolis, IN
Collection of functional data has grown vastly in the past decade, including functional data collected longitudinally. For example, in the HIV Neuroimaging Consortium (HIVNC) study, metabolite spectra were obtained using magnetic resonance spectroscopy (MRS) from multiple brain regions at a number of study time points. Analysis of such data usually follows a two-step procedure: (1) metabolite concentration extraction and (2) association study of extracted features and outcome of interest.
Our approach does not rely on this frequently unreliable feature extraction. Instead, it incorporates prior scientific knowledge to estimate the regression function associating the whole functional profile with the outcome without explicitly extracting the feature characteristics. Specifically, we propose a method for functional linear model estimation using partially empirical eigenvectors for regression (PEER) in the longitudinal data setting. Our method allows the regression function to vary across both time and space. We derive the estimator's statistical properties and discuss their connections to the generalized singular value decomposition (GSVD). The results of the simulation studies and an application to the analysis of HIV patients' neurocognitive impairment as a function of the metabolite profiles are presented.
Joint work with Madan G. Kundu and Timothy W. Randolph
05/08/2013 1:00 - 2:00 PM
Hulin Wu, PhD
Dean’s Professor, Department of Biostatistics and Computational Biology
Director, Center for Integrative Bioinformatics and Experimental Mathematics
University of Rochester School of Medicine and Dentistry
Many systems in engineering and physics can be represented by differential equations, which can be derived from well-established physics laws and theories. However, currently no laws or theories exist to deduce exact quantitative relationships and interactions among the huge number of elements at different levels in a biological system. It is unclear whether biological systems follow a mathematical representation such as differential equations, similar to that for a man-made physics or engineering system. Fortunately, recent advances in cutting-edge biomedical technologies allow us to generate intensive high-throughput data to gain insights into biological systems. There is a pressing need to develop statistical methods and bioinformatics approaches to test, based on experimental data, whether a biological system follows a mathematical representation, so that quantitative predictions can be made for biomedical interventions in a biological system. In this talk, I will present and discuss how to construct data-driven ordinary differential equation (ODE) models to describe biological systems, in particular for dynamic gene regulatory network systems. We propose to combine high-dimensional variable selection approaches and ODE model estimation methods to construct high-dimensional ODE models based on experimental data. We apply the proposed approaches to study how our immune system responds to influenza infections and vaccination based on time course high-throughput experimental data.
05/01/2013 2:30 - 3:30 PM
Andrew Zhou, PhD
Professor, Department of Biostatistics, University of Washington
Director and Research Career Scientist, Biostatistics Unit, VA Puget Sound Health Care System
The rising cost of health care is one of the most important problems facing the United States. Accurately predicting such costs is an important first step in addressing this problem. However, due to some special distributional features of health care costs, including high skewness, presence of excessive zero values, and heteroscedasticity, it is difficult to obtain an accurate prediction of future health care costs of patients.
In this talk, I will describe some new models for using covariates to predict the future health care costs of patients. These new models include: (1) a parametric heteroscedastic transformation model, (2) a semi-parametric two-part heteroscedastic transformation model, (3) a quantile regression model, (4) a non-parametric heteroscedastic transformation regression model, and (5) a semi-parametric two-part mixed-effects heteroscedastic transformation model.
04/15/2013 1:00 - 2:00 PM
Karen Messer, PhD
Professor, Family Medicine and Public Health
Director, Moores UCSD Cancer Center Biostatistics/Bioinformatics shared resource
As a biostatistician, one aims to support high-quality inference from experimental or observational data across a wide variety of scientific settings. To this sometimes bewildering array, the discipline of statistics brings a unifying set of tools and objectives which can help sort out what one knows with high confidence, with low confidence, and most especially, not at all. Although the approaches to sound inference may differ with the number of subjects (n- big or small) and the number of variables (p- small or big), the principles of control of Type I error, modeling sources of bias and variation, and quantifying the limits of statistical power provide a helpful framework for a variety of problems. In this talk, I will give examples of approaches to statistical inference from three areas of my work in cancer biostatistics: early phase trial design (small n, small p), prognostic modeling for survival (big n, medium p), and analysis of next generation sequencing data (small n, big p). In the first two topics, some recent approaches to older problems will be presented and in the third, traditional tools will be applied to modern data.
03/08/2013 1:00 - 2:00 PM
Vineet Bafna, PhD
Professor in the Department of Computer Science at UCSD and in the Bioinformatics PhD program. His research area is Bioinformatics, with a focus on Genomics and Proteomics.
Cancer genomes are marked by genomic instability and massive rearrangements. Recently, many exotic mechanisms have been proposed as mechanistic explanations for these rearrangements. For example, the breakage-fusion-bridge (BFB) mechanism, proposed over seven decades ago, has seen renewed interest as a source of genomic variability and gene amplification in cancer. Here, we formally model and analyze the BFB mechanism, the first rigorous formulation of the mechanism. Using this model, we show that BFB can achieve a surprisingly broad range of amplification patterns, and describe efficient combinatorial algorithms to characterize patterns consistent with BFB. An extensive analysis of simulated, cell-line, and primary tumor data reveals the existence of BFB. Our results also suggest that BFB may be hard to detect under heterogeneity and polyploidy.
As a second example, the model of chromothripsis (extensive shattering followed by regrouping of small parts of a chromosome) has been proposed to explain the extensive rearrangements seen in some tumors. Time permitting, we will critique this model using three different lines of evidence.
(Joint work with Shay Zakov and Marcus Kinsella.)
Medical Teaching Facility, Room 175, UCSD School of Medicine
03/06/2013 1:00 - 2:00 PM
Daniel Gillen, PhD
Associate Professor, Department of Statistics, University of California, Irvine
Researchers frequently elect to evaluate new therapies on the basis of patient survival. For example, clinicians might consider five-year survival when investigating drugs developed for use in childhood cancer, or 28-day survival when investigating the treatment of sepsis in patients suffering traumatic injury. Both of these examples focus on patient responses over a fixed period of time. However, for ethical reasons it is common for data to be periodically analyzed for early indications of efficacy, futility, or harm. In the case of censored survival data, inference is typically based upon a semiparametric model assuming a time-invariant treatment effect and standard group sequential methodology is used to generate multiple criteria for guiding the decision of whether a trial should be stopped early given the observed data. However, it is often the case that a given treatment might have a delayed effect within individuals or that the effect of treatment might dissipate over time. Special issues arise in such settings, mostly due to the dependence of results on the censoring distribution observed in the trial. In this talk, we discuss general issues associated with the sequential testing of a survival endpoint. Specific attention is given to the uncertainty of future observations under a potentially time-varying treatment effect. In this case we propose a method of imputation of future treatment effects based on random walks, which assumes minimally informative Bayesian prior distributions on the smoothness of survival of each comparison group. Imputation of future survival differences is carried out using standard Bayesian predictive distributions, thereby allowing for quantification of uncertainty in future treatment differences.
02/06/2013 1:00 - 2:00 PM
Victor DeGruttola, Sc.D.
Professor and Chair, Department of Biostatistics, Harvard School of Public Health
The UC San Diego Center for AIDS Research and AIDS Research Institute are pleased to present Victor DeGruttola, Sc.D. Dr. DeGruttola will discuss the quantitative challenges in advancing the HIV prevention research agenda.
01/13/2013 4:00 - 5:00 PM
Lawrence Lin, PhD
Dr. Lawrence I. Lin has recently retired after 33 years of distinguished tenure at Baxter International Inc. He is a Principal Consultant at JBS Consulting Services. He is an Adjunct Professor in the Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago. Dr. Lin is a Fellow of the American Statistical Association, and an elected member of the International Statistical Institute. He has served as a referee for many international journals.
This will be a general overview presentation with practical examples and without many statistical formulas. We will introduce the concepts of un-scaled and scaled agreement statistics based on the basic case between two raters with paired samples for continuous, binary, and ordinal data. We will then progress into more complex cases when we have multiple raters and each rater has multiple readings per sample. Here, we can assess intra-rater and inter-rater agreement, compare inter-rater deviation to intra-rater deviation, and compare the precision of one rater against another. We will explore the meaning of the two-stage criteria presented in the FDA guidance UCM070244: Statistical Approaches to Establishing Bioequivalence. The content is largely based on the materials presented in the newly published book by Springer, entitled “Statistical Tools for Assessing Agreement”.
Leichtag Building, Room 205
11/14/2012 1:00 - 2:00 PM
Anthony Gamst, Ph.D
Division of Biostatistics and Bioinformatics, Department of Family Medicine and Public Health, UCSD
Dr. Gamst is a Professor in the Division of Biostatistics and Bioinformatics, Department of Family Medicine and Public Health at UCSD. He is the author of over 90 papers in various areas of biostatistical applications and methodology, with a particular interest in imaging analysis and Alzheimer's disease.
Models with large numbers of nuisance parameters are common in modern statistics, having applications in laboratory medicine, genomics, clinical trials, medical imaging, epidemiology, and many other areas. Classical techniques, including Bayes and Maximum Likelihood, tend to produce sub-optimal or even inconsistent estimates of the parameters of interest in these models when naively applied, while approximately unbiased estimating equations work rather generally. We study several such models, identify the sources of bias and spurious correlation which lead to inconsistency or sub-optimality, and compute the minimal smoothness required for the existence of root-n consistent (and efficient) parameter estimates. We also examine simultaneous estimation of nuisance parameters and parameters of interest. The results of the study are related to everyday practice, particularly to the fitting of regression models with many predictors, and some heuristics are given.
04/18/2012 1:00 - 2:00 PM
Kim-Anh Do, Ph.D
Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston
Early detection is critical in disease control and prevention. The long term translational research goal is that if drugs can be targeted to specific tissues in the body, then dosage can be altered to achieve the desired effect while minimizing side effects such as toxicity. Motivated by specific problems involving high throughput data in the form of phage peptides, we have developed nonparametric and semiparametric mixture models for real-time analysis in the context of correlated phage experiments. Our main focus is to address the multiplicity issue automatically by incorporating a false discovery rate or utility function. We will highlight direct applications of both frequentist and Bayesian methods to cancer research challenges that address our long term translational goal. Specifically, the developed statistical methodology can assist in isolating ligand peptides and identifying their corresponding tissue-specific receptors in rodent models and in patients, including discovery and validation of a ligand-receptor tumor targeting system in human metastatic prostate cancer.
MET-MedEd 120.27 (MedEd Dean's Conference Room, new Telemedicine building)
04/06/2012 3:00 - 4:00 PM
Jong-Hyeon Jeong, PhD
Department of Biostatistics, University of Pittsburgh
The hazard function is a popular summary measure of time-to-event or survival data from medical studies. However, translation of the study results based on the hazard function might not be straightforward for stakeholders such as patients and physicians. Therefore, consideration of the remaining life years to events of interest might be more useful. In time-to-event data, the issue of competing risks is often encountered whenever the events of interest are precluded from being observed due to some competing events. In this talk, statistical methods that have recently been developed to infer quantile residual life under competing risks will be presented. Some issues to be overcome for further generalization of the proposed methods will also be discussed. The proposed methods will be illustrated with a real dataset from a phase III clinical study on breast cancer with a long-term follow-up of more than 30 years.
BSB Dean’s Conference Room, UCSD SOM Campus
03/23/2012 11:00 AM - 12:00 PM
Richard Levine, PhD
Professor and Chair, Department of Mathematics and Statistics, San Diego State University
Tooth loss from periodontal disease or dental caries (decay) afflicts most adults over the course of their lives. Survival tree methods for correlated observations have shown potential for developing objective tooth prognosis systems; however, the current technology suffers either from prohibitive computational expense or from unrealistic simplifying assumptions made to overcome computational demands. In this talk, Bayesian tree methods are developed for correlated survival data, relying on a computationally feasible, yet flexible, frailty model with piecewise constant hazard function. Bayesian stochastic search methods, using a Laplace approximated marginal likelihood, are detailed for tree construction, and posterior ensemble averaged variable importance ranking and amalgamation procedures are developed to identify indicators of tooth prognostic groups from a forest of trees. The proposed methods are used to assign each tooth from the VA Dental Longitudinal Study to one of five prognosis categories and evaluate the effects of clinical factors and genetic polymorphisms in predicting tooth loss. The prognostic rules established may be used in clinical practice to optimize tooth retention and devise periodontal treatment plans.
MTF 175, UCSD SOM Campus
03/21/2012 1:00-2:00 PM
Chuck Berry, PhD
Chuck Berry is Professor Emeritus and Interim Division Chief of the Division of Biostatistics and Bioinformatics, Department of Family Medicine and Public Health at UCSD. He has authored over 180 papers on the methodology and applications of Biostatistics in Medical Sciences, and he is actively involved in several research projects, with a particular emphasis on statistical genetics.
The relative abundance of retroviral insertions in a host genome is important in understanding the persistence and pathogenesis of both natural retroviral infections and retroviral gene therapy vectors. When host genomic DNA is randomly broken via sonication and then amplified, amplicons of varying lengths are produced. A likelihood function is proposed for these lengths along with a hybrid Expectation-Maximization algorithm. Patient data illustrate the method and simulations show that relative abundance can be estimated with little bias, but that variation in highly abundant sites can be large. Reference: Charles C. Berry, Nicolas A. Gillet, Anat Melamed, Niall Gormley, Charles R.M. Bangham, and Frederic Bushman. Estimating Abundances of Retroviral Insertion Sites from DNA Fragment Length Data. Bioinformatics, first published online January 11, 2012.
Leichtag 2A05, UCSD SOM Campus
03/07/2012 1:00-2:00 PM
Charles E. McCulloch, PhD
Professor and Head, Division of Biostatistics, Dept. of Epidemiology and Biostatistics, University of California at San Francisco
Joint work with John M. Neuhaus
Statistical models that include random effects are commonly used to analyze longitudinal and correlated data, often with strong and parametric assumptions about the random effects distribution. There is marked disagreement in the literature as to whether such parametric assumptions are important or innocuous. In the context of generalized linear mixed models used to analyze clustered or longitudinal data, we examine the impact of random effects distribution misspecification on a variety of inferences, including prediction, inference about covariate effects, prediction of random effects, and estimation of random effects variances. We describe examples, theoretical calculations, and simulations to elucidate situations in which the specification is and is not important. A key conclusion is the large degree of robustness of maximum likelihood for a wide variety of commonly encountered situations.
06/29/2011 04:00 PM
George Casella, PhD
University of Florida
Data describing terrorist events are particularly difficult to analyze, due to the many problems associated with the data collection process, the inherent variability in the data itself, and the usually poor level of measurement coming from observing political actors that seek not to provide reliable data on their activities. Thus, there is a need for sophisticated modeling to obtain reasonable inferences from these data. Here we develop a logistic random effects specification using a Dirichlet process to model the random effects. We first look at how such a model can best be implemented, and then we use the model to analyze terrorism data. We see that the richer Dirichlet process random effects model, as compared to a normal random effects model, is able to remove more of the underlying variability from the data, uncovering latent information that would not otherwise have been revealed.
05/23/2011 02:00 PM
The Journal Clubs are the second and fourth Fridays at 3 pm in Moores Cancer Center Room 3079
01/10/14: Loki Natarajan will be presenting: George Michailidis (2012) Statistical Challenges in Biological Networks. J Comp Graph Stat, 21(4): 840-855.
01/24/14: Rintaro Saito will be presenting: Chuang HY, Lee E, Liu YT, Lee D, Ideker T. (2007) Network-based classification of breast cancer metastasis. Mol Syst Biol. 3:140.
02/14/14: Minya Pu will be presenting: Caiyan Li and Hongzhe Li (2008) Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics, 24(9): 1175-1182.
For more information please contact Loki Natarajan or