Probabilistic Projection of Carbon Emissions

Adrian Raftery, PhD

Abstract: The Intergovernmental Panel on Climate Change (IPCC) recently published climate change projections to 2100, giving likely ranges of global temperature increase for each of four possible scenarios for population, economic growth and carbon use. We develop a probabilistic forecast of carbon emissions to 2100, using a country-specific version of Kaya's identity, which expresses carbon emissions as a product of population, GDP per capita and carbon intensity (carbon per unit of GDP). We use the UN's probabilistic population projections for all countries, based on methods from our group, and develop a joint Bayesian hierarchical model for GDP per capita and carbon intensity in most countries. In contrast with opinion-based scenarios, our findings are statistically based using data for 1960–2010. We find that our likely range (90% interval) for cumulative carbon emissions to 2100 includes the IPCC's two middle scenarios but not the lowest or highest ones. We combine our results with the ensemble of climate models used by the IPCC to obtain a predictive distribution of global temperature increase to 2100. This is joint work with Dargan Frierson (UW Atmospheric Science), Richard Startz (UCSB Economics), Alec Zimmer (Upstart), and Peiran Liu (UW Statistics).
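
As a rough illustration of Kaya's identity as described in the abstract (carbon emissions as the product of population, GDP per capita and carbon intensity), here is a minimal Python sketch. The function name and all input values are hypothetical and chosen only for illustration; they are not data or code from the talk, and the sketch does not reproduce the Bayesian hierarchical model.

```python
def kaya_emissions(population, gdp_per_capita, carbon_intensity):
    """Kaya's identity: emissions = population * GDP per capita * carbon intensity."""
    return population * gdp_per_capita * carbon_intensity


# Hypothetical inputs for a single country-year (illustration only).
population = 50e6           # people
gdp_per_capita = 20_000.0   # dollars per person per year
carbon_intensity = 0.0003   # tonnes of carbon per dollar of GDP

emissions = kaya_emissions(population, gdp_per_capita, carbon_intensity)
print(f"Emissions: {emissions:.3e} tonnes of carbon per year")
```

A probabilistic forecast would replace each fixed input with draws from a predictive distribution and multiply them draw by draw.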

Bio: Dr. Adrian E. Raftery is a Professor of Statistics and Sociology at the University of Washington. He develops new statistical methods for problems in the social, environmental and health sciences. An elected member of the U.S. National Academy of Sciences, he was identified as the world's most cited researcher in mathematics for the decade 1995-2005 by Thomson-ISI. He has supervised 29 Ph.D. graduates, of whom 21 hold or have held tenure-track university faculty positions.

12/05/2018 1:00 PM – 2:00 PM

Liebow Auditorium in BSB (Biomedical Sciences Building)

Rank Based Inference for Clustered Data in Presence of Informative Intra-Cluster Group Size

Sandipan Dutta, PhD

Abstract: Rank-based methods are useful nonparametric approaches to inference. Among them, the Wilcoxon rank-sum test is a popular nonparametric test for comparing two groups of independent observations. In many situations, however, observations may be correlated, making the assumptions of the Wilcoxon rank-sum test invalid. One such scenario is a clustered data setting. In recent years, there have been renewed attempts to extend the Wilcoxon rank-sum test to clustered data. However, in a clustered data setting, we are often faced with a situation where the group-specific marginal distribution in a cluster depends on the number of observations in that group (i.e., the intra-cluster group size). In this presentation, I will discuss a novel extension of the rank-sum test for handling this complex situation. I will also consider the scenario where comparing marginal group-specific distributions may not be enough in the presence of potentially useful covariates, such that ignoring the effects of these covariates can lead to incorrect inference for the group comparisons. To address this problem, I will discuss how to modify the rank-sum test into a covariate-adjusted rank test by estimating the covariate effects through rank-based estimation. I will demonstrate the utility of the proposed methods through a number of simulation studies and real-life datasets.

Bio: Dr. Sandipan Dutta is a postdoctoral research associate in the Department of Biostatistics and Bioinformatics at Duke University, working in the research group of Professor Susan Halabi. He obtained his Ph.D. in Biostatistics from the Department of Bioinformatics and Biostatistics at the University of Louisville under the supervision of Professor Somnath Datta. Prior to this, he obtained his master's in statistics from the Indian Institute of Technology Kanpur (IIT Kanpur) and his bachelor's in statistics from the University of Calcutta, India. His research interests include developing methods for rank-based inference of clustered data with complex features, time-to-event analysis, censored data regression, prognostic modeling and identification of important biomarkers for clinical cancer data. His research has been published in journals such as Biometrics, Statistics in Medicine, Journal of Statistical Computation and Simulation, and Journal of Clinical Oncology, among others.

12/4/18 10:00-11:00 AM

MTF 168


Data versus Belief: Statistical Studies of Psychic Abilities

Jessica Utts, PhD

Abstract: After many years of investigating data in parapsychology (the study of possible psychic abilities), I have observed that belief and anecdotes often are given higher priority than data when people formulate conclusions about the possible existence of psychic phenomena. Unlike traditional frequentist methods of statistical inference, Bayesian methods allow the combination of data and prior beliefs to reach conclusions. Thus, the data from parapsychology provide a good example for comparing frequentist and Bayesian methods of making conclusions based on evidence. In this talk, I will present some of the data from experiments in parapsychology, and analyze it using both frequentist and Bayesian methods, illustrating how strong prior beliefs can be incorporated when we consider whether decision-makers will pay attention to data. This domain provides a good illustration of how Bayesian methods can be used in a real world setting, and how they allow people to disagree even when presented with a large amount of data. The talk concludes with an argument for why researchers in all areas should pay attention to the results of these experiments.

Bio: Dr. Jessica Utts is a Professor Emerita in the Department of Statistics at the University of California, Irvine, and was the 2016 President of the American Statistical Association. She received her PhD in Statistics from Penn State University and was a founding member of the UC Davis Statistics Department, where she served on the faculty for many years before coming to UC Irvine. She has a long-standing interest in promoting statistical literacy, and has published three statistics textbooks with that emphasis. She has been actively involved in writing and grading the AP Statistics exam since its inception in 1997, and has recently completed 5 years as the Chief Reader for the exam. In addition to statistics education, her research involves applications of statistics to a variety of areas, most notably parapsychology, for which she has appeared on TV shows including Larry King Live, ABC Nightline and CNN Morning News.

11/28/18 1:00-2:00 PM

MET 243


Modern multivariate response regression with applications in genomics 

Aaron Molstad, PhD

Abstract: In this two-part talk, we will present new methods for fitting the multivariate response linear regression model motivated by applications in statistical genomics. In the first part, we introduce a new parameterization of the multivariate response linear regression model which we motivate through an "error-in-variables" data generating model. We propose a novel non-convex weighted residual sum of squares criterion which exploits this parameterization and admits a new class of penalized estimators. The optimization is solved with a proximal gradient descent algorithm. We use our method to study the association between copy number variations and gene expression in patients with glioblastoma multiforme, an aggressive brain cancer, using data collected by TCGA. In the second part of the talk, we propose a new multivariate response linear regression method for cross-tissue expression quantitative trait loci (eQTL) mapping in Genotype-Tissue Expression project (GTEx) data. Our method exploits the facts that gene expression is dependent across tissue types and that eQTLs are often shared among multiple tissues. We propose a penalized maximum likelihood estimator and derive an efficient expectation-conditional-maximization algorithm for its computation. Our analysis of the GTEx data shows that our method can improve eQTL mapping substantially compared to methods which model expression separately tissue-by-tissue.

Bio: Dr. Aaron J. Molstad is a postdoctoral research fellow in the Biostatistics Program at the Fred Hutchinson Cancer Research Center in Seattle, WA.  Previously, he received his Ph.D. from the School of Statistics at the University of Minnesota. Dr. Molstad's primary research interests are in multivariate analysis and numerical optimization, with an emphasis on developing model-based methods and open source software for statistical genetics and genomics. His past work has developed new methods for precision (inverse covariance) matrix estimation, classification with matrix and tensor-valued data, high-dimensional multivariate response linear regression, and survival analysis. 

11/27/18 10:00-11:00 AM

MTF 168


Interaction Feature Screening for Ultrahigh Dimensional Data

Guifang Fu, PhD

Abstract: Big data with ultrahigh dimensions has become increasingly important in diverse scientific fields. For example, genome-wide association studies identify disease susceptibility loci by screening over half a million single-nucleotide polymorphisms (SNPs). Clinical study findings imply that complex diseases are very likely regulated by interactions among multiple genes (i.e., epistasis) rather than by one genetic variant within a single gene. However, selecting important interaction effects from an ultrahigh dimension of features is extremely challenging in terms of computational feasibility and statistical accuracy. Existing statistical approaches are either focused on marginal effects, have proven to be highly inaccurate for scenarios involving strong interactive but weak marginal effects, or are computationally infeasible for big data. In this presentation, I introduce a novel interaction screening procedure based on the joint cumulant correlation (JCM-SIS). The implementation of JCM-SIS does not require model specification or data type restriction for responses or predictors. We have performed four simulations under various conditions to comprehensively demonstrate that JCM-SIS is empirically accurate, robust, and computationally viable for features in ultrahigh dimensional space. Numerical comparison indicates that JCM-SIS performs much better in a number of settings than two existing feature screening approaches. We successfully apply JCM-SIS to detect two-way interactions for 731,442 SNPs, a computational feat unprecedented in current literature. Additionally, we prove that JCM-SIS is theoretically sound and possesses strong sure screening consistency. In the discussion section, I will conclude by listing multiple future collaboration opportunities related to functional data analysis. 

Bio: Dr. Guifang Fu is an Assistant Professor in the Department of Mathematics and Statistics at Utah State University. She received her Ph.D. in Statistics from Pennsylvania State University. Her research has a dual focus on theoretical statistical methodology and applied and computational statistics. She has led an independent, extramurally funded and nationally competitive research program while also actively collaborating on several multidisciplinary research teams. She specializes in developing state-of-the-art statistical methods to extract knowledge from data, advancing the statistical theories that underlie these methods, and solving data-driven problems inspired by practical applications. She is highly interested in applying her methodologies to leading biomedical collaborations.

MTF 168 (Medical Teaching Facility)

11/19/18 10:00-11:00 AM


Propensity Score Weighting for Causal Inference with Multiple Treatments

Frank (Fan) Li

Abstract: Unconfounded comparisons of multiple groups are common in observational studies. Motivated by (1) an observational study comparing three medications (causal comparison) and (2) a racial disparity study (unconfounded descriptive comparison), we propose a unified framework, the balancing weights, for estimating causal effects with multiple treatments using propensity score weighting. These weights incorporate the generalized propensity score to balance the weighted covariate distribution of each treatment group, all weighted toward a common pre-specified target population. The class of balancing weights includes several existing approaches, such as inverse probability weights and trimming weights, as special cases. Within this framework, we propose a class of target estimands based on linear contrasts and their corresponding nonparametric weighting estimators. We further propose the generalized overlap weights, constructed as the product of the inverse probability weights and the harmonic mean of the generalized propensity scores. The generalized overlap weights correspond to the target population with the most overlap in covariates between treatments, similar to the population in equipoise in randomized clinical trials. These weights are bounded and thus bypass the problem of extreme propensities. We show that the generalized overlap weights minimize the total asymptotic variance of the nonparametric estimators for the pairwise contrasts within the class of balancing weights. We consider two balance check criteria and propose a new sandwich variance estimator for estimating the causal effects with generalized overlap weights. We apply these methods to (1) study the causal effect of three anti-coagulants on patient mortality and (2) estimate the racial disparities in medical expenditure. The operating characteristics of the new weighting method are further illustrated by simulations.
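
The construction of the generalized overlap weights described in the abstract (inverse probability weight times the harmonic mean of the generalized propensity scores) can be sketched as follows. This is a minimal sketch under stated assumptions: the function name is hypothetical, the propensity scores are taken as already estimated, and the constant factor in the harmonic mean is a choice of this sketch that cancels once weights are normalized within each treatment group.

```python
def generalized_overlap_weight(propensities, treatment):
    """Weight for one unit.

    propensities: generalized propensity scores e_1..e_J for this unit
                  (assumed already estimated, each > 0, summing to 1).
    treatment:    index (0-based) of the treatment the unit received.
    """
    n_treatments = len(propensities)
    # Harmonic mean of the generalized propensity scores.
    harmonic_mean = n_treatments / sum(1.0 / e for e in propensities)
    # Inverse probability weight for the treatment actually received.
    ipw = 1.0 / propensities[treatment]
    return ipw * harmonic_mean


# Hypothetical unit with three treatments and an extreme propensity score.
w = generalized_overlap_weight([0.01, 0.39, 0.60], treatment=0)
print(w)  # stays bounded even though the plain IPW would be 100
```

With two treatments and scores (p, 1 - p), the weight for a treated unit reduces to 2(1 - p), which is proportional to the familiar binary overlap weight.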

Bio: Frank (Fan) Li is a Ph.D. candidate in the Department of Biostatistics and Bioinformatics at Duke University, and a student affiliate at the Duke Clinical Research Institute. His primary research interests include causal inference methods applied to observational studies and pragmatic trials. In his doctoral dissertation, he studied propensity score methods for difference-in-differences and multiple treatments. He is also an active member of the Biostatistics and Study Design core in the NIH Collaboratory of Pragmatic Clinical Trials, established to oversee the statistical issues of ongoing demonstration projects.

MET 315 

11/15/18 1:00-2:00 PM


Surveys and Big Data for Estimating Brand Lift

Tim Hesterberg, PhD

Abstract: Google Brand Lift Surveys estimates the effect of display advertising using surveys. Challenges include imperfect A/B experiments, response and solicitation bias, discrepancy between intended and actual treatment, comparing treatment group users who took an action with control users who might have acted, and estimation for different slices of the population. We approach these issues using a combination of individual-study analysis and meta-analysis across thousands of studies. This work involves a combination of small and large data - survey responses and logs data, respectively.

There are a number of interesting and even surprising methodological twists. We use regression to handle imperfect A/B experiments and response and solicitation biases; we find regression to be more stable than propensity methods. We use a particular form of regularization that combines advantages of L1 regularization (better predictions) and L2 (smoothness). We use a variety of slicing methods that estimate either incremental or non-incremental effects of covariates, like age and gender, that may be correlated. We bootstrap to obtain standard errors. In contrast to many regression settings, where one may either resample observations or fix X and resample Y, here only resampling observations is appropriate.

Bio: Dr. Tim Hesterberg is a Senior Statistician at Google. That means old. Before that he attempted jobs at an electric utility, in academia, and in software. He received his Ph.D. in Statistics from Stanford University, where he played a lot of volleyball. He wrote Mathematical Statistics with Resampling and R; he helped write the ASA Guidelines for Undergraduate Statistics Programs, so he could tell teachers how to teach. Now he'll tell you how to analyze data!

MET Lower Auditorium
11/07/2018 1:00 PM – 2:00 PM

Bayesian Approaches to Dynamic Model Selection

Michele Guindani, PhD

Abstract: In many applications, investigators monitor processes that vary in space and time, with the goal of identifying temporally persistent and spatially localized departures from a baseline or “normal” behavior. In this talk, I will first discuss a principled Bayesian approach for estimating time-varying functional connectivity networks from brain fMRI data. Dynamic functional connectivity, i.e., the study of how interactions among brain regions change dynamically over the course of an fMRI experiment, has recently received wide interest in the neuroimaging literature. Our method utilizes a hidden Markov model for classification of latent neurological states, achieving estimation of the connectivity networks in an integrated framework that borrows strength over the entire time course of the experiment. Furthermore, we assume that the graph structures, which define the connectivity states at each time point, are related within a super-graph, to encourage the selection of the same edges among related graphs. Then, I will propose a Bayesian nonparametric model selection approach with an application to the monitoring of pneumonia and influenza (P&I) mortality, to detect influenza outbreaks in the continental United States. More specifically, we introduce a zero-inflated conditionally identically distributed species sampling prior, which allows borrowing information across time and assigning data to clusters associated with either a null or an alternate process. Spatial dependences are accounted for by means of a Markov random field prior, which allows the selection to be informed by inferences conducted at nearby locations. We show how the proposed modeling framework performs in an application to the P&I mortality data and in a simulation study, and compare it with common threshold methods for detecting outbreaks over time and with more recent Markov switching methods.

Bio: Dr. Guindani is a Professor in the Department of Statistics, University of California, Irvine. Before joining UCI, he held faculty positions in the Department of Biostatistics, University of Texas MD Anderson Cancer Center and the Department of Mathematics and Statistics at the University of New Mexico. He received his Ph.D. in Statistics from Università Bocconi, Milan, Italy in Spring 2005. He is currently a Co-Editor for Bayesian Analysis, the official journal of the International Society for Bayesian Analysis (ISBA), and he has been nominated as Editor-in-Chief of the same journal from January 2019 to December 2021. He is also an Associate Editor for Biometrics.

MTF 168
10/03/18 1:00 PM – 2:00 PM


Digging into the Biology of Complex Traits

Rany Salem, PhD

Abstract: The emergence of genome-wide association studies (GWAS) ushered in a paradigm shift in human genetics and genetic epidemiology research in terms of the study designs and methodologies researchers could exploit to detect genetic factors contributing to complex traits and disease. GWAS have allowed investigators to probe the genetic architecture and identify loci associated with thousands of human traits and diseases. In this talk, I will provide an overview of the lab’s research focus and describe results of a GWAS of Diabetic Kidney Disease to illustrate the utility, challenges and limitations of such studies. Next, potential solutions to these limitations are presented, including leveraging individual GWAS data available in central biorepositories (e.g. dbGaP) and GWAS summary statistics. I will also discuss recent work to predict weight gain using metabolomics data. Finally, I will briefly describe current limitations, opportunities and new directions in human genetics and genetic epidemiology.

Bio: Dr. Rany Salem is an Assistant Professor in the Department of Family Medicine and Public Health. His research interests focus on the application of statistical and epidemiological methods to understand the genetic architecture of complex traits and disease, including diabetes and diabetic complications, metabolic syndrome, cardiovascular and renal disease, and anthropometric and growth traits. He is interested in methodological questions in human genetics and leverages publicly available datasets to explore genetics questions at scale.
He received his Ph.D. in Public Health from the UC San Diego/SDSU Joint Doctoral Program in 2009 under the mentorship of Drs. Nik Schork and Dan O’Connor. During his doctoral studies, he completed the UC San Diego Genetics Training Program in 2007. After receiving his Ph.D., he worked first as a postdoctoral fellow and then as a Senior Research Fellow at the Broad Institute, Boston Children’s Hospital, Harvard Medical School, where he focused on genetic epidemiology studies with an emphasis on statistical genetic methodology and analysis of large epidemiologic datasets. During his postdoc training, he was awarded an NIH K99 Pathway to Independence Award from NHLBI.

MTF 168
06/06/2018 1:00 PM – 2:00 PM

Meta-Analysis of Odds Ratios With Missing Counts Estimated using Kaplan-Meier Curves

Shemra Rizzo, PhD

Abstract: A typical random effects meta-analysis of odds ratios assumes binomially distributed numbers of events in a treatment and control group and requires the number of events (e.g., deaths) and non-events (e.g., survivors) to be extracted from published papers. These data are often not available in the publications due to loss to follow-up. When the Kaplan-Meier (KM) survival plot is available, it is common practice to extract the survival probability from the plot and multiply it by the baseline sample size to infer the number of deaths and survivors. The naive approach to meta-analysis treats these estimates as actual extracted data; the results are hence over-certain and potentially inaccurate. Furthermore, accounting for the uncertainty introduced by these calculations is difficult, as KM curves are typically published without variance information. We propose a model that incorporates the uncertainty associated with the estimation of the missing counts and uses summary statistics for the follow-up times. We show that accounting for the uncertainty of the estimation is equivalent to a reduction of each study's sample size. A simulation study shows that our model outperforms the naive approach in terms of the coverage of the 95% confidence interval. We use real and simulated data to illustrate our model.
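
The naive calculation described in the abstract (survival probability read from the KM plot, multiplied by the baseline sample size) can be sketched as below. This is only an illustration of the common practice the talk critiques, not the proposed model; the function name and all numbers are hypothetical.

```python
def naive_counts_from_km(baseline_n, km_survival):
    """Naive imputation: survivors = n * S(t), deaths = n - survivors."""
    survivors = round(baseline_n * km_survival)
    deaths = baseline_n - survivors
    return deaths, survivors


# Hypothetical study arms (illustration only, not data from the talk).
deaths_trt, surv_trt = naive_counts_from_km(baseline_n=100, km_survival=0.80)
deaths_ctl, surv_ctl = naive_counts_from_km(baseline_n=100, km_survival=0.60)

# Odds ratio of death, treatment versus control, from the imputed counts.
odds_ratio = (deaths_trt / surv_trt) / (deaths_ctl / surv_ctl)
print(deaths_trt, surv_trt, deaths_ctl, surv_ctl, odds_ratio)
```

Because the imputed counts are then treated as if they had been observed, nothing in this calculation reflects censoring or the uncertainty of reading a value off a plot, which is the over-certainty the abstract refers to.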

Bio: Dr. Shemra Rizzo is an assistant professor at UC Riverside and is currently serving as Vice-President of Academic Affairs for the Southern California Chapter of the American Statistical Association. She obtained her master's degree in statistics and operations research from the University of North Carolina - Chapel Hill and her PhD in biostatistics from UCLA.

If you would like to invite others who are interested in the seminar series, please contact Melody Bazyar.

BRF (Biomedical Research Facility) 1104
05/02/2018 1:00 PM – 2:00 PM

Estimating Network Properties: Applications to Sexual History Data

Ravi Goyal, PhD

Abstract: Analysis of sexual history data intended to describe sexual networks presents many challenges arising from the fact that most surveys collect information on only a very small fraction of the population of interest. In addition, partners are rarely identified and responses are subject to reporting biases. Typically each network property of interest, such as the mean number of sexual partners for males or females, is estimated independently of other network properties. There is, however, a complex relationship among network properties, and knowledge of these relationships can aid in addressing the concerns mentioned above.

This talk will present a method that leverages the relationships among network properties when making inferences about network features of interest. The method ensures that inference on network properties is compatible with an actual network. The talk will present simulation results which demonstrate that, compared to currently available approaches, use of this method can improve estimates in settings where there is uncertainty that arises both from sampling and from systematic reporting bias. The talk will conclude by applying the method to estimate network properties using data from the Chicago Health and Social Life Survey.

Bio: Dr. Ravi Goyal (Ph.D., Biostatistics, Harvard School of Public Health) is a statistician at Mathematica Policy Research, where he focuses on developing and applying statistical network analysis methodology to improve and evaluate public programs. As a graduate student and research associate at Harvard University, his research focused on developing network sampling methods that capture uncertainties in network structure and applying these methods to analyze and model HIV epidemic data. During his employment as an applied mathematician at the National Security Agency, he gained field experience (deployed to Iraq) and experience with real-world complex datasets that included geospatial, longitudinal, and social network data.

MTF 168
04/11/2018 1:00 PM – 12:00 PM

Bayesian regression for group testing data

Joshua M. Tebbs

Abstract: Group testing involves pooling individual specimens (e.g., blood, urine, swabs, etc.) and testing the pools for the presence of a disease. When individual covariate information is available (e.g., age, gender, number of sexual partners, etc.), a common goal is to relate an individual's true disease status to the covariates in a regression model. Estimating this relationship is a nonstandard problem in group testing because true individual statuses are not observed and all testing responses (on pools and on individuals) are subject to misclassification arising from assay error. Previous regression methods for group testing data can be inefficient because they are restricted to using only initial pool responses and/or they make potentially unrealistic assumptions regarding the assay accuracy probabilities. To overcome these limitations, we propose a general Bayesian regression framework for modeling group testing data. The novelty of our approach is that it can be easily implemented with data from any group testing protocol. Furthermore, our approach will simultaneously estimate assay accuracy probabilities (along with the covariate effects) and can even be applied in screening situations where multiple assays are used. We apply our methods to group testing data collected in Iowa as part of statewide screening efforts for chlamydia. 
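
To make the role of assay error concrete, the sketch below computes the probability that a single pool tests positive given individual disease probabilities and assumed assay sensitivity and specificity. It uses the standard textbook expression under the assumptions that individuals are independent and that assay accuracy does not depend on pool size; it is not the Bayesian regression framework from the talk, and the function name and numbers are hypothetical.

```python
import math


def pool_positive_probability(individual_probs, sensitivity, specificity):
    """P(pool tests positive) under an imperfect assay.

    individual_probs: each member's probability of being truly positive
                      (for example, fitted values from a regression model).
    """
    # The pool is truly negative only if every member is negative.
    p_all_negative = math.prod(1.0 - p for p in individual_probs)
    p_truly_positive = 1.0 - p_all_negative
    # True positives are detected with probability `sensitivity`;
    # true negatives are misclassified with probability 1 - specificity.
    return sensitivity * p_truly_positive + (1.0 - specificity) * p_all_negative


# Hypothetical pool of two specimens (illustration only).
prob = pool_positive_probability([0.1, 0.1], sensitivity=0.95, specificity=0.98)
print(prob)
```

Expressions of this form are what link observed pool responses back to individual-level covariates when true individual statuses are not observed.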

Bio: Dr. Joshua Tebbs is Professor in the Department of Statistics in the College of Arts and Sciences at University of South Carolina. He received his BS in Mathematics (1995) and MS in Statistics (1997) from University of Iowa and his PhD in Statistics (2000) from North Carolina State University. He is a Fellow of the American Statistical Association. His research involves the development of statistical methods for categorical data, primarily binary response data that are observed in pools (group testing), and for constrained inference problems motivated by biomedical and public health applications. His research has been funded by two R01 grants from the National Institutes of Health (NIH), he routinely serves on NIH and NSF review panels, and he has advised eight PhD students. His current work is aimed at surveillance and identification for multiple diseases in group testing applications, motivated by nationwide screening activities for chlamydia and gonorrhea.

MET 223
04/02/2018 3:30 PM – 4:30 PM

Optimal treatment allocations in space and time for online control of an emerging infectious disease

Eric Laber, PhD

Abstract: A key component in controlling the spread of an epidemic is deciding where, when, and to whom to apply an intervention. We develop a framework for using data to inform these decisions in real-time. We formalize a treatment allocation strategy as a sequence of functions, one per treatment period, that map up-to-date information on the spread of an infectious disease to a subset of locations where treatment should be allocated. An optimal allocation strategy optimizes some cumulative outcome, e.g., the number of uninfected locations, the geographic footprint of the disease, or the cost of the epidemic. Estimation of an optimal allocation strategy for an emerging infectious disease is challenging because spatial proximity induces interference among locations, the number of possible allocations is exponential in the number of locations, and because disease dynamics and intervention effectiveness are unknown at outbreak. We derive a Bayesian online estimator of the optimal allocation strategy that combines simulation-optimization with Thompson sampling. The proposed estimator performs favorably in simulation experiments. This work is motivated by and illustrated using data on the spread of white-nose syndrome, a highly fatal infectious disease devastating bat populations in North America.

Bio: Dr. Eric Laber is an Associate Professor in the Department of Statistics at North Carolina State University. His major research areas are causal inference, non-regular asymptotics, optimization, and reinforcement learning. His primary application areas include precision medicine, artificial intelligence, adaptive conservation, and the management of infectious diseases.

BRF (Biomedical Research Facility) 1102
02/27/2018 9:00 AM – 10:00 AM

Statistical methods for high-throughput genomic data

Zhixiang Lin, PhD

Abstract: In the first part of the talk, a dimension reduction method will be introduced where we extend Principal Component Analysis to propose AC-PCA for simultaneous dimension reduction and Adjustment for Confounding variation. We show that AC-PCA can adjust for variations across individual donors present in a human brain dataset. For gene selection purposes, we extend AC-PCA with sparsity constraints, and propose and implement an efficient algorithm. The second part of the talk will be focused on clustering methods in single cell genomics. In single cell genomics, it is technically challenging to obtain chromatin accessibility and gene expression data for the same cell. We have developed a computational approach to this problem, where a model-based clustering method is proposed to match cell sub-populations in these two data types. We also demonstrate that using one data type can guide clustering of the other data type. Our proposed Bayesian model accounts for the stochasticity due to biological and technical effects. Last, methodologies motivated by spatial temporal modeling of gene expression dynamics during human brain development will be briefly discussed.

Bio: Dr. Zhixiang Lin studied biological sciences at Tsinghua University (BS, 2010) and computational biology & bioinformatics and statistics at Yale University (PhD, 2015). He has been a postdoctoral scholar in the Department of Statistics at Stanford University since 2015. His major research areas are statistical genetics/genomics and computational biology. His work has been published in prestigious journals such as PNAS, Biometrics, Annals of Applied Statistics and Cell.

BRF (Biomedical Research Facility) 1102
01/19/2018 10:00 AM – 11:00 AM

A semi-supervised approach for predicting cell type/tissue specific functional consequences of non-coding variation using massively parallel reporter assays

Zihuai He, PhD

Abstract: Predicting the functional consequences of genetic variants is a challenging problem, especially for variants residing in non-coding regions. Projects such as ENCODE and Roadmap Epigenomics make available various epigenetic features, including histone modifications and chromatin accessibility, genome-wide in over a hundred different tissues and cell types. Meanwhile, recent developments in high-throughput assays to assess the functional impact of variants in regulatory regions (e.g. massively parallel reporter assays - MPRA, CRISPR/Cas9-mediated in situ saturating mutagenesis) can lead to the generation of high quality data on the functional effects of selected variants. We propose a semi-supervised approach, referred to as GenoNet, to jointly utilize experimentally confirmed regulatory variants (labeled variants), millions of unlabeled variants genome-wide, and more than a thousand cell type/tissue specific functional annotations on each variant to predict functional consequences of non-coding genetic variants. Through the application to several experimental datasets, we demonstrate that the proposed method significantly improves prediction accuracy compared to existing functional prediction methods, both at the organism level and at the tissue/cell type level. We further show that eQTLs and dsQTLs in specific tissues tend to be substantially more enriched among variants with high GenoNet scores, and how the GenoNet scores can be used to map regulatory variants in regions of interest, evaluate 3C interaction variants and aid in the discovery of disease associated genes through an integrative analysis of lipid phenotypes using a Metabochip dataset on 12,281 individuals.

Bio: Dr. Zihuai He received his Ph.D. in Biostatistics from the University of Michigan and his BS (Bachelor of Science) from Tsinghua University in China. He is currently a post-doctoral research scientist in the Department of Biostatistics at Columbia University. His research has concentrated on statistical genetics and integrative analysis of omics data. His work has produced 11 peer-reviewed publications in prestigious journals of genetics and statistics, such as The American Journal of Human Genetics, Journal of the American Statistical Association, and Biometrics. He has developed three R packages with efficient computational techniques that facilitate integrative analysis in a broad range of genomic study designs such as longitudinal studies, family studies and meta-analysis of multiple sequencing studies. At Columbia, he also collaborates with researchers in the GTEx Consortium for gene expression studies.

BRF (Biomedical Research Facility) 1104
01/12/2018 8:00 AM – 9:00 AM

Joint modeling of longitudinal functional feature and discrete time-to-event

Ling Ma, PhD

Abstract: In longitudinal studies, it is often of interest to investigate how the functional feature of a marker’s measurement process is associated with the event time of interest. We make use of B-splines to smoothly approximate the infinite dimensional functional data and propose a joint model of the longitudinal functional feature and the time to event. The proposed approach also allows for prediction of survival probabilities for future subjects based on their available longitudinal measurements and a fitted joint model. We illustrate our proposals on a prospective pregnancy study, the Oxford Conception Study, in which measurements of luteinizing hormone, an important biomarker of ovulation, are available. A joint modeling approach combining a functional analytic approach with discrete survival modeling was used to assess whether functional features of the hormonal measurements, such as the curvature of the hormonal profile, are associated with time to pregnancy.

Bio: Dr. Ling Ma received her PhD in Statistics from the University of Missouri, Columbia in 2014. She then worked at the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) as a postdoctoral fellow for two years before joining Clemson University as an assistant professor. Dr. Ma’s primary methodological research interests are survival analysis, with special emphasis on interval-censored data and panel count data, and joint modeling of longitudinal and time-to-event data. Dr. Ma has worked on statistical methods with applications to reproductive and environmental epidemiology, cancer, HIV, etc.

MTF 168
01/05/2018 10:00 AM – 11:00 AM

Deep Learning and Its Application in Predicting Enhancer in Human Genome

Xiaohui Niu, PhD

Abstract: Enhancer sequences contain short DNA motifs that act as binding sites for sequence-specific transcription factors. Given the crucial roles of enhancers in generating cell-type- and state-specific transcriptional programs, further understanding of the process of enhancer transcription and its contribution to the overall functionality of enhancers will offer crucial insights into gene regulation, cell identity control, development and disease. However, this is a challenging problem because the very long distance between an enhancer and its target gene makes the search difficult. Moreover, unlike a promoter, which is located upstream of its target gene, an enhancer can exert its regulatory role bidirectionally, which makes the problem more challenging. To address this need, we propose a novel hybrid convolutional and Gated Recurrent Unit (GRU) recurrent neural network framework for predicting enhancers de novo from sequence. In the model, the convolution layer captures regulatory motifs, while the recurrent layer captures long-term dependencies between the motifs in order to learn a regulatory ‘grammar’ to improve predictions. The model improves prediction considerably on several benchmark datasets.

Bio: Dr. Xiaohui Niu is a visiting scholar in the Department of Family Medicine and Public Health at UC San Diego and an associate professor at Huazhong Agricultural University in China. His research interests are machine learning methods and their applications in bioinformatics, especially proteomics, including protein (gene) function prediction, protein binding site prediction, methods to construct phylogenetic trees, and protein structure prediction.

MTF 168
12/06/2017 1:00 PM – 2:00 PM

Ethical Issues in Statistical Practice - Case Examples

Larry Shen, PhD

Abstract: In this presentation I discuss statistical integrity and some ethical issues in our practice in the bio-pharmaceutical industry. Some common ethical issues include data integrity, validity of statistical testing and conclusions, presentation of data, and post-hoc analyses. I am going to highlight a few ethical guidelines from the American Statistical Association and use a few case examples to illustrate ethical dilemmas that we often face in our daily work.

Bio: Dr. Larry Shen has a highly accomplished career in leading clinical organizations to support drug development and clinical research programs. He has directly worked on over 20 investigational new drug projects and played leading roles in regulatory submissions that led to 6 drug approvals in both the US and European Union. He has authored or co-authored many articles on statistical methodology and its applications to drug development. His work on dose titration received the Thomas W Teal award at the 2007 Drug Information Association annual meeting. Dr. Shen also served as President of the San Diego Chapter of the American Statistical Association (ASA). In 2014, Dr. Shen was elected as a fellow of the ASA for his leadership in applying statistics to drug development and for his contributions to the statistics profession. Prior to co-founding Pharmapace, Dr. Shen was Vice President at Amylin Pharmaceuticals in charge of their clinical development organizations including Statistics, Programming, Data Management, PK/PD modeling, and Medical Writing. He had worked at Amylin since 1997 and had implemented rigorous procedures for data processing, analyses, and reporting to ensure data integrity and statistical excellence. Under his leadership, his department played a critical role in the development and approval of four innovative medicines. Dr. Shen obtained his Ph.D. in Statistics from the University of California at Berkeley and both BS and MS degrees in mathematics/statistics from Beijing University in China.

MTF 168
11/01/2017 1:00 PM – 2:00 PM

Multi-Block Models for Psychiatric and Brain Imaging Data

Wesley Thompson, PhD

Abstract: Modern large-scale observational psychiatric studies collect data in a plethora of modalities, including questionnaires, structured clinical interviews, life histories, and many biological variables, e.g., structural and functional brain imaging, genetics, and inflammatory measures. An important goal of such studies is to obtain a biological foundation for psychiatric diagnoses that are predictive of outcomes and/or response to specific treatments. However, a major difficulty in analyzing data from these studies is reducing dimensionality by revealing latent structures that inform about relationships across modalities, while simultaneously accounting for "batch" effects and method variance within modalities of measurement. Here, we present a Bayesian multi-level model that uncovers both shared and idiosyncratic factors within blocks (data modalities). We demonstrate that this methodology is effective in uncovering latent structure and predicting clinical outcomes in the T-1000 data, a large-scale study of psychiatric disorders collecting data in scores of domains, including structural and functional imaging.

Bio: Dr. Wesley Thompson earned his Ph.D. in Statistics from Rutgers University in 2003, with a focus on statistical methods for longitudinal data analysis. He was appointed Assistant Professor of Statistics and Psychiatry at the University of Pittsburgh in 2005, where he received a five-year NIH K25 Career Development Award to develop novel methods for studying co-variation in brain function and depression. Dr. Thompson joined UCSD in 2008, and is currently an Associate Professor of Family Medicine and Public Health within the Division of Biostatistics and Bioinformatics. His current work involves Bayesian semi-parametric and mixture models with applications to (i) improving effect size estimation, replication, and prediction in genome-wide association studies, (ii) predicting onset of illness from multivariate biomarker trajectories, and (iii) functional and structural MRI data.

MTF 168
10/04/2017 1:00 PM – 2:00 PM

Are tumors predictable? Inherited genetic variation constrains tumor evolution

Hannah Carter, PhD

Abstract: Recent studies have characterized the extensive somatic alterations that arise during cancer and various studies have probed rare inherited mutations that lead to early onset cancer syndromes. However, little is understood about the role of genetic background in ‘sporadic’ adulthood cancers. It is possible that the somatic evolution of a tumor may be significantly affected by inherited polymorphisms carried in the germline. To investigate this, we analyzed genomic data for thousands of tumors from The Cancer Genome Atlas to reveal and systematically validate hundreds of genetic interactions between germline polymorphisms and major somatic events, including tumor formation in specific tissues and alteration of specific cancer genes. Among germline–somatic interactions, we found germline variants in RBFOX1 that increased incidence of SF3B1 somatic mutation by 8-fold via functional alterations in RNA splicing. Similarly, 19p13.3 variants were associated with a 4-fold increased likelihood of somatic mutations in PTEN. In support of this association, we found that PTEN knockdown sensitizes the MTOR pathway to high expression of the 19p13.3 gene GNA11. Finally, we observed that stratifying patients by germline polymorphisms exposed distinct somatic mutation landscapes, implicating new cancer genes. Our findings suggest that individual genomic data can help to forecast the trajectory of tumor evolution, including where and how cancer develops, opening avenues for prevention research.

Bio: Dr. Hannah Carter is an Assistant Professor in the UCSD Department of Medicine. She received her M.Eng in Electrical Engineering at the University of Louisville and her PhD in Biomedical Engineering from Johns Hopkins University. The Carter Lab uses bioinformatics and computational biology to study the role of inherited and acquired genetic variation in cancer. The goals of her research are to advance precision cancer medicine by developing approaches to discriminate drivers from passengers, predict cancer cell-specific therapeutic vulnerabilities and identify germline variation that contributes to the emergence or progression of tumors. Dr. Carter is a Siebel Scholar and a recipient of a 2013 NIH Director’s Early Independence Award.

MTF 168
09/06/2017 1:00 PM – 2:00 PM

“Efron's Rules” for Inference after Imputation and Model Selection

(Joint work with Lin Liu and Loki Natarajan)

Karen Messer, PhD

Abstract: We address the practical problem of model selection in the presence of imputation for missing data. Our focus is on valid inference, in particular on confidence intervals that incorporate both the imputation mechanism and the model selection mechanism. We investigate commonly used resampling-based approaches - multiple imputation and the bootstrap - and incorporate Efron's 2014 computationally efficient variance estimate for bootstrap-smoothed estimates. We compare the resulting 'Efron's rules' estimator to a 'Rubin's rules' estimator based on multiple imputation. These turn out to be versions of frequentist model averaged estimators, and are compared to an un-averaged selection estimator using the framework of Claeskens and Hjort. Simulation and real data examples are drawn from the related literature. Practical recommendations are given, including circumstances where the new Efron's rules estimator is seen to work well.
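
For readers unfamiliar with the term, the classical Rubin's rules for pooling results across m imputed datasets can be sketched as follows. This is a minimal illustration of the standard pooling formula only; the function name `rubin_pool` is hypothetical, and the estimator discussed in the talk (which also accounts for model selection) may differ in its details.

```python
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Pool point estimates and their variances from m imputed datasets.

    Classical Rubin's rules (an illustration, not the talk's estimator):
      pooled estimate = average of the m estimates
      total variance  = W + (1 + 1/m) * B
    where W is the average within-imputation variance and B is the
    between-imputation sample variance of the estimates.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need matching lists from at least two imputations")
    pooled = mean(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, total


# Toy numbers, invented purely for illustration.
pooled_estimate, total_variance = rubin_pool([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what widens the interval to reflect uncertainty about the missing values.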

Bio: Dr. Karen Messer is Professor and Chief of the Division of Biostatistics and Bioinformatics in the Department of Family Medicine and Public Health at UCSD. She went to Clairemont High School here in San Diego, and she was an undergraduate math major at Harvard University. Dr. Messer got her PhD in mathematical statistics at UCSD under Dr. Murray Rosenblatt. Before joining UCSD in 2006, Dr. Messer was assistant professor of mathematics at UCLA, then associate professor and professor of mathematics at California State University Fullerton.

MTF 168
04/12/2017 1:00 PM – 2:00 PM

Controlling Epidemics: Challenges and Opportunities for Quantitative Scientists

Victor De Gruttola, ScD

Abstract: Recent developments in biomedical science, such as those in molecular epidemiology and surveillance, vaccinology, and antimicrobial treatment, can greatly aid in devising effective responses to epidemic and endemic diseases. Taking maximal advantage of such successes requires advances in quantitative science that combine across different disciplinary domains. For example, in the investigation and scale-up of HIV prevention interventions, challenges arise from the complex dependencies that characterize data from clinical studies and that reflect the spread of HIV along sexual contact networks. Both randomized and observational studies often collect data on HIV incidence in different subpopulations, risk behavior, and viral genetic sequences. New methods are required to make maximal use of this very useful but incomplete information to estimate quantities that will be useful in guiding scale-up of successful interventions. These include not only effects of randomized interventions (for trials randomized at both the individual and cluster level) but also expected effects under implementation policies likely to be used in practice. We propose methods that make use of baseline data to improve estimation of intervention effects and of their modification by factors measured at individual and network levels. We show their advantages in the design of both randomized and observational studies, with complete or missing data. Cluster randomized trials are also useful for controlling outbreaks. We propose, and demonstrate properties of, a novel design for settings like the Ebola epidemic, where proof-of-principle vaccine trials provided evidence of efficacy, but where questions remain about the effectiveness of different possible modes of implementation. Our goal for these studies is not only to generate information about intervention effects but also to provide public health benefit.
To do so, we leverage information about contact networks – in particular the degree of connection across randomized units obtained at study baseline – and develop a novel class of connectivity-informed cluster trial designs. We investigate the performance of these designs in terms of epidemic control outcomes (time to end of epidemic and cumulative incidence) and power to detect intervention effect, by simulating vaccination trials during an SEIR-type epidemic outbreak using a network-structured agent-based model.

Bio: Dr. Victor De Gruttola is Professor of Biostatistics in the Department of Biostatistics at the Harvard T.H. Chan School of Public Health. He has spent the past 30 years working with junior colleagues and collaborating with clinical and laboratory investigators to develop and apply methods for advancing the HIV prevention and treatment research agendas. He also has managed large projects devoted to improving the public health response to the AIDS epidemic, both within the US and internationally. The aspects of the HIV epidemic on which he has worked include transmission and natural history of infection with the Human Immunodeficiency Virus (HIV), as well as investigation of antiretroviral treatments, including the development and consequences of resistance to them. The broad goals of his research have included developing treatment strategies that provide durable virologic suppression while preserving treatment options after failure, and evaluating the community-level impact of packages of prevention interventions, including antiviral treatment itself. He served as the Director of the Statistics and Data Analysis Center of the Adult Project of the AIDS Clinical Trials Group during the period in which highly active antiretroviral treatment was developed, and was instrumental in designing and analyzing studies of the best means of providing such therapy. He has also served as the Co-PI (with Max Essex) for a cluster-randomized trial of an HIV combination prevention program in Botswana. His methods research activity is focused on HIV prevention research, especially with regard to the development of methods for analyses of sexual contact networks, for viral genetic linkage analyses in the presence of missing data, and for improving validity and efficiency of analyses of HIV prevention trials.

Leichtag 107
03/17/2017 11:00 AM – 11:45 AM

Modeling Time-Varying Trends in ERP Data with Applications to an Implicit Learning Paradigm in Autism

Kyle Hasenstab, PhD

Abstract: Event-related potential (ERP) studies are a set of experimental frameworks that use electroencephalography (EEG) to study the electrical potential produced by a subject's brain when presented with an implicit task in the form of stimuli. Data consist of a temporally recorded functional ERP curve repeatedly observed over a sequence of stimuli and across a set of electrodes placed on the scalp, producing a complex data structure consisting of a functional (ERP curve), longitudinal (stimulus repetition), and spatial (electrode) dimension. In typical ERP studies, the dimension of data is reduced into a single measure for each subject by cross-sectionally averaging ERP across longitudinal and spatial repetitions in order to increase the signal-to-noise ratio of the ERP function. Features are then extracted from the averaged ERP and analyzed using simple statistical methods, ignoring additional information that may be found in the collapsed dimensions. In this talk, I discuss methodology for preserving and analyzing the lost dimensions of ERP data. In particular, I focus on multidimensional functional principal components analysis (MD-FPCA), a two-step procedure used to summarize important characteristics across all three dimensions of the ERP data structure into an interpretable, low-dimensional form. MD-FPCA is applied to a study on neural correlates of visual implicit learning in young children with autism spectrum disorder (ASD). Application of the proposed methods reveals meaningful trends and substructures in the implicit learning processes of ASD children when compared to typically developing controls. Results indicate that the proposed methodology effectively preserves important information contained within the multiple dimensions of ERP data.

Bio: Dr. Kyle Hasenstab recently earned his PhD in Statistics from the University of California, Los Angeles, where he researched methods for analyzing data from EEG experiments to study implicit learning in children with autism spectrum disorder. He has worked as a postdoctoral fellow at the Centers for Disease Control and Prevention in their Chronic Viral Diseases Branch and is currently working as a statistician for AT&T.

MTF 168
03/15/2017 2:00 PM – 3:00 PM

Inference of high-dimensional, non-sparse and strongly dependent Gaussian observations

David Azriel, PhD

Abstract: Motivated by a data set obtained from brain imaging, we study inference of high-dimensional observations without assuming a sparse parameter space. Our approach starts from computing the z-scores at each cortical voxel. The result is a large strongly-dependent vector of observations, assumed to be Gaussian. We study two issues: first, we investigate the empirical distribution of this vector and its possible departure from a standard normal distribution. Second, we study inference of linear-projections of this vector. Our analysis shows that the global null hypothesis (when there is no dependence between the response and the measurements) is not likely to be true. Furthermore, we find that the effect is widespread (non-sparse) but not large enough to be significant anywhere.

Bio: Dr. David Azriel has been a senior lecturer in statistics at the Technion - Israel Institute of Technology since 2015. Previously, he was a postdoc with Larry Brown at Wharton at the University of Pennsylvania. He completed his PhD thesis at the Hebrew University in Jerusalem in 2012. His research interests are in high dimensional data, model selection and optimal clinical trial design.

MTF 168
03/01/2017 1:00 PM – 2:00 PM

Fast Estimation of Regression Parameters in a Broken-Stick Model for Longitudinal Data

Bin Nan, PhD

Abstract: Estimation of change-point locations in the broken-stick model has significant applications in modeling important biological phenomena. In this talk, Dr. Nan will present a computationally economical likelihood-based approach for estimating change-point(s) efficiently in both cross-sectional and longitudinal settings. The method, based on local smoothing in a shrinking neighborhood of each change-point, is shown via simulations to be computationally more viable than existing methods that rely on search procedures, with dramatic gains in the multiple change-point case. The proposed estimates are shown to have root-n consistency and asymptotic normality (in particular, they are asymptotically efficient in the cross-sectional setting), allowing us to provide meaningful statistical inference. As the primary and motivating longitudinal application, a two change-point broken-stick model appears to be a good fit to the Michigan Bone Health and Metabolism Study cohort data to describe patterns of change in log estradiol levels, before and after the final menstrual period. The method is also illustrated on a plant growth dataset in the cross-sectional setting. This is joint work with Rito Das, Mouli Banerjee, and Huiyong Zheng.
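
As a rough illustration of the idea of local smoothing around a change-point, the sketch below evaluates a one change-point broken-stick mean, b0 + b1*t + b2*max(t - tau, 0), and a version in which the kink is replaced by a quadratic piece within a window of half-width gamma. The quadratic smoother, the parameter names, and the function names are assumptions chosen for illustration; they are not necessarily the smoother or notation used in the talk.

```python
def hinge(x):
    """Exact broken-stick term: zero before the change-point, linear after."""
    return max(x, 0.0)


def smooth_hinge(x, gamma):
    """Differentiable stand-in for hinge(x) on the window [-gamma, gamma].

    Assumed quadratic smoother (illustrative choice): it matches the hinge
    in value and slope at both ends of the window, so the mean function
    becomes differentiable in the change-point location.
    """
    if x <= -gamma:
        return 0.0
    if x >= gamma:
        return x
    return (x + gamma) ** 2 / (4.0 * gamma)


def broken_stick_mean(t, b0, b1, b2, tau, gamma=None):
    """Mean at time t; pass gamma to use the smoothed kink instead."""
    x = t - tau
    term = hinge(x) if gamma is None else smooth_hinge(x, gamma)
    return b0 + b1 * t + b2 * term


# Slope is b1 before tau and b1 + b2 after it (toy values).
before = broken_stick_mean(1.0, 0.0, 1.0, 2.0, tau=2.0)
after = broken_stick_mean(3.0, 0.0, 1.0, 2.0, tau=2.0)
after_smoothed = broken_stick_mean(3.0, 0.0, 1.0, 2.0, tau=2.0, gamma=0.5)
```

Letting gamma shrink toward zero makes the smoothed curve approach the exact broken stick, which is the sense in which the neighborhood is "shrinking".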

Bio: Dr. Bin Nan is Professor of Biostatistics and Statistics at the University of Michigan. He received his Ph.D. in Biostatistics from the University of Washington in 2001 and joined the faculty at the University of Michigan in the same year. Dr. Nan's research interests are in various areas of statistics and biostatistics including semiparametric inference, failure time and survival analysis, longitudinal data, missing data and two-phase sampling designs, and high-dimensional data analysis. He collaborates on many studies in epidemiology, bioinformatics, and brain imaging, particularly in cancer, HIV, women's health, and neurodegenerative diseases. He is a Fellow of the American Statistical Association and a Fellow of the Institute of Mathematical Statistics.

BRF 1102
02/15/2017 2:00 PM - 3:00 PM

Recurrent event data analysis with intermittently observed time-varying covariates

Chiung-Yu Huang, PhD

Abstract: Although recurrent event data analysis is a rapidly evolving area of research, rigorous studies on modeling and estimation of the effects of time-varying covariates on the risk of recurrent events have been lacking. Existing methods for analyzing recurrent event data usually require that the covariate processes are observed throughout the entire follow-up period. However, covariates are often observed periodically rather than continuously. We propose a novel semiparametric estimator for the regression parameters in the popular proportional rate model. The proposed estimator is based on an estimated score function where we kernel smooth the mean covariate process. We show that the proposed semiparametric estimator is asymptotically unbiased and normally distributed, and we derive its asymptotic variance. Simulation studies are conducted to compare the performance of the proposed estimator with that of simple methods that carry forward the last observed covariates. The different methods are applied to an observational study designed to assess the effect of Group A streptococcus (GAS) on pharyngitis among school children in India.

Bio: Dr. Huang is Associate Professor of Oncology and Biostatistics at the Johns Hopkins University. Her main area of research is in general biostatistics methodology and its application to the biomedical sciences. She has extensive experience in the statistical analysis of survival outcomes, recurrent events, competing risks, longitudinal measurements, missing data, biased sampling, and design and monitoring of clinical trials.

BRF 1104
02/09/2017 2:00 PM - 3:00 PM

Optimally combining outcomes to improve prediction

David Benkeser, MPH, PhD

Abstract: In many studies, multiple instruments are used to measure different facets of an unmeasured outcome of interest. For example, in studies of childhood development, children are administered tests in several areas and researchers combine these test scores into a univariate measure of neurocognitive development. Researchers are interested in predicting this development score based on household and environment characteristics early in life in order to identify children at high risk for neurocognitive delays. We propose a method for estimating the combined measure that maximizes predictive performance. Our approach allows modern machine learning techniques to be used to predict the combined outcome using potentially high-dimensional covariate information. In spite of the highly adaptive nature of the procedure, we nevertheless obtain valid estimates of the prediction algorithm’s performance for predicting the combined outcome as well as confidence intervals about these estimates. We illustrate the methodology using longitudinal cohort studies of early childhood development.

Bio: David Benkeser, PhD, MPH is a post-doctoral researcher under Mark van der Laan in the Division of Biostatistics at the University of California, Berkeley, where he works on developing methods for machine learning, causal inference, and the integration of the two fields. He obtained his PhD from the Department of Biostatistics at the University of Washington, where his research focused on causal inference in complex longitudinal settings with applications in preventive vaccine efficacy trials for infectious diseases.

BRF 1102
01/24/2017 2:00 PM - 3:00 PM

The Evolution of a Statistical Consulting Course

Colleen Kelly, Ph.D.

Kelly Statistical Consulting

Abstract: In 2000, Duane Steffey and I founded the SDSU Consulting Center and developed a Statistical Consulting course in response to university and private consulting requests and a desire to better train our graduate students for careers as applied statisticians. In the following 15 years, I became increasingly devoted to statistical consulting as a career and eventually left academia for a career in consulting. In 2009, I founded Kelly Statistical Consulting. My industrial consulting experience has revised my vision of what is important to teach in a consulting course. In this talk, I present the elements common to most statistical consulting courses and how my presentation of these elements has evolved over the last 15 years. I discuss the (sometimes hard) lessons learned, and what I believe to be the key elements of a successful course.

Bio: Dr. Colleen Kelly is an Accredited Professional Statistician™ and has over 25 years of statistical consulting experience as a statistical consultant, professor, and researcher specializing in statistical methodology for clinical trials and biotechnology. As a tenured associate professor of statistics at San Diego State University, Dr. Kelly co-founded and co-directed the university’s statistical consulting center. At Victoria University in Wellington, New Zealand, she directed the university’s statistical consulting center and developed and taught their statistical consulting course. Currently, she heads Kelly Statistical Consulting, Inc., which provides statistical services to biotechnology, pharmaceutical and medical device companies.

MTF 168
12/07/2016 1:00 PM - 2:00 PM

A distance-weighted model for methylation change with application to whole genome bisulfite sequencing data

Michelle Lacey, Ph.D.

Associate Professor of Biostatistics, Tulane University

Abstract: Variation in cytosine methylation at CpG dinucleotides is often observed in genomic regions, and analysis typically focuses on estimating the proportion of methylated sites observed in a given region and comparing these levels across samples to determine association with conditions of interest. While sites are typically treated as independent, when observed at the level of individual molecules methylation patterns exhibit strong evidence of local spatial dependence. We previously introduced a neighboring sites model to account for correlation and clustering behavior observed in two tandem repeat regions in a collection of ovarian carcinomas. We now introduce an extension of the model that accounts for the effect of distance between sites. We apply our model to data from a whole genome sequencing experiment using overlapping 300-bp reads, demonstrating its ability to detect distance-weighted effects in regions with intermediate levels of methylation.

Bio: Michelle Lacey earned her PhD in Statistics from Yale University and joined the Tulane faculty in 2003. She is currently appointed as Associate Professor of Mathematics and Adjunct Associate Professor of Biostatistics at Tulane University, and in addition to regularly teaching graduate courses in statistical modeling and data analysis for the School of Science and Engineering she is a contributing lecturer for courses at the Tulane University School of Medicine and the School of Public Health and Tropical Medicine. Dr. Lacey directs the Tulane Cancer Center Genomics Analysis Core to provide statistical support to researchers conducting high-throughput experiments, and she maintains an independent research program in epigenetic modeling and analysis. She also collaborates with researchers in the School of Science and Engineering and has recently established a consulting relationship with the World Food Programme to assist in the development of statistical methods for modeling and analysis of food security survey data.

MTF 168
11/02/2016 1:00 PM - 2:00 PM


A flexible framework for fitting mixed models based on automatic differentiation

Hans J. Skaug, Ph.D.

Professor of Statistics, Department of Mathematics, University of Bergen

Abstract: I will describe a flexible framework for doing empirical Bayes inference in general mixed models. The marginal likelihood is evaluated using the Laplace approximation, and optimized using a Newton-type method. The technical details of the Laplace approximation are hidden from the user via a technique called Automatic Differentiation. The approach has been implemented in the software package TMB. TMB is an R package, but links to C++ code for evaluation of the joint (in fixed and random effects) likelihood. I will discuss how TMB can be used to build mixed model R packages, and give examples of such R packages.
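
To make the Laplace approximation concrete, here is a minimal sketch for a single scalar random effect u. It finds the mode of the joint negative log-likelihood by Newton's method and then applies log ∫ exp(-f(u)) du ≈ -f(û) + ½ log(2π) - ½ log f''(û). This is plain Python and does not use TMB or its API: the derivatives are hand-coded here, standing in for what automatic differentiation would supply, and the toy Gaussian model and all names are assumptions for illustration.

```python
import math


def laplace_log_marginal(f, grad, hess, u0=0.0, tol=1e-10, max_iter=50):
    """Laplace approximation to log of the integral of exp(-f(u)) over u."""
    u = u0
    for _ in range(max_iter):
        step = grad(u) / hess(u)  # Newton step toward the mode of -f
        u -= step
        if abs(step) < tol:
            break
    return -f(u) + 0.5 * math.log(2.0 * math.pi) - 0.5 * math.log(hess(u))


# Toy model (assumed): y | u ~ N(u, 1) and u ~ N(0, tau2).
y, tau2 = 1.3, 2.0


def neg_log_joint(u):
    """Negative joint log-density of (y, u), viewed as a function of u."""
    log_lik = -0.5 * math.log(2.0 * math.pi) - 0.5 * (y - u) ** 2
    log_prior = -0.5 * math.log(2.0 * math.pi * tau2) - 0.5 * u ** 2 / tau2
    return -(log_lik + log_prior)


def grad(u):
    return -(y - u) + u / tau2


def hess(u):
    return 1.0 + 1.0 / tau2


approx = laplace_log_marginal(neg_log_joint, grad, hess)
# For this Gaussian model the marginal is y ~ N(0, 1 + tau2).
exact = -0.5 * math.log(2.0 * math.pi * (1.0 + tau2)) - 0.5 * y ** 2 / (1.0 + tau2)
```

Because the toy integrand is Gaussian in u, the approximation coincides with the exact marginal up to rounding; for non-Gaussian likelihoods it is only approximate.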

Bio: Hans J. Skaug is Professor in statistics at the Department of Mathematics, University of Bergen. He received his PhD (Dr. Scient) in 1994. His field of research is statistical ecology and computational statistics.

MET 307
10/26/2016 1:00 PM - 2:00 PM

Equivariant Functional Analysis of Curves in SO(3) with Applications to Gait Analysis

Fabian Telschow, Ph.D.

University of Goettingen, Germany

Abstract: In gait analysis of the knee joint, the data are curves in the group of 3×3 rotation matrices. We introduce and study S-equivariant functional models (viz., Gaussian perturbations of a center curve) and provide a uniform strongly consistent estimator for the center curve. Here S is a certain Lie group, which models the effect of different marker placements and self-chosen walking speeds in real gait data. Moreover, we provide novel estimators correcting for different marker placements and walking speeds and provide different statistical tools to analyze such data, for example, simultaneous confidence sets and permutation tests. The methods are applied to real gait data from an experiment studying the effect of short kneeling.

Bio: Fabian Telschow received his PhD from the University of Goettingen. His supervisor was Stephan Huckemann. He developed statistical tools for the analysis of biomechanical gait data in cooperation with the biomechanist Michael Pierrynowski from McMaster University, Canada. During his studies for the M.Sc. degree in pure mathematics in Göttingen (specializing in the intersection of algebraic topology and geometry), he worked part-time in the group of Axel Munk as a student research assistant on statistical analysis of 2D-NMR spectroscopy. His current research interests are real-world applications of non-Euclidean statistics, especially when the data are curves.

MTF 168
10/05/2016 1:30 PM - 2:30 PM

A statistical perspective on health behavior research: a tale of multiple exposures and multiple outcomes

Loki Natarajan, Ph.D.

Professor of Biostatistics and Bioinformatics, UCSD Department of Family Medicine and Public Health

Abstract: Leading a healthy lifestyle can positively impact health, and reduce the risk of cancer, cardiovascular disease, and other chronic diseases. Health behaviors include many modifiable factors such as physical activity, diet, sleep, and smoking. This multiple exposure-multiple outcomes aspect of health behavior research calls for novel statistical approaches for study design and data analysis. In this talk we will discuss some of these approaches.
In the first part of the talk, we will present a “biobridge” design for a lifestyle intervention trial. Specifically, we will develop an analytic method to calculate a weighted risk score from several intermediate outcomes, and discuss how to quantify future clinical benefit through intervention-related changes on this risk score. Relative weights for the intermediate outcomes are derived by comparing a disease model conditional on the joint distribution of these outcomes to the corresponding marginal models. We will show analytically and via simulations that using marginal parameters as the weights in the risk score, and ignoring inter-correlations amongst the outcomes, yields biased estimates. Our proposed weighted risk score corrects for these biases. We will apply this method to design a weight-loss intervention trial with multiple biomarker outcomes.
In the second part of the talk, we will discuss the use of Bayesian networks to model multiple health behaviors and outcomes. Bayesian networks are a probabilistic machine learning approach which can be used to model multivariate relationships and represent them via intuitively meaningful graphs. We will apply this method to a sample of 333 overweight post-menopausal breast cancer survivors to model associations between BMI, lifestyle behaviors (alcohol intake, smoking, physical activity, sedentary behavior, sleep quality), psychosocial factors (depression, quality of life), biomarkers (insulin, C-reactive protein), demographics (age, education), and tumor factors. Using these networks, we will quantify the strength of association and infer (conditional) dependencies amongst these variables. Our results demonstrate that Bayesian networks could be a powerful exploratory tool for health behavior research.

MTF 168
09/28/2016 1:00 PM - 2:00 PM

Exact inference on the restricted mean survival time

Lu Tian, Sc.D.

Professor and Vice Chair, UCLA Department of Biostatistics, UCLA Jonathan and Karin Fielding School of Public Health

Abstract: In a randomized clinical trial with the time to event as the primary endpoint, one often evaluates the treatment effect by comparing the survival distributions from two groups. This can be achieved, for example, by estimating the hazard ratio under the popular proportional hazards (PH) model. However, when the hazard rate is very low, e.g., in safety studies, there may be too few observed events to warrant valid asymptotic inference under PH regression. Exact inference, including hypothesis testing and construction of a 95% confidence interval for the treatment effect, is then desirable. In this paper, we have developed an exact inference procedure for estimating the treatment effect based on the difference in restricted mean survival time between two arms, which is more appealing than the hazard ratio in many applications. The proposed procedure is valid regardless of the number of events. We have also performed a simulation study to examine the finite sample performance of the proposed method.
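For readers unfamiliar with the estimand, the restricted mean survival time (RMST) up to a horizon tau is the area under the survival curve on [0, tau]. The sketch below computes only the standard Kaplan-Meier plug-in point estimate and the between-arm difference; it does not implement the exact inference procedure described in the abstract. The function name `km_rmst` and the toy data are assumptions for illustration.

```python
def km_rmst(times, events, tau):
    """Area under the Kaplan-Meier curve from 0 to tau.

    times  : follow-up times
    events : 1 if the event was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, last_t, area = 1.0, 0.0, 0.0
    i = 0
    while i < len(data) and data[i][0] <= tau:
        t = data[i][0]
        deaths = leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        # The curve is flat since the previous time point.
        area += surv * (t - last_t)
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
        n_at_risk -= leaving
        last_t = t
    # Final flat piece up to the restriction time.
    area += surv * (tau - last_t)
    return area


# Hypothetical toy arms (time units arbitrary).
rmst_control = km_rmst([1, 2, 3, 4], [1, 1, 1, 1], tau=4)
rmst_treated = km_rmst([1, 2, 3], [1, 0, 1], tau=3)
rmst_difference = km_rmst([2, 4, 4, 4], [1, 0, 0, 0], tau=4) - rmst_control
```

Unlike the hazard ratio, the difference in RMST is expressed in time units (here, expected event-free time gained up to tau), which is one reason it is often easier to interpret.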

Bio: Dr. Lu Tian received his Sc.D. in Biostatistics from Harvard University. He has considerable experience in statistical methodological research, planning large epidemiological studies, performing data management for randomized clinical trials and conducting applied data analysis. His current research interests are in developing statistical methods in personalized medicine, survival analysis, meta-analysis and high-throughput data analysis.

MTF 168
09/07/2016 1:00 PM - 2:00 PM

How Principles for Analyzing Incomplete Data Motivate Viewing Trust and Understanding as Twin Pillars of Ethics in Statistics

Thomas R. Belin, PhD

Professor and Vice Chair, UCLA Department of Biostatistics, UCLA Jonathan and Karin Fielding School of Public Health

Abstract: Accepting the need to enunciate ethical principles in the field of statistics, how might it be possible to encompass the scope and generality of what we do into a complete yet digestible set of guidelines? Drawing on reflections by leading statisticians about the nature of our work, scientific insights regarding how the human condition induces imperatives for people to communicate with one another, game-theory perspectives on competition and cooperation, and other philosophical discourse on the ethics of interpersonal interactions, it is argued that trust and understanding are essential core principles that can serve as the basis for judging whether a statistical approach is ethical. The framework's simplicity makes it easy to communicate, its generality gives it power, and its positive-sum appeal could be used to promote professional identity development around ethics. The presentation will also consider connections between this framework and principles for analyzing incomplete data, where the dual goals of reflecting all available information and accurately representing uncertainty have parallels to cultivating understanding and cultivating trust. Recent efforts to develop flexible joint-modeling strategies to handle highly multivariate data sets with a broad array of data types will also be discussed.

Bio: Thomas R. Belin, Ph.D. is a Professor in the UCLA Department of Biostatistics with a joint appointment in the UCLA Department of Psychiatry and Biobehavioral Sciences. He started at UCLA in 1991 after receiving his Ph.D. that year from Harvard University, working with Donald Rubin in the Harvard Department of Statistics on incomplete-data problems related to the decennial census in the United States. Specializing in statistical analysis with missing data and related extensions to causal inference, he has supervised over a dozen doctoral dissertations and was recognized in 2015 by the UCLA Public Health Student Association for "Outstanding Advising and Mentorship for Ph.D. and Dr.P.H. Students". He also serves as Vice Chair of the UCLA Department of Biostatistics, and his professional activities include being a member since 2014 of the American Statistical Association Committee on Professional Ethics. He was elected Fellow of the American Statistical Association in 2004, and in 2005 he received the Washington (D.C.) Statistical Society Gertrude M. Cox Award honoring a statistician making "significant contributions to statistical practice."

MTF 168
07/06/2016 1:00 PM - 2:00 PM

Statistical Investigation of Ensemble Kalman Filter

Soojin Roh, PhD

Abstract: Data assimilation is a statistical method to combine the output from numerical models with observations to give an improved forecast. The ensemble Kalman filter is a widely used data assimilation method in diverse areas such as weather forecasting and aerospace tracking. In this talk I will discuss the ensemble Kalman filter and some practical issues. I will then discuss a robust ensemble Kalman filter.
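To make the basic idea concrete, here is a minimal sketch of one analysis step of the standard stochastic (perturbed-observation) ensemble Kalman filter for a scalar state that is observed directly. It is not the robust variant discussed in the talk; the function name `enkf_update` and all numbers are assumptions for illustration.

```python
import math
import random
import statistics


def enkf_update(ensemble, y, obs_var, rng):
    """One EnKF analysis step for a directly observed scalar state.

    ensemble : forecast ensemble members (at least two)
    y        : the observation
    obs_var  : observation error variance
    rng      : random.Random instance used to perturb the observation
    """
    # The forecast error variance is estimated from the ensemble itself.
    p_f = statistics.variance(ensemble)
    gain = p_f / (p_f + obs_var)
    obs_sd = math.sqrt(obs_var)
    # Each member is nudged toward its own perturbed copy of the observation.
    return [x + gain * (y + rng.gauss(0.0, obs_sd) - x) for x in ensemble]


rng = random.Random(0)
forecast = [rng.gauss(10.0, 2.0) for _ in range(200)]
analysis = enkf_update(forecast, y=12.0, obs_var=1.0, rng=rng)

# With a perfect observation (zero error variance) the gain is 1 and
# every member collapses onto the observation.
collapsed = enkf_update(forecast, y=12.0, obs_var=0.0, rng=rng)
```

Because the gain depends on the sample variance of the ensemble, a few outlying members or observations can distort the update, which is the kind of practical issue that motivates robust versions of the filter.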

Bio: Dr. Soojin Roh received her PhD in Statistics from Texas A&M University. She is currently a lecturer in the Department of Statistics at Rutgers University. Her research interests include spatial statistics, data assimilation, and robust estimation.

MTF 168
05/18/2016 2:00 PM - 3:00 PM

Bayesian Semiparametric Latent Variable Model: An Application on Fibroid Tumor Study

Mingan (Mike) Yang, PhD

Abstract: In parametric hierarchical models, it is standard practice to place mean and variance constraints on the latent variable distributions for the sake of identifiability and interpretability. Because incorporation of such constraints is challenging in semiparametric models that allow latent variable distributions to be unknown, previous methods either constrain the median or avoid constraints. In this article, we propose a centered stick-breaking process (CSBP), which induces mean and variance constraints on an unknown distribution in a hierarchical model. This is accomplished by viewing an unconstrained stick-breaking process as a parameter-expanded version of a CSBP. An efficient blocked Gibbs sampler is developed for approximate posterior computation. The methods are illustrated through a simulated example and an epidemiologic application.

Bio: Dr. Mingan Yang is an Assistant Professor of Biostatistics in the Graduate School of Public Health, San Diego State University. Upon graduation, he completed postdoctoral research at Duke University and NIEHS, NIH, under the supervision of Dr. David Dunson. He specializes in Bayesian statistics, computational statistics, survival analysis, latent variable models, variable selection, and mixed effects models. His methodological research focuses on problems arising in health and medicine. His results have been published in statistical journals such as Biometrics, Psychometrika, Computational Statistics & Data Analysis, and Biometrical Journal.

MTF 168
04/06/2016 1:00 PM - 2:00 PM

Correlation and Mixture in High Dimensional Data: Should the Empirical Distribution Look Normal?

Armin Schwartzman, PhD

Abstract: Large scale multiple testing problems, such as in brain imaging and genomics, base their inference on a large number of z-scores. If most effects are null, it seems natural that the empirical distribution of z-scores should follow a standard normal distribution. But should it? In this talk Dr. Schwartzman will show two ways in which the empirical distribution of z-scores can be deceiving, because of correlation and mixture. First, following Efron’s (2007) conjecture, Dr. Schwartzman shows that even if the z-scores are standard normal, the empirical distribution may depart from it, due to strong correlation caused by hidden random effects. Instead, it may be approximated by a Gaussian mixture that generalizes Efron’s empirical null distribution. Second, Dr. Schwartzman shows that if the original data is a Gaussian mixture, then within-class standardization using a template-based EM algorithm produces z-scores whose empirical distribution looks standard normal. However, their true distribution has in fact lighter tails.

MTF 168
03/16/2016 2:00 PM - 3:00 PM

Fence Methods for Genetic Application

Thuan Nguyen, PhD

Abstract: Model search strategies play an important role in simultaneously finding susceptibility genes that are associated with a trait. More particularly, model selection via information criteria, such as the BIC with modifications, has received considerable attention in quantitative trait loci (QTL) mapping. However, such modifications often depend upon several factors, such as sample size, prior distribution, and the type of experiment, e.g., backcross or intercross. These changes make it difficult to generalize the methods to all cases. The fence method avoids such limitations with a unified approach, and hence can be used more broadly. In this talk, the method is studied in the case of backcross experiments (BE). In particular, a variation of the fence, called restricted fence (RF), is applied to BE, and its performance is evaluated and compared with the existing methods. Furthermore, we incorporate our recently developed strategy for model selection with incomplete data, known as the E-MS algorithm, with the RF to address the common missing value concerns in BE. Our study reveals some interesting findings in association with the missing data mechanisms. The proposed method is illustrated with a real data analysis involving QTL mapping for an agricultural study on barley grains.

MTF 168
03/03/2016 1:00 PM - 2:00 PM

Functional Response Models: A Unified Paradigm for Between- and Within-subject Attributes

Xin Tu, PhD

Abstract: Modern statistical methods provide a powerful tool to address complex statistical issues arising in clinical and translational research. However, the predominant statistical paradigm is only applicable to modeling relationships defined by within-subject attributes such as alcohol use and suicide from the same subject. Many relationships of interest in the age of the internet and mobile technology involve variables measuring between-subject attributes such as human interaction and such attributes are not amenable to treatment by conventional statistical models. In this talk, I will discuss a class of functional response models (FRM) to address this fundamental limitation in the current statistical paradigm. The between-subject attribute is not a concept unique to timely issues such as modeling human interaction in social networks, but is actually a fundamental barrier to understanding many classic statistical methods in order to extend them to address their limitations when applied to cutting-edge statistical problems in clinical and translational research. I will illustrate the FRM using a wide range of topics with both real and simulated data.

MTF 168
02/25/2016 1:00 PM - 2:00 PM

Statistical Challenges of Using Electronic Data and Existing Research Infrastructure for CER

Mi-Ok Kim, PhD

Abstract: Developing the health information technology infrastructure to support comparative effectiveness research (CER) was a core objective of the American Recovery and Reinvestment Act of 2009. Many research networks, each including between 11,000 and 7.5 million patients and more than 18 million in total, have been established, and numerous CER studies have been conducted. As compared to randomized clinical trials, these studies are less resource demanding and quickly collect data that are more representative of routine clinical care in large cohorts of patients over a long period of follow-up. Their utility, however, is restricted by the fact that treatment choice is affected by known or unknown prognostic factors, and consequently treatment groups are not directly comparable. This situation, known as confounding by indication for treatment, may render observational studies invalid and irrelevant unless properly addressed. Proper treatment of confounding is further complicated in data obtained from registries, network databases or the Electronic Health Record (EHR), where subjects or patients are commonly clustered in ways that may be relevant to the analysis. We will extend propensity score (PS) methodology and related sensitivity analysis to address measured and unmeasured confounding in the clustered data with the following aims:

Aim 1: Investigate how to optimally extend the PS methodology and identify what works best when

Aim 2: Develop a novel sensitivity analysis approach

Aim 3: Identify valid and most efficient PS methods for two existing CER studies.

We will use Monte Carlo computer simulation studies and real data including two existing CER studies. The real data examples will provide clinically plausible and interesting hierarchical data contexts and inform the design of the computer simulation studies about various types of outcomes that comprehend typical features of patient reported outcomes (PROs).

MTF 168
02/04/2016 1:30 PM - 2:30 PM

Statistical Learning, Inference and Models for Big Data

Nancy Reid, PhD

University Professor of Statistical Sciences; Canada Research Chair in Statistical Theory and Applications; Director, Canadian Statistical Sciences Institute; Department of Statistical Sciences, University of Toronto

Biography: Dr. Nancy Reid is University Professor and Canada Research Chair in Statistical Methodology at the University of Toronto. Her research interests are in statistical theory, likelihood inference, and design of studies. Along with her colleagues she has developed higher order asymptotic methods both for use in applications, and as a means to study theoretical aspects of the foundations of inference, including the interface between Bayesian and frequentist methods. She is the Director of the Canadian Statistical Sciences Institute.

Dr. Reid received her PhD from Stanford University, under the supervision of Rupert Miller. She taught at the University of British Columbia before moving to the University of Toronto, and has held visiting positions at the Harvard School of Public Health, University of Texas at Austin, Ecole Polytechnique Federale de Lausanne, and University College London.

She has been President of the Institute of Mathematical Statistics and the Statistical Society of Canada, and Vice-President of the International Statistical Institute. She is a Fellow of the American Association for the Advancement of Science, the Royal Society of Canada and the Royal Society of Edinburgh. In December 2014 she was appointed Officer of the Order of Canada.

Abstract: The Canadian Statistical Sciences Institute and the Fields Institute for Research in the Mathematical Sciences recently completed a six-month thematic research program with this title. I will give an overview of the topics covered with emphasis on linkages between different areas, common problems, and common strategies. While the program was only able to cover a small fraction of the world of “Big Data”, the breadth of the material covered by the large number of speakers was very stimulating.

MET 141
12/02/2015 1:00 PM - 2:00 PM

Identifying treatment effect heterogeneity using propensity score based quantile regression

Matthew Cefalu, PhD

Associate Statistician, RAND Corporation

Biography: Dr. Matthew Cefalu is an Associate Statistician at the RAND Corporation, where his research is primarily focused on the development and application of novel methods for causal inference. Examples of past and present research projects include the Health-Related Behaviors Survey of Military Personnel, an independent assessment of the VA healthcare system, and the CAHPS Hospital Survey. Dr. Cefalu received his PhD in Biostatistics from Harvard University in 2013.

Abstract: There is a vast literature on estimating causal effects from observational data, and the majority of these methods focus on estimating marginal treatment effects (i.e. treatment effects in the entire population). However, it is often of interest to identify subpopulations for whom the treatment is most effective. We will use locally weighted quantile regression, where locality is based on the propensity score, to identify if treatment effect heterogeneity is present. This method will be illustrated using data from a study assessing the efficacy of Motivational Enhancement Therapy-Cognitive Behavioral Therapy 5 in treating adolescents with cannabis-related disorders.

MTF 168
11/04/2015 1:00 PM - 2:00 PM

REVEALER: Mapping Genomic Alterations to Functional Profiles of Pathway Activation, Gene Dependency and Drug Sensitivity

Pablo Tamayo, PhD

Professor, Division of Medical Genetics, Department of Medicine, UC San Diego Medical School, Moores Cancer Center at UC San Diego Health

Biography: Dr. Pablo Tamayo is a Cancer Researcher at UC San Diego Moores Cancer Center and a Professor at the UCSD School of Medicine. Prior to UCSD he worked as a senior computational biologist at the Broad Institute of MIT and Harvard, as a consulting member of staff for the Advanced Analytics group at Oracle Corp., as senior researcher and chief scientist of Thinking Machines Corp., at the Theoretical Division (T-8) of the Los Alamos National Laboratory and as a research assistant at Boston University. He obtained a Ph.D. in Statistical Physics and a B.S. in Physics Engineering. During the last two decades he has worked on the study of cancer pathways, models of oncogene activation, models of pharmacological response, discovery of disease subtypes and integrated models to delineate and characterize cellular cancer states. He has been an original contributor to the development of many genomic data analysis methods including Gene Set Enrichment Analysis (GSEA), the Molecular Signatures Database (MSigDB) and the GenePattern Analysis Environment. His most recent work has focused on the development of experimental and computational models of oncogenic transformation, cancer vulnerabilities and catalogs of oncogenic states. He has also worked on an information-theoretic approach to find associations and co-analyze diverse types of cancer data with different statistical properties. He has published over 130 articles with over 35,000 citations. His publication list can be found in:

Abstract: Systematic efforts to sequence the cancer genome have identified many of the recurrent mutations and copy number alterations in tumors. However, in many cases the role(s) played by these alterations is not obvious and necessitates an effective functional characterization of the pathways and networks that these genomic alterations regulate. Here we introduce REVEALER (Repeated Evaluation of VariablEs conditionAL Entropy and Redundancy), an analysis method that enables the discovery of an ensemble of mutually exclusive genomic alterations correlated with “functional” phenotypes, e.g., the activation or dependency of oncogenic pathways. We use REVEALER to identify complementary genomic alterations that account for a large fraction of the “activated” or “dependent” samples with respect to four targets: the transcriptional activation of β-catenin and NRF2, MEK-inhibitor sensitivity, and KRAS dependency. REVEALER was able to “re-discover” several known features, as well as identify a number of novel associations, demonstrating the power of using information-theoretic association metrics to combine functional profiles with extensive characterization of alterations in cancer genomes.

MTF 168
10/07/2015 1:00 PM - 2:00 PM

Exome Sequencing and Analysis of Phenotypic Extremes to Identify Genetic Modifiers of Iron Status in Hemochromatosis HFE C282Y Homozygotes

Christine McLaren, PhD

Professor, Epidemiology, School of Medicine; Vice Chair for Academic Affairs, Epidemiology; Scientific Member, Genetic Epidemiology Research Institute; Director of Biostatistics, Chao Family Comprehensive Cancer Center, University of California, Irvine

Biography: Dr. Christine McLaren is Professor and Vice Chair of the Department of Epidemiology in the School of Medicine at the University of California, Irvine. Dr. McLaren is also co-Leader of the Program in Cancer Prevention, Outcomes, and Survivorship and a member of the Biostatistics Shared Resource of the Chao Family Comprehensive Cancer Center at UC Irvine. Dr. McLaren has focused on statistical modeling research and has concentrated on two important areas: (1) statistical modeling of biomedical data and (2) collaborative research in cancer. She is an elected Fellow of the American Statistical Association, in part for “innovative research in biology and medicine”. Dr. McLaren is Principal Investigator of the NIH/NIDDK R24 grant, “Genetic Modifiers of Iron Status in Hemochromatosis HFE C282Y Homozygotes”.

Abstract: Approximately one million people in the United States are at risk for development of iron overload, attributable primarily to the genetic disorder known as hemochromatosis. In the NIH-funded Hemochromatosis and Iron Overload Screening (HEIRS) Study, 101,168 multi-ethnic participants in primary care were screened for iron overload and hemochromatosis. Dr. McLaren will describe her role as PI of a Field Center for the HEIRS Study and her contributions to study design and analyses. Her team enrolled over 20,000 primary-care patients in UC Irvine primary-care clinics and in community clinics throughout Orange County. She will also describe subsequent statistical studies designed to answer the question “What role do genetic modifiers play in determining iron accumulation in persons homozygous for the HFE C282Y genotype?”

MET 223
09/02/2015 1:00 PM - 2:00 PM

Case series analysis of infection-cardiovascular risk in patients on dialysis with exposure onset measurement error

Danh V. Nguyen, PhD

Professor, Medicine (Biostatistics), Department of Medicine, Division of General Internal Medicine, Director, Biostatistics, Epidemiology & Research Design Unit, UCI Institute for Clinical and Translational Science, University of California, Irvine

Biography: Danh Nguyen, PhD, is Professor in the Department of Medicine, Division of General Internal Medicine and Director of the Biostatistics, Epidemiology & Research Design (BERD) Unit in the Institute for Clinical and Translational Science, University of California Irvine. Prior to joining UC Irvine in 2013, he was Professor in the Division of Biostatistics, Department of Public Health Sciences, at UC Davis from 2003

Abstract: Cardiovascular disease and infection are major factors for morbidity and mortality in patients on dialysis. Hospitalization data from the United States Renal Data System (USRDS) capture nearly all (> 95%) patients with end-stage renal disease in the U.S., making it the largest source of research data available for this population. Although the precise mechanisms by which infection may affect cardiovascular events are not fully known, infections may affect vascular endothelium, create a chronic sub-clinical inflammatory state that affects atherosclerosis, or may create a procoagulant state. Thus, we hypothesize that the time periods following infection are associated with increased cardiovascular event risk. The self-controlled case series (or simply case series) design and analysis of infection-cardiovascular risk in patients on dialysis using USRDS data present several unique challenges, including (1) error in the timing of infection (or exposure) onset, since the time of infection is not known precisely, (2) misspecification of the risk period, and (3) other inferential challenges, such as formal hypothesis testing. In this talk I will discuss current resolutions/developments for some of these challenges related to case series analysis and open topics in other areas of applications.

MTF 168

07/01/2015 1:00 PM - 2:00 PM

Identifying Longitudinal Trends within EEG Experiments

Damla Senturk, PhD

Associate Professor, Department of Biostatistics, School of Public Health, University of California, Los Angeles

Abstract: Differential brain response to sensory stimuli is very small (a few microvolts) compared to the overall magnitude of the spontaneous electroencephalogram (EEG), yielding a low signal-to-noise ratio (SNR) in studies of event-related potentials (ERP). To cope with this phenomenon, stimuli are applied repeatedly and the ERP signals arising from the individual trials are averaged at the subject level. This results in loss of information about potentially important changes in the magnitude and form of ERP signals over the course of the experiment. In this paper, we develop a meta-preprocessing step utilizing a moving average of ERP across sliding trial windows, to capture such longitudinal trends. We embed this procedure in a weighted linear mixed effects model to describe longitudinal trends in features such as ERP peak amplitude and latency across trials while adjusting for the inherent heteroskedasticity created at the meta-preprocessing step. The proposed unified framework, including the meta-preprocessing and the weighted linear mixed effects modeling steps, is referred to as MAP-ERP (Moving-Averaged-Processed ERP). We perform simulation studies to assess the performance of MAP-ERP in reconstructing existing longitudinal trends and apply MAP-ERP to data from young children with autism spectrum disorder (ASD) and their typically developing counterparts to examine differences in patterns of implicit learning, providing novel insights about the mechanisms underlying social and/or cognitive deficits in this disorder.
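The moving-average step described in the abstract can be pictured with a small sketch: instead of averaging all trials into one waveform, trials are averaged within sliding windows, and a feature such as peak amplitude is extracted per window. This is only an illustration under simplifying assumptions, not the authors' MAP-ERP implementation; the function names, the window size, and the toy waveforms are hypothetical, and the weighted mixed-effects modeling step is not shown.

```python
def sliding_trial_average(trials, window, step=1):
    """Average ERP waveforms over sliding windows of consecutive trials.

    trials : list of waveforms (each a list of samples of equal length)
    window : number of consecutive trials averaged together
    step   : how many trials the window advances each time
    """
    if window < 1 or window > len(trials):
        raise ValueError("window must be between 1 and the number of trials")
    n_samples = len(trials[0])
    averaged = []
    for start in range(0, len(trials) - window + 1, step):
        block = trials[start:start + window]
        averaged.append(
            [sum(trial[k] for trial in block) / window for k in range(n_samples)]
        )
    return averaged


def peak_amplitude(waveform):
    """Largest value of the waveform (a simple stand-in for an ERP peak)."""
    return max(waveform)


# Hypothetical toy data: the peak grows over the course of the experiment.
trials = [
    [0.0, 1.0, 0.0],
    [0.0, 3.0, 0.0],
    [0.0, 5.0, 0.0],
    [0.0, 7.0, 0.0],
]
windows = sliding_trial_average(trials, window=2)
peaks = [peak_amplitude(w) for w in windows]
```

Averaging all four toy trials at once would give a single peak of 4.0 and hide the upward trend that the windowed peaks retain. Overlapping windows share trials, so the windowed features are correlated, which is one reason a plain unweighted analysis of them would be inappropriate.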

Biography: Dr. Damla Senturk received her Ph.D. degree in Statistics from UC Davis in 2004 and joined the faculty in the Department of Statistics at Pennsylvania State University. She joined the faculty of the UCLA Department of Biostatistics in 2011 where she has been an Associate Professor in Residence since July 1st of 2013. Her main areas of statistical methodology research are longitudinal and functional data analysis, semiparametric adjustments in regression modeling and measurement error models. Her main collaborative research areas include psychiatry and nephrology.

MTF 168

06/03/2015 1:00 PM - 2:00 PM

Social Networks and Health: From Observation to Experimentation to Intervention

James Fowler, PhD

Professor, Medical Genetics Division, Department of Medicine; Political Science Department, Division of Social Sciences; Dept. of Family Medicine & Public Health; University of California, San Diego

Abstract: From Framingham to Facebook, we have used a variety of social networks to measure, analyze, and change the effect of social networks on health. In this talk I will discuss a number of papers using different methods to better understand how networks function and what we can do to use them to make people healthier.

Biography: Dr. James Fowler earned a PhD from Harvard in 2003 and is currently a Professor at the University of California, San Diego. His work lies at the intersection of the natural and social sciences, with a focus on social networks, behavior, evolution, politics, genetics, and big data. Dr. Fowler was named a Fellow of the John Simon Guggenheim Foundation, one of Foreign Policy's Top 100 Global Thinkers, TechCrunch's Top 20 Most Innovative People, Politico's 50 Key Thinkers, Doers, and Dreamers, and Most Original Thinker of the year by The McLaughlin Group. He has also appeared on The Colbert Report. His research has been featured in numerous best-of lists including New York Times Magazine's Year in Ideas, Time's Year in Medicine, Discover Magazine's Year in Science, and Harvard Business Review's Breakthrough Business Ideas. Together with Nicholas Christakis, James wrote a book on social networks for a general audience called Connected. Winner of a Books for a Better Life Award, it has been translated into twenty languages, named an Editor's Choice by the New York Times Book Review, and featured in Wired, Oprah's Reading Guide, Business Week's Best Books of the Year, and a cover story in New York Times Magazine.

MTF 168

05/06/2015 1:00 PM - 2:00 PM

Analysis of Longitudinal Data Under Biased Sampling

Yong Chen, PhD

Assistant Professor, Division of Biostatistics, University of Texas School of Public Health

Abstract: Over the past few decades, a dramatic increase in the incidence of obesity has become a worldwide health issue, contributing significantly as a risk factor of many diseases. Many individuals participate in web-based weight loss programs where their weights, physical activities and diets are self-reported. Data generated by such web-based programs pose new challenges to statistical modeling and inference, including subject-specific self-reporting times and outcome-dependent missingness. These challenges are known as the biased sampling problem in the statistical literature, and can lead to substantial bias in inference. In this talk, we propose a framework of novel statistical methods to efficiently detect and adjust for sampling bias, and to evaluate both the overall effectiveness of the weight loss program and the subject-specific effects of website usage on weight loss. The proposed methods provide elegant solutions for detecting and eliminating the impacts of biased sampling, and can achieve unbiased inference without fully specifying the complex data-generating mechanism. We apply the proposed methods to evaluate the effectiveness of a web-based program on weight loss, controlling for the nonlinear trajectory of weights over time.

MTF 168

04/15/2015 1:00 PM - 2:00 PM

Robust mixed-effects model for clustered failure time data: application to Huntington's disease event measures

Tanya P. Garcia, PhD

Assistant Professor, Texas A&M University, School of Public Health

Biography: Tanya is an Assistant Professor in the Department of Epidemiology and Biostatistics at Texas A&M University, Health Science Center, School of Public Health. Previously, she worked in the Bioinformatics Training Program at Texas A&M University. She received a Ph.D. in Statistics from Texas A&M University in 2011 under the advisement of Prof. Yanyuan Ma. She earned a B.S. in Mathematics from the University of California, Irvine in 2003, an M.S. in Industrial Engineering and Operations Research from the University of California, Berkeley in 2005, and an M.S. in Statistics from the University of Western Ontario in 2006. Her research interests include genetic mixture models, high-dimensional inference, measurement error, mixed models, neurodegenerative diseases, nonparametric models, semiparametric theory, and survival analysis.

Abstract: An important goal in clinical and statistical research is estimating the distribution for clustered failure times, which have a natural intra-class dependency and are subject to censoring. We propose to handle these inherent challenges with a novel approach that does not impose restrictive modeling or distributional assumptions. Rather, using a logit transformation, we relate the distribution for clustered failure times to covariates and a random, subject-specific effect such that the covariates are modeled with unknown functional forms, and the random effect is distribution-free and potentially correlated with the covariates. Over a range of time points, the model is shown to be reminiscent of an additive logistic mixed effect model. Such a structure allows us to handle censoring via pseudo-value regression and develop semiparametric techniques that completely factor out the unknown random effect. We show both theoretically and empirically that the resulting estimator is consistent for any choice of random effect distributions and for any dependency structure between the random effect and covariates. Lastly, we illustrate the method's utility in an application to the Cooperative Huntington's Observational Research Trial data, where our method provides new insights into differences between motor and cognitive impairment event times in genetically predisposed Huntington patients.

MTF 168

04/06/2015 1:00 PM - 2:00 PM

Population Genetics, Biostatistics and Bioinformatics Issues in Individualized Medicine

Nicholas J. Schork, PhD

Adjunct Professor of Psychiatry and Biostatistics, University of California, San Diego

Biography: Nicholas J. Schork is a Professor and Director of Human Biology at the J. Craig Venter Institute (JCVI) and the Head of Integrated Genomics at Human Longevity, Inc. (HLI). He is also an adjunct Professor of Psychiatry and Family and Preventive Medicine (Division of Biostatistics) at the University of California, San Diego (UCSD). Prior to joining JCVI, Dr. Schork was, from 2007-2013, a Professor, Molecular and Experimental Medicine, at The Scripps Research Institute (TSRI), Director of Biostatistics and Bioinformatics at the Scripps Translational Science Institute (STSI), and Director of Research at Scripps Genomic Medicine, a division of Scripps Health. From 2001-2007 Dr. Schork was a Professor of Biostatistics and Psychiatry, and Co-Director of the Center for Human Genetics and Genomics, at UCSD. From 1994-2000, he was an Associate Professor of Epidemiology and Biostatistics at Case Western Reserve University in Cleveland, Ohio, and an Adjunct Associate Professor of Biostatistics at Harvard University. During 1999 and 2000, Dr. Schork took a sponsored leave of absence from CWRU to conduct research as the Vice President of Statistical Genomics at the French biotechnology company, Genset, where he helped guide efforts to construct the first high-density map of the human genome.

Dr. Schork’s interests and expertise are in quantitative human genetics and integrated approaches to complex biological and medical problems, especially the design and implementation of methodologies to dissect the determinants of complex traits and diseases. He has published over 450 scientific articles and book chapters on the analysis of complex, multifactorial traits and diseases. A member of several scientific journal editorial boards, Dr. Schork is a frequent participant in U.S. National Institutes of Health-related steering committees and review boards, and has founded or served on the advisory boards of ten companies. In addition, he is currently director of the quantitative components of a number of national research consortia, including the NIA-sponsored Longevity Consortium and the NIMH-sponsored Bipolar Consortium. Dr. Schork earned the B.A. in Philosophy, M.A. in Philosophy, M.A. in Statistics, and Ph.D. in Epidemiology, all from the University of Michigan in Ann Arbor.

Abstract: There is a great deal of attention surrounding ‘individualized,’ ‘personalized,’ and/or ‘precision’ medicine. Much of this attention has been motivated by technological advances in genetic and related molecular assays that have provided researchers with an unprecedented ability to identify and characterize the potentially unique determinants of an individual’s disease susceptibility. However, as promising as these technologies are, their routine use in clinical settings will be hampered until they are appropriately vetted. In this talk, a number of studies are described that consider the use of genomic profiling to further efforts in individualized medicine. Focus is on the very thorny issues these studies have been designed to address, including dealing with patient genetic background heterogeneity, matching drugs to tumor genomic profiles in real-time clinical trial settings, exploring the utility of therapeutic interventions thought to be appropriate for an individual patient based on genomic profiling and monitoring genetically susceptible individuals. There is no doubt that individualized medicine will have a positive impact on health care, but only after some of the challenges it brings have been exposed and dealt with appropriately.

MET 141

03/04/2015 1:00 PM - 2:00 PM

Statistical Issues in the Analysis of Data from RNA-Seq Experiments

David Rocke, PhD

Division of Biostatistics, Department of Public Health Sciences, UC Davis

Abstract: RNA-Seq data are increasingly used for whole-genome differential mRNA expression analysis in lieu of gene expression arrays such as those from Affymetrix and Illumina. We review commonly used methods for this type of analysis, including DESeq, edgeR, and Cuffdiff2, by placing them within a common framework that allows comparisons of components of the methods as well as of the overall results. We also review a number of recent studies comparing these methods in terms of false positives and sensitivity, and add additional results of our own. We show that none of the existing methods is fully satisfactory, with most identifying large numbers of genes as differentially expressed even when there are none, but some will lead to better, more reliable results than others. This area is still early in its intellectual development and is changing rapidly, so there are substantial contributions that can be made.

MET 141

02/04/2015 1:00 PM - 2:00 PM

A multistate model for time to cancer recurrence and death incorporating a cured fraction

Jeremy Taylor, PhD

Department of Biostatistics, University of Michigan

Biography: Jeremy M G Taylor PhD is the Pharmacia Professor of Biostatistics at the University of Michigan. He obtained a Bachelor’s degree in Mathematics and a Diploma in Statistics from Cambridge University and a PhD in Statistics from University of California Berkeley. He was a faculty member in the Department of Biostatistics and the Department of Radiation Oncology at UCLA from 1983 to 1998. He is currently a faculty member in the Department of Biostatistics, the Department of Radiation Oncology and the Department of Computational Medicine and Bioinformatics and the Director of the Center for Cancer Biostatistics at the University of Michigan. He is the winner of the Michael Fry award from the Radiation Research Society and the Mortimer Spiegelman award from the American Public Health Association. He is a former Chair of the Biometrics section of the American Statistical Association and a Fellow of the ASA. He is the former chair of the Biostatistical Methods and Research Design grant review committee for the National Institutes of Health. He is currently one of the coordinating editors of Biometrics. He has 300 publications and research interests in longitudinal and survival data, cure models, methods for missing data, biomarkers, surrogate and auxiliary variables. He has worked extensively in AIDS research but currently mainly focuses on cancer research.

Abstract: Motivated by data from multiple randomized trials of colon cancer, we model time-to-cancer-recurrence and time-to-death using a multi-state model. We incorporate a latent cured state into the model to allow for subjects who will never recur. Parametric models that assume Weibull hazards and include baseline covariates are used. Information from the multiple trials is included using a hierarchical model. Bayesian estimation methods are used. The model is used to assess whether there is improved efficiency in the analysis of the effect of treatment on time-to-death in each trial by using the information provided by earlier cancer recurrence. For subjects who are censored for death, multiple imputation is used to impute death times, where the imputation distribution is derived from the estimated model. Gains in efficiency are possible, although sometimes modest, using the extra information provided by the recurrence time.

MET 204

1/30/2015 1:00 PM - 2:00 PM


Giovanni Motta, PhD

Assistant Professor, Department of Statistics, Columbia University

Abstract: Epilepsy patients who are not able to adequately control their seizures with medications are sometimes treated with a neurosurgical procedure. The goal of this procedure is to remove the abnormal “epileptic” tissue causing seizures, and spare the normal tissue that is critical for brain function. However, current brain mapping technology has limited accuracy for mapping epileptic and normal brain tissue. This is especially problematic in the treatment of patients whose seizures arise from neocortex. To address these problems, we have been developing an experimental optical brain imaging technique for spatially mapping epileptic and normal cortical tissue. Better methods for the statistical analysis of the spatiotemporal optical imaging data are necessary for further development of this technique into a practical and reliable clinical tool.

In this paper we introduce a novel flexible tool, based on spatiotemporal statistical modeling of Optical Imaging, that allows for source localization of the epilepsy regions. The final goal is clustering (dimension reduction) of the pixels in regions, in order to localize the epilepsy regions for the craniectomy. We identify the spatial clusters of the pixels according to the temporal non-stationarity of the observed time series – rather than using spatial information. In a second step, we use non-parametric bootstrap and non-parametric density estimation to obtain the probabilities that a given pixel belongs to each of the clustered regions on the neocortex.

The advantage of our approach compared with previous approaches is twofold. Firstly, we use a non-parametric approach, rather than the (more restrictive) parametric or polynomial-based specification. Secondly, we provide a statistical method that is able to identify the clusters in a data-driven way, rather than the (sometimes arbitrary) ad hoc approaches currently in use.

To demonstrate how our method might be used for intra-operative neurosurgical mapping, we provide an application of the technique to optical data acquired from a single human subject during direct electrical stimulation of the cortex.

MTF 168

12/3/2014 1:00 PM - 2:00 PM

A statistical approach to detecting patterns in behavioral event sequences

Hal S. Stern, PhD

Professor of Statistics and Ted and Janice Smith Family Foundation Dean

Biography: Hal Stern is professor of statistics and dean of the Donald Bren School of Information and Computer Sciences at the University of California, Irvine.  Stern came to UC Irvine in 2002 as the founding chair of the Department of Statistics.  The Department now has 9 faculty and more than 40 graduate students in its MS/PhD programs.  In 2010 he was named Ted and Janice Smith Family Foundation Dean of the Bren School.  Prior to coming to UC Irvine he had faculty appointments at Iowa State and Harvard.

Within statistics he is known for his research work in Bayesian statistical methodology and model assessment techniques. He is a co-author of the highly regarded graduate-level statistics text Bayesian Data Analysis. Current areas of interest include applications of statistical methods in psychiatry and human behavior, atmospheric sciences, and forensic science. He is a Fellow of the American Statistical Association and the Institute of Mathematical Statistics and has served on several expert committees for the National Academies. Stern received his B.S. degree in Mathematics from the Massachusetts Institute of Technology in 1981 and the M.S. and Ph.D. degrees in Statistics from Stanford University in 1985 and 1987, respectively.

Abstract: The identification of recurring patterns within a sequence of events is an important task in behavioral research.  We consider a general probabilistic framework for identifying patterns by distinguishing between events that belong to a pattern and events that occur as part of background processes. Using this framework we develop an inference procedure to detect sequences present in observed data and estimate the parameters governing these sequences. The model is applied to data from a study of the impact of fragmented and unpredictable maternal behavior on cognitive development of adolescents.

MTF 168

11/5/2014 1:00 PM - 2:00 PM

Tree derived Survival risk groups in differentiating care for glioma patients

Annette Molinaro, MA, PhD

Associate Professor in Residence, Department of Epidemiology and Biostatistics, Department of Neurological Surgery, Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco

Abstract: We recently developed partDSA, a multivariate method that, similarly to CART, utilizes loss functions to select and partition predictor variables to build a tree-like regression model for a given outcome. However, unlike CART, partDSA permits both 'and' and 'or' conjunctions of predictors, elucidating interactions between variables as well as their independent contributions. partDSA thus permits tremendous flexibility in the construction of predictive models and has been shown to supersede CART in both prediction accuracy and stability. As the resulting models continue to take the form of a decision tree, partDSA also provides an ideal foundation for developing a clinician-friendly tool for accurate risk prediction and stratification.

With right-censored outcomes, partDSA currently builds estimators via either the Inverse Probability Censoring Weighted (IPCW) or Brier Score weighting schemes; see Lostritto, Strawderman and Molinaro (2012), where it is shown in numerous simulations that both proposed adaptations for partDSA perform as well as, and often considerably better than, two competing tree-based methods. In this talk, various useful extensions of partDSA for right-censored outcomes are described and we show the power of the partDSA algorithm in deriving survival risk groups for glioma patients based on genomic markers.

MTF 168

10/1/2014 1:00 PM - 2:00 PM

The effect of regional deprivation on mortality avoiding compositional bias: A natural experiment

Ursula Berger, PhD

Department for Medical Informatics, Biostatistics and Epidemiology (IBE), Ludwig-Maximilians-University Munich

Abstract: We assess the effect of regional deprivation on individual mortality by making use of a natural experiment: We followed up ethnic German resettlers from Former Soviet Union countries, who were quasi-randomly distributed across the socioeconomically heterogeneous counties of Germany’s federal state North Rhine-Westphalia (NRW). This allows us to disentangle the contextual effect from compositional effects. We use data from the retrospective cohort study ‘AMOR’ on the mortality of resettlers in NRW (n=34 393). Based on the postcode of the last known residence we could link study participants to the municipalities of NRW. After a mean follow-up of 10 years, 2580 resettlers were deceased. When analyzing regional deprivation using additive survival models, we explore the gain from more precise data on deprivation and from smaller regional entities. Our findings indicate that in terms of mortality, regional deprivation does matter.

MTF 168

9/2/2014 2:00 PM - 3:00 PM


Hernando Ombao, Ph.D.

Professor, Department of Statistics, University of California, Irvine

Biography: Dr Ombao's research interests include:

  1. Time Series Analysis
  2. Spatio-temporal modelling
  3. Statistical Learning
  4. Applications to Brain Science (fMRI, EEG, MEG, EROS)

MTF 168

6/4/2014 1:00 - 2:00 PM

A Hierarchical Model for Simultaneous Detection and Estimation in Multi-subject fMRI Studies

David Degras, Ph.D.

Assistant Professor, Statistics, Department of Mathematical Sciences, DePaul University College of Science and Health

Abstract: In this paper we introduce a new hierarchical model for the simultaneous detection of brain activation and estimation of the shape of the hemodynamic response in multi-subject fMRI studies. The proposed approach circumvents a major stumbling block in standard multi-subject fMRI data analysis, in that it allows the shape of the hemodynamic response function to vary across regions and subjects while still providing a straightforward way to estimate population-level activation. An efficient estimation algorithm is presented, as is an inferential framework that not only allows for tests of activation, but also for tests for deviations from some canonical shape. The model is validated through simulations and application to a multi-subject fMRI study of thermal pain.


5/23/2014 1:00 - 2:00 PM


Babak Shahbaba, Ph.D.

Assistant Professor, Department of Statistics and Department of Computer Science, University of California, Irvine

Biography: Dr Shahbaba's research interest is related to developing new Bayesian methods and applying them to real-world problems. He is currently focusing on the following areas:

  1. Scalable Bayesian inference (fast MCMC methods that can be applied to large datasets)
  2. Developing new models that are sufficiently flexible and provide interpretable results
  3. Incorporating appropriate priors into statistical models in order to improve their performance
  4. Applying novel statistical methods to answer research questions in genetics, neuroscience, and cancer studies

MTF 168

5/7/2014 1:00 - 2:00 PM

Successive normalization/standardization of rectangular arrays

Richard Olshen, Ph.D.

Professor and Chief, Division of Biostatistics, Department of Health Research and Policy, Stanford University School of Medicine

When each subject in a study provides a vector of numbers/features for analysis, and one wants to standardize, then for each coordinate of the resulting rectangular array one may subtract the mean by subject and divide by the standard deviation by subject. Each feature then has mean 0 and standard deviation 1. Data from expression arrays and protein arrays often come as such rectangular arrays, where typically column denotes “subject” and the other some measure of “gene.” When analyzing these data one may ask that subjects and genes “be on the same footing.” Thus, there may be a need to standardize across rows and columns of the matrix. We investigate the convergence of a successive approach to standardization, which we learned from colleague Bradley Efron. Limit matrices exist on a Borel set of full measure; these limits have row and column means 0, row and column standard deviations 1. We study implementation on simulated data and data that arose in cardiology. The procedure can be shown not to work with simultaneous standardization. Results make contact with previous work on large deviations of Lipschitz functions of Gaussian vectors and with von Neumann’s algorithm for the distance between two closed, convex subsets of a Hilbert space. New insights regarding inference are enabled. Efforts are joint with colleague Bala Rajaratnam and have been helped by conversations with many others.
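The successive approach described in this abstract can be illustrated with a minimal sketch. It is not the speaker's implementation: the function names, the use of the population standard deviation, the stopping rule, and the simulated 6 x 8 Gaussian array are all assumptions made for illustration. Each pass standardizes every row and then every column, and the passes repeat until the array stops changing.

```python
import random
from statistics import fmean, pstdev


def standardize_rows(matrix):
    """Give every row mean 0 and (population) standard deviation 1."""
    result = []
    for row in matrix:
        mu, sd = fmean(row), pstdev(row)
        result.append([(x - mu) / sd for x in row])
    return result


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def successive_normalize(matrix, tol=1e-10, max_iter=1000):
    """Alternate row and column standardization until the array stabilizes.

    Returns the final array and the number of passes performed.
    The tolerance and iteration cap are illustrative choices.
    """
    current = matrix
    passes = 0
    for passes in range(1, max_iter + 1):
        by_row = standardize_rows(current)
        # Column standardization is row standardization of the transpose.
        updated = transpose(standardize_rows(transpose(by_row)))
        change = max(
            abs(a - b)
            for row_new, row_old in zip(updated, current)
            for a, b in zip(row_new, row_old)
        )
        current = updated
        if change < tol:
            break
    return current, passes


# Simulated data: a hypothetical 6 x 8 array of standard normal draws.
rng = random.Random(0)
data = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(6)]
normalized, n_passes = successive_normalize(data)
```

On an array like this one, the rows and columns of `normalized` should both end up with mean close to 0 and standard deviation close to 1, which is the limiting behavior the abstract describes.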

Leichtag 205

5/2/2014 1:00 - 2:00 PM

Standardized statistical framework for comparison of biomarkers: techniques from the Alzheimer’s Disease Neuroimaging Initiative

Danielle Harvey, Ph.D.

Associate Professor, Division of Biostatistics, Department of Public Health, University of California, Davis

Alzheimer’s disease (AD) is widespread in the elderly population and clinical trials are ongoing, focused on elderly individuals with AD or at apparent risk for AD, to identify drugs that will help with this disease. Well-chosen biomarkers have the potential to increase the efficiency of clinical trials and drug discovery and should show good precision as well as clinical validity. We propose measures that operationalize the criteria of interest and describe a general family of statistical techniques that can be used for inference-based comparisons of marker performance. The methods are applied to regional volumetric and cortical thickness measures quantified from repeat structural magnetic resonance imaging (MRI) over time of individuals with mild dementia and mild cognitive impairment enrolled in the Alzheimer’s Disease Neuroimaging Initiative. The methodology presented provides a standardized framework for comparison of biomarkers and will help in the search for the most promising biomarkers.

Biography: Dr Harvey received her BA cum laude in mathematics from Pomona College and her PhD in statistics from University of Chicago. Her methodological interests span survival analysis, correlated event times, informative censoring, repeated measures, computational methods, and high-dimensional data as in MRI or PET scans. Collaborative research interests include work on Alzheimer's, cancer, end-of-life care, dosing errors, and health services and public health issues.

MTF 168

4/2/2014 1:00 - 2:00 PM

Local False Discovery Rate and Effect Size Estimation for Highly Polygenic Complex Traits

Wesley K. Thompson, Ph.D.

Assistant Professor In-Residence, Department of Psychiatry, University of California, San Diego

Complex traits and disorders such as schizophrenia are multifactorial and associated with the effects of multiple genes in combination with environmental factors. These disorders often cluster in families, have no clear-cut pattern of inheritance, and have a high fraction of phenotypic variance attributable to genetic variance (high heritability). It is becoming increasingly clear that many genes influence most complex traits and disorders. In such a scenario with a very high number of risk genes (‘polygenic’), each gene has a tiny effect. This makes it difficult to determine an individual’s risk, and to identify disease mechanisms that can be used for development of new effective treatments.

Genome-wide association studies (GWAS) have identified many trait-associated single nucleotide polymorphisms (SNPs), but so far these explain only small portions of the heritability of complex disorders. This “missing heritability” has been attributed to a number of potential causes, including lack of typing of rare variants. However, it has been shown that a large proportion of the missing heritability is available within GWAS data when associations of SNPs are examined in aggregate. This implies the existence of numerous common variants with small genetic (‘polygenic’) effects. These effects cannot be reliably detected with traditional GWAS statistical methods given current sample sizes. Thus, there is a need for innovative statistical approaches to identify polygenic effects and reduce the proportion of ‘missing heritability’.

In this talk I describe novel statistical tools that enhance gene discovery, improve replication rates of discovered risk gene variants, and improve estimation of polygenic risk scores. The basic framework relies on extensions of a Bayesian two-group mixture model (Efron, 2010) that assumes a large proportion of loci are either null (unassociated with the phenotype of interest) or have very small effects, but that a small proportion have larger (though still small) effect sizes. These models can incorporate a priori information regarding functional roles of SNPs or pleiotropic effects with multiple phenotypes. We demonstrate these methods on GWAS data from large Crohn's disease and Schizophrenia meta-analyses.
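For readers unfamiliar with the two-group model mentioned above, the following sketch shows its basic form only, not the extensions presented in the talk. It assumes test statistics z follow a standard normal density under the null and a wider zero-mean normal density otherwise; the mixing proportion `pi0` and the non-null spread `alt_sd` are hypothetical values chosen for illustration. The local false discovery rate is the posterior probability that a locus is null given its z-score.

```python
from statistics import NormalDist


def local_fdr(z, pi0=0.95, alt_sd=3.0):
    """Local false discovery rate under a simple two-group mixture.

    pi0    : assumed proportion of null loci (illustrative value)
    alt_sd : assumed standard deviation of non-null z-scores (illustrative value)
    """
    f0 = NormalDist(0.0, 1.0).pdf(z)        # null density
    f1 = NormalDist(0.0, alt_sd).pdf(z)     # non-null density
    mixture = pi0 * f0 + (1.0 - pi0) * f1   # marginal density of z
    return pi0 * f0 / mixture


fdr_near_zero = local_fdr(0.0)   # a z-score typical of a null locus
fdr_large_z = local_fdr(4.0)     # a z-score far in the tail
```

Under these assumptions a z-score near zero is almost certainly null, while a z-score of 4 has a small local false discovery rate.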

Biography: Dr. Thompson earned his Ph.D. in Statistics from Rutgers University in 2003, and his dissertation studies focused on the development of a Bayesian model for sparse functional data. He was appointed Assistant Professor of Statistics and Psychiatry at the University of Pittsburgh in 2005, and he collaborated with several senior investigators on clinical research studies on depression, sleep and sleep disorders, and physical illness across the lifespan. Dr. Thompson joined the UCSD Department in 2008 and he serves as the Director of Biostatistics at the Stein Institute for Research on Aging. Dr. Thompson’s research interests center on the adaptation and application of statistical models of a dynamic covariation of multiple functional processes in order to identify potentially causal relationships between brain function, depression, and physical health. This work is supported by a NIH Career Development Award that Dr. Thompson received in 2006. He is also interested in developing statistical models that may explain the underlying mechanisms of healthy cognitive aging.

MTF 168

3/5/2014 1:00 - 2:00 PM


Donald B. Rubin, Ph.D.

John L. Loeb Professor of Statistics, Department of Statistics, Harvard University

Biography: Donald B. Rubin is John L. Loeb Professor of Statistics, Harvard University, where he has been professor since 1983, and Department Chair for 13 of those years. He has been elected to be a Fellow/Member/Honorary Member/Research Fellow of: the Woodrow Wilson Society, John Simon Guggenheim Memorial Foundation, IZA, IAB, Alexander von Humboldt Foundation, American Statistical Association, Institute of Mathematical Statistics, International Statistical Institute, American Association for the Advancement of Science, American Academy of Arts and Sciences, European Association of Methodology, British Academy, and the U.S. National Academy of Sciences. He has authored/coauthored nearly 400 publications (including ten books), has four joint patents, and he has made important contributions to statistical theory and methodology, particularly in causal inference, design and analysis of experiments and sample surveys, treatment of missing data, and Bayesian data analysis. Among his other awards and honors, Professor Rubin has received the Samuel S. Wilks Medal from the American Statistical Association, the Parzen Prize for Statistical Innovation, the Fisher Lectureship and the George W. Snedecor Award of the Committee of Presidents of Statistical Societies. He was named Statistician of the Year, American Statistical Association, Boston and Chicago Chapters. He has served on the editorial boards of many journals, including: Journal of Educational Statistics, Journal of American Statistical Association, Biometrika, Survey Methodology, and Statistica Sinica. Professor Rubin has been, for many years, one of the most highly cited authors in mathematics in the world (ISI Science Watch), as well as in economics (Highly Cited Economists), with approximately 140,000 citations, with nearly 30,000 so far in 2012 and 2013 (according to Google Scholar). For many decades he has given keynote lectures and short courses in the Americas, Europe, Australia and Asia. 
He has also received honorary doctorate degrees from Otto Friedrich University, Bamberg, Germany and the University of Ljubljana, Ljubljana, Slovenia, and held the Honorary Belle van Zuylen Chair in the Department of Methodology and Statistics at the University of Utrecht, the Netherlands in 2012-2013.

APM 6402, Halkin Seminar Room

2/21/2014 3:30 - 4:30 PM

Alternative Tumor Measurement-based Phase II Clinical Trial Endpoints for Predicting Overall Survival (OS), using the RECIST 1.1 data warehouse

Ming-Wen An, PhD

Assistant Professor, Department of Mathematics, Vassar College, Poughkeepsie, NY

Biography: Ming-Wen An received her B.A. in mathematics from Carleton College and her Ph.D. in biostatistics from the Johns Hopkins Bloomberg School of Public Health. One of her research interests is in issues of study design for addressing missing data due to "loss to follow-up" (with applications to evaluating HIV treatment programs in Africa). She is also interested in cancer clinical trial methodology, specifically designs for validating biomarkers used in targeted therapy and identification of alternative endpoints for Phase II trials.

MTF 168

2/5/2014 1:00 - 2:00 PM

Gaussian Oracle Inequalities for Structured Selection in Non-Parametric Cox Model

Jelena Bradic, PhD

Assistant Professor, Department of Mathematics, University of California, San Diego

Abstract: In this paper, we study sparse structured estimation in the context of the high-dimensional non-parametric Cox proportional hazards model with a very general family of group penalties. We study the finite sample oracle risk bounds of such regularized estimators and develop new techniques to do so. Unlike the existing literature, we exemplify differences between bounded and possibly unbounded non-parametric covariate effects. In particular, we show that unbounded effects can lead to larger prediction bounds, compared to simple linear models, in situations where the true parameter is not necessarily sparse. Moreover, we propose a sequence of sparse non-convex group regularizations. Interestingly, we identify a specific regime of the proposed non-convex estimation that allows the group SCAD penalty and the group Lasso penalty to have equivalent prediction errors. Oracle prediction bounds are also discussed for the group $l_0$ penalty. Theoretical results for hierarchical and smoothed estimation in the non-parametric Cox model are also discussed as two examples of the proposed general framework.

Biography: Dr. Bradic received her Ph.D. in Operations Research and Financial Engineering from Princeton in Spring 2011 with a specialization in Statistics and Applied Probability under the direction of Jianqing Fan. Her research is in high dimensional statistics, stochastic optimization, asymptotic theory, robust statistics, functional genomics and biostatistics.

MET 120.27

1/29/2014 1:00 - 2:00 PM

The Markov Chinese Restaurant Process: A Non-parametric Bayesian Cluster Memory Model for Longitudinal Data

Robert Weiss, PhD

Professor, Department of Biostatistics, University of California, Los Angeles

Abstract: We develop a Dirichlet process mixture (DPM) model extension for regularly spaced longitudinal data. In longitudinal data, observations are both subject specific and a function of time. We account for both dependence between sampling densities across time and dependence in observations across time within the same subject. In the cluster memory Dirichlet process mixture (cmDPM) model, we use the inherent clustering properties of the DPM model to carry information from one time point to the next. Observations at baseline are modeled with a DPM. Cluster assignments at future time points depend on the previous assignment. Subjects may retain their cluster membership from the previous time point with nonzero probability. After baseline, given the previous time point, subjects are no longer exchangeable and their observed values depend on their previous clustering history. Clusters that are retained over time evolve through a time dependent process. There are several ways to look at the process including as a dynamic Markov Chinese Restaurant Process. We apply the cmDPM model to model annual tuberculosis (TB) incidence rates across 197 countries in the world from 1990-2010 and examine how the annual distribution of TB incidence rates has changed over time.

This is joint work with Yuda Zhu of Genentech.
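As a toy illustration of the retention idea described in the abstract (and not the authors' cmDPM model), the sketch below simulates cluster labels over time: baseline labels come from a Chinese Restaurant Process, and at each later time point a subject keeps its previous cluster with a fixed probability or is otherwise reseated by a CRP draw among the subjects already seated. The names `crp_seat`, `simulate`, `alpha`, and `retain` are hypothetical and chosen only for this example.

```python
import random


def crp_seat(counts, alpha, rng):
    """Draw a label from a CRP: an existing label with probability
    proportional to its count, or None (meaning "open a new cluster")
    with probability proportional to alpha."""
    total = sum(counts.values()) + alpha
    u = rng.random() * total
    acc = 0.0
    for label, count in counts.items():
        acc += count
        if u < acc:
            return label
    return None


def simulate(n_subjects, n_times, alpha=1.0, retain=0.8, seed=0):
    """Return a list of label lists, one per time point (toy model)."""
    rng = random.Random(seed)
    next_label = 0

    # Baseline: ordinary CRP seating, one subject at a time.
    counts, current = {}, []
    for _ in range(n_subjects):
        label = crp_seat(counts, alpha, rng)
        if label is None:
            label, next_label = next_label, next_label + 1
        counts[label] = counts.get(label, 0) + 1
        current.append(label)
    history = [current]

    # Later time points: keep the previous label with probability `retain`
    # (an assumed constant), otherwise reseat via the CRP.
    for _ in range(1, n_times):
        prev = history[-1]
        stays = [rng.random() < retain for _ in prev]
        counts = {}
        for label, stay in zip(prev, stays):
            if stay:
                counts[label] = counts.get(label, 0) + 1
        current = []
        for label, stay in zip(prev, stays):
            if not stay:
                label = crp_seat(counts, alpha, rng)
                if label is None:
                    label, next_label = next_label, next_label + 1
                counts[label] = counts.get(label, 0) + 1
            current.append(label)
        history.append(current)
    return history
```

Setting `retain=1.0` freezes the baseline partition, while `retain=0.0` redraws every label at each time point; the model in the talk additionally lets retained clusters evolve over time, which this sketch does not attempt.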

MET 145

11/06/2013 1:00 - 2:00 PM

Varying index coefficient models for nonlinear interactions

Shujie Ma, PhD

Professor, University of California, Riverside

There is a long history of using interactions in regression analysis to investigate interactive effects of covariates on response variables. In this paper we aim to address two kinds of new challenges resulting from the inclusion of such high-order effects in the regression model for complex data. The first kind arises from a situation where interaction effects of individual covariates are weak but those of combined covariates are strong, and the other kind pertains to the presence of nonlinear interactive effects. Generalizing the single index coefficient regression model, we propose a new class of semiparametric models with varying index coefficients, which enables us to model and assess nonlinear interaction effects between grouped covariates on the response variable. As a result, most of the existing semiparametric regression models are special cases of our proposed models. We develop a numerically stable and computationally fast estimation procedure utilizing both the profile least squares method and local fitting. We establish both estimation consistency and asymptotic normality for the proposed estimators of index coefficients as well as the oracle property for the nonparametric function estimator. In addition, a generalized likelihood ratio test is provided to test for the existence of interaction effects or the existence of nonlinear interaction effects. Our models and estimation methods are illustrated by both simulation studies and an analysis of a body fat dataset.

APM 7421

10/08/2013 2:00 - 3:00 PM

Permutation Tests 101

Joe Romano, PhD

Professor, Stanford University

APM 6402

10/04/2013 3:00 - 4:00 PM

Real-Time Prediction in Clinical Trials: A Statistical History of REMATCH

Daniel F. Heitjan, PhD

Professor, Department of Biostatistics and Epidemiology
Perelman School of Medicine
University of Pennsylvania, Philadelphia, PA

Randomized clinical trials often include one or more planned interim analyses, during which an external monitoring committee reviews the accumulated data and determines whether it is scientifically and ethically appropriate for the study to continue. With survival-time endpoints, it is often desirable to schedule the interim analyses at the times of occurrence of specified landmark events, such as the 50th event, the 100th event, and so on. Because the timing of such events is random, and the interim analyses impose considerable logistical burdens, it is worthwhile to predict the event times as accurately as possible. Prediction methods available prior to 2001 used data only from previous trials, which are often of questionable relevance to the trial for which one wishes to make predictions. With modern data management systems it is often feasible to use data from the trial itself to make these predictions, rendering them far more reliable. This talk will describe work that some colleagues and students and I have done in this area. I will set the methodologic development in the context of the trial that motivated our work: REMATCH, a randomized clinical trial of a heart assist device that ran from 1998 to 2001 and was considered one of the most rigorous and expensive device trials ever conducted.

MET 204

09/09/2013 1:00 - 2:00 PM

Statistical and Geographical Methods Exploring HIV-Risk along the Mexico-U.S. Border

Tommi Gaines, DrPH

Division of Global Public Health
Department of Medicine

The Mexico-U.S. border region is home to an evolving HIV epidemic among vulnerable groups such as injection drug users and female sex workers. Features of one’s environment have been associated with individual health and therefore our objective is to highlight statistical and geographical techniques that examine HIV and risk-related behaviors. We describe the use of geographic information systems (GIS) data to map the location of sex work venues from epidemiologic studies conducted in Tijuana, Mexico and the application of statistical models to empirically assess the role of geography in shaping HIV and other sexually transmitted infections. We discuss the importance of combining statistical methods with GIS data to inform prevention and support services.

MET 215

06/19/2013 1:00 - 2:00 PM

On Hypothesis Testing and Interval Estimation for Monotone Dose-Response Means with a Control Mean

Lin Liu, PhD

Division of Biostatistics and Bioinformatics
Department of Family Medicine and Public Health

In dose-response studies, one of the most important issues is the identification of the minimum effective dose (MED), where the MED is defined as the lowest dose such that the mean response is better than the mean response of a zero-dose control by a clinically significant difference. Dose-response curves are sometimes monotonic in nature. A union-intersection type of likelihood ratio test is proposed. One-sided lower confidence bounds can be inverted from the test to detect the differences between the dose-response means and a control mean. The evaluation of the lower confidence bounds is a concave programming problem subject to homogeneous linear inequality constraints. An efficient computing algorithm is proposed. A real data example from a dose-response study is used to illustrate the method.
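The verbal definition of the MED above can be written compactly. The notation here is an assumption made for illustration and does not come from the abstract: doses $d_1 < \dots < d_k$ with mean responses $\mu_1, \dots, \mu_k$, a zero-dose control mean $\mu_0$, a clinically significant difference $\delta > 0$, and larger responses taken to be better.

```latex
\[
\mathrm{MED} = d_{i^*}, \qquad
i^* = \min\bigl\{\, i \in \{1, \dots, k\} : \mu_i - \mu_0 \ge \delta \,\bigr\}.
\]
```

Under a monotone dose-response curve, $\mu_1 \le \dots \le \mu_k$, every dose at or above $d_{i^*}$ also meets the threshold; if no dose does, the MED is undefined within the studied range.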

MET 204

06/05/2013 1:00 - 2:00 PM

Statistical and Bioinformatics Challenges in Systems Biology Research for Influenza Infection

Jaroslaw Harezlak, PhD

Assistant Professor, Department of Biostatistics
Fairbanks School of Public Health and School of Medicine
Indiana University, Indianapolis, IN

Collection of functional data has vastly grown in the past decade, including functional data collected longitudinally. For example, in the HIV Neuroimaging Consortium (HIVNC) study, metabolite spectra were obtained using magnetic resonance spectroscopy (MRS) from multiple brain regions at a number of study time points. Analysis of such data usually follows a two-step procedure: (1) metabolite concentration extraction and (2) association study of extracted features and outcome of interest.

Our approach does not rely on this frequently unreliable feature extraction. Instead, it incorporates prior scientific knowledge to estimate the regression function associating the whole functional profile with the outcome without explicitly extracting the feature characteristics. Specifically, we propose a method for functional linear model estimation using partially empirical eigenvectors for regression (PEER) in the longitudinal data setting. Our method allows the regression function to vary across both time and space. We derive the estimator's statistical properties and discuss their connections to the generalized singular value decomposition (GSVD). The results of the simulation studies and an application to the analysis of HIV patients' neurocognitive impairment as a function of the metabolite profiles are presented.

Joint work with Madan G. Kundu and Timothy W. Randolph

MET 204

05/08/2013 1:00 - 2:00 PM

Statistical and Bioinformatics Challenges in Systems Biology Research for Influenza Infection

Hulin Wu, PhD

Dean’s Professor, Department of Biostatistics and Computational Biology
Director, Center for Integrative Bioinformatics and Experimental Mathematics
University of Rochester School of Medicine and Dentistry

Many systems in engineering and physics can be represented by differential equations, which can be derived from well-established physics laws and theories. However, currently no laws or theories exist to deduce exact quantitative relationships and interactions among the huge number of elements at different levels in a biological system. It is unclear whether biological systems follow a mathematical representation such as differential equations, similar to that for a man-made physics or engineering system. Fortunately, recent advances in cutting-edge biomedical technologies allow us to generate intensive high-throughput data to gain insights into biological systems. There is a pressing need to develop statistical methods and bioinformatics approaches to test whether a biological system follows a mathematical representation based on experimental data so that quantitative predictions can be made for biomedical interventions in a biological system. In this talk, I will present and discuss how to construct data-driven ordinary differential equations (ODEs) to describe biological systems, in particular for dynamic gene regulatory network systems. We propose to combine high-dimensional variable selection approaches and ODE model estimation methods to construct high-dimensional ODE models based on experimental data. We apply the proposed approaches to study how our immune system responds to influenza infections and vaccination based on time course high-throughput experimental data.

MET 215

05/01/2013 2:30 - 3:30 PM

Predicting Health Care Costs of Individual Patients

Andrew Zhou, PhD

Professor, Department of Biostatistics, University of Washington
Director and Research Career Scientist, Biostatistics Unit, VA Puget Sound Health Care System

The rising cost of health care is one of the most important problems facing the United States. Accurately predicting such costs is an important first step in addressing this problem. However, due to some special distributional features of health care costs, including high skewness, presence of excessive zero values, and heteroscedasticity, it is difficult to obtain an accurate prediction of future health care costs of patients.

In this talk, I will describe some new models for using covariates to predict the future health care costs of patients. These new models include: (1) a parametric heteroscedastic transformation model, (2) a semi-parametric two-part heteroscedastic transformation model, (3) a quantile regression model, (4) a non-parametric heteroscedastic transformation regression model, and (5) a semi-parametric two-part mixed-effects heteroscedastic transformation model.

MET 215

04/15/2013 1:00 - 2:00 PM

Topics in Biostatistics: Trial Design (n=20, p=2), Prognostic Modeling (n=3,000, p=20), and Genomic Data Analysis (n=2, p=3,000)

Karen Messer, PhD

Professor, Family Medicine and Public Health
Director, Moores UCSD Cancer Center Biostatistics/Bioinformatics shared resource

As a biostatistician, one aims to support high-quality inference from experimental or observational data across a wide variety of scientific settings. To this sometimes bewildering array, the discipline of statistics brings a unifying set of tools and objectives which can help sort out what one knows with high confidence, with low confidence, and most especially, not at all. Although the approaches to sound inference may differ with the number of subjects (n- big or small) and the number of variables (p- small or big), the principles of control of Type I error, modeling sources of bias and variation, and quantifying the limits of statistical power provide a helpful framework for a variety of problems. In this talk, I will give examples of approaches to statistical inference from three areas of my work in cancer biostatistics: early phase trial design (small n, small p), prognostic modeling for survival (big n, medium p), and analysis of next generation sequencing data (small n, big p). In the first two topics, some recent approaches to older problems will be presented and in the third, traditional tools will be applied to modern data.

MET 223

03/08/2013 1:00 - 2:00 PM

The breakage fusion bridge, Chromothripsis and other exotic structural variations: combinatorics and cancer genomics

Vineet Bafna, PhD

Professor in the Department of Computer Science at UCSD and in the Bioinformatics PhD program. His research area is Bioinformatics, with a focus on Genomics and Proteomics.

Cancer genomes are marked by genomic instability and massive rearrangements. Recently, many exotic mechanisms have been proposed as mechanistic explanations for these rearrangements. For example, the breakage-fusion-bridge (BFB) mechanism, proposed over seven decades ago, has seen renewed interest as a source of genomic variability and gene amplification in cancer. Here, we formally model and analyze the BFB mechanism, the first rigorous formulation of the mechanism. Using this model, we show that BFB can achieve a surprisingly broad range of amplification patterns, and describe efficient combinatorial algorithms to characterize patterns consistent with BFB. An extensive analysis of simulated, cell-line, and primary tumor data reveals the existence of BFB. Our results also suggest that BFB may be hard to detect under heterogeneity and polyploidy.

As a second example, the model of chromothripsis (extensive shattering followed by regrouping of small parts of a chromosome) has been proposed to explain the extensive rearrangements seen in some tumors. Time permitting, we will critique this model using three different lines of evidence.

(Joint work with Shay Zakov and Marcus Kinsella.)

Medical Teaching Facility, Room 175, UCSD School of Medicine

03/06/2013 1:00 - 2:00 PM

Designing and monitoring clinical trials with survival endpoints: statistical issues, proposals, and opportunities

Daniel Gillen, PhD

Associate Professor, Department of Statistics, University of California, Irvine

Researchers frequently elect to evaluate new therapies on the basis of patient survival. For example, clinicians might consider five-year survival when investigating drugs developed for use in childhood cancer, or 28-day survival when investigating the treatment of sepsis in patients suffering traumatic injury. Both of these examples focus on patient responses over a fixed period of time. However, for ethical reasons it is common for data to be periodically analyzed for early indications of efficacy, futility, or harm. In the case of censored survival data, inference is typically based upon a semiparametric model assuming a time-invariant treatment effect and standard group sequential methodology is used to generate multiple criteria for guiding the decision of whether a trial should be stopped early given the observed data. However, it is often the case that a given treatment might have a delayed effect within individuals or that the effect of treatment might dissipate over time. Special issues arise in such settings, mostly due to the dependence of results on the censoring distribution observed in the trial. In this talk, we discuss general issues associated with the sequential testing of a survival endpoint. Specific attention is given to the uncertainty of future observations under a potentially time-varying treatment effect. In this case we propose a method of imputation of future treatment effects based on random walks, which assumes minimally informative Bayesian prior distributions on the smoothness of survival of each comparison group. Imputation of future survival differences is carried out using standard Bayesian predictive distributions, thereby allowing for quantification of uncertainty in future treatment differences.

Commons Corner, 2nd Floor, UCSD Moores Cancer Center

02/06/2013 1:00 - 2:00 PM

Quantitative challenges in advancing the HIV prevention research agenda

Victor DeGruttola, Sc.D.

Professor and Chair, Department of Biostatistics, Harvard School of Public Health

The UC San Diego Center for AIDS Research and AIDS Research Institute are pleased to present Victor DeGruttola, Sc.D. Dr. DeGruttola will discuss the quantitative challenges in advancing the HIV prevention research agenda.

Leichtag Auditorium

01/13/2013 4:00 - 5:00 PM

Overview of Agreement Statistics for Continuous, Binary, and Ordinal Data

Lawrence Lin, PhD

Dr. Lawrence I. Lin has recently retired after 33 years of distinguished tenure at Baxter International Inc. He is a Principal Consultant at JBS Consulting Services. He is an Adjunct Professor in the Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago. Dr. Lin is a Fellow of the American Statistical Association, and an elected member of the International Statistical Institute. He has served as a referee for many international journals.


This will be a general overview presentation with practical examples and without many statistical formulas. We will introduce the concepts of un-scaled and scaled agreement statistics based on the basic case between two raters with paired samples for continuous, binary, and ordinal data. We will then progress into more complex cases when we have multiple raters and each rater has multiple readings per sample. Here, we can assess intra-rater and inter-rater agreement, compare inter-rater deviation to intra-rater deviation, and compare precision of a rater against another. We will explore the meaning of the two-stage criteria presented in the FDA guidance UCM070244: Statistical Approaches to Establishing Bioequivalence. The content is largely based on the materials presented in the newly published book by Springer, entitled “Statistical Tools for Assessing Agreement”.

Leichtag Building, Room 205

11/14/2012 1:00 - 2:00 PM

Some problems in the analysis of high-dimensional models

Anthony Gamst, Ph.D

Division of Biostatistics and Bioinformatics, Department of Family Medicine and Public Health, UCSD

Dr. Gamst is a Professor in the Division of Biostatistics and Bioinformatics, Department of Family Medicine and Public Health at UCSD. He is the author of over 90 papers in various areas of biostatistical applications and methodology, with a particular focus on imaging analysis and Alzheimer's disease.


Models with large numbers of nuisance parameters are common in modern statistics, having applications in laboratory medicine, genomics, clinical trials, medical imaging, epidemiology, and many other areas. Classical techniques, including Bayes and Maximum Likelihood, tend to produce sub-optimal or even inconsistent estimates of the parameters of interest in these models when naively applied, while approximately unbiased estimating equations work rather generally. We study several such models, identify the sources of bias and spurious correlation which lead to inconsistency or sub-optimality, and compute the minimal smoothness required for the existence of root-n consistent (and efficient) parameter estimates. We also examine simultaneous estimation of nuisance parameters and parameters of interest. The results of the study are related to everyday practice, particularly to the fitting of regression models with many predictors, and some heuristics are given.

MTF 175

04/18/2012 1:00 - 2:00 PM

Integrated Statistical Methodology for the Analysis of High-Dimensional Data with Applications to Translational Cancer Research

Kim-Anh Do, Ph.D

Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston


Early detection is critical in disease control and prevention. The long term translational research goal is that if drugs can be targeted to specific tissues in the body, then dosage can be altered to achieve the desired effect while minimizing side effects such as toxicity. Motivated by specific problems involving high throughput data in the form of phage peptides, we have developed nonparametric and semiparametric mixture models for real-time analysis in the context of correlated phage experiments. Our main focus is to address the multiplicity issue automatically by incorporating a false discovery rate or utility function. We will highlight direct applications of both frequentist and Bayesian methods to cancer research challenges that address our long term translational goal. Specifically, the developed statistical methodology can assist in isolating ligand peptides and identifying their corresponding tissue-specific receptors in rodent models and in patients, including discovery and validation of a ligand-receptor tumor targeting system in human metastatic prostate cancer.

MET-MedEd 120.27 (MedEd Dean's Conference Room, new Telemedicine building)

04/06/2012 3:00 - 4:00 PM

Residual Life; A Useful Summary Measure for Survival Data?

Jong-Hyeon Jeong, PhD

Department of Biostatistics, University of Pittsburgh


The hazard function is a popular summary measure of time-to-event or survival data from medical studies. However, translation of the study results based on the hazard function might not be straightforward for stakeholders such as patients and physicians. Therefore, consideration of the remaining life years to events of interest might be more useful. In time-to-event data, the issue of competing risks is often encountered whenever the events of interest are precluded from being observed due to some competing events. In this talk, statistical methods that have recently been developed to infer quantile residual life under competing risks will be presented. Some issues to be overcome for further generalization of the proposed methods will also be discussed. The proposed methods will be illustrated with a real dataset from a phase III clinical study on breast cancer with a long-term follow-up of more than 30 years.

BSB Dean’s Conference Room, UCSD SOM Campus

03/23/2012 11:00 AM - 12:00 PM

Bayesian Survival Trees for Clustered Observations, Applied to Tooth Prognosis.

Richard Levine, PhD

Professor and Chair, Department of Mathematics and Statistics, San Diego State University


Tooth loss from periodontal disease or dental caries (decay) afflicts most adults over the course of their lives. Survival tree methods for correlated observations have shown potential for developing objective tooth prognosis systems; however, the current technology suffers either from prohibitive computational expense or from unrealistic simplifying assumptions made to overcome computational demands. In this talk, Bayesian tree methods are developed for correlated survival data, relying on a computationally feasible, yet flexible, frailty model with piecewise constant hazard function. Bayesian stochastic search methods, using a Laplace approximated marginal likelihood, are detailed for tree construction, and posterior ensemble averaged variable importance ranking and amalgamation procedures are developed to identify indicators of tooth prognostic groups from a forest of trees. The proposed methods are used to assign each tooth from the VA Dental Longitudinal Study to one of five prognosis categories and evaluate the effects of clinical factors and genetic polymorphisms in predicting tooth loss. The prognostic rules established may be used in clinical practice to optimize tooth retention and devise periodontal treatment plans.

MTF 175, UCSD SOM Campus

03/21/2012 1:00-2:00 PM

Estimating Abundances of Retroviral Insertion Sites from DNA Fragment Length Data.

Chuck Berry, PhD

Chuck Berry is Professor Emeritus and Interim Division Chief of the Division of Biostatistics and Bioinformatics, Department of Family Medicine and Public Health at UCSD. He authored over 180 papers on the methodology and applications of Biostatistics in Medical Sciences, and he is actively involved in several research projects, with a particular emphasis on statistical genetics.


The relative abundance of retroviral insertions in a host genome is important in understanding the persistence and pathogenesis of both natural retroviral infections and retroviral gene therapy vectors. When host genomic DNA is randomly broken via sonication and then amplified, amplicons of varying lengths are produced. A likelihood function is proposed for these lengths along with a hybrid Expectation-Maximization algorithm. Patient data illustrate the method and simulations show that relative abundance can be estimated with little bias, but that variation in highly abundant sites can be large.

Reference: Charles C. Berry, Nicolas A. Gillet, Anat Melamed, Niall Gormley, Charles R.M. Bangham, and Frederic Bushman. Estimating Abundances of Retroviral Insertion Sites from DNA Fragment Length Data. Bioinformatics, first published online January 11, 2012.

Leichtag 2A05, UCSD SOM Campus

03/07/2012 1:00-2:00 PM

Misspecifying the shape of a random effects distribution: Why getting it wrong may not matter.

Charles E. McCulloch, PhD

Professor and Head, Division of Biostatistics, Dept. of Epidemiology and Biostatistics, University of California at San Francisco
Joint work with John M. Neuhaus


Statistical models that include random effects are commonly used to analyze longitudinal and correlated data, often with strong and parametric assumptions about the random effects distribution. There is marked disagreement in the literature as to whether such parametric assumptions are important or innocuous. In the context of generalized linear mixed models used to analyze clustered or longitudinal data, we examine the impact of random effects distribution misspecification on a variety of inferences, including prediction, inference about covariate effects, prediction of random effects, and estimation of random effects variances. We describe examples, theoretical calculations, and simulations to elucidate situations in which the specification is and is not important. A key conclusion is the large degree of robustness of maximum likelihood for a wide variety of commonly encountered situations.

Stein 247

06/29/2011 04:00 PM

New Findings from Terrorism Data: Dirichlet Process Random Effects Models for Latent Groups

George Casella, PhD

University of Florida


Data describing terrorist events are particularly difficult to analyze, due to the many problems associated with the data collection process, the inherent variability in the data itself, and the usually poor level of measurement coming from observing political actors that seek not to provide reliable data on their activities. Thus, there is a need for sophisticated modeling to obtain reasonable inferences from these data. Here we develop a logistic random effects specification using a Dirichlet process to model the random effects. We first look at how such a model can best be implemented, and then we use the model to analyze terrorism data. We see that the richer Dirichlet process random effects model, as compared to a normal random effects model, is able to remove more of the underlying variability from the data, uncovering latent information that would not otherwise have been revealed.

APM 6402

05/23/2011 02:00 PM

The Journal Clubs are the second and fourth Fridays at 3 pm in Moores Cancer Center Room 3079

01/10/14: Loki Natarajan will be presenting:
George Michailidis (2012) Statistical Challenges in Biological Networks. J Comp Graph Stat, 21:4, 840-855.

: Rintaro Saito will be presenting: Chuang HY, Lee E, Liu YT, Lee D, Ideker T. (2007)
Network-based classification of breast cancer metastasis. Mol Syst Biol. 3:140.

: Minya Pu will be presenting: Caiyan Li and Hongzhe Li (2008)
Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics, 24(9): 1175-1182.

For more information please contact
Loki Natarajan or Emily Pittman