Summer@Duke Statistical Science: Duke-IIT-ISI summer projects
Organizing committee: Alexander Volfovsky and Surya Tokdar (Duke University Statistical Science)
The eight-to-ten-week Summer@DSS program is designed for a select group of students who are considering applying for PhD programs in statistics, biostatistics, data science, and related fields in the following fall. The program will give these students the opportunity to immerse themselves in research, interact with students at Duke, and participate in seminars and working groups.
The faculty advisers for this program include standing members of the Duke Statistical Science department who have outlined several research projects, spanning applied, methodological and theoretical subjects. Students will work on assigned projects in close collaboration with the faculty advisers and current Duke graduate students. The final product of each project will be a write-up of the research conducted during the program.
Duke provides an excellent research environment. Students will have space in the Information Initiative at Duke, which co-houses several summer research programs. Throughout the summer there are professional development and research presentations in this building. The program participants will have access to graduate student mentors and Duke computing resources. Duke Statistical Science will provide logistical support for students and will help build community through fun activities and events.
Funding details: We will provide financial support for each participant up to $5,000 to help cover travel, housing, meals and incidentals. Accepted applicants will be provided assistance in obtaining the proper visa for participation in the program.
Application materials required:
- Personal statement of interest: please select up to three projects from the proposed list below to work on and describe why they are of interest and how your expertise pertains to them. Also describe any previous research you may have conducted.
- One reference letter (to be submitted by the reference writer on mathprograms.org)
- Unofficial transcript that lists relevant STEM and related courses taken.
Applications are due on mathprograms.org by December 15, 2018 to guarantee consideration. DSS will begin making offers by mid-January to allow for timely visa processing.
Proposed projects (proposing faculty members in parentheses):
- Nonlinear dimensionality reduction is used routinely in machine learning, data science and statistics. This project will focus on improving upon existing methods, building on a spherelets approach for manifold approximation we have developed recently but have so far applied only in somewhat low-dimensional and toy problems. The goal is to scale up to very high dimensions and develop efficient code for routine implementation, providing a competitor to neural network-based methods for classification, t-SNE for data visualization, and LLE for manifold learning. (Dunson, Herring)
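For orientation, the baselines named in this project (t-SNE and LLE) are readily available in scikit-learn. A minimal sketch, assuming scikit-learn is installed and using the synthetic swiss-roll data set as a stand-in for real data, of the kind of comparison a spherelets competitor would face:

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import TSNE, LocallyLinearEmbedding

# Synthetic 3-d manifold data; a real application would substitute
# high-dimensional observations here.
X, _ = make_swiss_roll(n_samples=300, random_state=0)

# Two of the baseline methods the project proposes to compete with:
emb_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)
emb_lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10).fit_transform(X)
```

Both calls return a 2-d embedding of the 300 input points; a spherelets-based method would aim to produce comparable embeddings at much higher ambient dimension.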
- Human brain connectomics studies relationships between human brain structure and human traits, such as intelligence and psychiatric disorders. This project focuses on developing better representations of the brain connectome, improving on simple graph/network representations. (Dunson)
- Deep learning in climate research focuses on using deep neural networks and other flexible statistical and machine learning approaches for emulating complex models of atmospheric chemistry. (Dunson)
- Minimax results for adaptive confidence intervals: In multiparameter settings it is possible to construct confidence intervals for individual parameters that adapt to measurable structure among the group of parameters (such as their average magnitude). The goal of this project is to determine theoretically whether such adaptive procedures are always better, on average across parameters, than standard nonadaptive confidence intervals. (Hoff)
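To illustrate what "better on average across parameters" means here, a toy simulation (this is an illustrative construction, not the procedure studied in the project: the normal-means setup and the empirical-Bayes recentering are assumptions made for the example) comparing a standard interval with one recentered at a shrinkage estimate of the same width:

```python
import numpy as np

rng = np.random.default_rng(0)
J, tau2 = 100_000, 0.25                       # many parameters, small spread
theta = rng.normal(0.0, np.sqrt(tau2), J)     # true means
y = theta + rng.normal(0.0, 1.0, J)           # one observation each, sd 1

# Standard nonadaptive 95% interval: y_j +/- 1.96, per-parameter coverage 0.95.
std_cover = np.mean(np.abs(y - theta) <= 1.96)

# A toy "adaptive" interval: same width, but recentered at an empirical-Bayes
# shrinkage estimate that exploits the estimated spread of the parameters.
tau2_hat = max(np.var(y) - 1.0, 1e-6)         # moment estimate of the spread
w = tau2_hat / (tau2_hat + 1.0)
center = y.mean() + w * (y - y.mean())
adapt_cover = np.mean(np.abs(center - theta) <= 1.96)
```

When the parameters really are tightly clustered, the recentered intervals cover more often on average; the theoretical question in the project is whether such gains can be guaranteed in general.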
- Nonparametric regression and posterior inference over distance matrices: In many applications, a quantity of interest is a matrix containing dissimilarities between variables; the covariance is one example. One may consider nonparametrically learning such a quantity, subject to the natural requirement that its entries satisfy the triangle inequality. For instance, metric learning focuses on the case where this quantity takes the form of a Mahalanobis distance. It will be fruitful to develop projection-based algorithms for estimation in this nonparametric framework. Projecting onto the constraint set (the cone of distance matrices) has been studied as the metric nearness problem, and can be incorporated into statistical procedures via a generalization of expectation-maximization. Immediate applications include latent space network modeling (related to Hoff's work) and large-scale matching for causal inference (related to Volfovsky's work). Building on these estimation tools, we will explore Bayesian approaches (related to Dunson's work on constraint relaxation) and possibly testing frameworks. (Xu)
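A minimal sketch of the projection step, assuming a simple triangle-fixing sweep (a deliberate simplification; the metric nearness literature uses more careful Bregman-projection updates with dual variables):

```python
import numpy as np

def project_to_metric(D, n_sweeps=100):
    """Approximately push a symmetric dissimilarity matrix into the cone of
    matrices whose entries satisfy the triangle inequality, by repeatedly
    fixing violated triangles (each fix zeroes one violation exactly)."""
    M = np.array(D, dtype=float)
    n = M.shape[0]
    for _ in range(n_sweeps):
        changed = False
        for i in range(n):
            for j in range(i + 1, n):
                for k in range(n):
                    if k == i or k == j:
                        continue
                    viol = M[i, j] - (M[i, k] + M[k, j])
                    if viol > 1e-12:
                        # Spread the correction equally over the three edges.
                        M[i, j] -= viol / 3.0
                        M[i, k] += viol / 3.0
                        M[k, j] += viol / 3.0
                        M[j, i] = M[i, j]
                        M[k, i] = M[i, k]
                        M[j, k] = M[k, j]
                        changed = True
        if not changed:
            break
    return M

# Example: unit distances except one entry that violates the triangle
# inequality (d(0,1) = 3 > d(0,k) + d(k,1) = 2).
D = np.ones((4, 4)) - np.eye(4)
D[0, 1] = D[1, 0] = 3.0
M = project_to_metric(D)
```

Inside an estimation algorithm, this projection would be applied after each unconstrained update of the dissimilarity matrix, in the EM-like fashion the project description mentions.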
- Designing experiments for estimating peer influence effects in massive online networks: This project will build on earlier work on estimating direct effects on networks but will flip the script: we want to create a design that can ignore any direct effects of treatment while flushing out the indirect or peer effects. This problem is statistically challenging (likely solutions will involve sampling methods for colorings on graphs) and is of extreme importance in political science, sociology, online marketing and other applied venues. (Volfovsky)
- Evaluating disease spread in partially observed networks: The study of disease spread frequently requires the assumption that the paths by which the disease moves are known or fixed. In practice, however, we are never certain that we have observed all possible interactions between individuals; moreover, many individuals can be impossible to observe. In this project we will build on methods from network analysis that will help us account for such uncertainty and provide more accurate estimates of disease spread and intervention efficacy. (Volfovsky)
- Fast Nearest-Neighbor Gaussian Process Approximation with k-d Trees: Surya Tokdar works on several advanced regression models, each involving estimation of one or more curves or nonlinear hypersurfaces. His work combines mathematical modeling, asymptotic analysis, stochastic computation and methodological development related to a broad range of application areas. One of his focus areas is ultra-high-dimensional regression smoothing. He has done theoretical work showing that strong structural assumptions are needed for consistent estimation of an unknown function of p predictor variables when only n noisy observations are available, with n ≪ p (Yang et al., 2015). One such modeling assumption takes the unknown function f(x) to decompose as f(x) = f1(x) + · · · + fk(x), where each additive component fj is a sparse function, i.e., it depends on only a small subset of the p predictors. He has been working on a Bayesian estimation model under this framework which assigns each fj a sparse Gaussian process prior (Qamar and Tokdar, 2014). Computation proceeds via Markov chain Monte Carlo. He is working on making this estimation framework scalable in large n (while p may also be very large). A potential solution combines this estimation method with what is known as a nearest-neighbor Gaussian process approximation (Datta et al., 2016). However, such an approximation does not easily handle the case where the subset of predictors that fj depends on is not known a priori but must be learned on the fly as part of the estimation process. To make this extension he is interested in borrowing ideas from k-d tree fast nearest-neighbor searches. The goals of this work are to (1) implement a reasonably efficient k-d tree based nearest-neighbor GP and (2) investigate its suitability for speeding up sparse GP regression without sacrificing accuracy. If possible, we will then extend both goals to the additive sparse GP regression model suitable for ultra-high-dimensional regression. (Tokdar)
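A minimal sketch of goal (1), assuming SciPy's cKDTree for the neighbor searches and a squared-exponential kernel; the local-conditioning scheme below (predict each point from only its m nearest training points) is an illustrative stand-in for the full nearest-neighbor GP of Datta et al., not an implementation of it:

```python
import numpy as np
from scipy.spatial import cKDTree

def rbf_kernel(A, B, lengthscale=0.3):
    """Squared-exponential (RBF) kernel between two point sets."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def nn_gp_predict(X_train, y_train, X_test, m=15, noise=1e-2, lengthscale=0.3):
    """Condition each test point on only its m nearest training points,
    found with a k-d tree, instead of solving the full O(n^3) GP system."""
    tree = cKDTree(X_train)                   # O(n log n) build
    preds = np.empty(len(X_test))
    for i, x in enumerate(X_test):
        _, idx = tree.query(x, k=m)           # fast neighbor lookup
        Xn, yn = X_train[idx], y_train[idx]
        K = rbf_kernel(Xn, Xn, lengthscale) + noise * np.eye(m)
        k_star = rbf_kernel(x[None, :], Xn, lengthscale)[0]
        preds[i] = k_star @ np.linalg.solve(K, yn)  # local GP posterior mean
    return preds

# Demo on a smooth two-predictor surface (synthetic stand-in for real data).
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, (300, 2))
y = np.sin(3 * X[:, 0]) + np.cos(3 * X[:, 1])
X_test = rng.uniform(0.1, 0.9, (20, 2))
preds = nn_gp_predict(X, y, X_test)
```

The sparse-GP extension the project targets would additionally have to rebuild or re-query the tree as the active predictor subset of each fj changes during MCMC, which is where the k-d tree ideas come in.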
