Statistical Science, Duke University

Program ID: 94-SDSS [#745]
Program Title: Summer@Duke Statistical Science
Program Type: Other
Program Location: Durham, North Carolina 27708-0251, United States
Application Deadline: 2018/12/15 (posted 2018/11/14, listed until 2019/05/14)
Program Description:    

Summer@Duke Statistical Science---Duke-IIT-ISI summer projects

Organizing committee: Alexander Volfovsky and Surya Tokdar (Duke University Statistical Science)

The eight- to ten-week Summer@DSS program is designed for a select group of students who are considering applying to PhD programs in statistics, biostatistics, data science, and related fields the following fall. The program will give these students the opportunity to immerse themselves in research, interact with students at Duke, and participate in seminars and working groups.

The faculty advisers for this program include standing members of the Duke Statistical Science department who have outlined several research projects spanning applied, methodological, and theoretical subjects. Students will work on assigned projects in close collaboration with the faculty advisers and current Duke graduate students. The final product of each project will be a write-up of the research conducted during the program.

Duke provides an excellent research environment. Students will have space in the Information Initiative at Duke, which co-houses several summer research programs. Throughout the summer there are professional development and research presentations in this building. The program participants will have access to graduate student mentors and Duke computing resources. Duke Statistical Science will provide logistical support for students and will help build community through fun activities and events.

Funding details: We will provide financial support for each participant up to $5,000 to help cover travel, housing, meals and incidentals. Accepted applicants will be provided assistance in obtaining the proper visa for participation in the program. 

Application materials required:
  1. Personal statement of interest: please select up to three projects from the list proposed below and describe why they interest you and what relevant expertise you bring to them. Also describe any previous research you have conducted.
  2. One reference letter (to be submitted by the reference writer on mathprograms.org)
  3. Unofficial transcript that lists relevant STEM and related courses taken.
Applications are due on mathprograms.org by December 15, 2018 to guarantee consideration. DSS will begin making offers by mid-January to allow for timely visa processing.

Proposed projects (proposing faculty member in parentheses):
  • Nonlinear dimensionality reduction is used routinely in machine learning, data science, and statistics. This project will focus on improving upon existing methods, building on a spherelets approach to manifold approximation that we developed recently but have applied only to relatively low-dimensional and toy problems. The goal is to scale up to very high dimensions and develop efficient code for routine implementation, providing a competitor to neural-network-based methods for classification, tSNE for data visualization, and LLE for manifold learning. (Dunson, Herring)
  • Human brain connectomics studies relationships between human brain structure and human traits, such as intelligence and psychiatric disorders. This project focuses on developing better representations of the brain connectome improving on simple graph/network representations. (Dunson)
  • Deep learning in climate research: this project focuses on using deep neural networks and other flexible statistical and machine learning approaches for emulating complex models of atmospheric chemistry. (Dunson)
  • Minimax results for adaptive confidence intervals: In multiparameter settings it is possible to construct confidence intervals for individual parameters that adapt to measurable structure among the group of parameters (such as their average magnitude). The goal of this project is to determine theoretically whether such adaptive procedures are always better, on average across parameters, than standard non-adaptive confidence intervals. (Hoff)
  • Non-parametric regression and posterior inference over distance matrices: In many applications, a quantity of interest involves a matrix containing dissimilarities between variables--the covariance is one example. One may consider non-parametrically learning such a quantity, subject to the natural requirement that its entries satisfy the triangle inequality. For instance, metric learning focuses on the case where this is in the form of a Mahalanobis distance. It will be fruitful to develop projection-based algorithms for estimation in this nonparametric framework. Projecting onto the constraint set--the cone of distance matrices--has been studied as the metric nearness problem, and can be incorporated into statistical procedures via a generalization of expectation-maximization. Immediate applications include latent space network modeling (related to Hoff's work) and large-scale matching for causal inference (related to Volfovsky's work). Building on these estimation tools, we will explore Bayesian approaches (related to Dunson's work on constraint relaxation) and possibly testing frameworks. (Xu)
  • Designing experiments for estimating peer influence effects in massive online networks: This project will build on earlier work on estimating direct effects on networks but will flip the script---we want to create a design that can ignore any direct effects of treatment while teasing out the indirect or peer effects. This problem is statistically challenging (likely solutions will involve sampling methods for colorings on graphs) and is of extreme importance in political science, sociology, online marketing, and other applied settings. (Volfovsky)
  • Evaluating disease spread in partially observed networks: The study of disease spread frequently requires the assumption that the paths by which the disease moves are known or fixed. In practice, however, we are never certain that we have observed all possible interactions between individuals, and moreover many individuals can be impossible to observe. In this project we will build on methods from network analysis that will help us account for such uncertainty and provide more accurate estimates of disease spread and intervention efficacy. (Volfovsky)
  • Fast Nearest Neighbor Gaussian Process Approximation with k-d Trees: Surya Tokdar works on several advanced regression models, each involving estimation of one or more curves and non-linear hyper-surfaces. His work combines mathematical modeling, asymptotic analysis, stochastic computation and methodological development related to a broad range of application areas. One of his focus areas is ultra high dimensional regression smoothing. He has done theoretical work showing that strong structural assumptions are needed for consistent estimation of an unknown function of p predictor variables when only n noisy observations are available, with n ≪ p (Yang et al., 2015). One such modeling assumption takes the unknown function f(x) to decompose as f(x) = f1(x) + · · · + fk(x), where each additive component fj is a sparse function, i.e., it depends on only a small subset of the p predictors. He has been working on a Bayesian estimation model under this framework which assigns each fj a sparse Gaussian process prior (Qamar and Tokdar, 2014); computation proceeds via Markov chain Monte Carlo. He is working on making this estimation framework scalable in n (with p possibly very large). A potential solution combines this estimation method with what is known as a nearest-neighbor Gaussian process approximation (Datta et al., 2016). However, such an approximation does not easily handle the case where the subset of predictors that fj depends on is not known a priori but must be learned on the fly as part of the estimation process. To make this extension he is interested in borrowing ideas from k-d tree fast nearest-neighbor searches. The goals of this project are to (1) implement a reasonably efficient k-d tree based nearest-neighbor GP and (2) investigate its suitability for speeding up sparse GP regression without sacrificing accuracy. If possible, we will then extend both goals to the additive sparse GP regression model suitable for ultra high-dimensional regression. (Tokdar)
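To give applicants a feel for the projects above, a few small illustrative sketches follow; all are toy examples written for this description, not the project teams' code. The spherelets project concerns local geometric approximation of a manifold. A minimal NumPy sketch of the simpler, zeroth-order idea, local PCA (tangent-plane approximation), is below; spherelets replace the plane with a best-fit sphere to capture curvature. The data, neighborhood size, and function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# n noisy samples from a circle embedded in R^3 (a 1-dimensional manifold).
n = 200
t = rng.uniform(0.0, 2.0 * np.pi, n)
X = np.column_stack([np.cos(t), np.sin(t), np.zeros(n)])
X += 0.01 * rng.normal(size=X.shape)

def local_pca_project(X, x0, k=20, d=1):
    """Approximate the manifold near x0 by the best-fit d-dimensional
    tangent plane through the k nearest neighbors (local PCA). Spherelets
    replace this plane with a best-fit sphere, capturing curvature."""
    nbrs = X[np.argsort(np.linalg.norm(X - x0, axis=1))[:k]]
    mu = nbrs.mean(axis=0)
    _, _, Vt = np.linalg.svd(nbrs - mu, full_matrices=False)
    P = Vt[:d].T @ Vt[:d]      # orthogonal projector onto the tangent plane
    return mu + (X - mu) @ P   # project every point onto the local plane

x0 = X[0]
proj = local_pca_project(X, x0)

# Near x0 the flat approximation matches the manifold up to curvature and
# noise; far from x0 it degrades, which is why local methods stitch
# together many local approximations.
near = np.linalg.norm(X - x0, axis=1) < 0.2
err = float(np.linalg.norm(X[near] - proj[near], axis=1).max())
print(round(err, 3))
```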
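For the adaptive confidence interval project, a toy Monte Carlo in the many-normal-means model illustrates the trade-off under study: an empirical-Bayes interval that adapts to the parameters' average magnitude can be much narrower than the standard interval while keeping roughly 95% coverage averaged across parameters. This sketch illustrates the phenomenon only; it is not Hoff's proposed procedure, and all constants are made up.

```python
import numpy as np

rng = np.random.default_rng(1)

# Many-normal-means model: theta_i ~ N(0, tau^2), y_i ~ N(theta_i, 1).
p, tau2 = 5000, 0.5
theta = rng.normal(0.0, np.sqrt(tau2), p)
y = theta + rng.normal(size=p)
z = 1.96  # nominal 95% normal quantile

# Standard non-adaptive interval: y_i +/- 1.96, width 3.92, with 95%
# per-parameter coverage no matter what the thetas are.
std_cover = np.mean(np.abs(y - theta) <= z)
std_width = 2 * z

# Adaptive interval: estimate the parameters' average squared magnitude,
# shrink each estimate toward zero, and use the smaller posterior spread.
tau2_hat = max(np.mean(y ** 2) - 1.0, 0.01)  # method-of-moments estimate
w = tau2_hat / (1.0 + tau2_hat)              # shrinkage weight
ad_cover = np.mean(np.abs(w * y - theta) <= z * np.sqrt(w))
ad_width = 2 * z * np.sqrt(w)

# Narrower intervals, yet roughly 95% coverage *averaged across parameters*
# (individual parameters far from zero are covered less often).
print(round(std_cover, 3), round(ad_cover, 3), round(float(ad_width), 2))
```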
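For the distance matrix project, one elementary way to restore the triangle inequality in a dissimilarity matrix is the shortest-path (Floyd-Warshall) closure, which only ever decreases entries; the metric nearness projection described in the project is more general, finding the closest metric rather than a dominated one. A sketch of the elementary version:

```python
import numpy as np

def metric_repair(D):
    """Restore the triangle inequality by replacing each dissimilarity
    with the shortest-path distance (Floyd-Warshall closure). This only
    ever decreases entries; the metric nearness projection the project
    describes instead finds the *closest* metric."""
    M = D.astype(float).copy()
    for k in range(M.shape[0]):
        # Allow every pairwise path to route through intermediate node k.
        M = np.minimum(M, M[:, [k]] + M[[k], :])
    return M

# D violates the triangle inequality: D[0,2] = 10 > D[0,1] + D[1,2] = 3.
D = np.array([[0.0, 1.0, 10.0],
              [1.0, 0.0, 2.0],
              [10.0, 2.0, 0.0]])
M = metric_repair(D)
print(M[0, 2])  # → 3.0  (routes 0 -> 1 -> 2)
```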
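The peer-influence design project mentions sampling methods for colorings on graphs. As a toy illustration of why colorings arise: under a proper coloring, treating an entire color class yields treated units with no treated neighbors, which helps separate direct from peer effects. The greedy heuristic below is only a stand-in for the sampling-based designs the project would develop.

```python
def greedy_coloring(adj):
    """Greedy proper coloring: each node gets the smallest color not
    already used by a neighbor, so adjacent nodes never share a color.
    (A toy stand-in for the sampling-based designs the project targets.)"""
    colors = {}
    for v in sorted(adj, key=lambda v: -len(adj[v])):  # high degree first
        used = {colors[u] for u in adj[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors

# A small contact network: a 5-cycle.
adj = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
colors = greedy_coloring(adj)

# Treat one whole color class: no treated unit has a treated neighbor,
# so any response in a treated unit's untreated neighbors reflects peer
# exposure rather than their own treatment.
treated = {v for v in colors if colors[v] == 0}
assert all(colors[v] != colors[u] for v in adj for u in adj[v])
print(sorted(treated))  # → [0, 2]
```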
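For the partially observed network project, a small deterministic example shows the basic bias: when some contact edges go unobserved, outbreak sizes computed on the observed network can only understate spread on the true network. Here infection is assumed to cross every edge (transmission probability 1), so an outbreak is simply the seed's connected component; the graph and drop rate are illustrative.

```python
import random
from collections import deque

def outbreak_size(adj, seed):
    """Outbreak size when infection crosses every edge (transmission
    probability 1): the size of the seed's connected component."""
    seen, queue = {seed}, deque([seed])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return len(seen)

def drop_edges(adj, frac, rng):
    """Simulate partial observation: each edge goes unrecorded with
    probability frac."""
    out = {v: [] for v in adj}
    for v in adj:
        for u in adj[v]:
            if v < u and rng.random() > frac:  # visit each edge once
                out[v].append(u)
                out[u].append(v)
    return out

# A path graph 0-1-2-...-49: everyone is reachable from node 0.
n = 50
adj = {v: [u for u in (v - 1, v + 1) if 0 <= u < n] for v in range(n)}

rng = random.Random(0)
full = outbreak_size(adj, 0)
partial = outbreak_size(drop_edges(adj, 0.2, rng), 0)
print(full, partial)  # the observed network understates the true spread
```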
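For the k-d tree nearest-neighbor GP project, a 1-d NumPy sketch of the core approximation: predict at a point by conditioning only on its m nearest inputs rather than all n. A brute-force sort stands in for the k-d tree query the project would implement (the tree turns the O(n) neighbor scan into an O(log n) search); the kernel, length-scale, and data are illustrative, and this is not the Datta et al. (2016) algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

def sqexp(a, b, ls=0.3):
    """Squared-exponential covariance between 1-d input vectors."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * ls ** 2))

# Noisy observations of a smooth function on [0, 1].
n = 300
x = np.sort(rng.uniform(0.0, 1.0, n))
y = np.sin(6 * x) + 0.1 * rng.normal(size=n)
noise = 0.01  # observation noise variance

def gp_mean(xs, ys, xstar):
    """Posterior GP mean at xstar given training inputs xs, outputs ys."""
    K = sqexp(xs, xs) + noise * np.eye(len(xs))
    return (sqexp(np.atleast_1d(xstar), xs) @ np.linalg.solve(K, ys))[0]

xstar = 0.5
mean_full = gp_mean(x, y, xstar)  # conditions on all n points: O(n^3)

# Nearest-neighbor approximation: condition only on the m nearest inputs.
# The argsort below is the O(n) scan a k-d tree query would replace.
m = 50
nbrs = np.argsort(np.abs(x - xstar))[:m]
mean_nn = gp_mean(x[nbrs], y[nbrs], xstar)

print(round(abs(mean_full - mean_nn), 4))  # the two predictions nearly agree
```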


Further Info:
stat.duke.edu
 
Box 90251
Durham, NC 27708-0251

© 2018 MathPrograms.Org, American Mathematical Society. All Rights Reserved.