Duke University, Department of Mathematics

Program ID: Duke-DATA2021 [#1032]
Program Title: Data+ 2021
Program Type: Undergraduate program
Program Location: Durham, North Carolina 27708-0320, United States
Subject Areas: Data Science, Interdisciplinary
Application Deadline: 2021/02/26 11:59PM (posted 2020/12/08)
Program Description:    

Data+ is a full-time, ten-week summer research experience that welcomes Duke undergraduate and master's students interested in exploring new data-driven approaches to interdisciplinary challenges. It is suitable for students from all class years and from all majors.

Students join small project teams (at most 3 undergrads and 1 master's student per team), working alongside other teams in a communal environment. They learn how to marshal, analyze, and visualize data, while gaining broad exposure to the modern world of data science. The projects (see below) come from an extremely diverse set of subject areas. It is our hope that students will be able to both dig deeply into their specific project and get a broad picture of most of the skills needed for modern data science.

Participants will receive a $5,000 stipend, out of which they must arrange their own housing and travel. Funding and infrastructure support are provided by a wide range of departments, schools, and initiatives from across Duke University, as well as by outside industry and community partners. Participants may not accept employment or take classes during the program; this requirement is strictly enforced and non-negotiable.

Data+ typically provides students with dedicated workspace within Gross Hall at Duke University. Last summer (2020), Data+ ran entirely remotely due to the pandemic and was quite successful. We are currently awaiting university guidance for Summer 2021, and hope to update this site shortly as to whether Data+ 2021 will be in-person or remote. In either case, it will happen!

The program runs from Monday, June 1st until Friday, August 6th, 2021. Students must be in the program for its full duration, with no exceptions. The application deadline is Feb. 26, 2021, but we will evaluate applications on a rolling basis, so please get your applications in as soon as you can!

You will find the projects planned for summer 2021 in the numbered list below. Click on the project names to learn more. Please indicate the numbers of the projects you choose when you apply; you may list up to three choices in ranked order of preference. If you are seeing this page in December 2020, please note that more projects may be added in the coming weeks; there will eventually be approximately 30 projects listed!

Due to the nature of the data involved in some of the projects, human subjects research training will be required of all participants and will be provided after admission to the program. With each project, we have attempted to list majors and/or interests that are likely to be a good fit for the project, but these should not be seen as requirements in any way! Quantitative STEM majors like mathematics, computer science, statistics, and electrical engineering are relevant to all.


1) Rubenstein Library Card Catalog Using the digitized cards from the David M. Rubenstein Rare Book and Manuscript Library’s old card catalogs, a team of students will explore extracting structured data to develop searchable and sortable descriptions of manuscript and archival collections. They will use textual analysis tools and natural language processing techniques to prepare an indexed digital collection of structured card catalog metadata for publication in Duke’s Digital Repository. The team will then develop ways to visualize and search this dataset based on different research topics or terms. Library science, Digital Humanities.

2) Rainforest XPRIZE A team of students will collaborate with the Duke Rainforest XPRIZE team led by Martin Brooke (ECE) and Stuart Pimm (Nicholas School) to use machine learning and image processing to automate conversion of the large amount of biodiversity data the team will generate into images and sound bites that will be suitable for iNaturalist community identification of individual species. Energy and Environment, Biology.

3) Mental Health and the Justice System in Durham County A team of students will use data from the Durham County Detention Facility and Duke Health System to examine patterns of health-service utilization in the incarcerated population, including those with and without mental illness diagnoses. The team will use statistical methods such as longitudinal modeling to analyze the effects of the many interventions recently implemented in Durham County, including increased mental health services and medication-assisted treatment for addiction, among others. Public Policy, all social sciences.

4) Mapping the Trajectories of Duke Doctoral Students A team of students led by Francisco Ramos and Edward Balleisen will explore survey data from degree completers and alumni to establish important correlations, document key patterns and longitudinal trends, and develop visualizations that can inform institutional decision-making. Combining statistics with data science to analyze large datasets, this research will provide a deep understanding of how doctoral training and education can be improved across Duke University. Education.

5) The Impact of Virtualized Museum Collections A team of students led by faculty of biology and directors of the largest virtual museum in the world (a Duke-based web repository called MorphoSource) will develop tools for assessing the societal and scholarly impact and importance of museum specimens made available as 3D digital resources over the web. The project provides an opportunity to contribute to a professional resource utilized by almost a thousand museums around the world and to set the trajectory of online museum research practices for years to come. Biology, Art History.

6) Modeling Microbial Growth and Resilience A team of students will apply and extend custom analytics solutions to discover how life remains resilient in extreme environments. The team will work with large datasets generated by the Schmid lab at Duke that track how microbes grow and change their gene expression when faced with extreme stress. Biology.

7) Visualizing Durham Public School Communities A team of students will use existing data sets combined with historic and contemporary city context to better understand the complex and nuanced details of different school communities. This project is in collaboration with an inter-institutional Bass Connections team from Duke and North Carolina Central University that is committed to developing more responsible and imaginative ways of partnering with Durham Public Schools. Public Policy, Education, all Social Sciences.

8) From Farm to Fork The Root Causes Fresh Produce Program, led by an interdisciplinary team of graduate and undergraduate students at Duke and UNC, leverages the power of student volunteers and community partnerships to address food insecurity in our community by delivering fresh and locally sourced produce to the doorsteps of patients in the Duke, Lincoln, and Samaritan Community health systems. A team of students will perform data visualization and analysis to improve the program’s delivery route optimization software, improve the crop assessment and inventory management tools of small farmers and wholesale markets, and develop a dashboard integrating client location data and GIS information to identify location-specific service providers. Nutrition Science, Public Policy, pre-Health, all social sciences.

9) Web Attack Features A team of students led by members of the Duke IT Security Office will develop machine learning features that can be used to identify previously unknown web attacks. The results from this project may be incorporated into Duke's IT security infrastructure to help protect Duke's network in the future.

10) CovIdentify A team of students led by researchers in the BIG IDEAS lab in the biomedical engineering department will build and validate machine learning techniques to classify longitudinal illness trajectories of individuals with infections such as COVID-19 or flu. Students will construct a pipeline to query survey and wearable device data from our newly constructed database in the Microsoft Azure environment and modify existing machine learning and deep learning algorithms for wearables data analysis. pre-Health, Biomedical Engineering.

11) Predicting Baseball Athletic Performance A team of students will use evaluation data collected on baseball athletes to make predictions about their on-field performance in competitive games. The final product will be an application that uses an athlete's assessment results to produce performance summary graphs for the individual compared to other athletes and inferential models for the relationships between assessments and performance. Sports Science.

12) Understanding Voting Patterns and Interactions with Gerrymandering This project seeks to understand voting patterns and their effect on election outcomes across geography and time. This involves examining precinct-level votes across a large array of historical elections (including 2020 and as far back as 2012). Students will employ a variety of techniques in dimension reduction to uncover large-scale voting patterns and investigate the evolution of voting patterns across the decade. This work will help answer questions like "did the suburbs vote with the cities?" The students will use voting patterns to explore the "stability" of gerrymandering as they compare election outcomes under certain maps with outcomes under large ensembles of non-partisan maps. Political Science, Public Policy.

13) Agile Waveform Design A team of students will develop benchmark data pertaining to network performance in the presence of intentional and unintentional degradation, ranging from sensor failure and additive noise to adversarial interference. The students will analyze the baseline performance of the network, and measure performance of the degraded network with and without the inclusion of techniques that shore up robustness.

14) Dashboard for Public Investments A team of students will use visual reporting tools, such as Tableau or Power BI, to create a dynamic dashboard that will enable investment professionals at Duke University Management Company (DUMAC) to review and better understand fund managers’ exposures and positioning across various dimensions. Students will collaborate with teams at DUMAC to develop an intuitive, visual dashboard to help the investment team review individual and portfolio-level exposures across numerous asset classes. This dashboard will be connected to DUMAC’s data warehouse, which refreshes on a daily basis as new data comes in for DUMAC’s public market accounts. Finance, Economics.

15) Data Visualization for Family Health A team of students led by researchers at Duke University and UC Davis will visualize data on child and family health from Yolo County, California. The visualization dashboard will be used by academic researchers and community service providers in addition to Yolo County community members. The overall goal of the research is to reduce health disparities through strengthening academic-community partnerships. Public Policy, pre-Health, all Social Sciences.

16) NLP in the Investment Office A team of students will explore how natural language processing and other data-focused tools that direct the input of files and depository flow can be used to support the investment office at Duke University Management Company (DUMAC). Students will collaborate with investment professionals to investigate and potentially develop tools to facilitate intuitive keyword search and context extraction from various document sources, which would support investment analysis and the legal review process. Finance, Economics.

17) Constructing Utopias in Restoration London A team of students led by Nicholas Smolenski (PhD Candidate, Musicology) and Dr. Astrid Giugni (Lecturing Fellow, English) will explore how London was rebuilt into a utopia; by employing topic modeling and applying the resulting lexicon to seventeenth-century architectural sketches, students will demonstrate how a language of progress became inextricably linked to its own image while also exposing the paradoxes entrenched in utopic representation. This project will additionally show how its framework can apply to current political discourse, as the tearing down of statues and monuments over the past three years has highlighted inescapable tensions between a governing power, a nation’s history, and its people. Digital Humanities, History.

18) Pivers Island Coastal Observatory A team of students will perform analyses of a 10+ year oceanographic time‐series dataset sampled near the Duke Marine Laboratory. The team will focus on clustering, classification, and forecasting towards the interpretation of variability and trends of key variables measured by the Pivers Island Coastal Observatory. The long-term goal of this project is to understand the proximal drivers of variability in coastal marine ecosystems as well as longer-term changes associated with climate change. Ecology and the Environment, Marine Biology.

19) Creating Artificial Worlds for Energy Access Along with the Energy Data Analytics Lab and the Sustainable Energy Transitions Initiative, a team of students will explore how machine learning can be developed to better identify and characterize energy infrastructure and scale up its application across geographies. This research will increase the speed and scale of assessment towards an automated, global assessment of energy infrastructure. Energy and the Environment.

20) Racial Disparities in the Child Welfare System A team of students will organize data and evaluate systems to understand disparities affecting children of color within the child welfare system. The team will have the opportunity to collaborate with professionals in the Durham County Department of Social Services Child & Family Services. Public Policy, all social sciences.

21) Detecting and Matching Similar Networks A team of students will implement our recently developed graph matching and similarity scoring algorithms, test their empirical performance in real datasets, and further improve the algorithm design by incorporating domain knowledge. Multiple datasets will be explored, such as the Facebook ego network, Covid PPI network, Isobase PPI network, computer vision datasets, and Wikipedia article networks. A visualization app will be built to help researchers and practitioners apply graph matching in relevant applications.

22) Ethical Consumption before Capitalism A team of students will analyze approximately 60,000 Medieval and Renaissance digitized texts by performing topic modelling, which will allow them to track associations between the languages of consumer culture and ethical practice. The team will organize their results in a series of visualizations that trace the development of the ethics of ‘consumerism’ in terms of frequency of use, timeline of relevant events, authorship, location of publication and name of printer/printing press, and genres. Digital Humanities, English, History.

Application Materials Required:
Submit the following items online at this website to complete your application:
And anything else requested in the program description.

Further Info:
Mathematics Department
Duke University, Box 90320
Durham, NC 27708-0320

© 2021 MathPrograms.Org, American Mathematical Society. All Rights Reserved.