Duke University, Department of Mathematics

Program ID: Duke-DATA2022 [#1208]
Program Title: Data+ 2022
Program Type: Undergraduate program
Program Location: Durham, North Carolina 27708-0320, United States
Subject Areas: Data Science, Interdisciplinary
Application Deadline: finished (2021/12/08, finished 2022/12/10)
Program Description:    

*** this program has been closed and new applications are no longer accepted. ***

Data+ is a full-time ten-week summer research experience that welcomes undergraduate and master's students interested in exploring new data-driven approaches to interdisciplinary challenges. It is suitable for students from all class years and from all majors.

Students join small project teams (at most 3 undergrads and 1 master's student per team), working alongside other teams in a communal environment. They learn how to marshal, analyze, and visualize data, while gaining broad exposure to the modern world of data science. The projects (see below) come from an extremely diverse set of subject areas. We hope that students will both dig deeply into their specific project and gain a broad picture of most of the skills needed for modern data science.

Participants will receive a $5,000 stipend, out of which they must arrange their own housing and travel. Funding and infrastructure support are provided by a wide range of departments, schools, and initiatives from across Duke University, as well as by outside industry and community partners. Participants may not accept employment or take classes during the program; this requirement is strictly enforced and non-negotiable.

Data+ is typically a program where students have dedicated workspace within Gross Hall at Duke University. Last summer (2021), Data+ ran in a hybrid fashion due to the pandemic, and was quite successful. We are currently awaiting university guidance for Summer 2022, and hope to update this site shortly as to whether Data+ 2022 will be in-person, hybrid, or remote. Until we update further, applicants should assume that in-person participation will be expected.

The program runs from Monday, May 23rd until Friday, July 29th, 2022. Students must be able to participate in the entire program, with no exceptions. The application deadline is Feb. 25, 2022, but we will evaluate applications on a rolling basis, so please get your applications in as soon as you can!

You will find the projects planned for summer 2022 in the numbered list below. Click on the project names to learn more. When you apply, please indicate the projects you are interested in by their associated number on the list below; you may list up to five choices in ranked order of preference. If you are seeing this page in December 2021, please note that more projects may be added in the coming weeks; there will eventually be approximately 30 projects listed!

Due to the nature of the data involved in some of the projects, human subjects research training will be required of all participants and will be provided after admission to the program. With each project, we have attempted to list majors and/or interests that might be the best fit for the project, but these should not be seen as requirements in any way! Quantitative STEM majors like mathematics, computer science, statistics, and electrical engineering are relevant to all.


1) Visualizing Durham Public School Communities Using data from the Durham Compass and the NC School Report Card among many other sources, this team will continue the development of an interactive R Shiny dashboard that permits exploration of school statistical data. The team aims to explore school zones through an asset-based lens in an effort to support ethical and imaginative partnerships between Duke, North Carolina Central, and Durham Public Schools. Education, Public Policy, all social sciences.

2) AdoptADrain A team of students led by Dr. Liz DeMattia (Duke University Marine Lab) and Dr. Rachel Noble (UNC-IMS) will explore ways to mobilize the raw citizen data collected by the Community Science Initiative’s AdoptADrain project into useable information for participants, the public and policy makers. Ecology, Environmental Science, Public Policy.

3) Cancer Control Equity in China A team of students led by faculty from both Duke and Duke Kunshan will synthesize data from a variety of sources to investigate the social determinants of cancers in local areas, examine the impact of personal behaviors (such as diet, sleeping, exercise, smoking) and community characteristics (such as air/water/soil quality, built environment, social norms, discrimination, marginalization) on cancer-related outcomes, and conduct a systematic review and meta-analysis to evaluate the effectiveness of current cancer prevention and control policies and interventions in China. Pre-med, all social sciences, Public Policy.

4) AI-powered Transcription of Historical Manuscripts A team of students will collaborate with Duke librarians to use AI-powered Handwriting Text Recognition (HTR) tools to transform thousands of pages of handwritten text into machine-readable data. Using a large dataset of digitized 19th- and early 20th-century women’s travel diaries held in the Rubenstein Library, students will test and evaluate various HTR technologies, document methods and constraints for extracting text from historical manuscripts, and build an HTR toolset and proof-of-concept interface that the library can build on for future projects. Library Science, History.

5) Data+Fest A team of students led by Statistical Science professors Mine Çetinkaya-Rundel and Maria Tackett will pull together all data associated with DataFest for the purpose of retrospective archiving and documentation as well as creating a valuable resource that can serve as the one-stop-shop for students interested in participating in DataFest in the future.

6) Policy Surveillance of UHC Financing A team of students led by researchers at the Duke Center for Policy Impact in Global Health (CPIGH) will create a user-friendly interactive visualization tool to track the evolution of Universal Health Coverage (UHC) financing policies in low- and middle-income countries. This publicly available data visualization will give policymakers and researchers a way to conduct quick comparative cross-country and longitudinal analyses to understand the impact of different policy experiments on UHC progress in different jurisdictions. Global Health, Public Policy, Finance.

7) A City and its River A team of students led by researchers in the Duke River Center will develop a publicly available and accessible website to serve as a portal to explore diverse and extensive datasets detailing the quality of waterways and the effectiveness of management efforts to reduce risks associated with chemical contaminants, stormwater flow, and flooding. The website will be developed in collaboration with our community collaborator, the Ellerbe Creek Watershed Association, and Data+ teammates will have the unique opportunity to engage with a separate NSF-funded workshop focused on teaching tools for visualizing these datasets. Ecology, Environmental Science.

8) Finding Medieval Irish People using Drones A team of students led by Co-Principal Investigators Dr Jenny Immich and Dr Vicky McAlister will develop a geospatial methodology to automate the analysis of data from small unmanned aerial vehicles (SUAV), with the aim of identifying the homes of ordinary medieval people within the modern Irish landscape. Known as aerial archaeology, this work investigates archaeological remnants on the surface of the earth without excavation. History, Archaeology.

9) Spatial Patterns in Bacterial Consortia A team of students led by Biomedical Engineering professor Lingchong You will predict pattern formation of bacterial colonies by integrating experimental results with both mechanistic modelling and machine learning methods. Students will contribute to a method for controlling the outcome of colony spatial patterning, which is an important challenge facing the field of synthetic biology. Biology, Biomedical Engineering.

10) This project has been postponed until 2023.

11) Ethical Consumption before Capitalism (year 3) A team of students will extend the analysis of consumption performed in 2020 and 2021 to include the data collected and analyzed by the Bass Connections team from the rare materials archives of Duke’s Rubenstein library and of the University of Alabama Birmingham’s Lister Hill library. These materials include Medieval and Early Modern medical manuals, seventeenth-century mercantile theory texts, as well as treatises and legislative documents on monopolies, corporations, and trading companies. Humanities, History.

12) Gamifying Risk Identification A student team led by researchers at Duke Surgery and Global Health Institute will further develop a computer application, the Alcohol Use Behavioral Phenotyping Test (AUBPT), that can help predict alcohol use and alcohol use disorder risks based on personal characteristics and behavioral performance on Research Domain Criteria paradigms/games. Students will build multi-tasking simulated AI agents using computational neuroscience and deep learning methods. Global Health, pre-med.

13) Wetland Carbon Emissions A team of students led by researchers in the Hydroclimatological Lab will comprehensively quantify wetland carbon emissions across the entire Southeast (SE) US using machine learning techniques and various climate datasets, including in situ measurements, remote sensing data, climate observations, and hydrological model (PIHM-Wetland) outputs. Environmental Science, Ecology.

14) Infection Detection with Wearables A team of students led by researchers in the BIG IDEAs Lab will work to create a cloud-based infection detection platform that populates and translates wearable data from a variety of sources. The ultimate goal of this work is to inform wearable device users of changes in their health condition before more serious symptoms occur. pre-Health, Biomedical Engineering.

15) Small Town Policing Accountability A team of students will analyze and identify patterns in police behavior in the town, work with journalists and other activists to use the data to develop action plans to address problems and enact public safety reform, and help to build an abstracted process and suite of tools that can be used by other small towns and municipalities to analyze their own data and empower police reform efforts across the nation. Public Policy.

16) Earthquake Early Warning in Kathmandu A team of students led by Prof. Henri Gavin will develop AI models for on-site earthquake early warning, in which sensors at a site provide warnings at that site. The Data+ project will integrate into ongoing work on geophone sensors, IoT microcontrollers, and networking. Earth Science.

17) Understanding the Course of Disease A team of students led by Professor Anru Zhang (Duke Biostatistics & Bioinformatics, Computer Science, Mathematics, and Statistical Science) will develop methods to investigate the courses of complex diseases through electronic health records. The team will apply tensor methods to identify key features to register the patient's timeline. This work will provide a basis for researchers at Duke and elsewhere to sufficiently utilize the high-dimensional and longitudinal information in the electronic health record data. Biostatistics, pre-health.

18) Brain-computer Interface A team of researchers associated with the Applied Machine Learning Lab in Duke’s ECE department will lead a team of students in developing novel machine learning techniques for improving brain-computer interfaces (BCIs) using electroencephalography (EEG) data. Students will learn how to pre-process EEG data, extract EEG features, and train machine learning algorithms for character selection in a spelling interface that allows “locked-in” individuals, like Stephen Hawking, to communicate with the outside world. Biomedical Engineering.

19) Mapping Trajectories of Duke Doctoral Students (year 2) A team of students led by Courtnea Rainey, David Jamieson-Drake, and Edward Balleisen will explore survey data from completers of the PhD and Duke PhD alumni to establish important correlations, document key patterns and longitudinal trends, and develop visualizations that can inform institutional decision-making. In addition to updating the work of last year’s Data+ team, this year’s group will use text mining and topic modeling to draw out key themes from the textual answers to free response questions. Education.

20) Tracking Climate Change with Satellites and AI A student team working with the Energy Data Analytics Lab will work to democratize access to data relevant to climate change mitigation and adaptation planning as well as the underlying models to acquire those data. This project will work towards building the first “foundation model” specifically for remote sensing imagery for the purpose of extracting climate change relevant content at scale to enable near real-time tracking of climate causes and impacts. Energy and the Environment, Environmental Science.

21) Mental Health and the Justice System A team of students led by researchers Nicole Schramm-Sapyta (Duke Institute for Brain Sciences) and Maria Tackett (Statistical Science) will explore the impacts of community health services and local laws and policies on the justice-involved population in Durham. The team will create a public-facing interactive timeline of the implementation of mental health services, drug laws, and court policies in Durham. Law, Public Policy.

22) Plus Programs Data Exploration Data+ has been in operation for 8 years, and several other linked programs have started up since, including Code+, which focuses on app development, and CS+, which focuses on team-based research in Computer Science. A team of students led by John Haws (OIT) will collaborate with Plus Programs administrators to review the data that has been gathered on students since each Plus Program began. The team will then make recommendations and create a single data structure and dashboard for Plus Programs that can be used for years to come to report on participants, suggest program improvements, and develop alumni outreach opportunities. Education.

23) Modeling Microbial Growth An explosion of data has resulted from tracking the growth of bacteria in high throughput devices. These data were generated to understand how microbes grow. Better models that fit and predict these growth data are needed for better treatment of pathogenic bacterial infections, food safety, beer and bread fermentation, and understanding stress resilience of the microbiome. The goal of this Data+ project is to apply and extend custom analytics solutions to understand and predict microbial population growth. Biology, Biomedical Engineering.

24) River Ice Timing and Duration Duke Data+ students, in collaboration with Dr. Emily Bernhardt (faculty advisor) and Audrey Thellman (graduate student), will evaluate how changing ice and snow conditions are impacting river ecosystems through classified ice imagery. The Data+ team will modularize and visualize a classification pipeline to increase accessibility of a data product. Ecology, Environmental Science.

25) Electric Consumption Profiles and Data Privacy A team of students led by researchers at Duke and abroad will develop and evaluate machine learning solutions to model behavioral patterns of electric use, emphasizing data privacy. Data collected in different parts of the world will be analyzed to understand the electric patterns that characterize various appliances and how that information can model users' consumption profiles and prevent fraud.

26) Learning to Communicate A team of students will develop machine learning (ML) algorithms that take advantage of special features of new waveforms proposed for 6G wireless communication. The team will be highly interdisciplinary, and will include students from Virginia Tech familiar with wireless communication, as well as students interested in machine learning. Students will design experiments, collect data, and analyze over-the-air performance, some working onsite using Virginia Tech’s CORNET testbed (https://cornet.wireless.vt.edu/), some virtually using CORNET-based remote lab experiments.

27) Community Safety in Durham A team of students will build data science tools helpful to the City of Durham's newly formed Community Safety Department, whose mission is to identify, implement, and evaluate new approaches to enhance public safety that may not involve a law enforcement response or the criminal justice system. Public Policy, all social sciences.

28) Farm to Fork (Year 2) A team of students collaborating with Duke School of Medicine's Root Causes Fresh Produce Program, community members, and physicians throughout the Duke Health network will help integrate data from food deliveries to Duke Health patients with patient health record data and other available data sources to create a dashboard that can analyze, predict, and manage the Root Causes' "Food as Medicine" program. Public Policy, Public Health, all social sciences.

29) Melting Ice, Shifting Krill A team of students led by researchers at the Duke Marine Lab will explore the changing distribution of krill around the Antarctic Peninsula. Using data from acoustic zooplankton surveys, students will create maps and other products to visualize the spatial distribution of krill over the past 20 summers, then create metrics that allow us to quantify the way that krill distribution around the Antarctic Peninsula is changing as the climate shifts and ice melts. Environmental Science.

Application Materials Required:
Submit the following items online at this website to complete your application:
And anything else requested in the program description.

Further Info:
Mathematics Department
Duke University, Box 90320
Durham, NC 27708-0320