Duke University, Department of Mathematics

Program ID: Duke-DATA2020 [#898]
Program Title: Data+ 2020
Program Type: Undergraduate program
Program Location: Durham, North Carolina 27708-0320, United States
Subject Areas: Data Science, Interdisciplinary
Application Deadline: 2020/02/27 (posted 2019/12/05, updated 2019/12/06)
Program Description:    

*** The list date or deadline for this program has passed, and no new applications will be accepted. ***

Data+ is a full-time, ten-week summer research experience that welcomes Duke undergraduate and master's students interested in exploring new data-driven approaches to interdisciplinary challenges. It is suitable for students from all class years and all majors.

Students join small project teams (at most three undergraduates and one master's student per team), working alongside other teams in a communal environment. They learn how to marshal, analyze, and visualize data, while gaining broad exposure to the modern world of data science. The projects (see below) come from an extremely diverse set of subject areas. It is our hope that students will be able both to work deeply on their specific project and to get a broad picture of most of the skills needed for modern data science.

Participants will receive a $5,000 stipend, out of which they must arrange their own housing and travel. Funding and infrastructure support are provided by a wide range of departments, schools, and initiatives from across Duke University, as well as by outside industry and community partners. Participants may not accept employment or take classes during the program; this requirement is strictly enforced and non-negotiable.

The program runs from Tuesday, May 26th until Friday, July 31st, 2020. Students must be in residence for the full ten weeks, with no exceptions. The application deadline is Feb. 27, 2020, but we will evaluate applications on a rolling basis, so please get your applications in as soon as you can!

You will find the projects planned for summer 2020 in the numbered list below. Please indicate the numbers of the projects you choose when you apply; you may list up to three choices in ranked order of preference. If you are seeing this page in December 2019, please note that more projects may be added in the coming weeks; eventually approximately 30 projects will be listed!

Due to the nature of the data involved in some of the projects, human subjects research training will be required of all participants and will be provided after admission to the program. For each project, we have attempted to list the majors and/or interests of students who might be most interested in it, but these should not be seen as requirements in any way! Quantitative STEM majors such as mathematics, computer science, statistics, and electrical engineering are relevant to all projects.


1) Disease Emergence and Richness in Primates: A team of students led by the Nunn lab and its collaborators will investigate the ecological and behavioral factors that determine parasitism in different species of primates. Using publicly available data and evolutionary trees, students will study parasitism by building a network of primate-parasite relationships. This network will then be used to infer the ecological and behavioral characteristics that best predict parasitism. Biology, Biostatistics, Environmental Science, pre-Health.

2) When Black Stories Go Global: A team of students led by Humanities Unbounded Fellow Eva Michelle Wheeler will explore how culturally-bound language in African-American literature and film is rendered for international audiences and will map where and into which languages these translations are occurring. Students will use a reference dataset to build and annotate a translation corpus, explore the lexical choices and translation strategies employed by translators, and conduct a macro-level analysis of the geographic and linguistic spread of these types of translations. English, Romance Studies, African and African-American Studies, Film and Visual Studies.

3) Boundary Update Tool for Utility Services: A team of students led by researchers from the Internet of Water project at the Nicholas Institute will develop an online tool that allows local water systems to update and verify their service boundaries while maintaining data security and functionality for state regulators. This tool will improve system boundary data that are used for planning and decision-making purposes. Additionally, the tool may include functionality for basic spatial analyses such as overlaying boundaries on sociodemographic, economic, and environmental data. Environmental Science, Ecology.

4) Human Activity Recognition: While human activity recognition (HAR) is traditionally performed using accelerometry data, a team of students led by researchers in the BIG IDEAS Lab will explore HAR with physiological data from wrist wearables. Using deep learning methods, students will extract features from wearable sensor data to classify human activity. The student team will develop a reproducible machine learning model that will be integrated into the BIG IDEAS Lab Digital Biomarker Discovery Pipeline (DBDP), a source of code for researchers and clinicians developing digital biomarkers from wearable sensors and mobile health technologies. Biomedical Engineering, pre-Health.

5) Mechanical Failures at Sea: A team of students will analyze sensor data from a shipping fleet to develop predictive models that prevent mechanical failures at sea and identify the optimal time for replacement. They will have the opportunity to collaborate closely with analytics professionals from Fleet Management Limited, the world’s third-largest ship management company, which looks after 520+ vessels on behalf of owners. Mechanical Engineering.

6) Taking Electrification on the Road: A team of students led by researchers in the Energy Initiative and the Energy Access Project will explore historical data on the U.S. Electric Farm Equipment (EFE) demonstration show, which ran between 1939 and 1941 and aimed to increase the use of electricity in rural areas. Students will compile data collected by the Rural Electrification Administration into machine-readable form and then use those data to explore and visualize the EFE’s impact. Environmental Science, History.

7) On Being a Blue Devil: A team of students, led by University Archivist Valerie Gillispie and Professor Don Taylor, will take a closer look at how Duke's student body has been transformed into a coeducational population drawn from around the world and enrolled in ten different schools. Students will seek to transform digital and historical data into a dynamic visual display that allows viewers to examine changes in the student body over time. History, all Social Sciences, Education.

8) Network Visualization of Foot Traffic Patterns: A team of students led by data scientists and engineers from the Office of Information Technology will work to visualize foot traffic patterns in the Bryan Center. Students will be given a large Wi-Fi dataset, which they will analyze to gain insight into how the Bryan Center is used over various time periods. The work will help to identify areas of the center that experience high wear and tear.

9) Uncovering Latinx Social History: A team of students led by History Professor Cecilia Márquez will use census data to understand the long history of Latinxs in the U.S. South. Despite a growing focus among historians and social scientists on the historical and contemporary Latinx South, there has not yet been a thorough data analysis of the historical presence of Latinxs in the South. The Data+ team will search the U.S. Federal Census, immigration records, and marriage records to determine where Latinxs lived in the U.S. South over the course of the late nineteenth and early twentieth centuries. History, Latinx Studies, all Social Sciences.

10) Protecting American Investors: Promoters of modern American capitalism have long encouraged individuals, including those of modest means, to build their wealth through investments. But how have ordinary investors learned about the opportunities and risks of putting their savings to work on Wall Street? A team of students working with History professor Ed Balleisen will delve into the evolving nature of investment advice from the early twentieth century up to the start of the internet age. Creating datasets from financial advice columns in large-circulation American newspapers and magazines, they will use text mining and sentiment analysis to see how advice changed in response to the business cycle; the emergence of new types of investments, financial products, and investors; and the evolution of financial regulation. Economics, History, Finance, Public Policy.

11) Predicting Blindness: A team of students led by researchers in the Duke Eye Center and Department of Statistical Science will develop statistical models to assess the risk of legal blindness in glaucoma patients using electronic health records (EHR) from Duke Health. Students will focus on identifying risk factors relevant locally to the Durham County patient population and will enrich the available EHR data with detailed social and environmental data using the Durham Neighborhood Compass. A priority of the research will be to develop an app that makes the prediction model accessible, so that real-time decisions about blindness-related medical care can be made. pre-Health, Biomedical Engineering.

12) Forecasting Campus Energy Usage: A team of students led by the Data and Analytics Practice at OIT will develop a robust model for forecasting energy usage at different facilities on campus. Students will explore a wide range of real-world time-series data challenges, from anomaly detection and handling to benchmarking traditional statistical and modern machine learning forecasting models. This work will enable several critical analyses that help Duke Facilities Management optimize its operations and significantly reduce costs. Energy, Economics, Environmental Engineering.

13) Finding Space Junk: A team of students led by Physics professors Dan Scolnic, Michael Troxel, and Chris Walter will build their own algorithms that use images taken as part of the Dark Energy Survey, one of the largest cosmological surveys, to learn more about all the things we find in space that we aren’t looking for. As surveys attempt to measure increasingly difficult and subtle features of the universe, like the imprint of dark energy and dark matter, identifying every kind of artifact will be critical. Physics.

14) Race and Housing in Durham: A team of students led by professor of Public Policy William Darity Jr. will chart the evolution of racial inequality in housing in a subset of Durham’s neighborhoods over the course of the 20th century, using census data and Durham County housing records. Public Policy, all Social Sciences, History.

15) Mental Health and the Justice System: Mental illness is overrepresented in the incarcerated population and is correlated with higher rates of re-arrest. In recent years, Durham County has taken many steps to break this unfortunate cycle, including helping incarcerated people engage with mental health treatment resources. This team will consult with collaborators at the Durham County Detention Facility, the Criminal Justice Resource Center, and the Duke Health System to determine whether recently incarcerated people in Durham are using the resources available to them and whether outcomes are improving. Public Policy, all Social Sciences.

16) Environmental Public Health Tracking: The Data+ student team, led by epidemiologist Mike Dolan Fliss and colleagues from the NC Division of Public Health (DPH), will build a pilot Environmental Public Health Tracking (EPHT) tool for NC. Students will analyze and combine spatial health, environmental, and point-source data from NC DPH and other partners, then co-design and prototype visual dashboards for public use. Public Policy, Public Health, Environmental Science, all Social Sciences.

17) American Predatory Lending: A team of students led by researchers in the Global Financial Markets Center at Duke Law will carry forward the work of a 2019-20 Bass Connections team to better understand the state of the home mortgage market leading up to the financial crisis. The Data+ team will expand the scope of that analysis beyond North Carolina and begin developing a complete quantitative portrait of the mortgage market in Sun Belt states. Economics, Finance, Public Policy, all Social Sciences.

18) Predicting Baseball Performance from Vision: A team of students led by researchers from the Duke Human Performance Optimization Lab (OptiLab) and the Michael W. Krzyzewski Human Performance Laboratory (K-Lab) will develop an analytic and report-generating application to test whether baseline vision and movement screening measures can predict on-field baseball performance in a cohort of nearly 300 athletes who participated in the USA Baseball Prospect Development Pipeline (PDP). Sports Science, pre-Health.

19) For Love of Greed: A team of students led by Dr. Astrid Giugni (Duke, English and ISS) and Dr. Jessica Hines (Birmingham-Southern College, English) will address the question of how to trace concepts that developed slowly alongside changing economic and social realities. The team will track a set of related terms (such as consumer, greed, speculation, and profit) in order to begin assessing how the ethical, political, and economic language of goods-consumption changed around the time of the Protestant Reformation and the rise of the market economy. English, History.

20) Deep Learning for Rare Energy Infrastructures: A team of students led by researchers in the Energy Data Analytics Lab and Electrical & Computer Engineering, with participation from the Energy Access Project, will investigate how synthetically generated satellite imagery can be used to improve the identification of energy infrastructure in satellite imagery. Energy, Environmental Science.

21) Linking Urban Land Use to Aquatic Metabolism Regimes: A team of students led by researchers at the Duke River Center will develop tools to link water quality and aquatic ecosystem condition to urban and other land uses by combining existing geospatial data, including land cover maps, LiDAR, and remotely-sensed images, with time series of estimates of ecosystem metabolism found within the StreamPULSE data portal. Environmental Science, Public Policy.

22) Healthy Eating in Young Children: A team of students led by eating disorders expert Nancy Zucker and engineering professor Guillermo Sapiro will develop multimodal computational tools to help improve the nutritional status and food enjoyment of young children with Avoidant/Restrictive Food Intake Disorder (ARFID), a condition in which children do not eat enough food, or eat too narrow a variety of foods, to a degree that impairs functioning. Students will analyze facial affect and behavior in videos of children trying new foods and will derive sensory profiles based on children’s patterns of food acceptance. pre-Health, Education, Biomedical Engineering.

23) Security Threat Hunting: Over the past several months, Duke's Information Technology Security Office (ITSO) has begun applying the MITRE ATT&CK framework as the basis for how the team collects, assesses, identifies, and responds to attacker tactics, techniques, and procedures (TTPs). As the team rolls out new processes to "hunt" for attackers, a model that shifts the team's primary functions from defensive/reactive to offensive/proactive, it will need to incorporate real-time and longitudinal data analytics as well as automated responses based on those analyses.

24) Churn Models for Duke Athletics: One of the major challenges in retaining season ticket holders is understanding which of them are most likely to churn, i.e., not renew their tickets. A team of students, in conjunction with Duke’s Office of Information Technology and Duke Athletics, will use data from Duke’s ticketing system to build a set of models that seek to predict the profiles and timing of non-renewal among season ticket holders and annual donors.

25) AI in the Investment Office: A team of students will explore how artificial intelligence tools can be used to support the investment office at the Duke University Management Company (DUMAC). In particular, the team will investigate natural language processing and other AI methods for supporting the legal review process, investment analysis, and financial reporting. Finance, pre-Law.

26) Retention of College Women in Tech: A team of students will explore ways in which data science can help support the mission of Rewriting the Code, a national non-profit organization dedicated to empowering a community of college women with a passion for technology. In particular, students will perform statistical analyses of past survey data, build interactive dashboards that help visualize trends in student experience, and help design future survey questions.

27) Computational Approaches to the History of Cartography: A team of students will explore new ways of reading pre-modern maps and perspectival views through image tagging, annotation, and 3D modeling. Each student will build a typology of icons found in these early maps (for example, houses, churches, roads, and rivers). By extracting, modeling, and cataloging these features, the team will create a library of 2D and 3D objects that will be used (a) to identify patterns in how space and power are represented across these maps and (b) to create a model for “experiencing” these maps in 3D using the Unity game engine. History, Game Design.

28) Neural Network-Based Self-Adjusting Computational Processors: A team of students, led by Electrical and Computer Engineering professor Vahid Tarokh, will develop methods to improve the efficiency of information processing by making adaptive decisions based on the structure of newly arriving data. Students will have the opportunity to explore data-driven adaptive strategies based on neural networks and statistical learning models, investigate trade-offs between error threshold and computational complexity for various fundamental operations, and implement software prototypes.

