Data+ is a full-time ten-week summer research experience that welcomes Duke undergraduate and master's students interested in exploring new data-driven approaches to interdisciplinary challenges.
Students join small project teams (at most 3 undergrads and 1 master's student per team), working alongside other teams in a communal environment. They learn how to marshal, analyze, and visualize data, while gaining broad exposure to the modern world of data science. The projects (see below) come from an extremely diverse set of subject areas. It is our hope that students will be able to both dig deeply into their specific project and get a broad picture of most of the skills needed for modern data science.
Participants will receive a $5,000 stipend, out of which they must arrange their own housing and travel. Funding and infrastructure support are provided by a wide range of departments, schools, and initiatives from across Duke University, as well as by outside industry and community partners. Participants may not accept employment or take classes during the program.
The program runs from May 22 until July 28, 2017. The application deadline is Feb. 20, 2017, but we will evaluate applications on a rolling basis, so please get your applications in as soon as you can!
You will find the projects planned for summer 2017 in the numbered list below. Click on the project names to learn more. Please indicate the number of the projects you choose when you apply; you may list up to three choices in ranked order of preference.
For some projects, human subjects research training may be required and will be provided in advance. With each project, we have attempted to list potential majors and/or interests that might be best suited for the project, but these should not be seen as requirements in any way! Quantitative STEM majors like mathematics, computer science, statistics, and electrical engineering are relevant to all.
PROJECT OFFERINGS FOR SUMMER 2017
1) Data Viz for Long-term Ecological Research and Curricula A team of students led by Biology Professor Emily Bernhardt will develop interactive R Shiny data visualization apps that will allow students and researchers to understand sixty years of ecological data collected at Hubbard Brook Experimental Forest. Students will receive feedback from Gene Likens, co-founder of the Hubbard Brook Ecosystem Study. Biology, Environmental Policy, Chemistry, all quantitative STEM.
2) Electricity Access in Developing Countries from Aerial Imagery A team of students led by researchers in the Energy Data Analytics Lab and the Sustainable Energy Transitions Initiative will develop means to evaluate electricity access in developing countries through machine learning techniques applied to aerial imagery data. Students will identify features of satellite imagery that can be used to demonstrate whether a community has access to electricity, create a reference dataset of key features, and apply machine learning methods to a large dataset. Partially sponsored by the Duke Energy Initiative. Environmental Science and Policy, and all quantitative STEM.
3) Mapping the Ocean Floor A team of students led by ECE Professor Martin Brooke will use ambitious data-processing strategies with the hope of providing a low-cost method of mapping the ocean floor. Most of the work will be done in MATLAB on simulated data generated by acoustic methods using different ocean floor models, but the approaches will also be validated using real SONAR data collected by the Marine Lab in Beaufort, NC. Partially funded by the Duke Marine Lab. Environmental Science and Policy, and all quantitative STEM.
4) Open Data for Tobacco Retailer Mapping A team of students will work with UNC Epidemiology researchers to understand how open source data can shed light on the tobacco retailer environment. Using R and QGIS, students will prototype a process, involving web-scraping and Amazon MTurk, that will ideally result in a state or national level retailer dataset that health departments can use for retailer store audits and researchers can use for published studies. Students will have the opportunity to receive feedback from Counter Tools, a local non-profit connected to UNC tobacco researchers. Partially funded by Counter Tools. Pre-med, Public Health, Social Sciences, and all quantitative STEM.
5) Open Source Spatial Visualization for Public Health Intelligence A team of students will work on small-area health data mapping, in close collaboration with the Durham Neighborhood Compass, as well as with epidemiology researchers from UNC. Students will help transform health data into curated public health information within the Compass website. Working in R, the team will build tools for data visualization and processing, will learn how to craft public-ready narratives about health data, and will perform statistical analyses of the data. Sociology, Global Health, Pre-med, Public Health, Public Policy, as well as all quantitative STEM.
6) Space, Time, Statistics, Mathematics, and Marriage A team of students will work with Sanford professor Christina Gibson-Davis and Mathematics professor Paul Bendich to build statistical and mathematical tools that will help understand how the distributions of marital and non-marital births differ from each other, and how these differences have changed over time. Working mostly in Python and possibly also in R, students will use spatial-statistics techniques, as well as more cutting-edge mathematical approaches involving computational geometry and information theory, to generate metrics and then learn how to evaluate their social-scientific relevance. Partially funded by the Sanford School of Public Policy. Sociology, Global Health, Pre-med, Public Health, Public Policy, as well as all quantitative STEM.
7) Visualizing Suffering: Tracking Photojournalism and the Syrian Refugee Crisis A team of students led by English professor Astrid Giugni and doctoral candidate Jessica Hines will analyze the context and dissemination of images of the Syrian Refugee Crisis. Students will use the Associated Press (AP) Images database - one of the world’s largest collections of historical and contemporary imagery - to identify key features of suffering within images, consider the distribution of images in the political context of news outlets, and create an interactive visualization of image dissemination. English, Cultural Anthropology, Political Science, Journalism, Visual Studies, all social sciences, and all quantitative STEM.
8) Nutrition Dependent Growth in the Laboratory Rat A team of students led by Duke Biology professor Frederik Nijhout, Northeast Ohio Medical University Anatomy and Neurobiology professor Rebecca German, and doctoral candidates Rick Gawne and Kenneth McKenna will study the impact of diet on organ and bone growth in developing laboratory rats. Working in MATLAB and JMP Pro, students will analyze x-ray images and body measurements to study the influence of protein content on growth, and develop a mathematical model that describes growth trade-off strategies rats display when fed a protein-deficient diet. Biology, Pre-med, and all quantitative STEM.
9) Quantifying Rare Diseases in Duke Health System A team of students led by School of Nursing professor and health informaticist Rachel Richesson, faculty from the Social Science Research Institute and the School of Medicine, and advisors from the National Library of Medicine, will leverage electronic medical record (EMR) data to quantify the number of rare diseases treated at Duke University Health System (DUHS) and to estimate the number and characteristics of patients affected by these conditions. Students will identify the spectrum of rare diseases encountered by DUHS over the past 8 years, analyze the healthcare utilization impact of groups of conditions, patient demographics, or clinical profiles, and develop predictive models and visualizations for identifying individual rare conditions in the EMR. Biology, Biostatistics, pre-med, Global Health, Social Sciences, Economics, and all quantitative STEM.
10) Quantifying Phenotypic Evolution during Tumor Growth A team of students led by Duke mathematician Marc Ryser and University of Southern California Pathology professor Darryl Shibata will characterize phenotypic evolution during the growth of human colorectal tumors. Students will perform an in-depth investigation of phenotypic conservation at multiple functional levels in epigenomic methylation data, identify biologically relevant pathways that are preferentially conserved during growth, and may have the opportunity to develop a clinically impactful epigenomic classifier for tumor aggressiveness and patient outcome. Biology, Biostatistics, Genomics, pre-med, and all quantitative STEM.
11) Validating a Topic Model that Predicts Pancreatic Cancer from Latent Structures in the Electronic Medical Record A team of students led by Biomedical Engineering professor Lisa Satterwhite will further the work of a 2016 Data+ team in predictive modeling of pancreatic cancer from electronic medical record (EMR) data. Students will apply and refine previous statistical models predicting pancreatic cancer focusing on patients with new onset diabetes - a top priority risk population for the Pancreatic Cancer Biomarker Consortium of the National Cancer Institute. Students may also have the opportunity to develop new predictive models for gastrointestinal cancers. Biology, Biostatistics, pre-med, and all quantitative STEM.
12) Visualizing Real Time Data from Mobile Health Technologies A team of students led by School of Nursing professor and health informatician Ryan Shaw will create visualizations of time series mobile health data in diabetes patients. Using Tableau and working with real patient data, students will create patient-facing and provider-facing visualizations for accelerometer, glucometer, weight, medication adherence, laboratory test, and patient demographics data. Students will have the opportunity to speak with actual patients who might use these interfaces, and may have the opportunity to use these data to build predictive models to inform care decisions. Biology, Biostatistics, pre-med, and all quantitative STEM.
13) Ghost Bikes A team of students will work with Cultural Anthropology and Global Health professor Harris Solomon to understand cycling-related injuries and deaths in Durham. Using resources from the Durham Neighborhood Compass, students will analyze traffic and commuter patterns, as well as socioeconomic data such as education and housing. They will attempt to draw associations between these data and cycling accidents. Partially sponsored by an NSF CAREER Award and by the Franklin Humanities Institute Health Humanities Lab. Global Health, Cultural Anthropology, Public Policy, Public Health, Social Sciences, all quantitative STEM.
14) Building a Duke SLED (Duke Surgery Longitudinal Education Database) A team of students led by Dr. Shanna Sprinkle of Duke Surgery will combine success metrics of Duke Surgery residents from a set of databases and create a user interface for residency program directors and possibly residents themselves to view and better understand residency program performance. Using MySQL or Oracle, students will access and aggregate an incredible amount of information about Duke general surgery residents including operative case logs, exam scores, and research publications. Education, Pre-med, Biostatistics, and all quantitative STEM.
15) Comparing Exploration of Majors A team of students will initiate a study exploring the trajectories Duke students choose throughout the undergraduate curriculum. Working closely with officers from Mathematics and Global Health, the team will analyze de-identified data about the histories of students in each major, report on key patterns and correlations, and build interactive network visualization devices that display common and interesting trajectories students have taken through each major. Depending on data availability, students may also be able to undertake a much broader analysis across many other majors at Duke.
16) Quantified Feminism and the Bechdel Test Using data curated by fivethirtyeight.com, a team of students will analyze the relationship between movies that pass/fail the test and various economic and aesthetic factors like the film’s genre and budget. Students will also attempt to modify these criteria to quantify the representations of African-American, Latinx, and LGBTQ populations. There will be an opportunity to move beyond the simple two-character test, creating network analyses of character relationships via Gephi. Humanities, Social Sciences, and all quantitative STEM.
17) Controlled Substance Monitoring Visualization A team of students led by Dr. Rebecca Schroeder of Duke Anesthesiology will create an analytics and visualization tool to leverage pharmacy information management data to identify controlled substance accounting patterns among anesthesia providers consistent with an elevated risk of diversion for personal or alternate use. Using Tableau, students will create a visual interface for de-identified controlled substance administration data from Duke anesthesiology, and apply mathematical and statistics approaches to detect discrepancies between drugs issued and drugs administered, changes in patterns of drug issuance, and unusual administration patterns. Biology, Biostatistics, Psychology, pre-med, and all quantitative STEM.
18) Classification of Vascular Anomalies A team of students led by Duke Civil and Environmental Engineering professor Wilkins Aquino and Duke Vascular Surgeon Leila Mureebe will apply machine learning algorithms to Continuous Wave Doppler (CWD) ultrasound data to classify vasculature of patients with trauma and acute limb ischemia (ALI). Using MATLAB, students will compare various machine learning algorithms for differentiating patients as healthy or unhealthy based on sound data acquired using hand-held Doppler probes, and extend an existing graphical user interface to incorporate training and testing with different algorithms and intuitive visualizations for clinicians.
19) Mental Health Interventions by Durham Police A team of students led by Duke Institute for Brain Sciences faculty Nicole Schramm-Sapyta will provide analytical consulting support to the Durham Crisis Intervention Team (CIT) Collaborative, a county-wide effort to provide law enforcement and other first responders with specialized training in mental illness and crisis intervention techniques. Working with 911 call data and de-identified incident reports from the Durham Police Department, the team will suggest ways to quantitatively and qualitatively assess the effectiveness of the CIT unit, suggest potential improvements in offered services, and provide advice about future data gathering practices.
20) MyHealth Teams Data Exploration and Visualization A team of students led by Biostatistics and Bioinformatics professor Jessie Tenenbaum will curate and visualize self-reported patient medication data from a social health community site. Students will create a visualization tool showing medication use over time, enabling insights regarding prescription prevalence of different drugs, alternative drugs, and prescription variability over time. Students may also have the opportunity to develop predictive models for response to medication.
21) Alumni Gifts and Data Analysis Building off the work of a 2016 Data+ team, students will investigate commonalities and distinctions in alumni gifts, and attempt to understand and predict motivations for gifts of different types. They will also construct mathematical models to evaluate different strategies for alumni engagement. Students will have the opportunity to consult with the Prospect Research, Management and Analytics team in the development office. Sponsored by Alumni Affairs and Development at Duke. Economics, Psychology, all quantitative STEM.
22) Digital Rejuvenation of Medieval Paintings A team of students led by Mathematics professor Ingrid Daubechies will explore the feasibility of building an app that museum visitors could use to virtually rejuvenate paintings in museums. The app would require photographs taken by the user in the museum, as well as high resolution images provided by the museum website. The app would allow user-assisted image manipulation and interaction with curators and/or conservators at the museum, and provide a platform to which users would upload intermediate results. This project is inspired by the experience of the Duke Bass Connections team that developed the virtual rejuvenation of the Ghissi altarpiece, presently on display at the North Carolina Museum of Art. Art History, Visual Studies, Social Sciences, all quantitative STEM.
23) Understanding Duke Research Based on Large-Scale Faculty Publication Records Leveraging the Scholars@Duke database, which summarizes all publications of all Duke faculty for the last five years, a team of students will work with faculty and staff from the Duke Network Analysis Center and the Office of the Provost to identify and visualize intellectual communities that exist across the university. Using publication names, lists of authors, publication forums, dates of publication, and abstracts, students will have the opportunity to characterize and visualize the landscape of research topics at Duke and their interconnections through faculty publications. This project will help illustrate areas of research with deep strength, as well as more shallow areas of research potentially in need of further university investment.
24) Online Financial Behavior and the Internet of Things A team of students led by ECE professor and iiD director Robert Calderbank will spend ten weeks understanding how insights from "the internet of things" (IOT) can help shed light on analyses of financial decision-making. The team will use a large set of transactional, demographic, and behavioral data supplied by TD Bank, and will propose several potential sources of IOT data and the best approach to merge data sources, as well as propose use cases within the FI sector. There will be opportunities to discuss findings with TD Bank leadership. Economics, Psychology, Environmental Science, Sociology, all quantitative STEM.
25) Analytics for Faculty Success Despite a multitude of opinions about the state of academia, comprehensive studies of hiring, publication, and grant-receiving among academics have only recently begun to emerge. This project will use data on the career histories and academic accomplishments of faculty from across the United States to rigorously investigate the characteristics of successful faculty--measured in multiple ways--and thus contribute empirical knowledge to the debate on faculty success. Students will work with both internal and external stakeholders to develop written and analytical products that address these issues.