Examples of Integrating Data Science and Ethnography Archives

Data Scientist, Anthropologist, and Entrepreneur: Interview with Schaun Wheeler (Interview #2 in the Interview Series)

For my second interview in the Interview Series, I interviewed Schaun Wheeler. Schaun is co-founder of Aampe, a startup that embeds an active learning system into mobile apps to turn push notifications into part of the app’s user interface. Before he co-founded Aampe, Schaun was the data science lead for the award-winning Consumer Graph intelligence product at Valassis, a U.S. ad-tech firm. And before that he founded and directed the data science team at Success Academy Charter Schools in New York City. Then before that, Schaun was one of the first people to champion the use of statistical inference to understand massive unstructured data at the United States Department of the Army. Schaun has a Ph.D. in Cultural Anthropology from the University of Connecticut.

Over our conversation, we discussed the following:

Schaun’s experiences as both a data scientist and anthropologist
His utilization of anthropology within data science to decipher the right problem before launching into data science solutions
Recommendations for how anthropologists can develop data science and programming skills
His experiences starting a new data science consumer and market-research based company

To learn more about Schaun Wheeler and Aampe, check these out:

LinkedIn (the best way to contact him): https://www.linkedin.com/in/schaunwheeler/

Medium: https://medium.com/@schaun.wheeler

Twitter: https://twitter.com/schaunw

Aampe website: https://www.aampe.com/

Aampe blog: https://www.aampe.com/blog

A User Story, The Data Science Children’s Book: https://www.aampe.com/blog/a-user-story

More Detailed Walkthrough: Clip #1: https://www.youtube.com/playlist?list=PL03WDMCL2PHjRd8Y8USzvVkcIyQM57FMU and Clip #2: https://youtu.be/kwk_Ot8orPY

Previous Interview in the Interview Series: https://ethno-data.com/astrid-interview-1/

Resources on Integrating Data Science and Ethnography

Here is a list of resources about integrating data science and ethnography. Even though it is an up-and-coming field without a consistent list of publications, several fascinating and insightful resources do exist.

If there are any resources about integrating data science and ethnography that you have found useful, feel free to share them as well.

General Overviews:

Curran, John. “Big Data or ‘Big Ethnographic Data’? Positioning Big Data within the Ethnographic Space.” EPIC (2013). (Found here: https://www.epicpeople.org/big-data-or-big-ethnographic-data-positioning-big-data-within-the-ethnographic-space/)
Patel, Neal. “For a Ruthless Criticism of Everything Existing: Rebellion Against the Quantitative-Qualitative Divide.” EPIC (2013): 43-60.
Seaver, Nick. “Bastard Algebra.” Boellstorff, Tom and Bill Maurer. Data, Now Bigger and Better! Chicago: Prickly Paradigm Press, 2015. 27-46.
Slobin, Adrian and Todd Cherkasky. “Ethnography in the Age of Analytics.” EPIC (2010).
Nafus, Dawn and Tye Rattenbury. Data Science and Ethnography: What’s Our Common Ground, and Why Does It Matter? 7 3 2018. <https://www.epicpeople.org/data-science-and-ethnography/>.
Seaver, Nick. “The nice thing about context is that everyone has it.” Media, Culture & Society (2015).

Books:

Nafus, Dawn and Hannah Knox. Ethnography for a Data-Saturated World. Manchester: Manchester University Press, 2018.
Boellstorff, Tom and Bill Maurer. Data, Now Bigger and Better! Chicago: Prickly Paradigm Press, 2015.
Mackenzie, Adrian. Machine Learners: Archaeology of a Data Practice. Cambridge: The MIT Press, 2017.

Examples and Case Studies:

“Autonomous Drive: Teaching Cars Human Behaviour” by Melissa Cefkin on the YouTube Channel DrivingTheNation: https://www.youtube.com/watch?v=6koKuDegHAM
Eslami, Motahhare, et al. “First I “like” it, then I hide it: Folk Theories of Social Feeds.” Curation and Algorithms (2016).
Giaccardi, Elisa, Chris Speed and Neil Rubens. “Things Making Things: An Ethnography of the Impossible.” (2014).

Elish, M. “The Stakes of Uncertainty: Developing and Integrating Machine Learning in Clinical Care.” EPIC (2018).
Madsen, Mette My, Anders Blok and Morten Axel Pedersen. “Transversal collaboration: an ethnography in/of computational social science.” Nafus, Dawn and Hannah Knox. Ethnography for a Data-Saturated World. Manchester: Manchester University Press, 2018.
Thomas, Suzanne, Dawn Nafus and Jamie Sherman. “Algorithms as fetish: Faith and possibility in algorithmic work.” Big Data & Society (2018): 1-11.

Articles and Blog Posts:

“An Engineering Anthropologist: Why tech companies need to hire software developers with ethnographic skills” by Astrid Countee: http://ethnographymatters.net/blog/2016/06/22/an-engineering-anthropologist-why-tech-companies-need-to-hire-software-developers-with-ethnographic-skills/
“Cross-disciplinary Insights Teams: Integrating Data Scientists and User Researchers at Spotify” by Sara Belt and Peter Gilks: https://www.epicpeople.org/cross-disciplinary-insights-teams-integrating-data-scientists-and-user-researchers-at-spotify/
“Data is a stakeholder” by Schaun Wheeler: https://towardsdatascience.com/data-is-a-stakeholder-31bfdb650af0
“Why Big Data Needs Thick Data” by Tricia Wang: https://medium.com/ethnography-matters/why-big-data-needs-thick-data-b4b3e75e3d7

My Own Articles on This Website:

Podcasts and Lectures:

“Computational Anthropology: Quali-quantitative Analyses of Attention Economies during the Covid-19 Lockdown” by Morten Axel Pedersen: https://www.material.city/recordings/mortenaxelpedersen
“Human-Driven Machine Learning with Saleema Amershi”: https://datastori.es/115-human-driven-machine-learning-with-saleema-amershi/#t=29:00.204
“Welcome to Dataworld, by Alexander Taylor”: https://player.fm/series/camthropod/episode-13-welcome-to-dataworld-by-alex-taylor
“Machine Learning for Artists with Gene Kogan”: https://datastori.es/114-machine-learning-for-artists-with-gene-kogan/#t=34:28.738

Ethical Considerations:

“Caroline Sinders on Ethical Product Design for Machine Learning”: https://design.blog/2017/03/23/caroline-sinders-on-ethical-product-design-for-machine-learning/
“The Trouble with Bias” by Kate Crawford: https://www.youtube.com/watch?v=fMym_BKWQzk
“Justice for ‘Data Janitors’” by Lilly Irani: http://www.publicbooks.org/justice-for-data-janitors/
Elish, Madeleine. “Moral Crumple Zones: Cautionary Tales in Human-Robot Interaction.” Engaging Science, Technology, and Society (2019).
boyd, danah and Kate Crawford. “Critical Questions for Big Data: Provocations for a Cultural, Technological, and Scholarly Phenomenon.” Information, Communication, & Society (2012): 662-679.

Four Innovative Projects that Integrated Data Science and Ethnography

In a previous article, I have discussed the value of integrating data science and ethnography. On LinkedIn, people commented that they were interested and wanted to hear more detail on potential ways to do this. I replied, “I have found explaining how to conduct studies that integrate the two practically is easier to demonstrate through example than abstractly since the details of how to do it vary based on the specific needs of each project.”

In this article, I intend to do exactly that: analyze four innovative projects that in some way integrated data science and ethnography. I hope these will get your creative juices flowing as you think through how to combine them in whatever project you are working on.

Synopsis:

Project:	How It Integrated Data Science and Ethnography:	Link to Learn More:
No Show Model	Used ethnography to design machine learning software	https://ethno-data.com/show-rate-predictor/
Cybersensitivity Study	Used machine learning to scale up the scope of an ethnographic inquiry to a larger population	https://ethno-data.com/masters-practicum-summary/
Facebook Newsfeed Folk Theories	Used ethnography to understand how users make sense of and behave towards a machine learning system they encounter and how this, in turn, shapes the development of the machine learning algorithm(s)	https://dl.acm.org/doi/10.1145/2858036.2858494
Thing Ethnography	Used machine learning to incorporate objects’ interactions into ethnographic research	https://dl.acm.org/doi/10.1145/2901790.2901905 and https://www.semanticscholar.org/paper/Things-Making-Things%3A-An-Ethnography-of-the-Giaccardi-Speed/2db5feac9cc743767fd23aeded3aa555ec8683a4?p2df

Project 1: No Show Model

A medical clinic at a hospital system in New York City asked me to use machine learning to build a show rate predictor in order to inform and improve its scheduling practices. During the initial construction phase, I used ethnography both to understand the scheduling problem the clinic faced in more depth and to determine an appropriate interface design.

Through an ethnographic inquiry, I discovered the most important question(s) schedulers ask when scheduling their appointments. This was, “Of the people scheduled for a given doctor on a particular day, how many of them are likely to actually show up?” I then built a machine learning model to answer this exact question. My ethnographic inquiry provided me the design requirements for the data science project.

In addition, I used my ethnographic inquiries to design the interface. I observed how schedulers interacted with their current scheduling software, which gave me a sense for what kind of visualizations would work or not work for my app.

This project exemplifies how ethnography can be helpful both in the development stage of a machine learning project, to determine what the machine learning algorithm(s) need to do, and on the frontend, when communicating the algorithm(s) to users and assessing how well they work for those users.

As both an ethnographer and a data scientist, I was able to translate my ethnographic insights seamlessly into machine learning modeling and API specifications and to conduct follow-up ethnographic inquiries to ensure that what I was building would meet their needs.

Project 2: Cybersensitivity Study

I conducted this project with Indicia Consulting. Its goal was to explore potential connections between individuals’ energy consumption and their relationship with new technology. This is an example of using ethnography to explore and determine potential social and cultural patterns in-depth with a few people and then using data science to analyze those patterns across a large population.

We started the project by observing and interviewing about thirty participants, but as the study progressed, we needed to develop a scalable method to analyze the patterns across whole communities, counties, and even states.

Ethnography is a great tool for exploring a phenomenon in-depth and for developing initial patterns, but it is resource-intensive and thus difficult to conduct on a large group of people. It is not practical for, say, analyzing thousands of people. Data science, on the other hand, can easily test whether patterns noticed in smaller ethnographic studies hold across an entire population, yet because it often lacks the granularity of ethnography, it would often miss intricate patterns.

Ethnography is also great on the back end for determining whether the implemented machine learning models and their resulting insights make sense on the ground. This forms a type of iterative feedback loop, where data science scales up ethnographic insights and ethnography contextualizes data science models.

Thus, ethnography and data science cover each other’s weaknesses well, forming a great methodological duo for projects centered around trying to understand customers, users, colleagues, or other people in-depth.

Project 3: Facebook Newsfeed Folk Theories

In their study, Motahhare Eslami and her team of researchers conducted an ethnographic inquiry into how various Facebook users conceived of how the Facebook Newsfeed selects which posts/stories rise to the top of their feeds. They analyze several different “folk theories” or working theories by everyday people for the criteria this machine learning system uses to select top stories.

How users think the overall system works influences how they respond to the newsfeed. Users who believe, for example, that the algorithm will prioritize the posts of friends whose posts they have liked in the past will often intentionally like the posts of their closest friends and family so that they can see more of their posts.

Users’ perspectives on how the Newsfeed algorithm works influence how they respond to it, which, in turn, affects the very data the algorithm learns from and thus how the algorithm develops. This creates a cyclic feedback loop that influences the development of the machine learning algorithmic systems over time.
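A toy simulation can make this feedback loop concrete. It is a deliberately simplified sketch and not a description of Facebook's actual ranking: it assumes a feed that shows posts from two groups of friends in proportion to the likes each group has received so far, and a user who likes a fixed fraction of the posts they see from each group. All names and numbers are hypothetical.

```python
def simulate_feed(rounds, like_rate_close, like_rate_other, posts_shown=10):
    """Return the share of the feed given to close friends at the start of each round.

    Toy assumptions (hypothetical, for illustration only):
    - the feed splits `posts_shown` between two groups in proportion to past likes
    - the user likes a fixed fraction of the posts shown from each group
    """
    likes = {"close": 1.0, "other": 1.0}  # start with equal like counts
    history = []
    for _ in range(rounds):
        share_close = likes["close"] / (likes["close"] + likes["other"])
        history.append(share_close)
        shown_close = posts_shown * share_close
        shown_other = posts_shown - shown_close
        # The user's likes become the data the ranking uses in the next round.
        likes["close"] += shown_close * like_rate_close
        likes["other"] += shown_other * like_rate_other
    return history


# A user acting on the folk theory likes close friends' posts far more often.
for round_number, share in enumerate(simulate_feed(5, 0.9, 0.1), start=1):
    print(f"round {round_number}: close friends get {share:.0%} of the feed")
```

In this sketch, a user who deliberately likes close friends' posts more often sees their share of the feed grow every round, while equal like rates leave the split unchanged. The user's belief shapes the data, and the data shapes what the system shows next.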

Their research exemplifies the importance of understanding how people think about, respond to, and more broadly relate with machine learning-based software systems. Ethnographies into people’s interactions with such systems are a crucial way to develop this understanding.

In a way, many machine learning algorithms are very social in nature: they – or at least the overall software system in which they exist – often succeed or fail based on how humans interact with them. In such cases, no matter how technically robust a machine learning algorithm is, if potential users cannot positively and productively relate to it, then it will fail.

Ethnographies into the “social life” of machine learning software systems (by which I mean how they become a part of – or in some cases fail to become a part of – individuals’ lives) help us understand how the algorithms are developing or learning and determine whether they are successful in what we intended them to do. Such ethnographies require not only in-depth expertise in ethnographic methodology but also an in-depth understanding of how machine learning algorithms work, in order to understand how social behavior might be influencing their internal development.

Project 4: Thing Ethnography

Elisa Giaccardi and her research team have been pioneering the utilization of data science and machine learning to understand and incorporate the perspective of things into ethnographies. With the development of the internet of things (IoT), she suggests that the data from object sensors could provide fresh insights in ethnographies of how humans relate to their environment by helping to describe how these objects relate to each other. She calls this thing ethnography.

This experimental approach exemplifies one way to use machine learning algorithms within ethnographies as social processes/interactions in and of themselves. This could be an innovative way to analyze the social role of these IoT objects in daily life within ethnographic studies. If Eslami’s work exemplifies a way to graft ethnographic analysis into the design cycle of machine learning algorithms, Giaccardi’s research illustrates one way to incorporate data science and machine learning analysis into ethnographies.

Conclusion

Here are four examples of innovative projects that involve integrating data science and ethnography to meet their respective goals. I intend these not as a complete or exhaustive account of how to integrate these methodologies but as food for thought to spur further creative thinking into how to connect them.

For those who, when they hear the idea of integrating data science and ethnography, ask the reasonable question, “Interesting but what would that look like practically?”, here are four examples of how it could look. Hopefully, they are helpful in developing your own ideas for how to combine them in whatever project you are working on, even if its details are completely different.

Photo credit #1: StartupStockPhotos at https://pixabay.com/photos/startup-meeting-brainstorming-594090/

Photo credit #2: DarkoStojanovic at https://pixabay.com/photos/medical-appointment-doctor-563427/

Photo credit #3: NASA at https://unsplash.com/photos/Q1p7bh3SHj8

Photo credit #4: Kon Karampelas at https://unsplash.com/photos/HUBofEFQ6CA

Photo credit #5: Pixabay at https://www.pexels.com/photo/app-business-connection-device-221185/

Using Data Science and Ethnography to Build a Show Rate Predictor

I recently integrated ethnography and data science to develop a Show Rate Predictor for an (anonymous) hospital system. Many readers have asked for real-world examples of this integration, and this project demonstrates how ethnography and data science can join to build machine learning-based software that makes sense to users and meets their needs.

Part 1: Scoping out the Project

A particular clinic in the hospital system was experiencing a large number of appointment no-shows, which produced wasted time, frustration, and confusion for both its patients and employees. I was asked to use data science and machine learning to better understand and improve their scheduling.

I started the project by conducting ethnographic research into the clinic to learn more about how scheduling occurs normally, what effect it was having on the clinic, and what driving problems employees saw. In particular, I observed and interviewed scheduling assistants to understand their day-to-day work and their perspectives on no-shows.

One major lesson I learned through all this was that when scheduling an appointment, schedulers are constantly trying to determine how many people to schedule on a given doctor’s shift to ensure the right number of people show up. For example, say 12-14 patients is a good number of patients for Dr. Rodriguez’s (made-up name) Wednesday morning shift. When deciding whether to schedule an appointment for a given patient with Dr. Rodriguez on an upcoming Wednesday, the scheduling assistants try to determine, given the appointments currently scheduled for that shift, whether they can expect 12-14 patients to show up. This was often an inexact science. They would often have to schedule 20-25 patients on a particular doctor’s shift to ensure their ideal window of 12-14 patients would actually come that day. This could create the potential for chaos, however, with too many patients arriving on some days and too few on others.
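A rough back-of-the-envelope sketch shows why this was such an inexact science. Scheduling 20-25 patients to get 12-14 arrivals implies a show rate of roughly 60%. The sketch below assumes, purely as a simplification for illustration and not as something measured at the clinic, that every patient shows up independently with that same probability. The function name `prob_in_window` and the example numbers are hypothetical.

```python
from math import comb


def prob_in_window(n_scheduled, show_rate, low=12, high=14):
    """Probability that between `low` and `high` patients (inclusive) show up,
    assuming each of `n_scheduled` patients shows independently at `show_rate`."""
    return sum(
        comb(n_scheduled, k) * show_rate**k * (1 - show_rate) ** (n_scheduled - k)
        for k in range(low, min(high, n_scheduled) + 1)
    )


# Illustrative numbers only: 22 scheduled patients and a 60% show rate.
chance = prob_in_window(22, 0.6)
print(f"Chance of landing in the 12-14 window: {chance:.0%}")
```

Under these simplified assumptions, the shift lands inside the ideal window only about half the time, even when the overall show rate is known. Predicting each appointment individually is one way to narrow that uncertainty.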

This question – how many appointments can we expect or predict to occur on a given doctor’s shift – became my driving question to answer with machine learning. After checking in with the various stakeholders at the clinic to make sure this was in fact an important and useful question to answer with machine learning, I started building.

Part 2: Building the Model

Now that I had a driving, answerable question, I decided to break it down into two sequential machine learning models:

The first model learned to predict the probability that a given appointment would occur, learning from the history of occurring or no-show appointments.
The second model, using the appointment probabilities from the first model, estimated how many appointments might occur for every doctor’s shift.

The first model combined three streams of data to assess the no-show probability: appointment data (such as how long ago it was scheduled, type of appointment, etc.); patient information, especially past appointment history; and doctor information. I performed extensive feature selection to determine the best subset of variables to use and tested several types of machine learning models before settling on gradient boosting.

The second model used the probabilities from the first model as input data to predict how many patients to expect to come on each doctor’s shift. I settled on a neural network for the model.
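As a point of comparison for the second model, here is a minimal baseline sketch. It is not the neural network described above. If one assumes appointments occur independently of each other (an assumption made here only for illustration), the expected number of arrivals on a shift is simply the sum of the first model's probabilities, and the full distribution of arrival counts can be built up one appointment at a time. The function names and example probabilities are hypothetical.

```python
def expected_show_count(probabilities):
    """Expected number of arrivals: the sum of per-appointment show probabilities."""
    return sum(probabilities)


def show_count_distribution(probabilities):
    """Return a list where entry k is the probability that exactly k patients show,
    assuming appointments occur independently of each other."""
    dist = [1.0]  # with no appointments, zero arrivals is certain
    for p in probabilities:
        updated = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            updated[k] += mass * (1 - p)  # this patient does not show
            updated[k + 1] += mass * p    # this patient shows
        dist = updated
    return dist


# Hypothetical output of the first model for one small shift.
shift_probabilities = [0.9, 0.8, 0.5, 0.3]
print(expected_show_count(shift_probabilities))
print(show_count_distribution(shift_probabilities))
```

A baseline like this is useful mainly as a sanity check: a more flexible second model earns its place when it beats the simple sum, for instance by picking up on appointments that are not independent.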

Part 3: Building an App

Next, I worked with the software engineers on my team to develop an app to employ these models in real time and communicate the information to schedulers as they scheduled appointments. My ethnographic research was invaluable for developing how to construct the app.

On the back end, the app calculated the probability that all future appointments would occur, updating with new calculations for newly scheduled or edited appointments. Once a week, it would incorporate that week’s new appointment data and shift attendance to each model’s training data and update those models accordingly.

Through my ethnographic research, I observed how schedulers approached scheduling appointments, including what software they used in the process and how they used each. I used that to determine the best ways to communicate that information, periodically showing my ideas to the schedulers to make sure my strategy would be helpful.

I constructed an interface to communicate the information that would complement the current software they used. In addition to displaying the number of patients expected to arrive, it color-coded each shift on the calendar interface: green if the machine learning algorithm predicted that the shift was underbooked, yellow if the shift was projected to have the ideal number of patients, and red if it was already expected to have too many patients. The color-coding allowed easy visualization of the information in the moment: when trying to find an appointment time for a patient, schedulers could easily look for the green shifts, or yellow if they had to, but steer clear of the red. When zooming in on a specific shift, each appointment would be color-coded as well (likely, unlikely, and in the middle) based on the probability that it would occur.
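The color-coding logic can be sketched in a few lines. The 12-14 window comes from the Dr. Rodriguez example earlier. The function names and the probability cutoffs for individual appointments are hypothetical placeholders, since the actual thresholds are not given here.

```python
def shift_color(expected_patients, ideal_low=12, ideal_high=14):
    """Color for a shift on the calendar, given the predicted number of arrivals."""
    if expected_patients < ideal_low:
        return "green"   # underbooked: room for more appointments
    if expected_patients <= ideal_high:
        return "yellow"  # within the ideal window
    return "red"         # already expected to have too many patients


def appointment_label(show_probability, unlikely_below=0.4, likely_above=0.7):
    """Bucket a single appointment by its predicted probability of occurring.
    The cutoffs are hypothetical, chosen only for illustration."""
    if show_probability >= likely_above:
        return "likely"
    if show_probability < unlikely_below:
        return "unlikely"
    return "in the middle"


print(shift_color(9.5), shift_color(13.0), shift_color(16.2))
print(appointment_label(0.85), appointment_label(0.55), appointment_label(0.2))
```

Keeping the ideal window as a parameter matters in practice, since a good number of patients would differ from one doctor's shift to another.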

Conclusion

This is one example of a project that integrates data science and ethnography to build a machine learning app. I used ethnography to construct the app’s parameters and framework. It anchored the app in the needs of the schedulers, ensuring that the machine learning modeling I developed was useful to those who would use it. Frequent check-ins before each step of development also helped confirm that my proposed concept would in fact help meet their needs.

My data science and machine learning expertise helped guide me in the ethnographic process as well. Being an expert in how machine learning worked and what sorts of questions it could answer allowed me to easily synthesize the insights from my ethnographic inquiries into buildable machine learning models. I understood what machine learning was capable (and not capable) of doing, and I could intuitively develop strategic ways to employ machine learning to address issues they were having.

Hence, my dual role as an ethnographer and data scientist benefitted the project greatly. My listening skills from ethnography enabled me to uncover the underlying questions/issues schedulers faced, and my data science expertise gave me the technical skills to develop a viable machine learning solution. Without listening patiently through extensive ethnography, I would not have understood the problem sufficiently, but without my data science expertise, I would have been unable to decipher which question(s) or issue(s) machine learning could realistically address and how.

This exemplifies why a joint expertise in data science and ethnography is invaluable in developing machine learning software. Two different individuals or teams could complete each part separately – ethnographers analyzing the users’ needs and data scientists then determining whether machine learning modeling could help. But this seems unnecessarily disjointed, potentially producing misunderstanding, confusion, and chaos. Adding an additional layer of people can easily lead to either the ethnographers uncovering needs far too broad or complex for a machine learning-based solution to help with or the data scientists trying to impose their machine learning “solution” on a problem the users do not have.

Developing expertise in both makes it much easier to simultaneously understand the problems or questions in a particular context and build a doable data science solution.

Photo credit #1: DarkoStojanovic at https://pixabay.com/photos/medical-appointment-doctor-563427/

Photo credit #2: geralt at https://pixabay.com/illustrations/time-doctor-doctor-s-appointment-481445/

Photo credit #3: Pixabay at https://www.pexels.com/photo/light-road-red-yellow-46287/

In a Helicopter Overlooking the Wildfire: A Data Science Perspective in A Frontline Hospital During the Covid-19 Pandemic

I worked as a data scientist at a hospital in New York City during the worst of the covid-19 pandemic. Over the spring and summer, we became overwhelmed as the city turned into (and left) the global hotspot for covid-19. I have been processing everything that happened since.

The pandemic overwhelmed the entire hospital, particularly my physician colleagues. When I met with them, I could often notice the combined effects of physical and emotional exhaustion in their eyes and voices. Many had just arrived from the ICU, where they had spent several hours fighting to keep their patients alive only to witness many of them die in front of them, and I could sense the emotional toll that was taking.

My experiences of the pandemic as a data scientist differed considerably yet were also exhausting and disturbing in their own way. I spent several months day-in and day-out researching how many of our patients were dying from the pandemic and why: trying to determine what factors contributed to their deaths and what we could do as a hospital to best keep people alive. The patient who died the night before in front of the doctor I was meeting with became, for me, a single row in an already way-too-large data table of covid-19 fatalities.

I felt like a helicopter pilot overlooking an out-of-control wildfire.[1] In such wildfires, teams of firefighters (aka doctors) position themselves at various strategic locations on the ground to push back the fire there as best they can. They experience the flames and carnage up close and personal. My placement in the helicopter, on the other hand, removed me from ground zero, instead forcing me to see and analyze the fire in its entirety and its sweeping and massive destruction across the whole forest. My position provided a strategic vantage point to determine the best ways to fight it, while shielding me from the immediate destruction. Nevertheless, witnessing the vastness of the carnage from the air had its own challenges, stress, and emotional toll.

Being an anthropologist by training, I am accustomed to being “on the ground.” Anthropology is predicated on the idea that to understand a culture or phenomenon, one must understand the everyday experiences of those on the ground amidst it, and my anthropological training has instilled an instinct to go straight to and talk to those in the thick of it.

Yet, this experience has taught me that that perception is overly simplistic: the so-called “ground” has many layers to it, especially for a complex phenomenon like a pandemic. Being in the helicopter is another way to be in the thick of it just as much as standing before the flames.

Many in the United States have made considerable and commendable efforts to support frontline health workers. Yet, as the pandemic progresses and its societal effects grow in complexity in the coming months, I think we need to broaden our understanding of where the “frontlines” are and who counts as a “frontline worker” worthy of our support.

On actual battlefields, where the “frontline” metaphor comes from, militaries also set up layered teams to support the logistical needs of ground soldiers, and those teams must also frequently put themselves in harm’s way in the process. The frontline of this pandemic seems no different.

I think we need to expand our conceptions of what it means to be on the frontlines accordingly. Like anthropology, modern journalism, a key source of pandemic information for many of us, can fall into the issue of overfocusing on the “worst of the worst,” potentially ignoring the broader picture and the diversity of “frontline” experiences. For example, interviewing the busiest medical caregivers in the worst affected hospitals in the most affected places in the world likely does promote viewership, but only telling those stories ignores the experiences and sacrifices of thousands of others necessary to keep them going.

To be clear, in this blog, I do not personally care about acknowledgement of my own work nor do I think we should ignore the contributions of these medical professional “ground troops” in any way. Rather, in the spirit of “yes and,” we should extend our understanding of the “frontline workers” to acknowledge and celebrate the contributions of many other essential professionals during this crisis, such as those in transportation services, food distribution, and postal work. I related my own experiences as a data scientist because they helped me learn this, not for any desire for recognition.

This might help us appreciate the complexity of this crisis and its social effects, and the various types of sacrifices people have been making to address it. As it is becoming increasingly clear that this pandemic is not likely to go anywhere anytime soon, appreciating the full extent of both could help us come together to buckle down and fight it.

[1] This video helped me understand the logistics of fighting wildfires, a fascinating topic in itself: https://www.youtube.com/watch?v=EodxubsO8EI. Feel free to check it out to understand my analogy in more depth.

Photo Credit #1: ReinhardThrainer at https://pixabay.com/photos/fire-forest-helicopter-forest-fire-5457829/

Photo Credit #2: Pixabay at https://www.pexels.com/photo/backlit-breathing-apparatus-danger-dangerous-279979/

Photo Credit #3: Pixabay at https://www.pexels.com/photo/scenic-view-of-rice-paddy-247599/

Interdisciplinary Anthropology and Data Science Master’s Thesis: A Quick and Dirty Project Summary

This is a quick and dirty summary of my master’s practicum research project with Indicia Consulting over the summer of 2018. For anyone interested in more detail, here is a more detailed report, and here is the final report with Indicia.

Background

My practicum was the sixth stage of a several year-long research project. The California Energy Commission commissioned this larger project to understand the potential relationship between individual energy consumption and technology usage. In stages one through five, we isolated certain clusters of behavior and attitudes around new technology adoption – which Indicia called cybersensitivity – and demonstrated that cybersensitivity tended to associate with a willingness to adopt energy-saving technology like smart meters.

This led to a key question: How can one identify cybersensitivity among a broader population such as a community, county, or state? Answering this question was the main goal of my practicum project.

In the past stages of the research project, the team used ethnographic research to establish criteria for whether someone was a cybersensitive based on several hours of interviews and observations about their technology usage. These interviews and observations certainly helped the research team analyze behavioral and attitudinal patterns, determine what patterns were significant, and develop those into the concept of cybersensitivity, but they are too time- and resource-intensive to perform with an entire population. One generally does not have the ability to interview everyone in a community, county, or state. I sought to address this directly in my project.

Task	Timeline	Task Name	Research Technique	Description
Task 1	June 2015-Sept 2018	General Project Tasks	Administrative (N/A)	Developed project scope and timeline, adjusting as the project unfolds
Task 2	July 2015 – July 2016	Documenting and analyzing emerging attitudes, emotions, experiences, habits, and practices around technology adoption	Survey	Conducted survey research to observe patterns of attitudes and behaviors among cybersensitives/awares.
Task 3	Sept 2016 – Dec 2016	Identifying the attributes and characteristics and psychological drivers of cybersensitives	Interviews and Participant-Observation	Conducted in-depth interviews and observations coding for psych factor, energy consumption attitudes and behaviors, and technological device purchasing/usage.
Task 4*	Sept 2016 – July 2017	Assessing cybersensitives’ valence with technology	Statistical Analysis	Tested for statistically significant differences in demographics, behaviors, and beliefs/attitudes between cyber status groups
Task 5	Aug 2017 – Dec 2018	Developing critical insights for supporting residential engagement in energy efficient behaviors	Statistical Analysis	Analyzed utility data patterns of study participants, comparing them with those of the general population.
Task 6	March 2018 – Aug 2018	Recommending an alternative energy efficiency potential model	Decision Tree Modeling	Constructed decision tree models to classify an individual’s cyber status

Project Goal

The overall goal for the project was to produce a scalable method to assess whether someone exhibits cybersensitivity based on data measurable across an entire population. In doing this, the project also helped address the following research needs:

Created a method that can scale across a larger population to assess whether cybersensitives are more willing to adopt energy-saving technologies across a community, county, or state
Provided the infrastructure to determine how much energy-saving campaigns targeting cybersensitives specifically would reduce energy consumption in California
Helped the California Energy Commission determine the best means to reach cybersensitives for specific energy-saving campaigns

The Project

I used machine learning modeling to create a decision-making flow to isolate cybersensitives in a population. Random forests and decision trees produced the best models for Indicia’s needs: random forests in accuracy and robustness, and decision trees in human decipherability. Through them, I created a programmable yet human-comprehensible framework to determine whether an individual is cybersensitive based on behaviors and other characteristics that an organization could easily assess within a whole population. Because the model was both easy for humans to read and encodable computationally, any energy organization could understand, replicate, and refine it for its own purposes.
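
To illustrate what a human-readable yet programmable decision tree can look like, here is a minimal sketch in plain Python. It is not the model from the practicum. The feature names (owns_smart_device, tracks_usage_online, has_landline), the six example rows, and their labels are all hypothetical, invented only for illustration. The splitting rule (Gini impurity on yes/no features) is one common way to grow a decision tree and is an assumption here, not a description of the method in the full report.

```python
from collections import Counter

# Hypothetical yes/no features and toy rows; none of this comes from the study.
FEATURES = ["owns_smart_device", "tracks_usage_online", "has_landline"]
ROWS = [
    {"owns_smart_device": True, "tracks_usage_online": True, "has_landline": False},
    {"owns_smart_device": True, "tracks_usage_online": True, "has_landline": True},
    {"owns_smart_device": True, "tracks_usage_online": False, "has_landline": True},
    {"owns_smart_device": False, "tracks_usage_online": True, "has_landline": False},
    {"owns_smart_device": False, "tracks_usage_online": False, "has_landline": True},
    {"owns_smart_device": True, "tracks_usage_online": True, "has_landline": False},
]
LABELS = ["cybersensitive", "cybersensitive", "other", "other", "other", "cybersensitive"]


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, features):
    """Return the feature whose yes/no split most reduces impurity, or None."""
    best, best_score = None, gini(labels)
    for feature in features:
        yes = [y for r, y in zip(rows, labels) if r[feature]]
        no = [y for r, y in zip(rows, labels) if not r[feature]]
        if not yes or not no:
            continue  # this feature does not separate the rows at all
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
        if score < best_score:
            best, best_score = feature, score
    return best


def build_tree(rows, labels, features, depth=0, max_depth=3):
    """Grow a tree recursively; a leaf is simply the majority label."""
    feature = best_split(rows, labels, features) if depth < max_depth else None
    if feature is None:
        return Counter(labels).most_common(1)[0][0]
    remaining = [f for f in features if f != feature]
    branches = {}
    for name, wanted in (("yes", True), ("no", False)):
        idx = [i for i, r in enumerate(rows) if bool(r[feature]) == wanted]
        branches[name] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx],
            remaining, depth + 1, max_depth,
        )
    return {"feature": feature, "yes": branches["yes"], "no": branches["no"]}


def classify(tree, row):
    """Follow the yes/no branches until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["yes"] if row[tree["feature"]] else tree["no"]
    return tree


def describe(tree, indent=""):
    """Render the tree as indented if/else rules a person can read."""
    if not isinstance(tree, dict):
        return [f"{indent}-> {tree}"]
    lines = [f"{indent}if {tree['feature']}:"]
    lines += describe(tree["yes"], indent + "    ")
    lines.append(f"{indent}else:")
    lines += describe(tree["no"], indent + "    ")
    return lines


TREE = build_tree(ROWS, LABELS, FEATURES)
print("\n".join(describe(TREE)))
```

On this toy data the printed rules split first on owns_smart_device and then on tracks_usage_online. The same structure serves both audiences: a person can read the if/else rules, and classify applies them to any new row. A random forest would train many such trees on resampled data and let them vote, which tends to improve accuracy and robustness at the cost of this readability.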

Conclusion

This is a quick overview of my master’s practicum project. For more details on what modeling I did, how I did it, what results it produced, and how it fit within the wider needs of the multi-year research project, please see my full report.

I really appreciated the opportunity it offered to get my hands dirty integrating ethnography and data science to help address a real-world problem. This summary only scratches the surface of what Indicia did with the California Energy Commission to encourage sustainable energy usage across society. Hopefully, though, it will inspire you to integrate ethnography and data science to address whatever complex questions you face. It certainly did for me.

Thank you to Susan Mazur-Stommen and Haley Gilbert for your help in organizing and completing the project. I would like to thank my professorial committee at the University of Memphis – Dr. Keri Brondo, Dr. Ted Maclin, Dr. Deepak Venugopal, and Dr. Katherine Hicks – for their academic support as well.

Anthropology by Data Science: The EPIC Project with Indicia Consulting as an Exploratory Case Study

This is my practicum report with Indicia Consulting. In lieu of a master’s thesis, the University of Memphis Department of Anthropology required that we master’s students conduct a practicum project. For this, we had to partner with an organization and complete a 300+ hour anthropological research project based on the organization’s needs and our skills and interests. My practicum project was Indicia’s EPIC Project with the California Energy Commission (see this link and this link for more details on the EPIC Project). In this report, I outline potential ways to integrate ethnographic/anthropological and data science research in professional settings.

In November 2019, the American Anthropological Association’s Committee for the Anthropology of Science, Technology, and Computing (CASTAC) awarded me the David Hakken Graduate Student Prize for innovative science and technology scholarship.

Full Report:

Download [1.56 MB]

The Anthropology Department also required that we publicly present our practicum research to the University of Memphis campus. This PowerPoint summarizes my practicum project. If you are not keen to read the 99-page full report, this is a much shorter alternative:

Download [1.05 MB]

If you are interested in learning more about the project, please check out the following:

Indicia Consulting’s Final Research Report with the California Energy Commission
My Presentation at the 2019 Memphis Data Conference for Data Scientists Specifically


Synopsis:

Project 1: No Show Model

Project 2: Cybersensitivity Study

Project 3: Facebook Newsfeed Folk Theories

Project 4: Thing Ethnography

Conclusion

Part 1: Scoping out the Project

Part 2: Building the Model

Part 3: Building an App

Conclusion

Background

Project Goal

The Project

Conclusion
